Presentation slides. Click here 

Video on YouTube.


Optimization

  1. Ge, Huang, Jin, Yuan COLT 2015. "Escaping from saddle points: online stochastic gradient for tensor decomposition."
  2. Jin, Ge, Netrapalli, Kakade, Jordan ICML'17. How to escape saddle points efficiently. Blog post.
  3. Second-order black-box methods for deep learning. Paper 1 by Agarwal et al. and Paper 2 by Carmon et al.
  4. Blog post by Arora and Ma: Framework for analysing nonconvex optimization. (describes "measure of progress.")
  5. Non-black-box analyses of simpler problems (subcases of simple neural nets): topic modeling (Arora, Ge, Moitra), sparse coding, matrix completion.
  6. Ge, Jin, Zheng. No spurious local minima in nonconvex low rank problems: A unified geometric analysis.
  7. Analyses of multilayer linear nets.
  8. Hazan's tutorial on optimization in ML.

Generalization

  1. Zhang, Bengio, Hardt, Recht, Vinyals. Understanding deep learning requires rethinking generalization. Belkin et al.'18. To understand deep learning we need to understand kernel learning.
  2. Blog post 1 by Arora. Generalization and Deep Nets: An Introduction.
  3. Blog post 2: Proving generalization bounds for deep nets via compression.
  4. (Not) Bounding the true error. Langford and Caruana.
  5. Various generalization bounds. Bartlett-Mendelson'02, Neyshabur et al'17, Neyshabur et al'18, Arora et al'18.
  6. Chaudhari et al. Entropy-SGD: biasing gradient descent into wide valleys.
  7. Morcos et al. On the importance of single directions for generalization.

Expressibility/role of depth

  1. Eldan, Shamir COLT'16. The power of depth for feedforward neural networks. Telgarsky COLT'16. Benefits of depth in neural networks.
  2. Telgarsky's lecture notes.
  3. Arora, Cohen, Hazan ICML'18. On the optimization of deep networks: implicit acceleration by overparameterization.

Unsupervised learning, GANs

  1. Representation learning: A review and new perspectives, by Bengio, Courville, Vincent 2012.
  2. Blog post by Arora and Risteski. Unsupervised learning: one notion or many?  (Explains possible gap in thinking of unsupervised learning as distribution learning.)
  3. Goals and Principles of Representation Learning, blog post by Ferenc Huszar.
  4. NIPS'16 tutorial on GANs by Ian Goodfellow. (Good survey circa 2016.)
  5. Generalization and equilibrium in GANs, Arora et al. ICML'17. Blog post.
  6. Do GANs learn the distribution? Some theory and empirics. Arora, Zhang, Risteski ICLR'18. Blog post A and Blog post B.

Deep learning-free text embeddings

  1. Arora. Introductory post, and Post 2 describing the connection to compressed sensing.
  2. Wieting et al.'16. Towards Universal Paraphrastic Sentence Embeddings.
  3. Our various linear text embeddings. SIF (Simple but tough) ICLR'17, DisC via compressed sensing ICLR'18, A la Carte ACL'18.
  4. The above embeddings are inspired by the theory of word embeddings, described in this Blog Post and two papers: Paper 1 TACL'16 (Rand-Walk model) and Paper 2 TACL'18 (How polysemy relates to word embeddings; short answer: linearly).