Toward Theoretical Understanding of Deep Learning: Slides and Bibliography

Princeton University Computer Science Arora Research Group

ICML 2018 Tutorial

Toward theoretical understanding of deep learning
Sanjeev Arora

Presentation slides.

Video on YouTube.

Bibliography

Optimization

Ge, Huang, Jin, Yuan COLT 2015. "Escaping from saddle points: online stochastic gradient for tensor decomposition."
Jin, Ge, Netrapalli, Kakade, Jordan ICML'17. How to escape saddle points efficiently. Blog post.
Second-order black-box methods for deep learning. Paper 1 by Agarwal et al. and Paper 2 by Carmon et al.
Blog post by Arora and Ma: Framework for analysing nonconvex optimization. (describes "measure of progress.")
Non-black-box analyses of simpler problems (subcases of simple neural nets): topic modeling (Arora, Ge, Moitra), sparse coding, matrix completion.
Ge, Jin, Zheng. No spurious local minima in nonconvex low rank problems: A unified geometric analysis.
Analyses of multilayer linear nets.
Hazan's tutorial on optimization in ML.

Generalization

Zhang, Bengio, Hardt, Recht, Vinyals. Understanding deep learning requires rethinking generalization.
Belkin et al.'18. To understand deep learning we need to understand kernel learning.
Blog post 1 by Arora. Generalization and Deep Nets: An Introduction.
Blog post 2: Proving generalization bounds for deep nets via compression.
Langford and Caruana. (Not) Bounding the true error.
Various generalization bounds. Bartlett-Mendelson'02, Neyshabur et al'17, Neyshabur et al'18, Arora et al'18.
Chaudhari et al. Entropy-SGD: Biasing gradient descent into wide valleys.
Morcos et al. On the importance of single directions for generalization.

Expressibility/role of depth

Eldan, Shamir COLT'16. The power of depth for feedforward neural networks.
Telgarsky COLT'16. Benefits of depth in neural networks.
Telgarsky lecture notes.
Arora, Cohen, Hazan ICML'18. Optimization of deep nets: implicit acceleration by overparametrization.

Unsupervised learning, GANs

Representation learning: A review and new perspectives, by Bengio, Courville, Vincent 2012.
Blog post by Arora and Risteski. Unsupervised learning: one notion or many? (Explains possible gap in thinking of unsupervised learning as distribution learning.)
Goals and Principles of Representation Learning, blog post by Ferenc Huszar.
NIPS'16 tutorial on GANs by Ian Goodfellow. (Good survey circa 2016.)
Generalization and equilibrium in GANs, Arora et al ICML'18. Blog post.
Do GANs learn the distribution? Some theory and empirics. Arora, Zhang, Risteski ICLR'18. Blog post A and Blog post B.

Deep learning-free text embeddings

Blog posts by Arora: Introductory post, and Post 2 describing the connection to compressed sensing.
Wieting et al.'16. Towards Universal Paraphrastic Sentence Embeddings.
Our various linear text embeddings: SIF (simple but tough-to-beat baseline) ICLR'17, DisC via compressed sensing ICLR'18, A la Carte ACL'18.
The above embeddings are inspired by the theory of word embeddings, described in this blog post and in two papers: Paper 1 TACL'16 (Rand-Walk model) and Paper 2 TACL'18 (how polysemy relates to word embeddings; short answer: linearly).