- Ge, Huang, Jin, Yuan COLT 2015. "Evading
saddle points: online stochastic gradient descent for
- Jin, Ge, Netrapalli, Kakade, Jordan ICML'17. How to escape saddle points efficiently. Blog post.
- 2nd order black box methods for deep learning. Paper 1 by
Agarwal et al and Paper 2 by
Carmon et al.
- Blog post by Arora and Ma: Framework for analysing nonconvex optimization. (describes "measure of progress.")
- Non-black-box analyses of simpler problems (subcases of
simple neural nets). Topic modeling (Arora, Ge, Moitra),
Sparse coding, Matrix Completion,
- Ge, Jin, Zheng. No spurious
local minima in nonconvex low rank problems: A
unified geometric analysis.
- Analyses of multilayer linear nets.
- Hazan's tutorial
on optimization in ML.
- Zhang, Bengio, Hardt, Recht, Vinyals. Understanding
deep learning requires rethinking generalization.
- Belkin et al'18. To understand
deep learning we need to understand kernel learning.
- Blog post 1 by Arora. Generalization and Deep Nets: An Introduction.
- Blog post 2: Proving
generalization bounds for deep nets via compression.
- (Not) Bounding the true error. Langford and Caruana
- Various generalization bounds. Bartlett-Mendelson'02,
et al'18, Arora et al'18.
- Chaudhari et al. Entropy-SGD: biasing gradient descent into wide valleys.
- Morcos et al. On the role of
single directions for generalization.
Expressibility/role of depth
- Eldan, Shamir COLT'16 Power of depth for feedforward neural networks. Telgarsky COLT'16 Benefits of depth in neural networks.
- Telgarsky lecture
- Arora, Cohen, Hazan ICML'18. Optimization of deep nets: implicit acceleration by overparametrization.
Unsupervised learning, GANs
learning: A new review and some perspectives, by
Bengio, Courville, Vincent 2012.
- Blog post by Arora and Risteski. Unsupervised
learning: one notion or many? (Explains
possible gap in thinking of unsupervised learning as
and Principles of Representation Learning, blog post
by Ferenc Huszar.
tutorial on GANs by Ian Goodfellow. (Good survey
and equilibrium in GANs, Arora et al ICML'17. Blog
GANs learn the distribution? Some theory and empirics.
Arora, Zhang, Risteski ICLR'18. Blog
post A and Blog
Deep learning-free Text embeddings
- Arora Introductory post and Post 2 describing connection to compressed sensing
- Wieting et al.'16. Towards Universal
Paraphrastic Sentence Embeddings.
- Our various linear text embeddings. SIF
(Simple but tough) ICLR'17, DisC
via compressed sensing ICLR'18, A la Carte
- The embeddings above are inspired by the theory of word
embeddings, described in this Blog
Post and in two papers: Paper 1 TACL'16
(Rand-Walk model), and Paper 2
TACL'18 (How polysemy relates to word embeddings; Short