#### Presentation slides.
Click here

#### Video should be available from the conference site at some point.

### Bibliography

#### Optimization

- Ge, Huang, Jin, Yuan COLT'15. Escaping from saddle points: online stochastic gradient for tensor decomposition.

- Jin, Ge, Netrapalli, Kakade, Jordan ICML'17. How to escape saddle points efficiently. Blog post.
- 2nd order black box methods for deep learning. Paper 1 by Agarwal et al. and Paper 2 by Carmon et al.

- Blog post by Arora and Ma: Framework for analysing nonconvex optimization. (describes "measure of progress.")
- Non-black-box analyses of simpler problems (subcases of simple neural nets): topic modeling (Arora, Ge, Moitra), sparse coding, matrix completion.

- Ge, Jin, Zheng. No spurious local minima in nonconvex low rank problems: A unified geometric analysis.

- Analyses of multilayer linear nets.

- Hazan's tutorial on optimization in ML.

#### Generalization

- Zhang, Bengio, Hardt, Recht, Vinyals. Understanding deep learning requires rethinking generalization.
- Belkin et al.'18. To understand deep learning we need to understand kernel learning.

- Blog post 1 by Arora. Generalization and Deep Nets: An Introduction.
- Blog post 2: Proving generalization bounds for deep nets via compression.

- (Not) Bounding the true error. Langford and Caruana
- Various generalization bounds. Bartlett-Mendelson'02, Neyshabur et al.'17, Neyshabur et al.'18, Arora et al.'18.

- Chaudhari et al. Entropy-SGD: Biasing gradient descent into wide valleys.
- Morcos et al. On the importance of single directions for generalization.

#### Expressibility/role of depth

- Eldan, Shamir COLT'16 Power of depth for feedforward neural networks. Telgarsky COLT'16 Benefits of depth in neural networks.
- Telgarsky lecture notes.

- Arora, Cohen, Hazan ICML'18. Optimization of deep nets: implicit acceleration by overparametrization.

#### Unsupervised learning, GANs

- Representation learning: A review and new perspectives, by Bengio, Courville, Vincent 2012.

- Blog post by Arora and Risteski. Unsupervised learning: one notion or many? (Explains a possible gap in thinking of unsupervised learning as distribution learning.)

- Goals and Principles of Representation Learning, blog post by Ferenc Huszar.

- NIPS'16 tutorial on GANs by Ian Goodfellow. (Good survey circa 2016.)

- Generalization and equilibrium in GANs, Arora et al. ICML'17. Blog post.

- Do GANs learn the distribution? Some theory and empirics. Arora, Zhang, Risteski ICLR'18. Blog post A and Blog post B.

#### Deep learning-free Text embeddings

- Arora: Introductory post, and Post 2 describing the connection to compressed sensing.
- Wieting et al.'16. Towards Universal Paraphrastic Sentence Embeddings.

- Our various linear text embeddings. SIF (Simple but tough) ICLR'17, DisC via compressed sensing ICLR'18, A la Carte ACL'18.

- The above embeddings are inspired by the theory of word embeddings, described in this Blog Post and in two papers: Paper 1 TACL'16 (Rand-Walk model) and Paper 2 TACL'18 (How polysemy relates to word embeddings; short answer: linearly).