General Theory
- On the difficulty of training recurrent neural networks
- Theoretical insights into the optimization landscape of over-parameterized shallow neural networks
- The Landscape of Deep Learning Algorithms
- The loss surface of deep and wide neural networks
- The loss surfaces of multilayer networks
Spectral Decompositions in DNNs
- Efficient Orthogonal Parametrisation of Recurrent Neural Networks Using Householder Reflections
- Low-Rank Matrix Factorization for Deep Neural Network Training with High-Dimensional Output Targets
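A minimal NumPy sketch of the two ideas referenced in this group: building an orthogonal recurrent transition matrix as a product of Householder reflections, and compressing a high-dimensional output layer with a low-rank factorization. The function names (`householder_orthogonal`, `low_rank_factorize`) and the toy shapes are illustrative assumptions, not code from either paper.

```python
import numpy as np

def householder_orthogonal(vectors):
    """Build an orthogonal matrix as a product of Householder reflections.

    Each row u of `vectors` contributes a reflection H = I - 2 u u^T / ||u||^2;
    a product of reflections is orthogonal, so a recurrent transition matrix
    parametrised this way keeps unit singular values by construction.
    """
    n = vectors.shape[1]
    Q = np.eye(n)
    for u in vectors:
        u = u / np.linalg.norm(u)
        Q = Q - 2.0 * np.outer(u, u @ Q)  # apply H on the left without forming it
    return Q

def low_rank_factorize(W, rank):
    """Approximate a large output-layer matrix W (m x n) by U @ V, U (m x r), V (r x n).

    Truncated SVD gives the best rank-r approximation in the Frobenius norm;
    the two thin factors replace one huge output layer with far fewer parameters.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :rank] * s[:rank], Vt[:rank, :]

if __name__ == "__main__":
    rng = np.random.default_rng(0)

    Q = householder_orthogonal(rng.normal(size=(4, 8)))  # product of 4 reflections in R^8
    print(np.allclose(Q @ Q.T, np.eye(8)))               # True: Q is orthogonal

    W = rng.normal(size=(512, 10000))                    # hidden units -> large output vocabulary
    U, V = low_rank_factorize(W, rank=64)
    print(U.shape, V.shape)                              # (512, 64) (64, 10000)
```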
Optimization
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
- Self-Normalizing Neural Networks
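A minimal NumPy sketch of the two normalization ideas in this group: the training-time batch-norm transform from Ioffe & Szegedy and the SELU activation from the Self-Normalizing Neural Networks paper (its alpha and lambda constants are the published fixed-point values). The function names and toy usage are illustrative assumptions only.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift.

    x: (batch, features); gamma, beta: learnable per-feature parameters.
    """
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

def selu(x, alpha=1.6732632423543772, lam=1.0507009873554805):
    """Scaled exponential linear unit; pushes activations toward zero mean, unit variance."""
    return lam * np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h = rng.normal(loc=3.0, scale=2.0, size=(256, 16))        # badly scaled activations

    bn = batch_norm_forward(h, gamma=np.ones(16), beta=np.zeros(16))
    print(bn.mean(axis=0).round(3), bn.std(axis=0).round(3))  # ~0 mean, ~1 std per feature

    s = selu(rng.normal(size=(256, 16)))
    print(round(s.mean(), 2), round(s.std(), 2))              # stays near zero mean, unit variance
```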