
Large-time asymptotics in deep learning
It is by now well-known that practical deep supervised learning may roughly be cast as an optimal control problem for a specific discrete-time, nonlinear dynamical system called an artificial neural network. In this work, we consider the continuous-time formulation of the deep supervised learning problem and study its behavior as the final time horizon increases, which can be interpreted as increasing the number of layers in the neural network setting.

For the classical regularized empirical risk minimization problem, we show that, in long time, the optimal states approach the zero training error regime, while the optimal control parameters approach, on an appropriate scale, the minimal-norm parameters whose corresponding states lie precisely in the zero training error regime. Seen from the large-layer perspective, this result provides an alternative theoretical underpinning for the notion that neural networks learn best in the overparametrized regime.

We also propose a learning problem consisting of minimizing a cost with a state tracking term, and establish the well-known turnpike property: over long time intervals, the solutions of the learning problem consist of three pieces, the first and last being transient short-time arcs, and the middle piece being a long-time arc that stays exponentially close to the optimal solution of an associated static learning problem. This property yields a quantitative estimate of the number of layers required to reach the zero training error regime.

Both asymptotic regimes are addressed in the context of continuous-time and continuous space-time neural networks, the latter taking the form of nonlinear integro-differential equations, hence covering residual neural networks with both fixed and possibly variable depths.
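To make the correspondence between the time horizon and network depth concrete, here is a minimal sketch (not the paper's construction; all names and the choice of a `tanh` vector field are illustrative assumptions) of a forward-Euler discretization of a controlled neural ODE, which recovers the residual-network update the abstract alludes to:

```python
import numpy as np

def neural_ode_forward(x0, weights, biases, T, n_steps):
    """Forward-Euler discretization of the neural ODE
        x'(t) = tanh(W(t) x(t) + b(t)),  t in [0, T],
    giving the ResNet-style update x_{k+1} = x_k + dt * tanh(W_k x_k + b_k).
    Increasing the horizon T at a fixed step size dt corresponds to
    adding layers in the discrete-time (residual network) picture."""
    dt = T / n_steps
    x = np.asarray(x0, dtype=float)
    for k in range(n_steps):
        # One Euler step = one residual layer with time-varying controls (W_k, b_k).
        x = x + dt * np.tanh(weights[k] @ x + biases[k])
    return x

# Example: 10 "layers" over the horizon T = 1 with random time-varying controls.
rng = np.random.default_rng(0)
n_steps, d = 10, 3
Ws = [rng.standard_normal((d, d)) for _ in range(n_steps)]
bs = [rng.standard_normal(d) for _ in range(n_steps)]
x_T = neural_ode_forward(np.ones(d), Ws, bs, T=1.0, n_steps=n_steps)
```

In this picture, the controls `(W(t), b(t))` play the role of the trained parameters, and the paper's long-horizon regime corresponds to letting `T` (equivalently, the layer count at fixed `dt`) grow.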