Agazzi Andrea
Università di Pisa, IT
Convergence and optimality of wide RNNs in the mean-field regime

Recurrent neural networks (RNNs) are a family of neural network architectures traditionally used to learn from data with a time-series structure. As the name suggests, these networks have a recurrent structure: at each timestep, the (hidden) state of the network is fed back into the model as an input, allowing it to "store" information about previous timesteps.
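To make the recurrence concrete, here is a minimal sketch of an Elman-type update; the symbols $x_t$, $h_t$, $U$, $W$, $V$, $b$ and the nonlinearity $\sigma$ are generic placeholders rather than the notation used in the talk:
\[
h_t = \sigma\bigl(U x_t + W h_{t-1} + b\bigr), \qquad \hat{y}_t = V h_t,
\]
so the hidden state $h_{t-1}$ computed at the previous timestep re-enters the network alongside the new input $x_t$.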
In this talk, we extend a series of results on the training of wide neural networks in the so-called "mean-field" regime to the RNN structure. More specifically, we prove that the gradient descent training dynamics of Elman-type RNNs converge, in an appropriate sense and as the width of the network diverges, to a set of "mean-field" ODEs. Furthermore, we prove that, under some conditions on the data and on the initialization of the network, the fixed points of these limiting "mean-field" dynamics are globally optimal.
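As a hedged illustration of the scaling behind the mean-field regime, stated in the simpler feedforward setting (the notation $f_n$, $\phi$, $\theta_i$, $\mu^n_t$ is illustrative, not that of the talk): for a width-$n$ network whose output averages $n$ units,
\[
f_n(x) = \frac{1}{n} \sum_{i=1}^{n} \phi(\theta_i, x),
\]
gradient descent moves the parameters $\theta_1, \dots, \theta_n$, and as $n \to \infty$ the empirical measure $\mu^n_t = \frac{1}{n} \sum_{i=1}^{n} \delta_{\theta_i(t)}$ converges to a deterministic limit $\mu_t$ evolving under a limiting ("mean-field") dynamics. The results described above establish an analogous limit, together with the global optimality of its fixed points, for Elman-type RNNs.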
This is joint work with Jianfeng Lu and Sayan Mukherjee.