dopetalk

Mind and Body => Neuroscience => Topic started by: Chip on May 04, 2018, 12:34:24 PM

Title: Unreasonable effectiveness of learning neural networks
Post by: Chip on May 04, 2018, 12:34:24 PM
source: http://www.pnas.org/content/pnas/113/48/E7655.full.pdf

Unreasonable effectiveness of learning neural networks: From accessible states and robust ensembles to basic algorithmic schemes [2016]

Abstract

In artificial neural networks, learning from data is a computationally demanding task in which a large number of connection weights are iteratively tuned through stochastic-gradient-based heuristic processes over a cost function. It is not well understood how learning occurs in these systems, in particular how they avoid getting trapped in configurations with poor computational performance. Here, we study the difficult case of networks with discrete weights, where the optimization landscape is very rough even for simple architectures, and provide theoretical and numerical evidence of the existence of rare—but extremely dense and accessible—regions of configurations in the network weight space. We define a measure, the robust ensemble (RE), which suppresses trapping by isolated configurations and amplifies the role of these dense regions. We analytically compute the RE in some exactly solvable models and also provide a general algorithmic scheme that is straightforward to implement: define a cost function given by a sum of a finite number of replicas of the original cost function, with a constraint centering the replicas around a driving assignment. To illustrate this, we derive several powerful algorithms, ranging from Markov Chains to message passing to gradient descent processes, where the algorithms target the robust dense states, resulting in substantial improvements in performance. The weak dependence on the number of precision bits of the weights leads us to conjecture that very similar reasoning applies to more conventional neural networks. Analogous algorithmic schemes can also be applied to other optimization problems.
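The replicated scheme described in the abstract can be sketched as a Metropolis Markov chain on a toy binary perceptron. This is a minimal sketch under our own assumptions, not the authors' code: the per-replica cost E(W) is the number of misclassified patterns, the "constraint centering the replicas" is implemented as a soft penalty gamma times the Hamming distance between each replica and a center vector (standing in for the driving assignment), and gamma is ramped up linearly so the replicas are gradually pulled together. The function names (`errors`, `replicated_cost`, `replicated_metropolis`) and every parameter value are hypothetical.

```python
import math
import random


def errors(w, patterns, labels):
    """Cost E(W): number of patterns the binary perceptron w misclassifies."""
    wrong = 0
    for x, y in zip(patterns, labels):
        field = sum(wi * xi for wi, xi in zip(w, x))
        # With an odd number of +/-1 inputs the field is never zero.
        if (1 if field > 0 else -1) != y:
            wrong += 1
    return wrong


def hamming(a, b):
    """Number of positions where two +/-1 vectors differ."""
    return sum(ai != bi for ai, bi in zip(a, b))


def replicated_cost(replicas, center, patterns, labels, gamma):
    """Sum of the replicas' costs plus gamma * d(replica, center) for each replica."""
    return sum(errors(w, patterns, labels) + gamma * hamming(w, center)
               for w in replicas)


def replicated_metropolis(patterns, labels, n_replicas=3, beta=3.0,
                          gamma_start=0.1, gamma_end=2.0, steps=20000, seed=0):
    """Single-flip Metropolis chain on replicated_cost with a linear gamma ramp.

    Works with incremental changes of replicated_cost instead of recomputing it.
    """
    rng = random.Random(seed)
    n = len(patterns[0])
    replicas = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(n_replicas)]
    center = [rng.choice((-1, 1)) for _ in range(n)]
    costs = [errors(w, patterns, labels) for w in replicas]  # cached E(W^a)
    for t in range(steps):
        gamma = gamma_start + (gamma_end - gamma_start) * t / steps
        a = rng.randrange(n_replicas + 1)  # a == n_replicas means "flip the center"
        i = rng.randrange(n)
        if a == n_replicas:
            # Only the coupling term changes: replicas that agreed with the center
            # at site i now disagree (+1); those that disagreed now agree (-1).
            delta = gamma * sum(1 if w[i] == center[i] else -1 for w in replicas)
        else:
            w = replicas[a]
            w[i] = -w[i]                      # tentative flip
            new_cost = errors(w, patterns, labels)
            w[i] = -w[i]                      # undo until accepted
            coupling = 1 if w[i] == center[i] else -1
            delta = (new_cost - costs[a]) + gamma * coupling
        if delta <= 0 or rng.random() < math.exp(-beta * delta):
            if a == n_replicas:
                center[i] = -center[i]
            else:
                replicas[a][i] = -replicas[a][i]
                costs[a] = new_cost
    return center, replicas


if __name__ == "__main__":
    rng = random.Random(1)
    n, p = 21, 10
    teacher = [rng.choice((-1, 1)) for _ in range(n)]
    patterns = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(p)]
    labels = [1 if sum(t * x for t, x in zip(teacher, xs)) > 0 else -1
              for xs in patterns]
    center, replicas = replicated_metropolis(patterns, labels)
    print("training errors of the center:", errors(center, patterns, labels))
```

Flipping the center only changes the coupling term, so its move is cheap to evaluate; a replica flip needs that replica's error count recomputed. With a single replica and gamma fixed at 0, the chain reduces to ordinary Metropolis on E(W), so in this sketch the coupling term is the only ingredient meant to bias the search toward dense regions.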

Significance

Artificial neural networks are some of the most widely used tools in data science. Learning is, in principle, a hard problem in these systems, but in practice heuristic algorithms often find solutions with good generalization properties. We propose an explanation of this good performance in terms of a nonequilibrium statistical physics framework: We show that there are regions of the optimization landscape that are both robust and accessible and that their existence is crucial to achieve good performance on a class of particularly difficult learning problems. Building on these results, we introduce a basic algorithmic scheme that improves existing optimization algorithms and provides a framework for further research on learning in neural networks.
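The difference between solutions in dense, robust regions and isolated ones can be checked by brute force on a perceptron small enough to enumerate. The sketch below (function names and toy sizes are our own, hypothetical choices) lists every zero-error binary weight vector and counts how many solutions lie within a given Hamming radius of it, itself included: a high count means the solution sits in a dense region, a count of 1 means it is isolated. This is only a crude counting proxy for the kind of local solution density the robust ensemble is designed to favor, not the analytic computation from the paper.

```python
import itertools
import random


def errors(w, patterns, labels):
    """Number of patterns misclassified by the binary perceptron with weights w."""
    # A zero field (possible only for even input size) is treated as label -1.
    return sum((1 if sum(a * b for a, b in zip(w, x)) > 0 else -1) != y
               for x, y in zip(patterns, labels))


def local_densities(patterns, labels, radius):
    """Map each zero-error weight vector to the number of solutions within `radius` flips."""
    n = len(patterns[0])
    solutions = [w for w in itertools.product((-1, 1), repeat=n)
                 if errors(w, patterns, labels) == 0]
    densities = {}
    for s in solutions:
        # Count solutions t with Hamming distance d(s, t) <= radius (s itself included).
        densities[s] = sum(1 for t in solutions
                           if sum(a != b for a, b in zip(s, t)) <= radius)
    return densities


if __name__ == "__main__":
    rng = random.Random(0)
    n, p = 11, 8
    patterns = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(p)]
    labels = [rng.choice((-1, 1)) for _ in range(p)]
    dens = local_densities(patterns, labels, radius=2)
    if dens:
        print("solutions:", len(dens))
        print("most isolated:", min(dens.values()), "densest:", max(dens.values()))
    else:
        print("no zero-error weights for this random instance")
```

Exhaustive enumeration costs 2^n evaluations, so this only works for tiny n; it is meant to make the notion of "dense versus isolated" concrete, not to scale.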

See the source link for the rest of the article.