Abstract
Understanding the influence of activation functions on the
learning behaviour of neural networks is of great practical interest. The
GELU, being similar to swish and ReLU, is analysed for soft committee
machines in the statistical physics framework of off-line learning. We find
phase transitions with respect to the relative training set size, which are
always continuous. This result rules out the hypothesis that convexity is
necessary for continuous phase transitions. Moreover, we show that even
a small contribution of a sigmoidal function like erf in combination with
GELU leads to a discontinuous transition.
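
For reference, the activation functions named in the abstract can be written out explicitly. The sketch below is a minimal NumPy/SciPy illustration, not code from the paper; in particular, the mixing parameter `eps` in `mixed` is a hypothetical stand-in for the "small contribution" of an erf term combined with GELU, chosen here only to make the idea concrete.

```python
import numpy as np
from scipy.special import erf

def gelu(x):
    # GELU(x) = x * Phi(x), with Phi the standard normal CDF,
    # i.e. Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    return x * 0.5 * (1.0 + erf(x / np.sqrt(2.0)))

def swish(x, beta=1.0):
    # swish(x) = x * sigmoid(beta * x); beta = 1 recovers the SiLU
    return x / (1.0 + np.exp(-beta * x))

def relu(x):
    return np.maximum(x, 0.0)

def mixed(x, eps=0.1):
    # Convex combination of GELU with a sigmoidal erf activation,
    # illustrating the GELU/erf combination discussed in the abstract.
    # (eps is a hypothetical mixing parameter, not a quantity from the paper.)
    return (1.0 - eps) * gelu(x) + eps * erf(x / np.sqrt(2.0))

# Example: evaluate the activations on a small grid
x = np.linspace(-3.0, 3.0, 7)
print(gelu(x))
print(swish(x))
print(mixed(x, eps=0.1))
```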
| Original language | English |
|---|---|
| Title of host publication | Proceedings ESANN 2023 |
| Subtitle of host publication | European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning |
| Editors | Michel Verleysen |
| Publisher | i6doc.com publication |
| Pages | 435-440 |
| Number of pages | 6 |
| ISBN (Print) | 978-2-87587-088-9 |
| Publication status | Published - 2023 |
| Event | European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning: ESANN 2023, Bruges, Belgium, 4-Oct-2023 → 6-Oct-2023 |
Conference
| Conference | European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning |
|---|---|
| Country/Territory | Belgium |
| City | Bruges |
| Period | 04/10/2023 → 06/10/2023 |