Layered Neural Networks with GELU Activation, a Statistical Mechanics Analysis

Frederieke Richert, Michiel Straat, Elisa Oostwal, Michael Biehl*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Understanding the influence of activation functions on the learning behaviour of neural networks is of great practical interest. The GELU, which is similar to swish and ReLU, is analysed for soft committee machines in the statistical physics framework of off-line learning. We find phase transitions with respect to the relative training set size, which are always continuous. This result rules out the hypothesis that convexity is necessary for continuous phase transitions. Moreover, we show that even a small contribution of a sigmoidal function like erf in combination with GELU leads to a discontinuous transition.
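
As a rough illustration of the activations compared in the abstract, the sketch below (not taken from the paper) defines GELU, swish, a GELU-erf mixture, and a soft committee machine output. The mixing weight eps, the function names, and the committee summation are illustrative assumptions, not the authors' exact parametrisation.

import numpy as np
from scipy.special import erf  # the sigmoidal activation mentioned in the abstract

def gelu(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return x * 0.5 * (1.0 + erf(x / np.sqrt(2.0)))

def swish(x, beta=1.0):
    # Swish: x * sigmoid(beta * x); closely approximates GELU for beta ~ 1.702.
    return x / (1.0 + np.exp(-beta * x))

def mixed(x, eps=0.1):
    # Hypothetical convex combination of GELU with a small sigmoidal (erf)
    # contribution, the kind of mixture for which the abstract reports a
    # discontinuous transition; eps = 0.1 is an illustrative choice.
    return (1.0 - eps) * gelu(x) + eps * erf(x)

def soft_committee(xi, W, act=gelu):
    # Soft committee machine: the sum of K hidden-unit activations with
    # fixed hidden-to-output weights; W has shape (K, N), xi has shape (N,).
    return act(W @ xi).sum()

Swapping act=gelu for act=mixed in soft_committee changes the student architecture whose phase behaviour the paper studies as the relative training set size grows.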
Original language: English
Title of host publication: Proceedings ESANN 2023
Subtitle of host publication: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning
Editors: Michel Verleysen
Publisher: i6doc.com publication
Pages: 435-440
Number of pages: 6
ISBN (Print): 978-2-87587-088-9
Publication status: Published - 2023
Event: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning: ESANN 2023 - Bruges, Belgium
Duration: 4-Oct-2023 to 6-Oct-2023

Conference

Conference: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning
Country/Territory: Belgium
City: Bruges
Period: 04/10/2023 to 06/10/2023
