$\color{black}\rule{365px}{3px}$
Gaussian Error Linear Unit (GELU) is an activation function originally introduced by Dan Hendrycks and Kevin Gimpel in the paper “Gaussian Error Linear Units (GELUs)”. It has become popular in modern neural networks (e.g., many Transformer-based models such as BERT use GELU) due to its smoothness and empirically good performance.
Definition
Gaussian Error Linear Unit (GELU) is defined as:
$$ GELU(x) = xP(X \leq x) = x\Phi(x) = x \cdot \frac{1}{2}\left[1 + \operatorname{erf}\left(x/\sqrt{2}\right)\right] $$

where $X \sim \mathcal{N}(0, 1)$ and $\Phi$ is the standard normal cumulative distribution function.
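Since the standard normal CDF can be written in terms of the error function, the exact form is easy to evaluate directly. Below is a minimal Python sketch (the function name `gelu_exact` is just for illustration):

```python
import math

def gelu_exact(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF expressed via erf."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# e.g., gelu_exact(1.0) ≈ 0.8413, gelu_exact(-1.0) ≈ -0.1587
```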

Approximation
If greater feedforward speed is worth the cost of exactness, GELU can be approximated by
$$ 0.5x\left(1 + \tanh\left[\sqrt{2/\pi}\left(x + 0.044715x^3\right)\right]\right) $$
or
$$ x\sigma(1.702x) $$

where $\sigma$ is the logistic sigmoid function.
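For a sense of how close these are, here is a minimal Python sketch of both approximations (the names `gelu_tanh` and `gelu_sigmoid` are illustrative):

```python
import math

def gelu_tanh(x: float) -> float:
    """Tanh approximation: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x**3)))."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)))

def gelu_sigmoid(x: float) -> float:
    """Sigmoid approximation: x * sigmoid(1.702 * x)."""
    return x / (1.0 + math.exp(-1.702 * x))

# At x = 1.0: exact ≈ 0.8413, tanh approximation ≈ 0.8412, sigmoid approximation ≈ 0.8458
```

The tanh approximation tracks the exact function very closely, while the sigmoid approximation is cheaper but somewhat less accurate.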
How Authors Came Up With GELU