Activation Functions


Let's say an activation function models a part of human cognition. It helps us "value" a given output, i.e. decide whether to ignore it, act on it, or transform it. We register (acknowledge, respond, or act) based on certain characteristics in our environment. For example, if we are driving a truck and our eyes see an overhead pass, our brain estimates the height of the overhead pass, then applies a binary step function to signal whether to drive under the overhead pass or stop the truck.

BINARY STEP FUNCTION

Binary Step Function

f(x) = 1, x >= 0

f(x) = 0, x < 0
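
A minimal sketch of the step function in NumPy (the function name binary_step and the threshold at zero are illustrative choices, not something fixed by this article):

```python
import numpy as np

def binary_step(x):
    # Outputs 1 where x >= 0, otherwise 0
    return np.where(x >= 0, 1, 0)

print(binary_step(np.array([-2.0, -0.5, 0.0, 3.0])))  # [0 0 1 1]
```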

LINEAR ACTIVATION FUNCTION

For example, while driving a car, if the car in front of us makes a sudden stop, we apply the brake in proportion to the distance and speed. Good to know, but it is not used very often in practice.

Linear Activation Function

f(x) = mx
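
A quick NumPy sketch, where the slope m is an assumed free parameter:

```python
import numpy as np

def linear(x, m=1.0):
    # Output is directly proportional to the input; the gradient is the constant m
    return m * x

print(linear(np.array([-2.0, 0.0, 3.0])))  # [-2.  0.  3.]
```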

SIGMOID ACTIVATION FUNCTION

Sigmoid Activation Function

f(x) = 1 / (1 + e^(-x))

One of the most widely used activation functions because of its smooth non-linearity.

For example, if we are using IQ to estimate whether a person can complete a 4-year degree, we know the chance is very high if IQ is above 110 and very low if it is below 90. Most of our day-to-day experiences are non-linear, for example crop production as a function of weather.

Suggested usage: the output layer of a logistic regression (binary classification) model.
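
A small NumPy sketch of the sigmoid (the example inputs are arbitrary and only meant to show the squashing behaviour):

```python
import numpy as np

def sigmoid(x):
    # Squashes any real input into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

# Larger inputs push the output toward 1, smaller ones toward 0
print(sigmoid(np.array([-3.0, 0.0, 3.0])))  # ~[0.047 0.5 0.953]
```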

TANH FUNCTION (Hyperbolic Tangent function)

f(x) = tanh(x) = 2*sigmoid(2x) - 1

Similar to the sigmoid but zero-centered, with outputs in (-1, 1) and steeper gradients around zero.

Life is a balance: doing more of something can have negative consequences. If we want to model the effect of vitamin C, we know that consuming too little has negative consequences, but increasing the dose beyond a point does not increase our well-being.
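
A short sketch showing that the sigmoid identity above reproduces np.tanh (function names are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh_via_sigmoid(x):
    # The identity f(x) = 2*sigmoid(2x) - 1 matches np.tanh
    return 2.0 * sigmoid(2.0 * x) - 1.0

x = np.array([-2.0, 0.0, 2.0])
print(tanh_via_sigmoid(x))  # ~[-0.964  0.     0.964]
print(np.tanh(x))           # same values
```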

RELU FUNCTION (Rectified Linear Unit)

f(x) = max(0,x)

Helps control the number of neurons that are activated at the same time, since negative inputs are zeroed out.

ReLU comes in many flavors:

Leaky ReLU, Parametric ReLU, Gaussian Error Linear Unit (GELU), SiLU, Softplus, ELU
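
A minimal sketch of plain ReLU and its leaky variant in NumPy (the slope alpha = 0.01 is just a common default, not prescribed here):

```python
import numpy as np

def relu(x):
    # Passes positive values through unchanged and clamps negatives to 0,
    # so only part of the network's neurons fire for any given input
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.01):
    # Leaky variant: a small slope for negative inputs helps avoid dead neurons
    return np.where(x > 0, x, alpha * x)

x = np.array([-3.0, -0.5, 0.0, 2.0])
print(relu(x))        # [0. 0. 0. 2.]
print(leaky_relu(x))  # [-0.03  -0.005  0.     2.   ]
```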

SWISH FUNCTION

f(x) = x * sigmoid(x)
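
A small sketch of Swish built on the same sigmoid helper (names are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def swish(x):
    # Smooth, non-monotonic relative of ReLU: the input scaled by its own sigmoid
    return x * sigmoid(x)

print(swish(np.array([-2.0, 0.0, 2.0])))  # ~[-0.238  0.     1.762]
```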

SOFTMAX ACTIVATION FUNCTION

Widely used in the output layer for multiclass classification problems, since it converts raw scores into probabilities that sum to 1.
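
A minimal NumPy sketch of softmax (the max-subtraction is a standard numerical-stability trick, not something this article specifies):

```python
import numpy as np

def softmax(z):
    # Subtracting the max keeps the exponentials well-behaved;
    # the outputs are positive and sum to 1
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
print(softmax(logits))        # ~[0.659 0.242 0.099]
print(softmax(logits).sum())  # 1.0
```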

Recommendations for choosing the right activation function

The sigmoid function gives good results in the output layer of binary classification problems.

Sigmoid and tanh functions are susceptible to the vanishing gradient problem and should be avoided in hidden layers.

ReLU is the go-to activation function for hidden layers; if you suspect dead neurons, try Leaky ReLU.

ReLU is generally not used in the output layer; a small forward pass that combines these recommendations is sketched below.
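
Putting the recommendations together, here is a toy NumPy forward pass that uses ReLU in the hidden layer and softmax in the output layer; the layer sizes and random weights are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# One sample with 4 features, a 16-unit hidden layer, 3 output classes
x  = rng.normal(size=(1, 4))
W1 = rng.normal(size=(4, 16)); b1 = np.zeros(16)
W2 = rng.normal(size=(16, 3)); b2 = np.zeros(3)

hidden = np.maximum(0, x @ W1 + b1)   # ReLU in the hidden layer
logits = hidden @ W2 + b2
e = np.exp(logits - logits.max())
probs = e / e.sum()                   # softmax in the output layer

print(probs, probs.sum())             # three class probabilities summing to 1
```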

