Activation Functions
Let's say an activation function models a part of human cognition. It helps us "value" a given output, i.e. decide whether to ignore it, act on it, or transform it. We register (acknowledge, respond, or act) based on certain characteristics in our environment. For example, if we are driving a truck and our eyes see an overhead pass, our brain estimates the height of the overhead pass and then applies a binary step function to signal whether to drive under the overhead pass or stop the truck.
Binary Step Function
f(x) = 1, x >= 0
f(x) = 0, x < 0
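A minimal NumPy sketch of the binary step function described above (the name binary_step is just an illustrative choice):

```python
import numpy as np

def binary_step(x):
    # 1 for x >= 0, otherwise 0
    return np.where(x >= 0, 1, 0)

print(binary_step(np.array([-2.0, 0.0, 3.5])))  # [0 1 1]
```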
LINEAR ACTIVATION FUNCTION
e.g. while driving a car, if the car in front of us makes a sudden stop, we apply the brake in proportion to distance and speed. Good to know, but we don't use it very often.
f(x) = mx
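A quick sketch of a linear activation; the slope m is an illustrative constant, not something fixed by the text:

```python
import numpy as np

def linear(x, m=1.0):
    # Output is directly proportional to the input
    return m * x

print(linear(np.array([-1.0, 0.0, 2.0])))  # [-1.  0.  2.]
```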
SIGMOID ACTIVATION FUNCTION
f(x) = 1 / (1 + e^-x)
A widely used activation function because of its smooth non-linearity, squashing any input into the range 0 to 1.
e.g. if we are using IQ to determine whether a person can complete a 4-year degree, we know the chances are very high if IQ is above 110 and very low if IQ is below 90. Most of our day-to-day experiences are non-linear, for example crop production as a function of weather.
Suggested usage: the output layer of a logistic regression model.
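A minimal NumPy sketch of the sigmoid, matching the formula above:

```python
import numpy as np

def sigmoid(x):
    # Squashes any real input into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-4.0, 0.0, 4.0])))  # ~[0.018 0.5 0.982]
```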
TANH FUNCTION (Hyperbolic Tangent function)
f(x) = 2sigmoid(2x)-1
Similar to the sigmoid but with a steeper gradient, and its output is zero-centered in the range -1 to 1.
Life is a balance; doing more of something can have negative consequences. For example, if we model the effect of vitamin C: consuming too little has negative consequences, but increasing intake beyond a certain point does not further improve our well-being.
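A short sketch checking that the identity above agrees with NumPy's built-in tanh:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh_via_sigmoid(x):
    # tanh expressed with the sigmoid, as in the formula above
    return 2.0 * sigmoid(2.0 * x) - 1.0

x = np.array([-2.0, 0.0, 2.0])
print(np.allclose(tanh_via_sigmoid(x), np.tanh(x)))  # True
```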
RELU FUNCTION (Rectified Linear Unit)
f(x) = max(0,x)
Helps control the number of neurons that are activated at the same time, since negative inputs are zeroed out.
ReLU comes in many flavors (a couple are sketched below):
Leaky ReLU, Parametric ReLU, Gaussian Error Linear Unit (GELU), SiLU, Softplus, ELU
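A minimal sketch of ReLU and one of the flavors listed above, Leaky ReLU; the leak slope of 0.01 is a commonly used default, assumed here for illustration:

```python
import numpy as np

def relu(x):
    # Passes positive values through, zeroes out negatives
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.01):
    # Like ReLU, but lets a small slope through for negative inputs
    return np.where(x > 0, x, alpha * x)

x = np.array([-3.0, 0.0, 2.0])
print(relu(x))        # [0. 0. 2.]
print(leaky_relu(x))  # [-0.03  0.    2.  ]
```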
SWISH FUNCTION
f(x) = x * sigmoid(x)
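A short sketch of Swish, reusing the sigmoid from earlier:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def swish(x):
    # Smooth and non-monotonic: small negative inputs are not fully zeroed out
    return x * sigmoid(x)

print(swish(np.array([-2.0, 0.0, 2.0])))  # ~[-0.238  0.     1.762]
```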
SOFTMAX ACTIVATION FUNCTION
Widely used in the output layer for multiclass classification problems; it converts raw scores into a probability distribution over the classes.
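The softmax formula is not written out above; a common form, sketched here with max-subtraction for numerical stability:

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability, then normalize exp(x)
    e = np.exp(x - np.max(x))
    return e / np.sum(e)

print(softmax(np.array([2.0, 1.0, 0.1])))  # ~[0.659 0.242 0.099]
```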
Recommendations for choosing the right activation function
The sigmoid function gives good results in classification problems (typically binary, in the output layer).
Sigmoid and tanh functions are susceptible to the vanishing gradient problem and should be avoided in hidden layers.
ReLU is the go-to activation function for hidden layers (see the sketch after this list); if you suspect dead neurons, try Leaky ReLU.
ReLU is never used in the output layer.
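Putting these recommendations together, a minimal sketch of a forward pass with ReLU in the hidden layer and sigmoid in the output layer; the layer sizes and random weights are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Tiny 2-layer network: 4 inputs -> 8 hidden units (ReLU) -> 1 output (sigmoid)
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

def forward(x):
    h = relu(x @ W1 + b1)        # hidden layer: ReLU
    return sigmoid(h @ W2 + b2)  # output layer: sigmoid for binary classification

print(forward(rng.normal(size=(3, 4))))  # three predicted probabilities
```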