Advanced Learning Algorithm

Introduction to Neural Networks, Activation Functions, and Their Types.

Sep 08, 2024

Introduction

Hey there, this is Misba. I am exploring ML and I’m eager to share everything I’m learning along the way!

Let’s get started

Neural Networks

It’s an algorithm that tries to mimic the brain 🧠. A neural network is a computational model composed of interconnected artificial neurons, which work together to process information and make predictions. Neural networks are used in climate change research, medical imaging, online advertising, product recommendations, and many other application areas of machine learning.

Let’s first understand a few terms of neural network architecture:

Input: The data that is fed into the model for training and prediction.
Weight: A weight represents the strength of the connection between two neurons. It scales each input by its importance, so inputs with larger weights contribute more to the output.
Activation function: The role of the activation function is to determine whether, and how strongly, a specific neuron should be activated. This decision is based on how important the neuron’s input is to the prediction process.
Bias: A constant that is added to the weighted sum of the inputs to shift the activation function.
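
To see how these four terms fit together, here is a minimal sketch of a single neuron in plain Python. The input, weight, and bias values are made up purely for illustration, and sigmoid is just one possible choice of activation function.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias, passed through the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# Hypothetical values, chosen only for illustration
inputs = [1.0, 2.0]
weights = [0.5, -0.25]
bias = 0.0

# z = 0.5 * 1.0 + (-0.25) * 2.0 + 0.0 = 0.0, and sigmoid(0.0) = 0.5
output = neuron_output(inputs, weights, bias)
print(output)  # 0.5
```

Changing the bias shifts z up or down before the activation is applied, which is exactly the "shift" mentioned in the definition of bias.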

Now you may ask how it works. Let’s dive in.

How does a Neural Network work?

Neurons: In a neural network, the artificial neurons are the building blocks. Each neuron receives inputs, performs computations, and produces an output.
Connections: Neurons are connected to each other through weighted connections. These connections allow information to flow from one neuron to another.
Layers: Neurons are organized into layers. The input layer receives the initial data, the output layer produces the final prediction, and there can be one or more hidden layers in between. Hidden layers do most of the network’s computation and are crucial for learning complex patterns in data. Without hidden layers, the neural network can only learn (essentially) linear relationships between the input and output data.
Computation: Each neuron takes the weighted sum of its inputs, adds the bias, applies an activation function to the result, and produces an output. The activation function introduces non-linearity and helps the neural network learn complex patterns.
Training: Neural networks learn by adjusting the weights of their connections in a process called training. During training, the network is presented with labeled examples, and it adjusts its weights to minimize the difference between its predictions and the true labels.
Prediction: Once the neural network is trained, it can make predictions on new, unseen data by passing the data through the network and reading the result from the output layer.
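
The computation, training, and prediction steps above can be sketched with the smallest possible "network": a single sigmoid neuron trained with gradient descent to learn the logical OR function. This is only an illustrative sketch. The dataset, the learning rate, and the number of epochs are my own assumptions, and real networks have many neurons and use libraries to do this work.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, weights, bias):
    """Computation step: weighted sum plus bias, then the activation."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)

def train(examples, learning_rate=0.5, epochs=2000):
    """Training step: nudge weights and bias to shrink the prediction error."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for x, label in examples:
            # With a sigmoid output and cross-entropy loss, the gradient of
            # the loss with respect to z is simply (prediction - label).
            error = predict(x, weights, bias) - label
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias

# Labeled examples for logical OR (a toy dataset, assumed for illustration)
or_examples = [
    ([0.0, 0.0], 0.0),
    ([0.0, 1.0], 1.0),
    ([1.0, 0.0], 1.0),
    ([1.0, 1.0], 1.0),
]

weights, bias = train(or_examples)

# Prediction step: pass inputs through the trained neuron
for x, label in or_examples:
    print(x, "label:", label, "prediction:", round(predict(x, weights, bias), 3))
```

After training, the prediction for [0, 0] should fall below 0.5 and the other three should rise above it, which is what "minimizing the difference between predictions and true labels" looks like in practice.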

Example:

The image below shows how data flows from the input through the hidden layers to produce a classification output.

As you can see, this example is about recognizing an image.

The network has an input layer, one hidden layer, and an output layer.

We start by feeding an image of a dog into the system. The system then extracts various features from the input image. These features are passed from the input layer to the hidden layer and eventually to the output layer. The output layer estimates the probability of the input image being a "dog" based on the extracted features.
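
Here is a rough sketch of that flow in plain Python: three made-up "features" go through one hidden layer and an output layer that turns two scores into probabilities. Every number here (the features, `HIDDEN_W`, `OUTPUT_W`, and the biases) is hypothetical and hand-picked. In a real network these values would be learned during training, and the features would come from the actual image.

```python
import math

def relu(z):
    return max(0.0, z)

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

def dense(inputs, weights, biases):
    """One weighted sum per neuron; weights[j] holds the weights of neuron j."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

# Hypothetical, hand-picked parameters (a trained network would learn these)
HIDDEN_W = [[1.0, -1.0, 0.5], [-0.5, 1.0, 0.0]]
HIDDEN_B = [0.0, 0.1]
OUTPUT_W = [[1.0, -1.0], [-1.0, 1.0]]
OUTPUT_B = [0.0, 0.0]
LABELS = ["dog", "not dog"]

def forward(features):
    hidden = [relu(z) for z in dense(features, HIDDEN_W, HIDDEN_B)]  # hidden layer
    scores = dense(hidden, OUTPUT_W, OUTPUT_B)                       # output layer
    return softmax(scores)

# Pretend these three numbers are features extracted from the dog image
features = [0.9, 0.2, 0.7]
probabilities = forward(features)
for label, p in zip(LABELS, probabilities):
    print(label, round(p, 3))
```

With these particular numbers the "dog" probability comes out higher than "not dog", but that is only because the weights were chosen by hand to make it so.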

Introduction to Activation Functions in Neural Networks

As you learned in the section on how neural networks work, the activation function introduces non-linearity and helps the network learn complex patterns.

Let’s understand the activation function in more detail.

The activation function is the mathematical function that determines the output of a neuron in a neural network.

It takes the weighted sum of the inputs and applies a non-linear transformation to produce the output.

Why do we need an Activation Function in a neural network?

Introducing Non-linearity:
→ Activation functions introduce non-linearity into the neural network. Without non-linearity, a neural network, no matter how many layers it has, would simply compute a linear combination of its inputs, which limits its ability to learn complex patterns and relationships in the data.
Decision Making:
→ It helps in decision-making by determining whether a neuron should be activated or not based on the input it receives.
→ The function applies a rule or a threshold to decide whether the neuron should fire or remain inactive.
→ This decision-making capability allows the neural network to make predictions and classify inputs into different categories.
Gradient propagation:
→ Gradient propagation is how neural networks are optimized during training. It involves calculating how sensitive the network's loss is to changes in its parameters and then updating those parameters in a direction that reduces the loss.
→ Activation functions play a crucial role in gradient propagation. Different activation functions have different derivatives, so your choice of activation function affects how the gradients flow through the network. It is important to choose an activation function that is differentiable (at least almost everywhere, like ReLU, which has a kink only at 0) to ensure smooth gradient propagation and efficient training.
Normalization:
→ Normalization in neural networks is a method used to scale the input data to a specific range, typically [0,1] or [-1,1]. This helps to improve and speed up the training process by preventing weights from becoming too large or too small.
→ Bounded activation functions such as sigmoid and tanh help here by keeping each neuron's output within a fixed range.
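
To make the non-linearity point concrete, here is a small sketch with two one-neuron layers. The weights and biases are hypothetical values picked for easy arithmetic. Without an activation function, the two layers collapse into a single linear layer. With ReLU in between, they no longer do.

```python
def relu(z):
    return max(0.0, z)

# Hypothetical weights and biases for two one-neuron layers
W1, B1 = 2.0, 1.0
W2, B2 = -3.0, 0.5

def two_linear_layers(x):
    hidden = W1 * x + B1      # layer 1, no activation
    return W2 * hidden + B2   # layer 2, no activation

def single_linear_layer(x):
    # The same mapping folded into one weight and one bias:
    # W2 * (W1 * x + B1) + B2 = (W2 * W1) * x + (W2 * B1 + B2)
    return (W2 * W1) * x + (W2 * B1 + B2)

def two_layers_with_relu(x):
    hidden = relu(W1 * x + B1)  # layer 1 followed by a non-linear activation
    return W2 * hidden + B2

for x in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    print(x, two_linear_layers(x), single_linear_layer(x), two_layers_with_relu(x))
```

The first two columns of output always match, so the extra layer added nothing. The ReLU version differs for inputs where the hidden value is negative, which is what lets stacked layers represent more than a straight line.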

Types of Activation Functions:

There are several types of activation functions used in neural networks.

Let’s understand the different types of activation functions.

Sigmoid Function:
\(sigmoid(x) = \frac{1}{1 + e^{-x}}\)
→ The sigmoid function is also known as the logistic function.
→ While sigmoid functions can be used in hidden layers, they are generally not the preferred choice in modern neural network architectures because of problems they cause during training, such as vanishing gradients.
→ The output range is between 0 and 1.
→ It is commonly used in the output layer for binary classification problems.
ReLU (Rectified Linear Unit) Function:
\(ReLU(x) = \max(0, x)\)
→ In simple terms, the ReLU function returns the input value if it’s positive and returns 0 otherwise.
→ It is widely used in deep learning models.
→ It helps mitigate the vanishing gradient problem, because its gradient is 1 for every positive input.
Tanh (Hyperbolic Tangent) Function:
\(tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}\)
→ The tanh function is similar to the sigmoid function but has an output range between -1 and 1.
→ It is often used in hidden layers of a neural network to introduce non-linearity and handle inputs with negative values.
Softmax Function:
\(softmax(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}}\)
→ Each output is between 0 and 1, and all the outputs sum to 1.
→ It is commonly used in the output layer of a neural network for multi-class classification problems.
→ It normalizes the outputs into probabilities, where each output represents the probability of the input belonging to a specific class.
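
The four formulas above translate almost directly into Python. This is a minimal sketch using only the standard library. The two gradient helpers at the end are my own addition to illustrate the vanishing gradient point, and the sample inputs are arbitrary.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)

def tanh(x):
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

def softmax(xs):
    largest = max(xs)
    exps = [math.exp(x - largest) for x in xs]  # subtracting the max avoids overflow
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid_gradient(x):
    """Derivative of sigmoid: s * (1 - s). It is at most 0.25 and shrinks fast."""
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_gradient(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise (0 chosen at x = 0)."""
    return 1.0 if x > 0 else 0.0

print(sigmoid(0.0))             # 0.5
print(relu(-3.0), relu(2.5))    # 0.0 2.5
print(tanh(0.0))                # 0.0
print(softmax([1.0, 2.0, 3.0]))

# For a large input the sigmoid gradient is tiny, while the ReLU gradient stays at 1
print(sigmoid_gradient(10.0), relu_gradient(10.0))
```

When many layers are stacked, those tiny sigmoid gradients get multiplied together during training, which is why the gradient can "vanish" and why ReLU is often preferred in hidden layers.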

Types of Neural Networks

Here are a few types of neural networks:

Artificial Neural Network (ANN):
→ The basic type of neural network, inspired by biological neurons. It consists of interconnected nodes organized in layers.
Deep Neural Network (DNN):
→ An ANN with multiple hidden layers between the input and output layers. The depth allows the network to learn complex patterns.
Convolutional Neural Networks (CNNs):
→ They specialize in grid-like data such as images. They can take images as input, identify the objects in a picture, and differentiate them from one another.
→ Real-world applications include pattern recognition, image recognition, and object detection.
Steps in Constructing CNNs:
→ The design comprises three primary types of layers. The first is the convolutional layer, where the majority of the processing occurs. The next is the pooling layer, which downsamples the feature maps and so reduces the amount of data (and the number of parameters) that later layers have to handle. Finally, the fully connected layer classifies the features extracted by the preceding layers.
Recurrent Neural Networks (RNNs):
→ RNNs are a type of artificial neural network architecture designed to handle sequential data.
→ These networks are used for language translation, speech recognition, natural language processing, and image captioning.
→ RNNs can maintain information about previous inputs and use that information to inform their current output.
→ There are several different types of RNN architectures, and one of them is the Long Short-Term Memory (LSTM) network. An LSTM is a more complex form of RNN that addresses the vanishing gradient problem. It is designed with gates (input, forget, output) that regulate the flow of information while processing the sequence.
Generative Adversarial Networks (GANs):
→ GANs can generate new data that shares the same statistics as the training dataset and often passes as real data.
→ These networks are used to create art, among other things. Examples include image generation, data augmentation (generating additional training data to improve the performance of ML models), and style transfer (transferring the style of one image to another).
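
Going back to the CNN steps described above, here is a tiny sketch of the first two of them, convolution and pooling, in plain Python. The 5x5 "image" and the 2x2 edge-detecting kernel are made up for illustration. A real CNN learns its kernels during training and follows these layers with a fully connected layer, which is left out here.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def max_pool2d(feature_map, size=2):
    """Keep only the largest value in each non-overlapping size x size window."""
    return [
        [
            max(feature_map[i + di][j + dj]
                for di in range(size) for dj in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]

# Hypothetical 5x5 image: dark (0) on the left, bright (1) on the right
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]

# Hand-picked kernel that responds to a dark-to-bright vertical edge
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, kernel)  # 4x4, each row is [0, 2, 0, 0]
pooled = max_pool2d(feature_map)         # 2x2 after pooling
print(feature_map)
print(pooled)                            # [[2, 0], [2, 0]]
```

The convolution lights up exactly where the edge is, and pooling shrinks the 4x4 feature map to 2x2 while keeping that signal.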

So here we are, signing off. Thank you for reading the article, I hope it added some value.
If you have any feedback or questions, drop them in the comments and I'll get back to you.
Do follow me on X, connect with me on LinkedIn, and subscribe to read more posts like this. Let's stay in touch (: !!

Misba Writes
