Lesson 1.4: Neural Networks
Neural networks are machine learning models inspired by the structure and function of the human brain. These models consist of interconnected nodes, or neurons, that process data, learn patterns, and enable tasks such as pattern recognition and decision-making.
Neural networks are capable of learning and identifying patterns directly from data, without pre-defined rules. These networks are built from several key components (a short code sketch follows the list below):
- Neurons: The basic units that receive inputs; each neuron is governed by a threshold and an activation function.
- Connections: Links between neurons that carry information, regulated by weights and biases.
- Weights and Biases: These parameters determine the strength and influence of connections.
- Propagation Functions: Mechanisms that help process and transfer data across layers of neurons.
- Learning Rule: The method that adjusts weights and biases over time to improve accuracy.
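To make these components concrete, here is a minimal sketch of one layer of neurons in Python with NumPy. The class name `DenseLayer`, the initialization scheme, and the field names are illustrative choices, not a fixed standard.

```python
import numpy as np

class DenseLayer:
    """One layer of neurons: weights and biases govern the strength of its connections."""
    def __init__(self, n_inputs, n_neurons, activation):
        # Weights set the strength of each connection; biases shift each neuron's sum.
        self.weights = np.random.randn(n_inputs, n_neurons) * 0.01
        self.biases = np.zeros(n_neurons)
        self.activation = activation  # e.g. a ReLU or sigmoid function

    def forward(self, inputs):
        # Propagation: weighted sum of the inputs plus bias, then the activation function.
        return self.activation(inputs @ self.weights + self.biases)
```

The learning rule (discussed under backpropagation below) is what adjusts `weights` and `biases` over time.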
Learning in neural networks follows a structured, three-stage process:
- Input Computation: Data is fed into the network.
- Output Generation: Based on the current parameters, the network generates an output.
- Iterative Refinement: The network refines its output by adjusting weights and biases, gradually improving its performance on diverse tasks.
Layers in Neural Network Architecture
Input Layer:
- This is where the network receives its input data. The input layer consists of neurons (or nodes), each corresponding to one feature of the input data.
- For example, if the input is an image with 784 pixels (e.g., a 28x28 grayscale image), the input layer will have 784 neurons, each representing one pixel value (see the sketch after this list).
- The input layer does not perform any computation; it simply passes the data to the first hidden layer.
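To illustrate the 784-pixel example above: the input layer is just the flattened image handed to the first hidden layer. The sketch below assumes a NumPy array standing in for a 28x28 grayscale image.

```python
import numpy as np

# A stand-in for one 28x28 grayscale image (pixel values in [0, 1]).
image = np.random.rand(28, 28)

# The input layer performs no computation: it is simply the 784 pixel values
# laid out as a vector, one value per input neuron.
input_layer = image.reshape(784)
print(input_layer.shape)  # (784,)
```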
Hidden Layers:
- These layers perform most of the computational heavy lifting. A neural network can have one or multiple hidden layers. Each layer consists of units (neurons) that transform the inputs into something that the output layer can use.
- Each hidden layer consists of neurons that perform two key operations (sketched in code after this list):
- Weighted Sum: Compute the weighted sum of inputs from the previous layer
- Activation Function: Apply a non-linear activation function (e.g., ReLU, sigmoid, tanh) to introduce non-linearity
- The output of each neuron in the hidden layer is passed to the next layer.
- Multiple hidden layers allow the network to learn hierarchical features:
- Early layers learn simple patterns (e.g., edges in an image).
- Deeper layers learn complex patterns (e.g., shapes, objects).
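A minimal sketch of those two operations for a single hidden layer, assuming NumPy, a ReLU activation, and an illustrative layer size of 128 neurons:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

x = np.random.rand(784)               # inputs from the previous layer
W = np.random.randn(784, 128) * 0.01  # weights of a 128-neuron hidden layer
b = np.zeros(128)                     # biases

z = x @ W + b   # weighted sum of inputs plus bias
a = relu(z)     # non-linear activation
print(a.shape)  # (128,) -- passed on to the next layer
```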
Output Layer:
- The final layer produces the output of the model. The output layer consists of neurons that represent the final output of the network, and the format of that output varies depending on the specific task (e.g., classification, regression).
- The number of neurons in the output layer depends on the task:
- Binary Classification: 1 neuron (outputs a probability between 0 and 1).
- Multi-Class Classification: one neuron per class (outputs a probability for each class).
- Regression: 1 neuron (outputs a continuous value).
- The output layer computes the weighted sum of inputs from the last hidden layer and applies an appropriate activation function (compare the sketch after this list):
- Sigmoid: For binary classification (outputs a probability between 0 and 1).
- Softmax: For multi-class classification (outputs probabilities for each class).
- Linear/Identity: For regression (outputs a continuous value).
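The sketch below shows the three output-layer choices listed above, using NumPy; the hidden-layer size (128) and class count (10) are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

h = np.random.rand(128)  # activations from the last hidden layer

# Binary classification: 1 neuron, sigmoid -> probability between 0 and 1
p_binary = sigmoid(h @ np.random.randn(128, 1))

# Multi-class classification: one neuron per class, softmax -> probabilities summing to 1
p_classes = softmax(h @ np.random.randn(128, 10))

# Regression: 1 neuron, linear/identity -> a continuous value
y_hat = h @ np.random.randn(128, 1)
```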
Working of Neural Networks
Forward Propagation
When data is input into the network, it passes through the network in the forward direction, from the input layer through the hidden layers to the output layer. This process is known as forward propagation. Here’s what happens during this phase:
- Linear Transformation: Each neuron in a layer receives inputs, which are multiplied by the weights associated with the connections. These products are summed together, and a bias is added to the sum. This can be represented mathematically as z = w1*x1 + w2*x2 + ... + wn*xn + b, where
- w represents the weights,
- x represents the inputs, and
- b is the bias.
- Activation: The result of the linear transformation (denoted as z) is then passed through an activation function. The activation function is crucial because it introduces non-linearity into the system, enabling the network to learn more complex patterns. Popular activation functions include ReLU, sigmoid, and tanh. A short sketch of a full forward pass follows this list.
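Putting the two steps together, here is a sketch of forward propagation through one hidden layer and one output layer, assuming NumPy, a ReLU hidden activation, and a sigmoid output (as for binary classification); the function and variable names are illustrative.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    z1 = x @ W1 + b1    # linear transformation: weighted sum plus bias (hidden layer)
    a1 = relu(z1)       # activation: introduces non-linearity
    z2 = a1 @ W2 + b2   # linear transformation for the output layer
    return sigmoid(z2)  # output activation, e.g. a probability for binary classification
```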
Backpropagation
After forward propagation, the network evaluates its performance using a loss function, which measures the difference between the actual output and the predicted output. The goal of training is to minimize this loss. This is where backpropagation comes into play:
- Loss Calculation: The network calculates the loss, which provides a measure of error in the predictions. The loss function could vary; common choices are mean squared error for regression tasks or cross-entropy loss for classification.
- Gradient Calculation: The network computes the gradients of the loss function with respect to each weight and bias in the network. This involves applying the chain rule of calculus to find out how much each part of the output error can be attributed to each weight and bias.
- Weight Update: Once the gradients are calculated, the weights and biases are updated using an optimization algorithm like stochastic gradient descent (SGD). The weights are adjusted in the opposite direction of the gradient to minimize the loss. The size of the step taken in each update is determined by the learning rate (see the sketch below).
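A minimal sketch of these three steps for a single linear neuron with a squared error loss, assuming NumPy; it shows the shape of the computation rather than full backpropagation through many layers, and the inputs, target, and learning rate are made up for illustration.

```python
import numpy as np

x = np.array([0.5, 1.5, -0.3])  # inputs
y = 2.0                         # target output
w = np.zeros(3)                 # weights
b = 0.0                         # bias
lr = 0.1                        # learning rate

# Forward pass and loss calculation (squared error for one example)
y_hat = x @ w + b
loss = (y_hat - y) ** 2

# Gradient calculation via the chain rule:
# dL/dw = 2 * (y_hat - y) * x,  dL/db = 2 * (y_hat - y)
grad_w = 2 * (y_hat - y) * x
grad_b = 2 * (y_hat - y)

# Weight update: step in the opposite direction of the gradient, scaled by the learning rate
w -= lr * grad_w
b -= lr * grad_b
```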
Iteration
This process of forward propagation, loss calculation, backpropagation, and weight update is repeated for many iterations over the dataset. Over time, this iterative process reduces the loss, and the network’s predictions become more accurate.
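Written as a loop, the whole cycle for the single-neuron example above might look like the sketch below; the synthetic data, epoch count, and learning rate are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))        # 100 examples, 3 features
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.3                 # synthetic targets

w, b, lr = np.zeros(3), 0.0, 0.1

for epoch in range(200):
    y_hat = X @ w + b                          # forward propagation
    loss = np.mean((y_hat - y) ** 2)           # loss calculation (mean squared error)
    grad_w = 2 * (X.T @ (y_hat - y)) / len(X)  # backpropagation (chain rule)
    grad_b = 2 * np.mean(y_hat - y)
    w -= lr * grad_w                           # weight update
    b -= lr * grad_b

print(w, b)  # approaches true_w and 0.3 as the loss shrinks
```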
Through these steps, neural networks can adapt their parameters to better approximate the relationships in the data, thereby improving their performance on tasks such as classification, regression, or any other predictive modeling.