Neural Networks
Definition
Neural Network
A Neural Network is a parameterized function approximator composed of layers of linear transformations, each followed by a non-linear activation function. It is designed to learn complex, non-linear mappings from inputs to outputs via gradient-based optimization.
Mathematical Formulation
A network with a single hidden layer can be represented as:

$$\hat{y} = f_2\left(W_2\, f_1(W_1 x + b_1) + b_2\right)$$
where:
- $W_l, b_l$ — Weight matrix and bias vector of layer $l$
- $f_l$ — Non-linear activation function (e.g., ReLU, Sigmoid, Tanh)
- $x$ — Input vector
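The formulation above can be sketched directly in NumPy. The dimensions (3 inputs, 4 hidden units, 2 outputs) and the choice of ReLU for the hidden activation are illustrative assumptions, not fixed by the definition:

```python
import numpy as np

# Hypothetical dimensions: 3 inputs, 4 hidden units, 2 outputs.
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 3)), np.zeros(4)  # layer 1 parameters
W2, b2 = rng.standard_normal((2, 4)), np.zeros(2)  # layer 2 parameters

def relu(z):
    return np.maximum(0.0, z)

def forward(x):
    """Single-hidden-layer forward pass: W2 @ f(W1 @ x + b1) + b2."""
    h = relu(W1 @ x + b1)  # hidden activations
    return W2 @ h + b2     # linear output layer

y = forward(np.array([1.0, -2.0, 0.5]))
print(y.shape)  # (2,)
```

In practice the weights are initialized randomly (as here) and then updated by training; the forward pass itself is unchanged throughout.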
Universal Approximation Theorem
Universal Approximation
A feedforward network with a single hidden layer and a finite number of neurons can approximate any continuous function on compact subsets of $\mathbb{R}^n$, provided the activation function is non-constant, bounded, and monotonically increasing.
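As a toy illustration of this expressiveness (using ReLU rather than the bounded activation in the classical statement), a two-neuron hidden layer represents the non-smooth function $|x|$ exactly, since $|x| = \mathrm{ReLU}(x) + \mathrm{ReLU}(-x)$:

```python
import numpy as np

# Hand-constructed 2-neuron ReLU network computing |x| exactly:
# |x| = ReLU(x) + ReLU(-x).
W1 = np.array([[1.0], [-1.0]])  # hidden weights
W2 = np.array([[1.0, 1.0]])     # output weights

def net(x):
    h = np.maximum(0.0, W1 @ x)  # hidden layer
    return W2 @ h                # linear output layer

xs = np.array([[-3.0, -0.5, 0.0, 2.0]])
print(net(xs))  # [[3.  0.5 0.  2. ]]
```

More neurons yield more ReLU "kinks," which is the intuition behind approximating arbitrary continuous functions with piecewise-linear combinations.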
Training (Backpropagation)
Neural networks are trained using Stochastic Gradient Descent (SGD) or variants like Adam. The gradients are calculated using Backpropagation, which is an application of the Chain Rule of calculus to compute the partial derivatives of a loss function with respect to every weight in the network.
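The chain-rule computation can be written out by hand for the single-hidden-layer network above. This sketch assumes a tanh hidden layer and mean-squared-error loss (illustrative choices), and checks one analytic gradient against a finite-difference estimate:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical tiny network: 2 inputs -> 3 hidden (tanh) -> 1 output, MSE loss.
W1, b1 = rng.standard_normal((3, 2)), np.zeros(3)
W2, b2 = rng.standard_normal((1, 3)), np.zeros(1)
x, y_true = np.array([0.5, -1.0]), np.array([0.3])

# Forward pass, caching intermediates needed by the backward pass.
z1 = W1 @ x + b1
h = np.tanh(z1)
y = W2 @ h + b2
loss = 0.5 * np.sum((y - y_true) ** 2)

# Backward pass: chain rule applied layer by layer.
dy = y - y_true            # dL/dy
dW2 = np.outer(dy, h)      # dL/dW2
db2 = dy                   # dL/db2
dh = W2.T @ dy             # propagate gradient through W2
dz1 = dh * (1.0 - h ** 2)  # tanh'(z) = 1 - tanh(z)^2
dW1 = np.outer(dz1, x)     # dL/dW1
db1 = dz1                  # dL/db1

# Sanity check: compare one entry of dW1 to a finite-difference estimate.
eps = 1e-6
W1[0, 0] += eps
loss_plus = 0.5 * np.sum((W2 @ np.tanh(W1 @ x + b1) + b2 - y_true) ** 2)
W1[0, 0] -= eps
numeric = (loss_plus - loss) / eps
print(abs(numeric - dW1[0, 0]) < 1e-4)  # True
```

An SGD step then subtracts the learning rate times each gradient from the corresponding parameter; autodiff frameworks automate exactly this backward pass.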
Key Concepts
- Activation Functions: Introduce non-linearity (e.g., $\mathrm{ReLU}(x) = \max(0, x)$, $\sigma(x) = \frac{1}{1 + e^{-x}}$).
- Layers: Layers between the input and output are “hidden layers”; deep networks stack many of them.
- Weights: The parameters that the model “learns.”
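The first point above is worth making concrete: without a non-linear activation, stacked layers collapse into a single linear map, so depth adds no expressive power. A minimal numerical check (with arbitrary assumed shapes):

```python
import numpy as np

rng = np.random.default_rng(2)
W1, W2 = rng.standard_normal((4, 3)), rng.standard_normal((2, 4))
x = rng.standard_normal(3)

# Without an activation between them, two layers equal one linear layer W2 @ W1.
two_layer = W2 @ (W1 @ x)
collapsed = (W2 @ W1) @ x
print(np.allclose(two_layer, collapsed))  # True
```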
Connections
- Used in: Deep Reinforcement Learning
- Optimization: Gradient Descent, Adam, RMSProp
- Prevention of Overfitting: Regularization
- Variants: Convolutional Neural Networks (for vision), Transformers (for sequences)