A really brief introduction to Machine Learning

What is a neural network?

A neural network is a computational model made of interconnected layers of neurons that learns to approximate a function
mapping inputs to outputs by adjusting its parameters (weights and biases) through training.

What does a neural network consist of?

A neural network usually consists of layers of neurons (input, hidden, output), where each neuron performs a simple computation. The layers are connected, and information flows through them in the forward direction (recurrent models additionally feed earlier outputs back in as inputs).
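The "simple computation" of each neuron is a weighted sum of its inputs plus a bias. Here is a minimal pure-Python sketch of a forward pass through two small layers. The names dense_forward and relu and all the numbers are made up for illustration and are not taken from the NNFS code:

```python
def dense_forward(inputs, weights, biases):
    """Forward pass of one dense layer for a single sample.

    inputs:  list of n_in floats
    weights: n_in x n_out nested list (weights[i][j] links input i to neuron j)
    biases:  list of n_out floats
    """
    n_out = len(biases)
    return [
        sum(x * weights[i][j] for i, x in enumerate(inputs)) + biases[j]
        for j in range(n_out)
    ]


def relu(values):
    """ReLU activation: negative values become zero."""
    return [max(0.0, v) for v in values]


# Hypothetical tiny network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
x = [1.0, 2.0]
w_hidden = [[0.5, -1.0], [0.25, 0.5]]
b_hidden = [0.0, -0.5]
w_out = [[1.0], [2.0]]
b_out = [0.1]

hidden = relu(dense_forward(x, w_hidden, b_hidden))  # information flows forward
output = dense_forward(hidden, w_out, b_out)
print(hidden, output)
```

A real implementation would process whole batches with matrix multiplication (for example with NumPy) instead of Python loops.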

How are neural networks trained?

Training uses backpropagation to compute the gradients of the loss with respect to the weights and biases, working backward through the layers. An optimizer (SGD, RMSProp, Adam, ...) then uses these gradients (vectors of partial derivatives) to update the parameters of each layer.
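To make the loop "forward pass, gradients, parameter update" concrete, here is a minimal sketch that trains a single linear neuron with plain full-batch gradient descent, the simplest relative of SGD. The function name, data, learning rate and epoch count are all assumptions chosen for illustration:

```python
def train_linear_neuron(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Forward pass: current predictions.
        preds = [w * x + b for x in xs]
        # Backward pass: derivatives of mean((pred - y)^2) w.r.t. w and b.
        grad_w = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        grad_b = sum(2 * (p - y) for p, y in zip(preds, ys)) / n
        # Optimizer step: move each parameter against its gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = train_linear_neuron(xs, ys)
print(w, b)  # both should end up close to 2 and 1
```

In a multi-layer network the gradients are obtained with the chain rule, layer by layer, and optimizers such as Adam add per-parameter scaling and momentum on top of this basic update.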

Typical layer structure

Input Layer - holds the input data (inputs).
Dense Layer - a hidden layer between input and output. It holds the weights and biases and
computes the layer's output. Regularization methods (such as an L2 penalty) are often applied to Dense layers.
Activation Layer - an activation function (ReLU, Softmax, etc.) is applied to the Dense layer's output after the linear transformation and transforms the output values (for example, ReLU replaces negative values with zero).
Dropout Layer - an additional layer that randomly "turns off" some neurons so that no single neuron has too much influence on the model's output (used only during training).
Output Layer - holds the output data; the number of neurons depends on the model type: for a Categorical model each neuron corresponds to one class,
for a Binary model there is only one neuron in the output layer (with a Sigmoid activation as the last step),
for a Regression model there is one output neuron per predicted value (a single neuron when predicting one value).
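The activation and dropout layers listed above can be sketched in a few lines of pure Python. This is an illustration, not the NNFS code: the function names are made up, and the dropout shown is the common "inverted" variant, which rescales the kept values so their expected sum stays the same:

```python
import math
import random


def relu(values):
    """Replace negative values with zero."""
    return [max(0.0, v) for v in values]


def sigmoid(value):
    """Squash a single value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))


def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def dropout(values, rate, rng, training=True):
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    if not training:
        return list(values)  # dropout is disabled outside of training
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
probs = softmax([2.0, 1.0, 0.1])
dropped = dropout([1.0] * 10, rate=0.2, rng=rng)
print(probs, dropped)
```

With rate=0.2 every surviving value is divided by 0.8, so the layer's average output level is roughly the same with and without dropout.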

Additional ML tools

Optimizer - updates the model's parameters (weights and biases) using the computed gradients.
Loss - a function that measures how far the model's output deviates from the correct ("ground truth") values;
its gradient is used in backpropagation to update the model's parameters (weights and biases).
Examples: - Regression loss - LossMSE, LossMAE
- Classification loss - BinaryCrossentropy, CategoricalCrossentropy
Metrics - used to measure the model's performance; they do not affect the model's parameters.
- Accuracy - a metric that measures the share of predictions the model gets right.
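The difference between a loss and a metric is easier to see side by side. Below is a minimal sketch of categorical cross-entropy and categorical accuracy on made-up softmax outputs. The function names and the clipping constant eps are assumptions for illustration, not the NNFS API:

```python
import math


def categorical_crossentropy(probabilities, true_classes, eps=1e-7):
    """Mean negative log-probability assigned to the correct class."""
    losses = []
    for probs, target in zip(probabilities, true_classes):
        p = min(max(probs[target], eps), 1.0 - eps)  # clip to avoid log(0)
        losses.append(-math.log(p))
    return sum(losses) / len(losses)


def categorical_accuracy(probabilities, true_classes):
    """Share of samples whose most probable class is the correct one."""
    predictions = [max(range(len(p)), key=p.__getitem__) for p in probabilities]
    correct = sum(pred == t for pred, t in zip(predictions, true_classes))
    return correct / len(true_classes)


# Hypothetical softmax outputs for three samples and their true class indices.
probs = [[0.7, 0.2, 0.1], [0.1, 0.5, 0.4], [0.3, 0.4, 0.3]]
labels = [0, 1, 2]
loss = categorical_crossentropy(probs, labels)
acc = categorical_accuracy(probs, labels)
print(loss, acc)  # the third sample is misclassified, so acc is 2/3
```

The loss changes smoothly as the probabilities change, which is why it can drive gradient updates; accuracy only counts hits and misses, so it is used for reporting.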

Example model architectures

input_layer -> dense_layer -> ReLU -> dropout_layer -> 
dense_layer -> ReLU -> dropout_layer -> dense_layer -> activation_softmax

loss_function: Loss_CategoricalCrossentropy()
optimizer:     Optimizer_Adam()
accuracy:      Accuracy_Categorical()

Categorical model tasks: suitable for multi-class classification, such as image classification with multiple categories.

input_layer -> dense_layer -> ReLU -> dropout_layer -> 
dense_layer -> ReLU -> dropout_layer -> dense_layer -> activation_sigmoid

loss_function: Loss_BinaryCrossentropy()
optimizer:     Optimizer_Adam()
accuracy:      Accuracy_Binary()

Binary model tasks: suitable for binary classification, such as spam detection or yes/no questions.
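For the binary model, the single sigmoid output is read as the probability of the positive class. Here is a minimal sketch of the matching loss and accuracy; the function names, the 0.5 threshold and the sample values are assumptions for illustration:

```python
import math


def sigmoid(value):
    """Map a raw output (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))


def binary_crossentropy(probabilities, targets, eps=1e-7):
    """Mean binary cross-entropy for targets that are 0 or 1."""
    total = 0.0
    for p, y in zip(probabilities, targets):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(targets)


def binary_accuracy(probabilities, targets, threshold=0.5):
    """Share of samples where the thresholded probability matches the target."""
    predictions = [1 if p > threshold else 0 for p in probabilities]
    return sum(pred == y for pred, y in zip(predictions, targets)) / len(targets)


# Hypothetical raw outputs of the last dense layer, e.g. for four emails.
logits = [2.0, -1.0, 0.5, -3.0]
targets = [1, 0, 0, 0]  # 1 = spam, 0 = not spam
probs = [sigmoid(z) for z in logits]
loss = binary_crossentropy(probs, targets)
acc = binary_accuracy(probs, targets)
print(loss, acc)  # the third sample is a false positive, so acc is 0.75
```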

input_layer -> dense_layer -> ReLU -> dropout_layer -> 
dense_layer -> ReLU -> dropout_layer -> dense_layer -> activation_linear

loss_function: Loss_MeanSquaredError()
optimizer:     Optimizer_Adam()
accuracy:      Accuracy_Regression()

Regression model tasks: suitable for predicting continuous values, such as house prices or stock prices.
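A regression output is almost never exactly equal to the target, so an "accuracy" for regression needs a tolerance. The sketch below shows mean squared error together with one possible tolerance-based accuracy. The tolerance rule (a tenth of the targets' standard deviation) and all names are assumptions for illustration; Accuracy_Regression() in the actual code may work differently:

```python
import statistics


def mean_squared_error(predictions, targets):
    """Average of the squared differences between predictions and targets."""
    return sum((p - y) ** 2 for p, y in zip(predictions, targets)) / len(targets)


def regression_accuracy(predictions, targets, precision):
    """Share of predictions that land within `precision` of the target."""
    hits = sum(abs(p - y) < precision for p, y in zip(predictions, targets))
    return hits / len(targets)


# Hypothetical house prices (in thousands) and model predictions.
targets = [100.0, 150.0, 200.0, 250.0]
predictions = [101.0, 149.5, 230.0, 250.2]

# Assumed tolerance: a tenth of the spread of the targets.
precision = statistics.pstdev(targets) / 10
mse = mean_squared_error(predictions, targets)
acc = regression_accuracy(predictions, targets, precision)
print(mse, acc)  # one prediction is off by 30, so acc is 0.75
```

The single large error dominates the MSE because errors are squared, while the accuracy simply counts it as one miss.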

Tests

I finally compared my NNFS neural network implementation against the same architecture in PyTorch.
Both were trained for 15 epochs and tested on shuffled data.

input_layer         -> dense_layer(784,256) -> ReLU   -> 
dropout_layer(0.2)  -> dense_layer(256,256) -> ReLU   -> 
dropout_layer(0.3)  -> dense_layer(256,10)  -> activation_softmax

loss_function: Loss_CategoricalCrossentropy()
optimizer:     Optimizer_Adam()
accuracy:      Accuracy_Categorical()

Model architecture
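The size of this architecture follows directly from the layer shapes: a dense layer with n inputs and m neurons holds n * m weights plus m biases. A small sketch of that arithmetic (the helper name is made up; ReLU, dropout and softmax have no trainable parameters):

```python
def dense_parameter_count(n_inputs, n_neurons):
    """Weights (n_inputs * n_neurons) plus one bias per neuron."""
    return n_inputs * n_neurons + n_neurons


# Dense layer shapes from the architecture above.
layer_shapes = [(784, 256), (256, 256), (256, 10)]
per_layer = [dense_parameter_count(i, o) for i, o in layer_shapes]
total = sum(per_layer)
print(per_layer, total)  # [200960, 65792, 2570] 269322
```

The 784 inputs correspond to a flattened 28 x 28 image and the 10 outputs to the ten clothing classes.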

NNFS accuracy: 0.8594
PyTorch accuracy: 0.8643

That is good performance for a DIY model: the difference is only 0.49 percentage points.

NNFS test
Label                               | Accuracy
--------------------------------------------------
T-shirt/top                         | 0.849
Trouser                             | 0.948
Pullover                            | 0.746
Dress                               | 0.89
Coat                                | 0.781
Sandal                              | 0.944
Shirt                               | 0.58
Sneaker                             | 0.943
Bag                                 | 0.972
Ankle boot                          | 0.941
Mean accuracy                       | 0.8594

NNFS test on Fashion-MNIST

PyTorch test
Label                               | Accuracy
--------------------------------------------------
T-shirt/top                         | 0.876
Trouser                             | 0.962
Pullover                            | 0.764
Dress                               | 0.867
Coat                                | 0.87
Sandal                              | 0.929
Shirt                               | 0.481
Sneaker                             | 0.956
Bag                                 | 0.976
Ankle boot                          | 0.962
Mean accuracy                       | 0.8643

PyTorch test on Fashion-MNIST
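The summary numbers can be cross-checked from the two per-class tables. The sketch below recomputes both means and the gap, and lists the classes where NNFS scored higher. The per-class mean only equals the overall accuracy if every class has the same number of test samples, which is assumed here:

```python
labels = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
          "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# Per-class accuracies copied from the two tables above.
nnfs = [0.849, 0.948, 0.746, 0.89, 0.781, 0.944, 0.58, 0.943, 0.972, 0.941]
pytorch = [0.876, 0.962, 0.764, 0.867, 0.87, 0.929, 0.481, 0.956, 0.976, 0.962]

nnfs_mean = sum(nnfs) / len(nnfs)
pytorch_mean = sum(pytorch) / len(pytorch)
gap_points = (pytorch_mean - nnfs_mean) * 100  # difference in percentage points

# Classes where the NNFS model beats the PyTorch model.
nnfs_better = [name for name, a, b in zip(labels, nnfs, pytorch) if a > b]

print(round(nnfs_mean, 4), round(pytorch_mean, 4), round(gap_points, 2))
print(nnfs_better)  # ['Dress', 'Sandal', 'Shirt']
```

Both models struggle most with the Shirt class, which has the lowest accuracy in each table.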

Refs

Funnily enough, Gemini helped me tweak NNFS into an almost working state.

Loss IBM