Layer of Neurons

Neural networks usually contain more than one neuron; the neurons are grouped into layers.


  • Layer - a group of neurons.
  • Each neuron in a layer receives the same input as the other neurons in that layer, but has its own weights and bias.
  • Layer output - the set of outputs of its neurons, one value per neuron.
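
The points above can be sketched in plain Python. This is a minimal illustration, assuming a layer of three neurons with four inputs each; all numeric values are made up for the example.

```python
# Hypothetical values: 4 inputs feeding a layer of 3 neurons.
inputs = [1.0, 2.0, 3.0, 2.5]

# One list of weights and one bias per neuron.
weights = [
    [0.2, 0.8, -0.5, 1.0],
    [0.5, -0.91, 0.26, -0.5],
    [-0.26, -0.27, 0.17, 0.87],
]
biases = [2.0, 3.0, 0.5]

layer_outputs = []
for neuron_weights, neuron_bias in zip(weights, biases):
    # Every neuron sees the same inputs but applies its own weights and bias.
    output = sum(x * w for x, w in zip(inputs, neuron_weights)) + neuron_bias
    layer_outputs.append(output)

print(layer_outputs)  # one output per neuron, approximately [4.8, 1.21, 2.385]
```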

Tensors

Tensor object - an object that can be represented as an array.

  • In the context of ML we can treat tensors as arrays.

Array - an ordered, homogeneous container for numbers (in the ML context). Vector - in the ML context, simply a 1D array (list) of numbers; in mathematics and physics, a quantity with both magnitude and direction.

Dot product (Or vector multiplication)

$$\vec{a} \cdot \vec{b} = a_1 b_1 + \dots + a_n b_n$$

Both vectors must have the SAME size.

def DotProduct(a: list, b: list) -> float:
    # ->a * ->b = a_1*b_1 + ... + a_n*b_n; undefined for vectors of different sizes
    if len(a) != len(b):
        raise ValueError("vectors must have the same size")
    return sum(x * y for x, y in zip(a, b))

NumPy gives better performance, because NumPy's dot is implemented in C and uses SIMD instructions.

The dot product is the most commonly used method for calculating neuron outputs.

layer_output = np.dot(weights,inputs) + biases
# ->inputs*->weights[0] + biases[0], ->inputs*->weights[1] +
# biases[1], ->inputs*->weights[2] + biases[2]
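
To make the expression above concrete, here is a small runnable sketch. The input, weight and bias values are assumptions chosen for illustration; only np.dot and np.array are real NumPy calls.

```python
import numpy as np

# Hypothetical values: 4 inputs, 3 neurons (one row of weights per neuron).
inputs = np.array([1.0, 2.0, 3.0, 2.5])
weights = np.array([
    [0.2, 0.8, -0.5, 1.0],
    [0.5, -0.91, 0.26, -0.5],
    [-0.26, -0.27, 0.17, 0.87],
])
biases = np.array([2.0, 3.0, 0.5])

# weights has shape (3, 4) and inputs has shape (4,), so the result has shape (3,).
layer_output = np.dot(weights, inputs) + biases

# The same result, computed neuron by neuron as in the comment above.
manual = np.array([np.dot(inputs, weights[i]) + biases[i] for i in range(3)])

print(layer_output)
print(np.allclose(layer_output, manual))  # True
```

Putting weights first in np.dot matters here: the (3, 4) matrix times a (4,) vector yields one value per neuron.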

Batch of data

Neural networks often receive data in batches. A batch stacks several samples into a 2D array (matrix), one row per sample; a single sample is a 1D array:

input data: inputs = [1,2,3]
shape:      (3,)
type:       1D array, Vector
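
The difference between a single sample and a batch can be checked with NumPy shapes. This is a short sketch; the batch values are hypothetical.

```python
import numpy as np

# A single sample: 1D array (vector).
sample = np.array([1, 2, 3])

# A hypothetical batch of 2 samples: 2D array (matrix), one row per sample.
batch = np.array([
    [1, 2, 3],
    [4, 5, 6],
])

print(sample.shape)  # (3,)
print(batch.shape)   # (2, 3)
```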

Matrix Product

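Matrix - a list of vectors (a 2D array)

  • Matrix product - matrix multiplication: every row of the first matrix is dot-multiplied with every column of the second, so the number of columns in the first must equal the number of rows in the second.

b = np.array(b).T - transposition (swaps the rows and columns of the matrix)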

| 1, 2 |            | 1, 3, 5 |
| 3, 4 | -- T -- >  | 2 ,4, 6 |
| 5, 6 |
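
The sketch below shows why transposition is needed when a batch of inputs is multiplied by a weight matrix that stores one row per neuron. The shapes and values are assumptions for illustration.

```python
import numpy as np

# Hypothetical batch: 3 samples with 4 features each -> shape (3, 4).
inputs = np.array([
    [1.0, 2.0, 3.0, 2.5],
    [2.0, 5.0, -1.0, 2.0],
    [-1.5, 2.7, 3.3, -0.8],
])
# 3 neurons with 4 weights each -> shape (3, 4).
weights = np.array([
    [0.2, 0.8, -0.5, 1.0],
    [0.5, -0.91, 0.26, -0.5],
    [-0.26, -0.27, 0.17, 0.87],
])
biases = np.array([2.0, 3.0, 0.5])

# (3, 4) x (3, 4) is not a valid matrix product, so weights is transposed
# to (4, 3); the result has one row per sample and one column per neuron.
layer_outputs = np.dot(inputs, weights.T) + biases
print(layer_outputs.shape)  # (3, 3)
```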

Layers

Input Hidden  Output
a1  --> b1   -\
a2  --> b2   --|--> c1
a3  --> b3   -/
  • We can have multiple hidden layers.

NNFS dataset

x[:,0] - selects all rows (:) and the first column (0) of the array x

array x[:5] (the first five rows, both columns)
 [[0.         0.        ]
 [0.00299556 0.00964661]
 [0.01288097 0.01556285]
 [0.02997479 0.0044481 ]
 [0.03931246 0.00932828]]

select only the 1st column of those rows with x[:5,0]
[0.         0.00299556 0.01288097 0.02997479 0.03931246]
  • A dense layer is a fully connected layer: every neuron is connected to every output of the previous layer.
  • A pre-trained model already has weights; otherwise the weights are random before the model is trained.
  • Biases are often initialized to zero.
  • In general, neural networks work best with values between -1 and +1.
  • It is easier for a model to work with values whose magnitudes are close to each other.
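
The initialization notes above can be sketched as a small dense-layer class. The class name, the seed parameter and the 0.01 scaling factor are assumptions made for this illustration, not a fixed recipe.

```python
import numpy as np

class LayerDense:
    """Fully connected layer (illustrative sketch; names are assumptions)."""

    def __init__(self, n_inputs, n_neurons, seed=0):
        rng = np.random.default_rng(seed)
        # Small random weights keep the initial outputs close to zero.
        self.weights = 0.01 * rng.standard_normal((n_inputs, n_neurons))
        # Biases start at zero.
        self.biases = np.zeros((1, n_neurons))

    def forward(self, inputs):
        # inputs: (n_samples, n_inputs) -> output: (n_samples, n_neurons)
        self.output = np.dot(inputs, self.weights) + self.biases
        return self.output

layer = LayerDense(n_inputs=2, n_neurons=3)
batch = np.array([[0.0, 0.0], [0.1, 0.2], [0.3, -0.4]])
print(layer.forward(batch).shape)  # (3, 3)
```

Storing the weights as (n_inputs, n_neurons) means no transposition is needed in the forward pass.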

Activation functions

  1. Hidden layers activation function
  2. Output layer activation function

Existing activation functions

  • AF - Activation Function

Step AF

A simple yes/no decision: if weights * inputs + bias > 0, the function returns 1, otherwise 0.

y = 1, x >  0
y = 0, x <= 0

This activation function has been used historically in hidden layers, but nowadays, it is rarely a choice.

Linear AF

$$y = x$$

A linear function is simply the equation of a line. This activation function is usually applied to the last layer's output in the case of a regression model, that is, a model that outputs a scalar value instead of a classification.

Sigmoid AF

$$y = \frac{1}{1 + e^{-x}}$$

The step AF provides very little information: only 0 or 1. The sigmoid function returns a value in the range (0, 1). The sigmoid function, historically used in hidden layers, was eventually replaced by the Rectified Linear Unit activation function (ReLU).
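
A short sketch comparing the step and sigmoid functions on the same inputs may help here. The function names and sample values are chosen for this illustration.

```python
import numpy as np

def step(x):
    # y = 1 if x > 0 else 0, applied element-wise
    return np.where(x > 0, 1.0, 0.0)

def sigmoid(x):
    # y = 1 / (1 + e^(-x)), applied element-wise
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, 0.0, 2.0])
print(step(x))     # only 0 or 1
print(sigmoid(x))  # approximately [0.119, 0.5, 0.881]
```

The sigmoid output shows how far an input is from the threshold, which the step function hides.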

Rectified Linear Unit (ReLU) AF

$$y = \begin{cases} x & x > 0 \\ 0 & x \le 0 \end{cases}$$

This simple yet powerful activation function is the most widely used activation function at the time of writing for various reasons, mainly speed and efficiency.
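
ReLU can be written in a single NumPy call. This is a minimal sketch with made-up input values.

```python
import numpy as np

def relu(x):
    # y = x for x > 0, otherwise 0 (element-wise)
    return np.maximum(0, x)

z = np.array([-1.5, 0.0, 0.7, 3.0])
print(relu(z))  # negatives and zero map to 0; positive values pass through
```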

Softmax AF

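$$S_i = \frac{e^{x_i}}{\sum_{j} e^{x_j}}$$

The softmax function is a mathematical tool that converts a vector of real numbers into a probability distribution: each value is first exponentiated ($y = e^x$), then divided by the sum of all exponentiated values.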

inputs:      [4.8, 1.21, 2.385]
exp values:  [121.51041752   3.35348465  10.85906266] # interim step
norm values: [0.89528266 0.02470831 0.08000903]

It also adds stability to the result, as the normalized exponentiation depends more on the differences between the numbers than on their magnitudes.

Simple implementation:

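inputs = [4.8, 1.21, 2.385]
E = 2.71828182846  # Euler's number (math.e gives full precision)

# Exponentiate each input (interim step)
exp_vals = [E ** i for i in inputs]
# Normalize by the sum so that the results add up to 1
exp_sum = sum(exp_vals)
res = [v / exp_sum for v in exp_vals]
# [0.8952826639573506, 0.024708306782070668, 0.08000902926057876]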

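np.sum(array, axis=1 or 0, keepdims=True or False)

  • axis=1 - sum along each row (across the columns); one result per row
  • axis=0 - sum along each column (down the rows); one result per column
  • keepdims=True keeps the number of dimensions of the input (the summed axis gets size 1); keepdims=False drops the summed axis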
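
The effect of axis and keepdims is easiest to see on a small batch. The sketch below uses hypothetical layer outputs and then applies the same idea to a row-wise softmax.

```python
import numpy as np

# Hypothetical batch of layer outputs: 2 samples, 3 values each.
layer_outputs = np.array([
    [4.8, 1.21, 2.385],
    [8.9, -1.81, 0.2],
])

print(np.sum(layer_outputs, axis=1).shape)                 # (2,)
print(np.sum(layer_outputs, axis=1, keepdims=True).shape)  # (2, 1)
print(np.sum(layer_outputs, axis=0).shape)                 # (3,)

# Softmax per row: keepdims=True keeps shape (2, 1), which broadcasts
# against the (2, 3) array so each row is divided by its own sum.
exp_values = np.exp(layer_outputs)
probabilities = exp_values / np.sum(exp_values, axis=1, keepdims=True)
print(np.sum(probabilities, axis=1))  # each row sums to 1
```

Without keepdims=True the row sums would have shape (2,), which does not line up with the (2, 3) array for a per-row division.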

Calculating Network Error with Loss

import math

# loss = -log(confidence of the correct class)
-math.log(1)     # 0
-math.log(0.8)   # 0.223
-math.log(0.01)  # 4.605

When the confidence level equals 1, meaning the model is 100% “sure” about its prediction, the loss value for this sample equals 0.

The natural log represents the solution for the x-term in the equation $e^x = b$; for example, $e^x = 5.2$ is solved by $x = \log(5.2)$.

Average value

np.mean(array) is the same as sum(array)/len(array) for a 1D array.

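In machine learning, a gradient refers to the vector of partial derivatives of a function with respect to each of its inputs.

Backpropagation - computes the impact of each weight and bias on the function output (the loss) by applying the chain rule backward through the layers.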

Derivative of ReLU and chain rule

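z = 6.0      # hypothetical input to ReLU (neuron output before activation)
dvalue = 1.  # derivative received from the next layer
# The derivative of ReLU is 1 for z > 0 and 0 otherwise; chain rule: multiply by dvalue
drelu_dz = dvalue * (1. if z > 0 else 0.)

#   where z is the input to ReLU in the current layer:
#   z = sum(x0*w0 + .. + xn*wn, b)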
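
For a whole batch, the same chain-rule step can be sketched with NumPy arrays. The values of z and dvalues below are made up; the variable names are assumptions for this illustration.

```python
import numpy as np

# Hypothetical pre-activation values z and gradients from the next layer.
z = np.array([[1.0, -2.0, 3.0],
              [-0.5, 0.0, 2.0]])
dvalues = np.array([[0.1, 0.2, 0.3],
                    [0.4, 0.5, 0.6]])

# Chain rule: d(loss)/dz = d(loss)/d(relu) * d(relu)/dz,
# where d(relu)/dz is 1 for z > 0 and 0 otherwise.
drelu_dz = dvalues.copy()
drelu_dz[z <= 0] = 0.0

print(drelu_dz)  # gradients pass through only where z was positive
```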

Cross-Entropy loss

Cross-entropy loss - a measure of how "wrong" an ML model's predictions are. It is commonly used in classification tasks.
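
A minimal sketch of categorical cross-entropy for a small batch follows. The predictions and targets are hypothetical, and the clipping bound of 1e-7 is an assumption used to avoid log(0).

```python
import numpy as np

# Hypothetical softmax outputs for 3 samples and 3 classes.
predictions = np.array([
    [0.7, 0.1, 0.2],
    [0.1, 0.5, 0.4],
    [0.02, 0.9, 0.08],
])
# Index of the correct class for each sample.
class_targets = np.array([0, 1, 1])

# Clip to avoid log(0), which is undefined.
clipped = np.clip(predictions, 1e-7, 1 - 1e-7)
# Confidence assigned to the correct class of each sample.
correct_confidences = clipped[np.arange(len(clipped)), class_targets]
# Per-sample loss is -log(confidence); the batch loss is the mean.
sample_losses = -np.log(correct_confidences)
loss = np.mean(sample_losses)
print(loss)  # roughly 0.385
```

The lower the confidence given to the correct class, the larger that sample's loss.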

np.eye(N) - returns an N×N array with ones on the diagonal and zeros everywhere else. For example, np.eye(3):

1 0 0
0 1 0
0 0 1

np.argmax - returns the index of the maximum value. With axis=1, argmax returns the index of the maximum element in each row; with axis=0, it returns the index of the maximum element in each column.
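
One common use of these two functions is converting between class indices and one-hot vectors, and computing accuracy. The sketch below assumes three classes and made-up predictions.

```python
import numpy as np

class_targets = np.array([0, 1, 1])

# Row i of np.eye(3) is the one-hot vector for class i.
one_hot = np.eye(3)[class_targets]
print(one_hot)

predictions = np.array([
    [0.7, 0.2, 0.1],
    [0.5, 0.1, 0.4],
    [0.02, 0.9, 0.08],
])
# argmax along axis=1 picks the predicted class for each row (sample).
predicted_classes = np.argmax(predictions, axis=1)
# argmax along axis=1 also converts one-hot targets back to class indices.
recovered_targets = np.argmax(one_hot, axis=1)

accuracy = np.mean(predicted_classes == recovered_targets)
print(accuracy)  # 2 of 3 predictions are correct
```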

Optimizers

Running evaluation on the training data at the end of the training process will return the final accuracy and loss.

Saving the model itself

  • copy.deepcopy() - recursively copies the model and all objects it contains
  • copy.copy() - copies only the top-level object; nested objects are shared with the original

When we save the model itself, we have to clear some generated data, such as dinputs, dweights, accumulated values and the layers' stored inputs and outputs, but we keep the trained weights and biases in the model, along with all of the model's objects, such as the optimizer, accuracy and loss function. Later we can load the whole model (with its saved weights and biases) from the file.
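
The procedure above can be sketched as follows. The Layer and Model classes, the attribute names and the use of pickle as the file format are all assumptions made for this illustration; the notes do not say which serialization is used.

```python
import copy
import os
import pickle
import tempfile

class Layer:
    # Hypothetical stand-in for a trained layer.
    def __init__(self):
        self.weights = [[0.1, 0.2], [0.3, 0.4]]
        self.biases = [0.0, 0.0]
        self.inputs = [[1.0, 2.0]]    # generated during the forward pass
        self.dweights = [[0.5, 0.5]]  # generated during the backward pass

class Model:
    # Hypothetical stand-in for a trained model.
    def __init__(self):
        self.layers = [Layer()]

def save_model(model, path):
    # Deep copy so that clearing data does not touch the live model.
    model_copy = copy.deepcopy(model)
    for layer in model_copy.layers:
        # Drop generated data; keep the trained weights and biases.
        for attr in ("inputs", "output", "dinputs", "dweights", "dbiases"):
            layer.__dict__.pop(attr, None)
    with open(path, "wb") as f:
        pickle.dump(model_copy, f)

def load_model(path):
    with open(path, "rb") as f:
        return pickle.load(f)

model = Model()
path = os.path.join(tempfile.mkdtemp(), "demo.model")
save_model(model, path)
loaded = load_model(path)
print(loaded.layers[0].weights)             # trained weights survive
print(hasattr(loaded.layers[0], "inputs"))  # False: generated data was cleared
print(hasattr(model.layers[0], "inputs"))   # True: the original is untouched
```

A shallow copy.copy() would share the layer objects with the original model, so clearing attributes on the copy would also clear them on the live model; that is why the sketch uses copy.deepcopy().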

Refs