
Friday, May 5, 2023

Parameter vs Hyperparameter


A parameter is a variable that is learned during the training of the neural network, such as the weights and biases of each neuron. These values are updated during training to minimize the loss function.

On the other hand, a hyperparameter is a setting that is chosen before training begins and is not learned during training, such as the learning rate, the number of hidden layers, or the choice of activation function. These values can have a significant impact on the performance of the neural network, and they are typically set through trial and error or other optimization techniques.
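To make the distinction concrete, here is a minimal sketch that trains a toy linear model with plain NumPy. The variable names (learning_rate, n_epochs, w, b) are illustrative, not taken from any particular framework.

import numpy as np

# Hyperparameters: chosen before training and never updated by it.
learning_rate = 0.01
n_epochs = 200

# Toy data: y = 2x + 1 plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 2 * x + 1 + rng.normal(0, 0.1, size=100)

# Parameters: initialized randomly and learned during training.
w, b = rng.normal(), rng.normal()

for _ in range(n_epochs):
    y_pred = w * x + b
    # Gradients of the mean squared error with respect to w and b.
    grad_w = 2 * np.mean((y_pred - y) * x)
    grad_b = 2 * np.mean(y_pred - y)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b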

Setting the hyperparameters

  1. Grid search: This is a brute-force approach where you try out every combination of hyperparameter values in a predefined range. While this can be time-consuming, it ensures that you test all combinations in the grid and find the best set of hyperparameters among them.

  2. Random search: This approach randomly samples hyperparameters from predefined ranges and trains the model with each sampled combination. While it does not guarantee finding the best hyperparameters, it can be more efficient than grid search.

  3. Bayesian optimization: This method uses probabilistic models to select the most promising hyperparameters for the next evaluation, based on the performance of previous evaluations. It can be more efficient than grid search and random search, especially for high-dimensional hyperparameter spaces.

  4. Expert knowledge: Sometimes, domain experts may have knowledge about the problem and the characteristics of the data that can help in selecting appropriate hyperparameters. This approach can be especially useful in situations where computational resources are limited.

Once the optimal set of hyperparameters is identified, they can be used to train the neural network, and the resulting model can be evaluated on a test dataset to ensure its effectiveness.
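As a rough sketch of how grid search and random search differ in practice, the validation_score function below is a stand-in for training a model and evaluating it on a validation set; real code might use a library such as scikit-learn instead.

import itertools
import random

def validation_score(lr, n_hidden):
    # Stand-in for "train a model with these hyperparameters and
    # return its validation score" (higher is better here).
    return -(lr - 0.01) ** 2 - (n_hidden - 64) ** 2 / 1e4

# Grid search: evaluate every combination of the predefined values.
grid = {"lr": [0.001, 0.01, 0.1], "n_hidden": [32, 64, 128]}
best = max(itertools.product(grid["lr"], grid["n_hidden"]),
           key=lambda combo: validation_score(*combo))
print("grid search best:", best)

# Random search: evaluate a fixed budget of randomly sampled combinations.
random.seed(0)
samples = [(10 ** random.uniform(-4, -1), random.randint(16, 256))
           for _ in range(9)]
best = max(samples, key=lambda combo: validation_score(*combo))
print("random search best:", best)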


Grid search and random search are two of the most commonly used techniques for hyperparameter tuning in deep learning, especially when there are only a few hyperparameters to optimize. Grid search is exhaustive, but it can be time-consuming and computationally expensive when the number of hyperparameters is high. Random search, on the other hand, can be more efficient in some cases: because it randomly samples hyperparameters from predefined ranges, it can reach a good set of hyperparameters with fewer evaluations.

Bayesian optimization is also gaining popularity in deep learning for hyperparameter tuning, especially when the number of hyperparameters is high. This approach can be more efficient than grid search and random search, since it uses probabilistic models to select the most promising hyperparameters for the next evaluation, based on the performance of previous evaluations.


There are different methods and techniques to ensure that the parameters and hyperparameters of a neural network take reasonable values. Here are some common approaches:

  1. Grid search and random search: These are two popular methods for hyperparameter tuning. Grid search exhaustively searches a pre-defined range of hyperparameters, while random search randomly samples hyperparameters from a given distribution. Both methods can help to find the best combination of hyperparameters that result in a good performance of the model.

  2. Cross-validation: Cross-validation is a technique to evaluate the performance of a model and to estimate its generalization error. It involves splitting the dataset into multiple subsets, training the model on some of them, and evaluating its performance on the held-out subset. Cross-validation can help to identify overfitting or underfitting problems and to adjust the hyperparameters accordingly.

  3. Regularization: Regularization is a technique used to prevent overfitting of the model. It adds a penalty term to the loss function, which encourages the model to have smaller weights and biases. Regularization techniques such as L1 and L2 regularization can help to keep the weights and biases at reasonable values (a short sketch follows below).

  4. Visual inspection: Sometimes it is helpful to visualize the weights and biases of the neural network to get a sense of whether they take reasonable values. For example, if the weights are extremely large or small, it may indicate a problem with the model architecture or hyperparameters.

  5. Gradual tuning: It is recommended to start with default hyperparameters and gradually adjust them based on the performance of the model. This approach can help to avoid setting extreme values for the hyperparameters and to find a reasonable range of values that result in a good performance of the model.

Overall, monitoring the parameters and hyperparameters of a neural network is an iterative process that involves tuning and adjusting the values based on the performance of the model.
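To make the regularization point (item 3 above) concrete, here is a minimal sketch of L2 regularization added to a mean squared error loss; lambda_ is the illustrative regularization strength.

import numpy as np

def mse_loss_l2(y_pred, y_true, weights, lambda_=0.01):
    # Plain mean squared error between predictions and targets.
    mse = np.mean((y_pred - y_true) ** 2)
    # L2 penalty: grows with the magnitude of the weights, nudging the
    # optimizer toward smaller (more reasonable) values.
    penalty = lambda_ * np.sum(weights ** 2)
    return mse + penalty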


Adjusting the hyperparameters is also an important part of training a neural network. The choice of hyperparameters can significantly affect the performance of the model, so it is important to choose them carefully. Common hyperparameters include the learning rate, the number of hidden layers, the number of neurons in each layer, the activation functions, and the regularization strength. These hyperparameters are usually set through trial and error or using techniques such as grid search or random search.


After training finishes, the weights in each layer of the neural network are fixed. During training, the weights are adjusted at each iteration to minimize the loss function, and at the end of training the weights that result in the lowest loss are kept as the final weights. These final weights are used for making predictions on new data.


Number of epochs

The number of epochs in a neural network is a hyperparameter that determines how many times the entire training dataset will be used to update the weights and biases of the neural network.

The number of epochs is often set based on the complexity of the problem, the size of the dataset, and the convergence rate of the network during training. In general, increasing the number of epochs can improve the performance of the network, but only up to a certain point, after which the performance may start to deteriorate due to overfitting.

A common practice is to monitor the loss function on a validation set during training, and stop training when the validation loss stops improving. This is known as early stopping, and it helps to prevent overfitting.
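A schematic training loop with early stopping might look like the sketch below; train_one_epoch and validation_loss are placeholders for real training and evaluation code, and patience is the number of epochs to wait without improvement.

def train_one_epoch(model):        # placeholder for one pass over the training data
    pass

def validation_loss(model):        # placeholder returning the loss on the validation set
    return 1.0

model, max_epochs, patience = None, 100, 5
best_val_loss, epochs_without_improvement = float("inf"), 0

for epoch in range(max_epochs):
    train_one_epoch(model)
    val_loss = validation_loss(model)
    if val_loss < best_val_loss:
        best_val_loss, epochs_without_improvement = val_loss, 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break                  # validation loss stopped improving: stop early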

The number of epochs can be set manually by the user based on their experience and understanding of the problem, or it can be determined automatically using techniques such as grid search, random search, or Bayesian optimization.

Derivation of Backpropagation


 

  1. Forward Pass: The input example is fed through the neural network one layer at a time, with each layer computing a weighted sum of its inputs, applying an activation function to this sum, and passing the result to the next layer of neurons. This process continues until the output layer of the network is reached, at which point the predicted output of the network is obtained.

  2. Cost Function: The cost function measures the difference between the predicted output of the network and the true output for the given input example. There are many different types of cost functions that can be used, but the most common is the mean squared error (MSE), which is simply the average of the squared differences between the predicted output and the true output.

  3. Backward Pass: The gradient of the cost function with respect to the weights of the network is calculated using the chain rule of calculus. This involves computing the derivative of the cost function with respect to the output of each neuron in the network, and then propagating this error backwards through the network using the chain rule. This results in a set of gradients that can be used to adjust the weights of the network.

  4. Update Weights: The weights of the network are adjusted in the direction of the negative gradient of the cost function, using an optimization algorithm such as stochastic gradient descent. This involves updating each weight by a small amount proportional to the gradient of the cost function with respect to that weight. This process is repeated for each input example in the training set, and the weights are adjusted after each iteration.

By repeating these four steps over many iterations, the backpropagation algorithm can learn to adjust the weights of the network in order to minimize the cost function and produce accurate predictions for new input examples.
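The four steps can be written out for a tiny one-hidden-layer network with a sigmoid activation and an MSE cost. This is only a sketch of the algorithm with NumPy, not an optimized implementation.

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))          # 8 examples, 2 features
y = rng.normal(size=(8, 1))          # target outputs
W1, W2 = rng.normal(size=(2, 4)), rng.normal(size=(4, 1))
lr = 0.1

for _ in range(1000):
    # 1. Forward pass.
    h = sigmoid(X @ W1)              # hidden activations
    y_pred = h @ W2                  # linear output layer
    # 2. Cost function: mean squared error.
    cost = np.mean((y_pred - y) ** 2)
    # 3. Backward pass (chain rule).
    d_out = 2 * (y_pred - y) / len(X)        # dCost/dy_pred
    grad_W2 = h.T @ d_out                    # dCost/dW2
    d_h = (d_out @ W2.T) * h * (1 - h)       # back through the sigmoid
    grad_W1 = X.T @ d_h                      # dCost/dW1
    # 4. Update weights along the negative gradient.
    W2 -= lr * grad_W2
    W1 -= lr * grad_W1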

Saturday, June 18, 2022

Proxy Server (Acts on behalf of another)


A proxy server sits between a computer network and the internet and retrieves data from the internet on behalf of the user. It hides the IP address of the connected device from the public network and fetches web content under its own IP address.

Benefits...

  • Privacy: It allows the user to surf the internet anonymously.
  • Speed: The proxy stores web pages in a centralized cache, so the next time a page is requested it can be served from the proxy server without going out to the internet.
  • Saves bandwidth: through caching.
  • Activity logging: It keeps a record of user activity and can block websites.

A plain proxy provides no encryption mechanism.
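For example, a client can be pointed at a proxy explicitly. The sketch below uses Python's requests library; the proxy address is a placeholder.

import requests

# Hypothetical proxy address: replace with a real host and port.
proxies = {
    "http": "http://proxy.example.com:3128",
    "https": "http://proxy.example.com:3128",
}

# The request goes to the proxy, which fetches the page on our behalf;
# the target site sees the proxy's IP address, not ours.
response = requests.get("http://example.com", proxies=proxies)
print(response.status_code)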


Monday, January 6, 2020

Data mining Vs Machine learning


In data mining, data is stored electronically and the search is automated, or at least augmented, by computer. Patterns in the data can be sought, identified, and validated automatically and used for prediction.

Machine learning is associated with a computer program that can modify its parameters during a training or learning phase in which it is provided with examples from a particular domain.
The program should be able to retain the domain knowledge and use it for future predictions.

Monday, December 9, 2019

Compiler Design - Definitions


Lexical analysis
The lexical analyzer reads the source program character by character and returns the tokens of the source program. Tokens represent patterns of characters that have the same meaning, such as identifiers, operators, and numbers.
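As a rough illustration (not a real compiler's lexer), lexical analysis can be sketched in Python with regular expressions; the token set below is made up for the example.

import re

# A minimal sketch of lexical analysis: character patterns to token names.
TOKEN_SPEC = [
    ("NUMBER",     r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR",   r"[+\-*/=]"),
    ("SKIP",       r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    for match in MASTER.finditer(source):
        if match.lastgroup != "SKIP":
            yield (match.lastgroup, match.group())

print(list(tokenize("x = 42 + y")))
# [('IDENTIFIER', 'x'), ('OPERATOR', '='), ('NUMBER', '42'),
#  ('OPERATOR', '+'), ('IDENTIFIER', 'y')]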

Syntax analysis
The syntax analyzer combines the tokens and creates a graphical representation of the syntactic structure (the syntax tree). In addition, it rejects invalid strings by reporting syntax errors.
In a syntax tree, the terminals are at the leaf nodes and the inner nodes are non-terminals.

Context-free grammar
A context-free grammar is a set of rules for putting strings together, and so it corresponds to a language.

Parse tree
A parse tree is an ordered rooted tree that graphically represents the syntactic structure of a string derived from a context-free grammar.

Top-Down approach
It starts from the start symbol (the root) and works down to the leaves using the production rules.

Bottom-Up approach
It starts from the leaves and proceeds upwards to the root, which is the start symbol.

Left most derivation
A leftmost derivation is obtained by applying a production to the leftmost variable (non-terminal) at each step.

Right most derivation
A rightmost derivation is obtained by applying a production to the rightmost variable (non-terminal) at each step.

Ambiguous grammar
A grammar is said to be ambiguous if some string generated by it has more than one parse tree (equivalently, more than one leftmost or rightmost derivation).

Unambiguous grammar
A grammar is said to be unambiguous if every string generated by it has exactly one parse tree (equivalently, exactly one leftmost and one rightmost derivation).

Left recursion
A production of a grammar is said to have left recursion if the leftmost symbol of its right-hand side is the same non-terminal as its left-hand side, e.g. A -> Aα.

Right recursion
A production of a grammar is said to have right recursion if the rightmost symbol of its right-hand side is the same non-terminal as its left-hand side, e.g. A -> αA.

Associativity
If an operand has operators on both sides, the side on which the operator takes the operand is the associativity of that operator. For example, since + is left-associative, in a + b + c the middle operand b is taken by the left +.

Precedence
Precedence determines which operator is applied first in an expression with more than one operator at different precedence levels. For example, 2 + 3 * 4 evaluates to 14 because * has higher precedence than +.

Left factorization
Left factoring is applied when it is not clear which of two alternative productions to use to expand a non-terminal, as in A -> ab | ac. The common prefix is factored out: A -> aA', A' -> b | c.

Thursday, December 5, 2019

BST - Binary Search Tree


For any node x,
    the keys in the left subtree of x are at most x.key, and
    the keys in the right subtree of x are at least x.key.



Here the first is a BST, but in the second, 11 should not be on the left side of 10.

Binary Search Tree Property
   Let x be a node in a BST,
   let y be a node in the left subtree of x, and
   let z be a node in the right subtree of x. Then
   y.key <= x.key and z.key >= x.key.
Binary search: average case -> O(lg n)
               worst case   -> O(n)
               best case    -> O(1)

Inorder tree walk
left --- root --- right

INORDER-TREE-WALK(x)
if (x ≠ NIL)
       INORDER-TREE-WALK(x.left)
       print x.key
       INORDER-TREE-WALK(x.right)


Preorder tree walk
root --- left --- right

PREORDER-TREE-WALK(x)
if (x ≠ NIL)
       print x.key
       PREORDER-TREE-WALK(x.left)
       PREORDER-TREE-WALK(x.right)


Postorder tree walk
left --- right --- root

POSTORDER-TREE-WALK(x)
if (x ≠ NIL)
       POSTORDER-TREE-WALK(x.left)
       POSTORDER-TREE-WALK(x.right)
       print x.key




INORDER TRAVERSAL
P, H, A, K, C, T, Q, M, R

PREORDER TRAVERSAL
C, H, P, K, A, M, T, Q, R

POSTORDER TRAVERSAL
P, A, K, H, Q, T, R, M, C

If x is root of an n-node subtree, then the call INORDER-TREE-WALK(x) takes time O(n)

Searching

x- current node , k- search key
TREE-SEARCH(x, k)
if ( x == NIL or k == x.key)
       return x
if( k < x.key)
       return TREE-SEARCH(x.left, k)
else
       return TREE-SEARCH(x.right, k)

ITERATIVE-TREE-SEARCH(x, k)
while (x ≠ NIL and k ≠ x.key)
         if (k < x.key)
                x = x.left
         else
                x = x.right
return x

The running time of tree-search is O(h),  where h is height of tree.

Minimum


TREE-MINIMUM(x)
while (x.left ≠ NIL)
      x = x.left
return x

Maximum
     
TREE-MAXIMUM(x)
while (x.right ≠ NIL)
      x = x.right
return x

The running time of minimum and maximum is O(h),  where h is height of tree.



Successor

TREE-SUCCESSOR(x)
if (x.right ≠ NIL)
      return TREE-MINIMUM(x.right)
y = x.p
while (y ≠ NIL and x == y.right)
       x = y
       y = y.p
return y

Predecessor
   
TREE-PREDECESSOR(x)
if (x.left ≠ NIL)
       return TREE-MAXIMUM(x.left)
y = x.p
while (y ≠ NIL and x == y.left)
        x = y
        y = y.p
return y

Simply, the inorder successor of a node x (when x has a right subtree) is the smallest-valued node of its right subtree, and the inorder predecessor (when x has a left subtree) is the largest-valued node of its left subtree.






INORDER SUCCESSOR OF C -> T

INORDER PREDECESSOR OF C -> K

Insertion

A new element is inserted by comparing its key with the node keys while walking down the tree until a free (NIL) position is found.
T - binary search tree, z - new node


TREE-INSERT(T, z)
y = NIL
x = T.root
while (x ≠ NIL)
       y = x
       if (z.key < x.key)
              x = x.left
       else
              x = x.right
z.p = y
if (y == NIL)
    T.root = z
else if (z.key < y.key)
     y.left = z
else
     y.right = z

Deletion

Deleting a node z from a binary search tree T has 3 cases:
If z has no children, simply remove it by replacing it with NIL.
If z has only one child, remove z by replacing it with that child.
If z has two children, find its successor y and replace z with it. The original left and right subtrees of z become y's new subtrees.



TRANSPLANT(T, u, v)
if (u.p == NIL)
     T.root = v
else if (u == u.p.left)
      u.p.left = v
else
      u.p.right = v
if (v ≠ NIL)
      v.p = u.p


TREE-DELETE(T, z)
if (z.left == NIL)
       TRANSPLANT(T, z, z.right)
else if (z.right == NIL)
       TRANSPLANT(T, z, z.left)
else
       y = TREE-MINIMUM(z.right)
       if (y.p ≠ z)
             TRANSPLANT(T, y, y.right)
             y.right = z.right
             y.right.p = y
       TRANSPLANT(T, z, y)
       y.left = z.left
       y.left.p = y
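
The pseudocode above translates almost line for line into Python. Here is a minimal runnable sketch covering insertion, iterative search, and the inorder walk (deletion is omitted for brevity):

class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.p = None

class BST:
    def __init__(self):
        self.root = None

    def insert(self, key):           # TREE-INSERT
        z, y, x = Node(key), None, self.root
        while x is not None:         # walk down to a NIL position
            y = x
            x = x.left if z.key < x.key else x.right
        z.p = y
        if y is None:
            self.root = z            # tree was empty
        elif z.key < y.key:
            y.left = z
        else:
            y.right = z

    def search(self, key):           # ITERATIVE-TREE-SEARCH
        x = self.root
        while x is not None and x.key != key:
            x = x.left if key < x.key else x.right
        return x

    def inorder(self, x):            # INORDER-TREE-WALK: yields sorted keys
        if x is not None:
            yield from self.inorder(x.left)
            yield x.key
            yield from self.inorder(x.right)

t = BST()
for k in [10, 4, 17, 1, 7]:
    t.insert(k)
print(list(t.inorder(t.root)))       # [1, 4, 7, 10, 17]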

Randomly built binary search tree

A randomly built binary search tree is one built by inserting n keys in random order into an initially empty tree; its expected height is O(lg n), so the operations above take O(lg n) expected time.

Wednesday, November 27, 2019

Machine - learning methods


Classification learning

Classification learning is supervised (it predicts a discrete class): a label with the actual outcome is provided for each training example.


Association learning

Detecting associations between features. It can be applied when no class is specified and any kind of structure is considered "interesting".

Clustering

Finding groups of items that are similar. It is unsupervised.


Numeric prediction

A variant of classification learning where the class is numeric.