Information Technology

Friday, May 5, 2023

Neural Network self references

In a neural network, each neuron has a set of parameters associated with it, which typically include a weight for each input connection and a bias term.

The weight parameter determines the strength of the connection between a neuron's input and its output. It is learned during training: backpropagation computes the gradient of the loss with respect to each weight, and an optimizer such as gradient descent uses that gradient to update it. The bias term represents the neuron's inherent "activation level" and is learned in the same way.

Together, the weight and bias parameters determine how the neuron responds to its inputs and how it contributes to the overall behavior of the neural network.
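
As a minimal sketch of the computation described above, the following Python function forms the weighted sum of the inputs, adds the bias, and applies an activation. The function name, the choice of sigmoid, and the numeric values are assumptions made only for illustration.

```python
import math

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias, passed through a sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical values chosen only for illustration.
inputs = [0.5, -1.0]
weights = [2.0, 1.0]
bias = 0.0

# Here z = 2.0 * 0.5 + 1.0 * (-1.0) + 0.0 = 0.0, and sigmoid(0.0) = 0.5.
print(neuron_output(inputs, weights, bias))
```

Changing the weights changes how strongly each input pulls on the output, while changing the bias shifts the output for every input at once.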

Why do we use linear equations in neural networks?

In the context of neural networks, the reason we use a linear equation for computing the weighted sum of the inputs is that this allows us to model linear relationships between the inputs and outputs. For example, in a simple regression problem where we are trying to predict a continuous output variable based on a set of input features, a linear model can often provide a good approximation of the underlying relationship between the inputs and outputs.

However, for more complex problems where the relationships between the inputs and outputs are nonlinear, we need to use more sophisticated models that can capture these nonlinearities. This is where more advanced neural network architectures, such as those with multiple layers or with nonlinear activation functions, come into play. These models allow us to capture more complex and nuanced relationships between the inputs and outputs, and can often achieve better performance than linear models.

Most neural networks used in practice are not purely linear

Most neural networks used in practice are not purely linear but rather employ some form of nonlinearity in their computations. This is because many real-world problems that we want to solve with neural networks have nonlinear relationships between the inputs and outputs.

For example, in a classification task where we want to predict the class label of an input data point, the relationship between the input features and the output class labels is often nonlinear. In order to capture these nonlinearities, we use nonlinear activation functions in the neurons of the neural network.

Common activation functions used in neural networks include the sigmoid function, the hyperbolic tangent (tanh) function, and the rectified linear unit (ReLU) function, among others. These activation functions introduce nonlinearity into the neural network computations and allow us to model complex and nonlinear relationships between the inputs and outputs.
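
The three activation functions named above can be written in a few lines of Python. This is only a sketch using the standard `math` module; the sample input values are arbitrary.

```python
import math

def sigmoid(z):
    # Squashes any real number into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    # Squashes any real number into the open interval (-1, 1).
    return math.tanh(z)

def relu(z):
    # Passes positive values through unchanged and clips negatives to zero.
    return max(0.0, z)

for z in (-2.0, 0.0, 2.0):
    print(z, sigmoid(z), tanh(z), relu(z))
```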

Why do we use activation functions in NN?

Nonlinearity: Activation functions introduce nonlinearity into the computations of neural networks, allowing them to model complex nonlinear relationships between the inputs and outputs. Without activation functions, neural networks would be limited to linear transformations of the inputs, which would be unable to model many real-world problems.
Mapping to a range: Activation functions are often designed to map the output of a neuron to a specific range or set of values. For example, the sigmoid function maps its inputs to a range between 0 and 1, which is useful in binary classification problems where we want to predict the probability of a data point belonging to a particular class.
Smoothness: Activation functions can be designed to be smooth and differentiable, which is important for efficient training of neural networks using techniques like gradient descent. The derivatives of the activation functions are used in the backpropagation algorithm to compute the gradients of the loss function with respect to the weights and biases of the network.
Sparsity: Certain activation functions, such as the ReLU function, can induce sparsity in the activations of the neurons, which can improve the efficiency and interpretability of the neural network.

Overall, activation functions play a critical role in the computations of neural networks and are essential for enabling them to model complex nonlinear relationships between the inputs and outputs.
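
To illustrate the nonlinearity point, the sketch below stacks two one-dimensional linear layers with and without a ReLU between them. The layer weights are hypothetical. Without the activation, the two layers collapse into a single linear function; with it, they do not.

```python
def linear(w, b):
    """Return a one-dimensional linear layer x -> w * x + b."""
    return lambda x: w * x + b

def relu(z):
    return max(0.0, z)

layer1 = linear(2.0, 1.0)    # hypothetical weights
layer2 = linear(-3.0, 0.5)

def stacked_linear(x):
    # Two linear layers with no activation in between.
    return layer2(layer1(x))

def collapsed(x):
    # The same function as one layer: w = -3.0 * 2.0, b = -3.0 * 1.0 + 0.5.
    return -6.0 * x - 2.5

def stacked_relu(x):
    # The same two layers with a ReLU between them.
    return layer2(relu(layer1(x)))

for x in (-2.0, -0.5, 0.0, 1.0):
    print(x, stacked_linear(x), collapsed(x), stacked_relu(x))
```

For every input, `stacked_linear` and `collapsed` agree, whereas `stacked_relu` departs from them wherever the first layer's output is negative.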

Nonlinear activation functions are used in the hidden layers of a neural network because a composition of linear layers is itself linear. However many linear layers are stacked, the result is a single linear transformation, which is often insufficient to model complex relationships between the inputs and outputs.

For example, consider a neural network that is being used to model a simple linear regression problem, where we want to predict a continuous output variable based on a set of input features. In this case, the underlying relationship between the inputs and the output is linear, and a linear activation function (i.e., no activation function at all) is the usual choice for the output neuron, because the prediction must be free to take any real value. A sigmoid on the output would confine the predictions to the range between 0 and 1, and a ReLU would rule out negative predictions, so these are normally reserved for outputs that really have those ranges. If the relationship is truly linear, nonlinear hidden units are not needed either, and overfitting is better controlled through regularization than through the choice of activation function.

In the hidden layers of a neural network, nonlinear activation functions are what enable the network to model complex relationships between the inputs and outputs. Without these nonlinear activation functions, the neural network would be limited to linear transformations of the inputs, which would be unable to capture many real-world phenomena.

Why don't we commonly use a nonlinear function as the base of a neural network?

We want the overall function computed by the network to be differentiable so that we can use gradient-based optimization, with backpropagation supplying the gradients, to train the network. An arbitrary nonlinear function can be non-smooth or expensive to differentiate, which makes it difficult to use as the basic building block for training.

Instead, we typically use a linear function as the base for a neural network and introduce nonlinearity through the use of activation functions in the neurons. The linear function allows us to compute a weighted sum of the inputs, which is then transformed by the activation function to introduce nonlinearity into the computation.

The use of a linear function as the base of a neural network also has the advantage of making the overall function computationally efficient and easy to optimize. Linear functions are simple and easy to compute, which makes training a neural network with a linear base more efficient than using a highly complex non-linear function as the base.

Overall, the combination of a linear function as the base and non-linear activation functions in the neurons allows us to create a powerful and flexible function approximator that can model a wide range of complex relationships between the inputs and outputs.

Several steps we can take to increase the chances of getting good performance from a neural network:

Use a large and diverse dataset for training: The more data you have, the better your neural network will be able to generalize to new examples. It's important to use a dataset that is representative of the problem you're trying to solve, and that includes a wide range of examples to cover the variability of real-world scenarios.
Choose an appropriate neural network architecture: Different neural network architectures are better suited for different types of problems. For example, convolutional neural networks are often used for image recognition tasks, while recurrent neural networks are often used for sequence modeling tasks. It's important to choose an architecture that is appropriate for the problem you're trying to solve.
Tune the hyperparameters: Hyperparameters are the settings that control the behavior of the neural network, such as the learning rate, batch size, and regularization strength. It's important to tune these hyperparameters to ensure that the neural network is performing optimally.
Regularize the model: Regularization techniques such as L1 and L2 regularization can be used to prevent overfitting, which occurs when the neural network becomes too complex and starts to memorize the training data instead of learning to generalize to new examples.
Monitor the performance during training: It's important to monitor the performance of the neural network during training to detect any issues early on. This can be done by measuring the loss on a validation set, or by monitoring other metrics such as accuracy or F1 score.
Test the neural network on a held-out test set: Once the neural network has been trained, it's important to test it on a held-out test set to evaluate its performance on new examples that were not seen during training. This will give you an estimate of how well the neural network is likely to perform in the real world.

By following these steps, you can increase the chances of getting good performance from a neural network, but it's important to keep in mind that there is always a trade-off between model complexity and generalization, and there may be limits to how well a neural network can perform on a given task.

Bias in Neural Network

In a neural network, bias is a term that represents the ability of a neuron to activate even when there is no input.

Mathematically, bias is a constant term that is added to the weighted sum of inputs of a neuron before applying the activation function. The bias term allows the neuron to have some activation even when all the input values are zero.

The bias term is an important component of a neural network, as it allows the network to learn more complex and nuanced patterns in the data. Without the bias term, every neuron's weighted sum would be zero whenever its inputs are zero, so the neuron could not shift its activation threshold away from the origin.

The bias term is a learnable parameter, which means that its value is updated during training along with the weights of the network. The network learns the optimal value of the bias term which allows it to make accurate predictions on the training data.

In summary, bias is a term in a neural network that allows neurons to activate even when there is no input, and it is a learnable parameter that is updated during training.

In the training process of a neural network, the bias values are often initialized to zero or to a small constant, while the weights are initialized randomly; random bias initialization is also possible. The neural network then learns the optimal values of the bias terms during training: backpropagation calculates the gradient of the loss function with respect to the bias terms, and this gradient is used to update their values.

The choice of how to initialize the bias terms can have an impact on the performance of the neural network, so it is important to choose an appropriate initialization method based on the specific problem being solved.


Parameter vs Hyperparameter

2:53 PM mayuravaani

A parameter is a variable that is learned during the training of the neural network, such as the weights and biases of each neuron. These values are updated during training to minimize the loss function.

On the other hand, a hyperparameter is a setting that is chosen before training begins and is not learned during training, such as the learning rate, number of hidden layers, or choice of the activation function. These values can have a significant impact on the performance of the neural network, and they are typically set through trial and error or other optimization techniques.

Setting the hyperparameters

Grid search: This is a brute-force approach where you try out every combination of hyperparameter values from a predefined grid. While this can be time-consuming, it ensures that every combination on the grid is tested; the result is the best combination on that grid, which is not necessarily the best possible setting overall.
Random search: This approach randomly samples hyperparameters from predefined ranges, and then trains the model with each combination of hyperparameters. While it may not guarantee to find the best hyperparameters, it can be more efficient than grid search.
Bayesian optimization: This method uses probabilistic models to select the most promising hyperparameters for the next evaluation, based on the performance of previous evaluations. This method can be more efficient than grid search and random search, especially for high-dimensional hyperparameters.
Expert knowledge: Sometimes, domain experts may have knowledge about the problem and the characteristics of the data that can help in selecting appropriate hyperparameters. This approach can be especially useful in situations where computational resources are limited.
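
The first two approaches in the list above can be sketched as follows. The `validation_loss` function is a hypothetical stand-in for training a model and measuring its validation loss, and the candidate values and ranges are assumptions chosen only for illustration.

```python
import itertools
import random

def validation_loss(learning_rate, batch_size):
    # Hypothetical stand-in for "train a model and measure validation loss".
    return (learning_rate - 0.01) ** 2 + ((batch_size - 64) / 1000) ** 2

def grid_search(learning_rates, batch_sizes):
    # Evaluate every combination on the grid and keep the best one.
    candidates = itertools.product(learning_rates, batch_sizes)
    return min(candidates, key=lambda c: validation_loss(*c))

def random_search(n_trials, seed=0):
    # Sample each hyperparameter independently from its range.
    rng = random.Random(seed)
    candidates = [
        (10 ** rng.uniform(-4, -1), rng.choice([16, 32, 64, 128]))
        for _ in range(n_trials)
    ]
    return min(candidates, key=lambda c: validation_loss(*c))

best_grid = grid_search([0.001, 0.01, 0.1], [16, 32, 64, 128])
best_random = random_search(n_trials=12)
print("grid:", best_grid)
print("random:", best_random)
```

Both searches here evaluate twelve configurations. Grid search can only return a point that is on the grid, whereas random search tries twelve different learning rates instead of three.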

Once a good set of hyperparameters has been identified, it can be used to train the neural network, and the resulting model can be evaluated on a test dataset to ensure its effectiveness.

Grid search and random search are two of the most commonly used techniques for hyperparameter tuning in deep learning, especially when there are a limited number of hyperparameters to optimize. Grid search can be exhaustive, but it can also be time-consuming and computationally expensive, especially when the number of hyperparameters is high. On the other hand, random search can be more efficient in some cases, since it randomly samples hyperparameters from predefined ranges, which can lead to faster convergence to the optimal set of hyperparameters.

Bayesian optimization is also gaining popularity in deep learning for hyperparameter tuning, especially when the number of hyperparameters is high. This approach can be more efficient than grid search and random search, since it uses probabilistic models to select the most promising hyperparameters for the next evaluation, based on the performance of previous evaluations.

There are different methods and techniques to ensure that the parameters and hyperparameters of a neural network are in reasonable values. Here are some common approaches:

Grid search and random search: These are two popular methods for hyperparameter tuning. Grid search exhaustively searches a pre-defined range of hyperparameters, while random search randomly samples hyperparameters from a given distribution. Both methods can help to find the best combination of hyperparameters that result in a good performance of the model.
Cross-validation: Cross-validation is a technique to evaluate the performance of a model and to estimate its generalization error. In k-fold cross-validation, the dataset is split into k subsets (folds); the model is trained on k-1 of them and evaluated on the remaining one, and this is repeated so that each fold serves once as the evaluation set. Cross-validation can help to identify overfitting or underfitting problems and to adjust the hyperparameters accordingly.
Regularization: Regularization is a technique used to prevent overfitting of the model. It adds a penalty term to the loss function, which encourages the model to have smaller weights and biases. Regularization techniques such as L1 and L2 regularization can help to keep the weights and biases in reasonable values.
Visual inspection: Sometimes, it is helpful to visualize the weights and biases of the neural network to get a sense of whether they are in reasonable values or not. For example, if the weights are extremely large or small, it may indicate a problem with the model architecture or hyperparameters.
Gradual tuning: It is recommended to start with default hyperparameters and gradually adjust them based on the performance of the model. This approach can help to avoid setting extreme values for the hyperparameters and to find a reasonable range of values that result in a good performance of the model.
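
As an illustration of the cross-validation item in the list above, here is a minimal sketch of how the k-fold index splits could be generated. The function name is hypothetical, and real projects would normally shuffle the data first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first few folds.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

for train, validation in k_fold_indices(n_samples=6, k=3):
    print("train:", train, "validation:", validation)
```

Each sample appears in exactly one validation fold, so every sample is used for evaluation once and for training k-1 times.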

Overall, monitoring the parameters and hyperparameters of a neural network is an iterative process that involves tuning and adjusting the values based on the performance of the model.

Adjusting the hyperparameters is also an important part of training a neural network. The choice of hyperparameters can significantly affect the performance of the model, and so it's important to carefully choose them. Some common hyperparameters include the learning rate, number of hidden layers, number of neurons in each layer, activation functions, and regularization strength. These hyperparameters are usually set based on trial and error or using techniques such as grid search or random search.

After training finishes, the weights in each layer of the neural network will be defined. During the training process, the weights are adjusted in each iteration to reduce the loss function. The final weights are those left after the last update, or, if checkpoints are saved during training, the ones that gave the lowest validation loss. These final weights will be used for making predictions on new data.

Number of epochs

The number of epochs in a neural network is a hyperparameter that determines how many times the entire training dataset will be used to update the weights and biases of the neural network.

The number of epochs is often set based on the complexity of the problem, the size of the dataset, and the convergence rate of the network during training. In general, increasing the number of epochs can improve the performance of the network, but only up to a certain point, after which the performance may start to deteriorate due to overfitting.

A common practice is to monitor the loss function on a validation set during training, and stop training when the validation loss stops improving. This is known as early stopping, and it helps to prevent overfitting.
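
Early stopping as described above can be sketched with a simple patience counter. The function name, the `patience` parameter, and the list of validation losses are assumptions made for illustration.

```python
def early_stopping_epoch(validation_losses, patience=2):
    """Return the index of the epoch at which training would stop."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    # The loss kept improving, so training runs to the last epoch.
    return len(validation_losses) - 1

# Hypothetical validation losses recorded after each epoch.
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.53, 0.51]
print(early_stopping_epoch(losses))  # stops at epoch index 5
```

In this sketch the best loss is reached at epoch index 3, and training stops two epochs later because no further improvement was seen.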

The number of epochs can be set manually by the user based on their experience and understanding of the problem, or it can be determined automatically using techniques such as grid search, random search, or Bayesian optimization.


Derivation of Backpropagation

2:27 PM mayuravaani

Forward Pass: The input example is fed through the neural network one layer at a time, with each layer computing a weighted sum of its inputs, applying an activation function to this sum, and passing the result to the next layer of neurons. This process continues until the output layer of the network is reached, at which point the predicted output of the network is obtained.
Cost Function: The cost function measures the difference between the predicted output of the network and the true output for the given input example. There are many different types of cost functions that can be used; a common choice for regression is the mean squared error (MSE), which is simply the average of the squared differences between the predicted output and the true output.
Backward Pass: The gradient of the cost function with respect to the weights of the network is calculated using the chain rule of calculus. This involves computing the derivative of the cost function with respect to the output of each neuron in the network, and then propagating this error backwards through the network using the chain rule. This results in a set of gradients that can be used to adjust the weights of the network.
Update Weights: The weights of the network are adjusted in the direction of the negative gradient of the cost function, using an optimization algorithm such as stochastic gradient descent. This involves updating each weight by a small amount proportional to the gradient of the cost function with respect to that weight. This process is repeated for each input example in the training set, and the weights are adjusted after each iteration.

By repeating these four steps over many iterations, the backpropagation algorithm can learn to adjust the weights of the network in order to minimize the cost function and produce accurate predictions for new input examples.
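
The four steps can be seen in miniature on a single linear neuron. The sketch below is not a general backpropagation implementation: it assumes one input, one weight, one bias, and the per-example loss 0.5 * (prediction - target)^2, and the training data is made up.

```python
def train_linear_neuron(samples, learning_rate=0.1, epochs=200):
    """Fit y = w * x + b by per-example gradient descent on squared error."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y_true in samples:
            y_pred = w * x + b              # forward pass
            error = y_pred - y_true         # dL/dy_pred for L = 0.5 * error**2
            grad_w = error * x              # chain rule: dL/dw = dL/dy_pred * x
            grad_b = error                  # chain rule: dL/db = dL/dy_pred * 1
            w -= learning_rate * grad_w     # move against the gradient
            b -= learning_rate * grad_b
    return w, b

# Hypothetical data generated from y = 2x + 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train_linear_neuron(samples)
print(w, b)  # both should end up close to 2 and 1
```

In a multi-layer network the same chain rule is applied layer by layer, with each layer passing the gradient of the loss with respect to its inputs back to the layer before it.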


Saturday, June 18, 2022

Proxy Server (Act on behalf of another)

9:27 PM mayuravaani

A proxy server sits between a computer network and the internet and retrieves data from the internet on behalf of the user. It hides the IP address of the connected device from the public network and retrieves the web content and data under its own IP address.

Benefits

Privacy: It allows the user to surf the internet anonymously.
Speed: The proxy stores web pages in a centralized cache, so the next time a user requests the same page it can be served from the proxy server without going out to the internet.
Saves bandwidth: Through caching.
Activity logging: Keeps a record of users' activity and can block websites.

A plain proxy does not by itself provide any encryption mechanism.


Monday, January 6, 2020

Data mining Vs Machine learning

6:40 PM mayuravaani

In data mining, data is stored electronically and the search is automated, or at least augmented, by computer. Patterns in the data can be sought automatically, identified, validated, and used for prediction.

Machine learning is associated with a computer program which can modify its parameters during a training or learning phase in which it is provided with examples from a particular domain.

This program should be able to retain the domain knowledge and use it for future predictions.


Monday, December 9, 2019

Compiler Design - Definitions

5:16 PM mayuravaani

Lexical analysis
The lexical analyzer reads the source program character by character and returns the tokens of the source program. A token represents a pattern of characters that share the same meaning, such as identifiers, operators, and numbers.
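
As a rough sketch of what a lexical analyzer does, the Python snippet below turns a string into (token type, text) pairs using the standard `re` module. The token names and patterns are assumptions made for a toy expression language, not part of any particular compiler.

```python
import re

# Hypothetical token classes for a toy language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR", r"[+\-*/=]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    tokens = []
    position = 0
    while position < len(source):
        match = TOKEN_RE.match(source, position)
        if match is None:
            raise SyntaxError(f"unexpected character {source[position]!r} at {position}")
        if match.lastgroup != "SKIP":  # whitespace separates tokens but is not one
            tokens.append((match.lastgroup, match.group()))
        position = match.end()
    return tokens

print(tokenize("total = price * 3"))
```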

Syntax analysis
The syntax analyzer combines the tokens and creates a tree representation of the syntactic structure (the syntax tree). In addition, it rejects invalid strings by reporting syntax errors.
In the syntax tree, the terminals are at the leaf nodes and the inner nodes are non-terminals.

Context-free grammar
A context-free grammar is a set of production rules for putting strings together, and so it defines a language.

Parse tree
A parse tree is an ordered, rooted tree that graphically represents the syntactic structure of a string derived from a context-free grammar.

Top-Down approach
It starts from the start symbol(root) and goes down to leaves using production rules.

Bottom-Up approach
It starts from the leaves and proceeds upwards to the root, which is the start symbol.

Left most derivation
A left most derivation is obtained by applying the production to the left most variable or left most non terminal in each step.

Right most derivation
A right most derivation is obtained by applying the production to the right most variable or right most non terminal in each step.

Ambiguous grammar
A grammar is said to be ambiguous if some string generated by it has more than one parse tree (also called a derivation tree or syntax tree), or equivalently more than one left most derivation or more than one right most derivation.

Unambiguous grammar
A grammar is said to be unambiguous if every string generated by it has exactly one parse tree, and therefore exactly one left most derivation and exactly one right most derivation.

Left recursion
A production of a grammar is said to have left recursion if the left most variable or non-terminal of the right hand side is same as the variable or non-terminal of the left hand side.

Right recursion
A production of a grammar is said to have right recursion if the right most variable or non-terminal of the right hand side is same as the non-terminal or variable of left hand side.

Associativity
If an operand has operators of the same precedence on both sides, then the side on which the operator takes the operand is the associativity of those operators.

Precedence
Precedence determines which operator is applied first in an expression containing more than one operator with different precedence levels.

Left factorization
Left factoring is needed when it is not clear which of two alternative productions to use to expand a non-terminal because both begin with the same symbol, as in A -> ab/ac. Factoring out the common prefix gives A -> aA' and A' -> b/c, so the choice is deferred until after a has been read.


Thursday, December 5, 2019

BST - Binary Search Tree

12:31 PM mayuravaani

For any node x,
the keys in left sub tree of x are at most x.key and
the keys in right sub tree of x are at least x.key.

Of the two example trees, the first is a BST, but the second is not, because 11 appears in the left subtree of 10.

Binary Search Tree Property
Let x be a node in BST.
Let y be a node in left sub tree of x and
Let z be a node in the right sub tree of x, then
y.key <= x.key and z.key >= x.key
Search in a binary search tree: average case -> O(lg n)
worst case -> O(n)
best case -> O(1)

Inorder tree walk
left --- root --- right

INORDER-TREE-WALK(x)
    if (x ≠ NIL)
        INORDER-TREE-WALK(x.left)
        print x.key
        INORDER-TREE-WALK(x.right)

Preorder tree walk
root --- left --- right

PREORDER-TREE-WALK(x)
    if (x ≠ NIL)
        print x.key
        PREORDER-TREE-WALK(x.left)
        PREORDER-TREE-WALK(x.right)

Postorder tree walk
left --- right --- root

POSTORDER-TREE-WALK(x)
    if (x ≠ NIL)
        POSTORDER-TREE-WALK(x.left)
        POSTORDER-TREE-WALK(x.right)
        print x.key

INORDER TRAVERSAL
P, H, A, K, C, T, Q, M, R

PREORDER TRAVERSAL
C, H, P, K, A, M, T, Q, R

POSTORDER TRAVERSAL
P,A, K, H, Q, T, R, M, C
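
The three walks can be checked in Python. The tree shape used below is not given in the text; it is an assumption reconstructed from the inorder and preorder sequences listed above, and the class and function names are hypothetical.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def inorder(node):
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)

def preorder(node):
    if node is None:
        return []
    return [node.key] + preorder(node.left) + preorder(node.right)

def postorder(node):
    if node is None:
        return []
    return postorder(node.left) + postorder(node.right) + [node.key]

# Tree shape inferred from the listed traversals (an assumption).
root = Node("C",
            Node("H", Node("P"), Node("K", Node("A"))),
            Node("M", Node("T", None, Node("Q")), Node("R")))

print(", ".join(inorder(root)))    # P, H, A, K, C, T, Q, M, R
print(", ".join(preorder(root)))   # C, H, P, K, A, M, T, Q, R
print(", ".join(postorder(root)))  # P, A, K, H, Q, T, R, M, C
```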

If x is root of an n-node subtree, then the call INORDER-TREE-WALK(x) takes time O(n)

Searching

x- current node , k- search key

TREE-SEARCH(x, k)
    if (x == NIL or k == x.key)
        return x
    if (k < x.key)
        return TREE-SEARCH(x.left, k)
    else
        return TREE-SEARCH(x.right, k)

ITERATIVE-TREE-SEARCH(x, k)
    while (x ≠ NIL and k ≠ x.key)
        if (k < x.key)
            x = x.left
        else
            x = x.right
    return x

The running time of tree-search is O(h), where h is height of tree.
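
A Python version of the iterative search might look like the sketch below. The `Node` class and the sample keys are assumptions for illustration; `None` plays the role of NIL.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def tree_search(node, key):
    """Return the node holding key, or None if the key is not in the tree."""
    # Stop when we fall off the tree or find the key.
    while node is not None and key != node.key:
        node = node.left if key < node.key else node.right
    return node

# A small hand-built BST with hypothetical keys.
root = Node(10, Node(5, Node(2), Node(7)), Node(15, None, Node(20)))

found = tree_search(root, 7)
missing = tree_search(root, 11)
print(found.key if found else None, missing)
```

Each loop iteration moves one level down, so the number of steps is bounded by the height of the tree.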

Minimum

TREE-MINIMUM(x)
    while (x.left ≠ NIL)
        x = x.left
    return x

Maximum

TREE-MAXIMUM(x)
    while (x.right ≠ NIL)
        x = x.right
    return x

The running time of minimum and maximum is O(h), where h is height of tree.

Successor

TREE-SUCCESSOR(x)
    if (x.right ≠ NIL)
        return TREE-MINIMUM(x.right)
    y = x.p
    while (y ≠ NIL and x == y.right)
        x = y
        y = y.p
    return y

Predecessor

TREE-PREDECESSOR(x)
    if (x.left ≠ NIL)
        return TREE-MAXIMUM(x.left)
    y = x.p
    while (y ≠ NIL and x == y.left)
        x = y
        y = y.p
    return y

Simply put, when those subtrees exist, the inorder successor of the node x is the smallest node of its right subtree and the inorder predecessor of the node x is the largest node of its left subtree. Otherwise, the successor or predecessor is found by walking up through the parent pointers.

INORDER SUCCESSOR OF C -> T

INORDER PREDECESSOR OF C -> K

Insertion

The new element is inserted by walking down from the root, going left or right according to key comparisons, until an empty position is found.
T- binary tree, z-new node

TREE-INSERT(T, z)
    y = NIL
    x = T.root
    while (x ≠ NIL)
        y = x
        if (z.key < x.key)
            x = x.left
        else
            x = x.right
    z.p = y
    if (y == NIL)
        T.root = z
    else if (z.key < y.key)
        y.left = z
    else
        y.right = z
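
The insertion procedure translates to Python fairly directly. In this sketch the class names and sample keys are assumptions, `parent` corresponds to the `p` attribute, and `None` stands for NIL.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.parent = None

class BinarySearchTree:
    def __init__(self):
        self.root = None

    def insert(self, key):
        new_node = Node(key)
        parent = None
        current = self.root
        # Walk down until we fall off the tree, remembering the last node seen.
        while current is not None:
            parent = current
            current = current.left if key < current.key else current.right
        new_node.parent = parent
        if parent is None:
            self.root = new_node          # the tree was empty
        elif key < parent.key:
            parent.left = new_node
        else:
            parent.right = new_node
        return new_node

def inorder_keys(node):
    if node is None:
        return []
    return inorder_keys(node.left) + [node.key] + inorder_keys(node.right)

tree = BinarySearchTree()
for key in [10, 5, 15, 2, 7, 20]:
    tree.insert(key)
print(inorder_keys(tree.root))  # [2, 5, 7, 10, 15, 20]
```

An inorder walk of the result returns the keys in sorted order, which is a quick check that the binary search tree property holds.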

Deletion

Deleting a node z from a binary search tree T has 3 cases:
If z has no children, then simply remove it by replacing it with NIL.
If z has only one child, then remove z by replacing z with its child.
If z has two children, then find the successor (y) of z and replace z with y. The original right and left subtrees of z become y's new subtrees.

TRANSPLANT replaces the subtree rooted at u with the subtree rooted at v.

TRANSPLANT(T, u, v)
    if (u.p == NIL)
        T.root = v
    else if (u == u.p.left)
        u.p.left = v
    else
        u.p.right = v
    if (v ≠ NIL)
        v.p = u.p

TREE-DELETE(T, z)
    if (z.left == NIL)
        TRANSPLANT(T, z, z.right)
    else if (z.right == NIL)
        TRANSPLANT(T, z, z.left)
    else
        y = TREE-MINIMUM(z.right)
        if (y.p ≠ z)
            TRANSPLANT(T, y, y.right)
            y.right = z.right
            y.right.p = y
        TRANSPLANT(T, z, y)
        y.left = z.left
        y.left.p = y
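
Deletion with a transplant helper can be sketched in Python as follows. The class and method names are hypothetical, `parent` corresponds to `p`, and the example keys are chosen so that the deleted node has two children and its successor is not its direct child.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

class BinarySearchTree:
    def __init__(self, keys=()):
        self.root = None
        for key in keys:
            self.insert(key)

    def insert(self, key):
        parent, current = None, self.root
        while current is not None:
            parent = current
            current = current.left if key < current.key else current.right
        node = Node(key)
        node.parent = parent
        if parent is None:
            self.root = node
        elif key < parent.key:
            parent.left = node
        else:
            parent.right = node

    def find(self, key):
        node = self.root
        while node is not None and key != node.key:
            node = node.left if key < node.key else node.right
        return node

    def _transplant(self, old, new):
        # Replace the subtree rooted at old with the subtree rooted at new.
        if old.parent is None:
            self.root = new
        elif old is old.parent.left:
            old.parent.left = new
        else:
            old.parent.right = new
        if new is not None:
            new.parent = old.parent

    def delete(self, node):
        if node.left is None:
            self._transplant(node, node.right)
        elif node.right is None:
            self._transplant(node, node.left)
        else:
            successor = node.right          # minimum of the right subtree
            while successor.left is not None:
                successor = successor.left
            if successor.parent is not node:
                self._transplant(successor, successor.right)
                successor.right = node.right
                successor.right.parent = successor
            self._transplant(node, successor)
            successor.left = node.left
            successor.left.parent = successor

    def keys(self):
        def walk(node):
            return [] if node is None else walk(node.left) + [node.key] + walk(node.right)
        return walk(self.root)

tree = BinarySearchTree([10, 5, 15, 2, 7, 12, 20, 11])
tree.delete(tree.find(10))  # two children; the successor 11 is not a direct child
print(tree.root.key, tree.keys())
```

After the deletion the successor 11 becomes the new root, and the inorder key sequence stays sorted.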

Randomly build binary search tree

