AI Wiki.

A beginner’s guide to important topics in AI.


It is the machine learning task of identifying and segmenting different point-clouds and points of interest in a 3D image. It requires 3D Deep Learning approaches like 3D-CNNs and utilization of techniques like voxelization of point cloud data. It is also more computationally expensive than 2D semantic segmentation.


It is a specialized application of 3D semantic segmentation in which a 3D shape is broken down into its component parts. It requires detailed structural information and high voxel resolution - making it even more resource-intensive than 3D semantic segmentation. It requires novel CNN architectures to operate efficiently.


It is a subset of pose estimation tasks that require estimating the position of an object in a 3D space in addition to its 3D orientation. The technology works by first establishing the 3D object position in a 2D projection, then using a Perspective-n-Point algorithm to compute its pose parameters. It has applications in advanced robotics.


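Accuracy is the degree of closeness of the predicted value to the actual value. For classification it is the fraction of predictions that are correct: Accuracy = (TP + TN) / (TP + FP + FN + TN), where TP, TN, FP and FN are the counts of true positives, true negatives, false positives and false negatives.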
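
As a minimal sketch, the accuracy formula can be computed directly from the four confusion counts. The function name `accuracy` and the example counts are made up for illustration.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that were correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one prediction is required")
    return (tp + tn) / total


# Hypothetical counts: 85 of 100 predictions are correct.
print(accuracy(tp=40, tn=45, fp=5, fn=10))
```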


Human activity recognition is a time series classification problem: streams of sensor data are split into windows, and each window is classified into a broad activity category. Deep learning solves these tasks with architectures like convolutional neural networks (CNN) and long short-term memory networks (LSTM).


Activation functions are scalar-to-scalar functions that yield a neuron’s activation. They act like a gate that determines whether features from the previous layer should pass through the node. Applied to a combination of weights and biases on the input data, activation functions let the network form complex decision boundaries.
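
To make the idea concrete, here is a small sketch of two common activation functions applied to a single neuron. The choice of sigmoid and ReLU, and the helper name `neuron_output`, are assumptions for illustration; the entry does not name specific functions.

```python
import math


def sigmoid(z):
    """Squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    """Passes positive values through and blocks negative ones."""
    return max(0.0, z)


def neuron_output(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


print(neuron_output([1.0, 2.0], [0.5, -0.25], 0.0, sigmoid))
print(neuron_output([1.0, 2.0], [0.5, -0.25], 0.0, relu))
```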


Adaptive Boosting (AdaBoost) is a machine learning meta-algorithm used in conjunction with other types of learning algorithms to improve their performance. It can be used for binary as well as multi-class classification without reducing the latter to binary problems. AdaBoost is also used for feature selection and dimensionality reduction, thereby improving execution time.


AdaDelta is an extension of AdaGrad in which it only keeps the most recent history rather than accumulating all the gradients for optimization.


AdaGrad is an algorithm for gradient-based optimization. It is an adaptive learning rate method that uses the history of (sub-)gradients to dynamically control the learning rate of an optimization algorithm. It never increases the learning rate beyond a base learning rate.
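
The following is a one-dimensional sketch of the commonly used AdaGrad update, in which the step is divided by the square root of the accumulated squared gradients. The function name, the default hyper-parameters and the toy objective are assumptions chosen for illustration.

```python
import math


def adagrad_minimize(grad, x0, base_lr=0.5, steps=500, eps=1e-8):
    """Minimize a 1-D function given its gradient, using AdaGrad-style steps."""
    x = x0
    accumulated = 0.0  # running sum of squared gradients
    for _ in range(steps):
        g = grad(x)
        accumulated += g * g
        # The effective step shrinks as gradient history accumulates.
        x -= base_lr * g / (math.sqrt(accumulated) + eps)
    return x


# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3); the minimum is at x = 3.
x_min = adagrad_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_min)
```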


An algorithm for first-order gradient-based optimization in which the learning rates are derived from adaptive estimates of lower-order moments of the gradients.


The Adjusted Rand Index is preferred when the partitions being compared have different numbers of clusters. It has a maximum value of 1 when the clusterings are identical, and its expected value under random labeling is 0, independent of the number of clusters.


It is a subcategory of face recognition tasks that require a person to be identified regardless of their age. This is important for applications like passport face matching, missing individual identification, etc. It can be done by learning ageing subspace independently of identity subspace which simplifies the recognition system.


It is a subset of video activity recognition that identifies suspicious activities. As it is impossible to cover all "normal" activities and annotate them, this task requires a deep multiple instance ranking framework that can be trained using weakly labelled videos. It employs few-shot techniques and hence care must be taken not to overfit to training data.


Area under the curve (AUC) can be interpreted as the probability that a classifier will rank a randomly chosen positive example higher than a randomly chosen negative example. A model whose predictions are 100% correct has an AUC of 1.0, and a model whose predictions are 100% wrong has an AUC of 0.0.
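
The ranking interpretation can be checked directly by comparing every positive example with every negative one. This is a brute-force sketch rather than an efficient implementation; the function name and the convention of counting ties as half a win are assumptions.

```python
def auc_by_ranking(scores, labels):
    """Fraction of (positive, negative) pairs in which the positive scores higher."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # assumed convention: a tie counts as half
    return wins / (len(positives) * len(negatives))


print(auc_by_ranking([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))
```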


The science and engineering of making intelligent machines that can solve generic problems. It mainly aims to implement human intelligence in machines and to create expert systems.


Association rules are used to uncover relationships between seemingly unrelated data in a relational database. Association rules are if/then statements that consist of an antecedent and a consequent. The former is an item found in the data, and the latter is an item found in combination with the antecedent. Association rule mining is a procedure for finding patterns, correlations and associations in data from various kinds of databases. It uses criteria such as support and confidence to identify the most important relationships. Support indicates how frequently the if/then relationship appears in the database, and confidence indicates how often the relationship is found to be true.
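
Support and confidence can be sketched on a toy transaction list. The transactions and function names below are invented for illustration, and confidence is computed with the usual definition support(antecedent and consequent) / support(antecedent).

```python
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """How often the consequent appears in transactions containing the antecedent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


# Rule: if bread then milk
print(support({"bread", "milk"}, transactions))
print(confidence({"bread"}, {"milk"}, transactions))
```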


Attentive Segmentation networks are architectures that use Attentive Graph Neural Networks for few-shot learning in Image segmentation tasks. These networks have the advantage of being less likely to overfit compared to graph neural networks while retaining the ability to learn from a few samples.


This is a statistical test for stationarity in time series data. Most commonly it is evaluated at the 95% confidence level (p-value less than 0.05). Establishing stationarity with transformations is useful because machine learning models tend to be more predictive when working with stationary data.


It is a measure of similarity of a time series data with a lagged version of itself. The value of autocorrelation ranges from -1 to 1, representing perfect negative to perfect positive correlation. Autocorrelation only measures linear relationships and ignores non-linear relationships.
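
A minimal sketch of the sample autocorrelation at a given lag follows. The function name is an assumption, and the normalization used here (dividing by the total variance of the series) is one common convention among several.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series with a copy of itself shifted by `lag`."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must be between 0 and len(series) - 1")
    mean = sum(series) / n
    denominator = sum((x - mean) ** 2 for x in series)
    if denominator == 0:
        raise ValueError("series is constant")
    numerator = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return numerator / denominator


alternating = [1, -1, 1, -1, 1, -1]
print(autocorrelation(alternating, 1))  # negative: neighbours move in opposite directions
print(autocorrelation(alternating, 2))  # positive: every second value matches
```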


An autoencoder neural network is an unsupervised learning algorithm used to learn a compressed representation of a dataset, typically for dimensionality reduction. The input is compressed into a latent-space representation and the output is then reconstructed from this representation. Two common variants of autoencoders are compression autoencoders and denoising autoencoders.


Backpropagation is a method used in artificial neural networks to reduce the error of the network. It is often used with a gradient descent optimization algorithm, which adjusts the weights of the neurons using the gradient of the loss function. The error is calculated at the output and is propagated backwards through the network layers.


Also known as bootstrap aggregating, in this approach multiple models are trained on randomized sub-samples of the training data. The predictions from these models are combined, typically by averaging or voting, to achieve as little error as possible. Random forest is an example of a common algorithm that utilizes bagging.


A text pre-processing algorithm used in NLP and information retrieval. A text document is represented as an unordered collection of words, and each word is mapped to an index of a sparse vector so that the document can be processed by ML algorithms. The sparse vector has an index for every word in the vocabulary. The bag-of-words representation of a text therefore records the number of occurrences of each word in the document.
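
A small sketch of the bag-of-words mapping is shown here. Tokenizing by lowercasing and splitting on whitespace is a simplifying assumption, and the vector is stored as a dense list for readability even though real vocabularies call for sparse storage.

```python
from collections import Counter


def build_vocabulary(documents):
    """Assign every distinct word an index, in alphabetical order."""
    words = sorted({w for doc in documents for w in doc.lower().split()})
    return {word: index for index, word in enumerate(words)}


def bag_of_words(document, vocabulary):
    """Count how often each vocabulary word occurs; word order is discarded."""
    counts = Counter(document.lower().split())
    vector = [0] * len(vocabulary)
    for word, count in counts.items():
        if word in vocabulary:
            vector[vocabulary[word]] = count
    return vector


documents = ["the cat sat", "the cat saw the dog"]
vocabulary = build_vocabulary(documents)
print(vocabulary)
print(bag_of_words(documents[1], vocabulary))
```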


A Bayesian Neural Network combines a probabilistic model with a neural network by associating every element with a probability distribution. A Bayesian Neural Network is very effective at avoiding overfitting compared with traditional methods like regularization and dropout. It is generally used on smaller datasets as it is computationally expensive.


Biases are scalar values attached to neurons and added to their input in order to adjust the output. They allow the network to try new interpretations or behaviour. Biases are modified throughout the learning process.


It is an extension of cubic interpolation for interpolating data points on a 2D grid. It guarantees continuous first derivatives and cross-derivatives, but the second derivative may be discontinuous. Bicubic interpolation is chosen over bilinear interpolation in image processing because the resampled images are smoother and have fewer artifacts.


It is a tool used to identify the words that appear consecutively within a document. By calculating the frequency of words and their appearance in the context of other words the collocation is found and is then filtered to obtain useful terms. Then each n-gram of word is scored in accordance with the association measure to determine whether the n-gram is a collocation.


It is an extended form of linear interpolation for interpolating functions of two variables on a rectilinear 2D grid. Linear interpolation is performed first in one direction and then in the other. Bilinear interpolation is a re-sampling technique that finds application in image processing.
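
The two-pass procedure can be sketched for a single grid cell as follows. The parameter names (corner coordinates `x0, x1, y0, y1` and corner values `q00, q10, q01, q11`) are assumptions for illustration.

```python
def bilinear_interpolate(x, y, x0, x1, y0, y1, q00, q10, q01, q11):
    """Interpolate inside the rectangle [x0, x1] x [y0, y1].

    q00 is the value at (x0, y0), q10 at (x1, y0), q01 at (x0, y1), q11 at (x1, y1).
    """
    tx = (x - x0) / (x1 - x0)
    ty = (y - y0) / (y1 - y0)
    # First pass: linear interpolation along x on the two horizontal edges.
    along_y0 = q00 * (1 - tx) + q10 * tx
    along_y1 = q01 * (1 - tx) + q11 * tx
    # Second pass: linear interpolation along y between the two results.
    return along_y0 * (1 - ty) + along_y1 * ty


# Centre of a unit cell whose corners hold 0, 10, 20 and 30.
print(bilinear_interpolate(0.5, 0.5, 0, 1, 0, 1, 0, 10, 20, 30))
```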


Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) is a hierarchical clustering algorithm designed for large datasets. It can cluster incrementally, without the whole dataset being provided in advance. It also has the advantage of using the available memory efficiently to derive the finest possible sub-clusters, thereby minimizing I/O costs. The clustering is achieved in three steps: a clustering feature (CF) tree is built, the leaves of the CF tree are clustered hierarchically, and the learned model is then used to cluster the observations.


A BK-tree is a metric tree specifically adapted to discrete metric spaces, which finds application in approximate string matching against a dictionary. The auto-correct features in various software are implemented based on this data structure. A BK-tree consists of nodes and edges. Every node in the BK-tree has at most one child for any given edit distance, and every insertion into the BK-tree starts from the root node.
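
Below is a compact sketch of a BK-tree keyed on Levenshtein edit distance. The class and method names are assumptions, and nodes are stored as `(word, children)` tuples where `children` maps an edit distance to the single child at that distance.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]


class BKTree:
    def __init__(self, distance=levenshtein):
        self.distance = distance
        self.root = None

    def add(self, word):
        """Insert a word, always starting the descent at the root."""
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d == 0:
                return  # word already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, word, max_distance):
        """Return sorted (distance, word) pairs within max_distance of the query."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            candidate, children = stack.pop()
            d = self.distance(word, candidate)
            if d <= max_distance:
                results.append((d, candidate))
            # The triangle inequality limits which subtrees can hold matches.
            for edge, child in children.items():
                if d - max_distance <= edge <= d + max_distance:
                    stack.append(child)
        return sorted(results)


tree = BKTree()
for w in ["book", "books", "cake", "boo", "cape", "cart", "boon", "cook"]:
    tree.add(w)
print(tree.search("bok", 1))
```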


It is a common type of ensemble application of machine learning. In boosting, weak models (low accuracy) are chained together in series and trained on the prediction error of the ensemble. This enables the weak models to focus on unique, hard-to-classify instances better.


Bootstrap is a technique for model validation in which statistical accuracy is assessed. Datasets are drawn at random from the training data with replacement, such that each sample has the same size as the training set. This process is repeated until k bootstrap datasets are obtained. The model is then refitted on each bootstrap dataset and its performance is examined.
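
The resampling step can be sketched with the standard library. Here the "model" is simply the sample mean, which is an assumption made to keep the example short; the function name and data values are likewise illustrative.

```python
import random
import statistics


def bootstrap_datasets(data, k, seed=0):
    """Draw k datasets of the same size as `data`, sampling with replacement."""
    rng = random.Random(seed)
    n = len(data)
    return [[data[rng.randrange(n)] for _ in range(n)] for _ in range(k)]


data = [2.1, 2.5, 3.0, 3.2, 3.9, 4.4, 5.0]
datasets = bootstrap_datasets(data, k=200)

# Refit the statistic on every bootstrap dataset and look at its spread.
means = [statistics.mean(d) for d in datasets]
print("spread of the bootstrap means:", statistics.stdev(means))
```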


Clustering Large Applications based upon Randomized Search (CLARANS) is an efficient and effective algorithm for spatial data mining. It is a medoid-based algorithm in which a representative item, or medoid, is chosen for each cluster rather than the mean of the items. CLARANS searches a graph to find k medoids among n objects. A node in the graph represents a set of objects selected as medoids. In each iteration a neighbor of the current node is chosen at random; the search moves to the neighbor if it is better, and the current node is declared a local optimum once the randomly chosen neighbors examined are all worse. CLARANS has two parameters: maxNeighbor and the number of local minima. The higher the value of maxNeighbor, the closer CLARANS is to PAM (Partitioning Around Medoids).


It is a technique that visualizes the structure of distance-like data as a geometrical picture. It is also known as principal coordinates analysis. Given an input matrix of dissimilarities between pairs of items, MDS finds a set of points in a low-dimensional space whose distances approximate the dissimilarities. When the input matrix is of Euclidean type, MDS is equivalent to PCA.


It is the algorithmic procedure of assigning an input object, represented by a feature vector, to a category or class. For example, a classification model can determine whether an email belongs to the spam or non-spam class.


It is the idea that a particular item is consistently classified into the same group each time it is assessed by a model. It is measured (for all classes) by metrics like the marginal classification index and the kappa coefficient.


Clustering is the task of grouping observations with similar features into a cluster or subset. It is a common technique for statistical data analysis used in many fields such as machine learning, pattern recognition, etc.


Models create a mathematical relationship between input features and predictions. Concept drift occurs when the relationship between input features and target results changes in real life. This can be detected using statistical methods like McDiarmid Drift Detection or by periodic retraining.


Conditional generative adversarial networks (CGANs) use class label information, allowing them to conditionally generate data of a specific class. In a CGAN both the generator and the discriminator are conditioned on some data y, which can be a class label or data from some other modality.


A conditional random field (CRF) is a sequence modeling technique used to predict a sequence of labels for a sequence of inputs. It is a type of discriminative undirected probabilistic graphical model that extends the principle of logistic regression by applying feature functions to sequential inputs. A CRF is usually trained by maximum likelihood learning. CRFs are widely used in NLP.


A classification matrix is an error matrix that describes the performance of a classification model. For a binary classifier it summarizes the performance in a table of two columns and two rows holding the counts of true positives, true negatives, false positives and false negatives.


The main objective of a convolutional neural network (CNN) is to learn higher-order features in the data via convolution. The CNN transforms the input data through its connected layers into a set of class scores given by the output layer. A CNN comprises an input layer, feature extraction layers such as convolution and pooling layers, and a classification layer.


It is a measure of drift detection that compares the angles between the probability distributions of training data and live data. This is derived mathematically from the dot product of two distribution vectors.


A cover tree is a type of data structure designed to facilitate nearest neighbor search, especially in spaces with small intrinsic dimension. A cover tree on a dataset is a leveled tree where each level is indexed by an integer scale which decreases as the tree is descended.


Cross validation is a model validation technique for assessing the ability of a model to predict on new data. The model is first trained on a subset of the dataset and then evaluated on the held-out remainder. Multiple rounds of cross validation are performed using different partitions of the sample data in order to reduce variability. The validation results are then averaged over the rounds to estimate the model’s predictive performance.
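
One common way to form the partitions is k-fold splitting, sketched below. The use of contiguous, unshuffled folds and the names `k_fold_indices` and `cross_validate` are assumptions; `fit_and_score` stands in for whatever training and scoring routine is being evaluated.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_validate(n_samples, k, fit_and_score):
    """Average the score returned by fit_and_score(train, test) over all folds."""
    scores = [fit_and_score(train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


for train, test in k_fold_indices(10, 3):
    print("train:", train, "test:", test)
```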


Cubic spline interpolation is a special case of spline interpolation that yields a smooth interpolant with smaller error than many other interpolating polynomials. Spline interpolation is a form of interpolation in which the interpolant is a piecewise polynomial called a spline function. It uses a low-degree polynomial for each piece of the spline to reduce the interpolation error.


A cumulative distribution function is a mathematical (and graphical) representation of the distribution of a variable. For each value x it gives the total number (or fraction) of observations of the variable that are less than or equal to x. The slope of a cumulative distribution function approaches 0 as x tends to infinity.
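
For a finite sample, the empirical version of this function can be sketched as below. The function name and sample values are illustrative, and the convention of counting values less than or equal to x is the usual one.

```python
import bisect


def empirical_cdf(sample):
    """Return a function giving the fraction of the sample that is <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must not be empty")

    def cdf(x):
        # bisect_right returns how many sorted values are <= x.
        return bisect.bisect_right(ordered, x) / n

    return cdf


cdf = empirical_cdf([1, 2, 2, 3, 5])
for x in (0, 2, 4, 5, 100):
    print(x, cdf(x))
```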


Data augmentation is a set of techniques that increase the relevant data for training of a neural network. This is especially important for image and video classification data and solves challenges like lack of sufficient data as well as morphed data. Data augmentation can be used offline (pre-made) or online (on mini-batches as model is trained) for small and larger datasets respectively.


Data drift is the change in data distribution over time. This can be gradual, abrupt or cyclical. Data drift is detected using statistical testing methods like Wasserstein Distance for cumulative distribution functions, the Kolmogorov-Smirnov test, the Incremental Kolmogorov-Smirnov test, etc.


It is an algorithm used for generating date and time features. Date attributes are generally represented as long values, which are of little significance in data mining on their own. For analysis, features such as the year, month and day are therefore extracted.


Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a density-based clustering algorithm that finds core samples of high density and expands clusters from them. DBSCAN requires two parameters: the neighborhood radius and the minimum number of points required to form a cluster. It first analyses the neighborhood of a data point; if the neighborhood meets the requirement, a cluster is formed and the neighboring points are added to it, otherwise the point is labeled as noise. The number of clusters does not need to be known a priori.
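
A brute-force sketch of the procedure is given below. The parameter names `eps` (neighborhood radius) and `min_pts` (minimum points) are assumptions, the neighborhood count includes the point itself, and the quadratic neighbor search is only suitable for small examples.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks points in no cluster."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))
```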


Deception detection is the task of identifying lying, misleading and other deceptive behaviour. Deep learning approaches this problem by using convolutional neural network classifiers to analyze microexpressions in human faces that are correlated to deception. These methods can achieve high accuracy in practice.


A decision tree builds classification and regression models in the form of a tree structure. It is learned by a process called recursive partitioning, in which the training set is split based on an attribute value test. The tree is built from a root node and involves partitioning the data into subsets with similar values. The process is complete when the subset at a node has the same value of the target variable. Decision trees can handle both categorical and numerical data, so tree methods can be used for data-mining tasks.


Deconvolutional networks have multiple stacked deconvolutional layers, where each layer is trained on the input of the previous layer. They perform an inverse convolution. When modeling images they map features to pixels, which makes it possible to generate images as output from neural networks.


A denoising autoencoder is a variation of the autoencoder. It is a technique used for feature selection and extraction when the input is corrupted. During training, some input values are randomly set to zero, and the network learns to reconstruct the data from this corrupted input.


DBNs consist of layers of Restricted Boltzmann Machines (RBMs), which extract higher-level features from raw input vectors, and a feed-forward network. The RBM layers are stacked and trained in a greedy manner during the pretraining phase, and the feed-forward network is then used for the fine-tuning phase. DBNs are used to recognize, cluster and generate images, video sequences and motion-capture data.


DCGAN is a variant of the generative adversarial network that is used to generate new content. The DCGAN architecture uses a CNN for the discriminator, while in the generator the convolutions are replaced with up-convolutions.


Deep Learning is a subset of machine learning. It deals with a set of algorithms, a combination of math and code, that loosely resemble the structure and function of neurons in the human brain. Deep Learning can achieve human-level accuracy on tasks like image recognition, voice recognition and predictive analytics. It is basically machine perception. Deep neural networks map inputs to outputs by finding correlations between two sets of data through hidden layers, configured by hyper-parameters, that sit between the input and output layers.


Deep Feature synthesis is a family of algorithms that combine input data (primitives) from multiple tables across a database to create new features that are a stack of the original primitives. It simplifies the task of working with data from multiple families of tables in a relational database considerably.


The Density Clustering algorithm (DENCLUE) is a special case of Kernel Density Estimation used for clustering. Data points are clustered based on the local maxima of the estimated density function. The parameters associated with DENCLUE are sigma, a smoothing parameter, and m, which specifies the number of samples used in the iteration. DENCLUE generally doesn’t work well on high-dimensional data, since data in a high-dimensional space looks uniformly distributed.


Denoising in visual media like image and video is a significant challenge in computer vision tasks to improve the accuracy of these systems. Denoising techniques vary widely - from traditional methods like sparse representation, variational regularization, transforms, and domain filtering to deep learning based methods like Convolutional Neural Networks and Multi Layer Perceptrons.


Deterministic Annealing is a technique used for the non-convex optimization problem of clustering. By extending soft clustering with an annealing process, it avoids many local minima of the cost function. The annealing process starts at a high temperature, and in each iteration the centroid vectors are updated until convergence is reached. As the temperature is lowered the vectors split, and the number of phases corresponds to the number of clusters. Decreasing the temperature beyond a critical value leads to further splitting until the vectors are separate.


It is a sub-task of depth perception with the goal of identifying the structure of a 3D scene using video footage. Deep learning improves upon traditional method limitations like assumption of invariant illumination and gives efficient results when trained on image synthesized training data and using an unsupervised network for final estimation.


Depth estimation (especially monocular/single-image depth estimation) in 2D images has wide applications in computer vision, such as robotics, photo editing, and autonomous driving. Effective depth estimation can be achieved using convolutional neural network architectures that use upscaling and unpooling layers with pre-trained ResNet networks.


It is the task of creating high-resolution depth maps from low-resolution maps. While there are many traditional methods, deep learning methods use guided convolutional layers to identify edges and upsample effectively.


Dimensionality reduction, often carried out through feature extraction, is the process of transforming data from a high-dimensional space into a lower-dimensional space. Dimensionality reduction techniques can be linear as well as non-linear.


Domain Adaptation is a type of Transfer Learning that utilizes labelled data in one domain to train models for tasks in another domain. It is divided into instance-based and feature-based adaptation depending on the treatment of data.


It is an advanced application of domain adaptation that takes information from multiple sources into a model so that it can handle unseen target data better.


Drift monitoring is the practice of tracking changes in model behaviour over time - either from changes in distribution of input data (called data drift) or changes in their relation to final model predictions (called concept drift). Drift monitoring and retraining is key to long-term usability of a model in production.


Dropconnect is a regularization technique in which randomly selected weights within the network are set to zero so that a network with better generalization capability can be obtained.


It is a regularization technique used to improve the training of neural networks. A randomly selected subset of activations in each layer is set to zero during training so that the network becomes less sensitive to the specific weights of individual neurons. The result is a network that generalizes better and is less likely to overfit the training data.
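
The masking step can be sketched as follows. Rescaling the surviving activations by 1 / (1 - drop_prob), often called inverted dropout, is an assumption added here so that no change is needed at inference time; the function and parameter names are also illustrative.

```python
import random


def dropout(activations, drop_prob, training=True, rng=random):
    """Zero each activation with probability drop_prob during training."""
    if not 0 <= drop_prob < 1:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0:
        return list(activations)  # inference: activations pass through unchanged
    keep_prob = 1 - drop_prob
    # Survivors are scaled up so the expected activation stays the same.
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


layer_output = [1.0, 2.0, 3.0, 4.0]
print(dropout(layer_output, drop_prob=0.5, rng=random.Random(0)))
print(dropout(layer_output, drop_prob=0.5, training=False))
```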


It is a subset of image segmentation that focuses on isolating 3D objects moving in an image plane. These tasks are more important in large-scale videos that encompass many objects, like crowds or public spaces. By combining geometric cues for static/dynamic sorting and using deep learning for the rest, it is possible to improve the resource and computational efficiency of these tasks.


Feature selection in machine learning can be done with ensemble learning methods such as random forest, gradient boosted trees and AdaBoost. An ensemble combines the outputs of multiple models, and its feature importance method returns a score for each feature: the higher the score, the better the feature. Compared with other feature selection methods, it has the great advantage of handling stability issues well.


Ensemble methods refer to the practice of combining the predictions of multiple models to create final predictions. Common methods are boosting (where multiple weak models are combined in series), bagging (combining aggregate predictions of multiple similar models trained on different subsets of data), and voting (averaging predictions from multiple different models trained on full data).


This is a learning problem that causes gradients to enlarge exponentially with each iteration of training. This causes the training error to swing wildly and can result in models being unable to learn after further iterations. This can be corrected using different architectures, regularization and learning rates.


Face alignment tasks focus on identifying the structure and geometric alignment of human faces and usually reorienting them into standard positions. It has wide-ranging applications as a pre-processing technique for face recognition, security and other fields.


It is the machine learning task of differentiating between genuine faces and images of faces and other such counterfeit methods. These methods are important for effective facial recognition security. Deep learning architectures created by combining CNN and RNN models are generally used for anti-spoofing purposes.


Face detection is a broader scope of computer vision problem than face recognition and is easier to design for using the same principles. Face detection is also widely used with many applications like photography, emotion capture, facial expression recognition, etc.


It is a slew of super resolution techniques that are designed to improve the quality of human facial imagery from a low-quality image. This is done by first using linear models to learn the relationship between high-resolution images and their down-sampled versions, and then modeling the difference between artificially generated high resolution image and original image.


It is a subset of image generation techniques that use neural architectures like Generative Adversarial Networks to artificially create faces by using training data. It has applications in video and game design for creating realistic worlds without using templated character models.


Face recognition is one of the most widely publicised applications of deep learning. Commercial applications have achieved high accuracy using advanced convolutional neural net architectures like Multi-task Cascade CNNs. It has wide-ranging applications like security, photo-tagging, etc.


Facial action units are regions of face that indicate fine-grained changes in facial expressions. Detecting these is crucial to important applications like micro-expressions and measuring human emotions. Attention mechanisms and relation learning are some of the novel approaches to solve these challenges.


It is the set of tasks that require recognizing attributes and facial features like beards, glasses, etc on faces. Accurate models can be created for such tasks by utilizing multicropping and pretraining on facial verification. Attribute detection helps with annotation-based search, especially for surveillance footage.


Facial landmark detection is the challenge of identifying important points on faces to determine expressions, head movements and sentiments. While older approaches relied on constrained local models or regression-based methods, advancements in deep learning enable better quality outcomes with algorithms like HRNets.


Fallout, or false positive rate (FPR), corresponds to the Type I error rate. It is the ratio of incorrect positive predictions to the total number of negatives: FPR = FP / N = FP / (FP + TN). It can also be calculated as 1 - specificity. The best FPR is 0.0 and the worst value is 1.0.
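
A short sketch of the false positive rate and its relation to specificity follows. The function names and counts are illustrative, and specificity is taken with its usual definition TN / (FP + TN).

```python
def false_positive_rate(fp, tn):
    """Share of actual negatives that were wrongly predicted positive."""
    return fp / (fp + tn)


def specificity(fp, tn):
    """Share of actual negatives that were correctly predicted negative."""
    return tn / (fp + tn)


fp, tn = 5, 45
print(false_positive_rate(fp, tn))
print(1 - specificity(fp, tn))  # same quantity, up to floating-point rounding
```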


False discovery rate can be defined as the expected proportion of Type I errors among positive predictions. It is the ratio of false positives to the sum of false and true positives: FDR = FP / (FP + TP).


Features are individual measurable properties of an event, which is represented as a numeric feature vector. Feature engineering is an essential part of building intelligent systems. It is the process of using domain knowledge of the data to create the feature vectors that make the algorithm work.


Feature drift is a slightly neglected form of drift akin to concept drift. It occurs when some features in training data become irrelevant to overall model predictions. This can be monitored by mapping correlation of features to predictions over time.


Feature selection is the process of selecting a subset of relevant features for building a model. It enhances the performance of machine learning models by avoiding the curse of dimensionality and improves generalization by reducing overfitting. Feature selection algorithms can be categorized as feature ranking and subset selection. In feature ranking, features are selected based on a score, whereas subset selection produces an optimal subset of features. For larger sets of features, subset selection can be achieved by implementing genetic algorithms.


A feed forward neural network is an artificial neural network in which the input signals flow in only one direction, so that the connections between nodes do not form a cycle.


It is the machine learning task of classifying images by using only a few points of training data for each category. This is particularly valuable for tasks which have very limited training data. However, this approach is prone to overfitting and requires heavy preprocessing combined with transfer-learning-based methods to be effective.


Also called k-shot learning. It is a set of deep learning techniques that enable models to learn from minimal information during training. It can be divided into three problem perspectives - data (augmenting supervised space), model (reducing hypothesis space with experience), and algorithm (utilizing prior knowledge to find best hypotheses). Relevant learning problems are weakly supervised learning, imbalanced learning, transfer learning, and meta-learning.


These are machine learning tasks that achieve a high level of distinction between very similar object classes like identifying an individual among species of an animal or a species in its genus. These classes have very little inter-class variance, making this task very challenging. It is done by applying semantic part localization techniques on high-quality image datasets.


Fisher’s Linear Discriminant is a linear classifier that uses the ratio of between-class to within-class variance to separate class labels. It finds a linear combination of features that can be used for dimensionality reduction before later classification. This method projects high-dimensional data onto a line, and classification is done in this single-dimensional space.


Frequent itemsets play a vital role in data mining tasks that look for interesting patterns in databases. Frequent itemset mining finds all common sets of items, defined as those itemsets that occur at least a minimum number of times. Several algorithms find frequent itemsets by building prefix trees. One such algorithm is FP-growth, which is based on a recursive elimination scheme.


The traditional F-score (F1 score) is the harmonic mean of precision and recall. It reaches its best value at 1 and its worst at 0. In the more general F-beta score, a positive real β sets the relative importance of recall versus precision.
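
The F-score with a weighting β can be sketched with the standard F-beta formula, (1 + β²) · P · R / (β² · P + R), which reduces to the harmonic mean of precision and recall when β = 1. The function name and the choice to return 0.0 when both inputs are zero are assumptions.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score; beta > 1 favours recall, beta < 1 favours precision."""
    if precision == 0 and recall == 0:
        return 0.0  # assumed convention to avoid division by zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


print(f_beta(0.5, 0.5))            # F1 of equal precision and recall
print(f_beta(0.5, 1.0, beta=2.0))  # high recall is rewarded when beta = 2
print(f_beta(1.0, 0.5, beta=2.0))  # high precision counts for less
```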


A Gaussian process is a stochastic process commonly used for regression. It consists of a collection of random variables indexed by time or space, such that every finite subset of them has a multivariate normal distribution. A Gaussian process is an extension of the multivariate Gaussian to an infinite-sized collection of real-valued variables. The reliability of the regression depends upon the covariance function. It can also be used as a prior probability distribution over functions in Bayesian inference for regression analysis. A machine learning algorithm based on a Gaussian process uses a kernel function to measure the similarity between points, and matrix algebra is used to calculate predictions using the technique of kriging.


The Generalized Hebbian algorithm is a linear feed-forward neural network model for unsupervised learning. It is an adaptive method to find the eigenvectors of the input covariance matrix corresponding to the k largest eigenvalues, and it finds application in principal component analysis. The learning is a single-layer process in which the change in each synaptic weight depends on the inputs and outputs of that layer. It has a predictable trade-off between learning speed and accuracy of convergence, which can be set by the learning rate η. In practice the learning rate is set to a small constant value.


The generative network in a GAN generates data using a special kind of layer called a deconvolutional layer, and the discriminator network evaluates the generated instances against instances from the real data. It is the discriminator that decides whether an instance of data belongs to the actual training dataset.


The genetic algorithm (GA) is a heuristic optimization method that generates solutions using techniques inspired by natural evolution. A GA applies fitness assignment, selection, recombination and mutation to each individual in a population. When used for feature selection, the best features are selected based on the value of the selection error. It is one of the most advanced methods for selecting the most useful features from a large set of features, but it requires a lot of computation.


G-Means is another extended variation of K-Means in which the number of clusters is determined using a normality test. It takes a hierarchical approach to detect the number of clusters. G-Means runs k-means repeatedly with increasing values of k and tests whether the data in the neighborhood of each cluster centroid look Gaussian; if not, the cluster is split.


Gradient Boosting is a machine learning technique used for classification as well as regression which produces a prediction model in the form of an ensemble of weak prediction models. It is typically used in conjunction with decision trees. Gradient boosting involves a loss function to be optimized, a weak learner to make predictions and an additive model that adds weak learners to minimize the loss function. The model's generalization capability can be improved by regularization techniques such as choosing an optimal number of gradient boosting iterations, setting the shrinkage parameter that controls the learning rate, and setting the sampling rate for stochastic tree boosting.


Gradient descent is a first-order iterative optimization algorithm used to update the parameters of a model. It searches for the parameter values that minimize the loss function by repeatedly moving the parameters a small step in the direction of the negative gradient.
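
A minimal sketch of the update rule on a one-dimensional problem follows. The loss (x - 3)^2, the step size and the number of steps are illustrative assumptions.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x

# Assumed example loss: (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

For this convex loss the iterate approaches the minimizer x = 3. On non-convex losses the same loop may settle in a local minimum instead.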


When forecasting with time series data, it is usually important to determine if a particular external event is influencing the time series data. This can be done using a Granger causality test.


The Growing Neural Gas is an incremental algorithm in which the number of clusters is not provided a priori. It is capable of continuous learning and, unlike neural gas, its parameters do not change over time. It can add or delete nodes during execution based on local error measurements. It produces a graph that describes the topology of the training data, in which each vertex corresponds to a neuron onto which data has been mapped.


It is a sub-category of pose estimation that is focused on hands. It has applications like sign language detection, translation and generation, gaming, animations etc. It is done using generative methods like cylindrical and polygon models, and discriminative methods like classifiers.


Another type of activation function, in which inputs greater than 1 are mapped to 1 and inputs less than -1 are mapped to -1. The function can be mathematically expressed as f(x) = { 1 (if x > 1), -1 (if x < -1), x (otherwise) }
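
The piecewise definition above (commonly called hard tanh) amounts to clipping the input to the interval [-1, 1]. A small sketch, with the function name chosen for this example:

```python
def hard_tanh(x: float) -> float:
    """Clip x to [-1, 1]: 1 above the range, -1 below it, x inside it."""
    return max(-1.0, min(1.0, x))

clipped = [hard_tanh(v) for v in (-3.0, -0.5, 0.3, 2.5)]
```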


Heterogeneous face recognition is a subcategory of face recognition tasks that are designed to work with facial imagery taken in different contexts and domains. This has wide applications from a law enforcement perspective - e.g. police sketch to face recognition. This requires specific techniques to detect and account for difference between modalities to be effective.


Some time series data have a variance that changes over time, for example one that increases as the series progresses. This is called heteroscedasticity. Modern AI algorithms like LSTM are often better equipped to handle heteroscedastic data than traditional statistical time series forecasting.


A hidden Markov model (HMM) is similar to a dynamic Bayesian network. It models a system as a Markov process with hidden states. Unlike in a simple Markov model, the state is not directly visible in an HMM, but the output, which depends on the state, is visible. Hence the sequence of output tokens generated by an HMM gives information about the sequence of states. It finds application in reinforcement learning and pattern recognition.


Hierarchical clustering is a clustering algorithm that builds a hierarchy of clusters, like a tree, by either splitting or merging them successively. Agglomerative clustering works bottom-up: each observation starts in its own cluster and clusters are successively merged based on a linkage criterion.


This is a challenging task that predicts future human poses and has specialized applications in robotics and sports. Human pose forecasting can be done by using sequences of human pose estimates to train architectures like RNNs and LSTMs, which work well with sequence data for prediction.


Hyperparameters are the variables that govern the training process itself, as opposed to the parameters learned during training. Hyperparameter optimization or tuning is the process of choosing a set of optimal hyperparameters for a learning algorithm such that it minimizes the loss function.


It is an image classification task that uses hyperspectral imaging (electromagnetic spectrum imaging) data to create classification maps that can identify constituent materials. These tasks enable advanced scientific research like environmental remote sensing, surface mineral maps, vegetation identification, etc.


Image augmentation is a subset of data augmentation techniques for image data. In addition to creating more data, image augmentation also prepares models to deal with noisy data in real-life settings. Common techniques include flipping, rotation, and translation while more advanced techniques utilize conditional Generative Adversarial Networks.


Drift can be monitored in image data as well as numerical data. By taking images as 3D matrices of RGB values, they can be subject to the same data tests as numerical data to compare live images against reference images.


It is the process of generating images by using existing datasets. Image generation is achieved by using Generative Adversarial Networks and their derivative neural architectures. This is used for applications like designing and engineering.


It is a type of image modification task that fills in missing parts of an image to create a complete image. This is enhanced with machine learning and has applications like restoring artwork and enhancing images. It has sub-classifications like face inpainting, photograph inpainting, etc. that deal with specific problem spaces.


It is an image manipulation problem where machine learning is used to learn the relation between input and output images. This relation is used to modify other images in a pattern similar to the original training set. For example, it is used widely in photo morphing applications like converting photos to painting styles, changing seasons in nature scenes, etc.


It is a derived form of the Kolmogorov-Smirnov Test that utilizes the concept of a sliding window of last n instances of data. This method takes less time than the Kolmogorov-Smirnov Test while being just as effective.


It is a specific subset of image segmentation tasks that requires categorizing each instance of an object in images. It is distinct from semantic segmentation in the sense that semantic segmentation labels all pixels by the class to which they belong (e.g. cars in a street), while instance segmentation distinguishes specific instances of an object from each other (each different car).


It is the process of estimating the value of a function for an intermediate value of the independent variable. More generally, it is the approximation of a complicated function by a simple function.


Isometric Mapping (Isomap) is one of the earliest approaches to manifold learning. Isomap is a widely used low-dimensional embedding method which extends multidimensional scaling by incorporating the geodesic distances induced by a neighborhood graph. The geodesic distance is the sum of edge weights along the shortest path between two nodes. The Isomap algorithm consists of determining the neighbors of each point, constructing a neighborhood graph, computing the shortest paths between nodes, and then computing the lower-dimensional embedding. The connectivity of each data point in the neighborhood graph is defined by its k nearest Euclidean neighbors in the high-dimensional space.


Kruskal’s multidimensional scaling is a non-metric MDS in which a non-parametric monotonic relationship between the dissimilarities and the Euclidean distances between items, together with the location of each item in the low-dimensional space, is found using isotonic regression. The non-metric MDS algorithm consists of finding the optimal monotonic transformation of the proximities and optimally arranging the configuration.


Principal component analysis can be applied to a non-linear mapping of the data by using the kernel trick. Since a large dataset may yield a large kernel matrix, the data can be clustered and the kernel populated with the means of those clusters. This may still lead to a large kernel; however, only the top P eigenvalues and eigenvectors of the kernel are computed.


A k-dimensional tree (KD tree) is a data structure used for organizing points in a space with k dimensions. It finds application in range search and nearest neighbor search. A KD tree is a binary search tree in which each node holds a k-dimensional point and which recursively partitions the space along the data axes. A non-leaf node splits the space into two half-spaces along one axis: points with a smaller value than the node on that axis are placed in the left subtree and points with a larger value in the right subtree. The process is repeated until the final subtrees contain only one element.


Keyword extraction is the process of extracting from a document the relevant keywords that best describe its subject. This algorithm relies on statistical information about word co-occurrence.


K-Means is an unsupervised learning algorithm which finds application in clustering problems. The algorithm works iteratively to assign each data point to one of the clusters based on its features: each observation belongs to the cluster with the nearest mean. The number of clusters is set a priori, and new data can be labeled based on the cluster centroids, which hold the collection of feature values. Finding an exact solution to the K-means problem is NP-hard, so a standard iterative approach that finds an approximate solution is usually employed.
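
The iterative procedure can be sketched as alternating assignment and update steps (often called Lloyd's algorithm). The function name, the explicit `initial_centroids` argument and the toy data are assumptions for illustration. Real implementations usually pick initial centroids randomly or with a seeding heuristic.

```python
def kmeans(points, initial_centroids, iterations=10):
    """Approximate k-means on a list of equal-length tuples."""
    centroids = list(initial_centroids)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:  # an empty cluster keeps its previous centroid
                centroids[i] = tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
    return centroids

points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (8.0, 8.0), (8.2, 7.8), (7.8, 8.2)]
centroids = kmeans(points, initial_centroids=[(0.0, 0.0), (10.0, 10.0)])
```

The result depends on the starting centroids: a poor initialization can leave the algorithm in a worse local solution, which is why this approach is only approximate.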


The Kolmogorov-Smirnov (K-S) test is a statistical method to check whether a dataset is drawn from the same distribution as a reference dataset. This test can be used for drift monitoring. If the K-S test does not reject the hypothesis that live data and reference data share a distribution, it indicates that there is little data drift.


Kriging is a method of interpolation in which interpolated values are modeled using the Gauss-Markov theorem, based on assumptions about covariances. It is applied to data points which are irregularly distributed in space. Kriging can be either an interpolation method or a fitting method. The main objective of kriging is to estimate the value of an unknown real-valued function.


It is a method of drift monitoring that compares the probability distributions of training and live data. Like the Wasserstein test, it measures the distance between the probability distribution functions of the data.


L1 regularization adds a penalty equal to the sum of the absolute values of the coefficients, so that network parameters can be shrunk to zero and are kept from getting too big in any one dimension. L2 regularization adds a penalty equal to the sum of the squared values of the coefficients and forces the parameters to be relatively small. Hence, adding a regularization term improves the generalization ability of the network and helps prevent overfitting.
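
The two penalty terms can be written down directly. In this sketch the names `l1_penalty`, `l2_penalty` and the strength parameter `lam` are assumptions; the penalty would be added to the training loss.

```python
def l1_penalty(weights, lam):
    """lam times the sum of absolute values of the weights."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    """lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)

weights = [0.5, -2.0, 0.0]
l1 = l1_penalty(weights, lam=0.1)  # 0.1 * (0.5 + 2.0 + 0.0)
l2 = l2_penalty(weights, lam=0.1)  # 0.1 * (0.25 + 4.0 + 0.0)
```

The squared term punishes the large weight (-2.0) much more heavily than the small one, while the absolute-value term grows linearly.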


A stemming algorithm in which a single table of rules is used for stemming suffixes. Each rule specifies the removal or replacement of an ending. The Lancaster stemmer is very strong and aggressive, and its implementation allows the user to use customized rules.


Laplace interpolation is a specialized interpolation method for restoring missing data on a 2D grid. It generates the smoothest interpolant by solving a very sparse system of linear equations.


Laplacian Eigenmap uses a spectral technique to obtain a low-dimensional representation of a dataset that preserves information about local neighborhoods. It is relatively insensitive to outliers and noise. The representation map can be viewed as a discrete approximation to a continuous map generated from the geometry of the manifold. As Laplacian Eigenmap only uses local distances, it is not prone to short circuiting.


Least Absolute Shrinkage and Selection Operator (LASSO) is a penalized regression method used in machine learning for selecting a subset of variables. As the name indicates, it is a shrinkage and variable selection method for linear regression models. The objective is to obtain the subset of predictors that minimizes the prediction error. This is achieved by imposing a constraint on the model parameters that causes the regression coefficients of some variables to shrink towards zero; only the variables with non-zero regression coefficients are then associated with the response variable.


It is defined as the number of neurons in a given layer. For the input layer, the number of neurons equals the number of features of the input vector; for the output layer, it is either a single neuron or a number of neurons matching the number of predicted classes.


The issue of the ReLU function being zero for inputs less than 0 is mitigated by leaky ReLU, which has a small slope of around 0.01 for negative inputs. It can be denoted as f(x) = { x (if x > 0), 0.01x (otherwise) }
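
A direct transcription of that definition, as a sketch. The `slope` parameter name is an assumption, with the default taken from the value 0.01 quoted above.

```python
def leaky_relu(x: float, slope: float = 0.01) -> float:
    """Identity for positive inputs, a small linear slope otherwise."""
    return x if x > 0 else slope * x

positive = leaky_relu(2.0)   # passes through unchanged
negative = leaky_relu(-3.0)  # scaled by the small slope
```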


It is a hyperparameter that controls how much the model parameters (weights) are adjusted at each step of optimization in order to minimize the error of the neural network's predictions.


Leave-one-out cross-validation (LOOCV) is a model validation technique in which a single observation from the original dataset is used for validation and the remaining observations form the training set. The process is repeated until each observation has been used once as the validation set. LOOCV is considered computationally expensive, as it requires training as many models as there are observations in the dataset.
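
The splitting scheme can be sketched as an index generator. The function name is hypothetical, and fitting and scoring a model on each split is left out.

```python
def leave_one_out_splits(n_samples):
    """Yield (train_indices, validation_index), one pair per observation."""
    for i in range(n_samples):
        train = [j for j in range(n_samples) if j != i]
        yield train, i

splits = list(leave_one_out_splits(4))
```

For n observations this produces n splits, which is where the computational cost comes from: one model is trained per split.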


It is basically an identity function, f(x) = x. It implies that the dependent variable has a direct, proportional relationship with the independent variable, so the input signal passes through the function unchanged.


LDA is based on the Bayes decision rule and is preferred when classification is to be done among multiple classes. It assumes a linear relationship between the dependent and independent variables, and that the independent variables have equal variance in each class. LDA is commonly used as a dimensionality reduction technique in the pre-processing step for pattern recognition. The objective is to project the feature space onto a lower-dimensional space with good class separability in order to reduce computational costs.


Linear interpolation is the method of constructing new data points within the range of a discrete set of known data points using linear polynomials. It is also known as Lerp. Lerp is quick and easy but not very precise, and the interpolant is not differentiable at the control points.
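
For two known points (x0, y0) and (x1, y1), the interpolated value at x lies on the straight line between them. A minimal sketch, with a function signature chosen for this example:

```python
def lerp(x, x0, y0, x1, y1):
    """Linearly interpolate y at x between (x0, y0) and (x1, y1)."""
    t = (x - x0) / (x1 - x0)  # fraction of the way from x0 to x1
    return y0 + t * (y1 - y0)

midpoint = lerp(2.5, x0=2.0, y0=10.0, x1=3.0, y1=20.0)
```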


Linear Regression is a linear approach to modeling the relation between dependent and independent variables using a linear predictor function. Ordinary Least Squares (OLS) is a method for determining the unknown parameters of a linear regression model. The model specification is that the dependent variable is a linear combination of the parameters. OLS chooses the parameters by the principle of least squares, minimizing the sum of the squared differences between the values predicted by the model and the true values of the dependent variable. The OLS technique can be applied in different frameworks depending on the nature of the data and the task to be performed. Once the model is constructed, the goodness of fit of the model and the significance of the estimated parameters are checked.
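
For a single independent variable, the least-squares parameters have a closed form. The sketch below assumes that simple one-variable case; the function name and sample data are illustrative.

```python
def ols_fit(xs, ys):
    """Closed-form OLS for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Assumed toy data lying exactly on y = 2x + 1.
slope, intercept = ols_fit([0, 1, 2, 3], [1, 3, 5, 7])
```

With several independent variables the same principle applies, but the solution is usually computed with matrix methods.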


Locally Linear Embedding (LLE) has faster optimization compared to the Isomap algorithm. It projects the data to a lower-dimensional space while preserving the relationships within local neighborhoods. LLE consists of computing the neighbors of each data point, constructing a weight matrix that reconstructs each point from its neighbors, and computing the embedding, which is encoded in the eigenvectors of the resulting matrix corresponding to its smallest non-zero eigenvalues.


Locality-sensitive hashing is an algorithm which finds application in nearest neighbor search, where it is used to identify duplicate or similar documents. It performs probabilistic dimension reduction of data by hashing the input data points into buckets such that nearby data points are likely to be mapped to the same bucket, while data points that are far from each other are likely to be in different buckets. This makes it easier to identify observations with various degrees of similarity.


Logistic regression is a generalized linear model mainly used for binomial regression. The odds of a certain event occurring are modeled by converting the dependent variable into a logit variable; the logit of the probability of success is then fitted to the predictors. This can be used for categorical prediction by setting a cutoff value.


A loss function is a measure of how well a prediction model predicts the expected outcome. It aggregates the difference between actual and predicted outputs over the entire dataset.


LSTM networks are recurrent neural networks composed of LSTM units. An LSTM unit consists of a cell, an input gate, an output gate and a forget gate, where the cell remembers values over arbitrary time intervals and the gates regulate the flow of information into and out of the cell. They are suited for classifying, processing and making predictions based on time series data.


Machine learning is an application of Artificial Intelligence which aims at giving machines the ability to learn or generalize from experience without being explicitly programmed. The basic premise is to build a general model that produces sufficiently accurate predictions on new cases. Machine learning tasks:
➢ Supervised learning: A set of features and their labels is fed to the model and the objective is to learn the mapping between them.
➢ Unsupervised learning: A set of features without any labels is fed to the model and the goal is to learn from the features and discover hidden patterns in the data.
➢ Reinforcement learning: Learning by interacting with an environment in which a goal is to be achieved.


Machine learning pipelines are systems that automate machine learning deployment workflows like data collection, feature engineering, model predictions, and prediction delivery. Machine learning pipelines are fundamental to MLOps practice and usually the first step in an MLOps platform.


Manifold Learning is a non-linear dimensionality reduction technique used to uncover the manifold structure in datasets in order to find a low-dimensional representation of the data. The idea is based on the concept that the dimensionality of many datasets is only artificially high. Some prominent approaches to manifold learning are LLE, Laplacian eigenmaps, LTSA, etc.


The MaxEnt classifier is a discriminative classifier widely used in Natural Language Processing. It is a technique based on the principle of maximum entropy for learning probability distributions from data. As it makes few assumptions about the features, it can be used when no prior information about the data is available. In NLP applications, the document is represented by a sparse array that indicates whether each word exists in the document, and the document is categorized into a given class.


Minimum Entropy Clustering is an iterative algorithm in which the clustering criterion is based on the conditional entropy H(C|x). According to Fano’s inequality, the cluster label C can be estimated with a small probability of error if the conditional entropy is small. The criterion can be generalized by replacing it with Havrda-Charvat’s structural α-entropy.


MAD is the average of the absolute values of all residuals. When the residuals are taken relative to the mean, it gives the average distance between each data point and the mean of the dataset.


It is the sum of the squares of the differences between the predicted and actual values of the target variable over all data points, divided by the number of data points.
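
Written as code, the definition is a one-line average of squared errors. The function name and sample values are assumptions for illustration.

```python
def mean_squared_error(actual, predicted):
    """Average of squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Squared errors are 1, 0 and 4, so the mean is 5 / 3.
mse = mean_squared_error([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
```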


It is the task of recognizing a person using images or video of their eyes, under portable conditions. Using deep neural networks advances the capability of standard recognition and expands its usage into smartphone cameras potentially under different illumination and distance conditions.


Model drift is the degradation of model quality over time. This can occur naturally due to external conditions that change data and their relation to model predictions. Model drift can occur due to concept drift, data drift, feature drift, etc.


A machine learning model lifecycle is the set of processes done by data scientists and MLOps teams to create and use a model for business use. It covers a wider gamut of processes than machine learning pipelines and includes use case identification, data collection, feature cleaning, feature engineering, model design, hyperparameter tuning, deployment and post-deployment monitoring.


Model validation is a set of processes intended to evaluate the goodness of fit of a model; the ultimate goal is to produce an accurate and credible model.


It is a gradient descent algorithm in which the update depends on the gradients of both the current and preceding steps. It accumulates the gradients as an exponentially weighted average.
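
One common way to write this is with a velocity term that accumulates past gradients with exponentially decaying weights. The sketch below uses that formulation. The coefficient names, default values and the example loss (x - 3)^2 are assumptions.

```python
def momentum_descent(grad, x0, learning_rate=0.1, momentum=0.9, steps=500):
    """Gradient descent where each step reuses a decayed sum of past gradients."""
    x, velocity = x0, 0.0
    for _ in range(steps):
        # Older gradients are weighted by increasing powers of `momentum`.
        velocity = momentum * velocity - learning_rate * grad(x)
        x = x + velocity
    return x

# Assumed example loss: (x - 3)^2, whose gradient is 2 * (x - 3).
x_momentum = momentum_descent(lambda x: 2 * (x - 3), x0=0.0)
```

Setting `momentum=0.0` reduces this to plain gradient descent.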


It is a sub-class of depth perception tasks which uses monocular images, i.e. images taken from a single camera. To make this possible, cues like texture variations, defocus, gradients, haze, etc. are captured using Markov Random Field learning algorithms.


These are algorithmic techniques that predict frames in a video making them ideal for video compression and decompression requirements. Deep learning has enhanced motion compensation using frame interpolation techniques generated by convolutional neural networks.


Multidimensional scaling (MDS) is a set of related ordination techniques used to visualize the similarities and dissimilarities in a dataset. It is a form of non-linear dimensionality reduction in which each item in a matrix of item-item similarities is mapped to a low-dimensional space. The major types of MDS algorithms include classical, metric, non-metric and generalized multidimensional scaling.


It is an enhanced version of super-resolution that uses multiple low resolution frames to create an upscaled picture or video. It has the advantage of being more representative of the original data at the cost of needing more images and reference points. It has applications in satellite monitoring, military, and defense.


A Multilayer Perceptron Neural Network is an interconnected web of nodes, called neurons (mathematical functions), and the edges that join them together. A neural network’s main function is to receive a set of inputs, perform progressively complex calculations and then use the output to solve a problem. Neural networks are loosely modeled on the way the human brain processes information, and they build models for complex patterns and prediction problems. The behavior of an artificial neural network is determined by both its weights and its input-output function, which is the activation function. This function can be linear, threshold or sigmoid. A neural network with a differentiable activation function can be trained using backpropagation, which adjusts the weights to reduce the error.


Affordance detection is an object detection subgenre that is used to detect objects from their functional point of view, that is, how they will be used. Multiple affordance expands on this concept to simultaneously detect functional use of multiple objects in images. It has applications in enabling robotic process automation.


The mutual information score between two clusterings is a measure of the similarity between two labelings of the same data, i.e. it measures the dependency between two random variables.


Naive Bayes is a probabilistic classifier. It is based on applying Bayes' theorem with strong independence assumptions among features. Naive Bayes often outperforms more sophisticated classification methods when the dimensionality of the inputs is high. In general-purpose implementations, the user is not tied to one assumption about the variable distribution and can fit the data with various distribution classes. The Naive Bayes classifier can be used for document classification in NLP by setting up either a multinomial model or a Bernoulli model.


It is the task of identifying a moment in a video from a query posed in natural language, for example finding the third strike for a baseball batter with jersey 13 in a match video using a query posed in that phrasing. These complex problems are solved using advanced algorithms like the Moment Aligned Network, which is trained using single-shot architectures.


NLP is concerned with the development of applications that can process and manipulate large amounts of natural language data. It is a component of artificial intelligence. Natural language processing includes speech recognition, natural language understanding, natural language generation and so on.


It keeps track of the previous step’s accumulated gradient and uses it when updating the parameters. It first moves in the direction of the previously accumulated gradient, then computes the gradient at that look-ahead point and makes a correction, which speeds up SGD.


The Neural Gas clustering algorithm is similar to the Self-Organizing Map and can be used for clustering related data based on feature vectors. It is an artificial neural network composed of N neurons which tend to move around abruptly during training, according to the distance of their reference vectors from the input signal, hence the name neural gas. The adaptation step of neural gas can be interpreted as gradient descent on a cost function.


Deep Learning is all about Neural Networks. The structure of a neural network is like that of any other kind of network: there is an interconnected web of nodes, called neurons (mathematical functions), and edges that join them together. A neural network’s main function is to receive a set of inputs, perform progressively complex calculations and then use the output to solve a problem. Neural networks are loosely modeled on the way the human brain processes information, and they build models for complex patterns and prediction problems. Neural networks are used for many different applications.


Normalization is the process of scaling the data to a standardized range. It is usually performed in the data pre-processing phase. Data normalization is the process of re-scaling the attributes to the range 0 to 1. It is useful when the data has varying scales. Data standardization is the process of re-scaling the attributes so that they have mean 0 and standard deviation 1.
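
Both rescalings can be sketched in a few lines. The function names are assumptions, and the standard deviation used here is the population one (dividing by n), which is one of two common conventions.

```python
def min_max_normalize(values):
    """Rescale values linearly to the range 0 to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

data = [2.0, 4.0, 6.0, 8.0]
normalized = min_max_normalize(data)
standardized = standardize(data)
```

Both functions assume the values are not all identical; otherwise the denominator would be zero.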


It is a set of computer vision tasks that require identifying, locating, and labeling objects in images. It has wide applications in real-world vision and is subdivided into categories like 3D object detection, 2D object detection, video object detection, etc. based on problem space.


It is a preprocessing technique used in object detection tasks. It speeds up object detection tasks and avoids extensive sliding window search of images by using image cues like edges and saliency.


Object tracking, an extension of object detection, is the task of tracking an object's motion across video frames. This has wide applications across industries - from better camerawork in sports, safety regulations in factories to speed traps and traffic analysis. Sophisticated deep learning algorithms like DeepSort, ROLO, and MDNet are some of the common approaches to object tracking.


One-hot encoding is the process by which categorical integer features are converted into a sparse matrix of binary indicator columns that machine learning algorithms can work with.
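
A small sketch of the conversion follows. It builds dense Python lists for readability, whereas practical implementations typically store the result as a sparse matrix. The function name is an assumption.

```python
def one_hot(labels):
    """Return the sorted categories and one indicator row per label."""
    categories = sorted(set(labels))
    column = {category: i for i, category in enumerate(categories)}
    rows = []
    for label in labels:
        row = [0] * len(categories)
        row[column[label]] = 1  # exactly one column is set per row
        rows.append(row)
    return categories, rows

categories, encoded = one_hot([2, 0, 1, 2])
```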


One-shot (and by extension k-shot) segmentation techniques are semantic segmentation architectures that use a single reference image (or a few), along with its pixel annotation, to perform segmentation. These approaches are generally more prone to overfitting.


It is the combination of semantic and instance segmentation to power real-world vision systems that are as competent as humans. It is the most resource-exhaustive task in image segmentation and requires specialized metrics for performance measurement that capture predictions for all classes.


It is a linear model used for binary classification. It consists of a simple input-output unit with a step function and a threshold value, and it outputs a single binary value depending on the input and the associated weights.
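
The unit described above can be trained with the classic perceptron learning rule. The sketch below learns the logical AND function. The function names, the learning rate and the number of epochs are illustrative assumptions.

```python
def step(z):
    """Threshold activation: 1 if the weighted sum reaches 0, else 0."""
    return 1 if z >= 0 else 0

def predict(x, weights, bias):
    return step(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_perceptron(samples, labels, learning_rate=1.0, epochs=20):
    weights, bias = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            error = target - predict(x, weights, bias)
            # Weights move only when the prediction is wrong (error != 0).
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias

samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]  # logical AND, which is linearly separable
weights, bias = train_perceptron(samples, labels)
predictions = [predict(x, weights, bias) for x in samples]
```

Because the model is linear, this rule only converges for linearly separable data. A function such as XOR cannot be learned by a single perceptron.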


An algorithm that takes a collection of sentences and extracts all n-gram phrases corresponding to the MaxNGramSize.
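
A minimal sketch of n-gram extraction follows. It assumes that `max_ngram_size` (standing in for MaxNGramSize) is an upper bound, so all phrases of 1 up to that many words are returned, and that tokens are obtained by splitting on whitespace. Both are assumptions, since the entry does not specify them.

```python
def extract_ngrams(sentences, max_ngram_size):
    """Collect every run of 1..max_ngram_size consecutive tokens per sentence."""
    phrases = []
    for sentence in sentences:
        tokens = sentence.split()
        for n in range(1, max_ngram_size + 1):
            for start in range(len(tokens) - n + 1):
                phrases.append(" ".join(tokens[start:start + n]))
    return phrases

phrases = extract_ngrams(["the cat sat"], max_ngram_size=2)
```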


An algorithm used for stemming words for information retrieval. Stemming is the process of reducing derived words to their word stem by automatically removing suffixes. This reduces the size and complexity of the data. The Porter stemmer algorithm is applied sequentially in five phases; in each phase the conditions of the rules are tested, and the suffix is removed according to the rule that fires.


Pose estimation is a machine learning task that identifies the position, boundaries and pose of a person. It has applications in animation, sports, and gaming among other fields. Pose estimation uses techniques like joints/parts (keypoints) detection and deep learning architectures like Hourglass Networks, DeepPose, Convolutional Pose Machines, etc.


It is a subset of image manipulation tasks achieved with deep learning. The aim is to translate the pose of a person from one type to another. The best quality is achieved by conditioning GANs (specific architectures like PG2) on reference images and specified poses. It has applications in fashion, video generation, and film making.


Part-of-speech tagging is an algorithm that marks up each word in a text with a particular part of speech, based on its relationship with adjacent and related words in a corpus.


It is defined as the ratio of the number of correct positive predictions to the total number of positive predictions. It is also known as positive predictive value or PPV. PPV = TP/(TP+FP). Precision reaches its best value at 1.0 and its worst value at 0.0.


Principal component analysis (PCA) is a linear technique used for dimensionality reduction. It linearly transforms the data to a lower-dimensional space, converting a set of correlated variables into a smaller set of uncorrelated variables called principal components, such that the data is reconstructed as well as possible according to a least-squares criterion. It can also be used for data compression and to identify potential clusters in the data.


Probabilistic PCA is a dimensionality reduction technique based on a latent variable model with a linear relationship between the latent and observed variables. The PPCA algorithm is preferred for handling missing data. It can be expressed as the maximum likelihood solution of a probabilistic latent variable model.


Quadratic Discriminant Analysis is a classifier with a quadratic decision surface. QDA models the class-conditional probability density functions as Gaussian distributions. The Gaussian parameters can be estimated using maximum likelihood estimation, and the posterior distribution can then be used to obtain the class of the given data. In QDA, the covariance matrices of the classes need not be identical.


An RBF network uses radial basis functions as activation functions and mainly acts as a function approximator. It can be represented as an approximating function that is the sum of radial basis functions, each associated with a different center and weight coefficient. RBF networks can be trained with a two-step algorithm: the center vectors of the RBF functions are chosen in the first step, and a linear model with coefficients is then fit to the hidden layer's outputs. RBF networks can also be used for time series prediction and control.


Random Forest is an ensemble learning method for classification and regression. It consists of many decision trees, and the output is obtained by taking the majority vote of the individual trees for classification (or their average for regression). The training set is randomly sampled to grow each tree. Bagging, combined with the random selection of features, builds a collection of trees with controlled variance, which improves the stability and accuracy of the algorithm.


Rand Index is a measure of the similarity between two data clusterings. It is defined as the number of pairs of samples on which the two clusterings agree (the pair is in the same cluster in both, or in different clusters in both) divided by the total number of pairs of samples. Its value ranges from 0 to 1, where 0 indicates that the two clusterings do not agree on any pair of points and 1 indicates that the clusterings are exactly the same.
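
The pair-counting definition can be sketched in a few lines. The function name rand_index is illustrative, and the sketch assumes both clusterings are given as equal-length lists of cluster labels with at least two samples.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of sample pairs on which two clusterings agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same samples")
    pairs = list(combinations(range(len(labels_a)), 2))
    agreements = 0
    for i, j in pairs:
        same_in_a = labels_a[i] == labels_a[j]
        same_in_b = labels_b[i] == labels_b[j]
        # A pair agrees if it is together in both or apart in both.
        if same_in_a == same_in_b:
            agreements += 1
    return agreements / len(pairs)


# Same partition with renamed clusters: the index ignores label names.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```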


Random projection is a simple technique for dimensionality reduction that introduces little error. The idea behind random projection is that points in a high-dimensional vector space can be projected to a lower-dimensional space without significantly distorting the distances between the points. For mixtures of Gaussians, where this general reduced dimension is still too high, the dimension can be reduced further. It is therefore a promising dimensionality reduction technique for learning mixtures of Gaussians.


Radial Basis Function is a primary tool for interpolating multidimensional scattered data. An RBF is a real-valued function whose value depends only on the distance from the origin or from a fixed center. RBF interpolation can be represented as an approximating function that is the sum of N radial basis functions, each associated with a different center and weight coefficient. Commonly used radial basis functions are Gaussian, multiquadric, inverse quadratic, inverse multiquadric, polyharmonic spline, and thin plate spline.


It is an enhancement to semantic segmentation - the classification of different entities in an image correctly using fewer resources with greater computational efficiency. This makes it applicable for real-time situations where quick results are needed.


Recall is the ratio of relevant instances that have been retrieved to the total number of relevant instances. In information retrieval, sensitivity is called recall.


Receiver operating characteristic is a graphical representation created by plotting the true positive rate against the false positive rate at different threshold values. It illustrates the performance of a binary classification model at different classification thresholds.
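
The points of such a curve can be computed as sketched below. The sketch assumes 0/1 labels, a score per sample where a higher score means "more positive", and the rule that a sample is predicted positive when its score is at or above the threshold; the function name roc_points is hypothetical.

```python
def roc_points(y_true, scores, thresholds):
    """Return (false positive rate, true positive rate) for each threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = []
    for threshold in thresholds:
        # Count predicted positives that are truly positive / truly negative.
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / negatives, tp / positives))
    return points


print(roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8], [0.0, 0.5, 1.0]))
```

Lowering the threshold moves the point towards (1, 1); raising it moves the point towards (0, 0).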


It extends the family of feed-forward neural networks with connections that pass information across time steps. An RNN draws each vector from a sequence of input vectors and models them one at a time, which allows the network to retain its state while modeling. RNNs can be used to produce predictions for sequential data: the information cycles through a loop, and each decision takes into account both the current input and what was learned from previous inputs.


A recursive autoencoder takes a sequence of representation vectors and reconstructs the input so that a reduced-dimensional representation of that sequence is obtained. Recursive autoencoders can be used to split sentences into segments for NLP.


A recursive neural network is composed of a shared weight matrix and a binary tree structure that allows the network to learn varying sequences of text or images. It can be implemented as a sentence and scene parser.


A recursive neural tensor network is a supervised neural network that computes the supervised objective at each node of the tree. The tensor, a matrix of three or more dimensions, is used to calculate the gradient. Recursive neural tensor networks can be used to break up an image into its component objects and label the objects semantically.


Regression models are used to make predictions from data by learning the relationship between the data and an observed, continuous-valued response. Regression predicts a continuous quantity by approximating a mapping function from the input variables to a continuous output variable. It is commonly used in applications like stock price prediction.


A regression tree is a decision tree for regression. Like other decision trees, regression trees are learned by recursive partitioning. The main problem with regression and classification trees is their high variance. However, they can handle both numerical and categorical data.


Regularization is a technique, controlled by a hyperparameter, that modifies the gradient to reduce overfitting. It is a measure taken against overfitting so that the neural network generalizes well to new inputs.


In LDA and FLD, the small eigenvalues are sensitive to the exact choice of training data. This creates the need for RDA, which regularizes the covariance matrix of each class and allows the covariances of QDA to shrink towards a common covariance as in LDA. The regularization factor α determines the complexity of the model: when α is one, RDA is equivalent to QDA, and when α is zero, it is equivalent to LDA.


Relevance ranking is a method used to order a results list so that the most relevant record is listed first. It finds its application in text indexing and retrieval.


An activation function in which the neuron is activated when the input is above a threshold value; above the threshold, the output has a linear relationship with the input. When the input is below zero, the output is zero. Mathematically expressed as, f(x) = { x (if x > 0), 0 (otherwise) }


The residual sum of squares is a statistical measure of the variance in the dataset that is not explained by the regression model. It can also be defined as a measure of the discrepancy between the regression function and the dataset.


Restricted Boltzmann machines are mainly used in deep learning for feature extraction and dimensionality reduction. An RBM consists of a visible layer and a hidden layer joined by weighted connections, with no connections between units of the same layer. RBMs are used to pretrain layers in large networks, learning to reconstruct the original data from a limited set of samples.


RF-based pose estimation is a subtype of pose estimation task that uses radio and WiFi frequencies, which can travel through walls. This allows it to predict human poses through walls and opaque objects. This pose estimation technique can be as accurate as vision-based techniques and has applications in the military and defense sectors.


Ridge Regression is a technique for analyzing multiple regression data that exhibit multicollinearity, in which one predictor variable can be linearly predicted from the others with considerable accuracy. It regularizes ill-posed problems by adding a degree of bias to the regression estimates, which shrinks the coefficients.
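
To make the effect concrete, here is a small sketch for the two-predictor case with no intercept, solving (X^T X + penalty * I) w = X^T y by hand. The function name, the absence of an intercept, and the use of a single penalty value are simplifying assumptions for illustration.

```python
def ridge_two_predictors(x1, x2, y, penalty):
    """Solve (X^T X + penalty * I) w = X^T y for two predictors, no intercept."""
    a = sum(u * u for u in x1) + penalty
    b = sum(u * v for u, v in zip(x1, x2))
    d = sum(v * v for v in x2) + penalty
    c1 = sum(u * t for u, t in zip(x1, y))
    c2 = sum(v * t for v, t in zip(x2, y))
    # The determinant is zero when the predictors are perfectly
    # collinear and penalty is 0: the unregularized problem is ill-posed.
    det = a * d - b * b
    w1 = (d * c1 - b * c2) / det
    w2 = (a * c2 - b * c1) / det
    return w1, w2


x = [1, 2, 3]
y = [2, 4, 6]
# Two identical predictors: ordinary least squares has no unique solution,
# but a positive penalty yields one, with shrunken coefficients.
print(ridge_two_predictors(x, x, y, penalty=1))
```

With penalty set to 0 and identical predictors, the division fails because the system is singular, which is the ill-posed case that the penalty term repairs.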


RMSprop is a gradient-based optimization algorithm that resolves the issue of diminishing learning rates in AdaGrad. It divides the learning rate by the square root of an exponentially decaying average of squared gradients.
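
A single-parameter sketch of the update is given below. The default values for lr, decay and eps are common choices rather than values taken from this guide, and both function names are hypothetical.

```python
import math


def rmsprop_step(param, grad, avg_sq_grad, lr=0.01, decay=0.9, eps=1e-8):
    """One RMSprop update for a single scalar parameter."""
    # Exponentially decaying average of squared gradients.
    avg_sq_grad = decay * avg_sq_grad + (1 - decay) * grad ** 2
    # Scale the learning rate by the root of that average.
    param = param - lr * grad / (math.sqrt(avg_sq_grad) + eps)
    return param, avg_sq_grad


def minimize_quadratic(start=5.0, steps=2000):
    """Minimize f(x) = x^2 (gradient 2x) with RMSprop."""
    x, avg = start, 0.0
    for _ in range(steps):
        x, avg = rmsprop_step(x, 2 * x, avg)
    return x


print(minimize_quadratic())
```

Because the average decays instead of accumulating, the effective step size does not shrink towards zero over time as it does in AdaGrad.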


Root mean square error (RMSE) is the square root of the mean squared error. It aggregates the magnitudes of the errors associated with predictions into a single measure of predictive power. RMSE is useful when large errors are particularly undesirable.
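
The short sketch below shows the computation and why large errors weigh more: two sets of predictions with the same total absolute error give different RMSE values. The function name rmse is illustrative.

```python
import math


def rmse(y_true, y_pred):
    """Square root of the mean of squared prediction errors."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


actual = [0, 0, 0, 0]
print(rmse(actual, [1, 1, 1, 1]))  # four errors of size 1
print(rmse(actual, [0, 0, 0, 4]))  # one error of size 4
```

Both prediction sets have a total absolute error of 4, but the single large error doubles the RMSE.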


Salient Object Detection is the machine learning task of identifying important elements in an image or video and distinguishing them from natural and background scenes.


Sammon’s mapping is an iterative algorithm used for multidimensional scaling. It projects a high-dimensional space into a low-dimensional space while retaining the structure of the inter-point distances of the high-dimensional space. Sammon’s mapping can be used to project an object isometrically into a lower-dimensional space; where that is not possible, it projects the object down so as to reduce the distortion in the inter-point distances and thereby limit the change in the topology of the object.


Scene segmentation is a specialized application of semantic segmentation that targets background separation and partial segmentation problems. Methods and learnings vary between types of datasets; for example, histopathology presents different challenges than natural scenes or satellite images.


Self-supervised image classification is a paradigm of image classification that allows machines to generate labels automatically for images. It is done by creating relevant pretext tasks that enable the algorithm to transfer this learning and create labels automatically. This is particularly useful for image classification where labelled image data is difficult to find.


A computer vision task that tags and identifies specific portions of images for what they are. For example, in a picture with a man, a cat, and a dog, the goal of this task is to identify each entity correctly. Semantic segmentation is achieved by an advanced application of Image classification.


It is the task of image classification that uses both labelled and unlabelled images. This is the preferred method of learning when there is a dearth of labelled image data. It can use a variety of approaches like generative methods, discriminative methods, similarity-based approaches, etc.


Sensitivity or true positive rate is the ratio of correct positive predictions to the total number of positives. SN = TP/(TP+FN). The best sensitivity is 1.0 and the worst is 0.0. It is a statistical measure of the performance of a binary classification test.


In NLP tasks, the input text has to be divided into sentences. This algorithm is a simple sentence splitter for English that identifies sentence boundaries and returns the text as a list of strings, where each string corresponds to a sentence.


The Sequential Information Bottleneck algorithm is a technique used to cluster co-occurrence data such as text documents versus words. It randomly draws a document from its cluster and finds a new cluster for it by minimizing a merging criterion. The unweighted Jensen-Shannon divergence is the preferred criterion.


It is a special case of normalized radial basis function interpolation, developed for interpolating arbitrarily spaced discrete bivariate data. Shepard interpolation is widely used because it is fast and simple, and performs efficiently for quick and dirty applications.
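
A minimal sketch of the basic form of this interpolation for 2D points follows: each known value is weighted by the inverse of its distance to the query point raised to a power. The power of 2 is a common default rather than something specified here, and the function name is hypothetical.

```python
import math


def shepard_interpolate(points, values, query, power=2):
    """Inverse-distance-weighted estimate at `query` from scattered 2D data."""
    numerator = 0.0
    denominator = 0.0
    for (px, py), value in zip(points, values):
        distance = math.hypot(query[0] - px, query[1] - py)
        if distance == 0.0:
            return value  # the query coincides with a known data point
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    # Dividing by the sum of weights is the "normalized" part.
    return numerator / denominator


known_points = [(0, 0), (2, 0)]
known_values = [0.0, 10.0]
print(shepard_interpolate(known_points, known_values, (1, 0)))
```

No system of equations has to be solved, which is what makes the method fast and simple.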


An activation function which is a special case of the logistic function, in which the effect of extreme values and outliers in the data can be reduced without removing them. The sigmoid function converts independent variables of near-infinite range into simple probabilities between 0 and 1. The function can be expressed as, f(x) = 1/(1+exp(-x))


It is defined as the ratio of signal strength to noise, where signal strength is characterized by the difference in class-conditional means and noise by the sum of the class-conditional standard deviations, i.e., SNR = |μ1 - μ2| / (σ1 + σ2), where μ1 and μ2 are the mean values of the variable in class 1 and class 2, and σ1 and σ2 are their standard deviations. SNR is a feature ranking metric which can be used as a benchmark for feature selection in binary classification. The larger the value of SNR, the better the feature is for classification.
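
The score for a single feature can be sketched as below. The use of the population standard deviation (statistics.pstdev) rather than the sample standard deviation is an assumption, as is the function name.

```python
import statistics


def signal_to_noise(class1_values, class2_values):
    """|mean1 - mean2| / (std1 + std2) for one feature across two classes."""
    mu1 = statistics.mean(class1_values)
    mu2 = statistics.mean(class2_values)
    # Population standard deviation is an assumption made for this sketch.
    sigma1 = statistics.pstdev(class1_values)
    sigma2 = statistics.pstdev(class2_values)
    return abs(mu1 - mu2) / (sigma1 + sigma2)


# Feature values observed in class 1 and in class 2.
print(signal_to_noise([1, 3], [5, 9]))
```

Features can then be ranked by this score, highest first.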


It is a sub-genre of object detection tasks that require identifying smaller objects in large images. Due to its complexity, it requires additional techniques like focal loss, different anchor sizes, or architectures like Feature Pyramid Networks.


Smile recognition systems can help photography applications on smartphones, cameras and in attribute tagging. Advanced methods of smile detection use optical flow tracking of mouth corners, face alignment and horizontal weighting to account for photos of any orientation in real-time.


Softmax function is an activation function which can be applied to continuous data. It can contain multiple decision boundaries and can handle multinomial labeling systems. Softmax function is often used at the output layer of a classifier. It can be represented as, f(x_i) = exp(x_i) / (∑ (j=0 to k) exp(x_j)), i = 0, 1, 2, ..., k
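
A short sketch of the formula follows. Subtracting the largest input before exponentiating is a standard numerical-stability trick that does not change the result; it is an addition made here, not part of the definition above.

```python
import math


def softmax(xs):
    """exp(x_i) / sum_j exp(x_j), computed in a numerically stable way."""
    largest = max(xs)
    # Shifting by the maximum avoids overflow and leaves the ratios unchanged.
    exps = [math.exp(x - largest) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax([1.0, 2.0, 3.0])
print(probabilities, sum(probabilities))
```

The outputs are positive and sum to one, so they can be read as class probabilities at the output layer of a classifier.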


An activation function which is a smooth version of ReLU. It also overcomes the dying ReLU issue by being differentiable everywhere with a non-zero gradient, and it causes less saturation overall. Mathematically expressed as, f(x) = log(1+e^x)
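
The relationship between the two functions can be checked numerically with the sketch below. The softplus expression is rearranged into an algebraically equivalent form to avoid overflow for large inputs; that rearrangement is an implementation choice made here.

```python
import math


def relu(x):
    """f(x) = x if x > 0, otherwise 0."""
    return max(0.0, x)


def softplus(x):
    """f(x) = log(1 + e^x), rewritten as max(x, 0) + log(1 + e^(-|x|))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


for x in (-20.0, -1.0, 0.0, 1.0, 20.0):
    print(x, relu(x), softplus(x))
```

Far from zero the two functions nearly coincide, while around zero softplus bends smoothly and stays strictly positive.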


An activation function for neural networks that serves as an alternative to the hyperbolic tangent. The softsign function converges polynomially, whereas tanh converges exponentially. It can be mathematically expressed as, f(x) = x / (1+|x|)


It is a technique for image classification that uses learned redundant dictionaries and scattered (sparse) representations for the classification of each pixel. This is done by associating a pixel with the patch of information adjoining it, and these patches are used to train the dictionaries.


Specificity (SP) or true negative rate evaluates the performance of binary classification by calculating the ratio of correct negative predictions to the total number of negatives. SP = TN/(TN+FP). Specificity has its best value at 1.0 and its worst at 0.0.


Spectral Clustering is a data clustering algorithm which makes use of the eigenvalues of the similarity matrix of the data to perform dimensionality reduction. The main objective of spectral clustering is to cluster connected data. The algorithm works by obtaining a representation of the data in a low-dimensional space where it can be easily clustered.


In time series data, stationarity means that the statistical features of data like mean, variance, etc. do not change over time. It is usually a good practice to convert non-stationary data to stationary data for the purpose of predictions wherever possible.


This is a subcategory of depth estimation tasks which uses two images, separated by a distance, to calculate depth of objects. It uses disparity maps between images and neural net architectures that combine different inputs and upsampling to create effective solutions.


It is a stochastic approximation of gradient descent that optimizes the loss function using a batch size of one: at each iteration, a single example is selected at random to calculate the gradient.
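
As a sketch of the idea, the loop below fits a line y = w*x + b with squared error, updating the parameters from one randomly chosen example per iteration. The learning rate, step count, seed and function name are arbitrary choices for illustration.

```python
import random


def sgd_fit_line(xs, ys, lr=0.01, steps=5000, seed=0):
    """Fit y = w*x + b by stochastic gradient descent with batch size one."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(steps):
        i = rng.randrange(len(xs))  # pick a single example at random
        error = (w * xs[i] + b) - ys[i]
        # Gradient of the squared error (error ** 2) for that one example.
        w -= lr * 2 * error * xs[i]
        b -= lr * 2 * error
    return w, b


# Points on the line y = 2x + 1.
print(sgd_fit_line([0, 1, 2, 3, 4], [1, 3, 5, 7, 9]))
```

Each update is cheap because it looks at only one example, at the cost of a noisier path towards the minimum than full-batch gradient descent.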


Sum squares ratio can be used as a feature selection criterion for multi-class problems. It is a univariate feature ranking metric, defined as the ratio of the between-groups sum of squares to the within-groups sum of squares.
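
For one feature, the ratio can be sketched as follows, using the usual definitions: the between-groups sum of squares weights each class mean's squared distance from the overall mean by the class size, and the within-groups sum of squares adds up squared distances of samples from their own class mean. The function name is hypothetical.

```python
def sum_squares_ratio(values, labels):
    """Between-groups sum of squares divided by within-groups sum of squares."""
    overall_mean = sum(values) / len(values)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    between = 0.0
    within = 0.0
    for members in groups.values():
        group_mean = sum(members) / len(members)
        between += len(members) * (group_mean - overall_mean) ** 2
        within += sum((v - group_mean) ** 2 for v in members)
    # Note: `within` is zero if every class has identical values.
    return between / within


# One feature measured on six samples from three classes.
print(sum_squares_ratio([1, 3, 5, 7, 9, 11], [0, 0, 1, 1, 2, 2]))
```

A feature whose class means are far apart relative to the spread inside each class receives a high score.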


It is the process of upscaling images to a higher resolution than their original native resolution. Super-resolution uses techniques like image repair, image inpainting, and Generative Adversarial Networks - which are highly effective for super resolution tasks.


Support Vector Machines are algorithms used for classification analysis. An SVM performs linear classification by choosing a hyperplane that separates two classes with the largest margin. It can also perform non-linear classification using the kernel trick, which allows the algorithm to fit the maximum-margin hyperplane in a high-dimensional feature space. The effectiveness of an SVM depends on the selection of the kernel function, its parameters, and the soft margin penalty parameter. A multi-class SVM can be created by reducing the problem to multiple binary classification problems.


Support Vector Regression is a regression method that uses the same principle as SVM for classification. It is an effective tool for estimating real-valued functions by setting a margin of tolerance ε. Like SVM, it uses the kernel trick for implicit mapping, and the effectiveness of the model also depends on ε, the error threshold of the loss function.


Talking Face Generation is the task of creating facial animations that correspond to the syllables uttered in an audio clip. This can be applied in motion pictures to improve voice-over and dubbing technology, in games & animated films for realistic facial expressions. It can be created by disentangling subject and speech information using Adversarial network as part of the training process.


It is an activation function used in deep learning. Tanh is a hyperbolic trigonometric function defined by, f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x)). Tanh has a normalized range between -1 and 1 and can therefore deal with negative numbers.


t-digests are sparse representations of cumulative distribution functions that contain most of the relevant information contained in them. They are serializable, smaller, and programmatically convenient structures for machine learning work.


It is the task of finding a sequence or an action - using either keywords or images - in a long, untrimmed video. This has great potential in keyword-based search of video content - for stock footage, for movies, sports, or consumer convenience in streaming. An effective method to achieve this is to use a chain of 3D Convolutional Neural Networks for identifying candidate segments, a classification network for coarse-tuning and a localization network learner for fine tuning.


Augmenting text data requires more transformative techniques than image data. Common techniques include using synonyms and back translations from various languages to enhance text information. Advanced techniques utilize machine learning algorithms like k-nearest neighbours and cosine similarity to find replacement words, text generation, and contextualized word embeddings.


Text-to-image generation is one of the most advanced applications of deep learning architectures, specifically Generative Adversarial Networks. It is achieved by combining interpolated word embeddings to expand datasets and training images.


Any data of a variable that is collected over a period of time is called time series data. Analysis of time series data generally requires decomposition into trends, seasonality, and residuals in addition to concepts like autocorrelation and stationarity.


It is the breakdown of time series data into a trend (which describes the overarching direction of the data), seasonality (which describes cyclicality of data), and residuals (which describes the remnant of the data which lacks an obvious pattern). Decomposition is used to simplify time series predictions.


Assuming the text has been split into sentences, the tokenizer segments each sentence into individual tokens, i.e., it receives a stream of characters and outputs a stream of tokens.


Transfer Learning is the process of using a model built for a task to build a different model for a more specialized application of the task. This is done by removing the final output layer but retaining the neural structure and weights for previous layers before joining the transferred architecture to the new architecture.


The t-distributed stochastic neighbor embedding is a non-linear dimensionality reduction technique in which data can be embedded into a lower-dimensional (2D or 3D) space, which can then be visualized in a scatter plot. The dimensionality reduction is done in such a way that similar objects are modeled by nearby points and dissimilar objects by distant points. The t-SNE algorithm comprises two stages: it first constructs a probability distribution over pairs of high-dimensional objects and then defines a similar probability distribution in the low-dimensional map.


Unsupervised classification is the process of categorizing images without preset labels for training data. The main approaches for this use unsupervised machine learning algorithms like dimensionality reduction and clustering algorithms.


Vanishing gradients is a common problem encountered when training neural networks. It occurs when the gradients at each step of training become too small and severely reduce the training speed of the network. This can usually be corrected using different structures, initializations, and activation layers.


A variational autoencoder consists of an encoder, a decoder and a loss function, where the encoder and decoder are neural nets and the loss function is the negative log-likelihood with a regularizer. Their variational approach to latent representation learning makes them useful for generative modeling.


Data can be loaded using a data-preprocessing library, which makes building data pipelines easier. It converts the loaded data into a format that neural networks can understand. It is designed to support all major types of input data, whether text, CSV, audio, image or video.


Video denoising tasks require the removal of noise such as color distortions, choppiness, etc. to create better quality output. Video denoising with deep learning is still an emerging field, with new advances like FastDVDnet that allow fast runtimes and handle a wider range of noise levels.


Video deinterlacing is the conversion of interlaced videos into progressive/non-interlaced structure to cut down on flickering. Deep Convolutional Neural Networks can reconstruct missing scanlines from both odd and even frames to create a better picture quality than conventional methods in real-time operations.


It is the computational task of enhancing the frame rate and recovery in video streaming by synthesizing frames in the middle of two frames. Deep learning - especially auto-encoders - makes this process better by making better representative synthesis than is possible using conventional methods.


Video generation is one of the most complex tasks in the deep learning problem space. Usually these tasks are centered around generating video from text descriptions. It utilizes a chain of models in which static and dynamic information is first extracted from text to create images, which are then merged together into videos. Advanced methods use hybrid frameworks that combine Variational auto-encoders with Generative Adversarial Networks to create effective solutions.


Video prediction is the use of Deep Learning methods to predict outcomes of events from video footage. Its applications are in long-term planning, video interpolation, anomaly detection, pedestrian tracking for automated driving, precipitation nowcasting, etc.


Video recognition is a subcategory of video retrieval tasks which require finding the original video using a clip or a shorter duration video. Similar to video retrieval, it uses a combination of tasks like indexing, understanding and clustering on video content to achieve this objective.


It is a set of tasks whose goal is to find and return videos using images, frames or text as search keywords. Video retrieval is achieved by combining tasks like visual, camera motion, and semantic indexing, video understanding, and clustering. It has applications in the media industry like stock footage search, media management, etc.


Upscaling or video superresolution is a more demanding task than its counterpart, image superresolution. It utilizes convolutional networks to combine multiple low-resolution frames and create upscaled video. Advances in this field use inferred high-resolution images to create computationally efficient results. It is used in gaming and streaming devices to upscale native resolutions to higher quality.


Video synchronization, the matching of audio timestamps to video, is one of the emerging applications of deep learning with great potential. Deep convolutional neural networks are used for scene understanding, while algorithms like Dijkstra's shortest path are used for video alignment to solve these challenges.


It is an enhancement of video classification tasks that goes beyond classification to find the semantic context of a video. Applications of video understanding are more advanced. For example, where video classification would classify an action as suspicious or not, video understanding would describe the actions and provide summaries. It is done by chaining together multiple semantic classification tasks and uses advanced architectures like Temporal cycle-consistency learning.


It is a type of video translational task that translates a video in one form to another. This has applications in the fields of animation, movie-making and gaming among others. In addition to the challenges of image-to-image translation, video synthesis requires understanding temporal dynamics to create effective solutions. GANs are one of the most effective methods of solving these tasks.


It is a measure of the distance between two cumulative distribution functions. In machine learning, it is commonly used to detect data drift and is also known as Kantorovich–Rubinstein metric.
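
For one-dimensional data, the distance between the empirical distributions of two samples can be sketched very simply. The version below assumes both samples have the same size, in which case the distance is the mean absolute difference between the sorted values; the function name is hypothetical.

```python
def wasserstein_1d(sample_a, sample_b):
    """Distance between the empirical CDFs of two equal-sized 1D samples."""
    if len(sample_a) != len(sample_b):
        raise ValueError("this sketch only handles samples of equal size")
    sorted_a = sorted(sample_a)
    sorted_b = sorted(sample_b)
    # Match the i-th smallest value of one sample with the i-th of the other.
    return sum(abs(x - y) for x, y in zip(sorted_a, sorted_b)) / len(sorted_a)


reference = [0, 1, 2]   # e.g. a feature at training time
incoming = [3, 4, 5]    # the same feature in new data
print(wasserstein_1d(reference, incoming))
```

A value near zero suggests the two samples come from similar distributions, while a growing value over time is one possible signal of data drift.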


Weakly Supervised Semantic Segmentation is a subcategory of semantic segmentation which utilizes image-level labels instead of point- or pixel-level labels for segmentation. These labels lack positional information, and such models generally perform better on known datasets than on unseen ones.


Weights are coefficients that scale the input signal to a given neuron in the network. A weight can also be defined as the coefficient of a feature in a linear model or as an edge in a deep network. The value of a weight determines how much the feature contributes to the model.


White test is a statistical test to determine the heteroscedasticity of time series data. Using statistical tests is simpler and more effective than plots when working with many levels of time series data.


X-Means clustering is a variation of the K-Means clustering algorithm in which the number of clusters is determined automatically based on the Bayesian Information Criterion (BIC) score. The clustering process starts with a single cluster, which is repeatedly partitioned, keeping the optimal resulting splits until the criterion is reached.


It is the sub-group of object detection tasks which center around identifying objects that have never been encountered in training data. It is done by using visual-semantic embeddings and trying techniques like dense sampling of semantic labels with a huge library of categories.


Zero shot learning is a problem space of machine learning which requires models to classify object classes they have never been trained on, i.e., classes encountered live as the model operates. These methods utilize transfer learning in intelligent ways to design models that are capable of zero-shot learning. They have wide-ranging uses, especially in use cases which lack sufficient training data and expect variation in real life.