| ACCAMS | 65 | 1 |
| ADADELTA | 149 | 1 |
| Abscissa | 121 | 0 |
| AdaBoost | 194 | 0 |
| Additive clustering | 0 | 0 |
| Additive model | 0 | 1 |
| Adversarial Variational Bayes | 0 | 1 |
| Adversarial autoencoder | 0 | 1 |
| Affine space | 0 | 0 |
| Affinity analysis | 0 | 1 |
| Affinity propagation clustering | 0 | 1 |
| Alternating conditional expectation (ACE) algorithm | 0 | 1 |
| Antonym | 0 | 0 |
| Association rule mining | 0 | 2 |
| Attention Mechanism | 0 | 0 |
| Autocorrelation matrix | 0 | 1 |
| Average pooling | 0 | 0 |
| Backprop | 0 | 0 |
| Backpropagation | 159 | 2 |
| Backpropagation Through Time (BPTT) | 0 | 1 |
| Bayesian Probabilistic Matrix Factorization (BPMF) | 0 | 0 |
| Bayesian optimization | 0 | 0 |
| Bias | 0 | 2 |
| Bidirectional LSTM | 0 | 0 |
| Bidirectional Recurrent Neural Network (BRNN) | 0 | 0 |
| Bilingual Evaluation Understudy (BLEU) | 0 | 0 |
| Binary Tree LSTM | 0 | 1 |
| Black-Box optimization | 0 | 0 |
| Boltzmann machine | 0 | 1 |
| Categorical mixture model | 0 | 0 |
| Child-Sum Tree-LSTM | 0 | 1 |
| Chinese Restaurant Process | 0 | 0 |
| Clustering | 0 | 0 |
| Clustering stability | 0 | 0 |
| Co-clustering | 0 | 0 |
| Collaborative Topic Regression (CTR) | 0 | 0 |
| Collaborative filtering | 0 | 0 |
| Community detection | 134 | 0 |
| Community structure | 0 | 1 |
| Conditional GAN | 0 | 0 |
| Conditional Markov Models (CMMs) | 0 | 0 |
| Conditional Random Fields (CRFs) | 0 | 1 |
| Confusion matrix | 0 | 0 |
| Connectionism | 0 | 1 |
| Constituency Tree-LSTM | 0 | 1 |
| Contextual Bandit | 0 | 2 |
| Continuous-Bag-of-Words (CBOW) | 190 | 0 |
| Contractive autoencoder (CAE) | 0 | 0 |
| Convex optimization | 0 | 0 |
| Convolutional Neural Networks (CNN) | 0 | 3 |
| Cosine similarity | 0 | 0 |
| Covariance | 0 | 0 |
| Covariate shift | 0 | 1 |
| Cross-Entropy loss | 0 | 0 |
| Decision tree | 184 | 1 |
| Deep Learning | 148 | 1 |
| Denoising autoencoder | 0 | 0 |
| Dependency Tree LSTM | 0 | 1 |
| Derivative-free optimization | 0 | 2 |
| Differential Evolution (DE) | 0 | 1 |
| Differential Topic Modeling | 0 | 0 |
| Dirichlet process | 0 | 0 |
| Dirichlet-multinomial distribution | 0 | 1 |
| Domain adaptation | 0 | 1 |
| Dynamic k-Max Pooling | 0 | 1 |
| Early stopping | 0 | 0 |
| Error-Correcting Tournaments | 0 | 1 |
| Expectation | 0 | 0 |
| Expectation-maximization (EM) algorithm | 0 | 1 |
| Exploding gradient problem | 0 | 0 |
| Exponential Linear Unit (ELU) | 0 | 2 |
| Fast Fourier transform (FFT) | 0 | 1 |
| Fast R-CNN | 0 | 1 |
| Feature learning | 0 | 0 |
| Finite-state transducer (FST) | 0 | 1 |
| Gap statistic | 0 | 1 |
| Gaussian mixture model (GMM) | 0 | 1 |
| Generalized additive model (GAM) | 0 | 1 |
| Gibbs sampling | 0 | 0 |
| GloVe (Global Vectors) embeddings | 91 | 2 |
| Global Average Pooling (GAP) | 0 | 1 |
| GoogLeNet | 0 | 0 |
| Gradient Clipping | 0 | 2 |
| Graph | 0 | 0 |
| Graph Neural Network | 0 | 1 |
| Grid search | 0 | 0 |
| Hamming distance | 0 | 0 |
| Helvetica scenario | 0 | 0 |
| Hessian matrix | 0 | 0 |
| Hessian-free optimization | 0 | 3 |
| Hidden Markov Models (HMMs) | 0 | 0 |
| Hierarchical Dirichlet process (HDP) | 0 | 0 |
| Hierarchical Latent Dirichlet allocation (hLDA) | 0 | 0 |
| Hierarchical Softmax | 0 | 0 |
| Hypergraph | 194 | 3 |
| Hypernetwork | 0 | 1 |
| Hypernym | 0 | 0 |
| Hyperparameter | 0 | 0 |
| Hyponym | 0 | 0 |
| Identity mapping | 0 | 0 |
| Importance sampling | 0 | 0 |
| Inception | 152 | 7 |
| Indian Buffet Process | 0 | 1 |
| Jacobian matrix | 0 | 0 |
| K-Means clustering | 0 | 0 |
| Kernel (convolution) | 0 | 0 |
| Kullback-Leibler (KL) divergence | 0 | 0 |
| Laplacian matrix | 0 | 0 |
| Latent Dirichlet allocation (LDA) | 0 | 0 |
| Latent Semantic Indexing (LSI) | 0 | 3 |
| Latent semantic analysis (LSA) | 0 | 0 |
| Learning To Rank (LTR) | 123 | 2 |
| Learning rate | 0 | 0 |
| Learning rate annealing | 0 | 0 |
| Learning rate decay | 0 | 2 |
| Lexeme | 0 | 0 |
| Likelihood | 195 | 0 |
| Linear discriminant analysis (LDA) | 0 | 0 |
| Loss function | 0 | 0 |
| Market basket analysis | 0 | 0 |
| Markov Chain Monte Carlo (MCMC) | 0 | 0 |
| Max Pooling | 0 | 0 |
| Max-margin loss | 0 | 0 |
| Maximum A Posteriori (MAP) Estimation | 0 | 0 |
| Maximum Likelihood Estimation (MLE) | 0 | 0 |
| Maxout | 0 | 1 |
| Mention-pair coreference model | 0 | 1 |
| Mention-ranking coreference model | 0 | 1 |
| Meronym | 0 | 0 |
| Meta learning | 0 | 1 |
| Mini-Batching | 0 | 0 |
| Minibatch Gradient Descent | 0 | 0 |
| Minimal matching distance | 0 | 0 |
| Minimum description length (MDL) principle | 0 | 1 |
| Mixed-membership model | 0 | 0 |
| Model averaging | 0 | 0 |
| Model compression | 0 | 0 |
| Moore-Penrose Pseudoinverse | 0 | 0 |
| Multi-Armed Bandit | 0 | 0 |
| Multidimensional recurrent neural network (MDRNN) | 0 | 2 |
| Multilayer LSTM | 0 | 0 |
| Multinomial distribution | 0 | 0 |
| Mutual information | 0 | 1 |
| N-ary Tree LSTM | 0 | 1 |
| Named Entity Recognition (NER) | 0 | 1 |
| Narrow convolution | 0 | 0 |
| Natural Language Processing | 0 | 0 |
| Negative Log Likelihood | 0 | 2 |
| Negative Sampling | 0 | 0 |
| Nested Chinese Restaurant Process | 0 | 0 |
| Neural network | 0 | 0 |
| No Free Lunch (NFL) theorem | 137 | 2 |
| Nonparametric | 0 | 1 |
| Nonparametric clustering | 0 | 0 |
| Nonparametric regression | 0 | 1 |
| Object detection | 0 | 0 |
| One-dimensional convolution | 0 | 0 |
| Optimization | 0 | 0 |
| PageRank | 0 | 1 |
| Parameter sharing | 0 | 0 |
| Parametric clustering | 0 | 0 |
| Passive-Aggressive Algorithm | 0 | 2 |
| Pertainym | 0 | 0 |
| Pitman-Yor Topic Modeling (PYTM) | 0 | 0 |
| Pixel Recurrent Neural Network | 0 | 1 |
| Point Estimator | 90 | 0 |
| Pointwise Mutual Information (PMI) | 0 | 0 |
| Poisson Additive Co-Clustering (PACO) | 0 | 1 |
| Policy Gradient | 0 | 1 |
| Polysemy | 0 | 0 |
| Positive Pointwise Mutual Information (PPMI) | 0 | 0 |
| Principal Component Analysis (PCA) | 0 | 0 |
| Probabilistic Latent Semantic Indexing (PLSI) | 0 | 0 |
| Probabilistic Matrix Factorization (PMF) | 0 | 0 |
| Pólya urn model | 0 | 1 |
| Q-learning | 0 | 0 |
| R-CNN | 0 | 2 |
| REINFORCE Policy Gradient Algorithm | 0 | 1 |
| RMSProp | 0 | 0 |
| Rand Index | 0 | 0 |
| Random Forest (RF) | 0 | 0 |
| Random optimization | 0 | 1 |
| Random search | 0 | 0 |
| Receiver Operating Characteristic (ROC) | 0 | 0 |
| Recurrent Neural Network Language Model (RNNLM) | 111 | 1 |
| Recursive Neural Network | 0 | 0 |
| Regression based latent factors (RLFM) | 0 | 0 |
| Regularization | 0 | 0 |
| Reparameterization trick | 0 | 3 |
| Representation learning | 0 | 3 |
| Second-order information | 191 | 0 |
| Sequential Model-Based Optimization (SMBO) | 0 | 1 |
| Sequential pattern mining | 0 | 1 |
| Singular Value Decomposition (SVD) | 0 | 0 |
| Skip-Gram | 0 | 0 |
| Smooth support vector machine (SSVM) | 0 | 1 |
| Sparse autoencoder | 0 | 0 |
| Spearman's Rank Correlation Coefficient | 0 | 0 |
| Stacked autoencoder | 0 | 0 |
| Standard deviation | 0 | 1 |
| Stochastic Gradient Descent (SGD) | 0 | 0 |
| Stochastic Gradient Variational Bayes (SGVB) | 0 | 0 |
| Stochastic Optimization | 0 | 0 |
| Stochastic block model (SBM) | 120 | 0 |
| Stochastic convex hull (SCH) | 0 | 4 |
| Stride (convolution) | 124 | 0 |
| Structured Bayesian optimization (SBO) | 0 | 1 |
| Structured learning | 0 | 2 |
| Tabu Search | 0 | 2 |
| Temporal Generative Adversarial Network (TGAN) | 0 | 1 |
| Temporal classification | 0 | 0 |
| TextRank | 139 | 1 |
| Textual entailment | 0 | 0 |
| Time-delayed neural network | 0 | 0 |
| Time-delayed signal | 0 | 0 |
| Transduction | 176 | 1 |
| Triplet loss function | 0 | 0 |
| Troponym | 0 | 0 |
| Trust Region Policy Optimization (TRPO) | 0 | 1 |
| Underfitting | 0 | 0 |
| Unsupervised learning | 146 | 1 |
| Vanishing gradient problem | 0 | 0 |
| Variation of Information distance | 0 | 1 |
| Variational Autoencoder (VAE) | 0 | 0 |
| Weighted finite-state transducer (WFST) | 0 | 1 |
| Wide convolution | 0 | 0 |
| Wronskian matrix | 0 | 0 |
| YOLO9000 (object detection algorithm) | 0 | 0 |
| YOLOv2 (object detection algorithm) | 0 | 2 |
| k-Max Pooling | 0 | 1 |