Category
Statistical natural language processing
language model
probabilistic model of a natural or formal language or, more generally, of sequences of signal elements
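One of the simplest language models is a bigram model, which estimates the probability of each word given only the previous word. The sketch below is a minimal illustration under stated assumptions. It uses maximum-likelihood estimates from a toy corpus. It uses hypothetical `<s>`/`</s>` sentence-boundary markers and applies no smoothing, so any unseen bigram gets probability 0. The function names are illustrative, not from any particular library.

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Estimate P(w_i | w_{i-1}) by maximum likelihood from tokenised sentences."""
    context_counts = Counter()
    bigram_counts = Counter()
    for tokens in sentences:
        # Hypothetical boundary markers so the first and last words are modelled too.
        padded = ["<s>"] + tokens + ["</s>"]
        context_counts.update(padded[:-1])  # every token that can precede another
        bigram_counts.update(zip(padded[:-1], padded[1:]))
    return {bg: n / context_counts[bg[0]] for bg, n in bigram_counts.items()}

def sentence_probability(model, tokens):
    """Multiply bigram probabilities; unseen bigrams get 0 because there is no smoothing."""
    padded = ["<s>"] + tokens + ["</s>"]
    prob = 1.0
    for bigram in zip(padded[:-1], padded[1:]):
        prob *= model.get(bigram, 0.0)
    return prob

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
model = train_bigram_lm(corpus)
print(sentence_probability(model, ["the", "cat", "sat"]))
```

In this toy corpus, "the" is followed by "cat" in one of its two occurrences, so the sentence gets probability 0.5. A word that never appeared drives the product to zero, which is the problem that smoothing techniques such as additive smoothing address.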
text mining
process of analysing text to extract information from it
glottochronology
Glottochronology (from Attic Greek γλῶττα 'tongue, language' and χρόνος 'time') is the part of lexicostatistics that involves comparative linguistics and deals with the chronological relationship between languages.
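The classic Swadesh-style glottochronological estimate assumes that each language keeps a constant fraction r of a core word list per millennium. Under that assumption, two related languages sharing a fraction c of cognates split about t = ln(c) / (2 ln(r)) millennia ago. The factor 2 appears because both lineages lose words independently. The sketch below simply evaluates that formula. The function name is hypothetical, and the retention rate used in the example is an illustrative assumption, not a measured value.

```python
import math

def divergence_time(shared_cognates, retention_rate):
    """Estimated time since two languages split, in millennia.

    t = ln(c) / (2 * ln(r)), where
    c = fraction of the word list still cognate in both languages,
    r = assumed fraction of the list each language retains per millennium.
    """
    if not (0 < shared_cognates <= 1 and 0 < retention_rate < 1):
        raise ValueError("shared_cognates must be in (0, 1] and retention_rate in (0, 1)")
    return math.log(shared_cognates) / (2 * math.log(retention_rate))

# Illustrative retention rate (assumption): 0.86 per millennium.
# If each language kept 86% of the list, they share 0.86 * 0.86 after one millennium.
print(divergence_time(0.86 * 0.86, 0.86))
```

The example recovers a divergence of about one millennium, which is a quick sanity check that the formula inverts the constant-retention model.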

tf–idf
In information retrieval, tf–idf (term frequency–inverse document frequency, TF*IDF, TFIDF, TF–IDF, or Tf–idf) is a measure of the importance of a word to a document in a collection or corpus, adjusted for the fact that some words appear more frequently in general. Like the bag-of-words model, it models a document as a multiset of words, without word order. It refines the simple bag-of-words model by allowing the weight of each word to depend on the rest of the corpus.
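Several tf and idf variants exist. The minimal sketch below assumes one common choice. The term frequency is the term's count divided by the document length, and the inverse document frequency is log(N / df), where N is the number of documents and df is the number of documents containing the term. A word that appears in every document therefore gets weight 0. The function name is illustrative.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per tokenised document.

    tf  = count of term in document / number of tokens in document
    idf = log(N / number of documents containing the term)
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # a document counts at most once per term
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs".split(),
]
weights = tf_idf(docs)
```

In the first document, "the" occurs twice but also appears in another document, while "cat" occurs once and nowhere else. With these settings, "cat" ends up with the higher weight, which illustrates how idf discounts words that are common across the corpus.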
statistical machine translation
machine translation paradigm
Natural Language Toolkit
suite for natural language processing (NLP)
stochastic parrot
metaphor to describe the theory that large language models, though able to generate plausible language, do not understand the meaning of the language they process
topic model
statistical model for discovering the abstract topics that occur in a collection of documents
Apache OpenNLP
machine-learning-based toolkit for the processing of natural language text
latent Dirichlet allocation
generative statistical model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar
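Because latent Dirichlet allocation is a generative model, it can be described by the process it assumes produced the data. In that process, each topic is a distribution over words, each document is a mixture of topics, and each word is drawn by first picking a topic and then picking a word from that topic. The sketch below only samples a synthetic corpus from that generative story. It does not perform inference, which in practice is done with methods such as Gibbs sampling or variational Bayes. The symmetric hyperparameters `alpha` and `beta`, the vocabulary, and the function names are illustrative assumptions.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalising independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_corpus(n_docs, doc_length, vocab, n_topics, alpha=0.5, beta=0.1, seed=0):
    """Sample documents from LDA's generative story.

    1. For each topic k, draw word probabilities phi_k ~ Dirichlet(beta).
    2. For each document, draw topic proportions theta ~ Dirichlet(alpha).
    3. For each word slot, draw topic z ~ Categorical(theta), then word ~ Categorical(phi_z).
    """
    rng = random.Random(seed)
    topics = [sample_dirichlet([beta] * len(vocab), rng) for _ in range(n_topics)]
    corpus = []
    for _ in range(n_docs):
        theta = sample_dirichlet([alpha] * n_topics, rng)
        words = []
        for _ in range(doc_length):
            z = rng.choices(range(n_topics), weights=theta)[0]
            words.append(rng.choices(vocab, weights=topics[z])[0])
        corpus.append(words)
    return corpus

vocab = ["gene", "dna", "cell", "ball", "goal", "team"]
corpus = generate_corpus(n_docs=3, doc_length=5, vocab=vocab, n_topics=2)
```

Fitting LDA runs this story in reverse. Given only the observed words, an inference method estimates the hidden topics, per-document proportions, and per-word topic assignments.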
F1 score
[Figure: Precision and recall]
In statistical analysis of binary classification and information retrieval systems, the F-score or F-measure is a measure of predictive performance. It is calculated from the precision and recall of the test. Precision is the number of true positive results divided by the number of all samples predicted to be positive, including those not identified correctly. Recall is the number of true positive results divided by the number of all samples that should have been identified as positive. Precision is also known as positive predictive value, and recall is also known as sensitivity in diagnostic binary classification. The F1 score is the harmonic mean of precision and recall.
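The precision and recall definitions above translate directly into code. The sketch below computes them from confusion-matrix counts and combines them with the general F-beta formula, (1 + β²)·P·R / (β²·P + R). Setting β = 1 gives the F1 score, the harmonic mean. Returning 0 when precision and recall are both zero is an assumed convention, and the function name is illustrative.

```python
def f_beta(tp, fp, fn, beta=1.0):
    """F-beta score from true positive, false positive and false negative counts.

    beta=1 gives F1, the harmonic mean of precision and recall.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0  # assumed convention when nothing is correctly retrieved
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(f_beta(tp=8, fp=2, fn=4))
```

Here precision is 0.8 and recall is 8/12. The F1 value equals the equivalent count form 2·TP / (2·TP + FP + FN) = 16/22.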
probabilistic context-free grammar
grammar model in linguistics: a context-free grammar in which each production rule is assigned a probability
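In a probabilistic context-free grammar, the probabilities of all rules that share a left-hand side sum to 1. The probability of a parse tree is the product of the probabilities of the rules used to build it. The sketch below computes that product for a hypothetical toy grammar and tree. The nested-tuple tree encoding and all names are assumptions made for illustration.

```python
# Hypothetical toy grammar: (left-hand side, right-hand side) -> P(rhs | lhs).
GRAMMAR = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("she",)): 0.4,
    ("NP", ("fish",)): 0.6,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("V",)): 0.3,
    ("V", ("eats",)): 1.0,
}

def tree_probability(tree, grammar):
    """Product of the probabilities of every rule used in the tree.

    A tree is a terminal string or a tuple (label, child1, child2, ...).
    Rules missing from the grammar get probability 0.
    """
    if isinstance(tree, str):
        return 1.0  # a terminal adds no rule of its own
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    prob = grammar.get((label, rhs), 0.0)
    for child in children:
        prob *= tree_probability(child, grammar)
    return prob

tree = ("S", ("NP", "she"), ("VP", ("V", "eats"), ("NP", "fish")))
print(tree_probability(tree, GRAMMAR))
```

The tree uses S → NP VP (1.0), NP → she (0.4), VP → V NP (0.7), V → eats (1.0) and NP → fish (0.6), so its probability is 0.4 × 0.7 × 0.6 = 0.168. A parser for such a grammar compares these products to choose the most probable tree for an ambiguous sentence.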
small language model
language model with relatively few parameters
probabilistic latent semantic analysis
statistical technique for the analysis of two-mode and co-occurrence data, such as term–document counts

interactive machine translation
sub-field of computer-aided translation
Frederick Jelinek
Czech-American researcher in information theory, speech recognition and natural language processing (1932–2010)
Moses
statistical machine translation system
additive smoothing
statistical technique for smoothing categorical data
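Additive smoothing adds a pseudocount α to every category before normalising. For category i with count xᵢ, out of N total observations over d categories, the estimate is (xᵢ + α) / (N + α·d). As a result, categories never seen in the data still receive nonzero probability. With α = 1 this is Laplace (add-one) smoothing, and a general α is sometimes called Lidstone smoothing. The minimal sketch below assumes a fixed, known vocabulary and ignores observations outside it. The function name is illustrative.

```python
from collections import Counter

def additive_smoothing(counts, vocabulary, alpha=1.0):
    """Smoothed estimate (x_i + alpha) / (N + alpha * d) for each category in vocabulary.

    counts: mapping from category to observed count (missing keys count as 0).
    Only counts for categories in vocabulary contribute to N.
    """
    total = sum(counts.get(w, 0) for w in vocabulary)
    denom = total + alpha * len(vocabulary)
    return {w: (counts.get(w, 0) + alpha) / denom for w in vocabulary}

counts = Counter("a a a b".split())
probs = additive_smoothing(counts, vocabulary=["a", "b", "c"], alpha=1.0)
```

With add-one smoothing, the unseen category "c" gets 1/7 instead of 0, and "a" drops from 3/4 to 4/7. Smaller values of α move the estimates back toward the raw relative frequencies.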
statistical semantics
subfield of computational linguistics and natural language processing