Category

Statistical natural language processing


probabilistic model of a natural or formal language, or generally of elements of signal sequences

process of analysing text to extract information from it

glottochronology

Glottochronology (from Attic Greek γλῶττα 'tongue, language' and χρόνος 'time') is the part of lexicostatistics which involves comparative linguistics and deals with the chronological relationship between languages.

In information retrieval, tf–idf (term frequency–inverse document frequency, TF*IDF, TFIDF, TF–IDF, or Tf–idf) is a measure of the importance of a word to a document in a collection or corpus, adjusted for the fact that some words appear more frequently in general. Like the bag-of-words model, it models a document as a multiset of words, without word order. It refines the simple bag-of-words model by allowing the weight of each word to depend on the rest of the corpus.
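The weighting described above can be sketched in a few lines of Python. This is a minimal illustration of one common variant (length-normalised term frequency multiplied by the logarithm of the inverse document frequency), not a reference implementation; the function name `tf_idf` and the toy corpus are assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. Assumed variant:
    tf = count / document length, idf = log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)  # the document as a multiset of words
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, tokenised by whitespace.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
scores = tf_idf(docs)
```

In this toy corpus "the" occurs in every document, so its weight is zero, while "mat" occurs in only one document and receives the highest weight in the first document. Other variants smooth the idf term or scale the term frequency differently.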

statistical machine translation

machine translation paradigm

Natural Language Toolkit

suite for natural language processing (NLP)

stochastic parrot

metaphor to describe the theory that large language models, though able to generate plausible language, do not understand the meaning of the language they process

machine learning based toolkit for the processing of natural language text

latent Dirichlet allocation

generative statistical model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar

In statistical analysis of binary classification and information retrieval systems, the F-score or F-measure is a measure of predictive performance. It is calculated from the precision and recall of the test, where the precision is the number of true positive results divided by the number of all samples predicted to be positive, including those not identified correctly, and the recall is the number of true positive results divided by the number of all samples that should have been identified as positive. Precision is also known as positive predictive value.
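As a rough sketch of the definitions above, precision and recall can be computed from counts of true positives, false positives, and false negatives, and then combined into an F-score. The example assumes the usual F1 form (the harmonic mean of precision and recall) with an optional `beta` weight for the general F-beta form; the function name `f_score` and the counts are hypothetical.

```python
def f_score(tp, fp, fn, beta=1.0):
    """Return (precision, recall, F-beta) from confusion-matrix counts.

    tp: true positives, fp: false positives, fn: false negatives.
    beta=1.0 gives F1, the harmonic mean of precision and recall.
    """
    # Precision: true positives over everything predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: true positives over everything that should have been positive.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0  # avoid dividing by zero
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical counts: 8 correct positives, 2 false alarms, 4 misses.
p, r, f1 = f_score(tp=8, fp=2, fn=4)
```

With these counts the precision is 8/10 and the recall is 8/12; F1 falls between the two, closer to the lower value, which is the characteristic behaviour of a harmonic mean.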

probabilistic context-free grammar

grammar model in linguistics

small language model

language model with relatively few parameters

probabilistic latent semantic analysis

method for analyzing semantic data

interactive machine translation

sub-field of computer-aided translation

Frederick Jelinek

Czech linguist (1932–2010)

statistical machine translation system

additive smoothing

statistical technique for smoothing categorical data

statistical semantics

subfield of computational linguistics and natural language processing