Category

Language modeling

language model
probabilistic model of a natural or formal language, or generally of elements of signal sequences
hallucination
confident unjustified claim by an AI
word embedding
technique in natural language processing that represents words as vectors in a continuous vector space
n-gram
An n-gram is a sequence of n adjacent symbols in a particular order. The symbols may be n adjacent letters (including punctuation marks and blanks), syllables, or rarely whole words found in a language dataset; or adjacent phonemes extracted from a speech-recording dataset, or adjacent base pairs extracted from a genome. They are collected from a text corpus or speech corpus.
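As an illustration of the definition above, the following minimal sketch collects n-grams from a sequence of symbols. The function name `ngrams` and the sample text are hypothetical choices made for this example, not part of any particular library.

```python
def ngrams(symbols, n):
    """Return every run of n adjacent symbols, in their original order.

    `symbols` may be a string (letters, blanks, punctuation) or a list of
    tokens such as words or phonemes. The name `ngrams` is illustrative only.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Slide a window of width n across the sequence, one position at a time.
    return [tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1)]


# Letter bigrams, where the blank counts as a symbol.
char_bigrams = ngrams("to be", 2)

# Word trigrams from a whitespace-tokenized sentence.
word_trigrams = ngrams("to be or not to be".split(), 3)
```

A sequence shorter than n yields an empty list, since no window of that width fits.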
reinforcement learning from human feedback
variant of reinforcement learning
foundation model
artificial intelligence model paradigm
perplexity
In information theory, perplexity is a measure of uncertainty for a discrete probability distribution. The perplexity of a fair coin toss is 2, and that of a fair die roll is 6; and generally, for a probability distribution with exactly N outcomes each having a probability of exactly 1/N, the perplexity is simply N. But perplexity can also be applied to unfair dice, and to other non-uniform probability distributions. It can be defined as the exponentiation of the information entropy. The larger the perplexity, the less likely it is that an observer can guess the value which will be drawn from the distribution.
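The definition of perplexity as the exponentiation of the entropy can be sketched in a few lines. This is a minimal illustration that assumes entropy is measured in bits (base 2); the function name `perplexity` and the example distributions are chosen here for demonstration.

```python
import math


def perplexity(probs):
    """Perplexity of a discrete distribution given as a list of probabilities.

    Computed as 2 ** H, where H is the Shannon entropy in bits.
    Zero-probability outcomes contribute nothing to the entropy.
    """
    entropy_bits = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2 ** entropy_bits


fair_coin = perplexity([0.5, 0.5])      # uniform over 2 outcomes
fair_die = perplexity([1 / 6] * 6)      # uniform over 6 outcomes
unfair_coin = perplexity([0.9, 0.1])    # non-uniform, so below 2
```

For a uniform distribution over N outcomes the result equals N up to floating-point rounding, and any non-uniform distribution over the same outcomes gives a smaller value.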
probabilistic context-free grammar
Grammar model in linguistics
small language model
language model with relatively few parameters
probabilistic latent semantic analysis
Method for analyzing semantic data
EleutherAI
EleutherAI is a non-profit artificial intelligence (AI) research group. The group, considered an open-source version of OpenAI, was formed in a Discord server in 2020 to create an open-source version of GPT-3. In early 2023, it formally incorporated as the EleutherAI Institute, a non-profit research institute. As of 2025, the nonprofit maintains widely used training datasets, conducts research, and is involved in public policy, among other activities.