Category
Computational linguistics
machine translation
use of software for language translation
natural language processing
field of computer science and linguistics
computational linguistics
interdisciplinary field
optical character recognition
computer recognition of visual text
speech synthesis
artificial production of human speech
speech recognition
automatic conversion of spoken language into text
text corpus
large and structured set of texts being the basis for linguistic research
Levenshtein distance
computer science metric for string similarity
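As an illustration of how this metric can be computed, here is a minimal Python sketch of the standard dynamic-programming approach. The function name `levenshtein` and the unit cost for every edit are assumptions made for this example, not something specified by the entry above.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # previous[j] is the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Keeping only two rows of the table limits memory use to the length of the second string.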
Zipf's law
probability distribution
text mining
process of analysing text to extract information from it
Hamming distance
number of positions at which two strings of equal length differ
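A short Python sketch of this count, assuming both inputs have the same length; the function name `hamming` is chosen here for illustration only.

```python
def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length inputs")
    return sum(x != y for x, y in zip(a, b))


print(hamming("1011101", "1001001"))  # 2
print(hamming("karolin", "kathrin"))  # 3
```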
Tatoeba
Tatoeba is a free collection of example sentences with translations geared towards foreign language learners. It is available in more than 400 languages. Its name comes from the Japanese phrase tatoeba (例えば), meaning 'for example'. It is written and maintained by a community of volunteers through a model of open collaboration. Individual contributors are known as "Tatoebans". It is run by Association Tatoeba, a French non-profit organization funded through donations.
word-sense disambiguation
problem in natural language processing of identifying which sense of a word with multiple meanings is used in a sentence
neural machine translation
approach to machine translation in which a large neural network is trained to maximize translation performance
parse tree
ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar

WordNet
WordNet is a lexical database that links words through semantic relations including synonyms, hyponyms, and meronyms. The synonyms are grouped into synsets with short definitions and usage examples. It can thus be seen as a combination and extension of a dictionary and a thesaurus. Its primary use is in automatic text analysis and artificial intelligence applications. It was first created in the English language, and the English WordNet database and software tools have been released under a BSD-style license and are freely available for download. The latest offici
n-gram
An n-gram is a sequence of n adjacent symbols in a particular order. The symbols may be n adjacent letters (including punctuation marks and blanks), syllables, or rarely whole words found in a language dataset; or adjacent phonemes extracted from a speech-recording dataset, or adjacent base pairs extracted from a genome. They are collected from a text corpus or speech corpus.
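The extraction described above can be sketched in Python as a sliding window over a sequence of symbols. The helper name `ngrams` is an assumption for this example; it accepts either a string (letter n-grams) or a list of tokens (word n-grams).

```python
def ngrams(symbols, n):
    """Return every run of n adjacent symbols, in order, as tuples."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # The window starts at each index that still leaves room for n symbols.
    return [tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1)]


print(ngrams("text", 2))
# [('t', 'e'), ('e', 'x'), ('x', 't')]
print(ngrams("to be or not".split(), 3))
# [('to', 'be', 'or'), ('be', 'or', 'not')]
```

A sequence shorter than n yields an empty list, since no complete window fits.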
handwriting recognition
ability of a computer to receive and interpret intelligible handwritten input
word embedding
technique in natural language processing that represents words as vectors in a continuous vector space
stemming
In linguistic morphology and information retrieval, stemming is the process of reducing inflected (or sometimes derived) words to their word stem, base or root form—generally a written word form. The stem need not be identical to the morphological root of the word; it is usually sufficient that related words map to the same stem, even if this stem is not in itself a valid root. Algorithms for stemming have been studied in computer science since the 1960s. Many search engines treat words with the same stem as synonyms as a kind of query expansion, a process called conflation.
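To make the idea of conflation concrete, here is a deliberately naive suffix-stripping sketch in Python. The suffix list, the minimum stem length of three letters, and the names `toy_stem` and `conflate` are all hypothetical choices for illustration; this is not one of the published stemming algorithms.

```python
from collections import defaultdict

# Hypothetical suffix list, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def conflate(words):
    """Group words that share the same toy stem."""
    groups = defaultdict(list)
    for word in words:
        groups[toy_stem(word)].append(word)
    return dict(groups)


print(conflate(["jumps", "jumping", "argued", "argues"]))
# {'jump': ['jumps', 'jumping'], 'argu': ['argued', 'argues']}
```

The stem "argu" is not a valid word, which matches the point above: related words only need to map to the same stem.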
lemmatisation
Lemmatization (or less commonly lemmatisation) in linguistics is the process of grouping together the inflected forms of a word so they can be analysed as a single item, identified by the word's lemma, or dictionary form.
automatic summarization
computer-based method for shortening a text
question answering
research area in computer science
named-entity recognition
extraction of named entity mentions in unstructured text into pre-defined categories
part-of-speech tagging
process of identifying the grammatical type of words in a text
mobile translation
machine translation service for handheld devices such as mobile phones
gesture recognition
topic in computer science and language technology concerned with interpreting human gestures
foundation model
artificial intelligence model paradigm
artificial intelligence content detection
algorithms to detect AI-generated content
Google Ngram Viewer
online search engine
Bradford's law
pattern that estimates the exponentially diminishing returns of extending a search for references in science journals

treebank
Most syntactic treebanks annotate variants of either phrase structure or dependency structure.
BabelNet
BabelNet is a multilingual lexical-semantic knowledge graph, ontology and encyclopedic dictionary developed at the NLP group of the Sapienza University of Rome under the supervision of Roberto Navigli. BabelNet was automatically created by linking Wikipedia to the most popular computational lexicon of the English language, WordNet. The integration is done using an automatic mapping and by filling in lexical gaps in resource-poor languages by using statistical machine translation. The result is an encyclopedic dictionary that provides concepts and named entities lexicalized in many languages an
machine-readable dictionary
dictionary stored as machine (computer) data
Universal Networking Language
declarative formal language that represents semantic data in texts
Google Neural Machine Translation
system developed by Google to increase fluency and accuracy in Google Translate
word frequency list
list of words with their frequency
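A minimal Python sketch of building such a list from raw text. The tokenization rule (lowercase runs of letters and apostrophes) and the function name `word_frequencies` are simplifying assumptions for this example.

```python
import re
from collections import Counter


def word_frequencies(text: str):
    """Return (word, count) pairs sorted from most to least frequent."""
    # Assumed tokenization: lowercase runs of ASCII letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common()


print(word_frequencies("The cat and the hat")[0])  # ('the', 2)
```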
language identification
determination of the language of a text sample
Association for Computational Linguistics
learned society and publisher
Zeta distribution
probability distribution on the integers in which the probability of a number is inversely proportional to a fixed power of the number
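Written out, the description above corresponds to the following probability mass function on the positive integers. The symbols k (the integer) and s (the fixed power, with s > 1) are notation chosen for this sketch.

```latex
P(X = k) = \frac{k^{-s}}{\zeta(s)}, \qquad
\zeta(s) = \sum_{n=1}^{\infty} n^{-s}, \qquad
k = 1, 2, 3, \ldots, \quad s > 1
```

The Riemann zeta function \zeta(s) serves only as the normalizing constant that makes the probabilities sum to one.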
trigram
Trigrams are a special case of the n-gram, where n is 3. They are often used in natural language processing for performing statistical analysis of texts and in cryptography for control and use of ciphers and codes. See results of analysis of "Letter Frequencies in the English Language".
distributional semantics
research area in semantic similarities between linguistic items
intelligent character recognition
computer recognition of written text
Lexical Markup Framework
ISO standard for Natural Language Processing (NLP) lexicons and Machine Readable Dictionaries (MRD)
semantic role labeling
process in natural language processing that assigns labels indicating the semantic role of words or phrases in a sentence
semantic similarity
metric in computational linguistics for how close two terms or documents are in meaning
voice activity detection
technique used in speech processing in which the presence or absence of human speech is detected
Culturomics
Culturomics is a form of computational lexicology that studies human behavior and cultural trends through the quantitative analysis of digitized texts. Researchers data mine large digital archives to investigate cultural phenomena reflected in language and word usage. The term is an American neologism first described in a 2010 Science article called Quantitative Analysis of Culture Using Millions of Digitized Books, co-authored by Harvard researchers Jean-Baptiste Michel and Erez Lieberman Aiden.
grammar induction
machine learning process
ground truth
information provided by direct observation

interlingual machine translation
type of machine translation
Heaps' law
empirical law relating the number of distinct words in a document to the document's length
speech-generating device
electronic augmentative and alternative communication device that supplements or replaces speech
FrameNet
FrameNet is a group of online lexical databases based upon the theory of meaning known as Frame semantics, developed by linguist Charles J. Fillmore. The project's fundamental notion is simple: most words' meanings may be best understood in terms of a semantic frame, which is a description of a certain kind of event, connection, or item and its actors.
Zipf–Mandelbrot law
discrete probability distribution
natural-language user interface
type of computer human interface
ROUGE
set of metrics for evaluating automatic summarization and machine translation
Text Retrieval Conference
annual meeting focused on measuring the quality of search engines, recommender engines, and algorithms for text retrieval
Lesk algorithm
classical algorithm for word sense disambiguation
named entity
real-world object, such as a person, location, organization, or product, that can be denoted with a proper name; it can be abstract or have a physical existence