classic compilation of basic concepts for the purposes of historical-comparative linguistics
A Swadesh list is a carefully selected set of basic, everyday words (like numbers, body parts, and common verbs) that linguists use to compare languages and trace their historical relationships. It matters because these fundamental words are less likely to change over time or be borrowed from other languages, making them reliable tools for determining whether languages share a common ancestor and how long ago they may have diverged.
AI-generated from the Wikipedia summary — may contain errors.
A Swadesh list (/ˈswɑːdɛʃ/) is a compilation of tentatively universal concepts for the purposes of lexicostatistics. That is, it is a list of concepts for which every language is expected to have a term, such as star, hand, water, kill, and sleep. The number of such concepts is small: a few hundred at most, or possibly fewer than a hundred. The inclusion or exclusion of many terms is debated among linguists, so several different lists exist, and some authors refer to "Swadesh lists" in the plural. The list is named after the linguist Morris Swadesh.
Translating a Swadesh list into a set of languages allows researchers to quantify how closely those languages are related. Swadesh lists are used in lexicostatistics (the quantitative assessment of the genealogical relatedness of languages) and glottochronology (the dating of language divergence). For instance, because both languages are expected to have a word for every item on the list, the terms can be compared pairwise between two languages to assess whether the languages are related and how closely, and the result can inform further comparison. (Lexicostatistics in practice is considerably more complicated, and sets of languages, rather than single pairs, are usually compared.)
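The pairwise comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of how any particular study proceeds: the cognate codings are invented placeholders (real cognate judgments come from expert analysis), the language names `lang_a` and `lang_b` are hypothetical, and the dating step uses the classic glottochronological formula t = ln(c) / (2 ln(r)), where the retention rate r per millennium is an assumed constant that the caller must supply.

```python
from math import log

# Hypothetical cognate codings (illustrative placeholders, not real data).
# For each concept, each language's word is assigned a cognate-class label;
# two words are treated as cognate when their labels are equal.
COGNATE_CLASSES = {
    "star":  {"lang_a": 1, "lang_b": 1},
    "hand":  {"lang_a": 1, "lang_b": 2},
    "water": {"lang_a": 1, "lang_b": 1},
    "kill":  {"lang_a": 1, "lang_b": 2},
    "sleep": {"lang_a": 1, "lang_b": 1},
}


def shared_cognate_fraction(codings, lang_x, lang_y):
    """Return the fraction of comparable concepts whose words are cognate."""
    # Only concepts attested in both languages can be compared.
    comparable = [c for c in codings.values() if lang_x in c and lang_y in c]
    if not comparable:
        raise ValueError("no concepts are attested in both languages")
    shared = sum(1 for c in comparable if c[lang_x] == c[lang_y])
    return shared / len(comparable)


def divergence_time_millennia(fraction, retention_rate):
    """Estimate time since divergence as ln(c) / (2 ln(r)), in millennia.

    retention_rate is an assumption supplied by the caller: the share of
    list items a language is presumed to keep per millennium.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not 0 < retention_rate < 1:
        raise ValueError("retention_rate must be in (0, 1)")
    return log(fraction) / (2 * log(retention_rate))


if __name__ == "__main__":
    c = shared_cognate_fraction(COGNATE_CLASSES, "lang_a", "lang_b")
    print(f"shared cognates: {c:.0%}")  # 3 of 5 concepts match
    # 0.86 is a figure often quoted for 100-item lists; treat it as an
    # assumption rather than an established constant.
    print(f"estimated split: {divergence_time_millennia(c, 0.86):.2f} millennia ago")
```

Keeping the retention rate as an explicit parameter makes the dependence of the date on that assumption visible. A five-item list is far too short for a meaningful estimate and is used here only to keep the example readable.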