language family of Continental Southeast Asia
Austroasiatic is a language family spoken across Continental Southeast Asia that includes languages like Khmer (spoken in Cambodia) and Vietnamese. It matters because it represents one of the major language groups in the region and helps linguists understand the historical connections and migrations of the millions of people who speak these languages today.
AI-generated from the Wikipedia summary — may contain errors.
The Austroasiatic languages (/ˌɒstroʊ.eɪʒiˈætɪk, ˌɔː-/ OSS-troh-ay-zhee-AT-ik, AWSS-) are a large language family spoken throughout Mainland Southeast Asia, South Asia, and East Asia. These languages are natively spoken by the majority of the population in Vietnam and Cambodia, and by minority populations scattered throughout parts of Thailand, Laos, India, Myanmar, Malaysia, Bangladesh, Nepal, and southern China. Approximately 117 million people speak an Austroasiatic language, of which more than two-thirds are Vietnamese speakers. Among the Austroasiatic languages, only Vietnamese, Khmer, and Mon have lengthy, established presences in the written, historical record. Only two are presently considered to be the national languages of sovereign states: Vietnamese in Vietnam, and Khmer in Cambodia. The Mon language is a recognized indigenous language in Myanmar and Thailand, while the Wa language is a "recognized national language" in the de facto autonomous Wa State within Myanmar. Santali is one of the 22 scheduled languages of India. The remainder of the family's languages are spoken by minority groups and have no official status.
Ethnologue identifies 168 Austroasiatic languages. These form thirteen established families (plus perhaps Shompen, which is poorly attested, as a fourteenth) that have traditionally been grouped into two, as Mon–Khmer, and Munda. However, one recent classification posits three groups (Munda, Mon-Khmer, and Khasi–Khmuic), while another has abandoned Mon–Khmer as a taxon altogether, making it synonymous with the larger family.
Discovered by embedding cosine similarity (sentence-transformers MiniLM, 384-dim).