System using a prescribed set of digital values to represent textual characters
Character encoding is a system that assigns specific digital values to letters, numbers, and symbols so that computers can store and display text. It matters because different encoding systems can represent different sets of characters, and if the wrong encoding is used to read text, the characters may appear garbled or incorrect.
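To illustrate the garbling described above, the following Python sketch encodes a string as UTF-8 and then decodes the same bytes twice: once with the encoding that wrote them and once with a different one. UTF-8 and Latin-1 are chosen only as examples of two common encodings, and the word "café" is arbitrary.

```python
# A minimal sketch: the same bytes read back with the right and the wrong encoding.
text = "café"

# UTF-8 stores "é" (code point U+00E9) as the two bytes 0xC3 0xA9.
data = text.encode("utf-8")
print(data)  # b'caf\xc3\xa9'

# Decoding with the encoding that produced the bytes recovers the original text.
print(data.decode("utf-8"))  # café

# Latin-1 maps each byte to one character, so 0xC3 becomes "Ã" and 0xA9 becomes "©".
print(data.decode("latin-1"))  # cafÃ©
```

No error is raised in the Latin-1 case, because every byte value is valid in Latin-1; the text is simply wrong, which is why such mismatches often go unnoticed until the output is read.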
Punched tape with the word "Wikipedia" encoded in ASCII. Presence and absence of a hole represent 1 and 0, respectively; for example, W is encoded as 1010111.
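The caption's example can be checked with a short Python sketch that prints the 7-bit ASCII code of each letter of "Wikipedia" as a string of 1s and 0s. The helper name `to_ascii_bits` is hypothetical, and the sketch does not model the physical layout of the tape (track order, sprocket holes, or any parity bit).

```python
def to_ascii_bits(word: str) -> list[str]:
    """Hypothetical helper: return each character's 7-bit ASCII code as a bit string."""
    # encode("ascii") raises UnicodeEncodeError for any character outside ASCII.
    return [format(byte, "07b") for byte in word.encode("ascii")]


for ch, bits in zip("Wikipedia", to_ascii_bits("Wikipedia")):
    # Read as tape: each "1" would be a punched hole, each "0" no hole.
    print(ch, bits)
```

The first line printed is `W 1010111`, matching the caption: "W" is ASCII code 87, which is 1010111 in binary.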
Character encoding is a convention of using a numeric value to represent each character of a writing script. Not only can a character set include natural language symbols, but it can also include codes that have meanings or functions outside of language, such as control characters and whitespace. Character encodings have also been defined for some constructed languages. When encoded, character data can be stored, transmitted, and transformed by a computer. The numerical values that make up a character encoding are known as code points and collectively comprise a code space or a code page.
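A minimal Python sketch of these terms, using Unicode as the example character set (an assumption for illustration; the paragraph above applies to any encoding). It shows code points for letters, a symbol, control characters and whitespace, the extent of Unicode's code space, how one code point becomes different bytes under different encodings, and how two single-byte code pages assign different characters to the same value. The helper `describe` is hypothetical.

```python
import sys
import unicodedata


def describe(ch: str) -> tuple[str, str]:
    """Hypothetical helper: return the code point as U+XXXX and its Unicode general category."""
    return f"U+{ord(ch):04X}", unicodedata.category(ch)


# Letters (Lu, Ll), a currency symbol (Sc), two control characters (Cc) and a space (Zs).
for ch in ["A", "é", "€", "\t", "\n", " "]:
    print(repr(ch), *describe(ch))

# Unicode's code space runs from U+0000 to U+10FFFF.
print(hex(sys.maxunicode))  # 0x10ffff

# A code point becomes bytes only under a particular encoding.
print("€".encode("utf-8").hex())      # e282ac
print("€".encode("utf-16-be").hex())  # 20ac

# In single-byte code pages, the same byte value can stand for different characters.
print(bytes([0x80]).decode("cp1252"))  # €
print(bytes([0x80]).decode("cp437"))   # Ç
```

The separation matters in practice: `ord` and `chr` deal only in code points, while `encode` and `decode` are where a specific encoding or code page has to be chosen.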