UTF-16

Also known as 16-bit Unicode Transformation Format, Unicode Transformation Format – 16-bit, Unicode Transformation Format - 16-bit, UTF16, UTF_16

UTF-16 (16-bit Unicode Transformation Format) is a character encoding that supports all 1,112,064 valid code points of Unicode. The encoding is variable-length: each code point is encoded with one or two 16-bit code units. UTF-16 arose from an earlier, now obsolete, fixed-width 16-bit encoding known as UCS-2 (for 2-byte Universal Character Set), once it became clear that more than 2^16 (65,536) code points were needed, including most emoji and important CJK characters such as those used in personal and place names.
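As a minimal sketch of the one-or-two code unit rule described above, the following Python function maps a code point to its UTF-16 code units. The function name `utf16_code_units` is illustrative only and is not part of any library. Code points below U+10000 are emitted as a single unit, while higher code points are split into a high and a low surrogate.

```python
def utf16_code_units(code_point: int) -> list[int]:
    """Return the UTF-16 code units (as integers) for one Unicode code point.

    Illustrative helper, not a library API.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded on their own")
    if code_point < 0x10000:
        # Basic Multilingual Plane: one 16-bit code unit, numerically equal
        # to the code point.
        return [code_point]
    # Supplementary planes: subtract 0x10000 to get a 20-bit value, then put
    # the top 10 bits in a high surrogate and the low 10 bits in a low surrogate.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return [high, low]


# The number of valid code points is the whole code space minus the
# 2,048 surrogate code points.
VALID_CODE_POINTS = 0x110000 - 0x800
```

For example, U+20AC (the euro sign) yields the single unit 0x20AC, while U+1F600 (an emoji) yields the pair 0xD83D, 0xDE00, which matches what Python's built-in `"\U0001F600".encode("utf-16-be")` produces.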

Key facts

Character encoding.name: UTF-16
Character encoding.mime: text/plain; charset=UTF-16 • text/plain; charset=utf-16le • text/plain; charset=utf-16be
Character encoding.image: UTF-16 encoding.svg
Character encoding.caption: Example of Unicode character encoding through UTF-16
Character encoding.standard: Unicode Standard
Character encoding.classification: Unicode Transformation Format, variable-width encoding
Character encoding.lang: International
Character encoding.encodes: ISO/IEC 10646 (Unicode)
Character encoding.extends: UCS-2

via Wikipedia infobox

Described at

IBM Documentation

Contents

History
Description
U+0000 to U+D7FF and U+E000 to U+FFFF
Code points from U+010000 to U+10FFFF
U+D800 to U+DFFF (surrogates)
Examples
Byte-order encoding schemes
Efficiency
Usage
Operating systems
File systems
Messaging
Programming languages
Firmware
See also
Notes
References
External links

UTF-16 is used by the Windows API and by many programming environments, such as Java and Qt. Because most characters in common use fit in a single code unit, the code paths that handle two-unit sequences (surrogate pairs) are rarely exercised in testing. This has led to many bugs in software, including in Windows itself.
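The kind of bug described above can be illustrated with a short Python sketch. It assumes a hypothetical routine that truncates text to a fixed number of 16-bit code units without checking for surrogate pairs. The helper names (`utf16_unit_count`, `truncate_units_naive`, `is_valid_utf16le`) are invented for this example and do not correspond to any real API.

```python
def utf16_unit_count(text: str) -> int:
    """Number of 16-bit code units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def truncate_units_naive(text: str, max_units: int) -> bytes:
    """Cut the UTF-16LE form of text after max_units code units.

    Deliberately naive: it never checks whether the cut falls between
    the two halves of a surrogate pair.
    """
    return text.encode("utf-16-le")[: max_units * 2]


def is_valid_utf16le(data: bytes) -> bool:
    """Report whether data decodes cleanly as UTF-16LE."""
    try:
        data.decode("utf-16-le")
        return True
    except UnicodeDecodeError:
        return False


bmp_only = "ab"                # two code points, two code units
with_emoji = "a\U0001F600"     # two code points, three code units
```

With `bmp_only`, truncating to two units gives valid output, so a test suite limited to such input passes. With `with_emoji`, the same call cuts the emoji's surrogate pair in half and leaves a lone high surrogate, which a strict decoder rejects.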