UTF-EBCDIC

UTF-EBCDIC is a character encoding capable of encoding all 1,112,064 valid character code points in Unicode using 1 to 5 bytes (in contrast to a maximum of 4 for UTF-8). It is meant to be EBCDIC-friendly, so that legacy EBCDIC applications on mainframes may process the characters without much difficulty. Its advantages for existing EBCDIC-based systems are similar to UTF-8's advantages for existing ASCII-based systems. Details on UTF-EBCDIC are defined in Unicode Technical Report #16.

The Unicode Consortium lists Unicode Technical Report #16 as a Stabilized Technical Report and summarizes it on unicode.org as follows: "UTF-EBCDIC is an encoding form similar to UTF-8, but based on EBCDIC instead of ASCII. Even IBM EBCDIC-based systems usually use UTF-16 for Unicode text processing, rather than UTF-EBCDIC. Therefore, there is no need to develop this report any further."

described at URL: www.unicode.org/reports/tr16


To produce the UTF-EBCDIC encoded version of a series of Unicode code points, an encoding based on UTF-8 (known in the specification as UTF-8-Mod) is applied first, creating what the specification calls an I8 sequence. The main difference between this encoding and UTF-8 is that it allows Unicode code points U+0080 through U+009F (the C1 control codes) to be represented as a single byte, so that they can later be mapped to the corresponding EBCDIC control codes. To achieve this, UTF-8-Mod uses 101xxxxx instead of 10xxxxxx as the format for trailing bytes in a multi-byte sequence. Because such a trailing byte holds only 5 bits rather than 6, the UTF-8-Mod encoding of a code point above U+03FF can be longer than its UTF-8 encoding.
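The UTF-8-Mod step can be illustrated with a minimal Python sketch. The function name `encode_i8` is hypothetical, the rejection of surrogate code points is an assumption based on the count of valid code points given earlier, and the lead-byte patterns are derived from the 5-bit trailing-byte format rather than copied from the report. The sketch covers only the first step: it produces the I8 sequence and does not perform the later mapping of I8 bytes to EBCDIC byte values.

```python
def encode_i8(cp: int) -> bytes:
    """Encode one Unicode code point as a UTF-8-Mod (I8) byte sequence.

    Illustrative sketch only; not taken from any library.
    """
    # Assumption: surrogates and out-of-range values are rejected.
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a valid code point: {cp:#x}")

    # U+0000..U+009F (ASCII range plus the C1 controls) stay single bytes.
    if cp < 0xA0:
        return bytes([cp])

    # Pick the number of trailing bytes and the lead-byte prefix.
    if cp < 0x400:        # up to 10 bits: 110yyyyy + 1 trailing byte
        trailing, lead = 1, 0xC0
    elif cp < 0x4000:     # up to 14 bits: 1110zzzz + 2 trailing bytes
        trailing, lead = 2, 0xE0
    elif cp < 0x40000:    # up to 18 bits: 11110www + 3 trailing bytes
        trailing, lead = 3, 0xF0
    else:                 # remaining code points: 4 trailing bytes
        trailing, lead = 4, 0xF8

    out = [lead | (cp >> (5 * trailing))]
    # Each trailing byte is 101xxxxx, carrying 5 bits of payload.
    for shift in range(5 * (trailing - 1), -1, -5):
        out.append(0xA0 | ((cp >> shift) & 0x1F))
    return bytes(out)


if __name__ == "__main__":
    for cp in (0x41, 0x85, 0xA0, 0x400, 0x4000, 0x10FFFF):
        i8 = encode_i8(cp)
        utf8 = chr(cp).encode("utf-8")
        print(f"U+{cp:04X}: I8 {i8.hex(' ')} ({len(i8)} bytes), "
              f"UTF-8 {utf8.hex(' ')} ({len(utf8)} bytes)")
```

Under this sketch, a C1 control such as U+0085 stays a single byte (UTF-8 needs two), while U+0400 takes three bytes where UTF-8 needs two, which is the trade-off that the 5-bit trailing bytes introduce.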