Cyrillic Text Encoding Methods

There are many different character encoding standards, for both Latin and Cyrillic text. When I use the word encoding, imagine the set of characters in a font: which characters are in there, and where each one is mapped within that font, determine the encoding. For example, a standard used in the USA is called ASCII, or American Standard Code for Information Interchange; its most common form is a 7-bit (128-character) encoding whose only letters are those of the basic Latin alphabet.

Besides KOI8, there are several other methods of encoding Cyrillic text. While surfing the Internet, you might see the names Codepage 1251 (MS-Windows ANSI) and Codepage 866 (Alternative PC). Those encodings are more commonly used on Windows and DOS computers, respectively.
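To make the idea of "where a character is mapped" concrete, here is a minimal sketch that asks Python's standard codecs for the byte assigned to one Cyrillic letter under each of the three encodings named above. The helper name byte_values is made up for this illustration; the codec names are Python's own aliases.

```python
# Codec aliases from Python's standard library for the three encodings above.
CODECS = ["koi8_r", "cp1251", "cp866"]


def byte_values(char):
    """Return the single byte value that represents `char` in each codec."""
    return {name: char.encode(name)[0] for name in CODECS}


# CYRILLIC CAPITAL LETTER A (U+0410) lands on a different byte in each one.
table = byte_values("\u0410")
for name, value in table.items():
    print(f"{name:8s} 0x{value:02X}")
```

The same letter gets three different byte values, which is why text written in one encoding looks garbled when it is read as another.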

KOI8
KOI stands for "Kod Obmena Informatsiey", or Code for Information Exchange. It is an 8-bit encoding (hence the name KOI8) which includes both the Latin and Cyrillic alphabets. In Russia it is used predominantly for communication purposes, such as e-mail, USENET, and Internet publishing via WWW, Gopher, etc.
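As a rough illustration of how one 8-bit code can carry both alphabets, the sketch below encodes a mixed string with Python's koi8_r codec and splits the result by the high bit. The sample string and variable names are only for this example.

```python
text = "Hello, мир"
raw = text.encode("koi8_r")

# Bytes below 0x80 are the ordinary ASCII (Latin) half of the code;
# bytes at 0x80 and above belong to the upper half, where KOI8-R
# places its Cyrillic letters.
latin_part = bytes(b for b in raw if b < 0x80)
upper_part = bytes(b for b in raw if b >= 0x80)

print(latin_part)                       # the Latin letters and punctuation
print(upper_part.hex(" "))              # the three Cyrillic letters
print(raw.decode("koi8_r") == text)     # the round trip is lossless
```

Each character, Latin or Cyrillic, takes exactly one byte, so the encoded message is as long as the text itself.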

The difference between OV and AV and the difference between koi8-r (RFC 1489) and koi8 ukrainian:

koi8-r

koi8 ukrainian

Updated 5/26/98: The encoding of the Ukrainian and Belarusian characters given above is not quite correct. Please compare with ISO-IR-111, KOI8-R, KOI8-uni, and KOI8-U. (Submitted by Andreas Prilop.)
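One way to see the practical difference between KOI8-R and a Ukrainian variant is to try encoding Ukrainian text with each. The sketch below uses Python's koi8_r and koi8_u codecs (KOI8-U as registered in RFC 2319, which may differ from the "koi8 ukrainian" table referred to above); the helper encodable is a made-up name for this example.

```python
def encodable(text, codec):
    """Report whether every character of `text` exists in `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


russian = "Привет"
ukrainian = "Київ"  # contains the Ukrainian letter yi (U+0457)

# Russian text comes out as the same bytes in both codes.
print(russian.encode("koi8_r") == russian.encode("koi8_u"))

# The Ukrainian letter has a slot only in KOI8-U.
print(encodable(ukrainian, "koi8_r"), encodable(ukrainian, "koi8_u"))
```

In this sketch KOI8-U behaves as a superset for the letters involved: Russian text is unaffected, and only the additional Ukrainian letters need the extended code.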

More information on the "Cyrillic alphabet soup" is available.

Apple Standard Cyrillic

Other proprietary encodings (fonts)

CP866 and CP1251
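Text saved on a DOS machine in CP866 has to be recoded before a Windows program expecting CP1251 can show it correctly. A minimal sketch of that conversion, using Python's cp866 and cp1251 codecs, follows; the function name recode and the sample word are assumptions made for this illustration.

```python
def recode(data, source, target):
    """Decode bytes from `source`, then encode the text in `target`."""
    return data.decode(source).encode(target)


dos_bytes = "Привет".encode("cp866")

# Proper conversion: go through the decoded text.
win_bytes = recode(dos_bytes, "cp866", "cp1251")
print(win_bytes.decode("cp1251"))

# Reading the DOS bytes directly as CP1251 gives garbled output,
# because the two code pages put Cyrillic letters on different bytes.
garbled = dos_bytes.decode("cp1251")
print(garbled)
```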

Code Page 866

Code Page 1251

ISO 8859 Character Sets
ISO 8859 is a standardized series of 8-bit character sets for writing alphabetic languages, including those that use the Latin and Cyrillic scripts. It was designed by the European Computer Manufacturers Association (ECMA).
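The members of the series agree on the lower (ASCII) half and differ in the upper half. The sketch below decodes one byte with two parts of the series through Python's codecs: iso8859_1 (Latin-1) and iso8859_5, the Cyrillic part, which the text above does not name explicitly.

```python
upper_byte = b"\xb0"

# The same upper-half byte is a different character in each part.
as_latin1 = upper_byte.decode("iso8859_1")    # DEGREE SIGN, U+00B0
as_cyrillic = upper_byte.decode("iso8859_5")  # CYRILLIC CAPITAL LETTER A, U+0410

# A lower-half byte means the same thing in both.
lower_byte = b"A"
same_low_half = lower_byte.decode("iso8859_1") == lower_byte.decode("iso8859_5")

print(f"U+{ord(as_latin1):04X}", f"U+{ord(as_cyrillic):04X}", same_low_half)
```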

UNICODE


