KOI8-R
| Alias(es) | cp878 (code page 878) |
|---|---|
| Languages | Russian, Bulgarian |
| Classification | 8-bit KOI, extended ASCII |
| Extends | KOI8-B |
| Based on | KOI-8 |
| Other related encodings | KOI8-U, KOI8-RU |
KOI8-R (RFC 1489) is an 8-bit character encoding derived from the KOI-8 encoding by the programmer Andrei Chernov in 1993 and designed to cover Russian, which uses the Russian subset of the Cyrillic script. KOI-8, in turn, is an 8-bit extension of the KOI-7 encoding, which inherited a phonetic correspondence between Russian and Latin letters from the MTK-2 teletype code. As a result, the Russian Cyrillic letters in KOI8-R are in pseudo-Latin alphabetical order rather than the normal Cyrillic order used in ISO 8859-5. Although this may seem unnatural, it has a useful effect: if the 8th bit is stripped, the text remains partially readable in any ASCII-based encoding (including KOI8-R itself) as a case-reversed transliteration. For example, "Код для обмена и обработки информации" (the Russian meaning of the "KOI" acronym) becomes kOD DLQ OBMENA I OBRABOTKI INFORMACII.
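The bit-stripping effect can be reproduced with a minimal Python sketch. It assumes Python's built-in `koi8_r` codec; the helper name `strip_eighth_bit` is hypothetical and chosen only for illustration.

```python
def strip_eighth_bit(text: str) -> str:
    """Encode text as KOI8-R, clear the high bit of every byte, read it as ASCII."""
    raw = text.encode("koi8_r")  # built-in KOI8-R codec
    return bytes(b & 0x7F for b in raw).decode("ascii")


phrase = "Код для обмена и обработки информации"
print(strip_eighth_bit(phrase))  # kOD DLQ OBMENA I OBRABOTKI INFORMACII
```

The case is reversed because KOI8-R places lowercase Cyrillic letters in 0xC0–0xDF, which map onto the ASCII uppercase range once the high bit is cleared, and uppercase Cyrillic letters in 0xE0–0xFF, which map onto ASCII lowercase.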
KOI-8 stands for 8-bitnyy kod dlya obmena i obrabotki informatsii (Russian: 8-битный код для обмена и обработки информации), which means "8-bit code for information interchange and processing".[1] Microsoft Windows assigns KOI8-R the code page number 20866, and IBM assigns it code page 878.[2][3] KOI8-R also happens to cover Bulgarian.
It lacks the proper quotation marks for these languages: both the guillemets «...» and the Bulgarian „...“. Windows-1251 supports these, as well as more letters, and has thus become more popular. KOI8-R is used by less than 0.004% of websites, mostly Russian and Bulgarian.[citation needed] In modern applications, Unicode (usually as UTF-8) is preferred to single-byte Cyrillic encodings. Unicode contains 436 Cyrillic letters, including those for Old Cyrillic.
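The coverage difference can be checked with a short Python sketch. It assumes Python's built-in `koi8_r` and `cp1251` codecs; the helper `can_encode` and the variable names are hypothetical, and the 30-letter string is the modern Bulgarian alphabet in lowercase.

```python
def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


bulgarian_alphabet = "абвгдежзийклмнопрстуфхцчшщъьюя"
print(can_encode(bulgarian_alphabet, "koi8_r"))  # letters are covered

# « » (guillemets) and „ “ (Bulgarian-style quotation marks)
for mark in "\u00ab\u00bb\u201e\u201c":
    print(mark, can_encode(mark, "koi8_r"), can_encode(mark, "cp1251"))
```

Each quotation mark should be reported as missing from KOI8-R and present in Windows-1251.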
Character set
The following table shows the KOI8-R encoding. Each character is shown with its equivalent Unicode code point.
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0x | | | | | | | | | | | | | | | | |
| 1x | | | | | | | | | | | | | | | | |
| 2x | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
| 3x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
| 4x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
| 5x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
| 6x | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
| 7x | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | |
| 8x | ─ 2500 | │ 2502 | ┌ 250C | ┐ 2510 | └ 2514 | ┘ 2518 | ├ 251C | ┤ 2524 | ┬ 252C | ┴ 2534 | ┼ 253C | ▀ 2580 | ▄ 2584 | █ 2588 | ▌ 258C | ▐ 2590 |
| 9x | ░ 2591 | ▒ 2592 | ▓ 2593 | ⌠ 2320 | ■ 25A0 | ∙ 2219 | √ 221A | ≈ 2248 | ≤ 2264 | ≥ 2265 | NBSP | ⌡ 2321 | ° 00B0 | ² 00B2 | · 00B7 | ÷ 00F7 |
| Ax | ═ 2550 | ║ 2551 | ╒ 2552 | ё 0451 | ╓ 2553 | ╔ 2554 | ╕ 2555 | ╖ 2556 | ╗ 2557 | ╘ 2558 | ╙ 2559 | ╚ 255A | ╛ 255B | ╜ 255C | ╝ 255D | ╞ 255E |
| Bx | ╟ 255F | ╠ 2560 | ╡ 2561 | Ё 0401 | ╢ 2562 | ╣ 2563 | ╤ 2564 | ╥ 2565 | ╦ 2566 | ╧ 2567 | ╨ 2568 | ╩ 2569 | ╪ 256A | ╫ 256B | ╬ 256C | © 00A9 |
| Cx | ю 044E | а 0430 | б 0431 | ц 0446 | д 0434 | е 0435 | ф 0444 | г 0433 | х 0445 | и 0438 | й 0439 | к 043A | л 043B | м 043C | н 043D | о 043E |
| Dx | п 043F | я 044F | р 0440 | с 0441 | т 0442 | у 0443 | ж 0436 | в 0432 | ь 044C | ы 044B | з 0437 | ш 0448 | э 044D | щ 0449 | ч 0447 | ъ 044A |
| Ex | Ю 042E | А 0410 | Б 0411 | Ц 0426 | Д 0414 | Е 0415 | Ф 0424 | Г 0413 | Х 0425 | И 0418 | Й 0419 | К 041A | Л 041B | М 041C | Н 041D | О 041E |
| Fx | П 041F | Я 042F | Р 0420 | С 0421 | Т 0422 | У 0423 | Ж 0416 | В 0412 | Ь 042C | Ы 042B | З 0417 | Ш 0428 | Э 042D | Щ 0429 | Ч 0427 | Ъ 042A |
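As a rough illustration of how the upper half of the table is laid out, the following Python sketch decodes two rows with the built-in `koi8_r` codec and checks how letters sort by byte value. The variable names are hypothetical.

```python
# Decode rows Cx and Ex of the table directly from their byte values.
lower_row = bytes(range(0xC0, 0xD0)).decode("koi8_r")
upper_row = bytes(range(0xE0, 0xF0)).decode("koi8_r")
print(lower_row)  # юабцдефгхийклмно
print(upper_row)  # ЮАБЦДЕФГХИЙКЛМНО

# Each uppercase letter sits exactly 0x20 above its lowercase form,
# the reverse of ASCII, where lowercase is 0x20 above uppercase.
offsets = {
    ch.upper().encode("koi8_r")[0] - ch.encode("koi8_r")[0] for ch in lower_row
}
print(offsets)  # {32}

# Sorting by KOI8-R byte value gives pseudo-Latin order (a, b, d, g, v),
# not Cyrillic alphabetical order.
by_koi8 = "".join(sorted("абвгд", key=lambda ch: ch.encode("koi8_r")))
print(by_koi8)  # абдгв
```

One practical consequence is that a plain byte-wise sort of KOI8-R text does not produce Russian alphabetical order, so collation needs a locale-aware comparison or a conversion to Unicode first.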
See also
- KOI8-B, a derivation of KOI8-R with only the letter subset implemented
- KOI8-U, another derivative encoding which adds Ukrainian characters
- KOI character encodings
- RELCOM
- Windows-1251, another common Cyrillic character encoding
References
1. (in Russian) ГОСТ 19768-74 (СТ СЭВ 358-76). Машины вычислительные и система обработки данных. Коды 8-битные для обмена и обработки информации.
2. "SBCS code page information - CPGID: 00878 / Name: Russian internet koi8-r". IBM Software: Globalization: Coded character sets and related resources: Code pages by CPGID: Code page identifiers. IBM. C-H 3-3220-050. Archived from the original on 2017-02-18. Retrieved 2017-02-18.
3. "CCSID information document; CCSID 878; KOI8-R CYRILLIC". IBM. Retrieved 2017-02-18.
4. Richter, Helmut (2016-01-04) [1999-08-18]. "KOI8-R.TXT". 2.0. Retrieved 2016-12-09.
5. Code Page CPGID 00878 (PDF), IBM
6. Code Page CPGID 00878 (txt), IBM
7. International Components for Unicode (ICU), ibm-878_P100-1996.ucm, 2002-12-03
Further reading
- Flohr, Guido; Kiss, Gabor; Chernov, Andrey A. (2016) [2006]. "Locale::RecodeData::KOI8_R - Conversion routines for KOI8-R". CPAN libintl-perl. 1.0. Archived from the original on 2017-01-15. Retrieved 2017-01-15.
- Kostis, Kosta. "koi8-r (Russian U*IX encoding, also used by RELCOM)". 1.20. Archived from the original on 2017-01-16. Retrieved 2017-01-16.
- RFC 1489
- "KOI8-R (RFC 1489)". Kermit. Columbia University. Retrieved 2020-06-24.
- Kornai, Andras; Birnbaum, David J.; da Cruz, Frank; Davis, Bur; Fowler, George; Paine, Richard B.; Paperno, Slava; Simonsen, Keld J.; Thobe, Glenn E.; Vulis, Dimitri; van Wingen, Johan W. (1993-03-13). "CYRILLIC ENCODING FAQ Version 1.3". 1.3. Retrieved 2020-06-24.
External links
- Universal Cyrillic decoder, an online program that may help recover Cyrillic texts with broken KOI8-R or other character encodings.
- "The Home of the KOI8-R since 1995". 1995. Retrieved 2016-12-05.
- Czyborra, Roman (1998-11-30) [1998-05-25]. "The Cyrillic Charset Soup". Archived from the original on 2016-12-03. Retrieved 2016-12-03.
- Hohlov, Yu. E. "Cyrillic Information Representation in Electronic Form - Character Set (Code Page) Tables". Archived from the original on 2016-12-05. Retrieved 2016-12-05.
- Nechayev, Valentin (2013) [2001]. "Review of 8-bit Cyrillic encodings universe". Archived from the original on 2016-12-05. Retrieved 2016-12-05.