Computer encoding of Cyrillic script
Computer encoding
Unicode
As of Unicode version 10.0, Cyrillic letters, including national and historical alphabets, are encoded across several blocks:
Cyrillic: U+0400–U+04FF
Cyrillic Supplement: U+0500–U+052F
Cyrillic Extended-A: U+2DE0–U+2DFF
Cyrillic Extended-B: U+A640–U+A69F
Cyrillic Extended-C: U+1C80–U+1C8F
Phonetic Extensions: U+1D2B, U+1D78
Combining Half Marks: U+FE2E–U+FE2F
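The ranges above can be turned into a simple lookup. The sketch below is a minimal Python example, assuming only the list given here. The table name `CYRILLIC_RANGES` and the function `cyrillic_block` are illustrative, not from any library. The check is purely range-based, so it does not test whether a code point is actually assigned. A production lookup would normally read the block data from the Unicode Character Database's Blocks.txt instead.

```python
from typing import Optional

# Cyrillic code point ranges from the list above (inclusive, Unicode 10.0).
# Phonetic Extensions and Combining Half Marks also contain non-Cyrillic
# characters, so only their Cyrillic positions are listed here.
CYRILLIC_RANGES = [
    ("Cyrillic", 0x0400, 0x04FF),
    ("Cyrillic Supplement", 0x0500, 0x052F),
    ("Cyrillic Extended-A", 0x2DE0, 0x2DFF),
    ("Cyrillic Extended-B", 0xA640, 0xA69F),
    ("Cyrillic Extended-C", 0x1C80, 0x1C8F),
    ("Phonetic Extensions", 0x1D2B, 0x1D2B),
    ("Phonetic Extensions", 0x1D78, 0x1D78),
    ("Combining Half Marks", 0xFE2E, 0xFE2F),
]


def cyrillic_block(ch: str) -> Optional[str]:
    """Return the name of the range containing `ch`, or None if it lies outside all of them."""
    cp = ord(ch)
    for name, lo, hi in CYRILLIC_RANGES:
        if lo <= cp <= hi:
            return name
    return None


if __name__ == "__main__":
    for ch in ("Ж", "\u0500", "\u2DE0", "\u1D2B", "Z"):
        print(f"U+{ord(ch):04X}", cyrillic_block(ch))
```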
The characters in the range U+0400 to U+045F are essentially the characters from ISO 8859-5 moved upward by 864 positions. The characters in the range U+0460 to U+0489 are historic letters, not used now. The characters in the range U+048A to U+052F are additional letters for various languages that are written with Cyrillic script.
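The 864-position shift (0x360 in hexadecimal) can be checked against Python's built-in `iso8859_5` codec. The sketch below is a minimal illustration: the helper name `iso8859_5_exceptions` is hypothetical. It lists the bytes in the upper half of ISO 8859-5 that do not follow the rule. These are the no-break space (0xA0), the soft hyphen (0xAD), the numero sign (0xF0) and the section sign (0xFD), which Unicode keeps at their own code points. That is why the shift holds only "essentially".

```python
OFFSET = 864  # 0x360: distance from ISO 8859-5 byte values to Unicode code points


def iso8859_5_exceptions() -> list[int]:
    """Return the bytes 0xA0-0xFF whose ISO 8859-5 character is NOT at byte + 864."""
    exceptions = []
    for b in range(0xA0, 0x100):
        ch = bytes([b]).decode("iso8859_5")  # Python's standard ISO 8859-5 codec
        if ord(ch) != b + OFFSET:
            exceptions.append(b)
    return exceptions


if __name__ == "__main__":
    print([hex(b) for b in iso8859_5_exceptions()])  # expected: ['0xa0', '0xad', '0xf0', '0xfd']
    # 0xB0 is Cyrillic capital A in ISO 8859-5; 0xB0 + 864 = 0x410 = U+0410.
    print(bytes([0xB0]).decode("iso8859_5"), hex(0xB0 + OFFSET))
```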
As a general rule, Unicode does not include accented Cyrillic letters. A few exceptions are:
combinations that are considered separate letters of their respective alphabets, such as Й, Ў, Ё, Ї, Ѓ, Ќ (as well as many letters of non-Slavic alphabets);
two frequent combinations that are orthographically required to distinguish homonyms in Bulgarian and Macedonian: Ѐ, Ѝ;
a few Old and New Church Slavonic combinations: Ѷ, Ѿ, Ѽ.
To indicate stressed or long vowels, combining diacritical marks can be used after the respective letter (for example, U+0301 ◌́ COMBINING ACUTE ACCENT: ы́ э́ ю́ я́ etc.).
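The difference between the precomposed exceptions and letters with a combining mark shows up under Unicode normalization. The minimal Python sketch below uses the standard `unicodedata` module. The helpers `mark_stress` and `code_points` are hypothetical, for illustration only. NFD splits Й, Ё, Ѐ and Ѝ into a base letter plus a combining mark. A stressed ы built with U+0301 has no precomposed form, so NFC leaves it as two code points.

```python
import unicodedata

COMBINING_ACUTE = "\u0301"  # U+0301 COMBINING ACUTE ACCENT


def mark_stress(word: str, index: int) -> str:
    """Hypothetical helper: insert U+0301 right after the letter at `index`."""
    return word[: index + 1] + COMBINING_ACUTE + word[index + 1 :]


def code_points(text: str) -> list[str]:
    """Format each code point of `text` as U+XXXX."""
    return [f"U+{ord(c):04X}" for c in text]


if __name__ == "__main__":
    # Precomposed exceptions: NFD splits each into base letter + combining mark.
    for letter in "\u0419\u0401\u0400\u040D":  # Й, Ё, Ѐ, Ѝ
        print(letter, code_points(unicodedata.normalize("NFD", letter)))

    # Stressed vowel: no precomposed form exists, so NFC keeps the mark separate.
    stressed = mark_stress("язык", 2)  # язы́к
    print(stressed, code_points(unicodedata.normalize("NFC", stressed)))
```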
Some languages, including Church Slavonic, are still not supported.
Unicode 5.1, released on 4 April 2008, introduced major changes to the Cyrillic blocks. Revisions to the existing Cyrillic blocks, together with the addition of Cyrillic Extended-A (U+2DE0–U+2DFF) and Cyrillic Extended-B (U+A640–U+A69F), improved support for the early Cyrillic alphabet, Abkhaz, Aleut, Chuvash, Kurdish, and Moksha.
Other
Punctuation for Cyrillic text is similar to that used in European Latin-alphabet languages.
Other character encoding systems for Cyrillic:
CP866 – 8-bit Cyrillic character encoding established by Microsoft for use in MS-DOS, also known as GOST-alternative. Cyrillic characters go in their native order, with a "window" for pseudographic characters.
ISO/IEC 8859-5 – 8-bit Cyrillic character encoding established by the International Organization for Standardization.
KOI8-R – 8-bit native Russian character encoding. It was invented in the USSR for use on Soviet clones of American IBM and DEC computers. The Cyrillic characters go in the order of their Latin counterparts, which allowed the text to remain readable after transmission via a 7-bit line that removed the most significant bit from each byte; the result became a rough but readable Latin transliteration of Cyrillic. It was the standard encoding of 1990s Unix systems and the first Russian Internet encoding.
KOI8-U – KOI8-R with the addition of Ukrainian letters.
MIK – 8-bit native Bulgarian character encoding for use in Microsoft DOS.
Windows-1251 – 8-bit Cyrillic character encoding established by Microsoft for use in Microsoft Windows. It is the simplest 8-bit Cyrillic encoding: 32 capital letters in native order at 0xC0–0xDF, 32 lowercase letters at 0xE0–0xFF, and the YO characters (Ё, ё) placed elsewhere (0xA8, 0xB8). It has no pseudographics. It was formerly the standard encoding in some GNU/Linux distributions for Belarusian and Bulgarian, but has been displaced by UTF-8.
GOST-main.
GB 2312 – principally a Simplified Chinese encoding, but it also includes the basic 33 Russian Cyrillic letters (in upper- and lower-case).
JIS and Shift JIS – principally Japanese encodings, but they also include the basic 33 Russian Cyrillic letters (in upper- and lower-case).
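Several of the legacy encodings listed above ship as standard Python codecs (`cp866`, `iso8859_5`, `koi8_r`, `koi8_u`, `cp1251`, `gb2312`, `shift_jis`). A small comparison script, sketched below, makes their layouts concrete. MIK and GOST-main are not among Python's standard codecs, so they are left out. The helper names `can_encode` and `through_7bit_line` are hypothetical. The script simulates a 7-bit line by clearing the top bit of every byte. Only KOI8-R survives this in readable form: "Привет" comes out as "pRIWET", with letter case inverted. The script also checks which codecs can represent the full 33-letter Russian alphabet and the Ukrainian letter Ї.

```python
# The 33 letters of the Russian alphabet; .lower() gives the lowercase set.
RUSSIAN_UPPER = "АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ"
UKRAINIAN_YI = "\u0407"  # Ї, present in KOI8-U but not in KOI8-R


def can_encode(text: str, codec: str) -> bool:
    """True if every character of `text` can be represented in `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def through_7bit_line(text: str, codec: str) -> str:
    """Encode `text`, clear bit 7 of each byte (as a 7-bit line would), show as ASCII."""
    stripped = bytes(b & 0x7F for b in text.encode(codec))
    # Control codes are not printable, so show them as '?'.
    return "".join(chr(b) if 0x20 <= b < 0x7F else "?" for b in stripped)


if __name__ == "__main__":
    word = "Привет"
    for codec in ("cp866", "iso8859_5", "koi8_r", "cp1251"):
        print(f"{codec:10} {word.encode(codec).hex(' ')}  ->  {through_7bit_line(word, codec)}")

    both_cases = RUSSIAN_UPPER + RUSSIAN_UPPER.lower()
    for codec in ("koi8_r", "koi8_u", "gb2312", "shift_jis"):
        print(f"{codec:10} Russian: {can_encode(both_cases, codec)}  "
              f"Ukrainian Ї: {can_encode(UKRAINIAN_YI, codec)}")
```

The byte-level checks also confirm the layouts described in the list. CP866 stores А–Я contiguously from 0x80, and Windows-1251 stores А–я contiguously from 0xC0 to 0xFF.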
Keyboard layouts
Each language has its own standard keyboard layout, adopted from typewriters. Thanks to the flexibility of computer input methods, there are also transliterating or phonetic/homophonic keyboard layouts made for typists who are more familiar with other layouts, such as the common English QWERTY keyboard. When practical Cyrillic keyboard layouts or fonts are not available, computer users may fall back on transliteration or the look-alike "volapuk" encoding to type languages that are written in the Cyrillic alphabet.