Computer encoding of the Cyrillic script




1 Computer encoding

1.1 Unicode
1.2 Other
1.3 Keyboard layouts





Computer encoding
Unicode

As of Unicode version 10.0, Cyrillic letters, including those of national and historical alphabets, are encoded across several blocks:



Cyrillic: U+0400–U+04FF
Cyrillic Supplement: U+0500–U+052F
Cyrillic Extended-A: U+2DE0–U+2DFF
Cyrillic Extended-B: U+A640–U+A69F
Cyrillic Extended-C: U+1C80–U+1C8F
Phonetic Extensions: U+1D2B, U+1D78
Combining Half Marks: U+FE2E–U+FE2F
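Python's standard `unicodedata` module does not report block names, so one way to classify a character is a hand-written table of the ranges listed above. The sketch below is illustrative only: the table, the helper name `cyrillic_block` and its behaviour are assumptions of this example, not part of any library.

```python
# Hand-written table of the Cyrillic-related ranges listed above (Unicode 10.0).
CYRILLIC_BLOCKS = [
    ("Cyrillic", 0x0400, 0x04FF),
    ("Cyrillic Supplement", 0x0500, 0x052F),
    ("Cyrillic Extended-A", 0x2DE0, 0x2DFF),
    ("Cyrillic Extended-B", 0xA640, 0xA69F),
    ("Cyrillic Extended-C", 0x1C80, 0x1C8F),
]

# Individual Cyrillic code points that sit inside otherwise non-Cyrillic blocks.
CYRILLIC_SINGLES = {
    0x1D2B: "Phonetic Extensions",
    0x1D78: "Phonetic Extensions",
    0xFE2E: "Combining Half Marks",
    0xFE2F: "Combining Half Marks",
}

def cyrillic_block(ch):
    """Return the listed block containing ch, or None if it is outside all of them."""
    cp = ord(ch)
    for name, first, last in CYRILLIC_BLOCKS:
        if first <= cp <= last:
            return name
    return CYRILLIC_SINGLES.get(cp)

print(cyrillic_block("Ж"))       # Cyrillic
print(cyrillic_block("\u0500"))  # Cyrillic Supplement
print(cyrillic_block("Z"))       # None
```

The table answers by range only, so unassigned code points inside a block are still reported with that block's name.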

The characters in the range U+0400 to U+045F are essentially the characters from ISO 8859-5 moved upward by 864 positions. The characters in the range U+0460 to U+0489 are historic letters, not used now. The characters in the range U+048A to U+052F are additional letters for various languages that are written with Cyrillic script.
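The 864-position shift can be checked with Python's built-in `iso8859_5` codec. This sketch (the function name `split_by_offset` is illustrative) decodes every byte from 0xA0 to 0xFF and separates the bytes that land exactly 864 (0x360) code points higher from those that do not.

```python
OFFSET = 0x360  # 864 in decimal

def split_by_offset():
    """Decode bytes 0xA0-0xFF as ISO 8859-5 and test each one against OFFSET."""
    shifted, exceptions = [], {}
    for b in range(0xA0, 0x100):
        ch = bytes([b]).decode("iso8859_5")
        if ord(ch) == b + OFFSET:
            shifted.append(b)      # byte maps to byte + 864
        else:
            exceptions[b] = ch     # byte maps somewhere else
    return shifted, exceptions

shifted, exceptions = split_by_offset()
print(len(shifted))  # 92
for b, ch in exceptions.items():
    print(f"0x{b:02X} -> U+{ord(ch):04X}")
```

The four exceptions are the no-break space, soft hyphen, numero sign and section sign. The Unicode slots they would shift into (U+0400, U+040D, U+0450, U+045D) hold Ѐ, Ѝ, ѐ and ѝ, which ISO 8859-5 does not contain; this is why the shift holds only "essentially".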


Unicode as a general rule does not include accented Cyrillic letters. A few exceptions are:



combinations that are considered separate letters of the respective alphabets, such as Й, Ў, Ё, Ї, Ѓ, Ќ (as well as many letters of non-Slavic alphabets);
two frequent combinations orthographically required to distinguish homonyms in Bulgarian and Macedonian: Ѐ, Ѝ;
a few Old and New Church Slavonic combinations: Ѷ, Ѿ, Ѽ.

To indicate stressed or long vowels, combining diacritical marks can be used after the respective letter (for example, U+0301 ◌́ COMBINING ACUTE ACCENT: ы́ э́ ю́ я́ etc.).
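The difference between a precomposed exception such as Й and a stressed vowel written with a combining mark shows up under Unicode normalization. This sketch uses Python's standard `unicodedata.normalize`; the helper `code_points` is only an illustrative formatter.

```python
import unicodedata

def code_points(s):
    """Format each code point of s as U+XXXX."""
    return [f"U+{ord(c):04X}" for c in s]

# Й is one of the precomposed exceptions: NFD splits it into И + combining breve.
short_i = "\u0419"  # Й
print(code_points(unicodedata.normalize("NFD", short_i)))  # ['U+0418', 'U+0306']

# Stressed ы has no precomposed code point, so NFC keeps base letter + U+0301.
stressed_y = "\u044B\u0301"  # ы́
print(code_points(unicodedata.normalize("NFC", stressed_y)))  # ['U+044B', 'U+0301']
```

Software comparing Cyrillic text should therefore normalize both sides first, since Й may arrive either as one code point or as two.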


Some languages, including Church Slavonic, are still not fully supported.


Unicode 5.1, released on 4 April 2008, introduced major changes to the Cyrillic blocks. Revisions to the existing Cyrillic blocks and the addition of Cyrillic Extended-A (U+2DE0–U+2DFF) and Cyrillic Extended-B (U+A640–U+A69F) improved support for the early Cyrillic alphabet, Abkhaz, Aleut, Chuvash, Kurdish and Moksha.


Other

Punctuation for Cyrillic text is similar to that used in European Latin-alphabet languages.


Other character encoding systems for Cyrillic:



CP866 – 8-bit Cyrillic character encoding established by Microsoft for use in MS-DOS, also known as GOST-alternative. Cyrillic characters go in their native order, with a "window" for pseudographic characters.
ISO/IEC 8859-5 – 8-bit Cyrillic character encoding established by the International Organization for Standardization.
KOI8-R – 8-bit native Russian character encoding. Invented in the USSR for use on Soviet clones of American IBM and DEC computers. The Cyrillic characters go in the order of their Latin counterparts, which allowed the text to remain readable after transmission via a 7-bit line that removed the most significant bit from each byte; the result was a rough but readable Latin transliteration of Cyrillic. Standard encoding of 1990s Unix systems and the first Russian Internet encoding.
KOI8-U – KOI8-R with the addition of Ukrainian letters.
MIK – 8-bit native Bulgarian character encoding for use in Microsoft DOS.
Windows-1251 – 8-bit Cyrillic character encoding established by Microsoft for use in Microsoft Windows. The simplest 8-bit Cyrillic encoding: 32 capital letters in native order at 0xC0–0xDF and 32 lowercase letters at 0xE0–0xFF, with the Yo characters (Ё, ё) placed elsewhere, at 0xA8 and 0xB8. No pseudographics. Formerly the standard encoding in GNU/Linux distributions for Belarusian and Bulgarian, now displaced by UTF-8.
gost-main.
GB 2312 – principally a simplified Chinese encoding, but it also contains the basic 33 Russian Cyrillic letters (in upper- and lower-case).
JIS and Shift JIS – principally Japanese encodings, but they also contain the basic 33 Russian Cyrillic letters (in upper- and lower-case).
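Several of the byte layouts described above can be checked with Python's built-in codecs `cp866`, `iso8859_5`, `koi8_r` and `cp1251`. The sketch below uses illustrative function names (not from any standard): it prints the bytes of one word in each encoding, simulates a 7-bit line that clears the top bit of KOI8-R text, and hand-decodes Windows-1251 letters using the fact that 0xC0–0xFF are U+0410–U+044F shifted down by 0x350.

```python
def encode_all(text):
    """Return the hex bytes of text in several 8-bit Cyrillic encodings."""
    names = ("cp866", "iso8859_5", "koi8_r", "cp1251")
    return {name: text.encode(name).hex(" ") for name in names}

def koi8r_strip_high_bit(text):
    """Encode as KOI8-R, then clear bit 7 of every byte as a 7-bit line would."""
    raw = text.encode("koi8_r")
    return bytes(b & 0x7F for b in raw).decode("ascii")

def cp1251_letter(b):
    """Hand-decode one Windows-1251 byte holding a basic Russian letter."""
    if 0xC0 <= b <= 0xFF:
        # А..Я and а..я are contiguous, so a fixed offset reaches U+0410..U+044F.
        return chr(b + 0x350)
    if b == 0xA8:
        return "\u0401"  # Ё
    if b == 0xB8:
        return "\u0451"  # ё
    raise ValueError(f"0x{b:02X} is not a basic Russian letter in Windows-1251")

for name, hex_bytes in encode_all("Привет").items():
    print(f"{name:10} {hex_bytes}")

print(koi8r_strip_high_bit("Привет, мир"))  # pRIWET, MIR

# Cross-check the hand-written decoder against Python's cp1251 codec.
letters = list(range(0xC0, 0x100)) + [0xA8, 0xB8]
print(all(cp1251_letter(b) == bytes([b]).decode("cp1251") for b in letters))  # True
```

Because KOI8-R stores lowercase letters at 0xC0–0xDF and uppercase at 0xE0–0xFF, clearing bit 7 maps them onto uppercase and lowercase ASCII respectively, so the letter case of the transliteration comes out inverted.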

Keyboard layouts

Each language has its own standard keyboard layout, adopted from typewriters. With the flexibility of computer input methods, there are also transliterating or phonetic/homophonic keyboard layouts made for typists who are more familiar with other layouts, like the common English QWERTY keyboard. When practical Cyrillic keyboard layouts or fonts are not available, computer users sometimes use transliteration or look-alike "volapuk" encoding to type in languages that are normally written with the Cyrillic alphabet.
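The idea behind a phonetic layout can be sketched as a key-to-letter table in which each Latin key produces the Cyrillic letter with a similar sound. The table below is a hypothetical subset chosen for illustration; real phonetic layouts differ in detail and also cover letters such as ж, ш and щ that have no single Latin counterpart.

```python
# Hypothetical subset of a phonetic layout: Latin key -> similar-sounding Cyrillic letter.
PHONETIC_KEYS = {
    "a": "а", "b": "б", "v": "в", "g": "г", "d": "д", "e": "е",
    "z": "з", "i": "и", "k": "к", "l": "л", "m": "м", "n": "н",
    "o": "о", "p": "п", "r": "р", "s": "с", "t": "т", "u": "у", "f": "ф",
}

def type_phonetic(keys):
    """Translate a sequence of Latin keystrokes through the table above."""
    out = []
    for key in keys:
        letter = PHONETIC_KEYS.get(key.lower())
        if letter is None:
            out.append(key)  # unmapped keys (punctuation, spaces) pass through
        else:
            out.append(letter.upper() if key.isupper() else letter)
    return "".join(out)

print(type_phonetic("Privet, mir"))  # Привет, мир
```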







