Russian National Corpus

This text from Wikipedia is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. See Terms of Use for details. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.

The Russian National Corpus is the official English name of a corpus of the Russian language that has been available online since April 29, 2004. Its Russian name, Национальный корпус русского языка, literally means "National Corpus of the Russian Language". The corpus is being created by the Institute of the Russian Language of the Russian Academy of Sciences.

It currently contains about 150 million word forms that are automatically lemmatized and tagged for part of speech (POS) and grammemes. In other words, every possible morphological analysis of each orthographic form is assigned to it. Lemmata, parts of speech, grammatical features and their combinations are all searchable. In addition, a subcorpus of 6 million word forms has manually resolved homonymy.

The subcorpus with resolved morphological homonymy is also automatically marked for stress. The whole corpus carries searchable lexical-semantic (LS) tags, which cover three areas:
- morphosemantic POS subclasses (proper noun, reflexive pronoun, etc.);
- LS characteristics proper (thematic class, causativity, evaluation);
- derivation (diminutive, adverb formed from an adjective, etc.).

The RNC also includes the following subcorpora:

  • a treebank of syntactic dependencies (largely based on Igor Mel'čuk's Meaning-Text Theory);
  • English↔Russian and German→Russian parallel corpora;
  • a corpus of Russian poetry, in which rhyming words and poetic prosody (including meter, stanzas, etc.) are additionally tagged;
  • a corpus of Russian dialects with dialect-specific grammatical tagging;
  • a corpus showing the history of Russian stress;
  • an educational subcorpus reflecting school standards.

All texts carry metatextual tags: the author, the author's date of birth, the date of creation, text size, and genre (general fiction, detective story, newspaper article, etc.). Each of these categories can be browsed and searched separately. Users can also define their own subcorpus and then search for combinations of lemmata, POS/grammeme tags and semantic tags within that subset only.

The corpus is planned to be made available offline and distributed for non-commercial purposes. For now, however, technical and copyright issues mean that it can be accessed only online.




LONWEB.ORG is a property of Casiraghi Jones Publishing srl
Owners: Roberto Casiraghi and Crystal Jones
Address: Piazzale Cadorna 10 - 20123 Milano - Italy