ISSN 0021-3454 (print version)
ISSN 2500-0381 (online version)

vol 62 / April, 2019

DOI 10.17586/0021-3454- 2017-60-10-932-939

UDC 004.822, 004.021


S. V. Klimenkov
ITMO University, Saint Petersburg, 197101, Russian Federation; Assistant

E. A. Tsopa
ITMO University, Saint Petersburg, 197101, Russian Federation; Assistant

S. A. Zhmylev
ITMO University, Saint Petersburg, 197101, Russian Federation; assistant

A. V. Pokid
SVD Embedded Systems Ltd.; Engineer-Programmer

N. M. Tkeshelashvili
Elcom Ltd., Software Development Department; Programmer

Read the full article 

Abstract. Development and usage of ontologies is an important part of modern text analysis application. When huge amount of text is analyzed, the lookup time in ontology becomes critical bottleneck. An approach to creation of search prefix tree of wordforms’ parts called x-gram is proposed. Division of the wordform into 3- and 4-gramms as well as phonetic and morphological syllable for Russian language is used. Every x-gram is represented by numerical index that allows its storage in the plain array. Resulting arrays are very sparse, so approach uses compactification to “insert” one array into another. When looking for a word, it is split into x-grams, the index for every x-gram is computed, consequent lookup is performed in constructed arrays, where each array corresponds to a single level of prefix tree. The developed program demonstrates the advantage of 36—50 % over Google dense hash-map in seek time and 12 % over Google sparse hash-map in memory consumption for set of Russian wordforms extracted from wellknown grammatical dictionary and Russian National Corpus. This approach is well suited for dictionary search in rarely changing wordform sets, such as ontology based on Russian Wiktionary.
Keywords: dictionary lookup, ontology based on Wiktionary, prefix tree, syllable, n-gramm, hashtables, exact match

  1. Pis'mak A.E., Kharitonova A.E., Tsopa E.A., Klimenkov S.V. Programmnye produkty i sistemy, 2016, no. 3, рр. 74–78. (in Russ.)
  2. Klimenkov S., Tsopa E., Pismak A., Yarkeev A. Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (KDIR), 2016, no. 2, рр.74–80.
  3. Bocharov V.V., Granovskiy D.V., Surikov A.V. Novye informatsionnye tekhnologii v avtomatizirovannykh sistemakh (New Information Technologies in Automated Systems), Proceedings of the 15th Scientific-Practical Seminar, Moscow, 2012. (in Russ.)
  4. Neyer M.P. A Comparison of Dictionary Implementations, 2009,
  5. Mehlhorn K. and Sanders P.Algorithms and Data Structures. The Basic Toolbox, Springer, 2007, 285р.
  6. Peterson W.W. IBM J. of Research and Development, 1957, no. 1, рр. 130–146.
  7. Dumey A.I. Computers and Automation, 1956, no. 5(12), рр. 6–9.
  8. Comer D. and Shen V.Y. Practice and Experience, 1982, no. 7(12), рр. 669–682.
  9. Silverstein C. Google sparsehash package, 2010.
  10. Silverstein C. Google sparsehash package performance, 2010.
  11. Acharya Anurag, Huican Zhu, and Kai Shen. Workshop on Algorithm Engineering and Experimentation. Lecture Notes in Computer Science, Berlin–Heidelberg, Springer, 1999, no. 1619, рр. 300–315.
  12. Thenmozhi M., Srimathi H. Indian Journal of Science and Technology, 2015.
  13. Pryor D.V., Thistle M.R., and Shirazi N. Proceedings of the Workshop on FPGAs for Custom Computing Machines, 1993.
  14. Aho A.V., Hill M., Hopcroft J., Ulman J.D. Data Structures and Algorithms, 1983.
  15. Andreychenko L.N. Russkiy yazyk. Fonetika i fonologiya. Orfoepiya. Grafika i orfografiya (Russian. Phonetics and Phonology. Orthoepy. Graphics and Spelling), Moscow, 2003. (in Russ.)
  16. Malyshev B., Samarin A., and Vulis D. TUGboat, 1991, no. 2(12), рр.212–214.
  17. Zaliznyak A.A. Grammaticheskiy slovar' russkogo yazyka (Grammatical Dictionary of Russian), Moscow, 2008. (in Russ.)
  18. Granovskiy D.V., Bocharov V.V., Bichineva S.V. Komp'yuternaya lingvistika i razvitie semanticheskogo poiska v Internete (Computational Linguistics and Development of Semantic Search on the Internet), Proceedings of the Scientific Workshop of the XIII All-Russian Joint Conference "Internet and modern society", St. Petersburg, 2010, 94 р. (in Russ.)
  19. Zhukov M.A., Afanas'ev D.B. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2015, no. 3(15), рр. 470–475 (in Russ.)
  20. Cisłak A. and Grabowski S. A practical index for approximate dictionary matching with few mismatches, arXiv:1501.04948, 2015.