ISSN 0021-3454 (print version)
ISSN 2500-0381 (online version)

vol 63 / August, 2020

DOI 10.17586/0021-3454-2019-62-12-1060-1065

UDC 004.415.2


A. V. Platonov
ITMO University, Faculty of Software Engineering and Computer Technique

I. A. Bessmertny
ITMO University, Saint Petersburg, 197101, Russian Federation; Associate Professor

J. A. Koroleva
ITMO University; postgraduate

Abstract. The problem of modeling the semantics of text documents using a vector representation of words in a Hilbert space is considered. The vector representation of a word reflects the words surrounding it (the context of the word). If a word occurs in a document more than once, the set of its contexts forms its generalized context, i.e., the meaning of the word. Different contexts of a word may be regarded as different projections, and the generalized context as a multidimensional object reconstructed from them. The purpose of the study is to improve the quality of word context reconstruction by taking additional factors into account, in particular the possible non-orthogonality of contexts. To this end, quantum probability theory is used, and the context reconstruction procedure corresponds to the problem of quantum tomography in quantum physics. The task of reconstructing a word context or, in terms of quantum mathematics, the probability density matrix, is solved by gradient descent using machine learning methods. Constraints on the learning process are implemented by a set of regularizers that ensure convergence of the process according to the Kullback-Leibler divergence criterion.
Keywords: natural language texts, document semantics, context, quantum theory of probabilities, quantum tomography
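
The reconstruction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation; it rests on several assumptions made purely for illustration. Contexts are real unit vectors (a real-valued stand-in for the Hilbert space), the observed context frequencies are compared with the values <v|rho|v> renormalized over the given contexts, and the regularizers mentioned in the abstract are replaced by the factorization rho = A A^T / tr(A A^T), which keeps rho symmetric, positive semidefinite, and of unit trace by construction. The gradient is approximated by central finite differences. All function names and the toy data are hypothetical.

```python
import math

def density_from_factor(a):
    """rho = A A^T / tr(A A^T): symmetric, positive semidefinite, unit trace."""
    d = len(a)
    m = [[sum(a[i][k] * a[j][k] for k in range(d)) for j in range(d)]
         for i in range(d)]
    trace = sum(m[i][i] for i in range(d))
    return [[m[i][j] / trace for j in range(d)] for i in range(d)]

def context_distribution(rho, contexts):
    """q_i proportional to <v_i|rho|v_i>, renormalized over the given contexts."""
    d = len(rho)
    raw = [sum(v[i] * rho[i][j] * v[j] for i in range(d) for j in range(d))
           for v in contexts]
    total = sum(raw)
    return [r / total for r in raw]

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def fit_density(contexts, p, steps=4000, lr=0.05, h=1e-6):
    """Plain gradient descent on the entries of A (assumed hyperparameters)."""
    d = len(contexts[0])
    # Start from the identity, i.e. the maximally mixed state rho = I / d.
    a = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]

    def loss(factor):
        rho = density_from_factor(factor)
        return kl_divergence(p, context_distribution(rho, contexts))

    for _ in range(steps):
        grad = [[0.0] * d for _ in range(d)]
        for i in range(d):
            for j in range(d):
                old = a[i][j]
                a[i][j] = old + h
                up = loss(a)
                a[i][j] = old - h
                down = loss(a)
                a[i][j] = old
                grad[i][j] = (up - down) / (2.0 * h)  # central difference
        for i in range(d):
            for j in range(d):
                a[i][j] -= lr * grad[i][j]
    return density_from_factor(a)

# Hypothetical toy data: three non-orthogonal contexts in a 2-D space.
angles = [0.0, math.pi / 3.0, 2.0 * math.pi / 3.0]
contexts = [(math.cos(t), math.sin(t)) for t in angles]
rho_true = [[0.7, 0.2], [0.2, 0.3]]  # assumed ground-truth density matrix
p_obs = context_distribution(rho_true, contexts)  # simulated frequencies

rho_hat = fit_density(contexts, p_obs)
final_kl = kl_divergence(p_obs, context_distribution(rho_hat, contexts))
print(rho_hat, final_kl)
```

In this toy setting the three contexts determine a real 2x2 density matrix uniquely, so the estimate can be compared directly with `rho_true`. With fewer or noisier contexts the problem becomes underdetermined, which is presumably where explicit regularizers of the kind the abstract mentions would matter.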
