ISSN 0021-3454 (print version)
ISSN 2500-0381 (online version)

vol 63 / August, 2020

DOI 10.17586/0021-3454-2019-62-12-1060-1065

UDC 004.415.2


A. V. Platonov
ITMO University, Faculty of Software Engineering and Computer Technique;

I. A. Bessmertny
ITMO University, Saint Petersburg, 197101, Russian Federation; Associate Professor

J. A. Koroleva
ITMO University; postgraduate

Abstract. The problem of modeling the semantics of text documents using vector representations of words in a Hilbert space is considered. The vector representation of a word reflects the words surrounding it (the context of the word). If a word occurs in a document more than once, the set of its contexts forms its generalized context, i.e., the meaning of the word. Different contexts of a word may be regarded as different projections, and the generalized context as a reconstructed multidimensional object. The purpose of the study is to improve the quality of word context restoration by taking additional factors into account, in particular the possible non-orthogonality of contexts. To this end, quantum probability theory is used, and the context recovery procedure corresponds to the problem of quantum tomography in quantum physics. The task of restoring a word context or, in terms of quantum mathematics, the probability density matrix, is solved by gradient descent using machine learning. Constraints on the learning process are implemented by a set of regularizers that ensure convergence of the process according to the Kullback-Leibler divergence criterion.
Keywords: natural language texts, document semantics, context, quantum theory of probabilities, quantum tomography
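
Illustrative sketch (not the authors' implementation). The abstract describes restoring a density matrix from possibly non-orthogonal word contexts by gradient descent under a Kullback-Leibler criterion. The NumPy code below shows one way such a procedure could look, under several assumptions that are not taken from the paper: contexts are real unit vectors, observed context weights are compared with the Born-rule values p_i = v_i^T rho v_i, the generalized (unnormalized) form of the Kullback-Leibler divergence is used because these values need not sum to one for non-orthogonal contexts, and the constraints on rho (symmetry, positive semidefiniteness, unit trace) are enforced by the parametrization rho = A A^T / tr(A A^T) rather than by the regularizers the abstract mentions. All function and variable names are hypothetical.

```python
import numpy as np

def born_probs(rho, contexts):
    """p_i = v_i^T rho v_i for each unit context vector v_i (rows of `contexts`)."""
    return np.einsum("id,de,ie->i", contexts, rho, contexts)

def generalized_kl(f, p, eps=1e-12):
    """Generalized KL divergence: sum f*log(f/p) - f + p (zero iff p == f)."""
    return float(np.sum(f * (np.log(f + eps) - np.log(p + eps)) - f + p))

def fit_density(contexts, freqs, steps=2000, lr=0.05, eps=1e-12):
    """Gradient descent on A, with rho = A A^T / tr(A A^T) (assumed parametrization)."""
    d = contexts.shape[1]
    A = np.eye(d)                      # start from the maximally mixed state I/d
    for _ in range(steps):
        M = A @ A.T
        t = np.trace(M)
        rho = M / t
        p = born_probs(rho, contexts)
        g_p = 1.0 - freqs / (p + eps)  # derivative of the divergence w.r.t. p_i
        # derivative w.r.t. rho: sum_i g_p[i] * v_i v_i^T
        G = np.einsum("i,id,ie->de", g_p, contexts, contexts)
        # chain rule through the trace normalization rho = M / tr(M)
        G_M = (G - np.trace(G @ rho) * np.eye(d)) / t
        # chain rule through M = A A^T (G_M is symmetric)
        A -= lr * 2.0 * G_M @ A
    M = A @ A.T
    return M / np.trace(M)

# Toy data: four non-orthogonal contexts in a 2-dimensional space.
angles = np.deg2rad([0.0, 45.0, 90.0, 135.0])
contexts = np.stack([np.cos(angles), np.sin(angles)], axis=1)
rho_true = np.array([[0.7, 0.2],
                     [0.2, 0.3]])
freqs = born_probs(rho_true, contexts)   # noise-free synthetic "observations"

rho_hat = fit_density(contexts, freqs)
print(np.round(rho_hat, 3))
print("divergence:", generalized_kl(freqs, born_probs(rho_hat, contexts)))
```

Because the loss does not change when A is rescaled, the gradient with respect to A is orthogonal to A, so the trace normalization does not need to be re-imposed after each step. A regularizer-based variant, as described in the abstract, would instead optimize rho directly and add penalty terms for violations of the constraints.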

