SPEECH ANALYSIS SYSTEMS
ANALYSIS OF LANGUAGE STATISTICAL ASPECTS AND THEIR GENDER VARIATIONS BY THE EXAMPLE OF LITHUANIAN
Mikhail V. Khitrov, Andrey Yu. Vasiliev
Language aspects applicable to automatic language and speaker recognition are identified. A language recognition method is proposed that is based on statistical parameters of the pitch stress pattern in a given language. Typical ranges of parameter variation are compared for languages of different language families.
DETERMINATION OF CHANNEL-INDEPENDENT INFORMATION INDICATORS
Vitaliy V. Kiselyov, Andrey V. Tkachenia, Mikhail V. Khitrov
Information indicators of speech are analyzed in order to build a channel-independent feature space that improves the efficiency of speaker recognition systems. For the problem of determining the similarity between several audio recordings, the optimal set of channel-independent information feature vectors is determined experimentally using dynamic time warping.
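As an illustration of the dynamic time warping step mentioned in this abstract, the following is a minimal sketch of the classic DTW distance between two sequences of feature vectors. The function name `dtw_distance`, the Euclidean local cost, and the unconstrained step pattern are assumptions made for the example; the abstract does not specify the authors' features or DTW variant.

```python
import math


def dtw_distance(a, b):
    """Classic DTW cost between two sequences of equal-dimension vectors.

    Assumptions (not from the source): Euclidean local distance and the
    basic step pattern (match, insertion, deletion) with no window limit.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances alone
                                 cost[i][j - 1],      # b advances alone
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# Two toy one-dimensional "recordings" that differ only in timing
same_content = dtw_distance([[0.0], [1.0], [2.0]],
                            [[0.0], [1.0], [1.0], [2.0]])
```

A smaller accumulated cost indicates that the two recordings follow a more similar trajectory in the chosen feature space, regardless of local differences in speaking rate.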
ANALYSIS OF DATA BALANCING PROBLEM IN ACOUSTIC MODELING OF AUTOMATIC SPEECH RECOGNITION SYSTEM
Natalia A. Tomashenko, Yury Yu. Khohlov
The problem of data balancing in the training of acoustic models for an automatic speech recognition system is considered. A metric is proposed that explicitly accounts for the amount of data in a cluster during triphone clustering. The proposed approach is shown to improve the quality of speech recognition.
CROSS-VALIDATION STATE CONTROL IN ACOUSTIC MODEL TRAINING OF AUTOMATIC SPEECH RECOGNITION SYSTEM
German A. Chernykh, Maksim L. Korenevsky, Kirill E. Levin, Irina A. Ponomareva, Natalia A. Tomashenko
A technique is presented for optimizing the size of Gaussian mixture models (GMM) during the training of hidden Markov models (HMM), an essential part of many automatic speech recognition systems. The technique increases recognition accuracy by avoiding over-fitting and significantly reduces the computational load of the recognition procedure.
STATISTICAL METHODS FOR AUTOMATIC PROSODIC BREAK DETECTION IN A TEXT-TO-SPEECH SYSTEM
Pavel Chistikov, Olga G. Khomitsevich, Sergey V. Rybin
Statistical methods are proposed for predicting the positions and durations of prosodic breaks in a text-to-speech system. The methods are shown to yield better results than a baseline rule-based system.
SYSTEMS OF SPEECH AND ACOUSTIC SIGNAL PROCESSING
TIME DELAY ESTIMATION OF AUDIO SIGNALS USING THEIR ENVELOPES
Sergey A. Aleynikov, Mikhail B. Stolbov
The problem of time delay estimation for dual-channel acoustic signals (speech, music, etc.) recorded under reverberant conditions is investigated. A method of time delay estimation based on cross-correlation of temporal envelopes of the signals is presented. Comparison with other known methods of time delay estimation is provided.
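The idea of cross-correlating temporal envelopes can be sketched as follows. This is only an illustrative sketch: the moving-average envelope, the brute-force lag search, and the names `envelope`, `estimate_delay`, `win`, and `max_lag` are assumptions made for the example, not the authors' implementation.

```python
def envelope(x, win=8):
    """Crude temporal envelope: causal moving average of |x|.

    The envelope extractor is an assumed stand-in; the abstract does not
    say how the authors compute envelopes.
    """
    mag = [abs(v) for v in x]
    out, acc = [], 0.0
    for i, v in enumerate(mag):
        acc += v
        if i >= win:
            acc -= mag[i - win]  # drop the sample leaving the window
        out.append(acc / min(i + 1, win))
    return out


def estimate_delay(x, y, max_lag, win=8):
    """Lag (in samples) of y relative to x that maximizes the
    cross-correlation of the mean-removed envelopes."""
    ex, ey = envelope(x, win), envelope(y, win)
    mx, my = sum(ex) / len(ex), sum(ey) / len(ey)
    ex = [v - mx for v in ex]
    ey = [v - my for v in ey]
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(ex)):
            j = i + lag
            if 0 <= j < len(ey):
                score += ex[i] * ey[j]
        if score > best_score:
            best_score, best_lag = score, lag
    return best_lag


# Toy signal: a short oscillating burst, and a copy delayed by 5 samples
x = [0.0] * 200
for k in range(50, 70):
    x[k] = 1.0 if k % 2 == 0 else -1.0
y = [0.0] * 5 + x[:-5]
delay = estimate_delay(x, y, max_lag=20)
```

Working on envelopes rather than raw waveforms discards the fine structure of the signal, which is the part most easily distorted by reverberation; how much this helps in practice is what the paper compares against other methods.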
SPEECH SIGNALS STOCHASTICITY AND ITS EVALUATION
Sergey A. Aleynikov, Mikhail B. Stolbov
Known methods and a newly proposed method for evaluating speech signal stochasticity are analyzed. Results of statistical simulation demonstrate the advantages of the proposed approach over the known ones: the estimates obtained with the new method have lower variance and bias.
ASSESSMENT OF FUNCTIONAL SAFETY OF DETECTION OF VIBROACOUSTIC SIGNAL FROM ARRIVING TRAIN WITH ENERGY SENSOR
Sergey V. Bibikov, Yuri Nikolaevich Matveev, Nikolay N. Semenov
The functional safety of detecting a vibroacoustic signal from an arriving train with an energy detector is investigated. The lower value of the detectability level is derived from a proposed false alarm probability. The developed method is shown to be sufficient for detecting an arriving train in the case of long-welded rails.
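The link between a false alarm probability and an energy-detector threshold can be illustrated with a textbook approximation. The sketch below assumes zero-mean white Gaussian noise of known variance and uses the Gaussian (central limit) approximation to the chi-square distribution of the noise energy; the names `energy_threshold` and `detect`, and the noise model itself, are assumptions for illustration and are not taken from the paper.

```python
from statistics import NormalDist


def energy_threshold(noise_var, n_samples, p_false_alarm):
    """Threshold on the summed squared samples for a target false alarm rate.

    Assumption: under noise only, the energy of n_samples Gaussian samples
    has mean n*var and variance 2*n*var**2, and is treated as Gaussian.
    """
    z = NormalDist().inv_cdf(1.0 - p_false_alarm)
    return noise_var * (n_samples + (2.0 * n_samples) ** 0.5 * z)


def detect(frame, threshold):
    """Energy detector: declare a signal when frame energy exceeds threshold."""
    return sum(v * v for v in frame) > threshold


# A stricter false alarm requirement pushes the threshold up
loose = energy_threshold(noise_var=1.0, n_samples=256, p_false_alarm=1e-2)
strict = energy_threshold(noise_var=1.0, n_samples=256, p_false_alarm=1e-6)
```

Under this simplified model, tightening the false alarm probability raises the threshold, which in turn raises the smallest signal level that can be detected reliably.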
TARGET AND NON-TARGET SPEECH SEPARATION USING A DUAL MICROPHONE SYSTEM
Mikhail B. Stolbov, Marina Yu. Tatarnikova
A practical speech detection method for robust automatic speech recognition is proposed. The method employs a system of two symmetrical microphones oriented in opposite directions. The signal processing algorithm enables spatial filtering of speakers.
SPEAKER RECOGNITION SYSTEMS
EXPERT SYSTEMS AND METHODS FOR SPEAKER IDENTIFICATION
Elena V. Bulgakova, Ekaterina V. Krasnova
Modern approaches to forensic phonographic examination are analyzed. Various software used for speaker identification is considered. The special phonographic editor SIS II, developed by the Speech Technology Center, is described.
CONCEPT OF THE NATIONAL VOICE ACCOUNTING AND VOICE BIOMETRIC SEARCH SYSTEM
Dmitry V. Dyrmovsky, Sergey L. Koval, Mikhail V. Khitrov
The concept of a national voice accounting and voice biometric search system is presented.
ANALYSIS OF MANIFOLD LEARNING METHODS APPLICABILITY TO SPEAKER RECOGNITION
Yuri Nikolaevich Matveev, Andrei Shulipa
The applicability of manifold learning methods, widely used in image recognition, to the problem of speaker identification is considered. Experimental studies are carried out and the results are analyzed.
EMPLOYMENT OF DTW-BASED HMM-GMM MULTI-SESSION TRAINING IN TEXT-DEPENDENT SPEAKER VERIFICATION
Sergey A. Novoselov, Vladislav A. Sukhmel, Alexey Vladimirovich Sholokhov, Timur Pekhovsky
An HMM training procedure using several password utterances is proposed. The proposed method is based on the Dynamic Time Warping algorithm, and is shown to allow for reduction of verification system errors.
STUDY OF VOICE VERIFICATION SYSTEM TOLERANCE TO SPOOFING ATTACKS USING A TEXT-TO-SPEECH SYSTEM
Vadim L. Shchemelinin, Konstantin Simonchik
A method of spoofing a text-dependent voice verification system, based on the most popular TTS approaches (Unit Selection and HMM), is presented. The method is shown to produce a false acceptance error of 98 to 100% when the TTS database is sufficiently large. A distinctive feature of the method is that it can be fully automated if used in conjunction with a speech recognition system.