Speech Recognition Evaluation: New Tools and Datasets
A new approach to speech recognition evaluation has been presented, focusing on multi-reference labeling and streaming recognition. The research introduces an improved string alignment algorithm that supports multi-reference labeling, insertions of arbitrary length, and more precise word alignment. This is particularly useful for languages written in non-Latin scripts, for languages with rich word formation, and for analyzing long-form or complex speech.
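The core idea of multi-reference scoring can be sketched as a dynamic program over reference "blocks". The sketch below assumes a hypothetical reference format, not the authors' own: each block is either one fixed word or a list of alternative word sequences of any length, where an empty sequence marks an optional word. For each block it tries every span of the hypothesis and keeps the cheapest alternative, yielding the minimum number of word errors over all paths through the reference. It is a minimal illustration of the concept, not the paper's algorithm, and it leaves out tokenization and the choice of WER denominator.

```python
from typing import List, Sequence, Union

# Hypothetical reference format (an assumption, not the authors'): a block is
# either one fixed word or a non-empty list of alternatives, each alternative
# a word sequence of any length (an empty list marks an optional word).
Block = Union[str, List[List[str]]]


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Word-level edit distance counting substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, 1):
        cur = [i] + [0] * len(b)
        for j, word_b in enumerate(b, 1):
            cur[j] = min(
                prev[j] + 1,                       # delete word_a
                cur[j - 1] + 1,                    # insert word_b
                prev[j - 1] + (word_a != word_b),  # substitute or match
            )
        prev = cur
    return prev[-1]


def multi_ref_errors(reference: List[Block], hyp: List[str]) -> int:
    """Minimum word errors of `hyp` over all paths through `reference`."""
    n = len(hyp)
    # best[j]: fewest errors aligning the blocks seen so far to hyp[:j];
    # before any block, hyp[:j] can only be explained by j insertions.
    best = list(range(n + 1))
    for block in reference:
        alternatives = [[block]] if isinstance(block, str) else block
        new = [float("inf")] * (n + 1)
        for start in range(n + 1):
            for end in range(start, n + 1):
                # Assign the hypothesis span hyp[start:end] to this block and
                # keep the cheapest alternative for that span.
                cost = min(levenshtein(alt, hyp[start:end]) for alt in alternatives)
                new[end] = min(new[end], best[start] + cost)
        best = new
    return int(best[n])


ref = [[["5"], ["five"]], "apples"]  # "{5|five} apples"
print(multi_ref_errors(ref, ["five", "apples"]))  # -> 0
print(multi_ref_errors(ref, ["5", "apples"]))     # -> 0
print(multi_ref_errors(ref, ["fife", "apples"]))  # -> 1
```

Because alternatives may differ in length (for example `["new", "york"]` versus `["nyc"]`), the block-wise search over hypothesis spans is what lets a single reference accept both without penalty. The search is cubic in hypothesis length, so it is meant for illustration rather than long recordings.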
DiverseSpeech-Ru Dataset and Fine-tuning
In addition, a new test set called DiverseSpeech-Ru has been created, consisting of long-form, in-the-wild Russian recordings with curated multi-reference labels. The authors also relabeled existing Russian test sets with multiple references and studied fine-tuning dynamics on the corresponding training sets. The results show that models tend to adapt to dataset-specific labeling conventions, creating an illusion of metric improvement: the score improves without a matching gain in actual recognition quality.
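A toy example shows how such an illusion can arise. The data here is entirely hypothetical: suppose a training set's single references always write numbers as digits. A model that learns this convention during fine-tuning looks better against the single reference, yet a multi-reference label that accepts both spellings scores the outputs before and after fine-tuning identically. For brevity, the sketch scores multi-references by brute-force expansion of every combination of alternatives, a different and far less efficient technique than an alignment algorithm, and only viable for tiny inputs.

```python
from itertools import product
from typing import List, Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Word-level edit distance counting substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, 1):
        cur = [i] + [0] * len(b)
        for j, word_b in enumerate(b, 1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (word_a != word_b))
        prev = cur
    return prev[-1]


def single_ref_errors(ref: str, hyp: str) -> int:
    return levenshtein(ref.split(), hyp.split())


def multi_ref_errors_bruteforce(blocks: List[List[List[str]]], hyp: str) -> int:
    """Try every combination of alternatives (exponential; toy inputs only)."""
    words = hyp.split()
    return min(
        levenshtein([w for alt in choice for w in alt], words)
        for choice in product(*blocks)
    )


# Hypothetical labels: the dataset's single reference uses digits.
single_ref = "i have 5 apples"
multi_ref = [[["i"]], [["have"]], [["5"], ["five"]], [["apples"]]]

before = "i have five apples"  # hypothetical output before fine-tuning
after = "i have 5 apples"      # hypothetical output after adopting the convention

print(single_ref_errors(single_ref, before), single_ref_errors(single_ref, after))
# -> 1 0
print(multi_ref_errors_bruteforce(multi_ref, before),
      multi_ref_errors_bruteforce(multi_ref, after))
# -> 0 0
```

Against the single reference, WER falls from 25% to 0%, but the multi-reference view shows both outputs were already correct, so the apparent gain comes only from matching the labeling style.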
Tools for Streaming and Visual Alignment
Based on the improved word alignment, tools have been developed to evaluate streaming speech recognition and to align multiple transcriptions for visual comparison. Uniform wrappers are also provided for various speech recognition models, both offline and streaming. The code will be made publicly available.
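The wrapper API itself is not described here, so the following is only a hypothetical sketch of what a uniform interface for offline and streaming models might look like. Every class and method name (`OfflineASR`, `StreamingASR`, `feed`, `finalize`, `BufferedStreamingAdapter`, `run_streaming`) is an assumption. It also shows one common way to evaluate an offline model in streaming mode: re-transcribe all audio received so far after each chunk. This is simple, but total compute grows quadratically with recording length.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence


class OfflineASR(ABC):
    """Hypothetical interface: a whole recording in, a transcript out."""

    @abstractmethod
    def transcribe(self, waveform: Sequence[float], sample_rate: int) -> str: ...


class StreamingASR(ABC):
    """Hypothetical interface: audio arrives in chunks, partial transcripts come out."""

    @abstractmethod
    def reset(self) -> None: ...

    @abstractmethod
    def feed(self, chunk: Sequence[float]) -> str:
        """Consume one chunk and return the current partial transcript."""

    @abstractmethod
    def finalize(self) -> str:
        """Flush any buffered state and return the final transcript."""


class BufferedStreamingAdapter(StreamingASR):
    """Runs an offline model in streaming mode by re-transcribing all audio so far."""

    def __init__(self, model: OfflineASR, sample_rate: int) -> None:
        self.model = model
        self.sample_rate = sample_rate
        self.buffer: List[float] = []

    def reset(self) -> None:
        self.buffer = []

    def feed(self, chunk: Sequence[float]) -> str:
        self.buffer.extend(chunk)
        return self.model.transcribe(self.buffer, self.sample_rate)

    def finalize(self) -> str:
        return self.model.transcribe(self.buffer, self.sample_rate)


def run_streaming(model: StreamingASR, waveform: Sequence[float], chunk_size: int) -> List[str]:
    """Feed a recording chunk by chunk; return every partial plus the final transcript."""
    model.reset()
    outputs = [model.feed(waveform[i:i + chunk_size])
               for i in range(0, len(waveform), chunk_size)]
    outputs.append(model.finalize())
    return outputs


class ToyModel(OfflineASR):
    """Stand-in for a real model: emits "beep" for every sample above 0.5."""

    def transcribe(self, waveform: Sequence[float], sample_rate: int) -> str:
        return " ".join("beep" for x in waveform if x > 0.5)


print(run_streaming(BufferedStreamingAdapter(ToyModel(), 16000), [1.0, 0.0, 1.0, 1.0], 2))
# -> ['beep', 'beep beep beep', 'beep beep beep']
```

Keeping the sequence of partial transcripts, rather than only the final one, is what would let a streaming evaluator align each partial against the reference and track how the output evolves over time.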