Inactive-node detection and memory optimization in WFST decoder lattice generation algorithm

Journal of University of Chinese Academy of Sciences, 2019, Vol. 36, Issue (1): 109-114. DOI: 10.7523/j.issn.2095-6134.2019.01.015

DING Jiawei¹, LIU Jia¹, ZHANG Weiqiang¹, FENG Yunbo², LIU Lijun², YU Le²

1. Department of Electronic Engineering, Tsinghua University, Beijing 100084, China;
2. China Mobile Information Security Center, Beijing 100053, China

Received: 2017-12-22 Revised: 2018-03-02 Published: 2019-01-15
Corresponding author: LIU Jia, E-mail: liuj@tsinghua.edu.cn
Funding: Supported by the National Natural Science Foundation of China (U1836219)

Abstract

Abstract: The decoder is the core module of a speech recognition system, and the decoder based on weighted finite-state transducers (WFST) is a typical form of decoder. We analyze the resource usage of static WFST-based decoders in practical applications and propose a strategy that dynamically reclaims system resources by detecting inactive nodes during decoding and lattice generation. Experiments on the OpenKWS 15 dataset show that a decoder using this strategy consumes about 75% less memory than a decoder that does not reclaim system resources.

Key words: speech recognition decoder, WFST, engineering application, memory recycling
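
The abstract does not spell out how inactive nodes are detected, so the following is only an illustrative sketch of one common way such reclamation can be done, not the authors' algorithm. It assumes a token-passing decoder whose lattice nodes hold back-pointers to their predecessors, and it treats a node as inactive once no active token and no surviving successor refers to it. The names `LatticeNode`, `LatticePool`, `hold`, and `release` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LatticeNode:
    """One lattice node; `preds` are back-pointers to earlier nodes."""
    frame: int
    label: str
    preds: List["LatticeNode"] = field(default_factory=list)
    refs: int = 0        # surviving successors plus active tokens pointing here
    freed: bool = False  # set once the node has been reclaimed


class LatticePool:
    """Creates nodes and counts how many are still held in memory."""

    def __init__(self) -> None:
        self.live = 0

    def new_node(self, frame, label, preds=()):
        node = LatticeNode(frame, label, list(preds))
        for p in node.preds:
            p.refs += 1  # the new node keeps each predecessor reachable
        self.live += 1
        return node

    def hold(self, node):
        """Record that an active token now points at `node`."""
        node.refs += 1

    def release(self, node):
        """Drop one reference and reclaim every node that becomes inactive."""
        stack = [node]
        while stack:
            n = stack.pop()
            n.refs -= 1
            if n.refs == 0 and not n.freed:
                n.freed = True
                self.live -= 1
                stack.extend(n.preds)  # predecessors each lose one reference
                n.preds = []


pool = LatticePool()
root = pool.new_node(0, "<s>")
a = pool.new_node(1, "a", [root])
b = pool.new_node(1, "b", [root])
pool.hold(a)
pool.hold(b)
pool.release(b)   # the token on b is pruned: b is reclaimed, root survives via a
print(pool.live)  # 2
```

Reference counting is chosen here because it reclaims a node at the moment its last token is pruned, without scanning the whole lattice each frame. A real decoder would return the node to an allocator rather than just flipping a flag.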

CLC number: TN912

DING Jiawei, LIU Jia, ZHANG Weiqiang, FENG Yunbo, LIU Lijun, YU Le. Inactive-node detection and memory optimization in WFST decoder lattice generation algorithm[J]. Journal of University of Chinese Academy of Sciences, 2019, 36(1): 109-114.


