Editing and searching documents that contain mathematical formulas requires recognizing those formulas automatically. Mathematical formula recognition is an active research field, and many approaches have been proposed over the years. Input data comes in several forms, including document images, strokes, vector graphics, and special-purpose languages, and the form of the input determines how formulas are extracted and recognized. This article surveys the current state of research in mathematical formula recognition, discusses its four component problems (expression detection, symbol recognition, structural analysis, and semantic interpretation), and points out future research directions and emerging topics.
Keywords: Mathematical Formula Recognition, Research Status, Document Images, Strokes, Vector Images
Research Status of Mathematical Formula Recognition
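Lei Hu [6] proposed a recursive baseline extraction algorithm for analyzing symbol layout structure, taking handwritten strokes as input. Baseline extraction serves as the lexical analyzer for a modified LL(1) parser: whenever the parser requests the leftmost symbol, or the next symbol from left to right, along the current baseline, it returns a set of candidate symbols. These candidates are used to build a forest of parse trees, and the highest-ranked parse is returned. A hidden Markov model (HMM) is used for symbol classification, and horizontal adjacency between symbols is decided by two probabilistic quadratic classifiers, one for ascender symbols and the other for centered and descender symbols. MacLean et al. [7] proposed a system that captures all recognizable interpretations of the input and organizes them in a parse forest. If the top-ranked interpretation is incorrect, the user can request alternatives and select the intended result. The tree extraction step uses a novel probabilistic tree-scoring strategy in which a Bayesian network is constructed from the structure of the input; each joint assignment of its variables corresponds to a different parse tree, and parse trees are then reported in decreasing order of probability.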
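The idea of reporting interpretations in decreasing order of probability, so that a user can ask for the next alternative, can be illustrated with a small Python sketch. It is not the method of [7]: instead of a Bayesian network over a parse forest, it assumes that every symbol position has an independent list of candidate labels with probabilities, and it enumerates joint labelings lazily with a best-first search. The function name `ranked_interpretations` and the sample candidates are hypothetical.

```python
import heapq
from math import prod


def ranked_interpretations(candidates):
    """Yield (probability, labels) pairs in decreasing order of probability.

    candidates: one list per symbol position, each holding
    (label, probability) pairs. Positions are assumed independent,
    which is a simplification made only for this illustration.
    """
    # Sort every position's candidates from most to least likely.
    ranked = [sorted(c, key=lambda lp: lp[1], reverse=True) for c in candidates]

    def score(idx):
        # Joint probability of picking candidate idx[pos] at each position.
        return prod(ranked[pos][i][1] for pos, i in enumerate(idx))

    start = (0,) * len(ranked)          # best candidate everywhere
    heap = [(-score(start), start)]     # max-heap via negated scores
    seen = {start}
    while heap:
        neg_p, idx = heapq.heappop(heap)
        yield -neg_p, tuple(ranked[pos][i][0] for pos, i in enumerate(idx))
        # Successors demote exactly one position to its next-best candidate,
        # so a successor never scores higher than the state it came from.
        for pos in range(len(idx)):
            if idx[pos] + 1 < len(ranked[pos]):
                nxt = idx[:pos] + (idx[pos] + 1,) + idx[pos + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (-score(nxt), nxt))


if __name__ == "__main__":
    # Hypothetical candidates for a two-symbol expression.
    sample = [[("x", 0.7), ("×", 0.3)], [("2", 0.6), ("z", 0.4)]]
    for p, labels in ranked_interpretations(sample):
        print(f"{p:.2f}", " ".join(labels))
```

Because the generator is lazy, only the top-ranked interpretation is computed unless the caller asks for alternatives, which mirrors the interaction described above without building every interpretation up front.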
Dongming Liu, Lian Chen, Ming Li and Ju Zhang (2015) Research Status of Mathematical Formula Recognition. Computer Science and Application, 06, 218-224. doi: 10.12677/CSA.2015.56028
References
[1] Pavan Kumar, P., Agarwal, A. and Bhagvati, C. (2011) A rule-based approach to form mathematical symbols in printed mathematical expressions. Lecture Notes in Computer Science, 7080, 181-192.
[2] Yoo, Y.-H. and Kim, J.-H. (2013) Mathematical formula recognition based on modified recursive projection profile cutting and labeling with double linked list. Advances in Intelligent Systems and Computing, 208, 983-992.
[3] Amit, P. (2014) Intelligent combination of structural analysis algorithms: application to mathematical expression recognition. Rochester Institute of Technology. http://scholarworks.rit.edu/theses/7874
[4] Guo, Y.S., Huang, L. and Liu, C.P. (2007) A mathematical formula recognition system based on multiple candidates. Journal of Computer Research and Development, 44, 1144. (in Chinese)
[5] Xiao, J.Y., Wang, Q.P. and Hong, L.R. (2008) Mathematical formula recognition based on convex hull and fuzzy recognition. Computer Applications and Software, 29, 208. (in Chinese)
[6] Hu, L. (2012) Baseline extraction-driven parsing of handwritten mathematical expressions. 21st International Conference on Pattern Recognition (ICPR), 11-15 November 2012, 326-330.
[7] MacLean, S. and Labahn, G. (2014) A Bayesian model for recognizing handwritten mathematical expressions. 18 September 2014.
[8] Le, A.D., Van Phan, T. and Nakagawa, M. (2014) A system for recognizing online handwritten mathematical expressions and improvement of structure analysis. 11th IAPR International Workshop on Document Analysis Systems (DAS), 7-10 April 2014, 51-55.
[9] Simistira, F., Papavassiliou, V., Katsouros, V. and Carayannis, G. (2012) A system for recognition of on-line handwritten mathematical expressions. ICFHR 12 Proceedings of the 2012 International Conference on Frontiers in Handwriting Recognition, 193-198.
[10] Álvaro, F., Sánchez, J.-A. and Benedí, J.-M. (2014) Recognition of on-line handwritten mathematical expressions using 2D stochastic context-free grammars and hidden Markov models. Pattern Recognition Letters, 35, 58-67.
[11] Awal, A.-M., Mouchère, H. and Viard-Gaudin, C. (2014) A global learning approach for an online handwritten mathematical expression recognition system. Pattern Recognition Letters, 35, 68-77.
[12] MacLean, S. and Labahn, G. (2013) A new approach for recognizing handwritten mathematics using relational grammars and fuzzy sets. International Journal on Document Analysis and Recognition (IJDAR), 16, 139-163.
[13] Nina, S., Hirata, T. and Honda, W.Y. (2011) Automatic labeling of handwritten mathematical symbols via expression matching. Graph-based representations in pattern recognition. Lecture Notes in Computer Science, 6658, 295-304.
[14] Hu, Y., Peng, L.R. and Tang, Y.J. (2014) On-line handwritten mathematical expression recognition method based on statistical and semantic analysis. 11th IAPR International Workshop on Document Analysis Systems (DAS), 7-10 April 2014, 171-175.
[15] Julca-Aguilar, F., Hirata, N., Viard-Gaudin, C., Mouchère, H. and Medjkoune, S. (2014) Mathematical symbol hypothesis recognition with rejection option. 14th International Conference on Frontiers in Handwriting Recognition, Crete, 500-504.
[16] Lu, X.W. and Lin, J.Y. (2010) A block-tree-based structural analysis algorithm for handwritten mathematical formulas. Computer Engineering and Science, 23, 69. (in Chinese)
[17] Chen, L.Q., Li, Y.X. and Shen, J. (2009) Design and implementation of an online handwritten mathematical formula recognition system. Journal of Hangzhou Dianzi University (Natural Science Edition), 29, 36. (in Chinese)
[18] Baker, J.B., Sexton, A.P. and Sorge, V. (2009) A linear grammar approach to mathematical formula recognition from PDF. Lecture Notes in Computer Science, 5625, 201-216.
[19] Yu, B.T., Tian, X.D. and Luo, W.J. (2014) Extracting mathematical components directly from PDF documents for mathematical expression recognition and retrieval. Lecture Notes in Computer Science, 8795, 170-179.
[20] Lin, X.Y., Gao, L.C., Tang, Z., Lin, X.F. and Hu, X. (2011) Mathematical formula identification in PDF documents. International Conference on Document Analysis and Recognition (ICDAR), 1419-1423.