Computer Science and Application
Vol. 09  No. 12 ( 2019 ), Article ID: 33294 , 11 pages
10.12677/CSA.2019.912252

Anomaly Detection of Large Scale Microservice Architecture Software System Based on Log Parsing

Liyuan Tai1, Chunqi Tian1, Wei Wang2

1Department of Computer Science and Engineering, Tongji University, Shanghai

2School of Data Science, East China Normal University, Shanghai

Received: Nov. 14th, 2019; accepted: Nov. 29th, 2019; published: Dec. 5th, 2019

ABSTRACT

In recent years, with the rise of microservice architecture, software systems have grown ever larger. Traditional methods that rely on manually locating problems and detecting anomalies are inefficient and consume considerable time and effort, so automatic anomaly detection has attracted extensive attention from researchers. Log analysis is an effective means of detecting anomalies. Because the business logic of microservice architecture software systems is complex, the volume of generated log data is huge. These logs are unstructured, come from different cluster nodes and different user requests, and vary widely in type and format, which makes it difficult to extract information that is useful for anomaly detection. This paper proposes an anomaly detection method that analyzes the logging source code through an abstract syntax tree, converts unstructured log data into structured data, and then groups the structured logs by time window and event identifier. A long short-term memory (LSTM) network is then trained on the grouped logs to detect abnormal execution paths in the system. Experiments show that the method effectively detects anomalies in a microservice architecture software system and that the model's accuracy is about 10% higher than that of traditional statistical methods. We also study how the length of the log key sequence and the size of the training data set affect the anomaly detection model.

Keywords: Log Parsing, Exception Detection, Microservice, Abstract Syntax Tree, Long Short-Term Memory Network


1. Introduction

Figure 1. Heterogeneous log of microservice architecture software system

Figure 2. Overall flow chart of exception detection based on log analysis

2. Related Work

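Zhao et al. [12] and Kc and Gu [13] derive the grammatical structure of logs from the source code and turn it into regular expressions. Debnath et al. [14] discover patterns in application logs by analyzing and structuring raw logs: from a set of logs that represent normal system operation, they learn a set of GROK patterns, which they then use to parse incoming logs.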
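Both approaches end with a set of patterns that map a raw log line to a log template. The minimal Python sketch below illustrates only that final matching step, under stated assumptions: the templates, the log key names (`K1`, `K2`, `K3`), and the sample lines are hypothetical, and plain regular expressions with named groups stand in for GROK patterns (GROK, as used in Logstash's grok filter, is a named-pattern notation built on top of regular expressions).

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical templates: each log key maps to a regular expression whose
# named groups capture the variable fields of that log message.
TEMPLATES: Dict[str, re.Pattern] = {
    "K1": re.compile(r"User (?P<user>\S+) logged in from (?P<ip>[\d.]+)"),
    "K2": re.compile(r"Order (?P<order_id>\d+) created"),
    "K3": re.compile(r"Request to (?P<service>\S+) timed out after (?P<ms>\d+) ms"),
}


def parse_line(line: str) -> Tuple[Optional[str], Dict[str, str]]:
    """Return (log key, extracted fields), or (None, {}) when no template matches."""
    text = line.strip()
    for key, pattern in TEMPLATES.items():
        match = pattern.fullmatch(text)
        if match:
            return key, match.groupdict()
    return None, {}


if __name__ == "__main__":
    print(parse_line("Order 1024 created"))    # ('K2', {'order_id': '1024'})
    print(parse_line("Something unexpected"))  # (None, {})
```

Lines that match no template come back with no key; a real parser would have to decide whether such lines are new templates or noise.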

3. Anomaly Detection Model

3.1. Preprocessing of Microservice System Log Data

3.1.1. Parsing Microservice System Logs

Figure 3. Log of service gateway in the system

Figure 4. Program fragment: method call
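The abstract states that log templates are obtained by analyzing the logging source code through an abstract syntax tree; the paper's own parser is not reproduced in this excerpt. As an illustration only, the sketch below applies Python's standard `ast` module to Python source (the paper's microservices are not necessarily written in Python). It finds calls such as `logger.info(...)` whose first argument is a string literal and turns each `%s`/`%d` placeholder into a regex capture group, giving one template per logging statement. The set of logger method names and the sample source are assumptions.

```python
import ast
import re
from typing import List

# Assumed logger method names; a real system would match its own logging API.
LOG_METHODS = {"debug", "info", "warning", "error", "critical"}


def extract_log_templates(source: str) -> List[str]:
    """Return one regex template per logging call whose first argument is a string literal."""
    templates: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)):
            continue
        if node.func.attr not in LOG_METHODS or not node.args:
            continue
        first = node.args[0]
        if not (isinstance(first, ast.Constant) and isinstance(first.value, str)):
            continue  # format string is not a literal, so no static template can be derived
        # Keep the constant text literally; each %s / %d placeholder becomes a capture group.
        pieces = re.split(r"%[sd]", first.value)
        templates.append(r"(\S+)".join(re.escape(p) for p in pieces))
    return templates


SAMPLE_SOURCE = '''
import logging
logger = logging.getLogger(__name__)

def login(user, ip):
    logger.info("User %s logged in from %s", user, ip)
    if ip is None:
        logger.error("Login failed for %s", user)
'''

if __name__ == "__main__":
    for template in extract_log_templates(SAMPLE_SOURCE):
        print(template)
```

Templates derived this way are meant to be applied with `re.fullmatch`. Because they come from the code rather than from observed logs, every statement gets a template even if it has never fired, which is the main appeal of source-based parsing.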

3.1.2. Partitioning Microservice System Logs
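According to the abstract, the structured logs are grouped by time window and event identifier. A minimal sketch of that grouping, assuming each parsed record carries a timestamp in seconds, a hypothetical request/trace identifier shared across services (`event_id`), and the log key assigned by the parser; fixed, non-overlapping windows of `window_seconds` are also an assumption.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class LogRecord(NamedTuple):
    timestamp: float  # seconds (assumed unit)
    event_id: str     # hypothetical request/trace identifier carried across services
    log_key: str      # template key assigned by the log parser


def group_logs(records: List[LogRecord],
               window_seconds: float = 60.0) -> Dict[Tuple[str, int], List[str]]:
    """Group log keys by (event identifier, fixed time window), sorted by timestamp in each group."""
    buckets: Dict[Tuple[str, int], List[LogRecord]] = defaultdict(list)
    for rec in records:
        window_index = int(rec.timestamp // window_seconds)
        buckets[(rec.event_id, window_index)].append(rec)
    return {
        group: [r.log_key for r in sorted(recs, key=lambda r: r.timestamp)]
        for group, recs in buckets.items()
    }


if __name__ == "__main__":
    records = [
        LogRecord(5.0, "req-1", "K1"),
        LogRecord(2.0, "req-1", "K0"),
        LogRecord(3.0, "req-2", "K1"),
        LogRecord(65.0, "req-1", "K2"),
    ]
    print(group_logs(records))
    # {('req-1', 0): ['K0', 'K1'], ('req-2', 0): ['K1'], ('req-1', 1): ['K2']}
```

Sorting inside each group matters because logs collected from several nodes rarely arrive in timestamp order.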

3.2. Model Construction

3.2.1. Extracting Log Key Sequences from Microservice Systems
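Equation (1) models the probability that the next log key $d_t$ is key $k$ out of $K$ keys, which suggests a next-key prediction setup in which each training sample is a fixed-length history of log keys followed by the key that comes next. The sketch below builds such samples under that assumption (the framing follows DeepLog and is not confirmed by this excerpt); the history length `h` corresponds to the log key sequence length studied in Section 4.3, and the vocabulary builder is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def build_vocab(sequences: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Map each distinct log key to an index 0..K-1 in order of first appearance."""
    vocab: Dict[str, int] = {}
    for seq in sequences:
        for key in seq:
            vocab.setdefault(key, len(vocab))
    return vocab


def sliding_windows(seq: Sequence[str], vocab: Dict[str, int],
                    h: int) -> List[Tuple[List[int], int]]:
    """Split one log key sequence into (h preceding keys, next key) pairs."""
    ids = [vocab[key] for key in seq]
    return [(ids[i:i + h], ids[i + h]) for i in range(len(ids) - h)]


if __name__ == "__main__":
    sequence = ["K0", "K1", "K2", "K1", "K3"]
    vocab = build_vocab([sequence])
    print(sliding_windows(sequence, vocab, h=3))
    # [([0, 1, 2], 1), ([1, 2, 1], 3)]
```

Sequences shorter than or equal to `h` yield no samples, so a larger `h` gives the model more context but discards short groups.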

3.2.2. LSTM Anomaly Detection Model

Figure 5. Unit of LSTM recurrent neural network

$\mathrm{Pr}\left({d}_{t}=k\mid{y}_{t}\right)=\frac{{e}^{{y}_{t}^{k}}}{{\sum }_{i=1}^{K}{e}^{{y}_{t}^{i}}}$ (1)

$C=-{\sum }_{i=1}^{K}{w}_{i}\times\left[{d}_{t}^{i}\mathrm{log}\left({y}_{t}^{i}\right)+\left(1-{d}_{t}^{i}\right)\mathrm{log}\left(1-{y}_{t}^{i}\right)\right]$ (2)
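A small pure-Python sketch of Equations (1) and (2), written to make the indices concrete. Two assumptions are labeled here and in the comments: the $y_t^i$ inside Equation (2) is read as the predicted probability of key $i$ (the softmax output of Equation (1)), with $d_t^i$ a one-hot target; and the detection rule `is_anomalous` (flag the observed key if it is not among the `g` most probable keys) is borrowed from DeepLog for illustration, since the paper's decision rule is not shown in this excerpt.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Equation (1): probability of each of the K log keys given the output vector y_t."""
    m = max(logits)  # subtracting the max avoids overflow and leaves the result unchanged
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_cross_entropy(probs: Sequence[float], target: Sequence[int],
                           weights: Sequence[float], eps: float = 1e-12) -> float:
    """Equation (2), assuming y_t^i is the predicted probability of key i and d_t^i is one-hot."""
    loss = 0.0
    for p, d, w in zip(probs, target, weights):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never receives 0
        loss -= w * (d * math.log(p) + (1 - d) * math.log(1.0 - p))
    return loss


def is_anomalous(probs: Sequence[float], observed_key: int, g: int) -> bool:
    """Hypothetical top-g rule: flag the observed key if it is not among the g most probable keys."""
    top_g = sorted(range(len(probs)), key=lambda k: probs[k], reverse=True)[:g]
    return observed_key not in top_g


if __name__ == "__main__":
    probs = softmax([2.0, 1.0, 0.1])
    print(is_anomalous(probs, observed_key=2, g=2))  # True: key 2 is the least likely of three
```

The per-key weights $w_i$ let rare log keys contribute more to the loss, which is one way to counter the imbalance between frequent and infrequent keys.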

4. Experiments and Analysis

4.1. Experimental Data Preparation

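EFK is a logging solution built from open-source software and consists of three components: Elasticsearch, Fluentd, and Kibana. Elasticsearch is a distributed log storage and search engine that is accessed through a RESTful interface; Fluentd collects logs and sends them to Elasticsearch; and Kibana presents the data stored in Elasticsearch through a user-friendly interface. We use EFK to collect the large volume of logs produced by an online training platform system. When system administrators discover a problem, they record the corresponding anomalous data.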

4.2. Accuracy

Figure 6. Microservice architecture software system architecture

Table 1. Confusion matrix

$\mathrm{precision}=\frac{TP}{TP+FP}$ (3)

$\mathrm{recall}=\frac{TP}{TP+FN}$ (4)

${F}_{1}=\frac{2\times\mathrm{precision}\times\mathrm{recall}}{\mathrm{precision}+\mathrm{recall}}$ (5)
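Equations (3) to (5) can be computed directly from the confusion-matrix counts. The sketch below assumes binary labels with 1 meaning "anomalous" (the positive class) and returns 0.0 wherever a denominator would be zero; the labels and counts in the usage example are made up for illustration and are not results from this paper.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (TP, FP, FN, TN), treating label 1 as anomalous (the positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Equations (3)-(5); each value falls back to 0.0 when its denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    tp, fp, fn, tn = confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
    print(tp, fp, fn, tn)                    # 2 1 1 1
    print(precision_recall_f1(90, 10, 30))   # precision 0.9, recall 0.75, F1 about 0.818
```

Note that TN does not enter any of the three metrics, which is why they remain informative when normal sessions vastly outnumber anomalous ones.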

Figure 7. Comparison of the accuracy of LSTM anomaly detection model with PCA and Logcollect methods

4.3. Length of the Log Key Sequence

Figure 8. Effect of log key sequence length on the accuracy of LSTM anomaly detection model

4.4. Size of the Training Data Set

Figure 9. Effect of training data set size on the accuracy of LSTM anomaly detection model

5. Conclusion

Tai, L., Tian, C. and Wang, W. (2019) Anomaly Detection of Large Scale Microservice Architecture Software System Based on Log Parsing. Computer Science and Application, 09(12), 2266-2276. https://doi.org/10.12677/CSA.2019.912252

1. Dragoni, N., Giallorenzo, S., Lafuente, A.L., et al. (2016) Microservices: Yesterday, Today, and Tomorrow. In: Mazzara, M. and Meyer, B., Eds., Present and Ulterior Software Engineering, Springer, Cham, 195-216. https://doi.org/10.1007/978-3-319-67425-4_12

2. Gabbrielli, M., Giallorenzo, S., Guidi, C., Mauro, J. and Montesi, F. (2016) Self-Reconfiguring Microservices. In: Ábrahám, E., Bonsangue, M. and Johnsen, E., Eds., Theory and Practice of Formal Methods. Lecture Notes in Computer Science, Springer, Cham, 194-210. https://doi.org/10.1007/978-3-319-30734-3_14

3. Thönes, J. (2015) Microservices. IEEE Software, 32, 113-116. https://doi.org/10.1109/MS.2015.11

4. Liao, X., Li, S., Dong, W., et al. (2016) Survey on Log Research of Large Scale Software System. Journal of Software, 27(8), 1934-1947. (in Chinese)

5. Alspaugh, S., Chen, B., Lin, J., Ganapathi, A., Hearst, M. and Katz, R. (2014) Analyzing Log Analysis: An Empirical Study of User Log Mining. LISA14, 62-77.

6. Lee, G., Lin, J., Liu, C., Lorek, A. and Ryaboy, D. (2012) The Unified Logging Infrastructure for Data Analytics at Twitter. Proceedings of the VLDB Endowment, 5, 1771-1780. https://doi.org/10.14778/2367502.2367516

7. Lu, J., Li, F. and Li, L. (2019) Log Analysis and Its Applications in Distributed Systems. Chinese High Technology Letters, 29(4), 303-320. (in Chinese)

8. Tang, L., Li, T. and Perng, C.S. (2011) LogSig: Generating System Events from Raw Textual Logs. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management, 785-794. https://doi.org/10.1145/2063576.2063690

9. Fu, Q., Lou, J.G., Wang, Y. and Li, J. (2009) Execution Anomaly Detection in Distributed Systems through Unstructured Log Analysis. 2009 Ninth IEEE International Conference on Data Mining, Miami, FL, 6-9 December 2009, 149-158. https://doi.org/10.1109/ICDM.2009.60

10. Vaarandi, R. (2004) A Breadth-First Algorithm for Mining Frequent Patterns from Event Logs. In: Aagesen, F.A., Anutariya, C. and Wuwongse, V., Eds., Intelligence in Communication Systems. INTELLCOMM 2004. Lecture Notes in Computer Science, Springer, Berlin, Heidelberg, 293-308. https://doi.org/10.1007/978-3-540-30179-0_27

11. Yamanishi, K. and Maruyama, Y. (2005) Dynamic Syslog Mining for Network Failure Monitoring. Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, Chicago, IL, 21-24 August 2005, 499-508. https://doi.org/10.1145/1081870.1081927

12. Zhao, X., Zhang, Y., Lion, D., et al. (2014) LPROF: A Non-Intrusive Request Flow Profiler for Distributed Systems. In: Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation, Broomfield, CO, 629-644.

13. Kc, K. and Gu, X. (2011) ELT: Efficient Log-Based Troubleshooting System for Cloud Computing Infrastructures. Proceedings of the 30th IEEE Symposium on Reliable Distributed Systems, Madrid, Spain, 4-7 October 2011, 11-20. https://doi.org/10.1109/SRDS.2011.11

14. Debnath, B., Khan, L., Solaimani, M., et al. (2018) LogLens: A Real-Time Log Analysis System. IEEE 38th International Conference on Distributed Computing Systems, Vienna, Austria, 2-6 July 2018, 1052-1062. https://doi.org/10.1109/ICDCS.2018.00105

15. Beschastnikh, I., Brun, Y., Ernst, M.D. and Krishnamurthy, A. (2014) Inferring Models of Concurrent Systems from Logs of Their Behavior with CSight. Proceedings of the 36th International Conference on Software Engineering, Hyderabad, India, 31 May-7 June 2014, 468-479. https://doi.org/10.1145/2568225.2568246

16. Beschastnikh, I., Brun, Y., Schneider, S., et al. (2011) Leveraging Existing Instrumentation to Automatically Infer Invariant-Constrained Models. Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering, Szeged, Hungary, 5-9 September 2011, 267-277. https://doi.org/10.1145/2025113.2025151

17. Logstash (2018) Centralize, Transform & Stash Your Data. https://www.elastic.co/cn/products/logstash

18. Chen, M., Zheng, A.X., Lloyd, J., Jordan, M.I. and Brewer, E. (2004) Failure Diagnosis Using Decision Trees. Proceedings of the 1st International Conference on Autonomic Computing, New York, 17-18 May 2004, 36-43.

19. Liang, Y., Zhang, Y., Xiong, H. and Sahoo, R. (2007) Failure Prediction in IBM BlueGene/L Event Logs. Seventh IEEE International Conference on Data Mining, Omaha, NE, 28-31 October 2007, 583-588. https://doi.org/10.1109/ICDM.2007.46

20. Lin, Q., Zhang, H., Lou, J.G., Zhang, Y. and Chen, X. (2016) Log Clustering Based Problem Identification for Online Service Systems. Proceedings of the 38th International Conference on Software Engineering, Austin, TX, 14-22 May 2016, 102-111. https://doi.org/10.1145/2889160.2889232

21. Xu, W., Ling, H., Fox, A., Patterson, D. and Jordan, M.I. (2009) Detecting Large-Scale System Problems by Mining Console Logs. Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, Big Sky, MT, 11-14 October 2009, 117-132. https://doi.org/10.1145/1629575.1629587

22. Lou, J.G., Fu, Q., Yang, S., Xu, Y. and Li, J. (2010) Mining Invariants from Console Logs for System Problem Detection. Proceedings of the 2010 USENIX Conference on USENIX Annual Technical Conference, Boston, MA, 23-25 June 2010, 24.

23. Fu, X., Ren, R., Zhan, J., et al. (2012) LogMaster: Mining Event Correlations in Logs of Large-Scale Cluster Systems. Proceedings of the 31st Symposium on Reliable Distributed Systems, Irvine, CA, 8-11 October 2012, 71-80. https://doi.org/10.1109/SRDS.2012.40