logz.io一个企业级的ELK日志分析器 内部集成了机器学习识别威胁——核心:利用用户对于特定日志事件的反馈处理动作来学习判断日志威胁 + 类似语音识别的专家系统从各方收集日志威胁信息
2017-01-25 15:23
2536 查看
转自: 可看到它使用机器学习算法来识别DNS安全问题 http://logz.io/blog/machine-learning-log-analytics/
A Machine Learning Approach to Log Analytics
By Tomer Levy| January 19th, 2017|Blog, Log ManagementOpening a Kibana dashboard at any given time reveals a simple and probably overstated truth — there are simply too many logs for a human to process. Sure, you can do it the hard way, debugging issues in production by querying and searching among the millions of log messages in your system.
But this is far from being a methodological and productive method.
Kibana searches, visualizations, and dashboards are very effective ways to analyze a system, but a serious limitation of any log analytics platform, including the ELK Stack, is the fact that the people running them only know what they know. A Kibana search, for example, is limited to the knowledge of the operator who formulated it.
“Alexa/Cortana/Siri, What’s Wrong With My Production Environment?”
Asking a virtual personal assistant for help in debugging a production system may seem like a far fetched idea, but the notion of using a machine learning approach is actually very feasible and practical.Machine learning algorithms have proven very useful in recent years at solving complex problems in many fields. From computer vision to autonomous cars to spam filters to medical diagnosis, machine learning algorithms are providing solutions to problems and solving issues where once expert humans were required.
Supervised Machine Learning
Among the various approaches to machine learning, supervised machine learning stands out as one of the most powerful tools in the data scientist’s toolbox.Supervised machine learning is based on the idea of learning by example. The algorithm is fed with data that relates to the problem domain and meta data that attributes a label to the data. For example, the domain-specific data may be an image, essentially a set of pixels, and a label. This label may indicate that the set of pixels forms a car, a pedestrian, or an important traffic landmark. The process of assigning labels to data is referred to as “labeling,” and it plays a crucial part of obtaining good results from supervised machine learning.
Formulating the problem in this fashion enables machine learning algorithms to sift through huge amounts of data, making the necessary correlations and deducing the interdependencies between the data points.
Dealing with terabytes of log data, we at Logz.io pose this classification question: “Is this log interesting?”
An Ill-Posed Question
The question of log relevancy is not a trivial one. A log entry may prove very useful to one user and completely irrelevant to another. Moreover, in the process of data labeling, interesting logs may not get labeled correctly or at all because they were lost in the clutter.To tackle the problem of data labeling, we at Logz.io are using the below methodologies:
Use implicit and explicit user behavior. We pay attention to the ways that our clients interact with our tools. Creating an alert, viewing a log, creating dashboards and other actions are all actions during which our users indicate what is important to them.
Inter-user similarities. All of our clients are unique, and we cherish every one of them. Our moms’ reassurances notwithstanding, we are also all very similar and use the same components and, therefore, share similar log entries. Consequently, similar users may draw benefits from common labeling.
Harvest public resources such as CQA (community questions and answers) sites and others. Sites such as Stack Overflow, GitHub, and even Wikipedia contain wealths of information and host a vast pools of knowledge that can be used to evaluate the importance of logs and even propose solutions to the root problems that are indicated by these logs.
Combining data from these resources enables us at Logz.io to create a very rich dataset of labeled logs, together with meta data on the log relevance, frequency and, in some cases, information that shows how to solve the underlying issue.
Training Your Classifier
Once the necessary data — log entries and corresponding labels — has been accumulated, it is possible to construct a log classifier.Classification can be performed in many ways, and one such method is Linear Support Vector Machines (SVM). This type of classifier offers simple training and is easy to interpret by domain experts.
More information on SVM and its application to text classification can be found here:
http://www.cs.cornell.edu/people/tj/publications/joachims_98a.pdf
http://nlp.stanford.edu/IR-book/html/htmledition/support-vector-machines-and-machine-learning-on-documents-1.html
For this example, a feature vector needs to be constructed. Using short n-grams usually yields a feature space of a dimension of about 1M dimensions, which is feasible and rich enough to give good results.
Examples of such n-grams and corresponding weight coefficients are presented below. As can be seen, it is very easy to interpret the results and verify them for sanity. Positive values indicate some sort of system failure, whereas negative values indicate a log entry that does not contain an actionable, relevant, state.
unable: 0.671539714688
topic: 0.678756599452
error: 0.788508324168
connected: -0.157199772246
to provider: -0.15319903564
connected successfully: -0.15319903564
Another possibility for training a classifier is to use Random Forests, which are very useful in cases where the features are categorical (non-numerical) and do not fit linear models very well. More information about using Random Forests for classification can be found here:
https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm
While seemingly trivial, this process is very powerful. It may not take a rocket scientist to tell you that “error” is a phrase that may indicate a production issue, but it is virtually impossible for even the best DevOps group in existence to find the correlations and relations between a million phrases that occur in log data. The process of feeding these vast amounts of data to supervised ML algorithms enables the machine to learn from the accumulated knowledge of hundreds of DevOps teams and hundreds of thousands of contributors to knowledge sites.
At Logz.io, we use a set of machine learning algorithms that are able to collect bits and pieces of data — mostly on what users care about in their log data — and fuse all of them together into a supervised process that trains our machine learning code. One of the most powerful parts of the Logz.io learning system is that it learns from the way in which users react to these highlighted events, enabling ongoing supervision and continuous learning.
Integration
Once the classifier was trained, it was integrated into the Logz.io pipeline. We used tools including Spark and Hadoop to run the classifier and machine learning at the scale that was required. The logs that pass the entire classification stage are labeled as “Cognitive Insights” and additional information that has been gathered in the labeling stage is attached to them. This enables Logz.io not only to highlight relevant logs to our customers but also to enrich the logs with additional information.A Classification Example
Obviously, the Logz.io learning technology is much more complicated and includes a multi-vector analysis, but we thought to share a simplified example. The following log was analyzed in our system (note that specific values have been obfuscated):1 | “Address <strong>IP_OCTET</strong> maps to <strong>URL</strong>, but this does not map back to the address - POSSIBLE BREAK-IN ATTEMPT!” |
The log was then passed through the augmentation module, and several relevant threads on knowledge sites were found:
http://superuser.com/questions/408080/unable-to-access-the-server-via-ssh
http://stackoverflow.com/questions/23202963/password-for-gitlab
http://stackoverflow.com/questions/23250519/check-gitlab-api-access-failed-code-302
These online resources indicate that contrary to the log text, it is more likely to be a DNS issue than an actual security threat.
The system then displays the log and the data to the user in an informative way:
Summary
Utilizing a machine learning approach to log analytics is a very promising way to make life easier for DevOps engineers. Classifying relevant and important logs using supervised machine learning is just the first step to harnessing the power of the crowd and Big Data in log analytics. Adaptive log clustering, log recommendation, and some other cool features are coming soon, so stay tuned!Logz.io is an AI-powered log analysis platform that offers the open source ELK Stack as a cloud service with machine learning technology. Learn more about ourCognitive Insights technology or create a free demo account to test drive the entire platform for yourself.
相关文章推荐
- 利用Rsyslog集中收集系统日志和用户操作记录以及相关处理方法
- 利用Rsyslog集中收集系统日志和用户操作记录以及相关处理方法 推荐
- windows 不能在本地计算机启动apache2。有关更多信息,查阅系统事件日志。如果这是非Microsoft服务,请与服务厂商联系,并参考特定服务错误代码1
- MVC框架——学生信息管理系统(多表,多事务如何处理,一个用户如何共用一个Connection连接)
- UNIX环境高级编程学习之第六章系统数据文件和信息 用链表的形式读出一个服务器的远程用户登入登出信息
- windows 不能在本地计算机启动apache2。有关更多信息,查阅系统事件日志。如果这是非Microsoft服务,请与服务厂商联系,并参考特定服务错误代码1
- 3.【练习题】构造方法与重载 定义一个网络用户类,要处理的信息有用户ID、用户密码、email地址。拓展:判断密码长度
- 在系统启动时至少有一个服务或驱动程序产生错误,详细信息,请使用事件查看器查看事件日志
- 在系统启动时至少有一个服务或驱动程序产生错误。详细信息,请使用时间查看器查看事件日志
- “windows不能在本地计算机启动Apache Tomcat. 有关更多信息,查阅系统事件日志。如果这是非Microsoft服务,请与服务厂商联系,并参考特定服务错误代码0.”
- 机器学习: 专家系统、认知模拟、规划和问题求解、数据挖掘、网络信息服务、图象识别、故障诊断、自然语言理解、机器人和博弈等领域。
- mysql dba系统学习(10)innodb引擎的redo log日志的原理 mysql dba系统学习(11)管理innodb引擎的redo log日志的一个问题
- 在系统启动时至少有一个服务器或驱动程序产生错误。详细信息请用事件查看器查看事件日志。
- 机器学习: 专家系统、认知模拟、规划和问题求解、数据挖掘、网络信息服务、图象识别、故障诊断、自然语言理解、机器人和博弈等领域。
- 基于深度学习神经网络等机器学习技术实现一个医学辅助诊断的专家系统原型
- Windows 不能在 本地计算机 启动 SQL Server (MSSQLSERVER)。有关更多信息,查阅系统事件日志。如果这是非 Microsoft 服务,请与服务厂商联系,并参考特定服务错误代码 3417。
- windows 不能在 本地计算机 启动 Apache Tomcat 6.0。有关更多信息,查阅系统事件日志。如果这是非 Microsoft 服务,请与服务厂商联系,并参考特定服务错误代码 0。
- 最大堆---实现一个简化的搜索提示系统。给定一个包含了用户query的日志文件,对于输入的任意一个字符串s,输出以s为前缀的在日志中出现频率最高的前10条query。
- Windows 不能在 本地计算机 启动 OracleDBConsoleorcl。有关更多信息,查阅系统事件日志。如果这是非 Microsoft 服务,请与服务厂商联系,并参考特定服务错误代码 2。
- mysql dba系统学习(11)管理innodb引擎的redo log日志的一个问题