An Update on Machine Learning and AI in Security (Sept 2018)

Originally Published: 09 September 2018

I have not written on this topic lately and thought it time to do an update. People may remember a couple of years ago I was very excited by the prospect of utilising Machine Learning (ML) and Big Data Analytics in solving security problems. While there are a number of Use Cases successfully using ML, solving many other security problems with machine learning is turning out to be very hard. I’ll come back to that part later, but let me start by providing an overview of what I’m seeing in the market and this technology domain.

My first observation is that we currently appear to be at ‘buzzword saturation’, particularly around the topic of Artificial Intelligence (AI) applied to security. I am seeing a lot of people, vendor marketing teams in particular, using the term AI very liberally. Encyclopaedia Britannica defines artificial intelligence as “the ability of a digital computer or computer-controlled robot to perform tasks commonly associated with intelligent beings.” By that definition, I don’t believe any true AI security product exists today.

With that said, there have been some very significant advances using a number of related technologies in certain security applications. When vendors talk about using AI, in most cases it likely means they are using some form of ML or statistical analysis. Done right, that can still be incredibly useful. Couple that with the fact that there are many freely available ML toolsets, including TensorFlow, Keras, PyTorch and scikit-learn, to name just a few. So, accessing the technology is not difficult.

The biggest mindset shift which has occurred in the security domain in the last 5 years is the acceptance that a purely preventative strategy is insufficient given the sophistication of many attacks. A preventative strategy needs to be complemented with a detection and response capability. It is here that these technologies can play an important role.

However, the difficulty with ML in many security applications is its reliance on large amounts of labelled data for the algorithms to ‘learn’. For many applications, that labelled data doesn’t currently exist on the scale that is required. While ML has been used successfully in some areas, it is still very early days for most security applications.

What are the key Use Cases?

The two most prominent uses of ML techniques are in Malware Classification and Spam Detection. Both have successfully utilised Supervised ML because very large amounts of labelled training data have been available. By that I mean a human has previously classified the samples, much as an image recognition system is trained by feeding it a huge number of pictures of animals with the correct names attached as labels. In the case of Malware Classification, ML has worked very well because most new malware is an adaptation of some previous or current malware family, so the common attributes can be detected using ML approaches. Spam detection works on a similar principle.
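To make the idea of learning from labelled samples concrete, here is a minimal sketch of a supervised spam classifier. It is a toy Naive Bayes model written with only the Python standard library; the six training messages, their labels and the function names are invented for illustration and are not taken from any real product or data set.

```python
import math
from collections import Counter


def train_naive_bayes(labelled_messages):
    """Count word frequencies per label from (text, label) pairs."""
    word_counts = {}          # label -> Counter of words seen under that label
    label_counts = Counter()  # label -> number of training messages
    for text, label in labelled_messages:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocabulary = model
    total_messages = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        # Start from the log prior: how common this label was in training.
        score = math.log(label_counts[label] / total_messages)
        words_in_label = sum(word_counts[label].values())
        for word in text.lower().split():
            # Laplace (add-one) smoothing so unseen words do not zero the score.
            numerator = word_counts[label][word] + 1
            denominator = words_in_label + len(vocabulary)
            score += math.log(numerator / denominator)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical, hand-labelled training data (a human supplied each label).
TRAINING = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("cheap pills free offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the project report", "ham"),
    ("lunch on monday with the team", "ham"),
]

model = train_naive_bayes(TRAINING)
print(classify("claim your free prize", model))
print(classify("project meeting on monday", model))
```

The point of the sketch is the dependency on the labels: the model can only separate the two classes because a person marked each training message first. Real systems use far larger corpora and richer features, but the reliance on labelled data is the same.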

There is a lot of promising work occurring in the area known as ‘automating the Level-One analyst’. A notable project in this area is the AI^2 project developed at MIT. For most organisations, the sheer volume of security log messages today is beyond what any human can process. The AI^2 system processes log data looking for anomalies and uses the input of human analysts to train the system. The more training data is fed into the system, the more finely it is tuned to identify legitimate security events. While the published results put its detection rate at only about 85%, this can still be highly effective in distilling mountains of data into more useful events that an analyst can investigate. The key element, however, is the need for human analysts to train the system. So, don’t expect this or other systems to extrapolate new conclusions without being explicitly trained.
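The analyst-in-the-loop pattern described above can be sketched in a few lines. This is not the AI^2 implementation; it is a simplified illustration that assumes each log event has already been reduced to two made-up numeric features (failed logins, megabytes sent out), uses a plain z-score as the outlier measure, and stands in for the human with a hypothetical `analyst` function.

```python
from statistics import mean, pstdev


def anomaly_scores(events):
    """Score each event by its largest absolute z-score across features."""
    columns = list(zip(*events))
    stats = [(mean(col), pstdev(col) or 1.0) for col in columns]
    return [
        max(abs(value - mu) / sigma for value, (mu, sigma) in zip(event, stats))
        for event in events
    ]


def select_for_review(events, k):
    """Return the indices of the k most anomalous events."""
    scores = anomaly_scores(events)
    ranked = sorted(range(len(events)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def predict(labelled, event):
    """1-nearest-neighbour over analyst-labelled events (the supervised stage)."""
    def squared_distance(example):
        return sum((a - b) ** 2 for a, b in zip(example[0], event))
    return min(labelled, key=squared_distance)[1]


def analyst(event):
    """Hypothetical stand-in for a human analyst's verdict on one event."""
    failed_logins, _mb_out = event
    return "attack" if failed_logins > 20 else "benign"


# Hypothetical events: (failed_logins, megabytes_sent_out).
EVENTS = [(1, 5), (0, 4), (2, 6), (1, 5), (0, 5),
          (40, 5), (1, 900), (2, 4), (35, 6), (1, 6)]

# Step 1: the unsupervised stage picks a short review queue from the pile.
review_queue = select_for_review(EVENTS, k=3)

# Step 2: the analyst labels only those events.
labelled = [(EVENTS[i], analyst(EVENTS[i])) for i in review_queue]

# Step 3: the labels are reused to triage newly flagged events.
print(review_queue)
print(predict(labelled, (38, 5)))
print(predict(labelled, (2, 850)))
```

In this sketch the supervised stage is only meant to re-score events the outlier stage has already flagged, since the analyst never sees (and so never labels) the ordinary traffic. The system knows that a large transfer can be benign only because the analyst said so once, which is the dependency on human training noted above.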

Then there are unsupervised ML techniques. Most commonly this means Clustering. With unsupervised learning there is no prior knowledge of the categories in the data, or even of whether it can be classified at all. While it is used very successfully in some specific toolsets, such as DNS record analysis and processing Threat Feeds, at present I have not seen any high-impact solution based purely on unsupervised techniques. Going forward, I believe unsupervised ML will play an important role, but as part of a larger system or combined with other ML techniques.
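As a rough illustration of clustering without labels, the sketch below groups domain names using two simple features, name length and character entropy, with a small k-means written from scratch. The domain strings, the choice of features and the deterministic farthest-first initialisation are all assumptions made for the example; they are not drawn from any particular DNS analysis tool.

```python
import math
from collections import Counter


def features(domain):
    """Map a domain label to (length, Shannon entropy of its characters)."""
    counts = Counter(domain)
    n = len(domain)
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
    return (float(n), entropy)


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain k-means with deterministic farthest-first initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far.
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels


# Hypothetical domain labels: four ordinary names, three random-looking ones.
DOMAINS = ["google", "github", "amazon", "wikipedia",
           "xj4k9q2vbz7p1m", "q8w3z7r5t1y9u2", "zk2m8x4c6v0b9n"]

cluster_ids = kmeans([features(d) for d in DOMAINS], k=2)
for domain, cluster in zip(DOMAINS, cluster_ids):
    print(cluster, domain)
```

Note that the algorithm only reports that there are two groups. It has no notion of which group is suspicious; deciding that still takes a person or a further, labelled stage, which is one reason clustering tends to work best as part of a larger system.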

Neural Networks are a hot topic. These are systems loosely modelled on the operation of the human brain. They are being used in many applications today, most notably in Image Recognition, Speech Recognition and Natural Language Processing. To perform accurately, these systems require labelled training data, often in huge quantities. Again, I have not seen any significant applications of Neural Networks specific to security at this point in time.

A key personal interest area in this field is analysing NetFlow records to detect attacks. NetFlow is the networking equivalent of a Call Detail Record in the telephony world: you know who spoke to whom and for how long, but not the contents of each call. The approach is highly scalable. However, learning a ‘known good’ network traffic profile is much harder than it appears on the surface, for many reasons. A key one is that virtually any network of any size will have something bad or anomalous happening at any point in time. Without this known good baseline, identifying anything bad is very difficult.
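The baseline problem can be shown with a small sketch. It assumes a simplified flow record (the `Flow` fields below are illustrative, not a real NetFlow export format) and a deliberately naive detector that flags a flow when its byte count is more than three standard deviations from the source host's learned average. All addresses and byte counts are made up.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class Flow:
    """Simplified, hypothetical flow record: who talked to whom, and how much."""
    src: str
    dst: str
    dst_port: int
    bytes_sent: int
    duration_s: float


def learn_baseline(flows):
    """Per-source (mean, standard deviation) of bytes per flow."""
    per_src = {}
    for flow in flows:
        per_src.setdefault(flow.src, []).append(flow.bytes_sent)
    return {src: (mean(v), pstdev(v)) for src, v in per_src.items()}


def is_anomalous(flow, baseline, threshold=3.0):
    """Flag flows far from the source host's learned profile."""
    if flow.src not in baseline:
        return True  # a host never seen during training
    mu, sigma = baseline[flow.src]
    if sigma == 0:
        return flow.bytes_sent != mu
    return abs(flow.bytes_sent - mu) / sigma > threshold


HOST = "10.0.0.5"
normal_sizes = [1000, 1200, 900, 1100, 1000]
clean_window = [Flow(HOST, "10.0.0.20", 443, b, 1.0) for b in normal_sizes]

# Same window, but two large outbound transfers were already under way
# while the baseline was being learned.
contaminated_window = clean_window + [
    Flow(HOST, "203.0.113.9", 443, 60000, 30.0),
    Flow(HOST, "203.0.113.9", 443, 45000, 25.0),
]

suspect = Flow(HOST, "203.0.113.9", 443, 50000, 28.0)

clean_baseline = learn_baseline(clean_window)
contaminated_baseline = learn_baseline(contaminated_window)

print(is_anomalous(suspect, clean_baseline))
print(is_anomalous(suspect, contaminated_baseline))
```

With the clean window the suspect flow stands out; with the contaminated window the large transfers have already become part of ‘normal’ and the same flow passes unnoticed. A production detector would be far more sophisticated, but it faces the same question of whether the training period was actually clean.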

The action is not just happening with the good guys. We are starting to see evidence of ML-based tools being embedded in malware with the objective of maximising its impact. Last month, we saw proof-of-concept code called ‘DeepLocker’ (as in Deep Learning) demonstrated at Black Hat USA. The code spies on the user and learns their behaviour, allowing its ransomware payload to be triggered by any of a variety of learned conditions. If this is a taste of what’s to come, the security community needs to prepare to face a new level of ML-powered attacks.

Where will it all go? Today, experts suggest that any task that can easily be performed by a human in about 1 second is a candidate for automation through AI techniques. I believe there is a lot more to come in this space.

In conclusion, don’t expect AI to come to the rescue for a while yet. Human experts are essential to lead security operations and security projects. Given the current skills shortage, investing to develop key people into experts, and to keep them at that level, is an initiative that every business should take very seriously. Look after these people and complement them with an investment in these newer technologies, which can make their job easier. For the foreseeable future, experts on staff remain the most vital asset.


© Neon Knight 2023