The present state of security in many networks is pretty sad. We regularly see breach discovery times in excess of 200 days, with the discoveries often made by external parties. Those figures are based only on the breaches that we know about. I would suggest they are just the tip of the iceberg.

This is a dreadful situation, which simply says that many organisations either DO NOT HAVE sufficient ‘visibility’ into their internal infrastructure, or are not able to effectively process, correlate or analyse the data which does exist.

There are many people in the industry openly stating that the attackers have the advantage. I would not try to argue this point, but there is a lot that can be done. If we view security technologies from a Force Multiplier perspective, there are some technologies which provide only a marginal benefit (compliance activities perhaps, IMHO), while others provide a very significant advantage to the defender.

I believe that Security Analytics has the potential to have a profound effect on the security business and to provide defenders with a very significant advantage. Effective analytics that provide detection capability should enable a reduction in those discovery times from hundreds of days to hours or minutes.

In the last few years we have seen an explosion in Big Data technology with many Open Source tools now being freely available. The scene is young and changing rapidly. But there are many opportunities for people in Security roles to gain exposure to these technologies. While some investment is required, it is possible to enter this domain at low cost.

At present Security Analytics tools are in their infancy. There are a lot of security companies using the buzzwords of Data Science, Machine Learning (ML) and Artificial Intelligence (AI), with little or no detail on how they are being used or what capabilities are achieved. In reality most are just performing correlation and basic statistics. With that said, those activities in themselves are very worthwhile. Coupled with some good visualisations, there is a lot of value in doing just those two things.

To lift the hood on some of the terms used in the Security Analytics domain:

  • Statistics – collecting, summarising and drawing conclusions from numerical data.
  • Data Mining – discovering and explaining patterns in large data sets.
  • Anomaly Detection – detecting what is outside of normal.
  • Machine Learning – learning from and making predictions on data through the use of models.
  • Supervised Machine Learning – the initial input data (or training data) has a known label (or result) which can be learned. The model then learns from the training data until a defined level of error is achieved.
  • Unsupervised Machine Learning – the input data is not labelled and the model is prepared by deducing structures present in the input data.
  • Artificial Intelligence – automatic ways of reasoning and reaching a conclusion by computers.

Mathematical skills in Probability and Statistics, including Bayesian Models, as well as Linear Algebra are heavily used in these domains.
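To make the "basic statistics" and "anomaly detection" terms above a little more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the daily failed-logon counts are made up, the function name find_anomalies is hypothetical, and the threshold of three standard deviations is a common rule of thumb rather than a recommendation. Real telemetry is rarely this well behaved.

```python
from statistics import mean, stdev


def find_anomalies(baseline, observed, threshold=3.0):
    """Return (index, value, z_score) for observed values far from the baseline.

    A value is flagged when it lies more than `threshold` sample standard
    deviations away from the baseline mean.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: treat any different value as anomalous.
        return [(i, v, float("inf")) for i, v in enumerate(observed) if v != mu]

    anomalies = []
    for i, v in enumerate(observed):
        z = (v - mu) / sigma
        if abs(z) > threshold:
            anomalies.append((i, v, z))
    return anomalies


# Hypothetical daily counts of failed logons on one server (assumed data).
baseline_counts = [12, 15, 11, 14, 13, 16, 12, 14, 15, 13]
new_counts = [14, 13, 96, 15]

flagged = find_anomalies(baseline_counts, new_counts)
for day, count, z in flagged:
    print(f"day {day}: {count} failed logons (z-score {z:.1f})")
```

With this made-up data only the day with 96 failed logons is flagged. A simple z-score like this assumes the baseline is roughly stable; it is a starting point, not a substitute for the more capable models described above.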

Today there are an increasing number of security data and telemetry sources available for analysis. These include various security logs from hosts, servers and network security devices such as firewalls, IDS/IPS alerts, flow information, packet captures, threat and intelligence feeds, etc. As network speeds and complexity have increased, so has the volume of the data. While there is a vast amount of security data available, identifying threats or intrusions within this data can still be a huge challenge.

From my recent research into this space, I can conclude that Security Analytics is a hard and complex problem, and the necessary algorithms are mathematically demanding. To build any sort of Security Analytics toolset, it is essential that detailed security domain knowledge be coupled with a knowledge of Big Data and Data Science technologies. There are currently very few people who possess both skill sets, so forming small teams will be essential. While this is a big and somewhat complex field, that should not put people off starting. Like any new technology, there will be a learning curve.

Suggestions going forward - I always like to provide some actionable recommendations out of any discussion.

Before you can analyse the data, you need to have the data and easy access to it.

 

Establishing a Security Data Lake.

To address the storage of security data, some organisations are now creating a centralised repository known as a Security Data Lake. This should not be seen as an exercise in replacing SIEM technology, but as an augmentation of these systems. On this topic, I would refer people to an excellent free O’Reilly publication by Raffael Marty, located at:

http://www.oreilly.com/data/free/security-data-lake.csp

Data Lakes are often built on Hadoop clusters or NoSQL databases, many of which are now freely available. Establishment of a Security Data Lake should be a starting point.

 

Look to closely monitor your ten to twenty most critical servers.

There needs to be a starting point, and monitoring a set of key servers is an excellent and practical one. There are many statistics that can be monitored – root/admin logons, user usage statistics, password resets, user source addresses, port usage statistics, packet size distribution, and many others. Start by visualising this data and use it as an operational tool. Security Analytics will mature over time, and getting started now provides operational experience that will only grow.
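As a rough sketch of how two of those statistics (root/admin logons and user source addresses) might be computed, the Python below works over a handful of already-parsed logon events. Everything here is an assumption for illustration: the event fields (server, user, src, ok), the ADMIN_USERS set, the known_sources baseline and the function names are all hypothetical, and in practice the events would come from your own logs or data lake.

```python
from collections import Counter, defaultdict

# Hypothetical account names treated as administrative (assumption).
ADMIN_USERS = {"root", "administrator"}

# Hypothetical, already-parsed logon events (assumed schema and data).
events = [
    {"server": "db01", "user": "root", "src": "10.0.0.5", "ok": True},
    {"server": "db01", "user": "root", "src": "10.0.0.5", "ok": True},
    {"server": "db01", "user": "alice", "src": "10.0.1.20", "ok": True},
    {"server": "web01", "user": "Administrator", "src": "10.0.0.5", "ok": True},
    {"server": "db01", "user": "root", "src": "203.0.113.7", "ok": True},
    {"server": "web01", "user": "alice", "src": "10.0.1.20", "ok": False},
]


def summarise(events):
    """Count successful admin logons per server and collect sources per user."""
    admin_logons = Counter()
    sources_per_user = defaultdict(set)
    for e in events:
        if e["ok"] and e["user"].lower() in ADMIN_USERS:
            admin_logons[e["server"]] += 1
        # Record the source of every attempt, successful or not.
        sources_per_user[e["user"]].add(e["src"])
    return admin_logons, sources_per_user


def unseen_sources(sources_per_user, known_sources):
    """Return, per user, the source addresses not present in the baseline."""
    result = {}
    for user, sources in sources_per_user.items():
        new = sources - known_sources.get(user, set())
        if new:
            result[user] = new
    return result


# Hypothetical baseline of source addresses seen previously for each user.
known_sources = {
    "root": {"10.0.0.5"},
    "alice": {"10.0.1.20"},
    "Administrator": {"10.0.0.5"},
}

admin_logons, sources_per_user = summarise(events)
new_sources = unseen_sources(sources_per_user, known_sources)

print("admin logons per server:", dict(admin_logons))
print("previously unseen sources:", new_sources)
```

Counts like these are exactly the sort of thing worth charting per server per day, so that a root logon from an address never seen before stands out on an operational dashboard.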

Apache Metron ( http://metron.incubator.apache.org ) and PNDA ( http://pnda.io ) are two Open Source projects which could potentially be a starting point for your organisation. Both are worth a serious look.