To everyone who attended my presentation on Friday 26 March 2021 at the Hilton Brisbane, thank you for the opportunity.

The presentation material can be downloaded from Deciphering Zero Trust.

Deciphering the Zero Trust Architecture - Video 1 – An Overview 

This video provides an overview of the Zero Trust Architecture. What is it about? What problems does it address? Why may it be beneficial to my organisation?


Deciphering the Zero Trust Architecture - Video 2 – Architectures

In the first video I attempted to decipher the Zero Trust Architecture - what it was about and the benefits it provided. This second video in the series moves on to a deeper coverage of the potential architectural approaches.

I have always thought of Zero Trust (ZT) as a nonsensical term, as it is simply not possible to operate any IT infrastructure without some level of trust. It is interesting to note that Gartner have also recently referred to Zero Trust as ‘misnamed’ and have developed their own framework, CARTA, which stands for Continuous Adaptive Risk and Trust Assessment. The fact that Gartner have taken this position reinforces the view that Zero Trust, in principle, can bring very real benefits.

The intention of this post is to explain what Zero Trust means, the benefits it can provide, while also highlighting some of the practical considerations for those looking to implement a Zero Trust initiative. 

The term Zero Trust has very much become a buzzword even though many in the security industry struggle to articulate what it actually means. Zero Trust was initially conceived by John Kindervag during his time at Forrester Research (note that he is now at Palo Alto Networks continuing to promote it there). While a number of perspectives of Zero Trust exist, including many vendor marketing spins, fundamentally it’s about ensuring that trust relationships aren’t exploited. It is not about making an untrusted or high-risk environment trusted, or achieving a trusted state, which are common misconceptions I hear. It’s about avoiding trust as a failure point.

Kindervag has argued that the root cause of virtually all intrusions is some violation of a trust relationship. This can mean:

  • A miscreant gaining access to an openly accessible system through no or insufficient network segmentation 
  • A legitimate user exceeding their intended authority, i.e. privilege or access abuse 
  • A compromised system jumping from one system to another, i.e. east-west attack propagation
  • Compromised account credentials being used to easily facilitate the above scenarios. 

Hence Zero Trust proposes a framework which assumes no level of trust in the design process. There is certainly good logic in that fundamental assumption, although it will come at a cost.

In practice, implementing Zero Trust or CARTA requires very tight filtering and deep inspection of traffic flows coupled with a strong user identity function. Much of the deployment of Zero Trust is based on Network Segmentation and utilising network security technology such as Next Generation Firewalls to deeply inspect and enforce traffic flows crossing trust boundaries. Additionally, establishing a trusted Identity for both users and administrators through techniques such as Multi-Factor Authentication (MFA) is also a vital element.

It’s here where the practical considerations start. I have seen many organisations struggling with their existing, usually dated segmentation models and tightly associated firewall rule bases. In the majority of cases this infrastructure has been in place for well over a decade (or longer) and has been expanded as the application infrastructure has grown organically (usually through several generations of administrators). Being polite, most have not grown well. In many cases, these messes are both complex and costly to operate, often sucking valuable funds from stretched security budgets. In most cases they are no longer an effective solution, often only there to provide a false sense of security or a compliance tick-box.

Today most corporate IT infrastructures are large and support multiple business-critical applications with highly complex transaction flows and dependencies. Understanding this type of environment is a non-trivial task, let alone re-engineering such an environment against near-continuous uptime requirements.

So, what can be achieved? How can organisations proceed?

I’d suggest the place to begin is by identifying the organisation’s ten (or so) most valuable information assets, or alternatively a set of potential candidates. Start small and don’t get too ambitious too quickly. Then monitor the traffic flows to those applications and understand them. A number of technologies and/or techniques can be used, including an application-aware Next-Generation Firewall located in front of the candidate systems or applications. Monitoring tools, including those which ingest Netflow, can also be highly effective. I would strongly recommend that whatever solution is used is also integrated with the organisation’s Identity system, be that Active Directory or other systems such as Cisco’s Identity Services Engine (ISE). Linking identity into the monitoring provides not just application visibility, but ‘context’ of ‘who’ is accessing the application. For example, why is a Building Management System accessing the Finance System?
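To make the idea of identity ‘context’ concrete, here is a minimal Python sketch, not a production tool. It assumes flow records have already been enriched with an identity label from the directory. The `Flow` fields, the identity names and the `EXPECTED_ACCESS` table are all hypothetical, invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    src_identity: str  # identity label resolved from the directory (hypothetical)
    dst_app: str       # application the flow terminates on (hypothetical)
    dst_port: int


# Hypothetical record of which identities have a business reason
# to reach each application. Applications not listed expect nobody.
EXPECTED_ACCESS = {
    "finance-system": {"finance-users", "finance-batch"},
    "hr-system": {"hr-users"},
}


def unexpected_flows(flows, expected=EXPECTED_ACCESS):
    """Return flows whose source identity has no recorded reason to reach the app."""
    return [f for f in flows if f.src_identity not in expected.get(f.dst_app, set())]


observed = [
    Flow("finance-users", "finance-system", 443),
    Flow("building-management", "finance-system", 1433),
]

for flow in unexpected_flows(observed):
    print(f"Investigate: {flow.src_identity} -> {flow.dst_app}:{flow.dst_port}")
```

In this toy data only the Building Management System flow is reported, which is exactly the kind of question the identity context is meant to raise.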

Based on what is discovered, and assuming it isn’t too onerous (which may be a big assumption), the applications can be located within their own security zone, such as a Secure Enclave, and tight application-level, Identity-Aware filtering constructed, both in and out. This is the heart of Zero Trust. The objective is to not just control what gets in, but also to ensure that data exfiltration from the key assets can’t occur. For example, how often do we hear of Credit Card Databases being easily FTP’ed out of an organisation? ZT typically recommends that additional inspections such as IPS, Network AV and Day-Zero Malware interception be deployed at trust boundaries.
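The ‘both in and out’ filtering can be sketched as a default-deny lookup. This is an illustrative Python model only, assuming the firewall can already identify the user and the application. The `Rule` fields and every rule entry are hypothetical.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    direction: str    # "in" or "out", relative to the enclave
    identity: str     # user or system identity (hypothetical labels)
    application: str  # application as identified by the firewall


# Hypothetical allow-list. Anything not listed is denied in either direction.
ALLOW_RULES = {
    Rule("in", "finance-users", "web-https"),
    Rule("out", "finance-system", "payment-gateway-api"),
}


def is_permitted(direction, identity, application, rules=ALLOW_RULES):
    """Default deny: a flow is allowed only if an explicit rule matches it."""
    return Rule(direction, identity, application) in rules


print(is_permitted("in", "finance-users", "web-https"))  # explicitly allowed
print(is_permitted("out", "finance-system", "ftp"))      # no rule, so denied
```

The second check is the exfiltration case: because nothing permits FTP outbound from the enclave, the transfer is refused without needing a specific block rule.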

So far so good.

If you are fortunate enough to have only a small number of key assets to protect, then you can rinse and repeat this approach (which for the purpose of this post is fairly oversimplified). Tightly protecting your organisation’s crown jewels is a valuable initiative. If this can be achieved, you’re in a far better position.

However, most environments I see are not this fortunate and it’s here where things start to get a whole lot harder.

Firstly, if you are embarking on a larger scale ZT initiative, then it is essential to invest in an Analytics solution which can provide visibility into the traffic flows between all systems/applications/workloads at scale. Maybe some ultra-well organised organisations have achieved this on a spreadsheet, but these are very few in my experience. For those that have, it requires significant human resource, is always error prone and changes frequently.

OK, so above I recommended identifying and placing critical applications into security zones with tight filtering at trust boundaries. For larger and more complex environments, it’s the same fundamental principle, with (1) expanding it and (2) doing so at scale being the big differences. It’s here that the tooling becomes essential to identify application dependencies and subsequently create an optimal zone structure. In my experience this is a key area of any ZT design as the complexity of the trust boundary filtering will be dependent on the quality of the zone structure. You want to get this part right.
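As a rough illustration of turning observed dependencies into a first-cut zone structure, the Python sketch below groups systems that talk to each other, directly or indirectly, using connected components. This is a deliberately crude stand-in for what real analytics tooling does, and the host names are hypothetical.

```python
from collections import defaultdict


def candidate_zones(flows):
    """Group systems linked by observed flows into candidate zones."""
    neighbours = defaultdict(set)
    for src, dst in flows:
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    seen, zones = set(), []
    for node in sorted(neighbours):
        if node in seen:
            continue
        # Depth-first walk collecting everything reachable from this node.
        stack, zone = [node], set()
        while stack:
            current = stack.pop()
            if current in zone:
                continue
            zone.add(current)
            stack.extend(neighbours[current] - zone)
        seen |= zone
        zones.append(sorted(zone))
    return zones


# Hypothetical (source, destination) pairs taken from flow monitoring.
observed_flows = [
    ("web-01", "app-01"),
    ("app-01", "db-01"),
    ("hr-web", "hr-db"),
]

print(candidate_zones(observed_flows))
```

The result is only a list of candidates. Shared services such as DNS or directory servers talk to almost everything and would collapse the whole estate into one group, so they need to be handled separately, and any proposed zone still needs human review.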

A key design complexity is that just because two systems are talking to each other does not mean that the conversation is always benign. Techniques such as ‘Living off the Land’ attacks are designed to mimic legitimate conversations. You can’t just blindly build a set of access-controls on this basis. Human sanity checking is needed. It is this exact problem which has made it so hard for Machine Learning algorithms to identify nefarious activity: in a mass of good, what does the small amount of bad conversation look like?

Anyhow, this post has grown larger than I intended. I would like to discuss the use of ML algorithms in solving the Optimal Zoning and Scale issues I noted above. I’ll leave that for a Part Two.


In the last week we have seen a spectacular report out of Bloomberg in relation to malicious hardware implants within Supermicro server motherboards. The implications of this report are potentially huge. However, the technical details disclosed are minimal and a large number of unanswered questions exist.

Personally, I subscribe to the adage of “where there’s smoke, there must be at least some amount of fire”. Subsequent reports have claimed that it was not just Supermicro motherboards affected, but that the problem could be far more widespread affecting other vendors as well. With all that said, I acknowledge that this whole situation has not been substantiated and there is a chance it could be inaccurate, grossly exaggerated, or completely false. However, for the purpose of this post let’s put that debate aside and assume that the reports are correct.

The first point I would like to make is that hardware inspection is a highly specialised field and there are currently very few vendor organisations either experienced or equipped to perform this work. This means that for the vast majority of organisations, hardware inspection is not going to be a viable option.

So, I want to discuss options for network monitoring and visibility. But first the problem.

From the information available to date, it has been suggested that the malicious modifications have been made to the management controller of server motherboards. Cisco calls this an IMC (Integrated Management Controller), Dell calls it a DRAC (Dell Remote Access Controller) and HPE calls it iLO (Integrated Lights-Out). All of these devices are essentially a small computer that controls the computer. There is a long history of these devices being notoriously insecure. Furthermore, these management controllers have access to just about every aspect of the server’s hardware, providing them more control over the hardware than the operating system itself.

Compromised hardware only takes the attacker so far. At some point the malicious hardware will need to communicate over the network to a Command and Control (C&C) server. Depending on the nature of the implant, malicious communication attempts could originate from the compromised management controller interface, from the server’s operating system, from hosted virtual machines, or from any/all of the above.

In my mind this situation further makes a compelling case for the deployment of network monitoring and analytics. The key issue now is the ability to detect malicious traffic and respond quickly in the event of such an occurrence, whether that attack has stemmed from either a hardware implant, or through unrelated but still malicious activities.

What are my recommendations and the options?

A critical point is that server management controller interfaces should not be routable to the internet. I recommend that they are segmented and are only allowed to communicate with the minimum number of workstations needed to support the operation of those devices. Ask yourself: how many people really need access to the management controller? If you can isolate the kill chain at this point, it is highly likely an attacker won’t be able to gain control and further progress an attack. If server management ports must connect to something on the big bad internet, then enable it very selectively.
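One way to check that this segmentation is actually holding is to audit observed flows against the intended policy. The Python sketch below is a minimal example using the standard `ipaddress` module. The management subnet and the workstation addresses are hypothetical values chosen for illustration.

```python
import ipaddress

# Hypothetical addressing: the management controller subnet and the
# small set of workstations approved to reach it.
MGMT_NET = ipaddress.ip_network("10.50.0.0/24")
ADMIN_WORKSTATIONS = {
    ipaddress.ip_address("10.10.1.15"),
    ipaddress.ip_address("10.10.1.16"),
}


def violations(flows):
    """Return flows between a management controller and an unapproved peer."""
    bad = []
    for src, dst in flows:
        a, b = ipaddress.ip_address(src), ipaddress.ip_address(dst)
        # Check both directions: controller as source and as destination.
        for controller, peer in ((a, b), (b, a)):
            if (controller in MGMT_NET
                    and peer not in MGMT_NET
                    and peer not in ADMIN_WORKSTATIONS):
                bad.append((src, dst))
                break
    return bad


observed = [
    ("10.10.1.15", "10.50.0.8"),   # approved workstation to controller
    ("10.50.0.8", "203.0.113.9"),  # controller reaching out to the internet
    ("10.20.0.4", "10.50.0.9"),    # unapproved internal host to controller
]

for src, dst in violations(observed):
    print(f"Policy violation: {src} -> {dst}")
```

Traffic between controllers inside the management subnet is not flagged here, which is a simplifying assumption. A stricter policy may want that reviewed as well.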

At a network level, I would recommend the deployment of a Sinkhole on the management network. A sinkhole is a part of the network that attracts all traffic which has no other legitimate destination. Sinkholes are infrequently deployed but incredibly useful for attracting all sorts of traffic which could either be the result of misconfiguration, or, of key interest here, malicious traffic. Once traffic is routed into a sinkhole there are many tools which can be used for analysis. I realise that’s a bit light on technical detail, but I will aim to publish a subsequent Blog post on Sinkholes in the next week or so.

Let’s now talk about available network based monitoring options. 

When we start talking about monitoring options, the first call out is firewall logs. Assuming the management network is segmented, then whatever firewall is in place will be capable of connection logging. There are many examples where detailed evidence of an attack has been collected in the firewall logs. If you aren’t collecting and archiving your firewall logs, then this is recommendation one. And if you think I’m ‘stating the bleeding obvious’ – you would not believe the number of organisations who fall into the category of ‘people who should know better’, who don’t do this. If this is your organisation and it gets compromised, any forensic investigation will be both exponentially harder and exponentially more expensive! Ignore this advice at your peril.

Netflow - I have been a huge fan of Netflow as a security tool for many years. Netflow is the networking equivalent of a telephony Call Detail Record (CDR). At an IP level it records who spoke to who, how much, and for how long. Like firewall logs, flow records can be exported, collected, analysed and archived. A key point is that for security applications, you must use full-flow Netflow to capture all conversations at a point-in-the-network, as opposed to sampled Netflow.
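To show what ‘who spoke to who, how much, and for how long’ looks like once flow records have been collected, here is a small Python sketch. The record layout is a simplified, hypothetical one and not the real Netflow export format, which carries many more fields.

```python
from collections import defaultdict

# Hypothetical, simplified flow records:
# (source IP, destination IP, bytes transferred, duration in seconds)
records = [
    ("10.0.0.5", "10.0.1.20", 1200, 2.0),
    ("10.0.0.5", "10.0.1.20", 800, 1.5),
    ("10.0.0.9", "203.0.113.7", 50000, 30.0),
]


def summarise(records):
    """Total the flows, bytes and seconds for each (source, destination) pair."""
    totals = defaultdict(lambda: {"flows": 0, "bytes": 0, "seconds": 0.0})
    for src, dst, nbytes, seconds in records:
        entry = totals[(src, dst)]
        entry["flows"] += 1
        entry["bytes"] += nbytes
        entry["seconds"] += seconds
    return dict(totals)


for (src, dst), stats in summarise(records).items():
    print(f"{src} -> {dst}: {stats['flows']} flows, "
          f"{stats['bytes']} bytes, {stats['seconds']} s")
```

With sampled Netflow some of these records would simply be missing, so short or low-volume conversations could vanish from the summary entirely.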

Analysis tools – Many tools, both commercial and open source, are available for both log and Netflow analysis. I won’t call out any commercial options or discuss SIEMs, but the Elastic Stack (formerly the ELK Stack) is a very robust and widely deployed open source option. If you have nothing, the Elastic Stack is a good place to start.

Full Packet Capture – This approach captures all traffic that passes through some point-in-the-network. I’m not going to elaborate on it too much as it’s a costly approach and generally reserved for serious organisations. However, I will mention one approach I have seen some organisations deploy. It is the use of a full packet capture card that collects in the order of a day to a week’s data in a circular buffer. In the event of an incident being detected, the available full packet capture can be copied and stored for investigation.
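The circular buffer idea is easy to model in a few lines of Python. This sketch bounds the buffer by packet count using `collections.deque`, which is an assumption made for simplicity, since a real capture card would bound it by storage or time. The `CaptureBuffer` class is hypothetical.

```python
from collections import deque


class CaptureBuffer:
    """Keep only the most recent packets, discarding the oldest when full."""

    def __init__(self, max_packets):
        # A deque with maxlen drops items from the left once it is full.
        self._packets = deque(maxlen=max_packets)

    def record(self, timestamp, payload):
        self._packets.append((timestamp, payload))

    def snapshot(self):
        """Copy the current contents so they can be preserved for investigation."""
        return list(self._packets)


buffer = CaptureBuffer(max_packets=3)
for ts in range(5):
    buffer.record(ts, b"raw packet bytes")

# Only the three most recent packets remain in the buffer.
print([ts for ts, _ in buffer.snapshot()])
```

The snapshot step is the important one: once an incident is detected, the contents have to be copied out before newer traffic overwrites them.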

What are we looking for?

If we use a Cyber Kill Chain as a foundation, then we wish to look for any evidence of those attack stages within an attack lifecycle. This can range from beacons to a C&C server, download of additional malware or, most importantly, achievement of the ultimate objective, exfiltration of data (at which point you’re probably pretty screwed). And we must also assume that any potential attack traffic is going to be encrypted. That is another more in-depth topic, but let’s just say it makes monitoring the contents of a traffic stream very difficult.

Geo-Location – This is a widely available feature. It can provide a very quick indication of the country in which a connection’s remote endpoint terminates. So, if you were to see a connection from inside your network to a country you have no business dealings with, that’s something that requires investigation.

Threat Intelligence – The sheer number of active threats in general including malicious destinations on the Internet is well beyond the vast majority of organisations to track. This is where the use of Threat Intelligence comes in. If anything inside your organisation (and that includes Cloud Infrastructure) speaks to a known malicious internet endpoint, then you want to know about it. Threat Intelligence comes in a variety of forms, including both Open-Source and commercial feeds. The key objective is to correlate the information received from a reputable feed (or feeds) with the traffic ingressing and egressing your network. The goal of Threat Intelligence usage is to quickly identify a malicious event within what will typically be a mountain of network traffic. 

Threat Intelligence works on the assumption that someone has seen an attack previously. So if this is a unique or first time attack, it probably won’t help, but there will be a lot of cases that have been seen before making it a valuable tool.
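The correlation step can be sketched in Python with the standard `ipaddress` module. The feed entries below are made-up indicators drawn from the reserved documentation address ranges, and a real feed would be far larger and would also carry domains, hashes and context.

```python
import ipaddress

# Hypothetical indicators: a network range and a single address.
FEED = ["198.51.100.0/24", "203.0.113.66"]


def load_indicators(entries):
    """Parse feed entries into networks. A bare address becomes a /32."""
    return [ipaddress.ip_network(entry, strict=False) for entry in entries]


def match_flows(flows, indicators):
    """Return flows whose destination falls inside any listed indicator."""
    hits = []
    for src, dst in flows:
        addr = ipaddress.ip_address(dst)
        if any(addr in net for net in indicators):
            hits.append((src, dst))
    return hits


indicators = load_indicators(FEED)
flows = [
    ("10.0.0.5", "198.51.100.23"),  # destination inside a listed range
    ("10.0.0.6", "192.0.2.10"),     # destination not on the feed
]

for src, dst in match_flows(flows, indicators):
    print(f"Known-bad destination: {src} -> {dst}")
```

A linear scan like this is fine for a demonstration, but at real traffic volumes the lookup would need an indexed structure. And as noted above, a destination nobody has reported yet will pass straight through.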

As the primary focus of this post is compromised server hardware, monitoring the communication habits of management controller ports should be a key focus. If these devices start trying to talk to unexplained destinations (including trying to resolve unexplained destinations), then prompt investigation is required.
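Spotting ‘unexplained destinations’ can start as simply as comparing what a controller contacts or resolves against what it has been seen doing before. This Python sketch assumes a baseline has already been collected. The controller name and destinations are hypothetical.

```python
def new_destinations(baseline, observed):
    """Return (controller, destination) pairs absent from the baseline.

    Each new pair is reported once, in the order it was first seen.
    """
    seen = set(baseline)
    alerts = []
    for pair in observed:
        if pair not in seen:
            seen.add(pair)
            alerts.append(pair)
    return alerts


# Hypothetical baseline of destinations contacted or resolved in normal operation.
baseline = {
    ("bmc-01", "ntp.internal.example"),
    ("bmc-01", "10.10.1.15"),
}

# Hypothetical observations, mixing connections and DNS lookups.
observed = [
    ("bmc-01", "ntp.internal.example"),
    ("bmc-01", "updates.unknown.example"),
    ("bmc-01", "updates.unknown.example"),
]

for controller, destination in new_destinations(baseline, observed):
    print(f"Unexplained destination: {controller} -> {destination}")
```

Management controllers tend to have a very small and stable set of peers, which is what makes such a simple comparison worthwhile here when it would be far too noisy for general user traffic.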



I have not written on this topic lately and thought it time to do an update. People may remember a couple of years ago I was very excited by the prospect of utilising Machine Learning (ML) and Big Data Analytics in solving security problems. While there are a number of Use Cases successfully using ML, solving many other security problems with machine learning is turning out to be very hard. I’ll come back to that part later, but let me start by providing an overview of what I’m seeing in the market and this technology domain. 


My first observation is that we currently appear to be at ‘buzzword saturation’, particularly around the topic of Artificial Intelligence (AI) applied to security. I am seeing a lot of people, and vendor marketing people in particular, using the term AI very liberally. If we consider the Encyclopaedia Britannica definition - “artificial intelligence (AI), the ability of a digital computer or computer-controlled robot to perform tasks commonly associated with intelligent beings.”, then I don’t believe any true AI security products exist today.


With that said, there have been some very significant advances using a number of related technologies in certain security applications. When vendors talk about using AI, in most cases it likely means they are using some form of ML or statistical analysis. And, done right, that can still be incredibly useful. Couple that with the fact that there are many freely available ML tool sets. These include TensorFlow, Keras, PyTorch and scikit-learn, just to name a few. So, accessing the technology is not difficult.


The biggest mindset shift which has occurred in the security domain in the last 5 years is the acceptance that a purely preventative strategy is insufficient given the sophistication of many attacks. A preventative strategy needs to be complemented with a detection and response capability.  It is here that these technologies can play an important role.


However, the difficulty with ML in many security applications is its reliance on large amounts of labelled data for the algorithms to 'learn'. For many applications, that labelled data doesn't currently exist on the scale that is required. While it has been used successfully in some areas, it is still very early days for most security application areas.


So, what are the key Use Cases?


The two most prominent uses of ML techniques are in Malware Classification and Spam Detection. Both of these have successfully utilised Supervised ML due to the fact that in both cases very large amounts of labelled training data have been available. By that I mean, a human has previously classified the samples, a bit like an image recognition system is trained by feeding it a huge number of pictures of animals with the correct names attached as the label. In the case of Malware classification, ML has worked very well as most new malware is usually an adaptation of some previous or current malware family. Hence the common attributes can be detected using ML approaches. Spam detection works on a similar principle.
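To illustrate what ‘learning from labelled data’ means in practice, here is a toy supervised classifier in plain Python. It uses naive Bayes, chosen purely because it is a classic and compact textbook technique for spam filtering and not because any particular product works this way. The four training messages are invented, and a real system would need vastly more labelled data.

```python
import math
from collections import Counter


def train(samples):
    """Count words and documents per label from (text, label) pairs."""
    word_counts, doc_counts = {}, Counter()
    for text, label in samples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, doc_counts


def classify(text, model):
    """Pick the label with the highest naive Bayes log-probability."""
    word_counts, doc_counts = model
    vocab = set().union(*word_counts.values())
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(doc_counts[label] / total_docs)
        denom = sum(counts.values()) + len(vocab)
        for word in text.lower().split():
            # Add-one (Laplace) smoothing so unseen words don't zero the score.
            score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented, human-labelled training examples.
labelled = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project status report attached", "ham"),
]

model = train(labelled)
print(classify("claim your free prize", model))
print(classify("monday project meeting", model))
```

The classifier only knows about words it has seen attached to a label, which is the point made above: without the human-supplied labels there is nothing for it to learn from.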


There is a lot of promising work occurring in the area known as ‘automating the Level-One analyst’. A notable project in this area is the AI^2 project developed at MIT. For most organisations, the sheer volume of security log messages today is beyond what any human can process. The AI^2 system processes log data looking for anomalies and uses the input of human analysts to train the system. The more training data is fed into the system, the more its operation is fine-tuned to identify legitimate security events. While currently the system only achieves about 85% accuracy, this can be highly effective in distilling mountains of data into more useful events that an analyst can investigate. The key element however is the need for the human analysts to train the system. So, don’t expect this or other systems to extrapolate new conclusions without being explicitly trained.


Then there are unsupervised ML techniques. Most commonly this means Clustering. With unsupervised learning there is no knowledge of the categories of the data or even if it can be classified. While it is very successfully used in some specific toolsets such as DNS record analysis and processing Threat Feeds, at present I have not seen any high-impact solution purely based on unsupervised techniques. Going forward, I believe unsupervised ML will play an important role, but as a part of a larger system or combined with other ML techniques.


Neural Networks are a hot topic. These are systems which have been designed to mimic the operation of the human brain. They are being used in many applications today, most notably in Image Recognition, Speech Recognition and Natural Language Processing. For these systems to perform accurately they require labelled training data, often in huge quantities. Again, I have not seen any significant applications of Neural Networks specific to security at this point in time.


A key personal interest area in this field is in analysing network based Netflow Data records to detect attacks. Netflow is the networking equivalent of a Call Detail Record in the telephony world. You know who spoke to who and for how long, but not the contents of each call. The approach is highly scalable. Learning a ‘known good’ network traffic profile is much harder than it appears on the surface for many reasons. A key one is that virtually any network of any size will have something bad or anomalous happening at any point in time. Without this known good baseline, identifying anything bad is very difficult. 
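A very simple form of baselining is to compare the latest traffic volume for a host against its own history. The Python sketch below uses a z-score for this, which is an assumed and deliberately naive choice. The byte counts and the threshold of three standard deviations are hypothetical.

```python
import statistics


def is_anomalous(history, latest, threshold=3.0):
    """Flag a value more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        # A perfectly flat history: any change at all is a deviation.
        return latest != mean
    return abs(latest - mean) / stdev > threshold


# Hypothetical kilobytes sent per hour by one host over previous hours.
history = [100, 110, 95, 105, 90]

print(is_anomalous(history, 108))  # within the normal spread
print(is_anomalous(history, 500))  # a large jump worth a look
```

This also shows why the problem is harder than it looks. If the history already contains malicious traffic, that traffic becomes part of ‘normal’, and an attacker who stays within the usual spread is never flagged.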


The action is not just happening with the good guys. We are starting to see evidence of ML based tools being embedded in malware with an objective of maximising their impact. In the last month, we saw proof-of-concept code called ‘DeepLocker’ (as in Deep Learning) demonstrated at Black Hat USA. The code spies on the user and learns their behaviour, allowing the ransomware to be triggered by any of a variety of learned conditions. If this is a taste of what’s to come, the security community needs to prepare to face a new level of ML-powered attacks.


Where will it all go? Today, experts suggest that any task that can easily be performed by a human in about 1 second is a candidate for automation through AI techniques. A lot more to come in this space I believe.


In conclusion, don’t expect AI to come to the rescue for a while yet. Human Experts are essential to lead security operations and security projects. Given the current skills shortage, an investment to develop those key people into, or maintained as, experts, is an initiative that every business should take very seriously. Look after these people and complement them with an investment in these newer technologies which can make their job easier. For the foreseeable future, experts on staff are still today’s most vital asset.