Tony Turner

Building Resilient Layered Defenses with a Cyber FMEA

Cyber-Informed Engineering (CIE) is a methodology proposed by the US Department of Energy to establish Secure by Design thinking in the engineering process and achieve Critical Function Assurance. It leverages 12 core principles focused on reducing the consequences of failure for an organization's most critical functions.

Using Failure Mode and Effects Analysis (FMEA) to validate security defenses and to understand and prepare for the failure of tools.

Use a Cyber FMEA to understand how your security products actually work, and embed this understanding into your own security operations plans.

The Need for Resiliency in Defense

One of the core principles in the Department of Energy's Cyber-Informed Engineering (CIE) methodology is the concept of Resilient Layered Defenses. The idea is that we should assume compromise and ensure that the defenses we rely on to protect us fail in predictable ways, and that when they do fail, we still have sufficient protection to meet our mission objectives. This minimizes the opportunity for a single failure to negatively impact our critical processes or create undesirable downstream impacts.

OK, this all sounds good, and a little bit like zero trust architecture. Well, except that zero trust is not really what the name implies and tends to shift the trust decisions to more robust elements of your architecture. What we are really talking about in this article is how we can ensure a defense-in-depth approach or, at the very least, identify when and how defenses can fail so we are prepared with mitigating controls.

Introduction to FMEA

One popular way to manage this is through a purple team exercise, where controls are implemented and then tested to validate their effectiveness. That is certainly one valuable approach. But another, less talked about way to accomplish this is through an FMEA, or failure mode and effects analysis.

The concept of FMEA is not new, and in fact it is one of the elements taught in the Six Sigma curriculum, but its roots go back to the 1940s, when the US military used it to reduce quality variances in munitions. We can apply a similar approach to the evaluation of cybersecurity controls, but first let’s look at what is involved in an FMEA assessment:

STEP 1: Deconstruct the process

  • Break the process into manageable chunks using a process flow diagram and split the process into multiple subprocesses if it becomes too complex.
  • List each process component in the FMEA table. See the example below.
  • [Security] – How does your control work? What are the requirements? The dependencies? Track network and data flows, and deconstruct the security control in much the same way as a process. You need to go very deep here; the deeper, the better.
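
As a minimal sketch of what a deconstructed control might look like in data form, the Python below models each step of a firewall's packet path as a row with its requirements and dependencies. The step names, fields, and dependencies are illustrative assumptions and are not taken from any specific product.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ControlStep:
    """One row of the deconstructed control (hypothetical fields)."""
    order: int
    name: str
    requirements: List[str] = field(default_factory=list)
    dependencies: List[str] = field(default_factory=list)


# Illustrative decomposition of a firewall's packet path (assumed, not vendor-specific).
firewall_steps = [
    ControlStep(1, "Receive packet on interface", ["Link up"], ["NIC driver"]),
    ControlStep(2, "Reassemble fragments", ["Fragment buffer available"], ["Memory"]),
    ControlStep(3, "Match against rule base", ["Rules loaded"], ["Policy store", "Memory"]),
    ControlStep(4, "Log the decision", ["Log destination reachable"], ["Syslog server"]),
]


def shared_dependencies(steps: List[ControlStep]) -> Dict[str, List[str]]:
    """Return dependencies used by more than one step.

    These are candidate single points of failure worth a closer look.
    """
    users: Dict[str, List[str]] = {}
    for step in steps:
        for dep in step.dependencies:
            users.setdefault(dep, []).append(step.name)
    return {dep: names for dep, names in users.items() if len(names) > 1}


print(shared_dependencies(firewall_steps))
```

Listing dependencies per step makes it easier to spot a resource, such as memory in this made-up example, that several steps quietly rely on.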

STEP 2: Identify potential failure modes

  • Look at existing historical data or interview subject matter experts on the process to document all the ways that the process might fail. Perform this activity at every step in the process flow.
  • Do not stop at just one failure condition. Identify all that you can, and be as exhaustive as possible.
  • [Security] – Draw on prior security incidents, product-level vulnerabilities, research on the protocols used in the security control, and even innovative ideas on how you might circumvent that one step in the security process. For instance, if a firewall will trust packets coming from an internal host, are there ways you could trick that host into requesting an outbound connection? Don't make assumptions about how your product works either; different vendors sometimes implement RFCs in strange ways.

STEP 3: Document the potential effects of each failure

  • How does this failure create a consequence for the process? This captures every applicable way that impact might be created if this step in the process fails.
  • [Security] – What happens if your control fails? What happens if a given step in the control fails? For instance, does your intermediation control fail open or fail closed? Is this desirable? Have you assessed the operational impacts of denying all traffic through that device?
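
The fail-open versus fail-closed question can be made concrete with a small sketch. Here `inspect` is a hypothetical callable standing in for the control's inspection logic; it returns True for allowed traffic and raises an exception when the control itself breaks. None of these names come from a real product.

```python
from enum import Enum


class FailMode(Enum):
    OPEN = "open"      # on control failure, let traffic through
    CLOSED = "closed"  # on control failure, block traffic


def decide(packet, inspect, fail_mode):
    """Return True to forward the packet, False to drop it.

    `inspect` is a hypothetical inspection function. If it raises,
    the control has failed and the configured fail mode decides.
    """
    try:
        return bool(inspect(packet))
    except Exception:
        return fail_mode is FailMode.OPEN


def broken_inspector(packet):
    """Simulates a crashed inspection engine."""
    raise RuntimeError("inspection engine crashed")


# Same failure, two very different operational effects.
print(decide(b"payload", broken_inspector, FailMode.OPEN))    # forwards uninspected traffic
print(decide(b"payload", broken_inspector, FailMode.CLOSED))  # drops all traffic
```

Neither outcome is automatically right: failing open trades protection for availability, and failing closed does the reverse, which is exactly the effect you want documented in this step.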

STEP 4: Assign a Severity Rating

  • For every instance where a failure might occur, what is the severity of that impact?
  • [Security] – If the firewall stops processing traffic, is it a big deal? How critical is it? For a process control, if we stop doing it, will anyone notice? Perform this severity analysis for every failure condition for every step in the control process.

STEP 5: Assign Occurrence metrics

  • This is really a likelihood indicator. What is the expected rate of occurrence? It's better, of course, if you have data showing how frequently failures occur.
  • [Security] – This is really where there may be a point of contention in your FMEA process depending on your approach. For CIE, we are assuming compromise, so the failure has already occurred in our scenario, but how prevalent is this failure? It is still helpful to understand this.

STEP 6: Assign Detection metrics

  • Do you have controls to detect the failure before it occurs? Are there leading indicators? For instance, perhaps a motor emits a higher-pitched sound or has other audible indicators before it fails.
  • [Security] – Likewise, can you predict an imminent failure in one of your security controls? Perhaps you know that when a memory buffer fills over 97%, a device crash is imminent. Can you poll metrics on that device to detect the failure before it occurs? What about the number of packets, or known bad packets that violate RFCs in specific ways that create unpredictability?
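
A leading-indicator check could be sketched as follows. The 97% memory buffer figure is the one used in the bullet above; the malformed-packet threshold and the metric names are assumptions made up for illustration, and how you actually poll the device (SNMP, an API, a log feed) is left out.

```python
# Thresholds for leading indicators. Only the 97% buffer value comes from
# the text; the other threshold and all metric names are hypothetical.
THRESHOLDS = {
    "memory_buffer_pct": 97.0,
    "malformed_packets_per_min": 500,
}


def leading_indicators(sample, thresholds=THRESHOLDS):
    """Return the sorted names of metrics in `sample` that exceed their threshold.

    Metrics missing from the sample are treated as not exceeded.
    """
    return sorted(
        name
        for name, limit in thresholds.items()
        if name in sample and sample[name] > limit
    )


# One polled sample from a device (illustrative values).
sample = {"memory_buffer_pct": 98.2, "malformed_packets_per_min": 12}
print(leading_indicators(sample))
```

If a check like this can fire before the control fails, that is an argument for a better detection rating in the FMEA table.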

STEP 7: Calculate the Risk Priority Number or RPN

  • The Risk Priority Number is the product Severity × Occurrence × Detectability, with each of the three factors rated from 1 to 10, so the RPN ranges from 1 to 1,000.
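
The calculation itself is simple enough to sketch in a few lines of Python. The ratings in the example call are hypothetical. This sketch also assumes the usual FMEA convention that a higher detection rating means the failure is harder to detect, so that a higher number is worse for all three factors.

```python
def rpn(severity, occurrence, detection):
    """Risk Priority Number: severity * occurrence * detection.

    Each rating must be an integer from 1 to 10. A higher detection
    rating is assumed to mean the failure is harder to detect.
    """
    ratings = (
        ("severity", severity),
        ("occurrence", occurrence),
        ("detection", detection),
    )
    for name, value in ratings:
        if not isinstance(value, int) or not 1 <= value <= 10:
            raise ValueError(f"{name} must be an integer from 1 to 10, got {value!r}")
    return severity * occurrence * detection


# Hypothetical ratings for a single failure mode.
print(rpn(severity=7, occurrence=4, detection=5))  # 7 * 4 * 5 = 140
```

Because the factors are multiplied, one very high rating can dominate the score, so it is worth reviewing the individual ratings alongside the RPN rather than the RPN alone.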

STEP 8: Create your prioritized Action Plan

  • Use the RPN for each failure to understand where to prioritize your efforts. The highest RPN will be where you start.
  • Create a project plan identifying a set of action items, owners, dependencies, and due dates to address the failures.
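
One way to turn the scored failures into an ordered plan is sketched below. The failure names, RPNs, owners, and dates are all invented for illustration, and breaking ties by earliest due date is just one reasonable choice.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ActionItem:
    """One planned remediation (all fields hypothetical)."""
    failure: str
    rpn: int
    owner: str
    due: date


def prioritize(items):
    """Order action items by highest RPN first, then by earliest due date."""
    return sorted(items, key=lambda item: (-item.rpn, item.due))


plan = prioritize([
    ActionItem("Stale rule base", 120, "netops", date(2030, 3, 1)),
    ActionItem("No log forwarding", 300, "secops", date(2030, 2, 1)),
    ActionItem("Unpatched firmware", 120, "netops", date(2030, 1, 15)),
])

for item in plan:
    print(f"{item.rpn:>4}  {item.failure:<20} {item.owner:<8} {item.due.isoformat()}")
```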

STEP 9: Take action

  • Execute on your plan, and take note of any lessons learned.

STEP 10: Calculate the resulting RPN

  • Re-evaluate each of the potential failures after you have implemented the plan, and determine whether another iterative cycle is required or whether it is time to move on to the failure with the next-highest RPN. Make sure to recalculate the RPNs for the failures you have addressed, considering the new controls, their impact on occurrence and detectability, and any severity mitigations.
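
The re-evaluation can be sketched as a before-and-after comparison. The ratings below are hypothetical (severity, occurrence, detection) tuples, and the target RPN used to decide whether another cycle is needed is an assumed policy value, not something the method prescribes.

```python
def reassess(before, after, target_rpn):
    """Compare RPNs before and after remediation.

    `before` and `after` map a failure name to a (severity, occurrence,
    detection) tuple. Returns a dict per failure with the old RPN, new
    RPN, percent reduction, and whether another cycle is still needed.
    """
    results = {}
    for failure, (s0, o0, d0) in before.items():
        s1, o1, d1 = after[failure]
        old, new = s0 * o0 * d0, s1 * o1 * d1
        results[failure] = {
            "before": old,
            "after": new,
            "reduction_pct": round(100 * (old - new) / old, 1),
            "another_cycle": new > target_rpn,
        }
    return results


# Hypothetical ratings before and after the action plan.
before = {"Failure A": (8, 5, 4), "Failure B": (9, 6, 5)}
after = {"Failure A": (6, 2, 2), "Failure B": (9, 4, 3)}

for name, row in reassess(before, after, target_rpn=50).items():
    print(name, row)
```

In this made-up example, Failure A drops from 160 to 24 and can be set aside, while Failure B only falls from 270 to 108 and stays on the list for another pass.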

A Sample Cyber FMEA for a Firewall

While not an exhaustive review of all the steps for a firewall, here are a few that came to mind as I was authoring the article. A proper Cyber FMEA should be far more exhaustive, and in real-world scenarios where I have performed this type of analysis, I typically plan for this to take several days, including some RFC research time, interviews with stakeholders, and reviews of control product documentation. Ideally, this includes some hands-on time with the product being reviewed, but this is not always feasible. The numbering scale we will use here runs from 1 to 10, with 10 being the most severe.

Cyber FMEA Objective: Identify failure modes in a firewall's ability to mitigate malicious traffic

[Table image: sample Cyber FMEA for a firewall]

Recommended Actions from the Cyber FMEA

Once you have conducted your Cyber FMEA, you need to create an action plan that aligns to the recommendations that are the output of the assessment. By aligning the same scoring methodology, you can start to see how your efforts are addressing the failures, or where gaps occur that may require risk acceptance or other risk management measures. By understanding where your weaknesses are, you are better prepared to respond and recover quickly, even if you cannot remediate them immediately.

Using the FMEA RPNs above, we will prioritize which failures to address first. For this example, we will look only at the top three failures, setting aside the Fragmentation Attack and Payload Smuggling until we have tackled the more serious concerns.

RPN 800 – Any/Any Rule at Top of Rules – This one is by far the most concerning and the easiest to fix. The problem is this firewall was never properly implemented. By performing a firewall rule review and requiring a business justification for each entry in the firewall, we can drastically reduce the problem with this one. This may be more of a project than a task if this has never been done. It will likely require many conversations with various business stakeholders to accomplish.

RPN 160 – Log Injection Attacks – While not terribly frequent, these are severe in consequence, since they can compromise the firewall administrator's machine and gain elevated permissions. Likewise, they are not difficult to detect, but may require logging on the administrator's actual desktop to identify when this is occurring. It may be difficult to completely mitigate this one, depending on how the administrator interacts with the logs. If this is done via a web interface or a built-in firewall administration interface, mitigation may require a modification by the product vendor. But most products support a CLI, which in almost all cases will mitigate this attack. Therefore, this is the recommended action we will take, and in parallel we will work with the vendor to address the lack of untrusted script blocking.

RPN 128 – Regex Mismatch – This may be challenging to mitigate at the firewall level unless heuristic capabilities are available. This attack largely stems from an over-reliance on signature-based detection. Mitigating it may necessitate a newer firewall with these capabilities, or better correlation rules within the SIEM. One key challenge as you start to use behavior-based detection methods is that they tend to be very computationally expensive, and you may find that you cannot enable all the features you want unless you drastically oversize your new appliance.

All the above – A separate set of recommended actions seeks to reduce the severity of these attacks. With the following desktop-level protections in place, each of these scenarios becomes less severe:

  • Restrict administrator access – use lesser privileged accounts for normal system usage
  • Apply allow-listing technologies to ensure that only authorized applications are allowed to be used
  • Apply content filtering and IP reputation for outbound connections
  • Back up and encrypt all mission-critical data to reduce the impact of ransomware-based scenarios
  • Implement an incident response plan to ensure that if any of these scenarios occur, the organization can quickly respond and recover

[Table image: Cyber FMEA with recalculated RPNs after the recommended actions]

You will note that these actions greatly reduced the risk of the top two failures, while the upgrade to a new firewall, despite the firewall vendor's sales advice, delivered far less, though still significant, risk reduction. In closing, these RPNs would be further reduced by the host-level controls above, and if you extend this activity further, you may find many other opportunities to reduce failures through changes in process or configuration that provide far better risk reduction than investment in expensive security products.

For more information on how to build a Resilient Layered Defense or on CIE in general, get in touch with the Opswright team.

Tony Turner

Founder, CEO

Experienced cybersecurity executive with 30+ years of experience; author of SANS SEC547 Defending Product Supply Chains and Software Transparency.
