OT/ICS Cybersecurity

Removing Complexity in OT System Management

 

We know the adage "You cannot protect what you can't see." But often, when we think of Operational Technology (OT) inventory, we tend to focus on hardware. OT inventory includes not only hardware components, but also firmware, software and configuration files. OT inventory is essential to identify and manage critical assets, to understand the relationship between these assets and to provide full visibility of the OT system. With this visibility, we can manage, secure and control the OT system effectively. 
  
Working in OT security, I find resilience is a common theme: things go wrong and we implement plans to recover quickly. With Distributed Control Systems (DCS), we implement redundant systems and highly available server clusters. With Safety Instrumented Systems (SIS), we implement layers of protection based on failure rates and consequences. With OT security, we employ defense-in-depth strategies based on threat, likelihood and consequences. We know systems fail, so we put measures in place to reduce failure rates and develop procedures to recover quickly when systems are impacted.
  
On a recent flight, an audiobook from the inflight entertainment sparked my interest. The subject was resilience. While the book focused on resilience when facing personal challenges, many of its concepts hold true for the resilience of industrial control systems. The author drew on the distinction between "complex systems" and "complicated systems". With complex systems, a change to one item has unknowable consequences, as in the butterfly effect. With complicated systems, however, we can know the impact of any change. The key difference between the two is therefore that a complicated system can be documented. An OT system is complicated, but without complete documentation, it should be considered complex.
  
The complicated system concept is important to OT. Early in my career, working with a legacy 1980s control system, I found a way to document and store the system configuration in an electronic format. I was on call 24/7 to support multiple customers from various industries with various models and versions of control system technology. When called to support a breakdown recovery, I was able to troubleshoot offline and identify the fault before heading to the site. I was not unique in creating this database. During that period, many control system engineers developed their own toolkits to manage and troubleshoot their systems. The more complicated the system, the more reliant we were on the documentation for troubleshooting and, more importantly, for understanding the impact of any modification. Documenting a system helps bring it back from the edge of complexity.
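To make "understanding the impact of any modification" concrete, here is a minimal sketch of what a documented system allows. It is not taken from any particular tool: the `Asset` fields, the `feeds` relationship and the asset names are all assumptions made for illustration. Once the relationships between assets are recorded, the assets affected by a change can be listed rather than guessed.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Asset:
    """One documented OT asset (all field names are illustrative)."""
    name: str
    kind: str                       # e.g. "controller", "hmi", "switch"
    firmware: str = "unknown"
    feeds: list[str] = field(default_factory=list)  # assets that depend on this one


def impacted_by(inventory: dict[str, Asset], changed: str) -> list[str]:
    """Return every asset downstream of `changed`, in breadth-first order."""
    seen = {changed}
    queue = deque([changed])
    order = []
    while queue:
        current = queue.popleft()
        for dependent in inventory[current].feeds:
            if dependent not in seen:
                seen.add(dependent)
                order.append(dependent)
                queue.append(dependent)
    return order


# Hypothetical four-asset system, invented for this example.
inventory = {
    "switch-1": Asset("switch-1", "switch", "2.4", feeds=["plc-1", "hmi-1"]),
    "plc-1": Asset("plc-1", "controller", "1.9", feeds=["hmi-1", "historian"]),
    "hmi-1": Asset("hmi-1", "hmi"),
    "historian": Asset("historian", "server"),
}

print(impacted_by(inventory, "plc-1"))     # ['hmi-1', 'historian']
print(impacted_by(inventory, "switch-1"))  # ['plc-1', 'hmi-1', 'historian']
```

Without the recorded `feeds` relationships the same question has no computable answer, which is the practical difference between a complicated system and a complex one.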
  
Certainly, there are many complex OT systems keeping control system engineers awake at night, not only because troubleshooting them is difficult but also because there is no certainty that a critical spare will be available in the event of failure. I have salvaged components from less critical systems to get a plant back up and running because the critical spare had not been identified. With a complex system, resilience is difficult and maintenance is reactive rather than proactive, leading to unstable processes or unplanned shutdowns. Another adage, "If it ain’t broke, don’t fix it", is not practical for complex systems that are inherently unreliable, difficult to manage and lacking in resilience. Undocumented complex systems are also difficult to troubleshoot, especially at 3:00 a.m. with a production manager awaiting results and anticipating a fast resolution.

With complex systems, production managers are often reluctant to invest budget to improve reliability, as even a small change can mean many hours of downtime and many days of unreliable operations. As a result, we often accept known challenges because the solution is too risky or expensive. Today, an OT configuration management solution can document OT systems automatically and provide the data and insight needed to manage these complicated systems efficiently and to build resilience through visibility of the OT system inventory, i.e., hardware, software, firmware and configuration files. With complete OT asset visibility, we can effectively perform control system management activities such as:

  • Vulnerability management 

  • Risk management 

  • Policy and compliance reporting 

  • Forensic analysis and rapid recovery 

  • Incident avoidance 

  • Configuration anomaly detection

  • Personnel productivity 

  • Obsolescence management 

  • IIoT and digitalization data integrity 
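Of the activities listed above, spotting configuration anomalies is perhaps the easiest to illustrate. The sketch below is only an assumption about how such a check could work, not a description of any specific product: it fingerprints configuration file contents with SHA-256 and compares a current snapshot against a stored baseline. The function names, asset names and configuration text are all hypothetical.

```python
import hashlib


def fingerprint(config_text: str) -> str:
    """SHA-256 digest of a configuration file's contents."""
    return hashlib.sha256(config_text.encode("utf-8")).hexdigest()


def find_anomalies(baseline: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare two {asset name: fingerprint} snapshots."""
    shared = baseline.keys() & current.keys()
    return {
        "changed": sorted(a for a in shared if baseline[a] != current[a]),
        "missing": sorted(baseline.keys() - current.keys()),
        "unexpected": sorted(current.keys() - baseline.keys()),
    }


# Hypothetical snapshots, invented for this example.
baseline = {
    "plc-1": fingerprint("setpoint=75\nmode=auto\n"),
    "plc-2": fingerprint("setpoint=40\nmode=auto\n"),
}
current = {
    "plc-1": fingerprint("setpoint=75\nmode=manual\n"),  # edited since the baseline
    "plc-3": fingerprint("setpoint=10\nmode=auto\n"),    # not in the baseline
}

print(find_anomalies(baseline, current))
# {'changed': ['plc-1'], 'missing': ['plc-2'], 'unexpected': ['plc-3']}
```

A hash only says that something changed, not what changed, so in practice the flagged files would still need a line-by-line comparison against the documented baseline.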

OT systems will always be complicated. However, with the right automated solution that provides the required documentation, they do not have to be complex.