Build Confidence in the Integrity of Your Backups

Background 

Configuration data is the crown jewel of Industrial Control System (ICS) Operational Technology (OT). The rise in ransomware attacks on these systems highlights the importance of having robust backup and recovery policies, procedures and guidelines. ICS engineers are often confident in the integrity of their system backups. But how confident? Considering the consequences of not having a reliable backup when the control system is compromised, 100% confidence is needed in the backup and recovery process.

In managing OT systems, engineers were performing two critical cybersecurity tasks well before the concept of OT cybersecurity existed: configuration management, and backup and recovery. These tasks are essential at all phases of the control system lifecycle, yet they are often initially performed in an ad hoc manner. Procedures are then developed and rationalized as system operations mature.

Without configuration management, the control system logic quickly degrades into an unreliable, unmanageably complex state. Without backup and recovery management, a corrupted or compromised database may cause significant financial consequences. These tasks complement each other: backups enable rollback when configuration changes go wrong, and configuration management tools help identify corrupted or missing files in the system backups.

The Challenge 

The three main components of backup and recovery are policies, procedures and verification. ICS engineers have always performed routine backups, often without formal procedures in place. Where procedures exist, they are typically based on the control system vendor's recommendations, with backup frequency chosen to achieve the stated recovery time objectives and recovery point objectives. Policies are required to govern the procedures that generate complete and comprehensive backups. Lastly, verification ensures compliance with the policies and procedures, and confirms the integrity of the backups.
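As a rough illustration of how backup frequency relates to the recovery point objective (RPO), the sketch below assumes that the worst-case data loss is about one full backup interval, because a failure can occur just before the next backup runs. The function name and the example values are hypothetical and are not taken from any vendor recommendation.

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Return True if the backup interval can satisfy the RPO.

    Assumption: worst-case data loss is roughly one backup interval,
    i.e. the failure happens just before the next scheduled backup.
    """
    return backup_interval <= rpo


# Hypothetical example: a 24-hour RPO checked against two schedules.
daily_ok = meets_rpo(timedelta(days=1), timedelta(hours=24))
weekly_ok = meets_rpo(timedelta(days=7), timedelta(hours=24))
```

Under this assumption a daily backup satisfies a 24-hour RPO, while a weekly backup does not. Ad hoc backups taken around configuration changes shorten the effective interval further.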

So, We Are Sorted? What Can Go Wrong? 

Numerous events impact ICS/OT availability. These can include human error, hardware failures, software bugs, untested patches, natural disasters, malware, ransomware or sabotage. To manage these potential situations, ICS engineers implement security controls and countermeasures to mitigate risks. However, it is difficult to cover every scenario, and there is no guarantee that a security control will be 100% effective. Backup and recovery systems are therefore a priority in any Cybersecurity Management System (CSMS). The events that compromise OT configuration are difficult to quantify and predict, so proactive pre-event scenario planning is crucial.

It is common to become complacent with these policies and procedures, so verification is essential to ensure compliance and to confirm the integrity of the backup files. ICS engineers are often confident that their system backup procedures are under control. High-frequency backups, multiple backup media types, onsite backups, offsite backups and off-network backups can lead to a false sense of security. Complacency is a risk. For example, ICS engineers may assume the backups are performed on a regular schedule, but the responsibility may lie with others. I have often observed uncertainty around who performs the backups, where the backups are stored and how they are tested and verified. Unfortunately, it is not uncommon to find situations where the backup media has been overwritten, is corrupted, doesn’t contain the required files, or where the backup stored in the data safe is older than expected.
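One of the failure modes above, a backup that is older than expected, is simple to check automatically. The following is a minimal sketch, assuming the timestamp of the most recent backup for each system is already known (for example from a backup catalogue). The system names, the function name and the seven-day threshold are hypothetical.

```python
from datetime import datetime, timedelta


def find_stale_backups(
    last_backup: dict[str, datetime],
    max_age: timedelta,
    now: datetime,
) -> list[str]:
    """Return the systems whose most recent backup is older than max_age."""
    return sorted(
        system
        for system, taken_at in last_backup.items()
        if now - taken_at > max_age
    )


# Hypothetical records: one recent backup and one that was forgotten.
records = {
    "DCS-1": datetime(2024, 1, 30, 23, 0),
    "SIS-1": datetime(2023, 11, 2, 8, 0),
}
stale = find_stale_backups(
    records, max_age=timedelta(days=7), now=datetime(2024, 1, 31, 12, 0)
)
```

A check like this only confirms that a backup exists and is recent. It says nothing about whether the backup is complete or restorable.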

The OT Configuration Management Database 

ICS engineers require an up-to-date backup of the control system database. In addition to the regular daily/weekly/monthly/annual backups, we also perform ad hoc backups when making configuration changes. Configuration management requires a backup before the change (“as found”) and a backup after the change (“as left”), which makes it possible to roll back when things go wrong.

Alongside the backups, the “as found” and “as left” records traditionally included printouts of the logic before and after the changes. These provide granular visibility of the changes in case troubleshooting is required. An OT Configuration Management Database (OT CMDB) provides the same information electronically and makes it quickly searchable, which reduces the time needed to troubleshoot or backtrack. If current backups are compromised, then rebuilding the system from old backup data will require details of all the changes made since. Configuration management tools aid these recovery efforts.
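The “as found” and “as left” comparison can be illustrated with a plain text diff. This is a minimal sketch, assuming the configuration can be exported as text. The tag name and setpoint values are hypothetical, and an OT CMDB product would use its own comparison tooling rather than this code.

```python
import difflib


def config_diff(as_found: str, as_left: str) -> list[str]:
    """Return a unified diff between two text exports of a configuration."""
    return list(
        difflib.unified_diff(
            as_found.splitlines(),
            as_left.splitlines(),
            fromfile="as_found",
            tofile="as_left",
            lineterm="",
        )
    )


# Hypothetical exports taken before and after a setpoint change.
before = "TAG = FIC101\nSP = 50\nMODE = AUTO"
after = "TAG = FIC101\nSP = 55\nMODE = AUTO"
changes = config_diff(before, after)
```

Lines prefixed with `-` come from the “as found” export and lines prefixed with `+` from the “as left” export. An empty result means the two exports are identical.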

How confident are you in the integrity of your backups: that they are complete and not compromised? This is a common challenge, and integrity is often tested only when things go wrong. Backup and recovery software checks integrity during the process, which provides a higher level of confidence; this matters given the consequences of an extended loss of production in the event of system failure. A full system restore to a sandbox environment is commonly used to test the integrity of the backups, but it is time-consuming and not performed regularly enough. Especially with large, heterogeneous, complex OT systems, validating every subsystem through regular restores to a sandbox engineering environment is not practical.
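Between full restore tests, a lighter check is to record a checksum for every file when the backup is taken and to re-verify those checksums later. The sketch below assumes the backup is an ordinary directory of files and uses SHA-256 from the Python standard library. The function names are hypothetical, and this is not how any particular backup product performs its integrity checks.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large backup files are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(backup_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to backup_dir) to its SHA-256 digest."""
    return {
        path.relative_to(backup_dir).as_posix(): sha256_of(path)
        for path in sorted(backup_dir.rglob("*"))
        if path.is_file()
    }


def verify_manifest(backup_dir: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the directory's current state against a stored manifest."""
    current = build_manifest(backup_dir)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "changed": sorted(
            name
            for name in manifest
            if name in current and current[name] != manifest[name]
        ),
        "unexpected": sorted(set(current) - set(manifest)),
    }
```

A clean report shows that the files are unchanged since the manifest was built. It does not prove that the backup can be restored, so it supplements periodic restore tests rather than replacing them. The manifest should be stored separately from the backup media so that both cannot be altered together.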

The OT CMDB is not a backup and recovery solution per se. However, it complements the backup and recovery system by validating backups, comparing them and identifying changes between backup files, which gives visibility into any discrepancies. Corrupted or missing files in the backup are quickly recognized. When all else fails, the OT CMDB contains the OT configuration files, which can be (and have been) used to restore OT system data, reducing mean time to recovery and decreasing the consequences of prolonged unplanned downtime.
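The idea of comparing successive backups and flagging discrepancies can be sketched as follows. The example assumes each backup has already been summarized as a mapping from file name to checksum, and that the set of files touched by approved configuration changes is known. The function name, file names and checksum values are hypothetical and do not represent any OT CMDB product's interface.

```python
def unexplained_changes(
    previous: dict[str, str],
    latest: dict[str, str],
    approved: set[str],
) -> dict[str, list[str]]:
    """Report differences between two backups that no approved change explains."""
    removed = set(previous) - set(latest)
    added = set(latest) - set(previous)
    modified = {
        name
        for name in previous.keys() & latest.keys()
        if previous[name] != latest[name]
    }
    return {
        "removed": sorted(removed - approved),
        "added": sorted(added - approved),
        "modified": sorted(modified - approved),
    }


# Hypothetical checksums for two successive backups.
previous_backup = {"logic/unit1.cfg": "aaa", "hmi.xml": "bbb", "io.map": "ccc"}
latest_backup = {"logic/unit1.cfg": "ddd", "hmi.xml": "eee", "new.cfg": "fff"}
approved_files = {"logic/unit1.cfg"}  # covered by a change record

report = unexplained_changes(previous_backup, latest_backup, approved_files)
```

Here the change to `logic/unit1.cfg` is explained by a change record, so only the missing `io.map`, the new `new.cfg` and the modified `hmi.xml` are reported for follow-up.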

Definitions 

Backup and Recovery: Enables rapid restoration of control system operations in the event of a worst-case scenario. Supports in-depth forensic analysis. Captures full configuration backups to speed recovery.

Configuration Management: Monitors for unauthorized changes to control strategies, device inventory, asset configuration and logical and graphical files. Automates remediation actions via workflows based on asset value and risk, guiding operations, compliance and cybersecurity responses. Establishes configuration baselines for OT cybersecurity, compliance, governance and operations change monitoring.

About the Author

Chris O'Sullivan has over 30 years of experience in Industrial Automation and Control Systems. His experience includes developing business-wide alignment around operational technologies and cybersecurity strategies for risk management, governance, safety and operational and commercial objectives. He has expertise in the execution of OT cybersecurity solutions, cyber risk management, and the development of OT cyber policies and procedures.
