ICS Continuous Hardening: Shifting the Focus to Risk-Based Strategies

Final Thoughts on ICS Continuous Hardening: Shifting the Focus to Risk-Based Strategies

Nick Cappi

This is my final blog of the year on ICS Continuous Hardening, and I wanted to make it impactful. I wondered what I should discuss. What could I say that hasn’t already been said in my October blog or my September blog on the same topic? In these situations, I think Dale Carnegie’s advice holds true: “An hour of planning can save you 10 hours of doing.” With that in mind, I turned to the trusty World Wide Web for inspiration. I spent over an hour searching and reading various websites, including those of industry analysts, other vendors and media outlets, and I even revisited our own content. Below is a summary of the areas of ICS continuous hardening that I found people are talking about.

Summary of Continuous Hardening of ICS

Work Process	Main Task	Action
Asset Inventory and Management	Identify and Document Assets: Create a detailed inventory of all ICS components, including hardware, software and network devices.	Update Regularly: Keep the inventory updated with any changes or additions to the system.
Network Segmentation	Segregate Networks: Divide the OT network into segments based on functionality and criticality. Implement demilitarized zones (DMZs) between IT and OT networks.	Control Access: Use firewalls and access control lists (ACLs) to restrict traffic between segments.
Access Control	Role-Based Access Control (RBAC): Implement RBAC to ensure users have access only to the resources necessary for their roles.	Regular Audits: Periodically review and update access controls to ensure they are current and effective.
Patch Management	Regular Updates: Apply patches and updates to ICS components and software as they become available.	Testing: Test patches in a controlled environment before deployment to ensure they do not disrupt operations.
Incident Response Planning	Develop an Incident Response Plan: Create and regularly update an incident response plan tailored to ICS and OT environments.	Training and Drills: Conduct regular training and simulated drills for incident response teams.
Monitoring and Logging	Continuous Monitoring: Implement continuous monitoring of ICS networks and systems for unusual activity.	Log Management: Collect and analyze logs from all critical systems and network devices to detect potential security incidents.
Security Policies and Procedures	Develop and Enforce Policies: Establish comprehensive security policies and procedures specific to OT environments.	Regular Reviews: Periodically review and update security policies to reflect changes in the threat landscape and operational requirements.
Physical Security	Restrict Physical Access: Ensure that physical access to ICS components is restricted to authorized personnel only.	Environmental Controls: Implement controls to protect against environmental threats such as temperature, humidity and power fluctuations.
Backup and Recovery	Regular Backups: Perform regular backups of critical ICS data and configurations.	Test Recovery Procedures: Regularly test backup and recovery procedures to ensure data integrity and availability in case of an incident.
Threat Intelligence and Vulnerability Management	Stay Informed: Subscribe to threat intelligence feeds relevant to OT environments.	Vulnerability Assessments: Conduct regular vulnerability assessments of ICS endpoints and their associated components.
Employee Training and Awareness	Regular Training: Provide ongoing training for employees on security best practices and emerging threats.	Awareness Programs: Implement security awareness programs to reinforce the importance of security in daily operations.
Third-Party Risk Management	Evaluate Vendors: Assess the security practices of third-party vendors and service providers.	Contractual Obligations: Include security requirements in contracts with third-party vendors.
Compliance and Audits	Adhere to Standards: Ensure compliance with relevant industry standards and regulations (e.g., NIST, IEC 62443, NERC CIP).	Regular Audits: Conduct regular security audits to verify compliance and identify areas for improvement.

It is clear that there is a lot of discussion about executing various tasks and work processes, on the assumption that everyone should be doing all of these things on every asset, all the time. However, there was little mention of risk. Given that resources and funds are limited and unexpected outages are unacceptable, we must balance outages, effort and expenditure to ensure profitable, sustainable, safe and secure operations. Striking this balance requires shifting the conversation from task execution to identifying, evaluating and prioritizing risks. Tasks should exist solely to address prioritized risks, not the other way around.

For some assets, we might need to implement all the suggested work processes and more, while for others, we may only need to do a small portion. Without understanding our risks, it is impossible to determine which actions are necessary, which are nice to have and which are not needed at all.

The primary function of any security program is to mitigate risk to an acceptable level. With a prioritized list of risks, we can address each concern methodically by choosing the appropriate work processes. It's important to recognize that some level of risk will always exist; it can never be completely eliminated.

Consider this scenario: we have a vulnerability with an Attack Vector of "Adjacent Network," an Attack Complexity of "Low," and a CVSS Base Score of 9.0, which makes it a "Critical" vulnerability. It has known exploits, and the ICS vendor has approved a patch for it. Should we stop everything and start patching immediately? Should we patch every device with that weakness? The correct answer: it depends.

If the vulnerability is found on a device on a connected network, and that device serves as the configuration server for a Safety Instrumented System (SIS) linked to a critical part of the plant, we would likely apply the patch in the next patching cycle, if not sooner. However, if the same vulnerability is found on an isolated network or device, we might wait until the next outage or turnaround to apply the patch.
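The Base Score in this scenario comes from the CVSS v3.1 base-score formulas published by FIRST, and a minimal Python sketch of them is below. The post only states AV:A (Adjacent Network) and AC:L (Low), so the remaining metrics in the example vector (PR:L, UI:N, S:C, C:H/I:H/A:H) are an assumption, chosen because that combination yields 9.0. Every input describes the vulnerability itself. Nothing in the calculation knows whether the affected device is the connected SIS configuration server or an isolated machine, so both receive the same score.

```python
import math

# CVSS v3.1 base metric weights (FIRST CVSS v3.1 specification)
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}  # PR weights rise when Scope is Changed
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value."""
    int_input = round(value * 100_000)
    if int_input % 10_000 == 0:
        return int_input / 100_000.0
    return (math.floor(int_input / 10_000) + 1) / 10.0


def base_score(vector: str) -> float:
    # Parse "AV:A/AC:L/..." (an optional "CVSS:3.1/" prefix is simply ignored).
    m = dict(part.split(":") for part in vector.split("/"))
    changed = m["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]

    # Impact sub-score from Confidentiality, Integrity and Availability.
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]

    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


def severity(score: float) -> str:
    """CVSS v3.x qualitative severity rating."""
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


# Assumed full vector; only AV:A and AC:L come from the scenario.
vector = "CVSS:3.1/AV:A/AC:L/PR:L/UI:N/S:C/C:H/I:H/A:H"
score = base_score(vector)
print(score, severity(score))  # prints: 9.0 Critical
```

Running the same vector for both devices returns the same 9.0, which is exactly why the score on its own cannot tell us which device to patch first.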

When dealing with vulnerabilities, the first step is to confirm that the component with the security weakness is actually needed. Next, it’s crucial to ensure the risk justifies any action at all, because the CVSS score alone doesn’t represent risk: it describes the severity of the vulnerability, not the likelihood and consequence of its exploitation in your environment. If the risk justifies action and the component can’t be removed, the next step is to evaluate whether to upgrade or patch. An available patch isn’t automatically the best remediation.

Finally, if the risk is worth addressing and upgrading or patching isn’t an ideal path, other remediation methods like firewall rules, access controls and whitelisting should be considered.
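The sequence above can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not a prescribed procedure: the Finding fields, the numeric risk threshold and the Action names are hypothetical, and treating "component not needed" as "remove it" is one reading of the first step.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    # Hypothetical outcome labels for illustration only.
    REMOVE_COMPONENT = "remove the component"
    ACCEPT_OR_DEFER = "accept the risk for now or defer to a planned outage"
    PATCH_OR_UPGRADE = "patch or upgrade"
    OTHER_CONTROLS = "firewall rules, access controls, whitelisting"


@dataclass
class Finding:
    # All fields are hypothetical inputs supplied by the assessor.
    component_needed: bool   # Is the component with the weakness justified?
    risk: float              # Assessed risk in context, not the CVSS score
    risk_threshold: float    # The organization's acceptable-risk level
    patch_is_ideal: bool     # Is upgrading or patching the best path here?


def remediation_path(f: Finding) -> Action:
    # Step 1: justify the need for the component; if it isn't needed, remove it.
    if not f.component_needed:
        return Action.REMOVE_COMPONENT
    # Step 2: act only when the assessed risk justifies action.
    if f.risk <= f.risk_threshold:
        return Action.ACCEPT_OR_DEFER
    # Step 3: an available patch is considered, but it is not automatically best.
    if f.patch_is_ideal:
        return Action.PATCH_OR_UPGRADE
    # Step 4: otherwise, fall back to other remediation methods.
    return Action.OTHER_CONTROLS


print(remediation_path(Finding(True, risk=20, risk_threshold=10, patch_is_ideal=False)))
```

Keeping the risk value separate from any CVSS score is deliberate: the function never sees the 9.0, only the organization's own assessment of the finding.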

By applying the basic risk equation, Risk = Likelihood × Consequence, to your environment, you will likely discover that you are expending too much energy on tasks that do not significantly reduce risk and too little on work processes that would have a meaningful impact.
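Here is a minimal sketch of the equation applied to the two devices from the earlier scenario. The 1-to-5 scales and the scores assigned to each device are illustrative assumptions, not an assessment method from this post. The point is that the same CVSS 9.0 vulnerability can land at very different positions in the priority list.

```python
from dataclasses import dataclass


@dataclass
class AssetRisk:
    name: str
    likelihood: int   # hypothetical scale: 1 (rare) to 5 (almost certain)
    consequence: int  # hypothetical scale: 1 (negligible) to 5 (catastrophic)

    @property
    def risk(self) -> int:
        # Basic risk equation: Risk = Likelihood x Consequence
        return self.likelihood * self.consequence


def prioritize(assets: list[AssetRisk]) -> list[AssetRisk]:
    """Return assets ordered from highest to lowest risk."""
    return sorted(assets, key=lambda a: a.risk, reverse=True)


# Illustrative scores only: both devices carry the same CVSS 9.0
# vulnerability, but their exposure and consequence differ.
assets = [
    AssetRisk("Isolated device", likelihood=1, consequence=4),
    AssetRisk("SIS configuration server, connected network", likelihood=4, consequence=5),
]

for asset in prioritize(assets):
    print(f"{asset.risk:>2}  {asset.name}")
```

The resulting order, not the order in which vulnerabilities were discovered, then decides where patching, segmentation or other work processes are spent first.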

Maybe an hour spent identifying, evaluating and prioritizing risks can save you dozens of hours in implementing ICS continuous hardening work processes.