Mostly Leading Indicators - Safety System Health and Activation

In the previous editions of this series, we examined the importance of having multiple layers of protection to prevent loss of primary containment (LOPC) events. We discussed how metrics can monitor the health and performance of two of those layers: automated control loops and alarms with operator response. Both layers can be considered leading indicators of escalating events. Their performance can be examined while the process is in operation, and they support the operator’s ability to handle abnormal conditions. The result? Disturbances in the process can be addressed, returning the process to normal or optimal operating conditions.

Activation of Instrumented Safeguards - Interlocks and Shutdown Systems 

If the control loops don’t handle the disturbance and the alarm/operator intervention doesn’t return the process to normal operating conditions, the next layer of protection is the activation of instrumented safeguards like interlocks and shutdown systems. These systems are designed to activate before the process reaches hazardous limits. Those limits can be chemical, explosive, reactive, mechanical or otherwise. In short, these systems are designed to prevent disturbances and abnormal conditions from escalating to undesirable (often catastrophic) consequences.

Instead of attempting to keep the process running, these systems are designed to shut the process down safely. Once you’ve shut the process down, you’ve reached a point of loss. There is no longer the possibility of avoiding some level of consequence. Ideally, those consequences are limited: economic, because of the interruption in production; potentially environmental, because of increased emissions from flaring; and reputational, because of some community impact. A tower of flames shooting out of a flare, for example, is neither quiet nor easy to miss.

Leading Indicators of Performance Metrics Compiled During Operations 

How could I suggest that these are the “ideal” consequences of this layer of protection? Because failure of this layer of protection allows escalation to Tier 2 and Tier 1 LOPC events. These events tend to be reported nationally and often involve significant property damage from fires and explosions, injuries, fatalities and the shutting down of roadways and ship channels, alongside shelter-in-place orders. With that comparison, the expense of no longer producing a product is the more desirable condition. 

This layer of protection has both leading and lagging indicators of performance. The leading indicators are metrics that can be compiled during operations, such as:

  • Frequency and Duration of Bypasses – A bypass or defeat of this layer of protection is sometimes an unfortunate necessity, for example during startups. Metrics around the frequency and duration of bypasses and defeats help ensure bypasses are not forgotten and left in place indefinitely.
  • Health of Mitigating Controls – Whenever a bypass is required, additional mitigating controls are put in place to reduce the risk of a hazard escalating to a consequence while the protective layer is bypassed. For example, if an alarm/operator response is listed as a mitigating control, monitor the frequency and status of that alarm, and check whether it has been shelved. The metric type depends on the mitigating control documented by the bypass work process.
  • Health of Bypass Work Processes – Every bypass requires a work process and systems around it to ensure the risk of implementing the bypass is effectively managed. Metrics around this management process are needed to ensure appropriate reviews, approvals and shift-to-shift communications. From my own experience, it is undesirable to find out that a review process to prevent an incident hasn’t been done because the old dot matrix printer broke years ago, and there were no spare parts.
  • Test Records – Shutdown systems are designed based on assumptions of how often the devices involved will fail to perform their task. There are assumed failure rates for the inputs, the logic solver and the final elements. These assumptions are foundational in determining what level of redundancy is needed to mitigate the risk of failure. Comparing failure rates in “as found” test data to the assumptions in the design documentation provides insight into whether the system is over- or under-designed. Identifying under-designed systems and improving them reduces the risk of failure.
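
As a rough illustration, two of the metrics above (bypass frequency and duration, and "as found" failure rates versus design assumptions) could be compiled along the following lines. This is a minimal sketch, not a reference implementation: the `Bypass` record, the function names, the tags and the idea of a single maximum allowed bypass duration are all assumptions made for the example, and the failure rate is a simple point estimate (failures divided by device-years in service) with no confidence bounds.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Bypass:
    tag: str                 # hypothetical interlock tag
    start: datetime          # when the bypass was applied
    end: Optional[datetime]  # None means the bypass is still in place


def bypass_metrics(bypasses, now, max_hours):
    """Count bypasses, total bypassed time, and list open bypasses older than max_hours."""
    total = timedelta()
    overdue = []
    for b in bypasses:
        # Bypasses that are still open are measured up to "now".
        duration = (b.end if b.end is not None else now) - b.start
        total += duration
        if b.end is None and duration > timedelta(hours=max_hours):
            overdue.append(b.tag)
    return {
        "count": len(bypasses),
        "total_hours": total.total_seconds() / 3600,
        "overdue_open": overdue,
    }


def failure_rate_ratio(failures_found, device_years, assumed_rate_per_year):
    """Observed as-found failure rate divided by the rate assumed in the design.

    A ratio above 1 suggests the devices fail more often than the design assumed.
    """
    observed_rate = failures_found / device_years
    return observed_rate / assumed_rate_per_year
```

Calling `bypass_metrics` with the bypass log for a reporting period gives the frequency and duration figures, and the `overdue_open` list is one way to flag bypasses that may have been forgotten. A `failure_rate_ratio` well above 1 would point toward an under-designed system that deserves a closer look.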

When Lagging Indicators Become Leading Indicators to Prevent Escalations  

The activation of a safety system is more of a lagging indicator than a leading one. After all, the consequences of the shutdown can no longer be avoided; it just happened. However, activations can still serve as leading indicators for preventing escalation to higher-consequence events like Tier 1 and Tier 2 LOPC events. Useful metrics include:

  • Demand Rates – Much like the failure rate assumptions behind test records, safety systems are designed on an assumption of how often the process will place a real demand on them. Comparing the actual demand rate with the assumed demand rate can expose higher-than-anticipated risks that need to be addressed.
  • Demand Rates by Cause – The need for an instrumented safety function is most commonly identified during a Process Hazard Analysis (PHA) or a Hazard and Operability (HAZOP) study. The protective function is designed to mitigate the scenarios identified there. During the post-event investigation, determining which scenario caused the trip and comparing it to the identified scenarios can uncover scenarios the PHA/HAZOP team did not consider. Metrics around demand rates and root cause scenarios are a way to measure the effectiveness of the PHA/HAZOP process, as well as to compare each scenario’s assumed likelihood with how often it actually occurs.
  • Safety System Performance – As with testing, every activation of a safety system can generate metrics on how well it performed. Did the valves close? Did they close within the time specified in the safety requirements specification (SRS)? Does the process data after the trip indicate that the valves maintained isolation? Metrics designed to answer these questions, and more, can lead to greater insight and identification of risks that would otherwise go unnoticed.
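
The three activation metrics above can be sketched in the same spirit. Again, this is only an illustration under stated assumptions: the `Trip` record, its fields and the function names are hypothetical, the cause of each trip is assumed to come from the post-event investigation, and a single closure-time limit stands in for whatever the SRS actually specifies.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Trip:
    function: str         # hypothetical safety function tag
    cause: str            # scenario assigned by the post-event investigation
    close_seconds: float  # measured valve closure time during the trip


def demand_rate_per_year(trips, years_observed):
    """Actual demand rate, for comparison with the rate assumed in the design."""
    return len(trips) / years_observed


def unanticipated_causes(trips, pha_scenarios):
    """Count trips whose cause is not among the scenarios identified by the PHA/HAZOP."""
    counts = Counter(t.cause for t in trips)
    return {cause: n for cause, n in counts.items() if cause not in pha_scenarios}


def slow_closures(trips, srs_limit_seconds):
    """Trips where the valve took longer to close than the SRS allows."""
    return [t for t in trips if t.close_seconds > srs_limit_seconds]
```

A demand rate noticeably higher than the design assumption, any entry returned by `unanticipated_causes`, or any trip returned by `slow_closures` would each be a prompt for follow-up rather than a conclusion in itself.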

Safety systems perform an essential function by ensuring conditions don’t escalate into high-consequence LOPC events. While not the most leading of indicators, safety system metrics are still a valuable, measurable way to assess the overall health of the layers of protection in a facility. Combined with other leading indicators, they provide a comprehensive view of all the systems in place to prevent conditions from escalating. Monitoring, understanding and acting upon leading indicators of process safety is key to solving problems while they are small, before they get the chance to escalate to higher-consequence events.

About the Author

Brian Nixon received a Bachelor of Science in Chemical Engineering with a minor in Computer Science from Rose-Hulman Institute of Technology. His early career was as a process/plant engineer in the process industries, including agricultural processing, specialty chemical manufacturing and plastics compounding. He then transitioned into control systems consulting, business development and software product management roles. He is currently a Senior Strategy & Enablement Consulting Lead with Hexagon's Asset Lifecycle Intelligence division.