
Operations Risk Management

Leading Indicators – Alarms and Operator Interventions

In an operating facility, if your automatic control loops cannot handle a process disturbance, the next layer of protection is an alarm with an associated operator action. The standard definition of an alarm is “An audible and/or visible indication to the operator of an equipment malfunction, process deviation or abnormal condition requiring a timely response.” The alarm is a call to action: it intentionally interrupts the operator so that they can adjust process conditions and return the process to a normal operating state. Typical operator actions include adjusting the output of control loops, starting or stopping equipment, adjusting valve lineups and cleaning out clogged filters.

A well-performing alarm system ensures that this layer of protection is robust. If the alarm and associated operator response are unsuccessful, the next layer of protection is typically shutdown systems or interlocks. When the shutdown layer of protection activates, you’ve reached the point of loss. You’re no longer producing the intended product, and the plant/unit/equipment has shut down. Performant alarm systems alert, inform and guide operators to address process abnormalities and prevent unintended shutdowns. 

The field of alarm management exists to ensure the alarm system layer of protection functions as intended. The standards (ISA-18.2 and IEC 62682) and the associated technical reports discuss multiple metrics for understanding the health of the alarm system. A complete discussion of those standards and reports is beyond the scope of this blog post. Below are highlights of the most common metrics used to assess the health of the alarm system. All alarm system metrics are based on the span of control of a single operator position.

  • Average and peak alarm rates - The overall rate of alarms presented to an operator is a good high-level health metric. If alarms arrive faster than an operator can respond, the operator is forced to ignore some of them, which significantly degrades this layer of protection.

  • Number of chattering and fleeting alarms - Nuisance alarms decrease the operator’s confidence in the alarm system. They effectively “cry wolf,” leading operators to become comfortable ignoring them. In its incident reports, the Chemical Safety Board (CSB) has specifically cited nuisance alarms as a factor that led operators to ignore actual issues.

  • Time spent in an alarm flood - Alarm floods occur when the alarm system generates alarms significantly faster than an operator can respond. During an alarm flood, the alarm system does not help the operator understand what is going on. It becomes a noise-making distraction, not a helpful tool. 

  • Count and duration of alarm shelving/suppression - Heavy use of shelving and alarm suppression hides alarm system issues from other metrics without addressing the root causes. Suppression and shelving become the rug under which many other alarm system problems are swept.

  • Count and type of operator actions - Understanding how frequently operators adjust process parameters provides insight into how well the control system supports them. High numbers of controller mode, setpoint or output changes can indicate poorly performing loops, and high numbers of other changes can indicate a need for improved automation of frequent tasks.
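
To make the first few metrics concrete, here is a minimal sketch of how they might be computed from an alarm journal for a single operator position. Everything in it is an assumption for illustration: the function name `alarm_metrics`, the `(timestamp, tag)` event format, the fixed 10-minute bins, and the thresholds (more than 10 alarms in a 10-minute bin counted as a flood, three or more activations of one tag within a minute counted as chattering). These are commonly quoted rules of thumb, so check them against the standards and your own alarm philosophy before relying on them.

```python
import math
from collections import Counter, defaultdict
from datetime import datetime, timedelta

BIN = timedelta(minutes=10)            # assumed reporting interval
FLOOD_THRESHOLD = 10                   # assumed: > 10 alarms in one bin = flood
CHATTER_COUNT = 3                      # assumed: 3+ activations of one tag...
CHATTER_WINDOW = timedelta(minutes=1)  # ...within 1 minute = chattering


def alarm_metrics(events, start, end):
    """Summarize alarm activations for ONE operator position.

    events: iterable of (timestamp, tag) alarm activations.
    start, end: reporting window; events outside it are ignored.
    """
    events = sorted(e for e in events if start <= e[0] < end)
    n_bins = max(1, math.ceil((end - start) / BIN))

    # Count alarms in each fixed 10-minute bin (a simplification of a
    # true sliding-window flood calculation).
    per_bin = Counter((ts - start) // BIN for ts, _ in events)
    counts = [per_bin.get(i, 0) for i in range(n_bins)]

    # Group activation times by tag to look for chattering alarms.
    by_tag = defaultdict(list)
    for ts, tag in events:
        by_tag[tag].append(ts)
    chattering = {
        tag
        for tag, times in by_tag.items()
        if any(
            times[i + CHATTER_COUNT - 1] - times[i] <= CHATTER_WINDOW
            for i in range(len(times) - CHATTER_COUNT + 1)
        )
    }

    return {
        "avg_per_10min": sum(counts) / n_bins,
        "peak_per_10min": max(counts),
        "pct_time_in_flood": 100 * sum(c > FLOOD_THRESHOLD for c in counts) / n_bins,
        "chattering_tags": sorted(chattering),
    }


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1)
    # Hypothetical journal: one tag activating three times in 40 seconds.
    journal = [(t0 + timedelta(seconds=20 * i), "LI-101") for i in range(3)]
    print(alarm_metrics(journal, t0, t0 + timedelta(hours=1)))
```

Shelving, suppression and operator-action counts could be summarized the same way from their own event records, again per operator position.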

A comprehensive set of metrics around the alarm system and operator actions is key to understanding the health of this barrier. There might be a tendency to roll multiple metrics into a single performance classification for the entire system. However, most of the published classification methods can lead to wildly different alarm systems ending up in the same performance category. This can obscure the system's true performance and create a false sense of well-being. 
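
A small, purely illustrative sketch shows how this can happen. The `classify_by_average` function and its limit of one alarm per 10 minutes are hypothetical, not taken from any published classification method. Two made-up days of alarm counts land in the same category even though one of them contains a severe flood.

```python
def classify_by_average(counts_per_10min, limit=1.0):
    """Hypothetical single-number classification based only on the mean rate."""
    avg = sum(counts_per_10min) / len(counts_per_10min)
    return "acceptable" if avg <= limit else "overloaded"


# Two invented 24-hour histories (144 ten-minute intervals each).
steady = [1] * 144             # one alarm every 10 minutes, all day
bursty = [0] * 141 + [48] * 3  # silent all day, then 144 alarms in 30 minutes

for name, counts in (("steady", steady), ("bursty", bursty)):
    print(name, classify_by_average(counts), "peak:", max(counts))
# steady acceptable peak: 1
# bursty acceptable peak: 48
```

Both histories average exactly one alarm per 10 minutes, so the single category hides the difference. Keeping peak rate and time in flood as separate metrics makes it visible.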

Automatic control loops and alarms with operator actions form the first two layers of protection against process upsets. Ensuring these layers of protection are robust and healthy gives operations the best chance to handle disturbances and avoid unplanned shutdowns caused by the activation of safety systems and interlocks.

Want to learn more? Check out additional resources and keep up to date on What's New in Operations & Maintenance (hexagon.com).


About the Author

Brian Nixon received a Bachelor of Science in Chemical Engineering with a minor in Computer Science from Rose-Hulman Institute of Technology. His early career was as a process/plant engineer in the process industries, including agricultural processing, specialty chemical manufacturing and plastics compounding. He then transitioned into control systems consulting, business development and software product management roles. He is currently a Senior Strategy & Enablement Consulting Lead with Hexagon's Asset Lifecycle Intelligence division.
