Factors Impacting Probability of Failure

Prior to assigning probability of failure ratings to each asset, system staff will want to create a probability of failure rating structure. Once the rating structure is created, system staff will rate the probability of failure of each asset using the rating structure. When assigning a probability of failure rating to an asset there are four failure modes to consider: mortality, financial inefficiency, capacity and level of service. Mortality failures occur when an asset physically fails or stops performing its intended function. When costs of operation and maintenance activities, as well as repairs, are considered on an individual asset basis, it is possible to determine the point at which it no longer makes economic sense to keep an asset in service. This is the point when it is actually cheaper to replace the asset than to continue to operate and maintain it. Thus, the asset fails due to financial inefficiency. The final two failure mechanisms occur in assets that may be functioning properly but are no longer providing either the physical capacity needed or the service level desired.

Mortality is the most common failure mode for gray assets. For green assets, however, the concept of asset failure, especially failure via mortality, often differs. Most green assets degrade rather than fail completely. All four failure modes do occur for green assets, but a level of service failure is more common than a mortality failure. Assessing probability of failure for green assets can also be challenging because many green assets have living, complex and dynamic components, and sufficient data about the asset may not be available.

Additionally, the probability of failure for gray and green assets will likely take different paths over time. Gray assets typically have a finite life, and with data and good record keeping, that life cycle can be estimated. As equipment wears, gray assets will require increased operations and maintenance, but this is not necessarily the case for many green assets. Green assets are designed to become more resilient and effective as vegetation matures and adapts to local resource cycles. The flexible and adaptable nature of green assets extends their life cycle and often improves their performance over time. Green assets often appreciate in value rather than depreciate, and thus cannot be easily replaced if a significant failure occurs. Performance may eventually diminish, but if installed properly and routinely maintained, green assets generally strengthen and improve, unlike gray assets. However, green assets often have an indeterminate lifetime, which makes it challenging to gauge probability of failure and life cycle. For example, trees in a forest surrounding a watershed can last decades, if not over a century, whereas trees in an urban environment have a much shorter life. In Denver, trees have been found to last approximately ten years.

As asset management practice for green infrastructure matures, systems will have a better idea of various asset lifespans and potential likelihood of failure. For now, it is necessary to make the best educated guess, and thus evaluating and managing risk for green assets can be more challenging. Systems must look at all four failure modes in order to make informed judgements about the probability of failure across all types of assets. Below are descriptions of the four failure modes and a list of common factors influencing each. Systems should review these modes and factors when assigning a probability of failure rating to their assets.
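One way to keep all four failure modes in view is to record a separate rating for each mode on every asset. The sketch below is illustrative only: the 1 to 5 scale, the class and method names, and the choice to let the worst mode govern the overall rating are assumptions, not part of this guidance.

```python
from dataclasses import dataclass, field
from enum import Enum


class FailureMode(Enum):
    MORTALITY = "mortality"
    CAPACITY = "capacity"
    LEVEL_OF_SERVICE = "level of service"
    FINANCIAL_INEFFICIENCY = "financial inefficiency"


@dataclass
class AssetPofRating:
    """Probability of failure ratings for one asset, one rating per failure mode."""
    asset_name: str
    # Assumed scale: 1 (failure unlikely) to 5 (failure expected soon).
    ratings: dict = field(default_factory=dict)

    def rate(self, mode, rating):
        if not 1 <= rating <= 5:
            raise ValueError("rating must be between 1 and 5")
        self.ratings[mode] = rating

    def unrated_modes(self):
        """Failure modes that staff have not yet considered for this asset."""
        return [mode for mode in FailureMode if mode not in self.ratings]

    def overall(self):
        # Assumption: the mode most likely to occur governs the overall rating.
        if self.unrated_modes():
            raise ValueError("rate all four failure modes first")
        return max(self.ratings.values())
```

Requiring a rating for every mode before an overall rating is produced is a design choice meant to prevent, for example, a green asset being rated on mortality alone when a level of service failure is the more likely outcome.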

Mortality Failure – the asset physically fails or is unable to perform its function. This mode of failure is what typically comes to mind when someone mentions an asset failure. Mortality can occur due to the natural deterioration of an asset over time, or it can be sudden and unpredictable, such as an asset being run over by a car. Active assets usually arrive at mortality for different reasons than passive assets. Active assets usually have moving parts, so the asset fails with use rather than time. Failure for active assets is related to overall run time, frequency of starting and stopping, and frequency and skill of maintenance. Passive assets usually fail with time because they are in constant service. However, their failure times vary significantly depending on the environment where they are located. Similar assets in the same system can vary in deterioration based on their physical location and its characteristics.

For the most part, green assets are passive assets. Failure for passive assets is related to soil characteristics, bedding conditions, asset material, exposure to weather and other environmental factors. Some examples of mortality failures include leaks, stuck valves, vegetation or soil stripped during a storm event and plastic pipe failure from exposure to extreme temperatures and thermal expansion. Below is a list of factors that impact the probability an asset will fail due to mortality. These factors can be used when considering what probability of failure rating to assign an asset.

Asset Condition

As an asset consistently performs its function, the asset’s condition will begin to deteriorate. As the asset’s condition deteriorates, it will be more likely to fail. This pattern is somewhat different for green assets using living plants. Initially, the condition may be lower but it improves over time as the vegetation becomes established. As vegetation remains established, the condition of that asset may stay the same or continue to improve. However, without proper maintenance or due to environmental factors (drought, disease, soil deterioration) these green assets can be subject to condition deterioration as well. It is important, therefore, to make the best possible attempt to give the assets a reasonable condition assessment and update it at least annually. Assets with better condition ratings will be less likely to fail than those with lower condition ratings.

Repair History

Staff should track the repair history of an asset including when and where the repair occurred and why it was needed. This information should be as specific as possible to assist staff in understanding how mortality occurred. Past failure is not completely predictive of future failure, but it can provide some indication of the probability of future failure, especially if detailed information on the past failures is collected and reviewed. Other information can also be tracked to improve system knowledge such as how the failure was determined, details about how the mortality occurred and field observations that may help explain the failure. Work orders and maintenance reports provide insight into what can happen with a given asset, and they can also help a system figure out how much downtime is associated with mortality failure.
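A repair log along these lines could be kept in a simple structured form. The sketch below is a minimal illustration: the field names, the sample entries, and the two summary functions are hypothetical and are not drawn from any particular work order system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RepairRecord:
    repair_date: date      # when the repair occurred
    location: str          # where the repair occurred
    reason: str            # why the repair was needed
    how_detected: str      # how the failure was determined
    downtime_hours: float  # time the asset was out of service


def repairs_per_year(records):
    """Count repairs in each calendar year, oldest year first."""
    counts = {}
    for record in records:
        year = record.repair_date.year
        counts[year] = counts.get(year, 0) + 1
    return dict(sorted(counts.items()))


def total_downtime_hours(records):
    """Total downtime associated with the logged failures."""
    return sum(record.downtime_hours for record in records)


# Hypothetical repair log for one pump; every value is made up.
pump_log = [
    RepairRecord(date(2021, 3, 14), "Lift Station 2", "seal leak", "operator round", 6.0),
    RepairRecord(date(2023, 7, 2), "Lift Station 2", "bearing wear", "vibration alarm", 10.5),
    RepairRecord(date(2023, 11, 20), "Lift Station 2", "seal leak", "customer call", 4.0),
]

print(repairs_per_year(pump_log))      # {2021: 1, 2023: 2}
print(total_downtime_hours(pump_log))  # 20.5
```

A rising count of repairs per year, or the same reason recurring, is the kind of pattern that may justify a higher probability of failure rating.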

Operations and Maintenance History

Knowledge of how the asset was operated and maintained will provide information about how likely the asset is to fail. The lack of adequate maintenance is likely to shorten an asset’s useful life and cause premature failure. Similarly, knowledge of how the asset was operated will help determine the asset’s likelihood of failure. An asset that is subject to harsh operation (e.g., frequent starts and stops, operation outside of design specifications) is more likely to fail than one that is operating within design standards.

Asset Useful Life

Over time, assets can deteriorate. This is always the case for gray assets. However, some green assets improve over time as vegetation grows and becomes more established. There is no “magic age” at which an asset can be expected to fail. An asset’s useful life is highly related to the conditions of use, the amount of maintenance, the original design, construction techniques, and the type of material used in construction. The useful life estimates developed in the Current State of the Assets component of Asset Management help provide information on the probability of failure. If the asset is close to the end of its useful life, it can be expected to fail. If it is far from the end, it is much less likely to fail. Useful life is a better metric to use than actual age of the asset because it is more indicative of the end of life. In some cases, very old assets can still have a significant amount of life left and low probability of failure compared to newer assets, depending on many other factors.
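The difference between age and useful life can be shown with a small calculation. The sketch below assumes a hypothetical 1 to 5 rating scale and made-up cut points; the estimated useful life would come from the system's own Current State of the Assets work, and the cut points should be adjusted to suit local experience.

```python
def fraction_of_life_used(years_in_service, estimated_useful_life):
    """Share of the estimated useful life already consumed, capped at 1.0."""
    if estimated_useful_life <= 0:
        raise ValueError("estimated useful life must be positive")
    return min(years_in_service / estimated_useful_life, 1.0)


def useful_life_rating(years_in_service, estimated_useful_life):
    """Map the share of life used to a rating from 1 (low) to 5 (high)."""
    used = fraction_of_life_used(years_in_service, estimated_useful_life)
    # Hypothetical cut points: (upper bound on share of life used, rating).
    cut_points = [(0.25, 1), (0.50, 2), (0.75, 3), (0.90, 4)]
    for upper_bound, rating in cut_points:
        if used < upper_bound:
            return rating
    return 5


# A 60-year-old pipe with an estimated 100-year life rates lower than
# a 12-year-old pump with an estimated 15-year life.
print(useful_life_rating(60, 100))  # 3
print(useful_life_rating(12, 15))   # 4
```

This factor addresses only how close an asset is to the end of its estimated life. It would normally be weighed alongside condition, repair history and the other factors in this list rather than used alone.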

Major Potential Disruptions

The likelihood of some natural hazard events is relatively well-established in existing datasets and studies, though climate change is adding a layer of uncertainty. For example, climatic and hydrologic studies provide reliable estimates of the probability of large floods based on documented past events. Systems can plan for plausible future frequencies and magnitudes of hazard events, such as floods or droughts, that may lead to gray and green asset mortality. Another major potential disruption that could lead to asset mortality is power failure. Although power failures may be unpredictable, if they have happened multiple times in the past they should be factored into the likelihood of failure for the future.
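Where a study reports a flood as an "N-year" event, the chance of seeing at least one such event during a planning period can be estimated with a standard calculation. The sketch below assumes that each year is independent and that the annual probability stays constant, which is a simplification given the climate uncertainty noted above. The function name is illustrative.

```python
def chance_of_at_least_one_event(return_period_years, planning_horizon_years):
    """Probability of one or more events of the given return period
    occurring during the planning horizon, assuming independent years."""
    if return_period_years < 1 or planning_horizon_years < 0:
        raise ValueError("return period must be >= 1 and horizon must be >= 0")
    annual_probability = 1.0 / return_period_years
    return 1.0 - (1.0 - annual_probability) ** planning_horizon_years


# A 100-year flood over a 30-year planning horizon.
print(round(chance_of_at_least_one_event(100, 30), 2))  # 0.26
```

In this example, an asset expected to remain in service for 30 years has roughly a one in four chance of experiencing a 100-year flood, which may be higher than the label "100-year" suggests.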

Experience with a Similar Asset

Although probability of failure is asset and site specific, some guidance regarding probability of failure can be gained by examining experience with similar assets. If similar assets experienced failure via mortality in the past, that should be investigated and considered in the likelihood of mortality failure for current and future similar assets.

Historical Knowledge

If staff have any additional knowledge regarding the asset and what might prevent it from performing its function, it should be considered when assigning the probability of failure rating. This type of information may include knowledge of poor design, construction or manufacturing practices used at the time the asset was installed or knowledge of inadequate materials or vegetation used.

Capacity Failure – the asset is still operational but is unable to provide the physical capacity needed. Some examples of capacity failure are combined sewer overflows, fluvial flooding, precipitation exceeding drainage, reduced flow, an inadequate well, a storage tank with insufficient capacity, and exceedance of soil capacity. The magnitude of stormwater discharged in and around urban areas often approaches unmanageable levels that can overwhelm the capacity of gray water infrastructure. Green assets are often installed to expand urban stormwater management capacity, but they too can experience similar capacity failures if the quantity of stormwater exceeds design capacity. Assets may also experience capacity failure due to improper planning or design. For example, poorly designed bioswales can result in insufficient stormwater management capacity, which may result in local flooding or the inability of an asset to keep up with infiltration. These types of capacity failures can result in a decrease in water quality as additional pollutants are added to a system. Capacity failure also occurs if there is a lack of adequate flow or a significant reduction in source. Human use of recharge zones and overpumping can cause a reduction of discharge to streams and lead to declining or depleted aquifer levels.

Asset Condition

As the asset’s condition deteriorates, it may not be able to process/hold/treat the same capacity. For example, corrosion on the inside of pipes can reduce the cross-sectional area of the pipe. It is important, therefore, to give the assets a reasonable condition assessment and update it at least annually.

Experience with a Similar Asset

Although probability of failure is asset and site specific, some guidance regarding probability of failure can be gained by examining experience with similar assets. Other similar assets will give staff an idea about the capacity an asset can process.

Demand and Population

Demand and population are constantly fluctuating, at least a small amount, but occasionally there may be significant increases or decreases. A sudden increase in demand or rapid population growth can result in reduced pressure, inadequate supply or inadequate storage capacity. This may also occur if a large water user or wastewater customer is added to the service area, causing a rapid increase in demand.

Historical Knowledge

If staff have any additional knowledge regarding the asset, it should be considered when assigning the probability of failure rating. This type of information may include knowledge of design, construction or manufacturing practices used at the time the asset was installed or knowledge of materials or vegetation used. Capacity failures may have occurred previously during certain types of events or in certain conditions. For example, a sewage pump station may fail every time rainfall exceeds 1 inch per day.
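If staff know that a capacity failure recurs above a certain rainfall amount, rainfall records can show how often that condition arises. The sketch below uses the 1 inch per day figure from the pump station example. The rainfall values and function names are hypothetical.

```python
def days_over_threshold(daily_rainfall_inches, threshold_inches=1.0):
    """Return the days on which rainfall was more than the threshold."""
    return [day for day, inches in daily_rainfall_inches.items()
            if inches > threshold_inches]


def share_of_days_over_threshold(daily_rainfall_inches, threshold_inches=1.0):
    """Fraction of recorded days on which the threshold was exceeded."""
    if not daily_rainfall_inches:
        raise ValueError("no rainfall records supplied")
    exceeded = days_over_threshold(daily_rainfall_inches, threshold_inches)
    return len(exceeded) / len(daily_rainfall_inches)


# Hypothetical daily rainfall totals in inches; every value is made up.
rainfall = {
    "2024-05-01": 0.2,
    "2024-05-02": 1.4,
    "2024-05-03": 0.0,
    "2024-05-04": 1.0,  # exactly 1 inch is not "more than 1 inch"
    "2024-05-05": 2.1,
}

print(days_over_threshold(rainfall))           # ['2024-05-02', '2024-05-05']
print(share_of_days_over_threshold(rainfall))  # 0.4
```

A longer record, ideally several years, would give a more useful picture of how frequently the failure condition can be expected.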

Major Potential Disruptions

Some parts of the country or areas within a city/town are more prone to major disruptions. These disruptions include natural hazards (e.g., earthquakes, hurricanes, flooding, fires), power failures, equipment failures, or transmission main failures and can result in a reduction in system capacity, supply pressure and changes to the flow. For example, if a major fire occurs there may be an increase in sediment/erosion for several years which could increase treatment costs and threaten a system’s raw water storage capacity.

Useful Life

Over time, some assets lose capacity due to sediment buildup, corrosion scale, or deterioration. There is no “magic age” at which an asset can be expected to experience capacity failure and it often happens gradually. An asset’s useful life is highly related to the conditions of use, the amount of maintenance, the original design, construction techniques, location within the system, and the type of material used in construction.

Level of Service Failure – the asset is still operational but is unable to meet the level of service required. Something physical does not have to happen to the asset for a level of service failure to occur. Some common level of service failures relate to the following categories: safety, water quality, recreation, habitat, and biodiversity. Changes in regulations or changes in customer demands and desires can also trigger this mode of failure. For example, customers may want to upgrade the water utility to provide fire flow, but the system currently has 4-inch pipe in the main transmission lines. The system would have to remove its 4-inch pipe, even if it is functioning well, and install 6-inch or larger pipe to accommodate the desire for fire flow. Another example of a level of service failure is a well that has contaminant levels, such as arsenic, nitrate, or radionuclides, that do not meet current regulations. The system may choose to abandon the well and drill a new one in an area with lower levels of the contaminant. Level of Service failures are closely tied with Level of Service goals. For green assets, level of service failures are often subtle and occur gradually over a longer period of time compared to gray assets. For example, as a watershed undergoes development, the nonpoint source pollutant load may increase along with a gradual increase in erosion. These factors will reduce the water quality and the green asset may reach a point where it is no longer providing an acceptable level of service. However, in these types of situations, determining the point at which the asset is no longer providing the desired service level can be challenging and hard to quantify because of the gradual nature of the change.

O&M History

Knowledge of how the asset was operated and maintained will provide information about how likely the asset is to fail. If an asset is not maintained, it may not be able to meet the level of service required but still be operational. For example, if trash or sediment is not periodically removed from a green asset, the asset may still filter some water but it can no longer hold or filter the capacity necessary.

Repair History

Staff should track the repair history of an asset including when and where the repair occurred and why it was needed. This information should be as specific as possible to assist staff in understanding how a level of service failure occurred. Past failure is not completely predictive of future failure, but it can provide some indication of the probability of future failure, especially if detailed information on the past failures is collected and reviewed. Other information can also be tracked to improve system knowledge such as how the failure was determined, details about how the level of service was not met and field observations that may help explain the failure. Work orders and maintenance reports provide insight into what can happen with a given asset and they can also help a system figure out how much downtime is associated with a level of service failure.

Asset Design/Capabilities

The level of service required may exceed the design capacity or capabilities of an asset. If this is the case, the system may have to replace the asset with one that can meet the level of service needs. A level of service failure can also occur if the asset is routinely operating outside of its design specifications, whether the operation is above or below the design standard. Assets operate optimally when they are within design specifications.

Customer Desires or Expectations

The level of service desired by customers may exceed the capabilities of an asset. If this is the case, the system may have to replace the asset with one that can meet the desired level of service, if it is determined that the system wants to meet these standards. It is also possible for customer expectations to change over time. Customers may desire a higher level of stormwater management now than they did in the past. Therefore, assets that met service levels previously may no longer do so.

Asset Condition

As the asset’s condition deteriorates, it is less likely to perform at a high level, thus reducing its level of service. As the asset continues to deteriorate, it will eventually reach a point where it is below the acceptable level of service. It is important, therefore, to give the assets a reasonable condition assessment and update it at least annually so interventions can occur before the asset performance drops below the acceptable level of service.

Experience with a Similar Asset

Although probability of failure is asset and site specific, some guidance regarding probability of failure can be gained by examining experience with similar assets. If similar assets experienced level of service failures previously, it is an indication that those same failures may occur with similar assets in the future.

Useful Life

Over time, assets can deteriorate. There is no “magic age” at which an asset can be expected to fail, but it may drop below the acceptable level of service prior to reaching the end of its useful life. An asset’s useful life is also highly related to the conditions of use, the amount of maintenance, the original design, construction techniques, and the type of material used in construction. Useful life may have a role in determining when an asset’s performance drops below an acceptable level of service. However, it is not likely to be the most influential factor.

Major Potential Disruptions

Some parts of the country are more prone to major disruptions, including natural hazards like earthquakes, hurricanes, flooding or fires that can reduce the level of service of assets to the point of failure. For example, forest fires may result in a watershed with poor water quality and high turbidity source waters. Although these events are hard to predict, there are patterns in their occurrence and conditions that make them more likely to happen. In these cases, system staff should factor the inability to meet level of service during these events into an asset’s likelihood of failure. In some cases, such as short-term power outages, the level of service failure may be brief, while other types of events may cause longer lasting failures of this type. Another potential cause of disruption is development within the service area. Development can cause gradual loss of erosion control and impact source water quality, leading to level of service concerns. These types of impacts should be considered in probability of failure assessments if they have been experienced in the past, or if conditions are prime for them to occur in the future.

Financial Inefficiency – the asset is costing so much to operate and maintain that it is no longer economical to keep it in operation. When costs of operation and maintenance activities, as well as repairs, are considered on an individual asset basis, it is possible to determine the point at which it no longer makes economic sense to keep the asset in service. This is the point at which it is actually more efficient and cost effective to replace the asset than to continue to operate and maintain it. This is the case for both green and gray assets. To make a valid determination, data collected over an extended period of time is required. For systems just beginning Asset Management, it is unlikely that this level of detailed, historical data on operations and maintenance will be available on an individual asset basis. It may take time to get a program in place to track this information. It can be challenging to fully assess this failure mode for green infrastructure because many green assets have not been in service long enough for staff to have collected the necessary data to assess financial efficiency. Systems with green and gray assets that do not currently collect the relevant data to make this determination should consider doing so in the future. In the absence of asset-specific data, qualitative determinations can be made, or the determination can be based on current information only as a starting point.
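Where asset-level cost records do exist, the comparison described above can be made with simple arithmetic. The sketch below is one possible approach, not a prescribed method: it spreads the replacement cost evenly over the new asset's expected life and ignores interest, inflation and salvage value. All function names and dollar figures are hypothetical.

```python
def annual_cost_to_keep(yearly_om_and_repair_costs):
    """Average yearly cost of operating, maintaining and repairing the asset."""
    if not yearly_om_and_repair_costs:
        raise ValueError("at least one year of cost records is required")
    return sum(yearly_om_and_repair_costs) / len(yearly_om_and_repair_costs)


def annual_cost_to_replace(replacement_cost, expected_life_years, expected_annual_om):
    """Replacement cost spread evenly over the new asset's expected life,
    plus the O&M the new asset is expected to need each year."""
    if expected_life_years <= 0:
        raise ValueError("expected life must be positive")
    return replacement_cost / expected_life_years + expected_annual_om


def is_financially_inefficient(yearly_om_and_repair_costs, replacement_cost,
                               expected_life_years, expected_annual_om):
    """True when keeping the asset costs more per year than replacing it."""
    keep = annual_cost_to_keep(yearly_om_and_repair_costs)
    replace = annual_cost_to_replace(replacement_cost, expected_life_years,
                                     expected_annual_om)
    return keep > replace


# Hypothetical pump: three years of O&M and repair costs, in dollars.
recent_costs = [4200, 5100, 6900]
print(annual_cost_to_keep(recent_costs))               # 5400.0
print(annual_cost_to_replace(30000, 10, 1500))         # 4500.0
print(is_financially_inefficient(recent_costs, 30000, 10, 1500))  # True
```

Because costs for an aging asset tend to rise, using only the most recent years of records may give a more realistic picture than a long-run average.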

O&M History

Knowledge of how the asset was operated and maintained will provide information about how likely the asset is to fail. The frequency and cost of O&M determine whether an asset is financially inefficient. If an asset requires frequent or costly maintenance because of its condition or poor design/installation, then it will reach a point where it is financially inefficient to keep it in operation.

Repair History

Staff should track the repair history of an asset including when and where the repair occurred and why it was needed. This information should be as specific as possible to assist staff in understanding if repairs are frequent and costly enough that it is more cost effective to replace the asset rather than continuing to repair it. Past failure is not completely predictive of future failure, but it can provide some indication of the probability of future failure, especially if detailed information on the past failures is collected and reviewed.

Replacement History

Knowledge of the cost and frequency with which the asset has been replaced will provide information about how long the asset will likely last before replacement becomes necessary. This information will assist staff in understanding if replacement was considered the more financially reasonable option in past instances. This type of analysis is more feasible for short-lived assets, such as pumps that may be replaced fairly frequently, than for long-lived assets, such as pipe.

Financial Records

If the replacement, repair or O&M history has not been well documented, utilize historical financial records to get the best information possible.

Asset Condition

As the asset’s condition deteriorates, it will likely need more O&M or repairs. It is important, therefore, to give the assets a reasonable condition assessment and update it at least annually. Keeping up to date on an asset’s condition will better inform financial decisions (whether to repair, rehabilitate or replace the asset) later.

Useful Life

Over time, assets can deteriorate. When comparing similar assets, the assets in service longer are likely to experience more repairs than those in service for a shorter period of time. As the asset remains in service, the time between repairs or routine or preventive maintenance activities may shorten, thus increasing the cost of the asset. This situation is likely to occur as the asset nears the end of its useful life, although it is highly dependent on the asset type and the specific circumstance of use. As the asset nears the end of its useful life, the calculation of financial inefficiency changes. At that point, additional expenditures on the asset are only going to enable the asset to stay in service for a short period of time (until it reaches the end of its useful life). Therefore, major investments in repair or maintenance may not be worthwhile, and replacement may be necessary.

Experience with a Similar Asset

Although probability of failure is asset and site specific, some guidance regarding probability of failure can be gained by examining experience with similar assets. If similar assets have experienced financial inefficiency failures, it is more likely that this asset may as well. However, differences in circumstances between the similar assets and this asset must be taken into consideration to decide how likely this asset is to fail in this manner. These might include amount of use, operating environment, and design specifications.

Creating POF (Probability of Failure) ratings will guide users through the creation of probability of failure ratings; the ratings should be consistent so that different people are likely to arrive at the same or similar ratings.