Creating Probability of Failure Ratings

 

To calculate the criticality of an asset, the probability of failure of that asset needs to be quantified. Systems can quantify the probability of failure by creating a probability of failure rating structure that includes a numeric rating as well as a description of each rating. One of the best ways to develop these standardized criteria is to engage a cross-section of system personnel who have different viewpoints and different experiences with the assets. Staff should choose a rating scale, such as 1 to 5 or 1 to 10, and keep the descriptions broad enough so the ratings can apply to any assets (gray and green) in the system. Creating these ratings does not have to be a long, time-intensive activity. A small system should be able to complete the process of developing the rating structure by meeting a few times for a few hours each. A larger system may take longer and may wish to have a more expansive rating system.

Once the structure is developed, staff will use it to assign a rating to each asset based on how likely the asset is to fail. Three examples of possible rating structures are shown below. Two use a 1 to 5 scale and the third uses a four-level scale. In each, the lowest rating represents the lowest probability of failure and the highest rating represents the highest probability.

Table 1. Probability of failure ratings with descriptions.

Rating | Description
1 | Very Low Probability of Failure
2 | Low Probability of Failure
3 | Moderate Probability of Failure
4 | High Probability of Failure
5 | Very High Probability of Failure

Table 3. Example of a corporate level probability of failure rating.

Rating | Description
Very Unlikely | The event could happen but probably never will (less than 10%). Unlikely to occur within a 12-month period.
Unlikely | The event could happen but very rarely (10%-50%). Might occur at some time in a 12-month period.
Likely | The event could happen sometime (50%-90%). Will probably occur at some time within a 12-month period.
Very Likely | The event could happen at any time (more than 90%). A strong probability of multiple occurrences within a 12-month period.

Table 2. In-depth probability of failure ratings with descriptions.

Rating | Description
1 | Asset is brand new or like new. Failure not anticipated within the foreseeable future.
2 | Asset is not brand new but shows no more than cosmetic signs of wear and tear. Asset failure is not anticipated in the near future. The asset receives regular maintenance.
3 | Asset shows signs of wear but has not yet entered a potential failure state. Asset has the potential to be maintained at a level 3 for some period of time if the proper maintenance is completed and repairs are made. Asset may show light rust, some light wear and tear, or be nearing, but not at, physical capacity.
4 | Asset is in potential failure, but not functional failure mode. Functional failure not expected within the next year (if so, should be PoF of 5). Potential failure means the asset is showing signs of failure, such as cracks, root intrusions, vibration, noise, excessive rust, but is still delivering all or most of the required service. The potential failure issues will need to be addressed to prevent a functional failure. Functional failure occurs when the asset is in one of the four failure modes.
5 | Already in functional failure mode (Mortality – already broken, collapsed; Level of Service – not doing what it’s supposed to; Capacity – not sufficiently sized; Financial Inefficiency – costing too much to continue to use) or expected to be in functional failure mode within 1 year. A failure of one of the four types is imminent, if the asset is not already in failure mode.

For those just starting out with Asset Management, Table 1 will be the simplest rating structure. Systems with more Asset Management experience should consider using a more in-depth rating system like Table 2. In Table 3, the levels (very unlikely, unlikely, likely, and very likely) can easily be turned into quantitative values (1, 2, 3, and 4) so the probability of failure ratings assigned to each asset can then be used in a criticality calculation. Table 1 and Table 3 have different scales – five versus four – but serve the same purpose. It does not matter how many ratings a system chooses to use as long as those ratings have consistent definitions and are agreed upon by system staff and management. Some systems prefer to have an even number of ratings to avoid the natural temptation to pick the middle value (e.g., a rating of 3 on a scale of 1 to 5). Systems should avoid creating a rating structure with so many levels (e.g., a scale of 1 to 50) that the difference between adjacent levels is minute and hard to distinguish. Applying such a structure to the assets also takes much more time, and the benefit may not justify the additional effort.
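The conversion of the Table 3 levels into the values 1 through 4 can be sketched in a few lines of Python. This is only an illustration: the names `TABLE_3_LEVELS`, `LEVEL_TO_RATING`, and `to_numeric_rating` are hypothetical and are not part of any asset management software.

```python
# Qualitative levels from Table 3, ordered from lowest to highest probability.
TABLE_3_LEVELS = ["Very Unlikely", "Unlikely", "Likely", "Very Likely"]

# Number the levels 1..4 in order; keys are lower-cased so that lookups
# do not depend on how staff capitalized the level on a field form.
LEVEL_TO_RATING = {
    level.lower(): number for number, level in enumerate(TABLE_3_LEVELS, start=1)
}


def to_numeric_rating(level: str) -> int:
    """Return the numeric rating (1-4) for a Table 3 level name."""
    key = level.strip().lower()
    if key not in LEVEL_TO_RATING:
        raise ValueError(f"Unknown Table 3 level: {level!r}")
    return LEVEL_TO_RATING[key]
```

Rejecting unknown level names, rather than guessing, keeps the ratings tied to the definitions staff agreed on.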

Once a rating structure is created, it should be tested in the field using a variety of asset types to make sure it is understandable to those who will be applying it and to make sure it achieves the correct results. Following a successful test of the structure, the system should create standard operating procedures (SOPs) for the application of probability of failure ratings to each asset to ensure the process is performed consistently among staff members. The SOPs can also be used to train new staff members on the process and ensure that the ratings stay consistent when current staff members leave the utility or retire.

System staff will assign each asset a probability of failure rating based on the rating structure they created. Staff must consider all four failure modes when assigning a rating in order to make informed judgements about the probability of failure across all types of assets. If the system has the ability to collect and store the information, it is also valuable to identify which failure mode(s) are likely to dominate the analysis. For example, Asset A is likely to fail via level of service failure and the likelihood is rated a 4. Asset B is likely to fail via mortality and the likelihood is rated a 2. There are no industry standards for probability of failure for given assets, so the ratings are best defined by system personnel as described previously. However, the more applicable empirical data is collected on an asset, the more accurate and objective its probability of failure estimate will be, and the more consistent the estimates will be with industry best practice.
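For systems that can store this information, one possible record layout keeps the rating and the dominant failure mode(s) together. The sketch below assumes a 1 to 5 scale like the one in Table 1. The class and field names (`FailureMode`, `PofAssessment`, `dominant_modes`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum

# Assumed scale bounds (Table 1 style); adjust to the system's own structure.
MIN_RATING, MAX_RATING = 1, 5


class FailureMode(Enum):
    """The four failure modes named in the rating descriptions."""
    MORTALITY = "Mortality"
    LEVEL_OF_SERVICE = "Level of Service"
    CAPACITY = "Capacity"
    FINANCIAL_INEFFICIENCY = "Financial Inefficiency"


@dataclass(frozen=True)
class PofAssessment:
    """One asset's probability of failure rating and its dominant mode(s)."""
    asset_id: str
    rating: int
    dominant_modes: tuple[FailureMode, ...]

    def __post_init__(self) -> None:
        # Reject ratings that fall outside the agreed scale.
        if not MIN_RATING <= self.rating <= MAX_RATING:
            raise ValueError(f"rating must be {MIN_RATING}-{MAX_RATING}")
        # Require at least one dominant failure mode to be recorded.
        if not self.dominant_modes:
            raise ValueError("record at least one dominant failure mode")


# The two examples from the text.
asset_a = PofAssessment("Asset A", 4, (FailureMode.LEVEL_OF_SERVICE,))
asset_b = PofAssessment("Asset B", 2, (FailureMode.MORTALITY,))
```

Storing the mode alongside the rating lets staff later filter, for example, all assets whose likely failure is a capacity problem.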

Collecting data to help determine the probability of failure for green assets may be more challenging initially because their inclusion in this type of analysis is a relatively new practice compared to gray assets. Additionally, collecting the appropriate data may require expertise that is outside the typical knowledge of a water or wastewater system (e.g., fire, forestry, geomorphology).

When examining the probability of failure of green and gray assets, it is important to look across the full spectrum of failure modes and consider all the attributes that can inform the likelihood of failure under a particular failure mode. It is best not to use a single factor as the sole predictor of likelihood of failure. Taken collectively, the factors will provide a more robust basis for choosing the appropriate probability of failure rating for the asset.

It is not appropriate or advisable to compare asset probability of failure ratings from two unrelated facilities (e.g., a water system in Massachusetts and a water system in Arizona). The intent is to compare assets within a system. However, if an entity is managing multiple systems in the same area (e.g., the city owns four treatment plants that serve its population), the same rating system can and should be used in each of these plants. The entity is likely to want to determine which plant(s) require the most investment, and comparing on the same basis will be beneficial.

The goal is to determine which of the system's assets are more likely to fail than its other assets. An asset that is nearing the end of its useful life, is in poor condition, has a long history of repairs, and has a poor history of maintenance is highly likely to fail. Based on these characteristics, staff would assign that asset a 5 if they were using the Table 1 PoF rating scale (above). On the other hand, an asset that has a long useful life, has no repair history, is in good condition, and has had routine and preventive maintenance is unlikely to fail. Based on these characteristics, staff would assign that asset a 1 if they were using the Table 1 PoF rating scale.

In some cases, it may be easier to assess all the assets of a class at the same time for easy comparison during the rating process. For example, all the hydrants could be assessed, then all the pumps, and so on. However, in larger systems, this approach may be more time consuming than assessing the assets based on their geographic location.
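The two extremes described above (an asset with all four adverse characteristics rated 5, and one with none rated 1) can be expressed as a small helper that suggests a starting point on the Table 1 scale. The in-between values are an assumption made for this sketch: each adverse characteristic simply adds one point. The names `AssetIndicators` and `suggest_table1_rating` are hypothetical, and the result is meant to support staff judgement, not replace it.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class AssetIndicators:
    """Adverse characteristics from the text; True means the concern applies."""
    near_end_of_useful_life: bool
    poor_condition: bool
    long_repair_history: bool
    poor_maintenance_history: bool


def suggest_table1_rating(indicators: AssetIndicators) -> int:
    """Suggest a 1-5 rating: 1 plus the number of adverse characteristics.

    Only the end points (0 adverse -> 1, 4 adverse -> 5) come from the text;
    the intermediate values are an illustrative assumption.
    """
    adverse_count = sum(astuple(indicators))  # each True counts as 1
    return 1 + adverse_count


worn_out_asset = AssetIndicators(True, True, True, True)
well_kept_asset = AssetIndicators(False, False, False, False)
```

Here `suggest_table1_rating(worn_out_asset)` returns 5 and `suggest_table1_rating(well_kept_asset)` returns 1, matching the two cases in the text. A system could weight the characteristics differently once it has agreed on its own definitions.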

Once all assets are assigned a probability of failure rating, staff should review the results to see if they make sense or if there seem to be anomalies with certain types of assets or specific assets. If necessary, adjustments can be made to the rating structure to ensure it adequately represents the situation.
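One way to support this review is to tabulate how many assets in each class received each rating. The sketch below assumes the assessments are available as (asset class, rating) pairs. The function names and the "every asset in the class has the same rating" check are hypothetical examples of something staff might look at, not a prescribed test.

```python
from collections import Counter, defaultdict


def ratings_by_class(assessments):
    """Count ratings per asset class from (asset_class, rating) pairs."""
    summary = defaultdict(Counter)
    for asset_class, rating in assessments:
        summary[asset_class][rating] += 1
    # Return plain dicts with ratings in ascending order for easy reading.
    return {cls: dict(sorted(counts.items())) for cls, counts in summary.items()}


def classes_with_single_rating(assessments, min_assets=3):
    """List classes where at least `min_assets` assets all share one rating.

    Such a class is not necessarily wrong, but it may be worth a second look:
    the descriptions may be too broad to distinguish those assets.
    """
    flagged = []
    for cls, counts in ratings_by_class(assessments).items():
        if len(counts) == 1 and sum(counts.values()) >= min_assets:
            flagged.append(cls)
    return sorted(flagged)


# Illustrative data only; not drawn from a real system.
sample = [
    ("hydrant", 2), ("hydrant", 3), ("hydrant", 5),
    ("pump", 3), ("pump", 3), ("pump", 3),
]
```

With the sample data, the pumps would be flagged for review because all three share a rating of 3, while the hydrants would not.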