Developing a pump spare strategy to reduce asset life cycle costs
One of the most important goals of asset performance management is to minimize the asset life cycle cost (LCC), which heavily depends on the availability of the critical equipment in the process unit. Improvement of the availability of a component in the process unit can be achieved by increasing the equipment reliability (mean time between failures, or MTBF) and/or improving the repair efficiency (mean time to repair, or MTTR).
Alternatively, an identical unit can be added to the system as a standby to the active one. The resulting 1-out-of-2 system fails only if the active equipment fails and the standby then also fails while the active one is still being repaired. Provided the swap from the active to the standby unit is efficient, adding a standby is typically more effective than doubling the MTBF or halving the MTTR, both of which are difficult to achieve in practice.
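The comparison can be made concrete with a small steady-state calculation. The sketch below is illustrative only. It assumes exponential failure and repair times, a single repair crew, a standby that cannot fail while idle, and an instantaneous, perfectly reliable switchover. The MTBF and MTTR values and the function names are hypothetical.

```python
def single_unavailability(mtbf: float, mttr: float) -> float:
    """Steady-state unavailability of one repairable pump."""
    return mttr / (mtbf + mttr)


def standby_unavailability(mtbf: float, mttr: float) -> float:
    """Steady-state unavailability of a 1-out-of-2 standby pair.

    Three-state Markov model (0, 1 or 2 pumps down), one repair crew,
    perfect and instantaneous switchover (all assumptions).
    """
    rho = mttr / mtbf  # ratio of failure rate to repair rate
    return rho**2 / (1.0 + rho + rho**2)


# Hypothetical figures, in hours
MTBF, MTTR = 8000.0, 80.0

results = {
    "single pump": single_unavailability(MTBF, MTTR),
    "doubled MTBF": single_unavailability(2 * MTBF, MTTR),
    "halved MTTR": single_unavailability(MTBF, MTTR / 2),
    "1-out-of-2 standby": standby_unavailability(MTBF, MTTR),
}

for name, u in results.items():
    print(f"{name:>20}: unavailability = {u:.6f}")
```

Under these assumptions, doubling the MTBF or halving the MTTR roughly halves the unavailability, while the standby reduces it by about two orders of magnitude. A slow or unreliable switchover would erode that advantage.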
In a chemical processing plant, a critical pump would shut down a process unit in case of a failure, and therefore is often spared. However, developing the spare pump strategy may not be as simple as adding a standby to every critical pump. Before finding the optimal solution, the following questions need to be answered.
1. Is the second pump truly a spare?
When the process requires two pumps to operate together to meet the service demand, these pumps are either in series mode (i.e., failure of either one causes the system to become unavailable) or load-sharing mode (failure of one results in a partial production de-rate). In either case, the second pump is not a spare, and having two pumps does not improve the system availability.
2. Is the spare justified?
The cost-effectiveness of adding a spare pump needs to be justified. The production savings brought by the reduced downtime due to the redundancy should be greater than the combined expenses of the initial purchase, additional space and piping, additional maintenance-related material and labor, cost of capital, etc.
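As a rough screening check before any full simulation, this justification can be framed as a present-value comparison. The sketch below is a simplification with hypothetical figures. The function name, the grouping of costs into `installed_cost` and `annual_extra_cost`, and the discount rate are all assumptions made for illustration.

```python
def spare_net_present_value(
    avoided_downtime_hours_per_year: float,
    production_loss_per_hour: float,
    installed_cost: float,
    annual_extra_cost: float,
    discount_rate: float,
    years: int,
) -> float:
    """Net present value of adding a spare pump (positive means justified).

    installed_cost    : purchase plus additional space and piping, paid up front
    annual_extra_cost : additional maintenance material and labor per year
    discount_rate     : cost of capital, as a fraction per year
    """
    annual_saving = avoided_downtime_hours_per_year * production_loss_per_hour
    annual_net = annual_saving - annual_extra_cost
    # Discount each year's net saving back to today
    discounted = sum(
        annual_net / (1.0 + discount_rate) ** t for t in range(1, years + 1)
    )
    return discounted - installed_cost


# Hypothetical inputs
npv = spare_net_present_value(
    avoided_downtime_hours_per_year=40.0,
    production_loss_per_hour=5_000.0,
    installed_cost=600_000.0,
    annual_extra_cost=20_000.0,
    discount_rate=0.08,
    years=10,
)
print(f"NPV of adding the spare: {npv:,.0f}")
```

A positive value suggests the spare pays for itself over the chosen horizon. The avoided downtime is the hardest input to estimate, which is why it is usually obtained from a simulation.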
3. Standby or shared spare?
If many critical pumps are identical (or interchangeable) in a processing plant, another option is to keep a number of spare pumps to be shared.
Compared to the strategy of adding a standby to each critical pump, the advantage of shared spares is the saving in initial purchase cost and in routine maintenance on the spare pumps, since the total number of shared spares should be much smaller than the number of critical pumps.
On the other hand, not having a standby pump right next to the active pump causes a delay in case of a failure, as the spare pump needs to be transported to the specific location and connected into the processing system. This delay will result in an additional production loss.
Objective data collection, an adequate reliability improvement program on-site and capable analytical tools are required to correctly answer the questions above.
1. Answer to "Is the second pump truly a spare?"
The answer will most likely come from a production engineer. If it is determined that losing one of the two pumps causes long-term production loss (even a partial one), the two pumps are not spared. The next decision is whether to add one or two pumps as spares to the existing ones, forming either a 2-out-of-3 system (one added pump) or two 1-out-of-2 systems (two added pumps). The justification of these spares can follow the same procedure described below.
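The effect of these configurations on availability can be illustrated with the standard k-out-of-n formula. The sketch assumes identical, independent pumps that are all running, so it ignores standby switching and shared repair resources. The single-pump availability is a hypothetical value.

```python
from math import comb


def k_out_of_n_availability(k: int, n: int, a: float) -> float:
    """Probability that at least k of n independent pumps are available."""
    return sum(comb(n, i) * a**i * (1.0 - a) ** (n - i) for i in range(k, n + 1))


A_PUMP = 0.98  # hypothetical availability of one pump

two_required_no_spare = k_out_of_n_availability(2, 2, A_PUMP)  # both must run
two_out_of_three = k_out_of_n_availability(2, 3, A_PUMP)       # one added spare

print(f"2-out-of-2: {two_required_no_spare:.4f}")
print(f"2-out-of-3: {two_out_of_three:.4f}")
```

Two pumps that must both run are less available than either pump alone, which is why they do not count as spared. Adding a third pump raises the availability above that of a single pump.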
2. Answer to "Is the spare justified?"
The life cycle cost comparison among multiple strategies requires a statistical simulation. Accurate input data is key to making the right decision. The important input data include the following:
- Criticality, covering the consequence and frequency of failure. For a successful criticality assessment, input from operations, production, maintenance, engineering, and environmental, health and safety (EHS) representatives should be collected to reach agreement on the failure consequence and occurrence rate.
- Failure Mode and Effects Analysis (FMEA), which identifies dominant failure modes and possible risk mitigation tasks.
- Statistical distribution for each failure mode, ideally based on historical data, to describe the frequency and characteristics of the failure mode. If historical data is scarce, the occurrence rate agreed in the criticality assessment may be used instead.
- Maintenance data, such as labor/tool cost rate and duration hours.
- Spare information, including initial purchase price, storage cost, lead time, depreciation, etc.
Once the above information is collected and fed into the reliability analysis tool, a Monte Carlo simulation can be performed to compare the LCC of the two options: a single pump versus a 1-out-of-2 pump system. Note that the comparison should be based on the optimized LCC values, obtained by optimizing which preventive maintenance (PM) and predictive maintenance (PdM) tasks are enabled and how frequently each task is performed.
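The idea behind such a simulation can be sketched in a few lines. This is not how any commercial tool works internally. It is a minimal illustration assuming one Weibull failure mode, exponential repair times, instantaneous switchover, repairs that start immediately, and pumps that are as good as new after repair. PM and PdM optimization is left out, and all names and cost figures are hypothetical.

```python
import random


def simulate_position(n_pumps, horizon_h, scale_h, shape, mttr_h, rng):
    """Simulate one history of a pump position; return (downtime_h, failures).

    n_pumps = 1 is a single pump, n_pumps = 2 a 1-out-of-2 standby pair.
    """
    t = 0.0
    other_ready = 0.0  # time at which the non-running pump is usable
    downtime = 0.0
    failures = 0
    while True:
        fail_time = t + rng.weibullvariate(scale_h, shape)  # life of running pump
        if fail_time >= horizon_h:
            return downtime, failures
        failures += 1
        repair_end = fail_time + rng.expovariate(1.0 / mttr_h)
        if n_pumps == 1:
            restart = repair_end  # position is down until the repair is done
        else:
            # Switch at once if the other pump is ready, else wait for
            # whichever of the two repairs finishes first
            restart = max(fail_time, min(other_ready, repair_end))
            other_ready = max(other_ready, repair_end)
        downtime += min(restart, horizon_h) - fail_time
        t = restart


def mean_lcc(n_pumps, runs=2000, seed=1):
    """Average life cycle cost over many simulated histories (hypothetical data)."""
    horizon_h = 10 * 8760.0          # 10-year life
    purchase, repair_cost = 150_000.0, 15_000.0
    loss_per_h = 4_000.0             # production loss while the position is down
    standby_upkeep = 3_000.0 * 10    # upkeep of each idle pump over the life
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        down, fails = simulate_position(n_pumps, horizon_h, 20_000.0, 1.5, 72.0, rng)
        total += (n_pumps * purchase + (n_pumps - 1) * standby_upkeep
                  + fails * repair_cost + down * loss_per_h)
    return total / runs


print(f"single pump : {mean_lcc(1):,.0f}")
print(f"1-out-of-2  : {mean_lcc(2):,.0f}")
```

With these particular inputs the production loss dominates the single-pump LCC, so the standby comes out ahead. A lower production loss rate or a more expensive pump could reverse the result.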
3. Answer to "Standby or shared spare?"
In addition to the input data defined above, more assumptions need to be made for spares. Spares can be categorized into three levels:
- Level 1 spares are kept on site. The logistic delay is the shortest, but the storage cost per piece per unit time is relatively high and the storage capacity is limited.
- Level 2 spares are kept in an off-site warehouse. The storage cost per piece per unit time is lower and the storage capacity is better, but the logistic delay is longer.
- Level 3 spares are directly purchased from vendors in the market, and therefore have no storage cost. However, the logistic delay is the longest, and the price is generally higher.
For "Standby," the initial number of pumps is twice the number of required pumps. It is also assumed that once a pump has been replaced by the standby, another pump from Level 2 storage will become standby, and the Level 2 spare will be replenished immediately.
For "Shared Spare," the initial number of pumps equals the number of required pumps. There will be a certain number of spare pumps at Level 1 (on-site) and Level 2 (warehouse), respectively. Both numbers are optimized using the simulation tool. Both Level 1 and Level 2 spares will be replenished immediately after being consumed.
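How the Level 1 and Level 2 quantities might be optimized can be sketched with a small simulation and grid search. This is a simplified illustration, not the method used by any particular tool. It assumes exponential failures across the fleet, a fixed logistic delay per spare level, and that "replenished immediately" means a replacement is reordered at once and arrives after a lead time. The cost of the consumed replacement pumps and the vendor price premium are ignored, and all names and figures are hypothetical.

```python
import heapq
import random

DELAY_H = {"L1": 8.0, "L2": 48.0, "L3": 336.0}  # hypothetical logistic delays
LEAD_H = {"L1": 336.0, "L2": 336.0}             # hypothetical reorder lead times


def shared_spare_downtime(n_pumps, years, mtbf_h, stock, rng):
    """Total fleet downtime (h) when failed pumps are swapped from shared stock.

    stock : {"L1": count, "L2": count} initial on-site and warehouse spares.
    A failure with no stocked spare is served by the vendor (Level 3).
    """
    stock = dict(stock)
    arrivals = []  # heap of (arrival_time, level) for reordered spares
    t, horizon, downtime = 0.0, years * 8760.0, 0.0
    while True:
        t += rng.expovariate(n_pumps / mtbf_h)  # next failure anywhere in fleet
        if t >= horizon:
            return downtime
        while arrivals and arrivals[0][0] <= t:  # book in spares that arrived
            _, level = heapq.heappop(arrivals)
            stock[level] += 1
        source = next((lv for lv in ("L1", "L2") if stock[lv] > 0), "L3")
        if source != "L3":
            stock[source] -= 1
            heapq.heappush(arrivals, (t + LEAD_H[source], source))  # reorder
        downtime += DELAY_H[source]


def mean_cost(n1, n2, runs=200, seed=7):
    """Mean spare-related cost for n1 on-site and n2 warehouse spares."""
    rng = random.Random(seed)  # same seed for every option: common random numbers
    years, purchase, loss_per_h = 10, 150_000.0, 4_000.0
    storage_per_year = {"L1": 5_000.0, "L2": 2_000.0}
    down = sum(
        shared_spare_downtime(10, years, 20_000.0, {"L1": n1, "L2": n2}, rng)
        for _ in range(runs)
    ) / runs
    storage = years * (n1 * storage_per_year["L1"] + n2 * storage_per_year["L2"])
    return (n1 + n2) * purchase + storage + down * loss_per_h


# Grid search over small stock levels
best = min(((n1, n2) for n1 in range(4) for n2 in range(4)),
           key=lambda s: mean_cost(*s))
print("lowest-cost stock (Level 1, Level 2):", best)
```

Using the same random seed for every stock combination means each option sees the same failure histories, which makes the comparison less noisy. The Standby case could be costed the same way by setting the logistic delay to the switchover time and doubling the initial purchase.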
Figure 2. Project comparison. Courtesy of Tong Zou
Keep in mind that the spare (or standby) pump is usually assumed to be 100 percent reliable at the beginning of its operation. In reality, regular maintenance tasks, such as shaft rotation and bearing lubrication, need to be performed periodically to keep the reliability of the non-operating pumps close to that level. If the equipment is in an active-standby arrangement, as opposed to a shared-load or critical-spare configuration, it is good practice to rotate duty between the active and the standby unit so that the running time is split roughly 60 to 40 percent or 70 to 30 percent.
To compare the spare strategies described above, analyses were performed using Isograph Availability Workbench, simulating 10 identical critical pumps over a 10-year lifetime. All input data is the same for both strategies, except for the initial purchase cost, the number of spares and the logistic delay.
The results in Figure 2 indicate that the two options have similar LCC, although the Standby option has a slight advantage. The Shared Spare option has a smaller initial purchase cost but higher storage cost and higher production loss. The Standby option has a higher initial purchase cost but, as expected, all of its other cost components are lower than those of the Shared Spare option. It should be noted that the LCC is affected by many factors, and the comparison may favor the Shared Spare option if some of the parameters are changed.
The philosophy and procedure of developing a strategy to minimize the life cycle cost of critical equipment have been discussed. Two strategies, Shared Spare and Standby, are compared with assumed parameters using commercial software. This general procedure can be applied to most chemical processing plants using plant-specific equipment, cost and labor parameters.
Dr. Tong Zou is a senior engineering specialist at T.A. Cook Consultants. Zou has more than 14 years of reliability engineering experience in power generation, automotive, oil and gas and petrochemical industries. Zou’s expertise includes reliability-centered maintenance, system and component reliability analysis, equipment life data analytics and reliability-based design optimization. Since joining T.A. Cook, Zou has been working on client projects focusing on reliability improvement.

