Availability Modeling

Availability is the percentage of time that a system is fully functional. It is a calculation involving averages, and it can be envisioned as follows.

A system runs fine for a while, then some aspect of it fails. Sometimes a system runs fine for a long time, and sometimes it fails after a short time. The average time to failure, measured over a very large number of similar systems, is called the Mean Time To Failure, or MTTF. (See the glossary for a discussion of mean time to failure and mean time between failures.)

After a failure, the system either stops or runs in a substandard way while repairs are being made. (We do not distinguish between a repair and a replacement.) Sometimes repairs can be made quickly, and sometimes, for various reasons, they take a long time. (They usually take longer than expected!) The time to repair must be measured from the moment the system fails until the moment it is fully functional again and no additional repair or recovery effort is under way. The average time to repair, measured over a large number of repairs of various failures of similar systems, is called the Mean Time To Repair, or MTTR.

Think of MTTF and MTTR as blocks of time.
The following formula should seem reasonable:

A = Availability = MTTF / (MTTF + MTTR)

MTTF is often called the system's reliability. A component of a system, for example a disk drive, also has a mean time to failure; this is called the component's reliability. MTTR is the average system downtime. It is sometimes called the system's reparability, since it is a measure of how fast the system can be repaired or replaced.

Since there are approximately 8766 hours in a year (365.25 days × 24 hours), the expected number of hours a 7 by 24 system is fully functional during a year is 8766×A, and the expected annual downtime is 8766 - 8766×A = 8766×(1-A).

For systems that only need to be fully functional during certain hours, e.g. during working hours, one computes fully functional time and outage time only over the hours in which the system is expected to be fully functional. If there are a total of Y such hours in a year, then the expected number of hours the system is fully functional is Y×A, and the expected annual downtime is Y - Y×A = Y×(1-A).

To calculate or model a system's availability is to estimate both MTTF and MTTR. Both estimates are challenging, but MTTR is especially easy to underestimate because repair almost always involves significant human effort. People also make mistakes, and these often cause system outages. Such mistakes are difficult to model as well.

To improve a system's availability, one can improve MTTF or MTTR or both. Bristol Systems has many checklists of ways to analyze and improve a system's availability. Please email bristolsystems.com, username solutions, for a model of your system's availability. It is an important step in our availability services.
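The availability formula and the downtime calculations above can be tried out with a few lines of Python. This is only a minimal sketch: the function names, the sample MTTF and MTTR figures, and the 2000 scheduled working hours are assumptions chosen for illustration, not measurements of any real system.

```python
# Approximate hours in a year, as used in the text (365.25 days x 24 hours).
HOURS_PER_YEAR = 8766


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Return A = MTTF / (MTTF + MTTR) as a fraction between 0 and 1."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR must not be negative")
    return mttf_hours / (mttf_hours + mttr_hours)


def expected_annual_downtime(a: float, scheduled_hours: float = HOURS_PER_YEAR) -> float:
    """Return Y x (1 - A), where Y is the hours per year the system must be up."""
    return scheduled_hours * (1.0 - a)


# Hypothetical figures: a failure every 2000 hours on average, 4 hours to repair.
a = availability(mttf_hours=2000.0, mttr_hours=4.0)
print(f"Availability: {a:.4%}")

# A 7 by 24 system is scheduled for every hour of the year.
print(f"7x24 downtime: {expected_annual_downtime(a):.2f} hours/year")

# A system needed only during working hours (Y = 2000 is an assumed value).
print(f"Working-hours downtime: {expected_annual_downtime(a, 2000.0):.2f} hours/year")
```

Changing either input shows how sensitive the result is to MTTR: with MTTF held fixed, halving MTTR roughly halves the expected downtime, which is one reason an underestimated MTTR distorts the model so badly.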