ALaRI Hang Glider

Search form

Education and Innovation in Embedded Systems Design

USI Università della Svizzera italiana, USI Faculty of Informatics, Advanced Learning and Research Institute USI Università della Svizzera italiana USI Faculty of Informatics USI Advanced Learning and Research Institute
TitleImpact of Failure Prediction on Availability: Modeling and Comparative Analysis of Predictive and Reactive Methods
Publication TypeJournal Article
Year of Publication2018
AuthorsKaitović, I., and M. Malek
JournalIEEE Transactions on Dependable and Secure Computing
KeywordsAnalytical models, availability, Computational modeling, failure, fault tolerance, Fault tolerant systems, Mathematical model, modeling, Optimization, prediction, predictive, Predictive models, proactive, Servers

Predicting failures and acting proactively have a potential to improve availability as a correct prediction and a successful mitigation may bring a reward resulting in decrease of downtime and availability improvement. But, conversely, each incorrect prediction may introduce additional downtime (penalty). Therefore, depending on the quality of prediction and the system parameters, predictive fault-tolerance methods may improve or may degrade availability in comparison to the reactive ones. We first derive taxonomies of fault-tolerant techniques and policies to differentiate between reactive and proactive policies that are further classified as systematic and predictive. To evaluate whether a predictive policy improves availability or not, we derive an analytical model for availability quantification. We use Markov chains to extend steady-state availability equation to include: precision and recall, penalty and reward, mitigation success probability and potential failure rate increase due to the prediction load. We also derive A-measure to optimize failure prediction for increasing availability. In our conclusion, precision and recall have comparable impact on availability as changing MTTF and MTTR. To validate the model we also simulate and analyze availability of a virtualized server with exponential distribution of failure and repair rates.