Practical System Reliability

eBook

Published 27 March 2009, edition: 1/2009
CHF 81.00
(incl. VAT)

E-Book Download
Bibliographic details
ISBN/EAN: 9780470455388
Language: English
Length: 300 pp., 2.91 MB
E-Book
Format: PDF
DRM: Adobe DRM

Description

Learn how to model, predict, and manage system reliability/availability throughout the development life cycle

Written by a panel of authors with a wealth of industry experience, this book presents methods and concepts that give readers a solid understanding of modeling and managing system and software availability and reliability through the development of real applications and products. The modeling and prediction techniques and tools are customer-focused and data-driven, and are aligned with industry standards (Telcordia, TL 9000, ISO, etc.). Readers will gain a clear understanding of what real-world reliability and availability mean through step-by-step discussions of:

System availability

Conceptual model of reliability and availability

Why availability varies between customers

Modeling availability

Estimating parameters and availability from field data

Estimating input parameters from laboratory data

Estimating input parameters in the architecture/design stage

Prediction accuracy

Connecting the dots

This book can be used by system architects, engineers, and developers to better understand and manage the reliability/availability of their products; quality engineers to grasp how software and hardware quality relate to system availability; and engineering students as part of a short course on system availability and software reliability.

About the Authors

Eric Bauer is a manager of reliability engineering in Alcatel-Lucent's wireline business in Murray Hill, New Jersey. He has designed, modeled, and analyzed reliability for many different products and solutions, and architected and developed software for a variety of communications devices, platforms, and products.

Xuemei Zhang, PhD, is a principal member of the technical staff in the Network Design and Performance Analysis Department at AT&T Labs. She has been working on reliability and performance analysis of wireline and wireless communications systems and networks. Her major work and research areas are in system and architectural reliability and performance, product and solution reliability and performance modeling, and software reliability.

Douglas A. Kimber retired from Alcatel-Lucent as a staff reliability engineer. Throughout his career at Bell Labs, Lucent Technologies, and Alcatel-Lucent, he developed high reliability hardware and software platforms, applications, and systems, and then transitioned to reliability engineering where he did reliability modeling and analysis.

Contents

Preface.

Acknowledgments.

1 Introduction.

2 System Availability.

2.1 Availability, Service and Elements.

2.2 Classical View.

2.3 Customer's View.

2.4 Standards View.

3 Conceptual Model of Reliability and Availability.

3.1 Concept of Highly Available Systems.

3.2 Conceptual Model of System Availability.

3.3 Failures.

3.4 Outage Resolution.

3.5 Downtime Budgets.

4 Why Availability Varies Between Customers.

4.1 Causes of Variation in Outage Event Reporting.

4.2 Causes of Variation in Outage Duration.

5 Modeling Availability.

5.1 Overview of Modeling Techniques.

5.2 Modeling Definitions.

5.3 Practical Modeling.

5.4 Widget Example.

5.5 Alignment with Industry Standards.

6 Estimating Parameters and Availability from Field Data.

6.1 Self-Maintaining Customers.

6.2 Analyzing Field Outage Data.

6.3 Analyzing Performance and Alarm Data.

6.4 Coverage Factor and Failure Rate.

6.5 Uncovered Failure Recovery Time.

6.6 Covered Failure Detection and Recovery Time.

7 Estimating Input Parameters from Lab Data.

7.1 Hardware Failure Rate.

7.2 Software Failure Rate.

7.3 Coverage Factors.

7.4 Timing Parameters.

7.5 System-Level Parameters.

8 Estimating Input Parameters in the Architecture/Design Stage.

8.1 Hardware Parameters.

8.2 System-Level Parameters.

8.3 Sensitivity Analysis.

9 Prediction Accuracy.

9.1 How Much Field Data Is Enough?

9.2 How Does One Measure Sampling and Prediction Errors?

9.3 What Causes Prediction Errors?

10 Connecting the Dots.

10.1 Set Availability Requirements.

10.2 Incorporate Architectural and Design Techniques.

10.3 Modeling to Verify Feasibility.

10.4 Testing.

10.5 Update Availability Prediction.

10.6 Periodic Field Validation and Model Update.

10.7 Building an Availability Roadmap.

10.8 Reliability Report.

11 Summary.

Appendix A System Reliability Report outline.

1 Executive Summary.

2 Reliability Requirements.

3 Unplanned Downtime Model and Results.

Annex A Reliability Definitions.

Annex B References.

Annex C Markov Model State-Transition Diagrams.

Appendix B Reliability and Availability Theory.

1 Reliability and Availability Definitions.

2 Probability Distributions in Reliability Evaluation.

3 Estimation of Confidence Intervals.

Appendix C Software Reliability Growth Models.

1 Software Characteristic Models.

2 Nonhomogeneous Poisson Process Models.

Appendix D Acronyms and Abbreviations.

Appendix E Bibliography.

Index.

About the Authors.
