This book explains why applications running in the cloud might not deliver the same service reliability, availability, latency, and overall quality to end users as they do when running on traditional (non-virtualized, non-cloud) configurations, and what can be done to mitigate that risk.
Figures xv
Tables and Equations xxi
1 INTRODUCTION 1
1.1 Approach 1
1.2 Target Audience 3
1.3 Organization 3
I CONTEXT 7
2 APPLICATION SERVICE QUALITY 9
2.1 Simple Application Model 9
2.2 Service Boundaries 11
2.3 Key Quality and Performance Indicators 12
2.4 Key Application Characteristics 15
2.5 Application Service Quality Metrics 17
2.6 Technical Service versus Support Service 27
2.7 Security Considerations 28
3 CLOUD MODEL 29
3.1 Roles in Cloud Computing 30
3.2 Cloud Service Models 30
3.3 Cloud Essential Characteristics 31
3.4 Simplified Cloud Architecture 33
3.5 Elasticity Measurements 36
3.6 Regions and Zones 44
3.7 Cloud Awareness 45
4 VIRTUALIZED INFRASTRUCTURE IMPAIRMENTS 49
4.1 Service Latency, Virtualization, and the Cloud 50
4.2 VM Failure 54
4.3 Nondelivery of Configured VM Capacity 54
4.4 Delivery of Degraded VM Capacity 57
4.5 Tail Latency 59
4.6 Clock Event Jitter 60
4.7 Clock Drift 61
4.8 Failed or Slow Allocation and Startup of VM Instance 62
4.9 Outlook for Virtualized Infrastructure Impairments 63
II ANALYSIS 65
5 APPLICATION REDUNDANCY AND CLOUD COMPUTING 67
5.1 Failures, Availability, and Simplex Architectures 68
5.2 Improving Software Repair Times via Virtualization 70
5.3 Improving Infrastructure Repair Times via Virtualization 72
5.4 Redundancy and Recoverability 75
5.5 Sequential Redundancy and Concurrent Redundancy 80
5.6 Application Service Impact of Virtualization Impairments 84
5.7 Data Redundancy 90
5.8 Discussion 92
6 LOAD DISTRIBUTION AND BALANCING 97
6.1 Load Distribution Mechanisms 97
6.2 Load Distribution Strategies 99
6.3 Proxy Load Balancers 99
6.4 Nonproxy Load Distribution 101
6.5 Hierarchy of Load Distribution 102
6.6 Cloud-Based Load Balancing Challenges 103
6.7 The Role of Load Balancing in Support of Redundancy 103
6.8 Load Balancing and Availability Zones 104
6.9 Workload Service Measurements 104
6.10 Operational Considerations 105
6.11 Load Balancing and Application Service Quality 107
7 FAILURE CONTAINMENT 111
7.1 Failure Containment 111
7.2 Points of Failure 116
7.3 Extreme Solution Coresidency 122
7.4 Multitenancy and Solution Containers 124
8 CAPACITY MANAGEMENT 127
8.1 Workload Variations 128
8.2 Traditional Capacity Management 129
8.3 Traditional Overload Control 129
8.4 Capacity Management and Virtualization 131
8.5 Capacity Management in Cloud 133
8.6 Storage Elasticity Considerations 135
8.7 Elasticity and Overload 136
8.8 Operational Considerations 137
8.9 Workload Whipsaw 138
8.10 General Elasticity Risks 140
8.11 Elasticity Failure Scenarios 141
9 RELEASE MANAGEMENT 145
9.1 Terminology 145
9.2 Traditional Software Upgrade Strategies 146
9.3 Cloud-Enabled Software Upgrade Strategies 153
9.4 Data Management 158
9.5 Role of Service Orchestration in Software Upgrade 159
9.6 Conclusion 161
10 END-TO-END CONSIDERATIONS 163
10.1 End-to-End Service Context 163
10.2 Three-Layer End-to-End Service Model 169
10.3 Distributed and Centralized Cloud Data Centers 177
10.4 Multitiered Solution Architectures 183
10.5 Disaster Recovery and Geographic Redundancy 184
III RECOMMENDATIONS 191
11 ACCOUNTABILITIES FOR SERVICE QUALITY 193
11.1 Traditional Accountability 193
11.2 The Cloud Service Delivery Path 194
11.3 Cloud Accountability 197
11.4 Accountability Case Studies 200
11.5 Service Quality Gap Model 205
11.6 Service Level Agreements 210
12 SERVICE AVAILABILITY MEASUREMENT 213
12.1 Parsimonious Service Measurements 214
12.2 Traditional Service Availability Measurement 215
12.3 Evolving Service Availability Measurements 217
12.4 Evolving Hardware Reliability Measurement 226
12.5 Evolving Elasticity Service Availability Measurements 228
12.6 Evolving Release Management Service Availability Measurement 229
12.7 Service Measurement Outlook 231
13 APPLICATION SERVICE QUALITY REQUIREMENTS 233
13.1 Service Availability Requirements 234
13.2 Service Latency Requirements 237
13.3 Service Reliability Requirements 237
13.4 Service Accessibility Requirements 238
13.5 Service Retainability Requirements 239
13.6 Service Throughput Requirements 239
13.7 Timestamp Accuracy Requirements 240
13.8 Elasticity Requirements 240
13.9 Release Management Requirements 241
13.10 Disaster Recovery Requirements 241
14 VIRTUALIZED INFRASTRUCTURE MEASUREMENT AND MANAGEMENT 243
14.1 Business Context for Infrastructure Service Quality Measurements 244
14.2 Cloud Consumer Measurement Options 245
14.3 Impairment Measurement Strategies 247
14.4 Managing Virtualized Infrastructure Impairments 252
15 ANALYSIS OF CLOUD-BASED APPLICATIONS 255
15.1 Reliability Block Diagrams and Side-by-Side Analysis 256
15.2 IaaS Impairment Effects Analysis 257
15.3 PaaS Failure Effects Analysis 259
15.4 Workload Distribution Analysis 260
15.5 Anti-Affinity Analysis 262
15.6 Elasticity Analysis 263
15.7 Release Management Impact Effects Analysis 267
15.8 Recovery Point Objective Analysis 268
15.9 Recovery Time Objective Analysis 270
16 TESTING CONSIDERATIONS 273
16.1 Context for Testing 273
16.2 Test Strategy 274
16.3 Simulating Infrastructure Impairments 277
16.4 Test Planning 278
17 CONNECTING THE DOTS 287
17.1 The Application Service Quality Challenge 287
17.2 Redundancy and Robustness 289
17.3 Design for Scalability 292
17.4 Design for Extensibility 292
17.5 Design for Failure 293
17.6 Planning Considerations 294
17.7 Evolving Traditional Applications 296
17.8 Concluding Remarks 301
Abbreviations 303
References 307
About the Authors 311
Index 313