1 Data and Case Studies 1
1.1 Case Study: Flight Delays 1
1.2 Case Study: Birth Weights of Babies 2
1.3 Case Study: Verizon Repair Times 3
1.4 Case Study: Iowa Recidivism 4
1.5 Sampling 5
1.6 Parameters and Statistics 6
1.7 Case Study: General Social Survey 7
1.8 Sample Surveys 8
1.9 Case Study: Beer and Hot Wings 9
1.10 Case Study: Black Spruce Seedlings 10
1.11 Studies 10
1.12 Google Interview Question: Mobile Ads Optimization 12
Exercises 16
2 Exploratory Data Analysis 21
2.1 Basic Plots 21
2.2 Numeric Summaries 25
2.2.1 Center 25
2.2.2 Spread 26
2.2.3 Shape 27
2.3 Boxplots 28
2.4 Quantiles and Normal Quantile Plots 29
2.5 Empirical Cumulative Distribution Functions 35
2.6 Scatter Plots 38
2.7 Skewness and Kurtosis 40
3 Introduction to Hypothesis Testing: Permutation Tests 47
3.1 Introduction to Hypothesis Testing 47
3.2 Hypotheses 48
3.3 Permutation Tests 50
3.3.1 Implementation Issues 55
3.3.2 One-sided and Two-sided Tests 61
3.3.3 Other Statistics 62
3.3.4 Assumptions 64
3.3.5 Remark on Terminology 68
3.4 Matched Pairs 68
Exercises 70
4 Sampling Distributions 75
4.1 Sampling Distributions 75
4.2 Calculating Sampling Distributions 80
4.3 The Central Limit Theorem 84
4.3.1 CLT for Binomial Data 86
4.3.2 Continuity Correction for Discrete Random Variables 89
4.3.3 Accuracy of the Central Limit Theorem 91
4.3.4 CLT for Sampling Without Replacement 92
Exercises 93
5 Introduction to Confidence Intervals: The Bootstrap 103
5.1 Introduction to the Bootstrap 103
5.2 The Plug-in Principle 110
5.2.1 Estimating the Population Distribution 112
5.2.2 How Useful Is the Bootstrap Distribution? 113
5.3 Bootstrap Percentile Intervals 118
5.4 Two-Sample Bootstrap 119
5.4.1 Matched Pairs 124
5.5 Other Statistics 128
5.6 Bias 131
5.7 Monte Carlo Sampling: The Second Bootstrap Principle 134
5.8 Accuracy of Bootstrap Distributions 135
5.8.1 Sample Mean: Large Sample Size 135
5.8.2 Sample Mean: Small Sample Size 137
5.8.3 Sample Median 138
5.8.4 Mean-Variance Relationship 138
5.9 How Many Bootstrap Samples Are Needed? 140
Exercises 141
6 Estimation 149
6.1 Maximum Likelihood Estimation 149
6.1.1 Maximum Likelihood for Discrete Distributions 150
6.1.2 Maximum Likelihood for Continuous Distributions 153
6.1.3 Maximum Likelihood for Multiple Parameters 157
6.2 Method of Moments 161
6.3 Properties of Estimators 163
6.3.1 Unbiasedness 164
6.3.2 Efficiency 167
6.3.3 Mean Square Error 171
6.3.4 Consistency 173
6.3.5 Transformation Invariance 175
6.3.6 Asymptotic Normality of MLE 177
6.4 Statistical Practice 178
6.4.1 Are You Asking the Right Question? 179
6.4.2 Weights 179
Exercises 180
7 More Confidence Intervals 187
7.1 Confidence Intervals for Means 187
7.1.1 Confidence Intervals for a Mean, Variance Known 187
7.1.2 Confidence Intervals for a Mean, Variance Unknown 192
7.1.3 Confidence Intervals for a Difference in Means 198
7.1.4 Matched Pairs, Revisited 204
7.2 Confidence Intervals in General 204
7.2.1 Location and Scale Parameters 208
7.3 One-sided Confidence Intervals 212
7.4 Confidence Intervals for Proportions 214
7.4.1 Agresti-Coull Intervals for a Proportion 217
7.4.2 Confidence Intervals for a Difference of Proportions 218
7.5 Bootstrap Confidence Intervals 219
7.5.1 t Confidence Intervals Using Bootstrap Standard Errors 219
7.5.2 Bootstrap t Confidence Intervals 220
7.5.3 Comparing Bootstrap t and Formula t Confidence Intervals 224
7.6 Confidence Interval Properties 226
7.6.1 Confidence Interval Accuracy 226
7.6.2 Confidence Interval Length 227
7.6.3 Transformation Invariance 227
7.6.4 Ease of Use and Interpretation 227
7.6.5 Research Needed 228
Exercises 228
8 More Hypothesis Testing 241
8.1 Hypothesis Tests for Means and Proportions: One Population 241
8.1.1 A Single Mean 241
8.1.2 One Proportion 244
8.2 Bootstrap t-Tests 246
8.3 Hypothesis Tests for Means and Proportions: Two Populations 248
8.3.1 Comparing Two Means 248
8.3.2 Comparing Two Proportions 251
8.3.3 Matched Pairs for Proportions 254
8.4 Type I and Type II Errors 255
8.4.1 Type I Errors 257
8.4.2 Type II Errors and Power 261
8.4.3 P-Values versus Critical Regions 266
8.5 Interpreting Test Results 267
8.5.1 P-Values 267
8.5.2 On Significance 268
8.5.3 Adjustments for Multiple Testing 269
8.6 Likelihood Ratio Tests 271
8.6.1 Simple Hypotheses and the Neyman-Pearson Lemma 271
8.6.2 Likelihood Ratio Tests for Composite Hypotheses 275
8.7 Statistical Practice 279
8.7.1 More Campaigns with No Clicks and No Conversions 284
Exercises 285
9 Regression 297
9.1 Covariance 297
9.2 Correlation 301
9.3 Least-Squares Regression 304
9.3.1 Regression Toward the Mean 308
9.3.2 Variation 310
9.3.3 Diagnostics 311
9.3.4 Multiple Regression 317
9.4 The Simple Linear Model 317
9.4.1 Inference for α and β 322
9.4.2 Inference for the Response 326
9.4.3 Comments about Assumptions for the Linear Model 330
9.5 Resampling Correlation and Regression 332
9.5.1 Permutation Tests 335
9.5.2 Bootstrap Case Study: Bushmeat 336
9.6 Logistic Regression 340
9.6.1 Inference for Logistic Regression 346
Exercises 350
10 Categorical Data 359
10.1 Independence in Contingency Tables 359
10.2 Permutation Test of Independence 361
10.3 Chi-square Test of Independence 365
10.3.1 Model for Chi-square Test of Independence 366
10.3.2 2 × 2 Tables 368
10.3.3 Fisher's Exact Test 370
10.3.4 Conditioning 371
10.4 Chi-square Test of Homogeneity 372
10.5 Goodness-of-fit Tests 374
10.5.1 All Parameters Known 374
10.5.2 Some Parameters Estimated 377
10.6 Chi-square and the Likelihood Ratio 379
Exercises 380
11 Bayesian Methods 391
11.1 Bayes' Theorem 392
11.2 Binomial Data: Discrete Prior Distributions 392
11.3 Binomial Data: Continuous Prior Distributions 400
11.4 Continuous Data 406
11.5 Sequential Data 409
Exercises 414
12 One-way ANOVA 419
12.1 Comparing Three or More Populations 419
12.1.1 The ANOVA F-test 419
12.1.2 A Permutation Test Approach 428
Exercises 429
13 Additional Topics 433
13.1 Smoothed Bootstrap 433
13.1.1 Kernel Density Estimate 435
13.2 Parametric Bootstrap 437
13.3 The Delta Method 441
13.4 Stratified Sampling 445
13.5 Computational Issues in Bayesian Analysis 446
13.6 Monte Carlo Integration 448
13.7 Importance Sampling 452
13.7.1 Ratio Estimate for Importance Sampling 458
13.7.2 Importance Sampling in Bayesian Applications 461
13.8 The EM Algorithm 467
13.8.1 General Background 469
Exercises 472
Appendix A Review of Probability 477
A.1 Basic Probability 477
A.2 Mean and Variance 478
A.3 The Normal Distribution 480
A.4 The Mean of a Sample of Random Variables 481
A.5 Sums of Normal Random Variables 482
A.6 The Law of Averages 483
A.7 Higher Moments and the Moment-generating Function 484
Appendix B Probability Distributions 487
B.1 The Bernoulli and Binomial Distributions 487
B.2 The Multinomial Distribution 488
B.3 The Geometric Distribution 490
B.4 The Negative Binomial Distribution 491
B.5 The Hypergeometric Distribution 492
B.6 The Poisson Distribution 493
B.7 The Uniform Distribution 495
B.8 The Exponential Distribution 495
B.9 The Gamma Distribution 497
B.10 The Chi-square Distribution 499
B.11 The Student's t Distribution 502
B.12 The Beta Distribution 504
B.13 The F Distribution 505
Exercises 507
Appendix C Distributions Quick Reference 509
Solutions to Selected Exercises 513
References 525
Index 531