
Chapter 1

Data Collection and Exploring Univariate Distributions

1.2 Types of Data and Frequency Distribution Tables


1.1 a Qualitative
b Quantitative
c Quantitative
d Quantitative
e Qualitative
f Quantitative
g Quantitative
h Qualitative
i Quantitative
j Qualitative
1.3 a Years of Formal Education
Income Category       12      14      16      18      20+
0 − 29,999           0.620   0.473   0.323   0.220   0.148
30,000 − 59,999      0.313   0.408   0.397   0.417   0.298
60,000 − 89,999      0.054   0.093   0.174   0.227   0.270
90,000 and above     0.014   0.025   0.106   0.136   0.284
b The relative frequency of higher-income categories increased with the increasing number of years of formal education.

1.3 Tools for Describing Data: Graphical Methods


1.5 a Complaints 5, 12, and 10 (cleaning of public highways, working hours, and screening/fencing
respectively) each comprised at least 10% of the total number of complaints.
b Complaints 5, 12, 10, 4, 7, 8, 9, and 1 (cleaning of public highways, working hours, screen-
ing/fencing, water courses affected by construction, blue routes and restricted times of use, tem-
porary and permanent diversions, TMP, and property damage) comprise a cumulative total of
80% of the complaints.


1.7 a Pareto charts for lead pollution sources in 1980, 1990 and 2000:

b Lead emissions seem to have decreased since 1980, especially in the areas of transportation and miscellaneous fuel combustion sources.
c The evidence seems to suggest that we are releasing lead pollutants into our environment at a
decreased rate since 1980.

1.9 a Bar Chart of Alabama Aerospace Employment:

The largest number were employed by the information technology services, followed by engineering
and RFD services, and missile space vehicle manufacturing.
b Bar Chart of number of employees per company amongst Alabama Aerospace fields:

Although information technology services employed the largest number of employees, they were
not, on average, large employers. Engineering RFD services and missile space vehicle manufactur-
ing employed fewer people than the information technology services, yet they employed far more
people on average per company.

1.11 a Bar chart of crude steel production by region in 2004 and 2003:

b In general, crude steel production increased from 2003 to 2004.

1.13 A histogram groups the data together, losing the identity of individual observations, which a dotplot retains. A small number of observations makes it difficult to notice any patterns in a histogram. Gaps in the data are visible in a dotplot but cannot be identified from a histogram.
1.15 a (0 + 27 + 12 + 0)/500 = 39/500 = .078 = 7.8%
b There were no rods of length .999 and an abnormally large number of rods of length 1.000. This may indicate that someone has been inappropriately placing .999 rods into the 1.000 category to prevent them from being declared defective.

1.17 a Yes, in 1890, most of the population was in the younger age ranges. In 2005, a larger percentage
of the population are in the upper age ranges. This might suggest that there have been some sort
of medical advances to improve life expectancy and quality of life over time.
b Percent of population under 30 in 1890 = 25% + 22% + 18% = 65%.
Percent of population under 30 in 2005 = 13.3% + 14.5% + 13.4% = 41.2%.
c The percentage of older population has increased. In 1890, the percentage of population in
different age categories decreased steadily with the increasing age. In 2005, it is fairly evenly
distributed across different age groups except for the two oldest age groups.

1.19 a Histograms and dotplots displaying indices of industrial production in 1990 and 1998:

There is a considerable increase in the average index from 1990 to 1998, indicating an increase in industrial production by most of the countries in general. The indices were more spread out in 1990 than in 1998. The distribution is right-skewed in 1998. There might be an outlier on the upper end in 1990.
b Histogram and dotplot of the difference in production from 1990 to 1998:

Most countries showed improvement (an increase of up to 40 points) in the industrial production;
one country in particular showed a tremendous amount of improvement. Only two countries
showed a decrease.

1.4 Tools for Describing Data: Numerical Measures


1.21 a Dotplot for percentage change in crude oil import:

b
x̄ = [(−38.39) + (−20.34) + (−46.42) + . . . + (−11.54)]/10 = −6.99/10 = −.699

s = √{[(−38.39 − (−.699))² + . . . + (−11.54 − (−.699))²]/9} = √(22376.5/9) = 49.86
c Minitab output follows:
Descriptive Statistics: % Change
Variable N Mean SE Mean StDev Minimum Q1 Median Q3 Maximum
% Change 10 -0.699 15.8 49.9 -46.4 -36.9 -24.5 56.7 92.4

Median = [(−20.34) + (−28.59)]/2 = −24.47

IQR = 56.7 − (−36.9) = 93.6

d Probably not, because the distribution is skewed with outliers on the higher end that affect the
values of mean and standard deviation. Minitab output follows:
Descriptive Statistics: % Change
Variable N Mean SE Mean StDev Minimum Q1 Median Q3 Maximum
% Change 9 -11.0 13.3 39.9 -46.4 -37.4 -28.6 22.4 57.9

x̄ = [(−38.39) + (−20.34) + (−46.42) + . . . + (−11.54)]/9 = −99.3402/9 = −11.0378

s = √{[(−38.39 − (−11.0378))² + . . . + (−11.54 − (−11.0378))²]/8} = √(12756.4/8) = 39.93

Median = −28.59

IQR = 22.4 − (−37.4) = 59.8
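These summary measures can be reproduced with a few lines of code. The sketch below is a generic helper, not Minitab's implementation (quartile conventions differ between packages), and the demo list is hypothetical since the full % Change data are not reproduced in this solution:

```python
import statistics

def describe(xs):
    """Sample mean, sample standard deviation (n - 1 in the denominator),
    and median, as computed in the solution above."""
    n = len(xs)
    mean = sum(xs) / n
    sd = (sum((x - mean) ** 2 for x in xs) / (n - 1)) ** 0.5
    return mean, sd, statistics.median(xs)

# Hypothetical data for illustration only
m, s, med = describe([2.0, 4.0, 4.0, 6.0, 9.0])
print(m, round(s, 3), med)   # 5.0 2.646 4.0
```

The IQR follows the same pattern once quartiles are chosen: IQR = Q3 − Q1, but note that different packages interpolate quartiles slightly differently.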

1.23 The word ‘average’ here probably refers to the mean, in which case a few winters with very deep snow pack (right-skewed data) would have made the mean larger than the median (and hence more than 50% of the data would lie below the mean).

1.25 a Composite Mean = [(4.0)(30) + (4.2)(33)]/63 = 4.10476
b Composite Mean = [(4.2)(30) + (2.7)(29) + (3.0)(29) + (4.2)(30) + (3.0)(30)]/(30 + 29 + 29 + 30 + 30) = 3.4277
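The composite means above are weighted averages of the section means, weighted by section size. A minimal sketch using the values from parts a and b:

```python
def composite_mean(means, sizes):
    """Weighted average of group means, weighted by group size."""
    total = sum(m * n for m, n in zip(means, sizes))
    return total / sum(sizes)

# Part (a): two sections with means 4.0 (n = 30) and 4.2 (n = 33)
print(round(composite_mean([4.0, 4.2], [30, 33]), 5))   # 4.10476

# Part (b): five sections
print(round(composite_mean([4.2, 2.7, 3.0, 4.2, 3.0], [30, 29, 29, 30, 30]), 4))   # 3.4277
```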
1.27 a
x̄MP = (2.8 + 3.1 + 4.7 + . . . + 9.0)/12 = 5.78

sMP = √{[(2.8 − 5.78)² + (3.1 − 5.78)² + . . . + (9.0 − 5.78)²]/11} = √(40.6625/11) = 1.923

x̄B = (0.9 + 1.2 + 1.3 + . . . + 3.4)/12 = 1.68

sB = √{[(0.9 − 1.68)² + (1.2 − 1.68)² + . . . + (3.4 − 1.68)²]/11} = √(4.77667/11) = 0.659
b Because the value 3.4 is high compared to the rest of the measurements for the Baytex LC50, the
mean and standard deviation become abnormally large by the inclusion of this data point.
1.29 a
x̄SDB = (12 + 8 + 3 + . . . + 15)/12 = 10.75

sSDB = √{[(12 − 10.75)² + (8 − 10.75)² + . . . + (15 − 10.75)²]/11} = √(192.25/11) = 4.181

x̄FOB = (14 + 15 + 15 + . . . + 22)/12 = 15

sFOB = √{[(14 − 15)² + (15 − 15)² + . . . + (22 − 15)²]/11} = √(182/11) = 4.068
b The variation in the percentage of bridges recorded among southeastern states seems comparable for structurally deficient and functionally obsolete bridges; however, the mean percentage of functionally obsolete bridges is higher than that of structurally deficient ones.

1.5 Summary Measures and Decisions


1.31 a Histograms and boxplots of percent on-time arrivals and departures:

Both the arrival and departure time distributions are left-skewed, the arrival times more so than the departure times. The median percentage of on-time departures is higher than the median percentage of on-time arrivals. Both distributions have about the same range, and both have outliers on the lower end, indicating a low-performing airport (or airports).
b 1/32 = 3.125%
c 0/32 = 0%
32
d For arrival data, we find that x̄ = 81.33 and s = 4.558. For departure data, we find that x̄ = 85.23
and s = 3.417.
Arrivals Departures
k (x̄ − ks, x̄ + ks) % Data in Interval (x̄ − ks, x̄ + ks) % Data in Interval
1 (76.772, 85.888) 78.1% (81.813, 88.647) 71.9%
2 (72.214, 90.446) 93.75% (78.396, 92.064) 96.9%

Departure data seems to agree more strongly with the empirical rule, which says that around
68% should lie within 1 standard deviation of the mean and 95% should lie within 2 standard
deviations of the mean.
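The "percent of data in interval" counts can be checked with a small helper. This is a sketch; the airport percentages themselves are not listed in this solution, so the demo list below is hypothetical:

```python
def coverage(xs, k):
    """Fraction of observations within k sample standard deviations of the mean."""
    n = len(xs)
    mean = sum(xs) / n
    sd = (sum((x - mean) ** 2 for x in xs) / (n - 1)) ** 0.5
    return sum(mean - k * sd <= x <= mean + k * sd for x in xs) / n

# Hypothetical on-time percentages for illustration
data = [76, 79, 80, 81, 82, 83, 84, 85, 86, 90]
print(coverage(data, 1), coverage(data, 2))   # 0.8 1.0
```

Comparing these fractions against 68% and 95% is exactly the empirical-rule check made above.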
e The range representing values within 5% of the mean is
(81.33 − 0.05(81.33), 81.33 + .05(81.33)) = (77.2635, 85.3965). 24 or 75% of airports have percent
on-time arrivals in this range.
f The range representing values within 5% of the mean is
(85.23 − 0.05(85.23), 85.23 + .05(85.23)) = (80.9685, 89.4915). 28 or 87.5% of airports have percent on-time departures in this range.
g Looking at the boxplots, we see that for arrival times, the three lowest: Chicago O’Hare, Newark
Int and New York LaGuardia qualify as outliers, and for departure times, Chicago O’Hare is an
outlier.
h zATL arrivals = (80.5 − 81.33)/4.558 = −0.1821
zATL departures = (83.5 − 85.23)/3.417 = −0.5063
zCHI arrivals = (67 − 81.33)/4.558 = −3.1439
zCHI departures = (73.4 − 85.23)/3.417 = −3.4621
Atlanta is 0.1821 standard deviations below the mean for percent on-time arrivals and 0.5063 standard deviations below the mean for percent on-time departures, so Atlanta is better with on-time arrivals than on-time departures. Chicago O’Hare is 3.1439 standard deviations below the mean for percent on-time arrivals and 3.4621 standard deviations below the mean for percent on-time departures. O’Hare is also better with on-time arrivals than on-time departures.
1.33 a Histograms and Boxplots for motor vehicle deaths in 1980 and 2002:

The skewness in the distribution and the abundance of outliers in the 1980 data indicate that the median and IQR will describe these datasets better than the mean and standard deviation.
b Washington, DC; Idaho; Montana; West Virginia; Wyoming; Arizona; New Mexico; Louisiana;
and Nevada. These states, except for DC, have low population densities, which may mean that
medical teams must travel large distances to provide help to accident victims. In DC, medical
teams should be able to arrive at accidents much more quickly.
c Based on the data, even though more vehicles are probably using the highways in 2002 than
in 1980, the median rate of motor vehicle deaths has decreased, which may indicate that safety
measures have improved in that time.

1.6 Supplementary Exercises

1.35 a Histograms of temperatures for Central Park and Newnan:

b Boxplots of temperatures for Central Park and Newnan:



The distribution of annual temperatures in Central Park is slightly left-skewed. The temperatures ranged from about 50°F to 57°F with a mean of about 54°F. There are no outliers. The distribution of temperatures at Newnan is slightly right-skewed. The temperatures ranged from about 58°F to 66°F with a mean of about 62°F. There are no outliers.
c The shapes of the two distributions indicate that Central Park has seen more years with warmer temperatures and Newnan more years with cooler temperatures during the last century. On average, Newnan is warmer than Central Park. The range of temperatures is about the same at both locations.
1.37 Bar chart of classification of voters by income:

The percentage of eligible voters who voted in the 2000 presidential election increased steadily with
the household income group. From the lowest income group, the lowest percentage of voters voted,
whereas from the highest income group the highest percentage of voters voted in this election.

1.39 Bar chart of bridge collapses by size of crowd:

The median number of people on collapsing bridges was between 26 and 150. The data are skewed to the right, so most of the bridges had a relatively small crowd when they collapsed. The spread is small; a vast majority of the collapses occurred with a relatively small number of people on the bridge. The crowd size on collapsing bridges ranged from fewer than 26 to more than 750.
1.41 a Bar Chart showing energy, max peak demand, and thermal savings over time:

b Every month the energy savings are the highest and the thermal savings are the lowest. The
energy savings show a cycle with highest savings during the summer months and lowest savings
during the winter months. On the other hand, thermal savings are highest during the winter
months and lowest during the summer months, showing exactly opposite cycles. The maximum
peak demand savings are higher in general during summer months and lower in the winter months.

c Month Energy Peak Thermal Total


Nov 2001 71.49 1.77 5.09 78.35
Dec 2001 61.43 37.39 9.57 108.39
Jan 2002 54.47 50.10 8.52 113.09
Feb 2002 94.84 56.71 8.47 160.02
Mar 2002 104.19 75.28 6.56 186.03
Apr 2002 132.77 63.33 3.17 199.27
May 2002 166.18 79.92 1.66 247.76
Jun 2002 164.24 38.40 0.60 203.24
Jul 2002 154.17 81.12 0.87 236.16
Aug 2002 148.62 56.71 0.81 206.14
Sep 2002 140.58 56.97 0.16 197.71
Oct 2002 67.35 31.09 4.35 102.79

x̄Total = (78.35 + 108.39 + 113.09 + . . . + 102.79)/12 = 169.91

sTotal = √{[(78.35 − 169.91)² + (108.39 − 169.91)² + . . . + (102.79 − 169.91)²]/11} = √(34768.484/11) = 56.221
d Boxplot of total savings:

No outliers.

1.43 Pareto charts of SO2 pollution sources:



Fuel combustion is the largest contributor of sulfur dioxide emissions. Although its contribution decreased over the years, it is still a major contributor. The amount contributed by industrial processes also decreased over the years, but its percentage of total emissions increased. The percent contribution by transportation increased slightly.
Chapter 2

Exploring Bivariate Distributions and Estimating Relations

2.1 Two-Way Table for Categorical Data


2.1 a Bar chart of location versus percent mortality rate:

b The mortality rate is highest among the middle-seat passengers compared to the front or rear-seat
passengers. The mortality rate is lowest among the rear-seat passengers.


2.3 Bar chart of type of sewage disposal by geographic location:

A public sewer is used most commonly in all four geographical areas. Other methods of sewage disposal
are the least used in all four geographical areas. The western region has the highest percentage of public
sewage users and the southern region the lowest. The southern region has the highest rate of septic
system users.

2.2 Time Series Analysis


2.5 a Time series plot for the total SO2 emissions from 1989 to 2000:

The total sulfur dioxide estimates decreased steadily until 1994; then a sudden drop was observed from 1994 to 1995. After 1995, they increased steadily until 1998 and then decreased.

b Time series plot of SO2 emissions by category from 1989 to 2000:

The sulfur dioxide emission estimates from fuel combustion, particularly from electrical utilities,
decreased steadily until 1994 and dropped considerably in 1995. They increased for the next three
years and then decreased. All other categories are minor contributors of sulfur dioxide emissions,
and their amounts decreased slightly over the years, except for transportation, which showed an
increase from 1995 to 1996 and increased slightly thereafter.
2.7 a Time series plot of energy cost for each sector:

b The average fuel price showed an increasing trend in residential and commercial sectors, with sud-
den increases in the early 1980s. Both sectors showed very similar trends with similar prices. The
transportation sector has higher average fuel prices. Although the industrial and transportation
sectors showed similar trends in average fuel prices over the years, the difference in the average
fuel prices increased slightly. In both of these sectors, the prices increased until the mid-1980s and then experienced a drop followed by a decade of stable prices. Then the prices showed a steadily increasing trend.

2.9 a Time series plot for total U.S. energy consumption:

b Total U.S. energy consumption rose from 1990 to 1996, then fell until 1998, then rose again to
1999.
c Time series plot of hydroelectric power consumption as a percentage of total consumption:

The percentage of hydroelectric power fell from 1990 to 1994, then rose again until 1997, after
which it declined again.

2.11 a Time series plot for urban and rural federal-aid highways:

b From 1980 to 1998, urban federal-aid highway mileage (in thousand miles) increased steadily while rural mileage decreased steadily. However, rural highways still represent the major portion of federally funded highways, at about four times the urban mileage.
2.13 Time series plots for age-adjusted deaths caused by various diseases:

The age-adjusted death rate per 100,000 population due to heart disease decreased steadily until the
year 2001, reducing it to about half of what it was in 1960. The death rate due to cancer, on the other
hand, showed a slight increase until 1995 and then a steady decrease. The death rate due to diabetes
showed a slight decrease with some fluctuations. The death rates due to influenza and liver disease
showed a slight decrease over the years.

2.3 Scatterplots: Graphical Analysis of Association between


Measurements
2.15 a Scatterplot of median weekly earnings vs years of schooling:

There is an extremely strong positive correlation between median weekly earnings and years of schooling. As the years of schooling increase, so do the median earnings. Although there is a slight curvature, a linear fit seems very good.
b Scatterplot of unemployment vs years of schooling:

There is a very strong negative correlation between the years of schooling and the unemployment rate. The higher the number of years of schooling, the lower the unemployment rate. A linear model would be a good fit, but a nonlinear fit (such as a quadratic) would be even better.

2.17 a Scatterplot of per capita expenditures vs per capita tax revenue:

b There is a moderate positive correlation between the per capita expenditures and per capita tax
revenue. Generally, higher per capita tax revenue resulted in an increased per capita expenditure.
There is one outlier that distorts the data.
c Scatterplot of the difference in expenditures and taxes vs expenditures:

There does not seem to be any relation between expenditure and the differences. There is one outlier state that gives a misleading impression of a positive relation.

d Scatterplot of per capita expenditures vs per capita tax revenue with Alaska removed:

The scatterplot shows a positive relation between expenditure and tax revenue.
2.19 a Scatterplot of rankings by executives versus rankings by professors:

b There is a strong positive relation between rankings by executives and professors. In general, skills ranked higher by professors are also ranked higher by executives.

2.4 Correlation: Estimating the Strength of a Linear Relation


2.21 a r = 35.385/50 = 0.708
b There is a strong positive correlation between the death rates in the two years. States that had
high motor vehicle death rates in 1980 tended to have high death rates in 2002. States with low
death rates in 1980 tended to have low death rates in 2002.
c The linear association and presence of possible outliers is easily visible from scatterplot but cannot
be assessed from the correlation coefficient.

d Strength of relation is easily determined from the value of the correlation coefficient but not
necessarily from the scatterplot.
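The correlation coefficients in this section are computed as r = [1/(n − 1)] Σ zx zy, the average product of z-scores. A sketch of that formula (the demo pairs below are hypothetical, not the exercise data):

```python
def correlation(xs, ys):
    """Pearson r as the average product of z-scores: (1/(n-1)) * sum(zx * zy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = (sum((x - mx) ** 2 for x in xs) / (n - 1)) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / (n - 1)) ** 0.5
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical paired data for illustration
print(round(correlation([1, 2, 3, 4], [1, 3, 2, 4]), 4))   # 0.8
```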
2.23 a rREV/EXP = −0.993/49 = −0.020
rREV/TAX = −4.437/49 = −0.091
rTAX/EXP = 32.479/49 = 0.663

Revenue Taxes Expenditures


Revenue 1.000
Taxes -0.091 1.000
Expenditures -0.020 0.663 1.000
b There is extremely weak negative correlation between revenue and taxes and between revenue and expenditure; however, there is fairly strong positive correlation between expenditure and taxes.
c States Alaska, California, Maryland, and New York are possible outliers affecting the relation,
weakening the strength of the calculated relationship.
2.25 r = 29.9692/38 = 0.789

The correlation coefficient of 0.789 and the scatterplot from exercise 2.19 indicate a strong positive
relationship between professors’ responses and executives’ responses.
2.27 rmesh/wout = −7.87/8 = −0.984
rmesh/with = −7.94/8 = −0.993
rmesh/diff = −7.72/8 = −0.965
a r = −0.984. There is a very strong negative correlation between mesh size and iterations without reconjugation. (see tables)
b r = −0.993. There is a very strong negative correlation between mesh size and iterations with reconjugation. (see tables)
c r = −0.965. There is a very strong negative correlation between mesh size and the difference in iterations without and with reconjugation.

2.5 Regression: Modeling Linear Relationships


2.29 a x̄tax = 2282 ȳexp = 4186

Σ(xi − x̄)(yi − ȳ) = 32077142    Σ(xi − x̄)² = 92791986

β̂1 = Σ(xi − x̄)(yi − ȳ)/Σ(xi − x̄)² = 32077142/92791986 = 0.346
β̂0 = ȳ − β̂1 x̄ = 4186 − (0.346)(2282) = 3397

REGRESSION EQUATION: Expenditures = 3397 + 0.346 ∗ (Taxes)



b A 1 dollar increase in per capita tax revenue tended to result in an increased per capita expenditure
of about 35 cents.
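The fitted slopes and intercepts in this section all come from β̂1 = Σ(xi − x̄)(yi − ȳ)/Σ(xi − x̄)² and β̂0 = ȳ − β̂1x̄. A sketch of that computation on hypothetical, exactly linear data:

```python
def least_squares(xs, ys):
    """Fit y = b0 + b1*x by least squares: b1 = Sxy/Sxx, b0 = ybar - b1*xbar."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

# Hypothetical data lying exactly on y = 1 + 2x
b0, b1 = least_squares([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)   # 1.0 2.0
```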

2.31 a x̄prof = 3.7228 ȳexec = 3.9026

Σ(xi − x̄)(yi − ȳ) = 7.419    Σ(xi − x̄)² = 10.869

β̂1 = Σ(xi − x̄)(yi − ȳ)/Σ(xi − x̄)² = 7.419/10.869 = 0.683
β̂0 = ȳ − β̂1 x̄ = 3.9026 − (0.683)(3.7228) = 1.36

REGRESSION EQUATION: Executives = 1.36 + 0.683 ∗ (Professors)


b For every unit increase in rank by professors, the rank by executives tended to increase by about
0.683.
c Executives = 1.36 + 0.683(4.0) = 4.092
2.33 a Scatterplot of body fat vs body density:

There is a very strong negative linear relationship between body fat and body density.
b x̄density = 1.0449 ȳ%fat = 23.88

Σ(xi − x̄)(yi − ȳ) = −2.619    Σ(xi − x̄)² = 0.00568

β̂1 = Σ(xi − x̄)(yi − ȳ)/Σ(xi − x̄)² = −2.619/0.00568 = −461
β̂0 = ȳ − β̂1 x̄ = 23.88 − (−461)(1.0449) = 505

REGRESSION EQUATION: % Fat = 505 − 461 ∗ (Density)


c % Fat = 505 − 461(1.050) = 20.95

2.35 a x̄static = 15.50 ȳflothru = 9.45

Σ(xi − x̄)(yi − ȳ) = 1546.55    Σ(xi − x̄)² = 2360.24

β̂1 = Σ(xi − x̄)(yi − ȳ)/Σ(xi − x̄)² = 1546.55/2360.24 = 0.655
β̂0 = ȳ − β̂1 x̄ = 9.45 − (0.655)(15.50) = −0.71

REGRESSION EQUATION: FloThru = −0.71 + 0.655 ∗ (Static)


b Scatterplot and regression line of flow-through LC50 vs static LC50:

From the scatterplot we can see that the data are separated into 3 distinct clusters. It is difficult
to determine if the relationship is a linear one because of the lack of data in between the clusters.

2.6 The Coefficient of Determination


2.37 a SSyy = 1207.3 SSE = 0.8

r² = (SSyy − SSE)/SSyy = (1207.3 − 0.8)/1207.3 = 99.9%

r = [1/(n − 1)] Σ zx zy = −0.9997
n−1
b There is an extremely strong negative linear association between the % body fat and the body
density. Almost all (99.9%) of variation in the % body fat can be explained using a linear relation
between the % body fat and body density.
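The coefficient of determination follows directly from the two sums of squares. A sketch using the values from part a:

```python
def r_squared(ss_yy, sse):
    """Proportion of the variation in y explained by the fitted line."""
    return (ss_yy - sse) / ss_yy

# Values from part a: SSyy = 1207.3, SSE = 0.8
print(round(100 * r_squared(1207.3, 0.8), 1))   # 99.9
```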

2.39 a Scatterplots of y1 vs x and y2 vs x:

b The weigh-in-motion readings seem to have a much stronger linear relationship with the static
weights of vehicles after calibration than before.
c ry1 vs x = [1/(n − 1)] Σ zx zy = 8.685/9 = 0.965
ry2 vs x = [1/(n − 1)] Σ zx zy = 8.964/9 = 0.996
The correlation of weigh-in-motion readings and static weight is much stronger after calibra-
tion than before calibration. This means that the weigh-in-motion readings after calibration are
better predictors of the static weight.
d Yes. If the r-value is 1, then there is an exact linear relationship between y2 and x; i.e. y2 = a+bx.
However, unless a = 0 and b = 1, the readings could still disagree. (e.g. if the weigh-in-motion
always gave exactly half of the static weight.)
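Part d's caveat can be illustrated numerically: a gauge that always reads exactly half the true weight has r = 1 yet never agrees with it. A sketch with made-up weights:

```python
x = [10.0, 20.0, 30.0, 40.0]   # hypothetical static weights
y2 = [v / 2 for v in x]        # weigh-in-motion always reads half

n = len(x)
mx, my = sum(x) / n, sum(y2) / n
sx = (sum((v - mx) ** 2 for v in x) / (n - 1)) ** 0.5
sy = (sum((v - my) ** 2 for v in y2) / (n - 1)) ** 0.5
r = sum((a - mx) / sx * (b - my) / sy for a, b in zip(x, y2)) / (n - 1)

print(round(r, 6))   # 1.0 -- perfect linearity, yet zero agreement
```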

2.41 a x̄police = 1016.6 ȳarrests = 1059.5

Σ(xi − x̄)(yi − ȳ) = 35215533    Σ(xi − x̄)² = 27653074.4

β̂1 = Σ(xi − x̄)(yi − ȳ)/Σ(xi − x̄)² = 35215533/27653074.4 = 1.2735
β̂0 = ȳ − β̂1 x̄ = 1059.5 − (1.2735)(1016.6) = −235.1159

REGRESSION EQUATION: Arrests in 1982 = −235.1159 + 1.2735 ∗ (NumberofLawEnforcement)


b Hypotheses: H0: β1 = 0    Ha: β1 > 0
Test Statistic: Assume that the errors are independent, normal (0, σ²).

s² = SSE/(n − 2) = [45,917,560.5 − (35,215,533)²/27,653,074.4]/8 = 1,071,417.4/8 = 133,927.026

t = β̂1/(s/√Sxx) = 1.2735/(√133,927.026/√27,653,074.4) = 18.30

Rejection Region: t > t0.05 = 1.860 (degrees of freedom = 8)
Conclusion: Reject H0 at α = 0.05; i.e., there is sufficient evidence to conclude that the mean number of arrests increases as the number of law enforcement employees increases.
c r² = 1 − SSE/SSyy = 1 − 1,071,417.4/45,917,560.5 = 0.9767

About 97.7% of the variability in the number of drug arrests is accounted for by a linear relationship with the number of law enforcement employees.
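The test statistic in part b follows the usual form t = β̂1/(s/√Sxx) with s² = SSE/(n − 2). A sketch plugging in the summary values above (n = 10 is implied by the 8 degrees of freedom):

```python
def slope_t_stat(b1, sse, sxx, n):
    """t statistic for H0: beta1 = 0 in simple linear regression."""
    s2 = sse / (n - 2)          # estimate of the error variance
    se_b1 = (s2 / sxx) ** 0.5   # standard error of the slope
    return b1 / se_b1

print(round(slope_t_stat(1.2735, 1071417.4, 27653074.4, 10), 2))   # 18.3
```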
2.43 Since x1 and x2 have a strong positive correlation with each other, we would expect the slope of the regression line for x1 and y to have the same sign as the slope of the regression line for x2 and y.

2.7 Residual Analysis: Assessing the Adequacy of the Model


2.45 a Residual plot for per capita expenditures vs per capita tax revenue:

b The residuals seem to follow a negative linear trend. Also, there is one distant outlier.

c Because the residuals are not randomly distributed around 0, a linear relationship may not be
appropriate.

2.47 a Residual plot for executive ranking vs professor ranking:

b The residual plot indicates the possibility of two groups. The first part indicates a decreasing
trend among the residuals. The latter part shows no pattern.
c A linear model may not be appropriate. The possibility of groups must be investigated.
2.49 a Residual plot of percent body fat vs body density:

b The residual plot clearly shows a U-shaped pattern among residuals.


c The linear model is not appropriate.

2.51 a Scatterplot of gasoline prices vs crude oil prices:

There is a fairly strong positive linear relation between crude oil and gasoline prices.
b x̄crude = 19.53 ȳgas = 86.87
X X
(xi − x̄)(yi − ȳ) = 1893.59 (xi − x̄)2 = 642.22
P
(xi − x̄)(yi − ȳ) 1893.59
β̂1 = = = 2.95
(xi − x̄)2
P
642.22
β̂0 = ȳ − β̂1 x̄ = 86.87 − (2.95)(19.53) = 29.3

REGRESSION EQUATION: Gasoline = 29.3 + 2.95 ∗ (Crude)


c Residual plot of gasoline prices vs crude oil prices:

d Fairly randomly distributed residuals indicate a linear model may be appropriate.



2.8 Transformations
2.53 a x̄x/a = 0.337 ȳStress = 289.4

Σ(xi − x̄)(yi − ȳ) = −229.02    Σ(xi − x̄)² = 0.93

β̂1 = Σ(xi − x̄)(yi − ȳ)/Σ(xi − x̄)² = −229.02/0.93 = −247
β̂0 = ȳ − β̂1 x̄ = 289.4 − (−247)(0.337) = 373

REGRESSION EQUATION: Stress = 373 − 247 ∗ (x/a)


b SSyy = 68022 SSE = 11446

r² = (SSyy − SSE)/SSyy = (68022 − 11446)/68022 = 83.2%

83.2 percent of the variation in hoop stress can be explained as a linear relationship with x/a.
c Scatterplot of hoop stress vs x/a:

The scatterplot clearly indicates a curvature to the data. A nonlinear model is more appropriate.
d Of the transformations discussed in this chapter, power transformation seems to work best. Com-
paring the ln of the hoop stress measurements with the ln of the x/a measurements, we get:

ln Stress = 5.21 − 0.240 ln(x/a)

So, ln(Stress) = 5.21 − 0.240 ln(.30) = 5.499

And, raising both sides as an exponent of e, Stress = e5.499 = 244.45

So, when x/a is 0.30, the hoop stress should be around 244.45, according to our model.
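The fit-and-back-transform procedure in part d can be sketched as follows; the prediction step uses the fitted coefficients 5.21 and −0.240 from above:

```python
import math

def fit_ln_ln(xs, ys):
    """Least-squares fit of ln(y) = a + b*ln(x) (power transformation)."""
    lx = [math.log(v) for v in xs]
    ly = [math.log(v) for v in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / sum((u - mx) ** 2 for u in lx)
    return my - b * mx, b

def predict_power(a, b, x):
    """Back-transform: y-hat = exp(a + b*ln(x)) = e^a * x^b."""
    return math.exp(a + b * math.log(x))

# Prediction at x/a = 0.30 with the fitted coefficients from this solution
print(round(predict_power(5.21, -0.240, 0.30), 1))   # 244.4
```

The text's 244.45 differs only because the exponent was rounded to 5.499 before exponentiating.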

2.55 a Scatterplots of relative water depth vs tidal velocity at stations 16 and 18:

b No. There is a clear curvature in all three scatterplots.



c Using an exponential transformation we get:

Station 16: ln Relative Water Depth = -7.417 + 6.295 Velocity

Station 18: ln Relative Water Depth = -6.599 + 5.738 Velocity

2.10 Supplementary Exercises

2.57 a Scatterplots and correlation coefficients indicate relationships of varying strengths.


b ln Pb = −5.64 + 0.584 CO
ln Pb = −9.47 + 349 NO2
ln Pb = −10.6 + 72.7 Ozone
ln Pb = −6.27 + 528 SO2
NO2 = 0.0126 + 0.0013 CO
Ozone = 0.0820 + 0.0057 CO
SO2 = 0.0017 + 0.0010 CO
Ozone = 0.0334 + 3.9486 NO2
SO2 = −0.0056 + 0.6382 NO2
Ozone = 0.0753 + 5.26 SO2
c Time series plots of air quality measures:

CO and SO2 show steady decline. Pb declined sharply during the first decade and then declined
slowly over the next decade. NO2 remained fairly constant during the first decade and declined
over the next. Overall, ozone level declined, with a few ups and downs.
2.59 Yes. The number of semiactive dampers used increased with the height as well as the number of stories:

ln dampers = 2.42 + 0.0583 Stories


ln dampers = 2.40 + 0.0131 Height
2.61 a From a scatterplot we see that PDA% and limit load have a strong negative linear correlation.
b x̄LL = 0.428 ȳPDA = 50.00

Σ(xi − x̄)(yi − ȳ) = −49.5    Σ(xi − x̄)² = 0.417

β̂1 = Σ(xi − x̄)(yi − ȳ)/Σ(xi − x̄)² = −49.5/0.417 = −119
β̂0 = ȳ − β̂1 x̄ = 50.00 − (−119)(0.428) = 101

REGRESSION EQUATION: PDA = 101 − 119 ∗ (LimitLoad)



2.63 a Time series plot of number of households in the US from 1970 to 2002:

The number of households in the US has increased steadily from 1970 to 2002.
b Bar chart of percentage of household sizes in 1900 and 2002:

The distribution of household sizes in 2002 is right-skewed with a median household size of 2 persons. Household sizes in 1900 were less skewed, with a median household size of 4 persons and a large number of households with 7 or more persons. In 1900 about 20% of households had 7 or more persons, compared to 1.4% in 2002. In 1900, only about 5% of households had 1 person, compared to over 25% in 2002. Overall, household size has decreased considerably.
c The number of households has increased, but the number of persons per household has decreased.

2.65 a Scatterplot of zeta potential against the solution pH for each NaNO3 concentration:

b The pH and the zeta potential display strong negative linear correlations for both concentrations.

c For 0.01M NaNO3 :

x̄pH = 5.80 ȳzeta = 0.43


X X
(xi − x̄)(yi − ȳ) = −363.6 (xi − x̄)2 = 17.28
P
(xi − x̄)(yi − ȳ) −363.6
β̂1 = = = −21.0
(xi − x̄)2
P
17.28
β̂0 = ȳ − β̂1 x̄ = 0.43 − (−21.0)(5.80) = 122

REGRESSION EQUATION: Zeta = 122 − 21.0 ∗ (pH)

For 0.001M NaNO3:

x̄pH = 5.81, ȳzeta = −8.22

Σ(xᵢ − x̄)(yᵢ − ȳ) = −499.8, Σ(xᵢ − x̄)² = 24.45

β̂₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² = −499.8/24.45 = −20.4

β̂₀ = ȳ − β̂₁x̄ = −8.22 − (−20.4)(5.81) = 110

REGRESSION EQUATION: Zeta = 110 − 20.4 · (pH)


Chapter 3

Obtaining Data

3.1 Overview of Methods of Data Collection


3.1 a Observational - The data was from past records where other factors were not necessarily controlled.
b We cannot necessarily make generalizations about how the units will perform in the future, but
we may be able to determine the likelihood that the outcomes occurred by chance alone.

3.3 a Experiment - The data was gathered in a controlled environment to compare before and after the
classes.
b Different engineers and inspectors may have different levels of knowledge. The pretest was given
to determine the existing level of knowledge, in order to measure the level of improvement as a
result of training.

3.5 Experiment - The data was gathered in a controlled manner to compare the four thread types.

3.2 Planning and Conducting Surveys


3.7 Because people may not wish to disclose intimate personal details about their lives, or may have a
tendency to lie about them, randomizing the choice of individuals will probably not remove this type
of bias.
3.9 It is possible that a household’s income affects the response rate. Since the foods consumed in a
household may also be affected by income, certain foods may be over or under represented in the
survey.

3.11 E.g., the computer system may help alleviate bias from people who may be embarrassed or otherwise
unwilling to report having watched certain shows.

3.4 Planning and Conducting an Observational Study


3.13 E.G. Engineers should randomly choose 3 of the 12 landfills for each type of cover, use the assigned
cover type at selected landfills and observe percolation rates at each landfill.
3.15 E.G. Randomly assign available devices that use this propellant to three groups. Run those in group 1
at 600◦ C, those in group 2 at 800◦ C, and those in group 3 at 1000◦ C for the same time, keeping all other
factors as similar as possible. Record the amount of wear on each device and compare by temperature.


3.17 a E.G. Divide the 30 small businesses into two groups randomly. Tell one group that they would be
charged peak hour rates and tell the other group that they would be charged the regular rates.
Measure their electricity usage and compare.
b No. The businesses would need to be informed of the charges the utility was imposing.

3.5 Supplementary Exercises


3.19 a Reaction yield.
b Reaction temperatures: 100, 150 and 200◦ F
c Randomly assign the three temperatures (100, 150, and 200◦ F) to the twelve time slots. Run
experiments at the assigned temperatures, measure the yield, and compare.
d Four
3.21 a Observational. The study treatments were not randomized or controlled.
b If true, then there is a confounding factor and the women who breast-fed for more than a year
have a lower chance of diabetes because they happen to lead a healthy lifestyle, not because they
breast-fed.
3.23 a Observational.
b No, this is not an experiment.
c Sodium is a confounding factor in the study
d No, the study was limited to mostly white female nurses, which is not a representative sample.
3.25 a Sample survey. The treatments were not randomly assigned or controlled.
b No. It is possible that obesity causes sleeplessness or that healthy people just happen to sleep
more than obese people.
Chapter 4

Probability

4.2 Sample Space and Relationships Among Events


4.1 a Occupation Male Female Total
Engineer 2.579 0.121 2.700
Computer specialist 0.484 0.216 0.700
Note: All numeric values are in millions.
Occupation White Black Asian Native American Other Total
Engineer 2.446 0.043 0.151 0.011 0.049 2.7
Computer Specialist 0.618 0.026 0.046 0.001 0.009 0.7
Note: All numeric values are in millions.

4.3 Printer
No Yes Total
No 13 7 20
Modem Yes 2 3 5
Total 15 10 25

4.5 Verification of A(B ∪ C) = AB ∪ AC:

Dark grey region is A(B ∪ C)


Shaded region is AB ∪ AC

Verification of A ∪ (BC) = (A ∪ B)(A ∪ C):

Shaded region is A ∪ (BC)

Dark grey region is (A ∪ B)(A ∪ C)

4.3 Definition of Probability


4.7 Discussion answers. Part (c) is definitely a subjective statement since there is no scientific basis to
prove or disprove the probability.

4.9 a L = left turn, R = right turn, S = straight

b P(L) = P(R) = P(S) = 1/3

c P(turn) = P(L ∪ R) = P(L) + P(R) = 1/3 + 1/3 = 2/3

4.11 a 1/3
b 1/3 + 1/15 = 6/15
c 1/3 + 1/16 = 19/48
d 1/3 + 1/3 = 2/3
4.13 Let B denote the event that the assembly has a bushing defect and S the event of a shaft defect.
a P(B) = 0.06 + 0.02 = 0.08
b P(B ∪ S) = 0.06 + 0.02 + 0.08 = 0.16
c P(BS̄ ∪ B̄S) = 0.06 + 0.08 = 0.14
d P(B̄S̄) = 1 − P(B ∪ S) = 1 − 0.16 = 0.84

4.4 Counting Rules Useful in Probability


4.15 Denote an outcome as an ordered pair of Roman numerals, where the first Roman numeral of the
pair signifies the entrance the first customer used and the second signifies the entrance the second
customer used.
a The simple events are (I, I), (I, II), (II, I), (II, II).
b P(both use door I) = P[(I, I)] = 1/4

P(both use same door) = P[(I, I)] + P[(II, II)] = 1/4 + 1/4 = 1/2

4.17 P(both are nondefective) = P(nondefective on first draw) · P(nondefective on second draw)
= (3/5)(2/4) = 3/10
 
4.19 P(none are defective) = (number of ways three nondefectives can be chosen) / (total number of ways a sample of three can be chosen)
= C(4,3)/C(6,3) = 4/20 = 1/5
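The counting arguments in 4.17 and 4.19 can be double-checked with exact arithmetic; a minimal sketch, where the item counts are read off the fractions in the solutions above:

```python
from fractions import Fraction
from math import comb

# 4.17: two draws without replacement from 5 items, 3 of them nondefective.
p_both_good = Fraction(3, 5) * Fraction(2, 4)
print(p_both_good)  # 3/10

# 4.19: sample of 3 from 6 items, 4 of them nondefective; none defective.
p_none_def = Fraction(comb(4, 3), comb(6, 3))
print(p_none_def)  # 1/5
```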
4.21 a The number of ways of partitioning eight taxis into three groups containing 2, 5, and 1 taxis is

8!/(2! 5! 1!) = (8 · 7 · 6 · 5!)/(2 · 5!) = (8 · 7 · 6)/2 = 168.

b There are eight possible choices for the taxi at airport C. Therefore, if the choice is made at
random, P(Jones ends up at airport C) = 1/8.
 
4.23 a P(employee ranked number one is selected)
= [(number of ways employee ranked number one can be chosen) × (number of ways two other employees can be chosen from the remaining four)] / (total number of ways a sample of three can be chosen)
= C(1,1) · C(4,2) / C(5,3) = 6/10 = 3/5

b P(highest-ranked employee among those selected has rank ≥ 2) = 1 − P(employee ranked number one is selected)
= 1 − 3/5 = 2/5

c P(employees ranked 4 and 5 are selected)
= [(number of ways employees ranked 4 and 5 can be chosen) × (number of ways one other employee can be chosen from the remaining three)] / (total number of ways a sample of three can be chosen)
= C(2,2) · C(3,1) / C(5,3) = 3/10
 
4.25 a P(all orders go to different distributors)
= (number of ways three orders may go to different distributors) / (total number of outcomes)
= P(5,3)/5³ = (5 · 4 · 3)/125 = 60/125 = 0.48

b P(all orders go to the same distributor)
= (number of ways to choose the distributor who will receive all three orders) / (total number of outcomes)
= C(5,1)/5³ = 5/125 = 1/25 = 0.04

     
c P(exactly two of the three orders go to one particular distributor)
= [(number of ways to choose the distributor who will receive the two orders) × (number of ways the other order may go to any of the other four distributors) × (number of ways of choosing the two orders going to the particular distributor from the three orders)] / (total number of outcomes)
= C(5,1) · C(4,1) · C(3,2) / 5³ = (5 · 4 · 3)/125 = 60/125 = 0.48
4.27 a The number of ways of partitioning nine wrenches into three groups, each containing 3 wrenches, is
9!/(3! 3! 3!) = 1,680

b Required probability = (number of ways of partitioning the seven new wrenches into groups of 1, 3, and 3 wrenches) / (number of ways of partitioning nine wrenches into three equal groups)
= [7!/(1! 3! 3!)] / [9!/(3! 3! 3!)] = 140/1,680 = 0.0833

4.5 Conditional Probability and Independence


4.29 Let D = event that the person has the disease, and T = event of a test result positive for the disease.
P(D|T) = P(D)P(T|D) / [P(D)P(T|D) + P(D̄)P(T|D̄)] = (0.01)(0.90) / [(0.01)(0.90) + (0.99)(0.10)]
= 0.0833
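The Bayes'-rule arithmetic above is easy to verify numerically; a minimal sketch (variable names are ours):

```python
# Bayes' rule for P(D|T): prevalence 0.01, sensitivity P(T|D) = 0.90,
# false-positive rate P(T|not D) = 0.10, as in the solution above.
p_d = 0.01
p_t_given_d = 0.90
p_t_given_not_d = 0.10

p_t = p_d * p_t_given_d + (1 - p_d) * p_t_given_not_d  # total probability
p_d_given_t = p_d * p_t_given_d / p_t

print(round(p_d_given_t, 4))  # 0.0833
```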
4.31 Let E = the event that a selected executive has an engineering degree,
Let M = the event that a selected executive has a master’s degree,
Let B = the event that a selected executive has a bachelor’s degree.
a P(E) = (172 + 55 + 7)/(172 + 55 + 7 + 28 + 56 + 1) = 234/319 = 0.734
b P(M|E) = P(ME)/P(E) = (55/319)/[(172 + 55 + 7)/319] = 55/234 = 0.235
c P(E|B) = P(EB)/P(B) = (172/319)/[(172 + 28)/319] = 172/200 = 0.86
Since P(E|B) ≠ P(E), we can conclude that having an engineering degree and having a bachelor's degree are not independent.
4.33 a P(A|B) = P(AB)/P(B) = 0.15/0.3 = 0.5
b No; since P(A|B) ≠ P(A), we can conclude that A and B are not independent.

4.35 Let L = the event that a selected keyboard has a faulty letter key.
Let S = the event that a selected keyboard is produced in the South Carolina facility.
P(L) = (15 + 75)/(15 + 75 + 45 + 30 + 40 + 45) = 90/250 = 0.36
P(L|S) = P(LS)/P(S) = (75/250)/[(75 + 30 + 45)/250] = 75/150 = 0.50
Since P(L) ≠ P(L|S), we can conclude that the events of selecting a keyboard with a faulty letter key
and selecting a keyboard produced in South Carolina are not independent.

4.6 Rules of Probability


4.37 Denote an outcome as an ordered pair of Roman numerals, where the first Roman numeral of
the pair signifies the firm (either I, II, or III) that receives the computer-paper contract, and the
second signifies the firm that receives the disks contract. The sample space is {(I, I), (I, II), (I,
III), (II, I), (II, II), (II, III), (III, I), (III, II), (III, III)}. Assume that the outcomes are equally
likely.
a P[(I, II) ∪ (I, III) ∪ (II, I) ∪ (III, I)] / P[(I, II), (I, III), (II, I), (II, III), (III, I), (III, II)] = (4/9)/(6/9) = 2/3

b P[(I, I)] = 1/9

c P[(I, II), (I, III)] / P[(I, II), (I, III), (II, II), (II, III), (III, II), (III, III)] = (2/9)/(6/9) = 1/3
4.39 a 0.16
b 1 − 0.16 = 0.84
c 0.19 + 0.42 + 0.35 = 0.96
d No.
 
4.41 a C(18,2)/C(20,2) = (18 · 17)/(20 · 19) = 0.8053

b P(at least one is nondefective) = 1 − P(both are defective)
= 1 − C(2,2)/C(20,2) = 1 − 2/(20 · 19) = 378/380 = 0.9947

c P(A|B) = P(AB)/P(B) = P(A)/P(B) = 0.8053/0.9947 = 0.8096
4.43 a Let Ai = event that ith resistor has resistance between 9.5 and 10.5 ohms,
i = 1, 2. Then
P (Ai ) = 1 − 0.05 − 0.10 = 0.85, and
P (both have actual values between 9.5 and 10.5)
= P (A1 A2 ) = P (A1 )P (A2 ) = (0.85)(0.85) = 0.7225.

b Let Ei = event that the ith resistor has resistance in excess of 10.5 ohms,
i = 1, 2. Then P(at least one has an actual value greater than 10.5)
= P(E1E2 ∪ E1Ē2 ∪ Ē1E2) = P(E1)P(E2) + P(E1)P(Ē2) + P(Ē1)P(E2)
= (0.1)(0.1) + 2(0.1)(0.9) = 0.19, or
P(at least one has an actual value greater than 10.5)
= 1 − P(Ē1Ē2) = 1 − (0.9)² = 0.19.
4.45 Let Ci = event that relay i closes properly, i = 1, 2. Then
P (current flows in series system)
= P (both relays are closed)
= P (C1 C2 ) = P (C1 )P (C2 ) = (0.9)(0.9) = 0.81
P (current flows in parallel system)
= P (at least one of the relays is closed)
= P (C1 ∪ C2 ) = P (C1 ) + P (C2 ) − P (C1 C2 )
= 0.9 + 0.9 − (0.9)(0.9) = 0.99
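The series/parallel comparison in 4.45 generalizes to any relay reliability p; a sketch (function names are ours):

```python
def series_flow(p: float, n: int = 2) -> float:
    """All n independent relays must close for current to flow."""
    return p ** n

def parallel_flow(p: float, n: int = 2) -> float:
    """At least one of n independent relays must close."""
    return 1 - (1 - p) ** n

p = 0.9
print(round(series_flow(p), 2), round(parallel_flow(p), 2))  # 0.81 0.99
```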
4.47 Let F denote the event that a worker fails to learn the skill correctly.
P(A|F) = P(A)P(F|A) / [P(F|A)P(A) + P(F|B)P(B)] = (0.70)(0.20) / [(0.20)(0.70) + (0.10)(0.30)]
= 0.8235
4.49 a The column percentages are based on the column totals and hence do not predict the percentages
of the work force as a whole.
b 2.385 million.
c Combined category
Type of Employer Percentage
Industry 59.25
Educational institution 21.57
Nonprofit organization 3.72
Federal government 8.84
Military 0.62
Other government 4.46
Other and unknown 1.53
This distribution is not as skewed as the one for engineers. It is similar to the overall distribution
of employment by type of employer.
d 0.514 million.
e The distributions for physical scientists and for social scientists are similar, whereas the one for
engineers is skewed.

4.7 Odds and Odds Ratios


4.51 a Yes, the cholesterol levels appear balanced between the two groups.
b
Cholesterol      Aspirin or Placebo
≤ 159            12/788
160 − 209        49/3,098
210 − 259        69/2,879
≥ 260            37/1,152

Cholesterol      Odds Ratio
≤ 159            (2/382) / (9/406) = 0.236
160 − 209        (12/1,587) / (37/1,511) = 0.309
210 − 259        (26/1,435) / (43/1,444) = 0.608
≥ 260            (14/582) / (23/570) = 0.596

c It appears that the aspirin helps to prevent M.I.’s as long as the subject has a low cholesterol
level. Otherwise, as the cholesterol level increases, aspirin appears to lose its effectiveness.
4.53 a P(P|M) = 24/(24 + 16) = 3/5
b P(M|P) = 24/(24 + 36) = 2/5
c P(PM) = 24/100 = [(24 + 36)/100][(24 + 16)/100] = P(P)P(M). Therefore, events P and M are independent.
d P(PF) = 36/100 = (60/100)[(36 + 24)/100] = P(P)P(F). Therefore, events P and F are independent.

4.8 Supplementary Exercises

4.55 Denote a sample consisting of the ith defective and the jth nondefective as
DiNj (i = 1, 2, 3, j = 1, 2, 3, 4), two nondefectives as
NjNj′ (j, j′ = 1, 2, 3, 4, j ≠ j′), and two defectives as
DiDi′ (i, i′ = 1, 2, 3, i ≠ i′).
a The outcomes are
N1 N2 N2 N3 N3 N4 N4 D 1 D1 D2
N1 N3 N2 N4 N3 D 1 N4 D 2 D1 D3
N1 N4 N2 D 1 N3 D 2 N4 D 3 D2 D3
N1 D 1 N2 D 2 N3 D 3
N1 D 2 N2 D 3
N1 D 3
b A = {N1N2, N1N3, N1N4, N2N3, N2N4, N3N4}
c Assigning equal probabilities to the 21 outcomes, we have
P(A) = 6 · (1/21) = 2/7
4.57 a Using the multiplication rule, there are n1 n2 = 6 · 6 = 36 outcomes in S.
b Denote the event of rolling i on the first die and j on the second die as (i, j), i = 1, ..., 6, j = 1, ..., 6,
and denote the event of rolling a seven as A. Then A = {(1,6), (6,1), (5,2), (2,5), (3,4), (4,3)}.
Assigning equal probabilities to each of the elements in S, we then have
P(A) = 6 · (1/36) = 1/6.
4.59 Denote the event that the person is traveling on business as B, on a major airline as M, on a
private airline as R, and on a commercially owned plane not belonging to an airline as C. Note
that P(M) + P(R) + P(C) = 1
a P(B) = P(BM) + P(BR) + P(BC)
= P(M)P(B|M) + P(R)P(B|R) + P(C)P(B|C)
= (0.6)(0.5) + (0.3)(0.6) + (0.1)(0.9) = 0.57
b P(BR) = P(R)P(B|R) = (0.3)(0.6) = 0.18
c P(B|C) = 0.9
d From parts (a) and (b) we have
P(R|B) = P(BR)/P(B) = 0.18/0.57 = 0.3158.
4.61 Using the multiplication rule,
9 · 10 · 10 · 10 · 10 · 10 · 10 = 9,000,000
4.63 Using the multiplication rule, 3 · 3 · 2 = 18 experimental runs are needed.
  
4.65 a P(draw an ace and a face card)
= (number of ways of drawing an ace)(number of ways of drawing a face card) / (total number of ways of drawing two cards)
= (4 · 12)/C(52,2) = 0.0362

b P(draw 5 spades) = (number of ways of drawing 5 spades) / (total number of ways of drawing 5 cards)
= C(13,5)/C(52,5) = 0.000495

c P(draw 5 cards of the same suit)
= (number of suits)(number of ways of drawing 5 cards from a given suit) / (total number of ways of drawing 5 cards)
= 4 · C(13,5)/C(52,5) = 0.001981
4.67 P(match) = P(two tails) + P(two heads) = (1/2)(1/2) + (1/2)(1/2) = 1/2

a Denote a match in the ith trial as Mi, i = 1, 2, 3. Then
P(M1M2M3) = P(M1)P(M2)P(M3) = (1/2)³ = 1/8.

b Denote a tail in the ith toss as Ti, i = 1, ..., 6. Then
P(T1T2T3T4T5T6) = P(T1)P(T2) · · · P(T6) = (1/2)⁶ = 1/64.

c No, since the probability of answering true (or false) without collusion is, hopefully, not 1/2, but
depends on the correct answer.
4.69 Denote the events of exposure as E, inoculation as I, and having flu as F.
P(inoculated person gets flu) = P(EF|I) = P(E)P(F|IE) = (0.6)(0.2) = 0.12
P(noninoculated person gets flu) = P(EF|Ī) = P(E)P(F|ĪE) = (0.6)(0.9) = 0.54
Denote the event that the inoculated person gets the flu as IF, and the event that the noninoculated person gets the flu as NF. Then

P(at least one person gets flu) = P(IF N̄F ∪ ĪF NF ∪ IF NF)
= P(IF)P(N̄F) + P(ĪF)P(NF) + P(IF)P(NF)
= (0.12)(1 − 0.54) + (1 − 0.12)(0.54) + (0.12)(0.54)
= 0.5952

4.71 Let Ni, Si, Ei, Wi denote, respectively, the events that the patrolman goes north, south, east, and west at the ith intersection, i = 1, 2, .... Then P(Ni) = P(Si) = P(Ei) = P(Wi) = 1/4.
a P(patrolman reaches boundary in eight blocks)
= P[(N1N2N3N4N5N6N7N8) ∪ (S1S2S3S4S5S6S7S8) ∪ (E1E2E3E4E5E6E7E8) ∪ (W1W2W3W4W5W6W7W8)]
= P(N1N2N3N4N5N6N7N8) + P(S1S2S3S4S5S6S7S8) + P(E1E2E3E4E5E6E7E8) + P(W1W2W3W4W5W6W7W8)
= 4(1/4)⁸ = (1/4)⁷
b If the patrolman initially goes north, there are nine unique routes by which he can return to the
starting point after walking exactly four blocks; i.e., he can take any of the following routes:
N1N2S3S4   N1E2S3W4   N1W2S3E4
N1S2N3S4   N1S2S3N4   N1S2E3W4
N1S2W3E4   N1E2W3S4   N1W2E3S4
Similarly, for each of the other three directions he can take initially, there are nine unique routes
that will return him to the starting point. Therefore, P(returning to the starting point in four blocks)
= (number of distinct routes) × P(given distinct route) = 4 · 9 · (1/4)⁴ = 9(1/4)³.
4.73 Eight minutes are available for typing blood, allowing time for screening up to four donors. Denote
the event of the ith donor having Rh-positive blood as Ai, i = 1, 2, 3, 4. Then the probability that the
victim will be saved
= P(A1 ∪ Ā1A2 ∪ Ā1Ā2A3 ∪ Ā1Ā2Ā3A4)
= P(A1) + P(Ā1A2) + P(Ā1Ā2A3) + P(Ā1Ā2Ā3A4)
= (0.4) + (0.6)(0.4) + (0.6)²(0.4) + (0.6)³(0.4) = 0.8704.
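The saving probability in 4.73 is a finite geometric-type sum, which can be checked directly; a minimal sketch:

```python
# P(victim saved) = sum over donors i = 1..4 of
# P(first i-1 donors are Rh-negative, donor i is Rh-positive),
# with P(Rh-positive) = 0.4 as in the solution.
p = 0.4
p_saved = sum((1 - p) ** (i - 1) * p for i in range(1, 5))
print(round(p_saved, 4))  # 0.8704
```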
4.75 P[(A ∪ B)|C] = P[(A ∪ B) ∩ C]/P(C) = P(AC ∪ BC)/P(C) (by the distributive law)
= [P(AC) + P(BC) − P(ABC)]/P(C) = P(AC)/P(C) + P(BC)/P(C) − P(ABC)/P(C)
= P(A|C) + P(B|C) − P(AB|C)

4.77 Denote a head on toss i as Hi, tails as Ti, i = 1, 2. Then
P(A) = P(H1) = 1/2
P(B) = P(H2) = 1/2
P(C) = P(H1H2) + P(T1T2) = (1/2)(1/2) + (1/2)(1/2) = 1/2
P(AB) = P(H1H2) = (1/2)(1/2) = P(A)P(B)
P(AC) = P(H1H2) = (1/2)(1/2) = P(A)P(C)
P(BC) = P(H1H2) = (1/2)(1/2) = P(B)P(C)
However,
P(ABC) = P(H1H2) = (1/2)(1/2) = 1/4 ≠ P(A)P(B)P(C) = (1/2)(1/2)(1/2) = 1/8.
Therefore, A, B, and C are not independent.
4.79 P(best tire chosen was ranked third among original eight)
= [(number of ways of choosing the tire rated third) × (number of ways of choosing the other tires from among those rated 4th − 8th)] / (total number of ways of choosing 4 tires from 8)
= C(5,3)/C(8,4) = 1/7.
4.81 Denote the event that relay i doesn't close properly as Di, i = 1, 2, 3, 4. For design A, the current
doesn't flow if relays 1 and 2 are open or if relays 3 and 4 are open. Thus, for design A,

P(current flows) = 1 − P(current doesn't flow)
= 1 − P(D1D2 ∪ D3D4)
= 1 − [P(D1)P(D2) + P(D3)P(D4) − P(D1D2D3D4)]
= 1 − [(0.1)(0.1) + (0.1)(0.1) − (0.1)⁴] = 0.9801.

For design B, the current flows if relays 1 and 2 are closed or if relays 3 and 4 are closed. Thus,
for design B,

P(current flows) = P(D̄1D̄2 ∪ D̄3D̄4)
= P(D̄1)P(D̄2) + P(D̄3)P(D̄4) − P(D̄1D̄2D̄3D̄4)
= (0.9)(0.9) + (0.9)(0.9) − (0.9)⁴ = 0.9639.

Therefore, design A has a higher probability of current flowing when the switch is thrown.
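The two relay designs in 4.81 can also be checked by brute force over all 2⁴ relay states; a sketch (assuming independent relays, as in the solution; names are ours):

```python
from itertools import product

# Brute-force check of the two relay designs in 4.81. Each relay closes
# properly with probability 0.9, independently.
p_close = 0.9

def prob(flow_condition):
    # Sum P(state) over all 2^4 relay states where current flows.
    total = 0.0
    for state in product([True, False], repeat=4):  # True = relay closed
        p_state = 1.0
        for closed in state:
            p_state *= p_close if closed else (1 - p_close)
        if flow_condition(state):
            total += p_state
    return total

# Design A: current flows unless relays 1,2 are both open or relays 3,4 are both open.
design_a = prob(lambda s: not ((not s[0] and not s[1]) or (not s[2] and not s[3])))
# Design B: current flows if relays 1,2 are both closed or relays 3,4 are both closed.
design_b = prob(lambda s: (s[0] and s[1]) or (s[2] and s[3]))

print(round(design_a, 4), round(design_b, 4))  # 0.9801 0.9639
```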
4.83 a P (A ∪ B) ≤ 1
i.e., P (A) + P (B) − P (AB) ≤ 1
i.e., P (AB) ≥ P (A) + P (B) − 1.

b P(exactly one of the events occurs)
= P(AB̄ ∪ ĀB)
= P(AB̄) + P(ĀB)
= [P(A) − P(AB)] + [P(B) − P(AB)]
= P(A) + P(B) − 2P(AB)

4.85 P(A)P(B|A)P(C|AB) = P(A) · [P(AB)/P(A)] · [P(ABC)/P(AB)] = P(ABC)
P (A) P (AB)
4.87 Let N = the event that a selected individual knows nothing at all about engineering.
Let V = the event that a selected individual is very unlikely to become an engineer.
Let A = the event that a selected individual knows a lot about engineering.
Let L = the event that a selected individual is very likely to become an engineer.
a P(V) = (179 + 132 + 53 + 8 + 4)/566 = 376/566 = 0.664
b P(N|V) = (179/566)/[(179 + 132 + 53 + 8 + 4)/566] = 179/376 = 0.476
c P(L|A) = (10/566)/(24/566) = 0.417
P(L|N) = (3/566)/(190/566) = 0.016
So, there is a greater chance that one is very likely to become an engineer given that one knows a lot
about engineering than given that one knows nothing at all about engineering.
Chapter 5

Discrete Probability Distributions

5.1 Random Variables and Their Probability Distributions


  
5.1 P(X = 0) = P(3 males chosen) = [(number of ways of choosing 3 out of 4) × (number of ways of choosing 0 out of 6)] / (total number of ways of choosing 3 out of 10)
= C(4,3)C(6,0)/C(10,3) = 1/30

P(X = 1) = P(2 males and 1 female chosen) = C(4,2)C(6,1)/C(10,3) = 3/10

P(X = 2) = C(4,1)C(6,2)/C(10,3) = 1/2

P(X = 3) = C(4,0)C(6,3)/C(10,3) = 1/6
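The probabilities in 5.1 form a hypergeometric pmf, which can be reproduced exactly; a sketch (here X is taken to count females, matching P(X = 0) = P(3 males chosen)):

```python
from fractions import Fraction
from math import comb

# Hypergeometric pmf for 5.1: 4 males and 6 females, sample of 3,
# X = number of females chosen (so X = 0 means 3 males chosen).
def pmf(x, males=4, females=6, n=3):
    return Fraction(comb(males, n - x) * comb(females, x),
                    comb(males + females, n))

probs = [pmf(x) for x in range(4)]
print(probs)  # [Fraction(1, 30), Fraction(3, 10), Fraction(1, 2), Fraction(1, 6)]
assert sum(probs) == 1  # the pmf sums to one
```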


 
5.3 P(X = 0) = C(3,0)(0.363)⁰(0.637)³ = 0.2585
P(X = 1) = C(3,1)(0.363)¹(0.637)² = 0.4419
P(X = 2) = C(3,2)(0.363)²(0.637)¹ = 0.2518
P(X = 3) = C(3,3)(0.363)³(0.637)⁰ = 0.0478
This answer assumes independence of at-bats. This assumption may not be reasonable, since
pitchers may change between at-bats, Boggs might get tired as the game progresses, etc. It
appears that it is not unusual for a good hitter to go 0 for 3 in one game, since the probability of
this for Boggs is more than 1/4.

5.5 a Let F = the event that an accident is fatal


Let N = the event that an accident is non-fatal
The sample space for an experiment involving four accidents is then:
FFFF FFFN FFNF FFNN
FNFF FNFN FNNF FNNN
NFFF NFFN NFNF NFNN
NNFF NNFN NNNF NNNN
Thus, along with the fact that P (F) = 21/100 = 0.21 and P (N) = 1 − 0.21 = 0.79, we can
calculate the probabilities of each event:
X P (X)
0 P (4 non-fatal) = P (N)4 = (0.79)4 = 0.38950
1 P (3 non, 1 fatal) = 4P (N)3 P (F) = 4(0.79)3 (0.21) = 0.41415
2 P (2 non, 2 fatal) = 6P (N)2 P (F)2 = 6(0.79)2 (0.21)2 = 0.16514
3 P (1 non, 3 fatal) = 4P (N)P (F)3 = 4(0.79)(0.21)3 = 0.02926
4 P (4 fatal) = P (F)4 = (0.21)4 = 0.00194
b P (At least one fatal) = 1 − P (0) = 1 − 0.38950 = 0.61050
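The table in part (a) is the binomial pmf with n = 4 and p = 0.21, which can be checked directly; a sketch:

```python
from math import comb

# Binomial pmf for 5.5: n = 4 accidents, P(fatal) = 0.21.
n, p = 4, 0.21
pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

for x, px in enumerate(pmf):
    print(x, round(px, 5))

# P(at least one fatal) = 1 - P(0)
print(round(1 - pmf[0], 5))  # 0.6105
```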
5.7 p(x) = C(3,x)(1/3)ˣ(2/3)³⁻ˣ; x = 0, 1, 2, 3
p(y) = C(3,y)(1/15)ʸ(14/15)³⁻ʸ; y = 0, 1, 2, 3

x   p(x)        y   p(y)
0   8/27        0   2,744/3,375
1   12/27       1   588/3,375
2   6/27        2   42/3,375
3   1/27        3   1/3,375

P(X + Y = 0) = P(X = 0)P(Y = 0) = (8/27)(2,744/3,375) = 0.24090

P(X + Y = 1) = P(X = 0)P(Y = 1) + P(X = 1)P(Y = 0)
= (8/27)(588/3,375) + (12/27)(2,744/3,375) = 0.41297

P(X + Y = 2) = P(X = 1)P(Y = 1) + P(X = 2)P(Y = 0) + P(X = 0)P(Y = 2)
= (12/27)(588/3,375) + (6/27)(2,744/3,375) + (8/27)(42/3,375) = 0.26179

P(X + Y = 3) = P(X = 0)P(Y = 3) + P(X = 1)P(Y = 2) + P(X = 2)P(Y = 1) + P(X = 3)P(Y = 0)
= (8/27)(1/3,375) + (12/27)(42/3,375) + (6/27)(588/3,375) + (1/27)(2,744/3,375) = 0.07445

P(X + Y = 4) = P(X = 1)P(Y = 3) + P(X = 2)P(Y = 2) + P(X = 3)P(Y = 1)
= (12/27)(1/3,375) + (6/27)(42/3,375) + (1/27)(588/3,375) = 0.00935

P(X + Y = 5) = P(X = 2)P(Y = 3) + P(X = 3)P(Y = 2) = (6/27)(1/3,375) + (1/27)(42/3,375) = 0.00053

P(X + Y = 6) = P(X = 3)P(Y = 3) = (1/27)(1/3,375) = 0.00001
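The distribution of X + Y above is the convolution of the two pmfs; a sketch with exact fractions (helper names are ours):

```python
from fractions import Fraction
from math import comb

# Exact convolution of the two pmfs in 5.7:
# X ~ Binomial(3, 1/3), Y ~ Binomial(3, 1/15), X and Y independent.
def binom_pmf(n, p):
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

px = binom_pmf(3, Fraction(1, 3))
py = binom_pmf(3, Fraction(1, 15))

# pz[z] = P(X + Y = z), built by summing over all (x, y) with x + y = z.
pz = [Fraction(0)] * 7
for x, vx in enumerate(px):
    for y, vy in enumerate(py):
        pz[x + y] += vx * vy

for z, v in enumerate(pz):
    print(z, round(float(v), 5))
```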
  
5.9 a. p(x) = [(number of ways of choosing x from 2) × (number of ways of choosing 2 − x from 2)] / (total number of ways of choosing a sample of 2 from 4)
= C(2,x)C(2,2−x)/C(4,2), x = 0, 1, 2; i.e.,

x   p(x)
0   1/6
1   2/3
2   1/6

b. p(x) = C(1,x)C(3,2−x)/C(4,2), x = 0, 1; i.e.,

x   p(x)
0   1/2
1   1/2

c. P(X = 0) = 1

5.2 Expected Values of Random Variables


5.11 Let X = number on the ticket drawn and Gi = net gain for box i, i = I, II. Then Gi = X − 1.

a. E(GI) = Σ (x − 1)pI(x) = (−1)(1/3) + 0(1/3) + 1(1/3) = 0
E(GI²) = Σ (x − 1)²pI(x) = (1)(1/3) + 0(1/3) + 1(1/3) = 2/3
V(GI) = E(GI²) − [E(GI)]² = 2/3 − 0 = 2/3

b. E(GII) = Σ (x − 1)pII(x) = (−1)(3/5) + 0(1/5) + 3(1/5) = 0
E(GII²) = Σ (x − 1)²pII(x) = (1)(3/5) + 0(1/5) + 9(1/5) = 12/5
V(GII) = E(GII²) − [E(GII)]² = 12/5 − 0 = 12/5

c. Box II, since for Box I the highest possible net gain is $1 with a probability of 1/3, but for Box
II the highest possible gain is $3 with a probability of 1/5. Note that V(GI) < V(GII); i.e., the Box
I net gain varies less from E(Gi) = 0 than the Box II net gain does.

5.13 Let X = age of death of a person infected with the AIDS virus through 1995. Then we can
estimate the mean of X from Figure 5.8 by letting the possible values of X be the mean ages for
the different age groups in the pie chart. Hence,
E(X) = 6P (X = 6) + 21P (X = 21) + 34.5P (X = 34.5)
+ 44.5P (X = 44.5) + 54.5P (X = 54.5) + 60P (X = 60)
= 6(0.01) + 21(0.18) + 34.5(0.45) + 44.5(0.25) + 54.5(0.08) + 60(0.04)
= 37.25 (Answers may vary.)
Var(X) = E(X 2 ) − (E(X))2
E(X 2 ) = 36(0.01) + 441(0.18) + 1190.25(0.45) + 1980.25(0.25)
+ 2970(0.08) + 3600(0.04) = 1492.02
Var(X) = 1492.02 − (37.25)2 = 104.458 (Answers may vary.)
Std(X) = √Var(X) = √104.46 = 10.22 (Answers may vary.)
The median should be toward the upper end of the 30−39 age spectrum, thus the median and
mean are comparable.

5.15 E(number of sales) = 0 · p(0) + 1 · p(1) + 2 · p(2) = 0(0.7) + 1(0.2) + 2(0.1) = 0.4
V(number of sales) = 02 · p(0) + 12 · p(1) + 22 · p(2) − [E(number of sales)]2
= 0(0.7) + 1(0.2) + 4(0.1) − (0.4)2 = 0.44
Standard deviation of Sales = √V(sales) = √0.44 = 0.6633.
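The mean, variance, and standard deviation in 5.15 follow mechanically from the pmf; a sketch:

```python
from math import sqrt

# pmf of the number of sales, as given in 5.15.
pmf = {0: 0.7, 1: 0.2, 2: 0.1}

mean = sum(x * p for x, p in pmf.items())
var = sum(x**2 * p for x, p in pmf.items()) - mean**2  # E(X^2) - [E(X)]^2
sd = sqrt(var)

print(round(mean, 1), round(var, 2), round(sd, 4))  # 0.4 0.44 0.6633
```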
5.17 a. Let X = weekly number of breakdowns. Using Tchebysheff's theorem, we have
P(µ − kσ < X < µ + kσ) ≥ 1 − 1/k²
For 1 − 1/k² = 0.9, we have k = √10, and thus the desired interval is
(µ − kσ, µ + kσ) = [4 − √10(0.8), 4 + √10(0.8)] = (1.4702, 6.5298).

b. Eight breakdowns is (8 − µ)/σ = (8 − 4)/0.8 = 5 standard deviations from the mean. The interval
(µ − 5σ, µ + 5σ) or (0, 8) must contain at least 1 − 1/5² = 0.96 of the probability. Thus, at
most 4% of the probability mass can exceed 8 breakdowns and the director is safe in his claim.

5.19 a. Let X = battery performance period. Using Tchebysheff's theorem, we have
P(µ − kσ < X < µ + kσ) ≥ 1 − 1/k²
Solving 1 − 1/k² = 0.9 gives k = √10; thus the desired interval is
(µ − √10σ, µ + √10σ) = [100 − √10(5), 100 + √10(5)] = (84.1886, 115.8114).

b. No, since 80 ∉ (84.1886, 115.8114). Also, note that 80 is (80 − 100)/5 = −4 standard deviations
from the mean. Then P(X ≤ µ − 4σ) ≤ P(|X − µ| ≥ 4σ) ≤ 1/4² = 0.0625. Therefore, one would
expect less than 6.25% of batteries to die out in less than 80 minutes.

5.4 The Binomial Distribution


 
5.21 a. P(X = 2) = C(4,2)(0.2)²(0.8)² = 0.1536

b. P(X ≥ 2) = P(X = 2) + P(X = 3) + P(X = 4) = Σ (x = 2 to 4) C(4,x)(0.2)ˣ(0.8)⁴⁻ˣ
= C(4,2)(0.2)²(0.8)² + C(4,3)(0.2)³(0.8) + C(4,4)(0.2)⁴
= 0.1536 + 0.0256 + 0.0016 = 0.1808

c. P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2) = Σ (x = 0 to 2) C(4,x)(0.2)ˣ(0.8)⁴⁻ˣ
= C(4,0)(0.8)⁴ + C(4,1)(0.2)(0.8)³ + C(4,2)(0.2)²(0.8)²
= 0.4096 + 0.4096 + 0.1536 = 0.9728

d. E(X) = np = 4(0.2) = 0.8
e. V(X) = np(1 − p) = 4(0.2)(0.8) = 0.64

5.23 Let X = number of underfilled boxes. Then X has a binomial distribution with parameters
n = 25, p as given.
a. P (X ≤ 2) = 0.537
b. P (X ≤ 2) = 0.098

5.25 a. E(X) = np = 20(0.8) = 16


b. V(X) = np(1 − p) = 20(0.8)(0.2) = 3.2

5.27 P(X ≥ 5) = 1 − P(X ≤ 4) > 0.9 ⇒ P(X ≤ 4) < 0.1

Using the formula
P(X ≤ 4) = Σ (x = 0 to 4) C(n,x)(0.8)ˣ(0.2)ⁿ⁻ˣ
we find

n   P(X ≤ 4)
6   0.34464
7   0.14803
8   0.05628

Therefore, at least n = 8 people must donate blood for the probability of having at least 5 Rh+
donors to be greater than 0.9.
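The trial-and-error search over n in 5.27 can be automated; a sketch:

```python
from math import comb

# Find the smallest n with P(X <= 4) < 0.1 when X ~ Binomial(n, 0.8),
# i.e. P(at least 5 Rh+ donors) > 0.9, as in 5.27.
def cdf_at_4(n, p=0.8):
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(5))

n = 5
while cdf_at_4(n) >= 0.1:
    n += 1
print(n, round(cdf_at_4(n), 5))  # 8 0.05628
```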

5.29 Let X = number of firms out of a sample of five that say "quality of life" is an important factor.
Then, assuming independence among firms, X has a binomial distribution with parameters n = 5,
p = 0.55, and
P(X ≥ 3) = Σ (x = 3 to 5) C(5,x)(0.55)ˣ(0.45)⁵⁻ˣ
= C(5,3)(0.55)³(0.45)² + C(5,4)(0.55)⁴(0.45) + C(5,5)(0.55)⁵
= 0.3369 + 0.2058 + 0.0503 = 0.5931.

5.31 Let X = number of radar sets out of n that detect an intruding aircraft. Then X has a binomial
distribution with parameters n and p = 0.9.
a. P(X ≥ 1) = 1 − P(X = 0) = 1 − C(2,0)(0.9)⁰(0.1)² = 0.99
b. P(X ≥ 1) = 1 − P(X = 0) = 1 − C(4,0)(0.9)⁰(0.1)⁴ = 0.9999

5.33 Let X = number of components out of the four that last longer than 1000 hours. The probability
that a given component lasts longer than 1000 hours is 0.8; thus X has a binomial distribution
with parameters n = 4, p = 0.8.
 
a. P(X = 2) = C(4,2)(0.8)²(0.2)² = 0.1536
b. P(X ≥ 2) = 1 − P(X ≤ 1) = 1 − C(4,0)(0.8)⁰(0.2)⁴ − C(4,1)(0.8)¹(0.2)³
= 1 − 0.0016 − 0.0256 = 0.9728

5.35 Y has a binomial distribution with parameters n = 4, p = 0.1.


E(C) = E(3Y 2 + Y + 2) = 3E(Y 2 ) + E(Y ) + 2
= 3(V (Y ) + [E(Y )]2 ) + E(Y ) + 2 = 3np(1 − p) + 3(np)2 + np + 2
= 3(4)(0.1)(0.9) + 3[(4)(0.1)]2 + 4(0.1) + 2 = 3.96

5.37 Let X = number of defective motors out of ten in the warehouse. Then X is binomially distributed
with n = 10, p = 0.08. Let Y = net gain = (selling price for the ten motors) − (twice the selling
price of a motor) · X = 10(100) − 200X
= 1,000 − 200X.
E(Y ) = E(1, 000 − 200X) = 1, 000 − 200E(X) = 1, 000 − 200np = 1, 000
−200(10)(0.08) = 840

5.5 The Geometric and Negative Binomial Distributions

5.39 a. P(Y ≥ 4) = 1 − P(Y ≤ 3) = 1 − P(Y = 2) − P(Y = 3)
= 1 − C(1,1)(0.4)²(0.6)⁰ − C(2,1)(0.4)²(0.6)¹ = 1 − 0.16 − 2(0.4)²(0.6)
= 0.648

b P(Y = y) is nonzero only for y = r, r + 1, r + 2, .... Therefore, for r = 4,
P(Y ≥ 4) = Σ (y = 4 to ∞) p(y) = 1.

5.41 Let Y = the trial on which the third nondefective engine is found. Then Y has a negative binomial
distribution, with p = 0.9, r = 3.
a. P(Y = 5) = p(5) = C(y−1, r−1) pʳ (1 − p)ʸ⁻ʳ = C(4,2)(0.9)³(0.1)² = 0.04374
b. P(Y ≤ 5) = P(Y = 3) + P(Y = 4) + P(Y = 5) = p(3) + p(4) + p(5)
= C(2,2)(0.9)³(0.1)⁰ + C(3,2)(0.9)³(0.1)¹ + C(4,2)(0.9)³(0.1)²
= 0.729 + 0.2187 + 0.04374 = 0.99144
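The negative binomial probabilities in 5.41 can be checked directly from the pmf formula; a sketch:

```python
from math import comb

# Negative binomial pmf: P(Y = y) = C(y-1, r-1) p^r (1-p)^(y-r),
# where Y is the trial of the r-th success; here p = 0.9, r = 3 as in 5.41.
def nbinom_pmf(y, r=3, p=0.9):
    return comb(y - 1, r - 1) * p**r * (1 - p)**(y - r)

print(round(nbinom_pmf(5), 5))                          # 0.04374
print(round(sum(nbinom_pmf(y) for y in (3, 4, 5)), 5))  # 0.99144
```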

5.43 a. Let Y be defined as in Exercise 5.40. Then Y has a geometric distribution with p = 0.9, and
E(Y) = 1/p = 10/9
V(Y) = (1 − p)/p² = 0.1/(0.9)² = 0.1235.

b. Let Y be defined as in Exercise 5.41. Then Y has a negative binomial distribution with parameters
p = 0.9, r = 3, and
E(Y) = r/p = 3/0.9 = 3.33
V(Y) = r(1 − p)/p² = 0.3/(0.9)² = 0.3704.

5.45 Let total cost = C. Then C = 20Y, and

E(C) = 20E(Y) = 20(r/p) = 20(3/0.4) = 150.
V(C) = V(20Y) = 20²V(Y) = 400 · r(1 − p)/p² = 400 · (3)(0.6)/(0.4)² = 4,500
Standard deviation of C = √V(C) = √4,500 = 67.082

Using Tchebysheff's theorem, we have
P(C > 350) = P(C − 150 > 350 − 150) ≤ P(|C − 150| ≥ 200) = P(|C − 150| ≥ (200/67.082)(67.082))
≤ (67.082/200)² = 0.1125. Therefore, it is unlikely that the cost will exceed $350.

Note also that P(C > 350) may be computed exactly as

P(C > 350) = P(20Y > 350) = P(Y > 17.5) = 1 − P(Y ≤ 17)
= 1 − Σ (y = 3 to 17) C(y−1, 2)(0.4)³(0.6)ʸ⁻³.

5.47 a. Let Y = number of the well in which oil was first struck. Then Y has a geometric distribution
with parameter p = 0.2, so
P (Y = 3) = (1 − p)(3−1) p1 = (0.8)2 (0.2)1 = 0.128.
b. Let Y = number of the well in which the third oil strike occurs. Then Y has a negative binomial
distribution with parameters p = 0.2, r = 3, so
 
P(Y = 5) = C(4,2)(0.2)³(0.8)² = 0.03072.
The solutions to parts (a) and (b) require the assumption of independence of the wells.

5.49 Let Y = number of tires that must be selected in order to get four good ones. Then Y has a
negative binomial distribution with parameters p = 0.9, r = 4.
 
a. P(Y = 6) = C(5,3)(0.9)⁴(0.1)² = 0.06561
b. E(Y) = r/p = 4/0.9 = 4.4444
c. V(Y) = r(1 − p)/p² = 4(0.1)/(0.9)² = 0.4938

5.51 a. Let Y = number of customers it takes to sell the three white appliances. Then Y has a negative
binomial distribution with parameters p = 1/2, r = 3, and
P(Y = 5) = C(4,2)(1/2)³(1/2)² = 6(1/8)(1/4) = 3/16.
b. Let X = number of customers it takes to sell the brown appliances. Then X has the same dis-
tribution as Y, a negative binomial distribution with parameters p = 1/2, r = 3, and P(X = 5) =
P(Y = 5) = 3/16.
c. P(Y = 3) = C(2,2)(1/2)³(1/2)⁰ = (1/2)³ = 1/8
d. P(all the whites ordered before all browns) = P(Y ≤ 5)
= p(3) + p(4) + p(5) = 1/8 + C(3,2)(1/2)³(1/2) + 3/16
= 2/16 + 3/16 + 3/16 = 1/2

5.6 The Poisson Distribution


5.53 Let Y = number of calls arriving in a given one-minute period. Then Y has a Poisson distribution
with parameter λ = 4.
a. P(Y = 0) = p(0) = (4^0/0!) e^{−4} = e^{−4} = 0.0183
0!
b. P (Y ≥ 2) = 1 − P (Y ≤ 1) = 1 − F (1) = 1 − 0.092 = 0.908
c. Let X = number of calls arriving in a given two-minute period. Then X has a Poisson distribution
with parameter λ = 2(4) = 8 and P (X ≥ 2)
= 1 − F (1) = 1 − 0.003 = 0.997.

5.55 Let Y = number of fatalities per 10^9 vehicle miles with NMSL in effect. Then Y has a Poisson
distribution with parameter λ = 16.
a. P (Y ≤ 15) = F (15) = 0.467
b. P (Y ≥ 20) = 1 − P (Y ≤ 19) = 1 − F (19) = 1 − 0.812 = 0.188

5.57 a. Let Y = number of teleport inquiries in one millisecond. Then Y has a Poisson distribution with
parameter λ = 0.2 and
P(Y = 0) = ((0.2)^0/0!) e^{−0.2} = e^{−0.2} = 0.8187.
b. Let X = number of teleport inquiries in three milliseconds. Then X has a Poisson distribution
with parameter λ = 3(0.2) = 0.6 and
P(X = 0) = ((0.6)^0/0!) e^{−0.6} = e^{−0.6} = 0.5488.

5.59 Let Y = number of customer arrivals in a given hour. Then Y has a Poisson distribution with
λ = 8.
a. P (Y = 8) = P (Y ≤ 8) − P (Y ≤ 7) = 0.593 − 0.453 = 0.140
b. P (Y ≤ 3) = 0.042
c. P (Y ≥ 2) = 1 − P (Y ≤ 1) = 1 − 0.003 = 0.997
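The cumulative-table lookups used in 5.53–5.61 can be reproduced with `scipy.stats.poisson` (assuming SciPy is available); a sketch for Exercise 5.59 with λ = 8:

```python
from scipy.stats import poisson

lam = 8
p_eq_8 = poisson.cdf(8, lam) - poisson.cdf(7, lam)   # P(Y = 8) ≈ 0.140
p_le_3 = poisson.cdf(3, lam)                         # P(Y <= 3) ≈ 0.042
p_ge_2 = 1 - poisson.cdf(1, lam)                     # P(Y >= 2) ≈ 0.997
print(p_eq_8, p_le_3, p_ge_2)
```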

5.61 a. Let X = number of customers that arrive in a given two-hour period of time. Then X has a
Poisson distribution with λ = 2(8) = 16 and
P(X = 2) = (16^2/2!) e^{−16} = 128 e^{−16} = 1.44 × 10^{−5}.

b. The two one-hour time periods are nonoverlapping, and therefore X = total number of customers
that arrive in the given two-hour time period has a Poisson distribution with λ = 2(8) = 16, and,
as for part (a), P (X = 2)
= 1.44 × 10−5 .

Consistent with this answer, note the following. Let Y1 = number of customers that arrive 1−2
pm, and Y2 = number of customers that arrive 3−4 pm. Then Y1 and Y2 are each distributed as
Poisson with λ = 8 and

P (Y1 + Y2 = 2) = P (Y1 = 0, Y2 = 2) + P (Y1 = 1, Y2 = 1) + P (Y1 = 2, Y2 = 0)


= 2 p(0)p(2) + [p(1)]^2 = 2 (e^{−8}) ((8^2/2!) e^{−8}) + ((8/1!) e^{−8})^2
= 7.2 × 10^{−6} + 7.2 × 10^{−6} = 1.44 × 10^{−5}.

5.63 Let X = number of imperfections in an eight-square yard sample. Then X has a Poisson distri-
bution with λ = 8(4) = 32. Let C = 10X = cost of repair. Then
E(C) = 10E(X) = 10λ = 10(32) = 320
V (C) = (10)2 V (X) = 100λ = 100(32) =3,200
The standard deviation of C = sqrt(V(C)) = sqrt(3,200) = 40 sqrt(2) = 56.5685

5.65 E[Y(Y − 1)] = Σ_{y=0}^{∞} y(y − 1) λ^y e^{−λ}/y! = λ^2 e^{−λ} Σ_{y=2}^{∞} λ^{y−2}/(y − 2)! = λ^2 e^{−λ} Σ_{x=0}^{∞} λ^x/x!
= λ^2 e^{−λ} e^{λ} = λ^2
V(Y) = E(Y^2) − [E(Y)]^2 = E(Y^2) − E(Y) + E(Y) − [E(Y)]^2
= E[Y(Y − 1)] + E(Y) − [E(Y)]^2
= λ^2 + λ − λ^2 = λ

5.67 a. Let Y = number of cars arriving in the first hour. Then Y has a Poisson distribution with λ = 4.


P (Y ≥ 12) = 1 − P (Y ≤ 11) = 1 − 0.999 = 0.001

b. Let X = the number of cars arriving in a given eight hours. Then X has a Poisson distribution
with λ = 8(4) = 32 and

P(X ≤ 11) = Σ_{x=0}^{11} 32^x e^{−32}/x! = e^{−32} Σ_{x=0}^{11} 32^x/x!
= e^{−32} (1 + 32 + 32^2/2! + 32^3/3! + ... + 32^{11}/11!)
= e^{−32} (1.345732 × 10^9) = 0.000017.
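The final sum is tedious by hand; a standard-library sketch:

```python
from math import exp, factorial

lam = 32
partial_sum = sum(lam**x / factorial(x) for x in range(12))   # 1 + 32 + ... + 32^11/11!
p_le_11 = exp(-lam) * partial_sum                             # P(X <= 11)
print(p_le_11)
```

The result is about 1.7 × 10⁻⁵, matching the value above.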

5.7 The Hypergeometric Distribution


5.69 Let Y = number of nondefectives in sample of 5. Then Y has a hypergeometric distribution with
parameters k = 6, n = 5, N = 10, and
P(Y = 5) = p(5) = C(6,5)C(4,0)/C(10,5) = 6/252 = 1/42.

5.71 Let Y = number of local firms selected. Then Y has a hypergeometric distribution with parameters
k = 4, n = 3, N = 6.
a. P(at least one not local) = P(not all local) = 1 − P(Y = 3) = 1 − p(3)
= 1 − C(4,3)C(2,0)/C(6,3) = 1 − 4/20 = 4/5
b. P(Y = 3) = C(4,3)C(2,0)/C(6,3) = 4/20 = 1/5

5.73 Y has a hypergeometric distribution with parameters N = 10, n = 3, and k as given.


a. k = 2:
p(0) = C(2,0)C(8,3)/C(10,3) = 56/120 = 7/15
p(1) = C(2,1)C(8,2)/C(10,3) = 56/120 = 7/15
p(2) = C(2,2)C(8,1)/C(10,3) = 8/120 = 1/15

b. k = 4:
p(0) = C(4,0)C(6,3)/C(10,3) = 20/120 = 1/6
p(1) = C(4,1)C(6,2)/C(10,3) = 60/120 = 1/2
p(2) = C(4,2)C(6,1)/C(10,3) = 36/120 = 3/10
p(3) = C(4,3)C(6,0)/C(10,3) = 4/120 = 1/30
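Both pmf tables can be verified against `scipy.stats.hypergeom` (assuming SciPy is available; its argument order is population size M, number of success states n, sample size N):

```python
from math import comb
from scipy.stats import hypergeom

N_pop, n_draw = 10, 3
for k in (2, 4):
    for y in range(min(k, n_draw) + 1):
        # direct combinatorial computation, as in the tables above
        exact = comb(k, y) * comb(N_pop - k, n_draw - y) / comb(N_pop, n_draw)
        assert abs(hypergeom.pmf(y, N_pop, k, n_draw) - exact) < 1e-12
print(hypergeom.pmf(2, N_pop, 2, n_draw))   # = 1/15
```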

5.75 Let Y = number of misfiring plugs among the four removed. Then Y has a hypergeometric
distribution with N = 8, n = 4, k = 2, and
P(Y = 2) = C(2,2)C(6,2)/C(8,4) = 15/70 = 3/14.

5.77 Let Y = number of accounts past due that the auditor sees. Then Y has a hypergeometric
distribution with N = 8, n = 3, k as given, and
P(Y ≥ 1) = 1 − P(Y = 0) = 1 − C(k,0)C(8 − k,3)/C(8,3) = 1 − C(8 − k,3)/56.
a. k = 2: P(Y ≥ 1) = 1 − C(6,3)/56 = 1 − 20/56 = 9/14
b. k = 4: P(Y ≥ 1) = 1 − C(4,3)/56 = 1 − 4/56 = 13/14
c. k = 7: P(Y ≥ 1) = 1 − C(1,3)/56 = 1 − 0 = 1, since the auditor must choose at least two
past-due accounts.

5.79 Note that Y has a hypergeometric distribution with parameters N = 20, n = 5, and k as given.
a. k = 0: P(Y ≤ 1) = 1
b. k = 1: P(Y ≤ 1) = 1
c. k = 2: P(Y ≤ 1) = p(0) + p(1) = C(2,0)C(18,5)/C(20,5) + C(2,1)C(18,4)/C(20,5) = 21/38 + 15/38 = 18/19
d. k = 3: P(Y ≤ 1) = p(0) + p(1) = C(3,0)C(17,5)/C(20,5) + C(3,1)C(17,4)/C(20,5) = 91/228 + 105/228 = 49/57
e. k = 4: P(Y ≤ 1) = p(0) + p(1) = C(4,0)C(16,5)/C(20,5) + C(4,1)C(16,4)/C(20,5) = 1092/3876 + 1820/3876 = 728/969

5.81 Let Y = number of defectives from line I. Then Y has a hypergeometric distribution with
parameters N = 10, n = 5, k = 4, and
P(Y = 2) = C(4,2)C(6,3)/C(10,5) = 120/252 = 10/21.

5.8 The Moment-Generating Function


5.83 M(t) = E(e^{tY}) = Σ_{y=0}^{n} e^{ty} C(n,y) p^y (1 − p)^{n−y} = Σ_{y=0}^{n} C(n,y) (pe^t)^y (1 − p)^{n−y}
= [pe^t + (1 − p)]^n,
since, using the binomial theorem, we have
Σ_{y=0}^{n} C(n,y) a^y b^{n−y} = (a + b)^n.

Therefore,
E(Y) = M′(0) = n[pe^t + (1 − p)]^{n−1} pe^t |_{t=0} = np
E(Y^2) = M″(0) = n(n − 1)[pe^t + (1 − p)]^{n−2} (pe^t)^2 + n[pe^t + (1 − p)]^{n−1} pe^t |_{t=0}
= n(n − 1)p^2 + np
V(Y) = E(Y^2) − [E(Y)]^2 = n(n − 1)p^2 + np − (np)^2 = np(1 − p)
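The derivative computations can be sanity-checked numerically by differentiating M(t) with central differences at t = 0; a sketch (n = 10 and p = 0.3 are arbitrary illustrative values, not from an exercise):

```python
from math import exp

n, p = 10, 0.3
M = lambda t: (p * exp(t) + (1 - p)) ** n     # binomial mgf derived above
h = 1e-4
m1 = (M(h) - M(-h)) / (2 * h)                 # central difference ~ M'(0) = np
m2 = (M(h) - 2 * M(0) + M(-h)) / h**2         # ~ M''(0) = n(n-1)p^2 + np
print(m1, m2 - m1**2)                         # ~ np and np(1-p)
```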

5.85 MY (t) = E(etY ) = E(et(aX+b) ) = E[etb e(at)X ] = etb E[e(at)X ] = etb MX (at)

5.10 Supplementary Exercises

5.87 Binomial probability histograms for n=5 and p=0.1, 0.5 and 0.9

5.89 Let Y = number of radar sets out of the five that detect the plane. Then Y has a binomial
distribution with parameters n = 5, p = 0.9.
P(Y = 4) = C(5,4)(0.9)^4 (0.1) = 0.32805
and
P(Y ≥ 1) = 1 − P(Y = 0) = 1 − C(5,0)(0.9)^0 (0.1)^5 = 0.99999
 
5.91 P(Y ≤ a) = C(5,0) p^0 (1 − p)^5 = (1 − p)^5
a. (1 − p)^5 = (1)^5 = 1
b. (1 − p)^5 = (0.9)^5 = 0.5905
c. (1 − p)^5 = (0.7)^5 = 0.1681
d. (1 − p)^5 = (0.5)^5 = 0.03125
e. (1 − p)^5 = (0)^5 = 0

5.93 Operating characteristic curve for n=5 and a=1:

p P (Y ≤ a)
0.05 0.9774
0.10 0.9185
0.20 0.7373
0.30 0.5282
0.40 0.3370
Operating characteristic curve for n=25 and a=5:

p P (Y ≤ a)
0.05 0.9988
0.10 0.9666
0.20 0.6167
0.30 0.1935
0.40 0.0294

a. n = 25, a = 5
b. n = 25, a = 5
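Both operating-characteristic tables can be regenerated from the binomial cdf; a standard-library sketch:

```python
from math import comb

def oc(p, n, a):
    """Operating characteristic P(Y <= a) for Y ~ Binomial(n, p)."""
    return sum(comb(n, y) * p**y * (1 - p)**(n - y) for y in range(a + 1))

# reproduce the two tables above: (n=5, a=1) and (n=25, a=5)
for p in (0.05, 0.10, 0.20, 0.30, 0.40):
    print(p, round(oc(p, 5, 1), 4), round(oc(p, 25, 5), 4))
```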

5.95 Let Y = number of colonies in a given dish. Then Y has a Poisson distribution with a mean of
λ = 12
a. P (Y ≥ 10) = 1 − P (Y ≤ 9) = 1 − 0.242 = 0.758
b. E(Y) = λ = 12, and the standard deviation of Y = sqrt(V(Y)) = sqrt(λ) = sqrt(12) = 3.4641
c. Using Tchebysheff's theorem, we have
P[E(Y) − 2 sqrt(V(Y)) < Y < E(Y) + 2 sqrt(V(Y))] ≥ 1 − 1/2^2 = 0.75
i.e., the desired interval is [12 − 2(3.4641), 12 + 2(3.4641)] = (5.0718, 18.9282).

5.97 Here, Y has a Poisson distribution with mean λp = 100(0.05) = 5.


a. E(Y ) = λp = 5
b. P (Y = 0) = 0.007
c. P (Y > 5) = 1 − P (Y ≤ 5) = 1 − 0.616 = 0.384

5.99 Let Y = number of left-turning vehicles out of n vehicles arriving while the light is red. Then Y
has a binomial distribution with parameters n = 5, p = 0.2, P (Y ≤ 3) = 0.993. This number may
be computed directly as
P(Y ≤ 3) = 1 − P(Y ≥ 4) = 1 − P(Y = 4) − P(Y = 5)
= 1 − C(5,4)(0.2)^4 (0.8) − C(5,5)(0.2)^5 = 0.993.

5.101 a. Using the binomial theorem, we have
Σ_y p(y) = Σ_{y=0}^{n} C(n,y) p^y (1 − p)^{n−y} = [p + (1 − p)]^n = 1^n = 1
b. Σ_y p(y) = Σ_{y=1}^{∞} (1 − p)^{y−1} p = p Σ_{y=0}^{∞} (1 − p)^y
= p (1/(1 − (1 − p)))    (since Σ_{x=0}^{∞} a^x = 1/(1 − a) for |a| < 1)
= p (1/p) = 1
c. Noting that Σ_{x=0}^{∞} λ^x/x! = e^λ, we have
Σ_y p(y) = Σ_{y=0}^{∞} λ^y e^{−λ}/y! = e^{−λ} Σ_{y=0}^{∞} λ^y/y! = e^{−λ} e^λ = 1.

5.103 Let Y = total number of requests for welding units until the third brand-A unit is used. Then Y
has a negative binomial distribution with parameters r = 3, p = 0.7, and
P(Y = 5) = C(5 − 1, 3 − 1)(0.7)^3 (0.3)^{5−3} = 6(0.7)^3 (0.3)^2 = 0.18522.

5.105 Let Y = number of people that have to be interviewed before encountering a consumer who prefers
brand A. Then Y has a geometric distribution with parameter p = 0.6.
P(Y = 5) = (1 − p)^{5−1} p = (0.4)^4 (0.6) = 0.01536
and
P(Y ≥ 5) = 1 − P(Y ≤ 4) = 1 − p(1) − p(2) − p(3) − p(4)
= 1 − (0.6) − (0.4)(0.6) − (0.4)^2 (0.6) − (0.4)^3 (0.6)
= 1 − (0.6) − (0.24) − (0.096) − (0.0384) = 0.0256

5.107 Note that Y has a binomial distribution with parameters n = 1,000, p = 0.9, so that
E(Y ) = np =1,000(0.9) = 900, and
V (Y ) = np(1 − p) =1,000(0.9)(0.1) = 90.
Using Tchebysheff's theorem with k = 2, we have P(µ − 2σ < Y < µ + 2σ) ≥ 1 − 1/2^2; i.e.,
P(900 − 2 sqrt(90) < Y < 900 + 2 sqrt(90)) = P(881.026 < Y < 918.974) ≥ 0.75.

5.109 a. Note that Y has a binomial distribution with parameters n = 4, p = 1/3, and probability function
p(y) = C(4,y) (1/3)^y (2/3)^{4−y}.
p(0) = (2/3)^4 = 16/81
p(1) = 4(1/3)(2/3)^3 = 2(2/3)^4 = 32/81
p(2) = 6(1/3)^2 (2/3)^2 = (2/3)^3 = 24/81
p(3) = 4(1/3)^3 (2/3) = (1/3)(2/3)^3 = 8/81
p(4) = (1/3)^4 = 1/81
b. P(Y ≥ 3) = p(3) + p(4) = (1/3)(2/3)^3 + (1/3)^4 = 9/81 = 1/9
c. E(Y) = np = 4(1/3) = 4/3
d. V(Y) = np(1 − p) = 4(1/3)(2/3) = 8/9

5.111 a. Here, Y has a hypergeometric distribution with parameters N = 100, n = 20, k = 40, and
p(10) = C(40,10)C(60,10)/C(100,20) = 0.1192.
b. Here, Y has a binomial distribution with parameters n = 20, p = 0.40.
p(10) = F(10) − F(9) = 0.872 − 0.755 = 0.117
Thus it appears that N is large enough that the binomial probability function is a good
approximation to the hypergeometric probability function.

5.113 Let Y = number of items sold on a given day, P = daily profit, and X = number of items stocked.
Note that for Y ≤ X
P = 1.2Y − X
E(P ) = 1.2E(Y ) − X
and
E(Y |X = 1) =1
E(Y |X = 2) =2
E(Y |X = 3) = 2p(2) + 3P (Y ≥ 3) = 2(0.1) + 3(0.9) = 2.9
E(Y |X = 4) = 2p(2) + 3p(3) + 4p(4) = 2(0.1) + 3(0.4) + 4(0.5) = 3.4.
Hence
E(P |X = 1) = 1.2(1) − 1 = 0.2
E(P |X = 2) = 1.2(2) − 2 = 0.4
E(P |X = 3) = 1.2(2.9) − 3 = 0.48
E(P |X = 4) = 1.2(3.4) − 4 = 0.08.

Therefore, expected profit is maximized at X = 3.

5.115 Number of combinations = 26 · 26 · 10 · 10 · 10 · 10 = 6,760,000

E(winnings) = $100,000 (1/6,760,000) + $50,000 (2/6,760,000) + $1,000 (10/6,760,000)
= $0.031065
Therefore, it appears that the expected value of the coupon is considerably less than the price
of a stamp. However, one might also consider what the probability of winning would be if the
coupon weren’t mailed back.
Chapter 6

Continuous Probability Distributions

6.1 Continuous Random Variables and Their Probability Distributions
6.1 a Let Y = number of field plots out of ten that contain the insect. Y is a discrete random variable.
b Let Y = number of defects in a given sampled section. Let X = number of sampled sections
containing at least 5 defects. Both Y and X are discrete random variables.
c Let Y = number of grains seen in a given cross section. Then Y is a discrete random variable.
d Let A = area proportion covered by grains of a certain size. A is a continuous random variable.
6.3 a P(X > 3) = ∫_3^∞ f(x)dx = ∫_3^6 (3/32)(8x − x^2 − 12)dx = (3/32)(4x^2 − x^3/3 − 12x) |_3^6
= 27/32 = 0.84375
b 0.5 = P(X > b) = ∫_b^6 (3/32)(8x − x^2 − 12)dx = (3/32)(4x^2 − x^3/3 − 12x) |_b^6
= 0 − (3/32)(4b^2 − b^3/3 − 12b)
i.e., b is the solution to 0 = b^3 − 12b^2 + 36b − 16; i.e., b = 4. Alternatively, note that the density
of X is symmetric about x = 4, so P(X > 4) = P(X ≤ 4) = 1/2.


6.5 a Graph of f(x):

b F(x) = 0, x < 5
F(x) = ∫_5^x (3/8)(7 − y)^2 dy = ∫_{−2}^{x−7} (3/8)w^2 dw = (w^3/8) |_{−2}^{x−7} = (x − 7)^3/8 + 1, 5 ≤ x ≤ 7
F(x) = 1, x > 7
c P(X < 6) = F(6) = (6 − 7)^3/8 + 1 = 7/8
d P(X < 5.5|X < 6) = P(X < 5.5)/P(X < 6) = [((5.5 − 7)^3 + 8)/8] / (7/8) = 37/56
6.7 a P(X > 1/2) = ∫_{1/2}^1 2x dx = x^2 |_{1/2}^1 = 1 − 1/4 = 3/4
b P(X > 1/2 | X > 1/4) = P(X > 1/2, X > 1/4)/P(X > 1/4) = P(X > 1/2)/P(X > 1/4)
= (3/4) / ∫_{1/4}^1 2x dx = (3/4)/(15/16) = 4/5
c P(X > 1/4 | X > 1/2) = P(X > 1/4, X > 1/2)/P(X > 1/2) = P(X > 1/2)/P(X > 1/2) = 1
d F(x) = 0, x < 0
F(x) = ∫_0^x 2y dy = x^2, 0 ≤ x ≤ 1
F(x) = 1, x > 1
Yes, F(x) is continuous.


Graph of F (x):

6.2 Expected Values of Continuous Random Variables


6.9 E(X) = ∫_{59}^{61} x f(x)dx = ∫_{59}^{61} (x/2)dx = (x^2/4) |_{59}^{61} = [(61)^2 − (59)^2]/4 = 60
E(X^2) = ∫_{59}^{61} x^2 f(x)dx = ∫_{59}^{61} (x^2/2)dx = (x^3/6) |_{59}^{61} = [(61)^3 − (59)^3]/6 = 21,602/6
V(X) = E(X^2) − [E(X)]^2 = 21,602/6 − (60)^2 = 1/3
6.11 E(X) = ∫_{−∞}^{∞} x f(x)dx = ∫_2^6 (3/32) x(x − 2)(6 − x)dx = (3/32)(−x^4/4 + 8x^3/3 − 6x^2) |_2^6
= 4 hundred calories
6.13 a E(X) = ∫_5^7 (3/8) x(7 − x)^2 dx = (3/8)(49x^2/2 − 14x^3/3 + x^4/4) |_5^7 = 5.5
E(X^2) = ∫_5^7 (3/8) x^2 (7 − x)^2 dx = (3/8)(49x^3/3 − 7x^4/2 + x^5/5) |_5^7 = 30.4
V(X) = E(X^2) − [E(X)]^2 = 30.4 − (5.5)^2 = 0.15
b Using Tchebysheff's theorem, with k = 2, we have
P[E(X) − 2 sqrt(V(X)) < X < E(X) + 2 sqrt(V(X))] ≥ 1 − 1/(2)^2 = 0.75
so the desired interval is E(X) ± 2 sqrt(V(X)) = 5.5 ± 2 sqrt(0.15) = (4.7254, 6.2746).
80 Chapter 6: Continuous Probability Distributions

c P(X < 5.5) = ∫_{−∞}^{5.5} f(x)dx = ∫_5^{5.5} (3/8)(7 − x)^2 dx = (3/8)(49x − 7x^2 + x^3/3) |_5^{5.5}
= 0.5781
We would expect to see about 58% of the pH measurements to be below 5.5.

6.3 The Uniform Distribution



6.15 a F(x) = 0, x < a
F(x) = ∫_a^x 1/(b − a) dy = (x − a)/(b − a), a ≤ x ≤ b
F(x) = 1, x > b
b P(X > c) = ∫_c^b 1/(b − a) dy = (b − c)/(b − a)
c P(X > d|X > c) = P(X > d, X > c)/P(X > c) = P(X > d)/P(X > c)
= [(b − d)/(b − a)] / [(b − c)/(b − a)] = (b − d)/(b − c)
6.17 Here, X has a uniform distribution with parameters a = 0, b = 500.
a P(X > 475) = ∫_{475}^{500} (1/500) dx = (500 − 475)/500 = 1/20
b P(X < 25) = ∫_0^{25} (1/500) dx = (25 − 0)/500 = 1/20
c P(X < 250) = ∫_0^{250} (1/500) dx = (250 − 0)/500 = 1/2
6.19 Let X = time in seconds that the call arrived. Then X has a uniform distribution with a = 0,
b = 60, and P(X > 15) = ∫_{15}^{60} (1/60) dx = (60 − 15)/60 = 3/4.
6.21 Let X = hour of operation in which the defective board was produced. Since the number of
defectives is Poisson, then, given that one defective was produced, the actual time of occurrence
is equally likely in any small subinterval of time of a given size, and thus X has a uniform
distribution with parameters a = 0, b = 8.
a P(X < 1) = ∫_0^1 (1/8) dx = 1/8
b P(X > 7) = ∫_7^8 (1/8) dx = 1/8
c P(4 < X ≤ 5|X > 4) = P(4 < X ≤ 5, X > 4)/P(X > 4) = P(4 < X ≤ 5)/P(X > 4)
= [∫_4^5 (1/8) dx] / [∫_4^8 (1/8) dx] = (1/8)/(4/8) = 1/4
6.23 Let X = measurement error. Then X has a uniform distribution with parameters a = −0.02,
b = 0.05.
a P(−0.01 < X < 0.01) = ∫_{−0.01}^{0.01} 1/[0.05 − (−0.02)] dx = 0.02/0.07 = 2/7

b E(X) = (a + b)/2 = [0.05 + (−0.02)]/2 = 0.015
V(X) = (b − a)^2/12 = [0.05 − (−0.02)]^2/12 = 49/120,000 = 0.0004083.
6.25 Let X = time of arrival measured from the beginning of the 30-minute period. Since the number
of arrivals is Poisson, the time of the arrival is equally likely in any subinterval of time of a given
size in the 30 minutes, and thus X has a uniform distribution with parameters a = 0, b = 30, and
P(X > 25) = ∫_{25}^{30} (1/30) dx = (30 − 25)/30 = 1/6.
6.27 Let X = stopping distance. Then X has a uniform distribution with parameters a and b.
a P(X − a < b − X) = P(X < (a + b)/2) = ∫_a^{(a+b)/2} 1/(b − a) dx = [(a + b)/2 − a]/(b − a) = 1/2
b P[X − a > 3(b − X)] = P(X > (3b + a)/4) = ∫_{(3b+a)/4}^{b} 1/(b − a) dx = [b − (3b + a)/4]/(b − a) = 1/4
6.29 Let X = cycle time. Then X has a uniform distribution with parameters a = 50, b = 70.

a E(X) = (a + b)/2 = (50 + 70)/2 = 60
V(X) = (b − a)^2/12 = (70 − 50)^2/12 = 100/3
b Let T = number of trucks needed. Then we have E(X)/T = 15; i.e., 60/15 = 4 trucks are needed.

6.4 The Exponential Distribution


6.31 Let X = magnitude of the next earthquake.
a P(X > 3) = 1 − F(3) = 1 − (1 − e^{−3/θ}) = e^{−3/2.4} = e^{−5/4} = 0.2865.
b P(2 < X < 3) = F(3) − F(2) = (1 − e^{−3/2.4}) − (1 − e^{−2/2.4}) = e^{−5/6} − e^{−5/4}
= 0.1481.
6.33 Let X = water demand. Then X has an exponential distribution with parameter θ = 100.
a P(X > 200) = 1 − F(200) = 1 − (1 − e^{−200/100}) = e^{−2} = 0.1353
b Let k = maximum water-producing capacity. Then 0.01 = P(X > k) = 1 − F(k)
= 1 − (1 − e^{−k/100}) = e^{−k/100}. Hence k = −100 ln 0.01 = 460.52 cfs.
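Both parts can be checked with `scipy.stats.expon` (assuming SciPy is available; the exponential is parameterized by `scale` = θ):

```python
from scipy.stats import expon

theta = 100
p_exceed = expon.sf(200, scale=theta)     # P(X > 200) = e^{-2}
capacity = expon.isf(0.01, scale=theta)   # k with P(X > k) = 0.01, i.e. -100 ln 0.01
print(p_exceed, capacity)
```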

6.35 a Note that since ∫_0^∞ x^{n−1} e^{−x/θ} dx = Γ(n)θ^n, then for k integer valued we have
E(X^k) = (1/θ) ∫_0^∞ x^k e^{−x/θ} dx = (1/θ) Γ(k + 1)θ^{k+1} = θ^k k!

So
E(X) = (10)1! = 10
E(X^2) = (10)^2 2! = 200
E(X^3) = (10)^3 3! = 6,000
E(X^4) = (10)^4 4! = 240,000

and
E(C) = 100 + 40E(X) + 3E(X^2) = 100 + 40(10) + 3(200) = 1,100
E(C^2) = E{[100 + 40X + 3X^2]^2}
= 10,000 + 8,000E(X) + 2,200E(X^2) + 240E(X^3) + 9E(X^4)
= 10,000 + 8,000(10) + 2,200(200) + 240(6,000) + 9(240,000)
= 4,130,000
V(C) = E(C^2) − [E(C)]^2 = 4,130,000 − (1,100)^2 = 2,920,000
b P(C > 2,000) = P(3X^2 + 40X + 100 > 2,000) = P(3X^2 + 40X − 1,900 > 0)
= P[(X − r1)(X − r2) > 0]
where r1 = (10/3)(−2 + sqrt(61)) = 19.3675 and r2 = (10/3)(−2 − sqrt(61)) = −32.7. Therefore
P(C > 2,000) = P(X − r1 > 0, X − r2 > 0) + P(X − r1 < 0, X − r2 < 0)
= P(X > r1, X > r2) + P(X < r1, X < r2) = P(X > r1) + P(X < r2)
= P(X > r1) = 1 − (1 − e^{−r1/10}) = e^{−1.93675} = 0.1442
6.37 Let X = tire lifelength. Then X has an exponential distribution with parameter θ = 30.
a P (X > 30) = 1 − F (30) = 1 − (1 − e−30/30 ) = e−1 = 0.3679
b Using the result of Exercise 4.30, we have
P (X > 30|X > 15) = P (X > 30 − 15)
= 1 − F (15) = 1 − (1 − e−15/30 ) = e−1/2 = 0.6065.
6.39 If the number of breakdowns has a Poisson distribution with parameter λ = 0.5, then the waiting
time between breakdowns, X, has an exponential distribution with parameter θ = 1/λ = 1/0.5 = 2.
a P(X > 1) = 1 − F(1) = 1 − (1 − e^{−1/2}) = e^{−1/2} = 0.6065
b P(X > 1/2) = 1 − (1 − e^{−0.5/2}) = e^{−1/4} = 0.7788
2
c No; by the result of Exercise 4.30, the exponential distribution has the "memoryless" property.
Chapter 6: Continuous Probability Distributions 83

6.41 Let X = the weekly rainfall totals.


a P(X > 2) = 1 − F(2) = 1 − (1 − e^{−2/1.6}) = e^{−5/4} = 0.2865
b Let Y = number of weeks out of the next two in which rainfall doesn't exceed 2 inches. Then Y
has a binomial distribution with parameters n = 2, p = P(X ≤ 2) = F(2) = 1 − e^{−5/4}. Then
P(Y = 2) = C(2,2)(1 − e^{−5/4})^2 (e^{−5/4})^0 = 0.5091.
6.43 Let X = repair time.
a P(X < 10) = F(10) = 1 − e^{−10/22} = 0.3653
b P(30 < X < 60) = F(60) − F(30) = (1 − e^{−60/22}) − (1 − e^{−30/22}) = e^{−15/11} − e^{−30/11} = 0.1903
c Let k = the desired constant such that P(X > k) = 1 − F(k) = 1 − (1 − e^{−k/22}) = e^{−k/22} = 0.10.
Solving for k, we have k = −22 ln(0.10) = 50.66 minutes.

6.5 The Gamma Distribution


6.45 a Let X = summer rainfall total.
E(X) = αβ = 1.6(2) = 3.2
V(X) = αβ^2 = 1.6(2)^2 = 6.4
b Using Tchebysheff's theorem with k = 2, we have
0.75 = 1 − 1/2^2 ≤ P[E(X) − 2 sqrt(V(X)) < X < E(X) + 2 sqrt(V(X))]
= P(3.2 − 2 sqrt(6.4) < X < 3.2 + 2 sqrt(6.4))
= P(−1.86 < X < 8.26). Since rainfall totals are nonnegative, we have the interval (0, 8.26).
6.47 For k integer valued, note that
E(Y^k) = ∫_0^∞ [1/(Γ(α)β^α)] y^{α+k−1} e^{−y/β} dy
= [β^k Γ(α + k)/Γ(α)] ∫_0^∞ [1/(Γ(α + k)β^{α+k})] y^{α+k−1} e^{−y/β} dy = β^k Γ(α + k)/Γ(α)
a E(L) = 30E(Y) + 2E(Y^2) = 30αβ + 2β^2 Γ(α + 2)/Γ(α) = 30αβ + 2β^2 (α + 1)!/(α − 1)!
= 30(3)(2) + 2(2)^2 (24)/(2) = 276
V(L) = E(L^2) − [E(L)]^2 = E[(30Y + 2Y^2)^2] − (276)^2
= 900E(Y^2) + 120E(Y^3) + 4E(Y^4) − (276)^2
= 900(2)^2 Γ(5)/Γ(3) + 120(2)^3 Γ(6)/Γ(3) + 4(2)^4 Γ(7)/Γ(3) − (276)^2
= 900(4)(24/2) + 120(8)(120/2) + 4(16)(720/2) − (276)^2 = 47,664
b Using Tchebysheff's theorem, we want k such that 1 − 1/k^2 = 0.89; i.e., k ≈ 3. Then the
desired interval is [E(L) − 3 sqrt(V(L)), E(L) + 3 sqrt(V(L))] = (276 − 3 sqrt(47,664), 276 + 3 sqrt(47,664))
= (−378.963, 930.963). Since L is nonnegative, the interval is (0, 930.963).

6.49 Let Xi = time to completion of the given task, i = 1, 2. Then Xi has a Gamma distribution
with parameters α = 1, β = 10, and Y = X1 + X2 has a Gamma distribution with parameters
α = 2(1) = 2, β = 10.

a E(Y) = αβ = 2(10) = 20 and V(Y) = αβ^2 = 2(10)^2 = 200


b Let A = average time to completion of the two tasks = (X1 + X2)/2 = Y/2. Then
E(A) = (1/2)E(Y) = (1/2)(20) = 10 and V(A) = (1/4)V(Y) = (1/4)(200) = 50
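The additivity used here (two independent exponential(θ = 10) times summing to a Gamma with α = 2) can be checked with `scipy.stats.gamma` (assuming SciPy is available):

```python
from scipy.stats import gamma

# Y = X1 + X2 ~ Gamma(alpha=2, beta=10); scipy takes shape a = alpha, scale = beta
mean, var = gamma.stats(a=2, scale=10, moments="mv")
print(float(mean), float(var))   # 20 and 200, as computed above
```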
2 2 4
6.51 a Let Y = maximum river flow. Then Y has a Gamma distribution with parameters α = 1.6, β=
150. Therefore we have
E(Y) = αβ = 1.6(150) = 240
V(Y) = αβ^2 = 1.6(150)^2 = 36,000
Std dev(Y) = sqrt(V(Y)) = sqrt(36,000) = 189.74.
b Using Tchebysheff's theorem, we want k such that 1 − 1/k^2 = 8/9; i.e., k = 3. Then the
desired interval is [E(Y) − 3 sqrt(V(Y)), E(Y) + 3 sqrt(V(Y))] = (240 − 3 sqrt(36,000), 240 + 3 sqrt(36,000))
= (−329.21, 809.21). Since Y is nonnegative, the interval is (0, 809.21).
6.53 Since service times have a Gamma distribution with parameters α = 1, β = 3.2, then the service
time for three waiting customers, Y , has a Gamma distribution with parameters α = 3(1) = 3, β =
3.2, and
E(Y) = αβ = 3(3.2) = 9.6
V(Y) = αβ^2 = 3(3.2)^2 = 30.72
f(y) = [1/(3.2^3 Γ(3))] y^2 e^{−y/3.2} = (1/65.536) y^2 e^{−y/3.2}, y > 0; f(y) = 0, y ≤ 0.

6.6 The Normal Distribution


6.55 a P (0 ≤ Z ≤ 1.2) = 0.3849
b P (−0.9 ≤ Z ≤ 0) = P (0 ≤ Z ≤ 0.9) = 0.3159
c P (0.3 ≤ Z ≤ 1.56) = P (0 ≤ Z ≤ 1.56) − P (0 ≤ Z ≤ 0.3) = 0.4406
− 0.1179 = 0.3227
d P (−0.2 ≤ Z ≤ 0.2) = 2P (0 ≤ Z ≤ 0.2) = 2(0.0793) = 0.1586
e P (−2.00 ≤ Z ≤ −1.56) = P (1.56 ≤ Z ≤ 2.00) = P (0 ≤ Z ≤ 2.00)
− P (0 ≤ Z ≤ 1.56)
= 0.4772 − 0.4406 = 0.0366
6.57 Let X = amount spent on maintenance and repairs. Then X has a normal distribution with
parameters µ = 400, σ = 20, and
 
P(X > 450) = P[(X − 400)/20 > (450 − 400)/20] = P(Z > 2.5) = 0.5 − 0.4938
= 0.0062.

6.59 Let X = diameter. Then X has a normal distribution with parameters µ = 1.005, σ = 0.01, and
P(X < 0.98) + P(X > 1.02) = P(Z < (0.98 − 1.005)/0.01) + P(Z > (1.02 − 1.005)/0.01)
= P(Z < −2.5) + P(Z > 1.5) = (0.5 − 0.4938) + (0.5 − 0.4332)
= 0.0730.
6.61 Let X = resistances of wires produced by Company A. Then X has a normal distribution with
parameters µ = 0.13, σ = 0.005.
a P(0.12 < X < 0.14) = P[(0.12 − 0.13)/0.005 < Z < (0.14 − 0.13)/0.005]
= P(−2 < Z < 2) = 2P(0 < Z < 2) = 2(0.4772) = 0.9544
b Let Y = number of wires of a sample of four from Company A that meet specifications. Then Y
has a binomial distribution with parameters n = 4, p = 0.9544, and
P(Y = 4) = C(4,4)(0.9544)^4 (1 − 0.9544)^0 = 0.8297.
   
6.63 P(|X| > 5) = P(X < −5) + P(X > 5) = P(Z < (−5 − 0)/10) + P(Z > (5 − 0)/10)
= 2P(Z > 0.5) = 2(0.5 − 0.1915) = 0.6170
P(|X| > 10) = P(X < −10) + P(X > 10) = P(Z < (−10 − 0)/10) + P(Z > (10 − 0)/10)
= 2P(Z > 1) = 2(0.5 − 0.3413) = 0.3174
6.65 Let X = monthly sick-leave time. Then X has a normal distribution with parameters µ = 200,
σ = 20.
a P(X < 150) = P[(X − 200)/20 < (150 − 200)/20] = P(Z < −2.5) = 0.5 − 0.4938
= 0.0062
b Let x0 = desired time budgeted. Then P(X > x0) = P(Z > (x0 − 200)/20) = 0.1. Note that
P(Z > 1.28) = 0.1, so we have (x0 − 200)/20 = 1.28; i.e., x0 = 225.6 hours.
6.67 Let X = amount of fill per box. Then X has a normal distribution with parameters µ, σ = 1, and
P(X > 16) = P(Z > (16 − µ)/1) = 0.01. Note that P(Z > 2.33) = 0.01, so we have (16 − µ)/1 = 2.33;
i.e., µ = 13.67 ounces.
6.69 a Yes, it does appear that the total points can be modeled by a normal distribution.
b According to the empirical rule, 68% of the data should lie one standard deviation above and
below the mean and 95% of the data should lie within two standard deviations above and below
the mean. Hence, consider the interval (x̄ − s, x̄ + s) = (143 − 26, 143 + 26) = (117, 169).
Notice that more than 77% of the games had total scores within (117, 169). Now consider the
interval (x̄ − 2s, x̄ + 2s) = (143 − 2(26), 143 + 2(26)) = (91, 195). Notice that less than 5%
of the total scores fall outside of this region.
c No and no. A score of 200 is greater than two standard deviations away from the mean. Such a
score should occur less than 2.5% of the time, according to the empirical rule. A score of 250 is
greater than three standard deviations away from the mean, making it even less likely to occur.

d About 4 games.
6.71 a Q-Q plots for male and female distributions:

b Q-Q plots are relatively linear. Normal distribution is a good fit.


c From the Q-Q plots we can see that the mean height for males ≈ 70 (at z = 0), the standard
deviation for males ≈ 3 (the slope of the line), the mean height for females ≈ 65, and the standard
deviation for females ≈ 2.

6.7 The Lognormal Distribution


6.73 a E(X) = exp(µ + σ^2/2) = exp(2.5 + 0.5^2/2) = 13.8
V(X) = exp(2µ + σ^2)(exp(σ^2) − 1) = exp(2(2.5) + 0.5^2)(exp(0.5^2) − 1) = 54.13
b The z-value for ln 20 = 2.996 is (2.996 − 2.5)/0.5 = 1. The standard normal table gives
P(Z > 1) = 1 − P(Z < 1) = 1 − 0.8413 = 0.1587. Less than 16% of disks last over 20 weeks.
c The z-score corresponding to P(Z < a0) = 0.98 is approximately 2.05. Solving for x we have:
(ln(x) − µ)/σ = 2.05
(ln(x) − 2.5)/0.5 = 2.05
ln(x) = 2.05(0.5) + 2.5 = 3.525
x = exp(3.525) = 33.95
So 98% of disks fail before 33.95 weeks.
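These computations can be checked with `scipy.stats.lognorm` (assuming SciPy is available; its shape parameter s is σ and its `scale` is e^µ):

```python
from math import exp
from scipy.stats import lognorm

mu, sigma = 2.5, 0.5
disk = lognorm(s=sigma, scale=exp(mu))
print(disk.mean(), disk.var())   # ~13.8 and ~54.1
print(disk.sf(20))               # ~0.16 (0.1587 above comes from rounding z to 1)
print(disk.ppf(0.98))            # ~34 weeks (33.95 above, with z rounded to 2.05)
```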
6.75 a If 6 months ≈ 26 weeks, z = (ln(x) − µ)/σ = (ln 26 − 3.5)/1 = −0.24
P(Z > −0.24) = 1 − P(Z < −0.24) = 1 − 0.4052 = 0.5948, so about 59.5% of patients survive
for more than 6 months.
b If 12 months ≈ 52 weeks, z = (ln 52 − 3.5)/1 = 0.45
P(Z > 0.45) = 1 − P(Z < 0.45) = 1 − 0.6736 = 0.3264, so about 32.6% of patients survive for
more than 1 year.
c The z-score corresponding to P(Z > a0) = 0.05 is approximately 1.64. Solving for x we have:
(ln(x) − 3.5)/1 = 1.64
ln(x) = 1.64 + 3.5 = 5.14
x = exp(5.14) = 170.7
So only 5% of patients will survive for 170.7 weeks or more.

6.77 a P(lifetime > 217) = 0.95
b The z value for P(Z > a0) = 0.95 is approximately −1.64. Thus, we have:
−1.64 = (ln(x) − µ)/σ = (ln(217) − µ)/5
Solving this for µ we have µ = ln(217) + 1.64(5) = 13.58, and thus
E(X) = exp(µ + σ^2/2) = exp(13.58 + 5^2/2) = 2.12 × 10^{11}

6.8 The Beta Distribution

6.79 We were shown in the previous section that E(X) = α/(α + β).
E(X^2) = ∫_0^1 x^2 [Γ(α + β)/(Γ(α)Γ(β))] x^{α−1}(1 − x)^{β−1} dx = [Γ(α + β)/(Γ(α)Γ(β))] ∫_0^1 x^{α+2−1}(1 − x)^{β−1} dx
= [Γ(α + β)Γ(α + 2)/(Γ(α)Γ(α + 2 + β))] ∫_0^1 [Γ(α + β + 2)/(Γ(α + 2)Γ(β))] x^{α+2−1}(1 − x)^{β−1} dx
= Γ(α + β)Γ(α + 2)/[Γ(α)Γ(α + 2 + β)] = Γ(α + β)(α + 1)αΓ(α)/[Γ(α)(α + β + 1)(α + β)Γ(α + β)]
= α(α + 1)/[(α + β)(α + β + 1)]
Then we have
V(X) = E(X^2) − [E(X)]^2 = α(α + 1)/[(α + β)(α + β + 1)] − [α/(α + β)]^2
= [α(α + 1)(α + β) − α^2(α + β + 1)]/[(α + β)^2(α + β + 1)] = αβ/[(α + β)^2(α + β + 1)].
6.81 a P(X > 0.4) = ∫_{0.4}^1 12x^2(1 − x)dx = (4x^3 − 3x^4) |_{0.4}^1 = 0.8208
b E(V) = 5 − 0.5E(X) = 5 − 0.5 α/(α + β) = 5 − 0.5(3/(3 + 2)) = 4.7
V(V) = (0.5)^2 V(X) = 0.25 αβ/[(α + β)^2(α + β + 1)] = 0.25 (3)(2)/[(3 + 2)^2(3 + 2 + 1)]
= 0.01
6.83 Let X = measurement error. Then X has a Beta distribution with parameters α = 1, β = 2.
a P(X < 0.5) = ∫_0^{0.5} [Γ(1 + 2)/(Γ(1)Γ(2))] x^{1−1}(1 − x)^{2−1} dx = ∫_0^{0.5} 2(1 − x)dx
= (2x − x^2) |_0^{0.5} = 0.75
b E(X) = α/(α + β) = 1/3
V(X) = αβ/[(α + β)^2(α + β + 1)] = 2/[(1 + 2)^2(1 + 2 + 1)] = 1/18
So the standard deviation of X is sqrt(V(X)) = sqrt(1/18) = 0.2357.
6.85 Let X = proportion of pure iron, with X having a Beta distribution with parameters α = 3, β =
1.
a P(X > 0.5) = ∫_{0.5}^1 [Γ(4)/(Γ(3)Γ(1))] x^{3−1}(1 − x)^{1−1} dx = ∫_{0.5}^1 3x^2 dx = x^3 |_{0.5}^1 = 7/8
b Let Y = number of samples out of three that have less than 30% pure iron. Then Y has a binomial
distribution with parameters n = 3, p = P(X < 0.3) = ∫_0^{0.3} 3x^2 dx = x^3 |_0^{0.3} = 0.027, and
P(Y = 2) = C(3,2)(0.027)^2 (1 − 0.027)^1 = 0.002128.

6.9 The Weibull Distribution


6.87 Let X = maximum flood level. Then X has a Weibull distribution with parameters θ = 0.6,
γ = 1.5.
a P(X > 0.5) = 1 − F(0.5) = 1 − (1 − e^{−(0.5)^{1.5}/0.6}) = 0.5547
b P(X < 0.8) = F(0.8) = 1 − e^{−(0.8)^{1.5}/0.6} = 0.6966
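The Weibull CDF used here is F(x) = 1 − exp(−x^γ/θ), while `scipy.stats.weibull_min` uses F(x) = 1 − exp(−(x/scale)^c); matching the two gives c = γ and scale = θ^{1/γ}. A check of 6.87 (assuming SciPy is available):

```python
from scipy.stats import weibull_min

gam, theta = 1.5, 0.6
flood = weibull_min(c=gam, scale=theta ** (1 / gam))
print(flood.sf(0.5))    # part (a): P(X > 0.5)
print(flood.cdf(0.8))   # part (b): P(X < 0.8)
```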

6.89 Let X = the ultimate tensile strength of the steel wire. Then X has a Weibull distribution with
parameters γ = 1.2, θ = 270, and
P(X > 300) = 1 − F(300) = 1 − (1 − e^{−(300)^{1.2}/270}) = 0.03091.

6.91 Let X = pressure in thousands of pounds exerted on the tank. Then X has a Weibull distribution
with parameters γ = 1.8, θ = 1.5, and
P(X > 2) = 1 − F(2) = 1 − (1 − e^{−(2)^{1.8}/1.5}) = 0.09813.

6.93 Scatterplot of LF(x) vs ln(x) for bleed system failure times:

Estimating the slope and intercept from the scatter plot, we see that the slope = γ ≈ 5.13, and
the y-intercept = ln(θ) ≈ 35.9
6.95 a E(V) = ∫_0^∞ v · 4π (m/(2πKT))^{3/2} v^2 e^{−v^2(m/2KT)} dv
= 2 sqrt(1/(πθ)) ∫_0^∞ v^2 (2/θ) v^{2−1} e^{−v^2/θ} dv for θ = 2KT/m;
i.e.,
E(V) = 2 sqrt(1/(πθ)) E(X^2)
where X has a Weibull distribution with parameters γ = 2, θ = 2KT/m; i.e.,
E(V) = 2 sqrt(1/(πθ)) {V(X) + [E(X)]^2}
= 2 sqrt(1/(πθ)) (2KT/m)^{2/2} {Γ(1 + 2/2) − [Γ(1 + 1/2)]^2 + [Γ(1 + 1/2)]^2}
= 2 sqrt(m/(2πKT)) (2KT/m)(1 − π/4 + π/4) = 2 sqrt(2KT/(mπ)).

b E((1/2)mV^2) = (m/2)E(V^2) = (m/2) 4π (m/(2πKT))^{3/2} ∫_0^∞ v^4 e^{−v^2(m/2KT)} dv
= mπ (m/(2πKT))^{3/2} ∫_0^∞ y^{3/2} e^{−y(m/2KT)} dy
= mπ (m/(2πKT))^{3/2} Γ(5/2) (2KT/m)^{5/2} ∫_0^∞ [(m/2KT)^{5/2}/Γ(5/2)] y^{5/2−1} e^{−y(m/2KT)} dy
= mπ (m/(2πKT))^{3/2} Γ(5/2) (2KT/m)^{5/2} = mπ (1/π)^{3/2} (3/2)(1/2)Γ(1/2) (2KT/m)
= (3/2)KT

6.10 Reliability
6.97 Rs (t) = P (X1 > t, X2 > t, . . . , Xn > t)
= P (X1 > t)P (X2 > t) · · · P (Xn > t)
= [P (X > t)]n
= [e−t/θ ]n
= e−nt/θ , t > 0
E(S) = ∫_0^∞ e^{−nt/θ} dt = −(θ/n) e^{−nt/θ} |_0^∞ = θ/n
6.99 Consider a parallel system. Then, we need the number of relays n such that
.999 = 1 − [1 − .9]n = 1 − (.1)n .
So, n = 3.
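The relay count can also be found by iterating the parallel-system reliability until it reaches the target (exact rational arithmetic avoids floating-point edge cases at equality):

```python
from fractions import Fraction

# parallel system of n identical relays, each with reliability 0.9;
# the system fails only when every relay fails, so R(n) = 1 - (0.1)^n
r, target = Fraction(9, 10), Fraction(999, 1000)
n = 1
while 1 - (1 - r) ** n < target:
    n += 1
print(n)   # -> 3
```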

6.11 The Moment-Generating Functions for Continuous Random Variables
6.101 Since the exponential distribution with parameter θ is the same as a Gamma Distribution with
parameters α = 1, β = θ, then from Exercise 4.95 we have the desired moment-generating function
M(t) = (1 − θt)^{−1}, and
E(X^2) = M″(0) = d/dt [θ/(1 − θt)^2] |_{t=0} = 2θ^2/(1 − θt)^3 |_{t=0} = 2θ^2
Therefore
V(X) = E(X^2) − [E(X)]^2 = 2θ^2 − θ^2 = θ^2.

6.103 M_{Z^2}(t) = E(e^{tZ^2}) = ∫_{−∞}^{∞} e^{tz^2} (1/sqrt(2π)) e^{−z^2/2} dz = ∫_{−∞}^{∞} (1/sqrt(2π)) e^{−z^2(1−2t)/2} dz
= (1 − 2t)^{−1/2} ∫_{−∞}^{∞} [1/(sqrt(2π)(1 − 2t)^{−1/2})] e^{−z^2(1−2t)/2} dz
Since the integrand is a normal density function with parameters µ = 0, σ = (1 − 2t)^{−1/2}, we have
M_{Z^2}(t) = (1 − 2t)^{−1/2}.
Therefore, using the result of Exercise 4.95 and the uniqueness property of the moment-generating
function, Z^2 has a Gamma distribution with parameters α = 1/2, β = 2.
2

6.13 Supplementary Exercises

6.105 a 1 = ∫_0^2 (cy^2 + y)dy = (cy^3/3 + y^2/2) |_0^2 = c(8/3) + 2
i.e., c = −3/8
b F(y) = 0, y < 0
F(y) = ∫_0^y (x − 3x^2/8)dx = y^2/2 − y^3/8, 0 ≤ y ≤ 2
F(y) = 1, y > 2

c Graphs of f (x) and F (x):



d F(−1) = 0
F(0) = 0
F(1) = 1/2 − 1/8 = 3/8
e P(0 < Y < 1/2) = F(1/2) − F(0) = (1/2)^2/2 − (1/2)^3/8 = 1/8 − 1/64 = 7/64
f E(Y) = ∫_0^2 y(y − 3y^2/8)dy = (y^3/3 − 3y^4/32) |_0^2 = 8/3 − 3/2 = 7/6
E(Y^2) = ∫_0^2 y^2(y − 3y^2/8)dy = (y^4/4 − 3y^5/40) |_0^2 = 4 − 12/5 = 8/5
V(Y) = E(Y^2) − [E(Y)]^2 = 8/5 − (7/6)^2 = 43/180 = 0.2389
 
6.107 Let X = student GPA. Then
P(X > 3) = P((X − 2.4)/0.5 > (3 − 2.4)/0.5) = P(Z > 1.2) = 0.5 − 0.3849 = 0.1151
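The table lookup can be reproduced with the error function from Python's standard library; this check is an addition, not part of the original manual:

```python
from math import erf, sqrt

def normal_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1 + erf(z / sqrt(2)))

# P(X > 3) for X ~ N(2.4, 0.5^2), as in Exercise 6.107.
p = 1 - normal_cdf((3 - 2.4) / 0.5)
print(round(p, 4))  # 0.1151
```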
6.109 Let Y = number of students out of three that possess a GPA in excess of 3.0. Then Y has a
binomial distribution with parameters n = 3, p = P (X > 3) = 0.1151, and
 
P(Y = 3) = C(3, 3)(0.1151)³(1 − 0.1151)⁰ = (0.1151)³ = 0.001525.
6.111 Let Y = number of defective bearings out of a sample of five. Then Y has a binomial distribution
with parameters n = 5, p = P (bearing is scrap) = 0.073 (from Exercise 4.105), and
 
P(Y ≥ 1) = 1 − P(Y = 0) = 1 − C(5, 0)(0.073)⁰(1 − 0.073)⁵ = 1 − (0.927)⁵ = 0.3155.
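A short script (an added check) reproduces the complement calculation:

```python
from math import comb

# P(Y >= 1) for Y ~ Binomial(n = 5, p = 0.073), via the complement,
# matching Exercise 6.111.
n, p = 5, 0.073
p_none = comb(n, 0) * p**0 * (1 - p)**n
p_at_least_one = 1 - p_none
print(round(p_at_least_one, 4))  # 0.3155
```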

6.113 E(X^k) = ∫_0^1 x^k [Γ(α+β)/(Γ(α)Γ(β))] x^{α−1}(1 − x)^{β−1} dx = [Γ(α+β)/(Γ(α)Γ(β))] ∫_0^1 x^{α+k−1}(1 − x)^{β−1} dx
= [Γ(α+β)Γ(α+k)/(Γ(α+β+k)Γ(α))] ∫_0^1 [Γ(α+β+k)/(Γ(α+k)Γ(β))] x^{α+k−1}(1 − x)^{β−1} dx
= Γ(α+β)Γ(α+k)/[Γ(α+β+k)Γ(α)]
Therefore,
E(X) = Γ(α+β)Γ(α+1)/[Γ(α+β+1)Γ(α)] = α/(α+β)
E(X²) = Γ(α+β)Γ(α+2)/[Γ(α+β+2)Γ(α)] = (α+1)α/[(α+β+1)(α+β)]
V(X) = E(X²) − [E(X)]² = (α+1)α/[(α+β+1)(α+β)] − [α/(α+β)]²
= [(α+β)(α+1)α − α²(α+β+1)]/[(α+β+1)(α+β)²]
= αβ/[(α+β+1)(α+β)²].
6.115 Let X_i = lifetime of component i, i = 1, 2, 3.
P(X_i < 200) = ∫_0^200 (1/100) e^{-x/100} dx = [−e^{-x/100}]_0^200 = 1 − e^{-2}
Then Y, the number of components that fail in 200 hours, has a binomial distribution with parameters n = 3, p = P(X_i < 200) = 1 − e^{-2}, and
P(Y ≤ 1) = C(3, 0) p⁰(1 − p)³ + C(3, 1) p(1 − p)² = (e^{-2})³ + 3(1 − e^{-2})(e^{-2})²
= 3e^{-4} − 2e^{-6} = 0.04999.
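The binomial sum can be verified numerically; the following sketch is an addition:

```python
from math import comb, exp

# P(at most one of three components fails within 200 hours) when each
# lifetime is exponential with mean 100, so p = P(X < 200) = 1 - e^-2
# (Exercise 6.115).
p = 1 - exp(-2)
prob = sum(comb(3, k) * p**k * (1 - p)**(3 - k) for k in range(2))
print(round(prob, 5))  # 0.04999
```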

6.117 a Note that X has a Beta distribution with parameters α = 3, β = 5, so that
k = Γ(α+β)/[Γ(α)Γ(β)] = Γ(3+5)/[Γ(3)Γ(5)] = 5,040/[(2)(24)] = 105.
b E(X) = α/(α+β) = 3/8 = 0.375
V(X) = αβ/[(α+β+1)(α+β)²] = (3)(5)/[(3+5+1)(3+5)²] = 5/192 = 0.02604
6.119 Since the interview time for an applicant has a Gamma distribution with parameters α = 1, β = 1/2, then the sum of three interview times, Y, has a Gamma distribution with parameters α = 3, β = 1/2, and, integrating by parts, we have
P(Y > 3/4) = ∫_{3/4}^∞ [1/(Γ(3)(1/2)³)] y² e^{-2y} dy = (29/8) e^{-3/2} = 0.8088.
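The gamma tail probability can be cross-checked through the gamma–Poisson relation cited in Exercise 4.116; this snippet is an added verification:

```python
from math import exp, factorial

# For Y ~ Gamma(alpha = 3, beta = 1/2), P(Y > y) equals the probability
# that a Poisson(y/beta) count is at most alpha - 1 = 2.  Here y = 3/4,
# so lambda = 1.5 (Exercise 6.119).
lam = (3 / 4) / (1 / 2)
p = sum(exp(-lam) * lam**k / factorial(k) for k in range(3))
print(round(p, 4))  # 0.8088
```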

6.121 Let Y = waiting time for supplies. Then Y has a uniform distribution with parameters a = 1, b = 4. Let C = cost of delay. Then
C = 100 for 1 ≤ y ≤ 2, and C = 100 + 20(y − 2) for 2 < y ≤ 4,
and
E(C) = ∫_1^2 100 · 1/(4−1) dy + ∫_2^4 [100 + 20(y − 2)] · 1/(4−1) dy = 100 + 20 ∫_2^4 (y − 2)/3 dy
= 100 + (20/3)[y²/2 − 2y]_2^4 = 100 + (20/3)(2) = 113.33
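The expectation can also be approximated by direct numerical integration against the Uniform(1, 4) density; this midpoint-sum sketch is an addition:

```python
# Numerical check of E(C) in Exercise 6.121: Y ~ Uniform(1, 4) and the
# cost is 100 for y <= 2 and 100 + 20(y - 2) for y > 2.  A midpoint
# Riemann sum over the density 1/3 approximates the expectation.
def cost(y):
    return 100 if y <= 2 else 100 + 20 * (y - 2)

m = 300_000
total = sum(cost(1 + 3 * (i + 0.5) / m) * (3 / m) * (1 / 3) for i in range(m))
print(round(total, 2))  # 113.33
```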
6.123 Let Y = weekly downtime. Then Y has a gamma distribution with parameters α = 3, β = 2, and
P(Y ≤ 10) = ∫_0^10 [1/(Γ(3)2³)] y² e^{-y/2} dy = ∫_0^5 [1/Γ(3)] w² e^{-w} dw = P(W ≤ 5)
where W = Y/2 has a gamma distribution with parameters α = 3, β = 1. Then, for X a random variable having a Poisson distribution with parameter λ = 5, by the result presented in Exercise 4.116, we have
P(Y ≤ 10) = P(W ≤ 5) = 1 − P(X ≤ 2) = 1 − 0.125 = 0.875.
 
6.125 a P(X ≤ 4) = P(Y ≤ ln 4) = P(Z ≤ (ln 4 − 4)/1) = P(Z ≤ −2.61) = 0.5 − 0.4955 = 0.0045
b P(X > 8) = P(Y > ln 8) = P(Z > (ln 8 − 4)/1) = P(Z > −1.92) = 0.5 + 0.4726 = 0.9726
6.127 M(t) = E(e^{tY}) = ∫_{-∞}^{∞} e^{ty} f(y) dy = (1/2) ∫_{-∞}^{∞} e^{ty} e^{-|y|} dy = (1/2) ∫_{-∞}^0 e^{y(t+1)} dy + (1/2) ∫_0^∞ e^{-y(1−t)} dy
= (1/2) ( [e^{y(t+1)}/(t+1)]_{-∞}^0 + [−e^{-y(1−t)}/(1−t)]_0^∞ ) = (1/2) (1/(t+1) + 1/(1−t)) = 1/(1 − t²)
Thus,
E(Y) = M′(0) = 2t/(1 − t²)² |_{t=0} = 0
Chapter 7

Multivariate Probability Distributions

7.1 Independent Random Variables

7.1 a Firms:

Contract 1   Contract 2   (x1, x2)   Probability
I            I            (2, 0)     1/9
I            II           (1, 1)     1/9
I            III          (1, 0)     1/9
II           I            (1, 1)     1/9
II           II           (0, 2)     1/9
II           III          (0, 1)     1/9
III          I            (1, 0)     1/9
III          II           (0, 1)     1/9
III          III          (0, 0)     1/9


b Joint distribution of (X1, X2):

          x1 = 0   x1 = 1   x1 = 2
x2 = 0    1/9      2/9      1/9
x2 = 1    2/9      2/9      0
x2 = 2    1/9      0        0
p(x1)     4/9      4/9      1/9

c P(X1 = 1|X2 = 1) = P(X1 = 1, X2 = 1)/P(X2 = 1) = (2/9)/(2/9 + 2/9) = 1/2
7.3 a X1
0 1
0 0.0635 0.0775
1 0.1007 0.0556
X2 2 0.1630 0.0653
3 0.1691 0.0549
4 0.1929 0.0574
b
X1 P (X1 |X2 = 0) P (X1 |X2 = 1) P (X1 |X2 = 2)
0 0.4502 0.6455 0.7139
1 0.5498 0.3558 0.2861

X1 P (X1 |X2 = 3) P (X1 |X2 = 4)


0 0.7548 0.7707
1 0.2452 0.2293

Obviously, the older the child is, the better his/her chance of survival in a car accident without
wearing a seat belt.
c
X2 P (X2 |X1 = 0) P (X2 |X1 = 1)
0 0.0921 0.2495
1 0.1461 0.1788
2 0.2365 0.2102
3 0.2453 0.1768
4 0.2799 0.1847
No, this implies that if a child survives, then s/he will probably be older.

7.5 a f1(x1) = ∫_0^1 1 dx2 = 1 for 0 ≤ x1 ≤ 1, and 0 otherwise
b P(X1 ≤ 1/2) = ∫_0^{1/2} 1 dx1 = 1/2
c f1(x1) f2(x2) = 1 = f(x1, x2) for 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1. Therefore, X1 and X2 are independent.
7.7 a P(X1 ≤ 3/4, X2 ≤ 3/4) = ∫_0^{1/4} ∫_0^{3/4} 2 dx2 dx1 + ∫_{1/4}^{3/4} ∫_0^{1−x1} 2 dx2 dx1
= 2(1/4)(3/4) + ∫_{1/4}^{3/4} 2(1 − x1) dx1 = 3/8 + 1/2 = 7/8
b P(X1 ≤ 1/2, X2 ≤ 1/2) = 2(1/2)(1/2) = 1/2
c P(X1 ≤ 1/2 | X2 ≤ 1/2) = P(X1 ≤ 1/2, X2 ≤ 1/2)/P(X2 ≤ 1/2) = (1/2) / ∫_0^{1/2} ∫_0^{1−x2} 2 dx1 dx2
= (1/2) / ∫_0^{1/2} 2(1 − x2) dx2 = (1/2)/(3/4) = 2/3
7.9 a P(X1 < 1/2, X2 > 1/4) = ∫_0^{1/2} ∫_{1/4}^1 (x1 + x2) dx2 dx1 = ∫_0^{1/2} (3x1/4 + 15/32) dx1 = 21/64
b P(X1 + X2 ≤ 1) = ∫_0^1 ∫_0^{1−x1} (x1 + x2) dx2 dx1 = ∫_0^1 [x1 x2 + x2²/2]_0^{1−x1} dx1
= ∫_0^1 (1/2)(1 − x1²) dx1 = 1/3
c Note that
f1(x1) f2(x2) = (x1 + 1/2)(x2 + 1/2) ≠ x1 + x2 = f(x1, x2)
for 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1.
Therefore, X1 and X2 are not independent.

7.11 Note that
f1(x1) = ∫_0^∞ (1/8) x1 e^{-x1/2} e^{-x2/2} dx2 = (1/4) x1 e^{-x1/2} [−e^{-x2/2}]_0^∞ = (1/4) x1 e^{-x1/2} for 0 < x1 < ∞
and, integrating by parts, we have
f2(x2) = ∫_0^∞ (1/8) x1 e^{-x1/2} e^{-x2/2} dx1 = (1/8) e^{-x2/2} ( [−2x1 e^{-x1/2}]_0^∞ + 2 ∫_0^∞ e^{-x1/2} dx1 )
= (1/8) e^{-x2/2} [−4e^{-x1/2}]_0^∞ = (1/2) e^{-x2/2} for 0 < x2 < ∞.
a Note that
f1(x1) f2(x2) = (1/4) x1 e^{-x1/2} · (1/2) e^{-x2/2} = (1/8) x1 e^{-(x1+x2)/2} = f(x1, x2)
for 0 < x1 < ∞, 0 < x2 < ∞.
Therefore, X1 and X2 are independent.
b P(X1 > 1, X2 > 1) = P(X1 > 1) P(X2 > 1) = ∫_1^∞ (1/4) x1 e^{-x1/2} dx1 ∫_1^∞ (1/2) e^{-x2/2} dx2
= ( [−(1/2) x1 e^{-x1/2}]_1^∞ + (1/2) ∫_1^∞ e^{-x1/2} dx1 ) ( [−e^{-x2/2}]_1^∞ )
= ((1/2) e^{-1/2} + e^{-1/2}) e^{-1/2} = 3e^{-1}/2 = 0.5518
7.13 Let X_i = arrival time for friend i, 0 ≤ x_i ≤ 1, i = 1, 2. The two friends will meet if |x1 − x2| < 1/6, and
P(|X1 − X2| < 1/6) = ∫_0^{1/6} ∫_0^{x1+(1/6)} 1 dx2 dx1 + ∫_{1/6}^{5/6} ∫_{x1−(1/6)}^{x1+(1/6)} 1 dx2 dx1 + ∫_{5/6}^1 ∫_{x1−(1/6)}^1 1 dx2 dx1
= 3/72 + 2/9 + 3/72 = 11/36.

7.15 Let X_i = the time the ith call is made, i = 1, 2, 0 ≤ x_i ≤ 1. Then
f(x1, x2) = f1(x1) f2(x2) = 1 for 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1, and 0 otherwise
a P(X1 < 1/2, X2 < 1/2) = P(X1 < 1/2) P(X2 < 1/2) = (1/2)(1/2) = 1/4
b P(|X1 − X2| < 1/12) = ∫_0^{1/12} ∫_0^{x1+(1/12)} dx2 dx1 + ∫_{1/12}^{11/12} ∫_{x1−(1/12)}^{x1+(1/12)} dx2 dx1 + ∫_{11/12}^1 ∫_{x1−(1/12)}^1 dx2 dx1
= 3/288 + 20/144 + 3/288 = 23/144
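Both geometric answers (23/144 here and 11/36 in Exercise 7.13) follow from the complement formula P(|X1 − X2| < d) = 1 − (1 − d)² for independent Uniform(0, 1) arrival times; an added exact-arithmetic check:

```python
from fractions import Fraction

# The complement event |X1 - X2| >= d consists of two corner triangles
# of the unit square, each with area (1 - d)^2 / 2, hence the formula.
def meet_prob(d):
    return 1 - (1 - d) ** 2

print(meet_prob(Fraction(1, 12)))  # 23/144
print(meet_prob(Fraction(1, 6)))   # 11/36
```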

7.2 Expected Values of Functions of Random Variables


7.17 a E(X1 ) = (0)(0.8) + (1)(0.2) = 0.2
E(X12 ) = (0)2 (0.8) + (1)2 (0.2) = 0.2
V (X1 ) = E(X12 ) − [E(X1 )]2 = 0.2 − (0.2)2 = 0.16
Note that X2 has the same marginal distribution as X1 , and so its mean and variance are equal
to those of X1 .

b Cov (X1 , X2 ) = E(X1 X2 ) − E(X1 )E(X2 ) = 0 − (0.2)2 = −0.04


c E(Y ) = E(X1 + X2 ) = E(X1 ) + E(X2 ) = 0.2 + 0.2 = 0.4
V (Y ) = V (X1 + X2 ) = V (X1 ) + V (X2 )+2Cov(X1 , X2 ) = 0.16 + 0.16
+2(−0.04) = 0.24
7.19 a E(X1 + X2) = ∫_0^1 ∫_0^{1−x1} (x1 + x2) 2 dx2 dx1 = ∫_0^1 (1 − x1²) dx1 = 2/3
E[(X1 + X2)²] = ∫_0^1 ∫_0^{1−x1} (x1 + x2)² 2 dx2 dx1 = ∫_0^1 (2/3)(1 − x1³) dx1 = (2/3)[x1 − x1⁴/4]_0^1 = (2/3)(3/4) = 1/2
V(Y) = E(Y²) − [E(Y)]² = 1/2 − (2/3)² = 1/18

b Using Tchebysheff's theorem with k = √2, we have
P(2/3 − √(2/18) < X1 + X2 < 2/3 + √(2/18)) ≥ 0.5; i.e., the desired interval is (1/3, 1).
c ρ = [E(X1X2) − E(X1)E(X2)] / √(V(X1) V(X2))
f1(x1) = ∫_0^{1−x1} f(x1, x2) dx2 = ∫_0^{1−x1} 2 dx2 = 2(1 − x1), 0 ≤ x1 ≤ 1
f2(x2) = ∫_0^{1−x2} f(x1, x2) dx1 = ∫_0^{1−x2} 2 dx1 = 2(1 − x2), 0 ≤ x2 ≤ 1
So E(X1) = E(X2) = ∫_0^1 y · 2(1 − y) dy = 2 ∫_0^1 (y − y²) dy = 2[y²/2 − y³/3]_0^1 = 2(1/2 − 1/3) = 2(1/6) = 1/3
and E(X1²) = E(X2²) = ∫_0^1 y² · 2(1 − y) dy = 2 ∫_0^1 (y² − y³) dy = 2[y³/3 − y⁴/4]_0^1 = 2(1/3 − 1/4) = 2(1/12) = 1/6.
Hence, V(X1) = V(X2) = E(X1²) − [E(X1)]² = 1/6 − (1/3)² = 1/6 − 1/9 = 1/18
E(X1X2) = ∫_0^1 ∫_0^{1−x1} x1 x2 · 2 dx2 dx1 = ∫_0^1 x1 [x2²]_0^{1−x1} dx1 = ∫_0^1 x1 (1 − x1)² dx1
= ∫_0^1 (x1 − 2x1² + x1³) dx1 = [x1²/2 − 2x1³/3 + x1⁴/4]_0^1 = 1/2 − 2/3 + 1/4 = 1/12
So ρ = (1/12 − 1/9)/(1/18) = (−1/36)(18) = −1/2
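A Monte Carlo simulation (added, with an assumed seed and sample size) of the uniform distribution on the triangle agrees with ρ = −1/2:

```python
import random

# (X1, X2) uniform on the triangle x1, x2 >= 0, x1 + x2 <= 1 (density 2),
# for which Exercise 7.19(c) gives rho = -1/2.  Points are sampled by
# folding the unit square onto the lower triangle.
random.seed(1)
n = 200_000
xs, ys = [], []
for _ in range(n):
    u, v = random.random(), random.random()
    if u + v > 1:                 # reflect into the lower triangle
        u, v = 1 - u, 1 - v
    xs.append(u)
    ys.append(v)
mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
vx = sum((x - mx) ** 2 for x in xs) / n
vy = sum((y - my) ** 2 for y in ys) / n
rho = cov / (vx * vy) ** 0.5
print(round(rho, 2))  # close to -0.5
```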
7.21 a P(Y1 < 2, Y2 > 1) = ∫_1^2 ∫_{y2}^2 e^{-y1} dy1 dy2 = ∫_1^2 (e^{-y2} − e^{-2}) dy2 = e^{-1} − 2e^{-2} = 0.0972
b P(Y1 > 2Y2) = ∫_0^∞ ∫_{2y2}^∞ e^{-y1} dy1 dy2 = ∫_0^∞ e^{-2y2} dy2 = 1/2
c P(Y1 − Y2 ≥ 1) = ∫_0^∞ ∫_{1+y2}^∞ e^{-y1} dy1 dy2 = ∫_0^∞ e^{-(1+y2)} dy2 = e^{-1}
d f1(y1) = ∫_0^{y1} e^{-y1} dy2 = y1 e^{-y1} for 0 ≤ y1 < ∞, and 0 otherwise
f2(y2) = ∫_{y2}^∞ e^{-y1} dy1 = e^{-y2} for 0 ≤ y2 < ∞, and 0 otherwise

7.23 a E(Y1 − Y2) = ∫_0^∞ ∫_{y2}^∞ (y1 − y2) e^{-y1} dy1 dy2 = ∫_0^∞ e^{-y2} dy2 = 1
b E[(Y1 − Y2)²] = ∫_0^∞ ∫_{y2}^∞ (y1 − y2)² e^{-y1} dy1 dy2 = ∫_0^∞ 2e^{-y2} dy2 = 2
V(Y1 − Y2) = E[(Y1 − Y2)²] − [E(Y1 − Y2)]² = 2 − 1² = 1
c No, since
P(Y1 − Y2 > 2) = ∫_0^∞ ∫_{2+y2}^∞ e^{-y1} dy1 dy2 = ∫_0^∞ e^{-(2+y2)} dy2 = e^{-2} = 0.1353.

7.3 The Multinomial Distribution


7.25 Let Y1 = number of family home fires, Y2 = number of apartment fires, and Y3 = number of fires
in other types of dwellings, out of 4 fires. Then (Y1 , Y2 , Y3 ) has a multinomial distribution
with parameters n = 4, p1 = 0.73, p2 = 0.2, p3 = 0.07. Thus
P(Y1 = 2, Y2 = 1, Y3 = 1) = [n!/(y1! y2! y3!)] p1^{y1} p2^{y2} p3^{y3} = [4!/(2! 1! 1!)] (0.73)²(0.2)(0.07) = 0.08953.
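An added check of the multinomial probability using only the standard library:

```python
from math import factorial

# Multinomial pmf for Exercise 7.25: four fires classified into three
# dwelling types with cell probabilities (0.73, 0.2, 0.07).
def multinomial_pmf(counts, probs):
    n = sum(counts)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)
    prob = float(coef)
    for c, p in zip(counts, probs):
        prob *= p ** c
    return prob

p = multinomial_pmf([2, 1, 1], [0.73, 0.2, 0.07])
print(round(p, 5))  # 0.08953
```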
7.27 Let Y1 = number of planes out of the next 5 inspected with no wing cracks, Y2 = number of
planes with detectable wing cracks, and Y3 = number of planes with critical wing cracks. Then
(Y1 , Y2 , Y3 ) has a multinomial distribution with parameters n = 5, p1 = 0.70, p2 = 0.25, p3 =
0.05.
a P(Y1 = 2, Y2 = 2, Y3 = 1) = [n!/(y1! y2! y3!)] p1^{y1} p2^{y2} p3^{y3} = [5!/(2! 2! 1!)] (0.70)²(0.25)²(0.05) = 0.04594
b P(Y3 ≥ 1) = 1 − P(Y3 = 0) = 1 − C(5, 0)(0.05)⁰(0.95)⁵ = 1 − (0.95)⁵ = 0.2262

7.29 Let Y1 = number of persons between the ages of 18 and 24, Y2 = number of persons between
the ages of 25 and 44, and Y3 = number of persons between the ages of 45 and 64. Then
(Y1 , Y2 , Y3 ) has a multinomial distribution with parameters n = 5, p1 = 0.21, p2 =
0.28 + 0.19, p3 = 0.32. Therefore,
P(Y1 = 2, Y2 = 2, Y3 = 1) = [5!/(2! 2! 1!)] (0.21)²(0.47)²(0.32)¹ = 0.09352.
7.31 Let Y = number of applicants that have a college degree. Then, assuming applicants are selected
independently, Y has a binomial distribution with parameters n = 5, p = 0.10, and
 
P(Y ≥ 1) = 1 − P(Y = 0) = 1 − C(5, 0)(0.10)⁰(0.90)⁵ = 1 − (0.90)⁵ = 1 − 0.59049 = 0.40951.

7.33 Let Y = number of items containing at least one defect. Then Y has a binomial distribution with
parameters n = 10, p = 0.10 + 0.05 = 0.15.
 
a P(Y = 2) = C(10, 2)(0.15)²(1 − 0.15)⁸ = 45(0.15)²(0.85)⁸ = 0.2759
b P(Y ≥ 1) = 1 − P(Y = 0) = 1 − C(10, 0)(0.15)⁰(0.85)^10 = 1 − (0.85)^10 = 1 − 0.19687 = 0.80313

7.4 More on the Moment-Generating Function



 
7.35 M_Y(t) = E(e^{tY}) = Σ_{y=r}^∞ e^{ty} C(y−1, r−1) p^r (1 − p)^{y−r}
= [p/(1−p)]^r Σ_{y=r}^∞ C(y−1, r−1) [(1−p)e^t]^y
= [p/(1−p)]^r Σ_{y=0}^∞ C(y+r−1, r−1) [(1−p)e^t]^{y+r}
= [p/(1−p)]^r [(1−p)e^t]^r Σ_{y=0}^∞ C(y+r−1, r−1) [(1−p)e^t]^y
= [pe^t/(1 − (1−p)e^t)]^r, since Σ_{n=0}^∞ C(n+r−1, n) x^n = [1/(1−x)]^r.
E(Y) = M_Y′(0) = r [pe^t/(1 − (1−p)e^t)]^{r−1} · {[1 − (1−p)e^t] pe^t − pe^t [−(1−p)e^t]}/[1 − (1−p)e^t]² |_{t=0}
= r/p.
E(Y²) = M_Y″(0) = ( r(r−1) [pe^t/(1 − (1−p)e^t)]^{r−2} (pe^t)²/[1 − (1−p)e^t]⁴ + r [pe^t/(1 − (1−p)e^t)]^{r−1} pe^t [1 + (1−p)e^t]/[1 − (1−p)e^t]³ ) |_{t=0}
= (r² + r − rp)/p²
V(Y) = E(Y²) − [E(Y)]² = (r² + r − rp)/p² − (r/p)² = r(1 − p)/p².
7.37 Let µ_i, σ_i² denote the mean and variance of X_i, i = 1, 2. Then
M_Y(t) = E(e^{t(aX1+bX2)}) = E(e^{taX1}) E(e^{tbX2}) = M_{X1}(ta) M_{X2}(tb)
= e^{taµ1 + (ta)²σ1²/2} e^{tbµ2 + (tb)²σ2²/2} = e^{t(aµ1+bµ2) + t²(a²σ1² + b²σ2²)/2}
which is the moment-generating function of a normally distributed random variable with mean µ_Y = aµ1 + bµ2 and variance σ_Y² = a²σ1² + b²σ2².

7.39 Note that X1 has a normal distribution with parameters µ1 = 5,000, σ1² = (300)², and X2 has a normal distribution with parameters µ2 = 4,000, σ2² = (400)². From Exercise 5.38, Y = X2 − X1 has a normal distribution with parameters µ_Y = µ2 − µ1 = 4,000 − 5,000 = −1,000, σ_Y² = σ1² + σ2² = (300)² + (400)² = 250,000 = (500)².
P(overload) = P(X2 > X1) = P(X2 − X1 > 0) = P(Y > 0)
= P((Y − (−1,000))/500 > (0 − (−1,000))/500) = P(Z > 2) = 0.5 − 0.4772 = 0.0228
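The tail probability can be confirmed with the error function; this snippet is an added check:

```python
from math import erf, sqrt

# Overload probability from Exercise 7.39: Y = X2 - X1 is normal with
# mean -1000 and standard deviation 500, so P(Y > 0) = P(Z > 2).
def normal_sf(z):
    # Standard normal upper-tail probability via the error function.
    return 0.5 * (1 - erf(z / sqrt(2)))

p = normal_sf((0 - (-1000)) / 500)
print(round(p, 4))  # 0.0228
```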

7.8 Supplementary Exercises


7.41 a f1(x1) = ∫_{-∞}^{∞} f(x1, x2) dx2 = ∫_0^{x1} 3x1 dx2 = 3x1² for 0 ≤ x1 ≤ 1, and 0 otherwise
f2(x2) = ∫_{x2}^1 3x1 dx1 = (3/2)(1 − x2²) for 0 ≤ x2 ≤ 1, and 0 otherwise
b P(X1 ≤ 3/4, X2 ≤ 1/2) = ∫_0^{1/2} ∫_{x2}^{3/4} 3x1 dx1 dx2 = ∫_0^{1/2} (3/2)(9/16 − x2²) dx2
= (3/2)[9/32 − (1/3)(1/2)³] = 23/64
c P(X1 ≤ 1/2 | X2 ≥ 3/4) = P(X1 ≤ 1/2, X2 ≥ 3/4)/P(X2 ≥ 3/4) = 0/P(X2 ≥ 3/4) = 0
7.43 f(x1|X2 = x2) = f(x1, x2)/f2(x2) = 4x1x2/(2x2) = 2x1 for 0 ≤ x1 ≤ 1. Then X1 and X2 are independent, since f(x1|X2 = x2) = f1(x1) = 2x1 for 0 ≤ x1 ≤ 1.
1
 0 ≤ x2 ≤ x1 ≤ 1
x1

7.45 a f (x2 |x1 ) =
otherwise


0

1 0 ≤ x1 ≤ 1
f (x1 ) =
0 otherwise.
1
 0 ≤ x2 ≤ x1 ≤ 1
x1

Therefore, f (x1 , x2 ) = f (x2 |x1 )f (x1 ) =
otherwise


0

b P(X2 ≥ 1/4 | X1 = 1/2) = ∫_{1/4}^∞ f(x2|x1 = 1/2) dx2 = ∫_{1/4}^{1/2} 2 dx2 = 1/2
c f(x1|x2) = f(x1, x2)/f(x2) = (1/x1) / ∫_{x2}^1 (1/x1) dx1 = x1^{-1}/(−ln x2) for 0 ≤ x2 ≤ x1 ≤ 1, and 0 otherwise
So
P(X1 ≥ 1/2 | X2 = 1/4) = ∫_{1/2}^1 1/(x1 ln 4) dx1 = (ln x1 |_{1/2}^1)/ln 4 = (ln 2)/ln 4
7.47 a f1(x1) = ∫_0^1 (x1 + x2) dx2 = x1 + 1/2 for 0 ≤ x1 ≤ 1, and 0 otherwise
f2(x2) = ∫_0^1 (x1 + x2) dx1 = x2 + 1/2 for 0 ≤ x2 ≤ 1, and 0 otherwise
b f1(x1) f2(x2) = (x1 + 1/2)(x2 + 1/2) ≠ f(x1, x2) = x1 + x2, for 0 ≤ x1, x2 ≤ 1. Therefore, X1 and X2 are not independent.
c f(x1|x2) = f(x1, x2)/f2(x2) = (x1 + x2)/(x2 + 1/2) for 0 ≤ x1, x2 ≤ 1, and 0 otherwise

7.49 Note that
f(x1, x2) = 1, for 0 ≤ x2 + |x1| ≤ 1, 0 ≤ x2 ≤ 1, |x1| ≤ 1.
a f2(x2) = ∫_{x2−1}^{1−x2} 1 dx1 = 2(1 − x2) for 0 ≤ x2 ≤ 1, and 0 otherwise
b f1(x1) = ∫_0^{1−|x1|} 1 dx2 = 1 − |x1| for |x1| ≤ 1, and 0 otherwise
c P(X1 − X2 ≥ 0) = ∫_0^{1/2} ∫_{x2}^{1−x2} 1 dx1 dx2 = ∫_0^{1/2} (1 − 2x2) dx2 = 1/2 − (1/2)² = 1/4
7.51 E(X1) = ∫_0^1 x1 · 3x1² dx1 = 3/4
E(X2) = ∫_0^1 x2 · (3/2)(1 − x2²) dx2 = (3/2)(1/2 − 1/4) = 3/8
E(X1X2) = ∫_0^1 ∫_0^{x1} x1 x2 · 3x1 dx2 dx1 = ∫_0^1 (3x1⁴/2) dx1 = 3/10
Cov(X1, X2) = E(X1X2) − E(X1)E(X2) = 3/10 − (3/4)(3/8) = 3/160

7.53 a E(X1X2) = ∫_0^1 ∫_0^1 x1 x2 (x1 + x2) dx1 dx2 = ∫_0^1 x2 (1/3 + x2/2) dx2 = 1/6 + 1/6 = 1/3
E(X_i) = ∫_0^1 x_i (x_i + 1/2) dx_i = 1/3 + 1/4 = 7/12 for i = 1, 2
Cov(X1, X2) = E(X1X2) − E(X1)E(X2) = 1/3 − (7/12)(7/12) = −1/144
b E(3X1 − 2X2) = 3E(X1) − 2E(X2) = 3(7/12) − 2(7/12) = 7/12
c Note that
E(X_i²) = ∫_0^1 x_i² (x_i + 1/2) dx_i = 1/4 + 1/6 = 5/12, i = 1, 2
V(X_i) = 5/12 − (7/12)² = 11/144, i = 1, 2
so we have
V(3X1 − 2X2) = 9V(X1) + 4V(X2) + 2(3)(−2)Cov(X1, X2)
= 9(11/144) + 4(11/144) − 12(−1/144) = 155/144 = 1.0764.
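Exact fraction arithmetic (an added check) confirms the variance:

```python
from fractions import Fraction as F

# V(3X1 - 2X2) for Exercise 7.53(c), built from the moment fractions
# derived above: V(Xi) = 11/144 and Cov(X1, X2) = -1/144.
var = F(11, 144)
cov = F(-1, 144)
v = 9 * var + 4 * var + 2 * 3 * (-2) * cov
print(v)  # 155/144
```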
7.55 Let X = number of defectives selected. Then X|p has a binomial distribution with parameters n = 3, p, and
P(X = x|p) = C(n, x) p^x (1 − p)^{n−x}
With p uniformly distributed on (0, 1), the joint density is
f(x, p) = C(n, x) p^x (1 − p)^{n−x}
and
P(X = x) = ∫_0^1 C(n, x) p^x (1 − p)^{n−x} dp.
Therefore,
P(X = 2) = ∫_0^1 C(3, 2) p²(1 − p)^{3−2} dp = ∫_0^1 3p²(1 − p) dp = ∫_0^1 3(p² − p³) dp = 3(1/3 − 1/4) = 1/4.
7.57 Let G = net daily gain = X − Y. Then
E(G) = E(X − Y) = E(X) − E(Y) = µ − αβ = 50 − 4(2) = 42
V(G) = V(X) + V(Y) − 2Cov(X, Y) = V(X) + V(Y) = σ² + αβ² = 10 + 4(2)² = 26
and, using Tchebysheff's theorem,
P(G > 70) = P(G − E(G) ≥ 70 − E(G)) ≤ P(|G − E(G)| ≥ [(70 − E(G))/√V(G)] √V(G))
≤ V(G)/(70 − E(G))² = 26/(70 − 42)² = 26/784 = 0.03.
Therefore, it is unlikely that her net gain for tomorrow will exceed $70.

7.59 We are given that f(x2|X1 = x1) = 1/x1 for 0 ≤ x2 ≤ x1 ≤ 1. Then
E(X2|X1 = x1) = ∫_{-∞}^{∞} x2 f(x2|x1) dx2 = ∫_0^{x1} x2 (1/x1) dx2 = x1/2.
Therefore,
E(X2 | X1 = 3/4) = 3/8.
7.61 E(X) = E(Σ_{i=1}^r X_i) = Σ_{i=1}^r E(X_i) = Σ_{i=1}^r (1/p) = r/p
and, since the X_i's are independent, we have
V(X) = V(Σ_{i=1}^r X_i) = Σ_{i=1}^r V(X_i) = Σ_{i=1}^r (1 − p)/p² = r(1 − p)/p².
Chapter 8

Statistics, Sampling Distributions, and Control Charts

8.1 The Sampling Distributions


8.1 a Putting the values in order, we have:
0.76, 0.98, 1.32, 1.41, 1.56, 1.77, 1.84, 1.89, 1.90, 2.01, 2.10, 2.42
Q1 = (1.32 + 1.41)/2 = 1.365
Q2 = (1.77 + 1.84)/2 = 1.805
Q3 = (1.90 + 2.01)/2 = 1.995

b x̄ = 19.96/12 = 1.6633
s2 = (35.7092 − (19.96)2 /12)/11 = 0.2281
c Using Tchebysheff’s theorem with k = 2, we estimate µ and σ by x̄ and s, respectively, to get
(x̄ − 2s, x̄ + 2s) = (0.7081, 2.6185).


8.3 a X = number displayed on die

X   occurrences in 30 tosses
1 5
2 5
3 5
4 5
5 5
6 5
b x̄ = 3.5 s2 = 3.0172

8.5 median
sample 1 111.4
sample 2 323.2
sample 3 169.3
sample 4 129.9
The data for single-family housing prices are right-skewed, meaning that the median may provide a better estimate of the “center” of the data.

8.2 The Sampling Distribution of X̄ (General Distribution)


8.7 From Exercise 8.6, we have
P(|X̄ − µ| ≤ 1) ≈ P(−√n/10 ≤ Z ≤ √n/10)
This probability equals 0.95 for √n/10 = 1.96; i.e., n = (19.6)² = 384.16.
Therefore, since n is integer valued, use 385 test welds.
8.9 P(|X̄ − µ| ≤ 0.1) = P(−0.1√n/(3/4) ≤ √n(X̄ − µ)/(3/4) ≤ 0.1√n/(3/4))
This probability equals 0.9 for 0.1√n/(3/4) = 1.645, i.e., n = [(3/4)(1.645)/0.1]² = 152.21. Therefore, since n is integer valued, take n = 153.
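Both this sample size and the one in Exercise 8.7 come from the same formula, n = ⌈(zσ/B)²⌉; a small added helper illustrates it:

```python
from math import ceil

# Sample size so that z * sigma / sqrt(n) <= B, i.e. the half-width of
# the sampling interval does not exceed the bound B.
def sample_size(z, sigma, bound):
    return ceil((z * sigma / bound) ** 2)

print(sample_size(1.645, 0.75, 0.1))  # 153  (Exercise 8.9)
print(sample_size(1.96, 10, 1))       # 385  (Exercise 8.7)
```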
8.11 P(X̄ > 14) = P(√n(X̄ − µ)/σ > √100(14 − 12)/9) ≈ P(Z > 2.22) = 0.5 − 0.4868 = 0.0132
8.13 a P(1 < X̄ < 5) = P(√30(1 − 4)/0.8 < √n(X̄ − µ)/σ < √30(5 − 4)/0.8) ≈ P(−20.54 < Z < 6.85) ≈ 1
b P(Σ_{i=1}^{30} X_i < 115) = P(X̄ < 115/30) = P(X̄ < 3.83) ≈ P(Z < −1.16) = 0.5 − 0.3770 = 0.1230
c The approximations in parts (a) and (b) assume that daily downtimes are independent.

8.15 P(Σ_{i=1}^{50} Y_i > 200) = P(Ȳ > 200/50) = P(Ȳ > 4) = P(√n(Ȳ − µ)/σ > √50(4 − µ)/2)
≈ P(Z > √50(4 − µ)/2) = 0.95
This equation is true for √50(4 − µ)/2 = −1.645, i.e.:
µ = 4 + (1.645)(2)/√50 = 4.4653
8.17 P(Σ_{i=1}^{100} X_i < 120) = P(√n(X̄ − µ)/σ < √n((120/n) − 1.5)/1)
≈ P(Z < √n((120/n) − 1.5)/1)
This probability equals 0.1 for √n((120/n) − 1.5)/1 = −1.28, since P(Z < −1.28) = 0.1.
Thus, we have 1.5n − 1.28√n − 120 = 0. Using the quadratic formula (in √n), we have:
√n = [1.28 ± √((−1.28)² − 4(1.5)(−120))]/[2(1.5)] = −8.5278 or 9.3811
Using the positive root, we have n = (9.3811)² = 88.0052 ≈ 88
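The quadratic in √n can be solved directly; this added sketch reproduces the roots:

```python
from math import sqrt

# Exercise 8.17 reduces to 1.5n - 1.28*sqrt(n) - 120 = 0, a quadratic
# in u = sqrt(n); only the positive root is meaningful.
a, b, c = 1.5, -1.28, -120
u = (-b + sqrt(b * b - 4 * a * c)) / (2 * a)
n = u ** 2
print(round(u, 4), round(n))  # 9.3811 88
```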
8.19 P(|X̄ − Ȳ − (µ1 − µ2)| ≤ 0.04)
= P(|X̄ − Ȳ − (µ1 − µ2)|/√((σ1²/n1) + (σ2²/n2)) ≤ 0.04/√((0.01/n) + (0.02/n)))
= P(|Z| ≤ √n(0.04)/√0.03)
This probability equals 0.90 for √n(0.04)/√0.03 = 1.645, i.e.:
n = 0.03(1.645/0.04)² = 50.74 ≈ 51

8.3 The Sampling Distribution of X̄ (Normal Distribution)


8.21 The z-score for a sample mean 1 psi above the population mean would be:
x̄ − µ 1
z= √ = √ = 0.32
σ/ n 10/ 10
The z-score for a sample mean 1 psi below the population mean would then be z = −0.32, so our
problem becomes:
P (|X̄ − µ| < 1) = P (−0.32 < Z < 0.32) = P (Z < 0.32) − P (Z < −0.32) = 0.6255 − 0.3745 =
0.2510

8.23 a The t-score for a sample mean resistance of 202 would be:
x̄ − µ 202 − 200
t= √ = √ = 0.775
s/ n 10/ 15
The t-score for a sample mean resistance of 199 would be:
x̄ − µ 199 − 200
t= √ = √ = −0.387
s/ n 10/ 15
With 15 − 1 = 14 degrees of freedom, our problem then becomes:
P (199 < X̄ < 202) = P (−0.387 < T < 0.775) = 0.4221
b If the total resistance of the 15 resistors is 5100 ohms, then the mean resistance would be
5100/15 = 340
The t-score for a sample mean resistance of 340 would be:
x̄ − µ 340 − 200
t= √ = √ = 54.22
s/ n 10/ 15
With 14 degrees of freedom, our problem then becomes: P (X̄ < 340) = P (T < 54.22) ≈ 1
8.25 a Assuming that the population of downtimes each day is normally distributed, then the t-score for
a mean downtime of 5 hours would be:
x̄ − µ 5−4
t= √ = √ = 5.59
s/ n 0.8/ 20
The t-score for a mean downtime of 1 hour would be:
x̄ − µ 1−4
t= √ = √ = −16.77
s/ n 0.8/ 20
With 20 − 1 = 19 degrees of freedom, our problem then becomes:
P (1 < X̄ < 5) = P (−16.77 < T < 5.59) ≈ 1
b If the total downtime over 20 days is 115 hours, then the mean downtime is 115/20 = 5.75. The
t-score for a mean downtime of 5.75 hours would be:
x̄ − µ 5.75 − 4
t= √ = √ = 9.78
s/ n 0.8/ 20
With 19 degrees of freedom, our problem then becomes: P (X̄ < 5.75) = P (T < 9.78) ≈ 1
c We must assume that the population of downtimes each day is normally distributed and that the
20 days are chosen randomly (as opposed to consecutively, etc.)

8.4 The Sampling Distribution of the Sample Proportion


8.27 Let Y = number of residents younger than 33 years old out of the sample of 100. Then Y has a
binomial distribution with parameters n = 100, p = 0.5. Let X be a normally distributed random
variable with parameters:
µ = np = 100(0.5) = 50 and σ 2 = np(1 − p) = 100(0.5)(0.5) = 25
 
P(Y ≥ 60) ≈ P(X ≥ 59.5) = P((X − µ)/σ ≥ (59.5 − 50)/5) = P(Z ≥ 1.9) = 0.5 − 0.4713 = 0.0287
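An added check of the normal approximation with continuity correction:

```python
from math import erf, sqrt

# Exercise 8.27: Y ~ Binomial(100, 0.5), and P(Y >= 60) is approximated
# by P(X >= 59.5) with X ~ N(50, 25).
def normal_sf(z):
    return 0.5 * (1 - erf(z / sqrt(2)))

mu, sigma = 100 * 0.5, sqrt(100 * 0.5 * 0.5)
p = normal_sf((59.5 - mu) / sigma)
print(round(p, 4))  # 0.0287
```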

8.29 Let Y = number of customers out of the sample of 40 who make a purchase. Then Y has a
binomial distribution with parameters n = 40, p = 0.3. Let X be a normally distributed random
variable with parameters:
µ = np = 40(0.3) = 12 and σ 2 = np(1 − p) = 40(0.3)(0.7) = 8.4
 
P(Y ≥ 15) ≈ P(X ≥ 14.5) = P((X − µ)/σ ≥ (14.5 − 12)/√8.4) = P(Z ≥ 0.86) = 0.5 − 0.3051 = 0.1949
8.31 Let C = capacitor capacitance. Then
 
P(C < 50) = P(Z < (50 − 53)/2) = P(Z < −1.5) = 0.0668
Let Y = the number of capacitors with capacitances below 50µF out of the sample of 64. Then
Y has a binomial distribution with parameters n = 64, p = P (C < 50) = 0.0668. Let X be a
normally distributed random variable with parameters:
µ = np = 64(0.0668) = 4.2752 and σ 2 = np(1 − p) = 64(0.0668)(0.9332) = 3.9896
 
P(Y ≥ 12) ≈ P(X ≥ 11.5) = P((X − µ)/σ ≥ (11.5 − 4.2752)/√3.9896) = P(Z ≥ 3.62) ≈ 0
8.33 a Let Y=number of right-turning vehicles out of the sample of 500. Then Y has a binomial distri-
bution with parameters n = 500, p = 1/3. Let X be a normally distributed random variable with
parameters:
µ = np = 500/3 and σ² = np(1 − p) = 500(1/3)(2/3) = 1000/9
P(Y ≤ 150) ≈ P(X ≤ 150.5) = P((X − µ)/σ ≤ (150.5 − 500/3)/√(1000/9))
= P(Z ≤ −1.53) = 0.5 − 0.4370 = 0.0630
b Let U = number of vehicles out of the sample of 500 that proceed straight ahead. Then U has
the same distribution as Y in part (a), so:
P (at least 350 turn) = P (U ≤ 150) = P (Y ≤ 150) = 0.0630
8.35 Let Y = number of bids the firm wins out of the sample of 25. Then Y has a binomial distribution
with parameters n = 25 and p = 0.6. Let X be a normally distributed random variable with
parameters:
µ = np = 25(0.6) = 15 and σ 2 = np(1 − p) = 25(0.6)(0.4) = 6
 
a P(Y ≥ 20) ≈ P(X ≥ 19.5) = P((X − µ)/σ ≥ (19.5 − 15)/√6) = P(Z ≥ 1.84)
= 0.5 − 0.4671 = 0.0329
b P (Y ≥ 20) = 1 − F (19) = 1 − 0.971 = 0.029
c The assumption that contracts are awarded independently of each other is necessary for the
answers in (a) and (b) to be valid.

8.5 The Sampling Distribution of S 2 (Normal Distribution)


8.37 Since the individual efficiency measurements are normal, the sample mean (a linear combination of normal random variables) is normal, regardless of the size of n, so:
P(x̄ > 10) = P((x̄ − µ)/(σ/√n) > (10 − 9.5)/(0.5/√8)) = P(Z > 2.83) = 0.5 − 0.4977 = 0.0023
8.39 Since the LC50 measurements are normal, the sample mean is also normal for any size n. Hence:
P(|x̄ − µ| ≤ 0.5) = P(|Z| ≤ 0.5/(√1.9/√10)) = P(|Z| ≤ 1.15) = 2P(0 ≤ Z ≤ 1.15) = 2(0.3749) = 0.7498
8.41 Assuming a normal population, we find:
P(a ≤ s² ≤ b) = P(a(20 − 1)/1.9 ≤ s²(n − 1)/σ² ≤ b(20 − 1)/1.9)
= P(χ²_{0.95} ≤ U ≤ χ²_{0.05}) = 0.90 with degrees of freedom = 19. Consequently:
19a/1.9 = χ²_{0.95} = 10.1170 so that a = 1.0117, and
19b/1.9 = χ²_{0.05} = 30.1435 so that b = 3.0144
8.43 a Assuming a normal population, we have:
P(s² > 80) = P(s²(n − 1)/σ² > 80(15 − 1)/50) = P(U > 22.4)
From the χ² table, with 14 degrees of freedom, we find that 22.4 lies between the values 21.0642 and 23.6848. Hence, 0.05 < P(s² > 80) < 0.10.
b Assuming a normal population, we use the χ² table, with degrees of freedom = 14, to find:
P(s² < 20) = P(s²(n − 1)/σ² < 20(15 − 1)/50) = P(U < 5.6) ≈ 1 − 0.975 = 0.025
c We know that E(S²) = σ² = 50 and, assuming a normal population:
V(S²) = 2σ⁴/(n − 1) = 2(50²)/(15 − 1) = 357.1429
Using Tchebysheff's theorem with k = 2, at least 75% of the sample variances would lie between:
(50 − 2√357.1429, 50 + 2√357.1429) = (12.20, 87.80)
d Assume that the population of resistances is approximately normal.
8.45 From the χ² table, with 11 degrees of freedom, we have:
P(S > 15) = P(S² > 225) ≈ P(χ² > 225(12 − 1)/10²) = P(χ² > 24.75) ≈ 0.01

8.6 Sampling Distributions: The Multiple-Sample Case


8.47 Sample sizes are large enough for the difference in sample means to be approximately normally
distributed with a mean of µ1 − µ2 = 0 and a standard deviation of
σ_{x̄1−x̄2} = √(σ1²/n1 + σ2²/n2) = √(10²/100 + 12²/100) = 1.562
The z-score for a difference in sample means of 1 would be:
z = [(x̄1 − x̄2) − (µ1 − µ2)]/σ_{x̄1−x̄2} = (1 − 0)/1.562 = 0.64
The z-score for a difference in sample means of -1 would then be -0.64. Our problem then becomes:
P (|X̄1 − X̄2 | < 1) = P (−0.64 < Z < 0.64) = P (Z < 0.64) − P (Z < −0.64) = 0.7389 − 0.2611 =
0.4778
8.49 a Assuming that the populations of carbon monoxide concentration measurements are normally
distributed, then the difference in the sample means will be normally distributed with a mean of
µ1 − µ2 = 12 − 10 = 2 and a standard deviation of:
σ_{x̄1−x̄2} = √(σ1²/n1 + σ2²/n2) = √(9²/10 + 9²/10) = 4.025
The z-score for a difference in sample means of 1.4 ppm would be:
z = [(x̄1 − x̄2) − (µ1 − µ2)]/σ_{x̄1−x̄2} = (1.4 − 2)/4.025 = −0.149
Our problem then becomes: P (|X̄1 − X̄2 | > 1.4) = P (|Z| > 0.149) = 2(1 − P (Z < 0.149)) =
2(1 − 0.5592) = 0.8816
b Populations should be approximately normal.
8.51 Assuming that the populations of thread strength measurements are normally distributed, then
the difference in the sample means will be normally distributed with a mean of µ1 − µ2 = 0. The
pooled estimate of the population standard deviation is:
s_p = √([(n1 − 1)s1² + (n2 − 1)s2²]/[(n1 − 1) + (n2 − 1)]) = √([(10 − 1)(0.2)² + (10 − 1)(0.18)²]/(10 − 1 + 10 − 1)) = 0.1903
So the point estimate of the standard deviation of the difference in sample means will be:
s_{x̄1−x̄2} = s_p √(1/n1 + 1/n2) = (0.1903)√(1/10 + 1/10) = 0.0851
The t-score for a difference in sample means of 1 would be:
t = [(x̄1 − x̄2) − (µ1 − µ2)]/s_{x̄1−x̄2} = (1 − 0)/0.0851 = 11.75
With n1 + n2 − 2 = 10 + 10 − 2 = 18 degrees of freedom, our problem then becomes:
P(|X̄1 − X̄2| > 1) = P(|T| > 11.75) ≈ 0



8.53 Because the sample sizes are large, the sampling distributions of trial times for operators A and
B will be approximately normally distributed. Thus, the difference in the sample means will be
normally distributed with a mean of µ1 − µ2 = 15 − 15 = 0 and a standard deviation of:
σ_{x̄1−x̄2} = √(σ1²/n1 + σ2²/n2) = √(2²/75 + 2²/75) = 0.3266
The z-score for a difference in sample means of x̄1 − x̄2 = 5 seconds would be:
z = [(x̄1 − x̄2) − (µ1 − µ2)]/σ_{x̄1−x̄2} = (5 − 0)/0.3266 = 15.31
Our problem then becomes: P (X̄1 − X̄2 > 5) = P (Z > 15.31) ≈ 0

8.8 Process Capability


8.55 a x̿ = 15.0283, R̄ = 0.3467, A2 = 0.729
Control limits: 15.0283 ± 0.729(0.3467) = (14.776, 15.281)
b Since samples 9, 10, and 12 have means outside the control limits, we recalculate x̿ = 15.0104 and R̄ = 0.3417
Control limits: 15.0104 ± 0.729(0.3417) = (14.761, 15.260)
8.57 Control limits: x̿ ± 3s̄/c2 = 15.0283 ± 3(0.1362)/0.7979 = (14.5162, 15.5404), where c2 is found from Table 11 in the Appendix. Since this interval is wider than 15 ± 0.4 = (14.6, 15.4), the individual readings do not seem to be meeting the specifications.
8.59 a P̄ = 137/[100(20)] = 0.0685
Control limits: 0.0685 ± 3√[(0.0685)(0.9315)/100], or (0, 0.1443)
b Since p11 = 15/100 = 0.15 does not fall within the control limits, sample 11 is omitted and the new control limits for P̄ = 122/[100(19)] = 0.0642 are given by:
0.0642 ± 3√[(0.0642)(0.9358)/100], or (0, 0.1377). Since p4 = 0.14 is outside these control limits, we omit sample 4 and recompute P̄ = 108/[100(18)] = 0.06. The control limits are now:
0.06 ± 3√[(0.06)(0.94)/100], or (0, 0.1312), and all remaining samples have sample proportions within this interval.
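The control-limit computation can be wrapped in a small helper (an added sketch; truncating the lower limit at zero follows the solution above):

```python
from math import sqrt

# Three-sigma p-chart limits: p_bar +/- 3*sqrt(p_bar(1 - p_bar)/n),
# with the lower limit truncated at zero (Exercise 8.59).
def p_chart_limits(defectives, samples, n):
    p_bar = defectives / (samples * n)
    half = 3 * sqrt(p_bar * (1 - p_bar) / n)
    return p_bar, max(0.0, p_bar - half), p_bar + half

p_bar, lcl, ucl = p_chart_limits(137, 20, 100)
print(round(p_bar, 4), round(lcl, 4), round(ucl, 4))  # 0.0685 0.0 0.1443
```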
8.61 a P̄ = 116/[50(30)] = 0.0773
Control limits: 0.0773 ± 3√[(0.0773)(0.9227)/50], or (0, 0.1906)
b Since p7 = 10/50 = 0.2 is outside the control limits, we delete sample 7 and recompute P̄ = 106/[50(29)] = 0.0731 to get the new control limits:
0.0731 ± 3√[(0.0731)(0.9269)/50], or (0, 0.1835)

8.63 Control limits: 5.67 ± 2.575√5.67, or (0, 11.8015)

8.65 Since λ is both the mean and variance for the Poisson distribution, control limits for λ may be simultaneously viewed as limits for the mean or variance. Hence, a separate method is not necessary.

8.67 Cpk will increase. Proportion will decrease.


8.69 Exercise for student

8.9 Supplementary Exercises


8.71 Let X_i be the lifetime of the ith lamp, i = 1, 2, ..., 25.
P(Σ_{i=1}^{25} X_i > 1300) = P(x̄ > 52) ≈ P(Z > (52 − 50)/(4/√25)) = P(Z > 2.5) = 0.5 − 0.4938 = 0.0062
8.73 a E(x̄ − ȳ) = E(x̄) − E(ȳ) = µ1 − µ2
b V(x̄ − ȳ) = V(x̄) + V(ȳ) = σ1²/n + σ2²/m
 
  2t
8.75 Consider M_U(t) = E(e^{tU}) = E(e^{(2t/θ)Y}) = M_Y(2t/θ) = [1 − θ(2t/θ)]^{-1} = (1 − 2t)^{-1} = (1 − 2t)^{-2/2}. Thus U is distributed gamma: α = 2/2 = 1 and β = 2 or, equivalently, chi-square with v = 2.
8.77 First note that for Z, a standard normal variable:
M_{Z²}(t) = E(e^{tZ²}) = ∫_{-∞}^{∞} e^{tz²} (1/√(2π)) e^{-z²/2} dz = ∫_{-∞}^{∞} (1/√(2π)) e^{-z²(1−2t)/2} dz
= (1 − 2t)^{-1/2} ∫_{-∞}^{∞} [1/(√(2π) (1 − 2t)^{-1/2})] e^{-z²/[2(1−2t)^{-1}]} dz = (1 − 2t)^{-1/2}
the latter integrand being the density of a normal (µ = 0, σ² = (1 − 2t)^{-1}) distribution. Thus Z² is distributed chi-square with v = 1. Hence:
P(total cost > 48) = P(Σ_{i=1}^{50} C_i > 48) = P(Σ_{i=1}^{50} 4(Y_i − µ)² > 48)
= P(4(0.2) Σ_{i=1}^{50} [(Y_i − µ)/√0.2]² > 48) = P(4(0.2) Σ_{i=1}^{50} Z_i² > 48)
Since the sum of gamma random variables with constant β is again distributed as a gamma random variable, U = Σ_{i=1}^{50} Z_i² is the sum of gamma (α = 1/2, β = 2) (i.e., chi-square (v = 1)) random variables, and therefore U has a gamma (α = 50/2, β = 2), or chi-square (v = 50), distribution. Furthermore:
E(U) = αβ = (50/2)(2) = 50 and V(U) = αβ² = (50/2)(2)² = 100. We are given that U will be approximately normal for large v. In this case, v = 50, and thus:
P(total cost > 48) = P(U > 48/[4(0.2)]) = P(U > 60) ≈ P(Z > (60 − 50)/√100) = P(Z > 1) = 0.5 − 0.3413 = 0.1587
Chapter 9

Estimation

9.1 Point Estimators and Their Properties


9.1 a E(θ̂1) = E(X1) = θ
E(θ̂2) = E[(X1 + X2)/2] = [E(X1) + E(X2)]/2 = (θ + θ)/2 = θ
E(θ̂3) = E[(X1 + 2X2)/3] = [E(X1) + 2E(X2)]/3 = (θ + 2θ)/3 = θ
E(θ̂4) = E(x̄) = θ
Hence, each of the four estimators is unbiased for θ.
b V(θ̂1) = V(X1) = θ²
V(θ̂2) = V[(X1 + X2)/2] = [V(X1) + V(X2)]/4 = (θ² + θ²)/4 = θ²/2
V(θ̂3) = V[(X1 + 2X2)/3] = [V(X1) + 4V(X2)]/9 = (θ² + 4θ²)/9 = 5θ²/9
V(θ̂4) = V(x̄) = σ²/n = θ²/3
Thus, θ̂4 = x̄ has the smallest variance among these four estimators.
9.3 a Since λ is the mean of a Poisson (λ) distribution, x̄ will be an unbiased estimator of λ; i.e.,
E(x̄) = µ = λ.
b E(C) = E(3Y + Y²) = 3E(Y ) + E(Y²) = 3λ + (V (Y ) + (E(Y ))²)
= 3λ + (λ + λ²) = λ² + 4λ
c Consider (1/n) Σ_{i=1}^{n} X_i² + 3x̄. Since E((1/n) Σ_{i=1}^{n} X_i² + 3x̄) = (1/n) Σ_{i=1}^{n} E(X_i²) + 3E(x̄)
= (1/n) Σ_{i=1}^{n} (V (X_i) + (E(X_i))²) + 3λ = (1/n) Σ_{i=1}^{n} (λ + λ²) + 3λ = λ² + 4λ = E(C), this estimator is
unbiased for E(C).
9.5 MSE(x̄) = V (x̄) + B² = σ²/n + (θ − E(x̄))² = (θ + 1 − θ)²/(12n) + (θ − (θ + 1/2))²
= 1/(12n) + 1/4
9.7 Since x̄ is an unbiased estimator of µ and Exercise 7.6 provides an unbiased estimator of σ, we
have that x̄ − 1.645s √((n − 1)/2) Γ((n − 1)/2)/Γ(n/2) is an unbiased estimator of µ − 1.645σ.

9.2 Confidence Intervals: The Single-Sample Case
9.9 Not from this information. Since there is a 3 point margin of error, it is possible that before the
democratic convention, as few as 47 − 3 = 44% of likely voters opted for Kerry, and that after the
democratic convention, as many as 45 + 3 = 48% of likely voters opted for Kerry - implying that
his popularity may have actually increased after the democratic convention.
9.11 Here, x̄ = 46, s = 3, and n = 40. A 95% large sample confidence interval for the mean hours
worked is given by
x̄ ± z0.025 σ/√n ≈ 46 ± 1.96(3)/√40 = (45.07, 46.93).
9.13 We need to determine a sample size for estimating the population mean to within B = 4 grams
with confidence coefficient 1 − α = 0.90. Hence, using σ ≈ 210,
n = (z0.05 σ/B)2 = (1.645(210)/4)2 = 7, 458.48 or n = 7, 459.
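Sample-size calculations like the one in Exercise 9.13 are easy to script. The following is a minimal Python sketch (the function name is ours; the z value is the usual normal-table entry, and the result is always rounded up):

```python
import math

def sample_size_mean(sigma, bound, z):
    """Smallest n with z * sigma / sqrt(n) <= bound (round up)."""
    return math.ceil((z * sigma / bound) ** 2)

# Exercise 9.13: sigma ~ 210, B = 4, 90% confidence (z_{0.05} = 1.645)
print(sample_size_mean(210, 4, 1.645))  # 7459
```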
9.15 We need to estimate the percent of shrinkage to within B = 0.2 with confidence coefficient
1 − α = 0.98. From Exercise 7.11, the standard deviation is given as 1.2. Using this information,
we get
n = (z0.01 σ/B)2 ≈ (2.33(1.2)/0.2)2 = (13.98)2 = 195.44 or n = 196.
9.17 The sample size for estimating the proportion of failing resistors is desired with B = 0.05 and
1 − α = 0.95. From Exercise 7.15, we estimate p to be approximately 0.12. Then,
n = (z0.025 √(p(1 − p))/B)² ≈ (1.96 √((0.12)(0.88))/0.05)² = 162.27 or n = 163.
9.19 To estimate the proportion of cracked supports to within B = 0.1 with 1 − α = 0.98 and p
estimated by 0.4, we have
n = (z0.01 √(p(1 − p))/B)² = (2.33 √((0.4)(0.6))/0.1)² = 130.29 or n = 131.
9.21 From the LC50 measurements, we find Σ_{i=1}^{n} xi = 108, Σ_{i=1}^{n} xi² = 1,426, and n = 12. Thus,
x̄ = 108/12 = 9 and s = ((1426 − (108)²/12)/11)^{1/2} = 6.4244. With 11 degrees of freedom, a 90%
confidence interval for the true mean LC50 of DDT is then
x̄ ± t0.05 s/√n = 9 ± 1.796(6.4244)/√12 = (5.6692, 12.3308), where t0.05 is found from the t-table.
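The interval in Exercise 9.21 can be verified numerically with a short Python sketch (the critical value 1.796 is the t-table entry with 11 degrees of freedom, supplied by hand):

```python
import math

def t_interval(xbar, s, n, t_crit):
    """Confidence interval for the mean: xbar +/- t * s/sqrt(n)."""
    half = t_crit * s / math.sqrt(n)
    return (xbar - half, xbar + half)

# Exercise 9.21: xbar = 9, s = 6.4244, n = 12, t_{0.05}(11) = 1.796
lo, hi = t_interval(9, 6.4244, 12, 1.796)
print(round(lo, 4), round(hi, 4))  # 5.6692 12.3308
```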
9.23 a For n = 10, x̄ ± t0.025 s/√n = 180 ± 2.262(5)/√10 = (176.4235, 183.5765)
b For n = 100, x̄ ± z0.025 σ/√n ≈ 180 ± 1.96(5)/√100 = (179.02, 180.98)
9.25 A 95% confidence interval for the population variance of LC50 measurements, where from Exercise
6.20, n = 12, and s2 = 41.2727, is
((n − 1)s²/χ²0.025(11), (n − 1)s²/χ²0.975(11)) = (11(41.2727)/21.9200, 11(41.2727)/3.81575)
= (20.7117, 118.9805).
The chi-square values are taken from the χ² table, with 11 degrees of freedom.
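The same arithmetic can be checked in Python; the chi-square quantiles are table lookups passed in by hand (the function name is ours):

```python
def variance_interval(s2, n, chi2_upper, chi2_lower):
    """CI for sigma^2: ((n-1)s^2/chi2_{a/2}, (n-1)s^2/chi2_{1-a/2})."""
    return ((n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower)

# Exercise 9.25: s^2 = 41.2727, n = 12; chi-square table values for 11 df
lo, hi = variance_interval(41.2727, 12, 21.9200, 3.81575)
print(round(lo, 2), round(hi, 2))  # 20.71 118.98
```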
9.27 Assuming a normal population of TSP measurements, then, with x̄ = 72, s = 23, and n = 9, a
95% confidence interval for the true mean TSP is
x̄ ± t0.025 s/√n = 72 ± 2.306(23)/√9 = (54.3207, 89.6793).
The t-table is used to find t0.025 with 8 degrees of freedom.
9.29 Given x̄ = 585, s = 38, and n = 12, we seek a 95% confidence interval for the mean tensile
strength. Assuming a normal population of tensile strength measurements and using t-table, we
find
x̄ ± t0.025 s/√n = 585 ± 2.201(38)/√12 = (560.8558, 609.1442).
9.31 From Exercise 7.29, s2 = 59117.75 and n = 9. Then, assuming a normal population and using χ2
table, we compute a 90% confidence interval for the variance of the number of cycles to failure as
((n − 1)s²/χ²0.05(8), (n − 1)s²/χ²0.95(8)) = (8(59117.75)/15.5073, 8(59117.75)/2.73264)
= (30,498.0235, 173,071.4620).
9.33 No, the statement refers to a particular confidence interval. The probability that this particular
confidence interval contains the true population proportion, p, is 1 (if it in fact does contain p)
or 0 (if it does not contain p). The correct interpretation is that, in repeated sampling, 95% of
similarly constructed confidence intervals will contain the true population proportion of Americans
that list football as their favorite sport.
9.3 Confidence Intervals: The Multiple-Sample Case
9.35 A large sample 95% confidence interval for the difference between the mean resistances to abrasion
is
x̄1 − x̄2 ± z0.025 (σ1²/n1 + σ2²/n2)^{1/2} = 92 − 98 ± 1.96(20/50 + 30/40)^{1/2}
= (−8.1019, −3.8981).
9.37 A large sample 90% confidence interval for the true difference between mean depths is
x̄1 − x̄2 ± z0.05 (σ1²/n1 + σ2²/n2)^{1/2} = 0.18 − 0.21 ± 1.645((0.02)²/35 + (0.03)²/30)^{1/2}
= (−0.0406, −0.0194).
If coating B were superior in inhibiting corrosion, then it should have shallower pit depths. Thus,
µA − µB would be positive. Since the confidence interval includes negative values (in fact the
entire interval is composed of negative values), we cannot conclude that coating B is better than
coating A.
9.39 The largest possible variance would correspond to p1 = p2 = 0.5. Then we require
(half the width of confidence interval) ≤ Bound = B,
1.96 √((0.5)(0.5)/n + (0.5)(0.5)/n) ≤ 0.1.
Solving for n yields
n ≥ (1.96)² ((0.5)(0.5) + (0.5)(0.5))/(0.1)² = 192.08 so that n = 193.
(0.1)2
9.41 For a 95% confidence interval for the difference between the population means, first we compute
sp² = ((n1 − 1)s1² + (n2 − 1)s2²)/(n1 + n2 − 2) = ((8)(5.88)² + (6)(7.68)²)/(9 + 7 − 2) = 45.0350
and sp = 6.7108 so that with degrees of freedom = 14, the required interval is
ȳ1 − ȳ2 ± t0.025 sp √(1/n1 + 1/n2) = 43.71 − 39.63 ± 2.145(6.7108) √(1/9 + 1/7)
= (−3.1742, 11.3342).
Using the F table, a 90% confidence interval for the ratio of the true variances is
((s2²/s1²)(1/F0.05(6, 8)), (s2²/s1²) F0.05(8, 6)) = (((7.68)²/(5.88)²)(1/3.58), ((7.68)²/(5.88)²)(4.15))
= (0.4765, 7.0797).
If intermittent training gives more variable results, then s22 /s21 > 1. Since this confidence interval
includes the value 1, we cannot conclude that intermittent training gives more variable results.
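The pooled-variance interval from Exercise 9.41 can be reproduced with a small Python sketch (the t critical value is the table entry for 14 degrees of freedom, supplied by hand):

```python
import math

def pooled_t_interval(x1, s1, n1, x2, s2, n2, t_crit):
    """CI for mu1 - mu2 assuming equal variances (df = n1 + n2 - 2)."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    half = t_crit * math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
    return (x1 - x2 - half, x1 - x2 + half)

# Exercise 9.41: t_{0.025}(14) = 2.145
lo, hi = pooled_t_interval(43.71, 5.88, 9, 39.63, 7.68, 7, 2.145)
print(round(lo, 2), round(hi, 2))  # -3.17 11.33
```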
9.43 First we compute sp² = ((n1 − 1)s1² + (n2 − 1)s2²)/(n1 + n2 − 2) = (2(0.02)² + 2(0.07)²)/(3 + 3 − 2) = 0.00265 so that
sp = 0.05148. Assuming normal populations with equal variance, a 95% confidence interval for
the difference between mean impulses for the two rackets is
x̄1 − x̄2 ± t0.025 sp √(1/n1 + 1/n2) (degrees of freedom = 4)
= 2.41 − 2.22 ± 2.776(0.05148) √(1/3 + 1/3) = (0.0733, 0.3067).
9.45 We compute sp² = ((n1 − 1)s1² + (n2 − 1)s2²)/(n1 + n2 − 2) = (14(0.04)² + 6(0.05)²)/(15 + 7 − 2) = 0.00187 with
sp = 0.04324. Then a 98% confidence interval for the difference between true mean H/C ratios
for altered and partly altered bitumen is
x̄1 − x̄2 ± t0.01 sp √(1/n1 + 1/n2) (degrees of freedom = 20)
= 1.02 − 1.16 ± 2.528(0.04324) √(1/15 + 1/7) = (−0.1900, −0.0900).
9.47 From the given data, sp² = ((n1 − 1)s1² + (n2 − 1)s2²)/(n1 + n2 − 2) = (39(0.14)² + 19(0.04)²)/(40 + 20 − 2) = 0.01370 so that
sp = 0.1171. Assuming normal populations and equal variances, we compute a 98% confidence
interval for the difference between mean HCl amounts as follows:
x̄1 − x̄2 ± t0.01 sp √(1/n1 + 1/n2) = 1.26 − 1.40 ± 2.33(0.1171) √(1/40 + 1/20)
= (−0.2147, −0.0653).
9.49 With the aid of the F table, a 90% confidence interval for the ratio of variances in Exercise 7.47 is
((s2²/s1²)(1/F0.05(7, 6)), (s2²/s1²) F0.05(6, 7)) = (((0.6990)²/(0.2059)²)(1/4.21), ((0.6990)²/(0.2059)²)(3.87))
= (2.7375, 44.6018).
9.51 Let µ1 be the mean yield of 14mm rods and µ2 the mean yield of 10mm rods. We seek to estimate
µ1 − µ2, the increase in the mean yield stress, where
sp² = (50(17.2)² + 11(26.7)²)/(51 + 12 − 2) = 371.0457 and sp = 19.2625.
Assuming normal populations and equal variances, a 90% confidence interval with 61 degrees of
freedom is
x̄1 − x̄2 ± t0.05 sp √(1/n1 + 1/n2) = 499 − 485 ± 1.645(19.2625) √(1/51 + 1/12) = (3.833, 24.167).
9.53 The sampling error represents half the width of a confidence interval. In this case,
zα/2 √((0.6)(0.4)/1000 + (0.5)(0.5)/1000) = 0.045.
Thus zα/2 = 2.033 so that 1 − α ≈ 2(0.4788) = 0.9576. Hence, a sampling error of 0.045 corre-
sponds approximately to a 95% confidence interval for p1 − p2. An approximate 95% confidence
interval for p1 − p2 is
p̂1 − p̂2 ± z0.025 √(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2) = 0.6 − 0.5 ± 1.96 √((0.6)(0.4)/1000 + (0.5)(0.5)/1000)
= (0.0566, 0.1434).
9.4 Prediction Intervals
9.55 Assume a random sample with resistances that are normally distributed. Then a 95% prediction
interval for the resistance is (degrees of freedom = 14)
x̄ ± t0.025 s √(1 + 1/n) = 9.8 ± 2.145(0.5) √(1 + 1/15) = (8.6923, 10.9077).
9.57 Assuming downtimes are independent from month to month and have a normal distribution, a
95% prediction interval for the downtime next month is (degrees of freedom = 4)
x̄ ± t0.025 s √(1 + 1/n) = 42 ± 2.776(3) √(1 + 1/5) = (32.8771, 51.1229).
9.59 Assuming a normal distribution of gas mileages, a 90% prediction interval for the gas mileage on
the next test is (degrees of freedom = 49)
x̄ ± t0.05 s √(1 + 1/n) = 39.4 ± 1.645(2.6) √(1 + 1/50) = (35.0804, 43.7196).
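Prediction intervals differ from confidence intervals only in the extra "1 +" under the radical, which accounts for the variability of the single future observation. A Python sketch of Exercise 9.59 (the critical value 1.645 is the large-sample value used in the text):

```python
import math

def prediction_interval(xbar, s, n, t_crit):
    """PI for one future observation: xbar +/- t * s * sqrt(1 + 1/n)."""
    half = t_crit * s * math.sqrt(1 + 1 / n)
    return (xbar - half, xbar + half)

# Exercise 9.59: xbar = 39.4, s = 2.6, n = 50
lo, hi = prediction_interval(39.4, 2.6, 50, 1.645)
print(round(lo, 4), round(hi, 4))  # 35.0804 43.7196
```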
9.5 Tolerance Intervals
9.61 x̄ = 8.5, s = 1.9579, n = 10. Assume a normal population and K = 4.265 from the table of
tolerance limits.
8.5 ± 4.265(1.9579) = (0.1496, 16.8504)
9.63 a 1.1 ± 1.958(0.015) = (1.0706, 1.1294)
b 1.1 ± 1.958(0.06) = (0.9825, 1.2175). The value of s does have a great effect on the length of the
tolerance interval.
9.65 a Since n = 10 and δ = 0.95, the confidence coefficient is 1−10(0.95)10−1 +(10−1)(0.95)10 = 0.0861.
b Since n = 10 and δ = 0.80, the confidence coefficient is 1−10(0.80)10−1 +(10−1)(0.80)10 = 0.6242.
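The distribution-free confidence coefficient used in Exercise 9.65, 1 − nδ^{n−1} + (n − 1)δ^n, is a one-liner to evaluate (the function name is ours):

```python
def tolerance_confidence(n, delta):
    """Confidence that (min, max) of n observations covers at least
    a proportion delta of the population (distribution-free)."""
    return 1 - n * delta**(n - 1) + (n - 1) * delta**n

print(round(tolerance_confidence(10, 0.95), 4))  # 0.0861
print(round(tolerance_confidence(10, 0.80), 4))  # 0.6242
```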
9.6 The Method of Maximum Likelihood
9.67 We need to find λ̂ to maximize
L(λ) = ∏_{i=1}^{n} p(xi) = ∏_{i=1}^{n} e^{−λ} λ^{xi}/xi! = e^{−nλ} λ^{Σxi}/∏ xi!.
Before differentiating, we take logarithms to get
ln L(λ) = −nλ + Σ_{i=1}^{n} xi ln(λ) − ln ∏_{i=1}^{n} xi!.
Thus,
∂ ln L(λ)/∂λ |_{λ=λ̂} = −n + Σ_{i=1}^{n} xi/λ̂ = 0.
Solving for λ̂ gives λ̂ = Σ_{i=1}^{n} xi/n = x̄. Since ∂² ln L(λ)/∂λ² = −Σ_{i=1}^{n} xi/λ² < 0, λ̂ = x̄ maximizes
L(λ).
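The calculus above can be checked numerically: for any data set, the Poisson log-likelihood is largest at λ = x̄. A Python sketch (the count data here are hypothetical, chosen only for illustration):

```python
import math

def poisson_loglik(lam, data):
    """log L(lambda) = -n*lambda + sum(x)*log(lambda) - sum(log(x!))."""
    n = len(data)
    return (-n * lam + sum(data) * math.log(lam)
            - sum(math.lgamma(x + 1) for x in data))

data = [2, 0, 3, 1, 1, 4]        # hypothetical counts
lam_hat = sum(data) / len(data)  # MLE = sample mean
# the log-likelihood at lam_hat beats any other candidate value:
assert all(poisson_loglik(lam_hat, data) >= poisson_loglik(lam, data)
           for lam in [0.5, 1.0, 1.5, 2.5, 3.0])
```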
9.69 L(µ, σ²) = ∏_{i=1}^{n} f(xi) = ∏_{i=1}^{n} (1/(√(2π) σ)) e^{−((xi−µ)/σ)²/2} = (2πσ²)^{−n/2} e^{−(1/2) Σ_{i=1}^{n} ((xi−µ)/σ)²}
Instead, we maximize ln L(µ, σ²) = −(n/2) ln(2πσ²) − (1/2) Σ_{i=1}^{n} ((xi − µ)/σ)².
Then,
∂ ln L(µ, σ²)/∂µ |_{µ=µ̂, σ²=σ̂²} = Σ_{i=1}^{n} ((xi − µ̂)/σ̂²) = 0
so that Σ_{i=1}^{n} xi − nµ̂ = 0 and hence µ̂ = Σ_{i=1}^{n} xi/n = x̄. Next,
∂ ln L(µ, σ²)/∂σ² |_{µ=µ̂, σ²=σ̂²} = −n/(2σ̂²) + (1/(2σ̂⁴)) Σ_{i=1}^{n} (xi − µ̂)² = 0.
Simplifying,
−nσ̂² + Σ_{i=1}^{n} (xi − x̄)² = 0 and hence
σ̂² = Σ_{i=1}^{n} (xi − x̄)²/n = ((n − 1)/n) s².
9.71 Since percentage points of the gamma distribution are not generally available, we will attempt
to express the distribution of Xi (gamma (α = 2, β)) in terms of the chi-square distribution
for which tables are available. Recall that the moment-generating function for the chi-square (ν)
distribution is (1 − 2t)−ν/2 .
Since M_{Xi}(t) = (1 − βt)^{−α} = (1 − 2(tβ/2))^{−2}, we have M_{Xi}(2t/β) = (1 − 2t)^{−2}. But M_{Xi}(2t/β) =
E(e^{(2t/β)Xi}) = E(e^{t(2Xi/β)}) = M_{2Xi/β}(t). This suggests defining Ui = 2Xi/β so that M_{Ui}(t) =
M_{Xi}(2t/β) = (1 − 2t)^{−α} = (1 − 2t)^{−2α/2}. Hence Ui is chi-square (ν = 2α = 2(2) = 4). The
sum of independent chi-square (νi) random variables is again chi-square (ν = Σ νi). Thus, for
X1, . . . , Xn, a random sample from a gamma (α = 2, β) distribution,
Σ_{i=1}^{n} Ui = (2/β) Σ_{i=1}^{n} Xi = 2nx̄/β
has a chi-square (ν = 2nα = 2n(2) = 4n) distribution. Thus,
1 − α = P (χ²_{1−α/2}(4n) ≤ Σ_{i=1}^{n} Ui ≤ χ²_{α/2}(4n)) = P (χ²_{1−α/2}(4n) ≤ 2nx̄/β ≤ χ²_{α/2}(4n)).
Rearranging, 1 − α = P (2nx̄/χ²_{α/2}(4n) ≤ β ≤ 2nx̄/χ²_{1−α/2}(4n)).
A 100(1 − α)% confidence interval for β is then
(2nx̄/χ²_{α/2}(4n), 2nx̄/χ²_{1−α/2}(4n)).
In our case, nx̄ = Σ_{i=1}^{n} Xi = 239.3 with n = 8. Interpolating from the χ² table, a 95% confidence interval for β is
(2(239.3)/χ²_{0.025}(32), 2(239.3)/χ²_{0.975}(32)) = (478.6/49.4517, 478.6/18.3193) = (9.6781, 26.1255).
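The final arithmetic is a pair of divisions; a Python sketch (the chi-square quantiles with 32 degrees of freedom are the interpolated table values from the solution above):

```python
# Exercise 9.71: sum of the X_i is 239.3, so 2*n*xbar = 2(239.3) = 478.6
total = 239.3
chi2_upper, chi2_lower = 49.4517, 18.3193  # chi2_{0.025}(32), chi2_{0.975}(32)
ci = (2 * total / chi2_upper, 2 * total / chi2_lower)
print(round(ci[0], 2), round(ci[1], 2))  # 9.68 26.13
```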
9.73 Here Xi is distributed uniform (0, θ). We need the maximum likelihood estimate of θ. Then,
L(θ) = ∏_{i=1}^{n} f(xi) = ∏_{i=1}^{n} (1/θ) = θ^{−n}.
The slope of L(θ) is nonzero for any θ > 0, so there is no point in taking the derivative of L(θ).
However, L(θ) increases as θ decreases to zero. Thus we take θ as small as possible. But since
0 < Xi < θ, θ cannot be smaller than the largest value of Xi. Hence θ̂ = max_{1≤i≤n} Xi. In this case,
the maximum likelihood estimate of θ is θ̂ = max_{1≤i≤3} Xi = 0.06.
9.7 Supplementary Exercises
9.75 Assume that the diameter measurements are independent and have a common normal distribution.
Then a 90% confidence interval for the mean diameter is (degrees of freedom = 9)
x̄ ± t0.05 s/√n = 2.1 ± 1.833(0.3)/√10 = (1.9261, 2.2739).
9.77 A 95% confidence interval for the mean hardness is (degrees of freedom = 14)
x̄ ± t0.025 s/√n = 65 ± 2.145 √90/√15 = (59.7458, 70.2542).
1
(23.85)2 2

23.85 58.323 −
9.79 a We compute x̄ = = 2.385 and the standard deviation s = 
 10 
10 9

= 0.4001. Assuming a normal population, a 95% confidence interval for the mean pre-etch window
width is (degrees of freedom = 9)
t0.025 s 2.262(0.4001)
x̄ ± √ = 2.385 ± √ = (2.0988, 2.6712).
n 10
b We compute x̄ = 34.26/10 = 3.426 and the standard deviation
s = ((120.5412 − (34.26)²/10)/9)^{1/2} = 0.5931.
Assuming a normal population, a 95% confidence interval for the mean post-etch window
width is
x̄ ± t0.025 s/√n = 3.426 ± 2.262(0.5931)/√10 = (3.0018, 3.8502).
9.81 a ((n − 1)s²/χ²0.025(n − 1), (n − 1)s²/χ²0.975(n − 1)) = (9(0.5931)²/19.0228, 9(0.5931)²/2.70039) = (0.1664, 1.1724)
b ((s2²/s1²)(1/F0.05(9, 9)), (s2²/s1²) F0.05(9, 9)) = (((0.5931)²/(0.4001)²)(1/3.18), ((0.5931)²/(0.4001)²)(3.18))
= (0.6910, 6.9879)
c Assume both samples are independent and observations come from normal populations.
1/2
(5.17)2

2.6779 −
9.83 a We calculate s = 
 10  = 0.02359. Then a 95% confidence interval for the
9

population variance is
(n − 1)s2 (n − 1)s2 9(0.02359)2 9(0.02359)2
   
, = ,
χ20.025 (n − 1) χ20.975 (n − 1) 19.0228 2.70039
= (0.000263, 0.001855).
b Assume that the samples are independent and the weight proportions are normally distributed;
both the independence of the samples and the normality of the weight proportions need to be verified.
9.85 Let m be the number of observations allocated to sample 1 and let n − m be the number allocated
to sample 2. We need to minimize the length of the confidence interval for µ1 − µ2 given by
2zα/2 √(σ1²/m + σ2²/(n − m)).
Equivalently, we will choose m to minimize V (x̄1 − x̄2) = σ1²/m + σ2²/(n − m). Taking the derivative of
V with respect to m, we get
dV/dm = −σ1²/m² + σ2²/(n − m)² = 0.
Rearranging, we have σ1²(n − m)² = m²σ2², so that m = nσ1/(σ1 + σ2) and hence n − m = nσ2/(σ1 + σ2).
9.87 A point estimate of 2µ1 + µ2 is 2ȳ + x̄. Then V (2ȳ + x̄) = 4V (ȳ) + V (x̄)
= 4(σ²/n) + 3σ²/m = σ²(4/n + 3/m). An estimate of the common variance σ² is given by
sp² = ((n − 1)sy² + (m − 1)sx²/3)/(n + m − 2).
A 95% confidence interval for 2µ1 + µ2 is then
2ȳ + x̄ ± tα/2 sp √(4/n + 3/m),
where sp is the square root of sp² defined above and degrees of freedom = n + m − 2.
9.89 The number of defectives, X, is distributed binomial (n, p). The maximum likelihood estimate of
p for fixed n is p̂ = x/n. We seek the maximum likelihood estimate for r = (number of defectives)/(number of good items).
Dividing numerator and denominator of r by the total number of items gives r = p/(1 − p). Hence
r = g(p) is a function of p. The maximum likelihood estimate of r is then given by
r̂ = g(p̂) = (x/n)/(1 − x/n) = x/(n − x).
Chapter 10

Hypothesis Testing

10.1 Terminology of Hypothesis Testing
10.1 H0 : µ = 130 and Ha : µ < 130
10.3 H0 : µ = 64 and Ha : µ < 64
10.5 H0 : µ = 65 and Ha : µ < 65
10.7 H0 : µ = 2500 and Ha : µ < 2500
10.9 H0 : µ = 30 and Ha : µ < 30
10.11 H0 : µ = 490 and Ha : µ > 490
10.2 Hypothesis Testing: The Single-Sample Case
10.13 Hypotheses: H0: µ = 130 Ha: µ < 130
Test Statistics: z = (x̄ − µ0)/(σ/√n) ≈ (128.6 − 130)/(2.1/√40) = −4.22
Rejection Region: z < −z0.05 = −1.645
Conclusion: Reject H0 at α = 0.05; i.e., there is sufficient evidence to conclude that the mean
output voltage is less than 130 at α = 0.05.
10.15 n ≥ (z0.05 + z0.01)² σ²/(µa − µ0)² = (1.645 + 2.33)²(2.1)²/(129 − 130)² = 69.68 or n = 70
10.17 β = P (fail to reject H0 given that Ha is true)
 
x̄ − 64
=P √ > −2.33|µ = 60
8/ 50
= P (x̄ > 61.3639|µ = 60)
 
x̄ − 60 61.3639 − 60
=P √ > √
8/ 50 8/ 50
= P (Z > 1.21) = 0.5 − 0.3869 = 0.1131
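The β computation above can be scripted using the exact normal CDF instead of a table lookup (a Python sketch; the small discrepancy with 0.1131 comes from the table rounding z to 1.21):

```python
import math

def normal_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Exercise 10.17: H0: mu = 64 vs Ha: mu < 64, sigma = 8, n = 50, alpha = 0.01.
# Reject H0 when xbar < 64 - 2.33 * 8/sqrt(50); beta is the probability of
# exceeding that cutoff when mu = 60.
cutoff = 64 - 2.33 * 8 / math.sqrt(50)
beta = 1 - normal_cdf((cutoff - 60) / (8 / math.sqrt(50)))
print(round(beta, 3))  # 0.114
```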
10.19 Hypotheses: H0: µ = 7 Ha: µ ≠ 7
Test Statistics: z = (x̄ − µ0)/(σ/√n) ≈ (6.8 − 7)/(0.9/√30) = −1.22
Rejection Region: |z| > z0.025 = 1.96
Conclusion: Fail to reject H0 at α = 0.05; i.e., there is insufficient evidence to conclude that
the mean pH is significantly different from 7 at α = 0.05.
P-value: P (|Z| ≥ 1.22) = 2(P (Z ≥ 1.22)) = 2(0.5 − 0.3888) = 0.2224
10.21 Hypotheses: H0 : p ≥ 0.9 Ha : p < 0.9
Test Statistics: z = (p̂ − p0)/√(p0(1 − p0)/n) = (35/40 − 0.9)/√((0.9)(0.1)/40) = −0.53
Rejection Region: z < −z0.01 = −2.33
Conclusion: Fail to reject H0 at α = 0.01; i.e., there is insufficient evidence to conclude that
the specification is not being met at α = 0.01.
10.23 Summary Statistics: x̄ = 14,510/6 = 2,418.33
s = ((35,121,500 − (14,510)²/6)/5)^{1/2} = 79.3515
Hypotheses: H0: µ = 2500 Ha: µ < 2500
Test Statistics: Assuming a normal population, t = (x̄ − µ0)/(s/√n) = (2,418.33 − 2,500)/(79.3515/√6)
= −2.52
Rejection Region: t < −t0.01 = −3.365 (degrees of freedom = 5)
Conclusion: Fail to reject H0 at α = 0.01; i.e., there is insufficient evidence to conclude that
the mean range of the rockets is less than 2500 after storage.
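The summary-statistic bookkeeping in Exercise 10.23 can be reproduced directly (a Python sketch):

```python
import math

# Exercise 10.23: test H0: mu = 2500 vs Ha: mu < 2500 from summary data
sum_x, sum_x2, n = 14_510, 35_121_500, 6
xbar = sum_x / n
s = math.sqrt((sum_x2 - sum_x**2 / n) / (n - 1))
t = (xbar - 2500) / (s / math.sqrt(n))
print(round(xbar, 2), round(s, 4), round(t, 2))  # 2418.33 79.3515 -2.52
```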
10.25 Hypotheses: H0: µ = 30 Ha: µ ≠ 30
Test Statistics: Assuming that the stress resistance measurements are normally distributed,
t = (x̄ − µ0)/(s/√n) = (27.4 − 30)/(1.1/√10) = −7.47
Rejection Region: |t| > t0.025 = 2.262 (degrees of freedom = 9)
Conclusion: Reject H0 at α = 0.05; i.e., there is sufficient evidence to doubt the specification
for stress resistance of the plastic at α = 0.05.
10.27 Summary Statistics: x̄ = 34.26/10 = 3.426
s = ((120.5412 − (34.26)²/10)/9)^{1/2} = 0.5931
Hypotheses: H0: µ = 3.5 Ha: µ ≠ 3.5
Test Statistics: Assuming the post-etch window widths are normally distributed,
t = (x̄ − µ0)/(s/√n) = (3.426 − 3.5)/(0.5931/√10) = −0.3945
Rejection Region: |t| > t0.025 = 2.262 (degrees of freedom = 9)
Conclusion: Fail to reject H0 at α = 0.05; i.e., there is insufficient evidence to conclude that
the specifications are being violated at α = 0.05.
10.29 Hypotheses: H0: p ≤ 0.5 Ha: p > 0.5
Test Statistics: z = (p̂ − p0)/√(p0(1 − p0)/n) = (0.53 − 0.5)/√((0.5)(0.5)/871) = 1.77
P-value: P (Z ≥ 1.77) = 0.5 − 0.4616 = 0.0384
Conclusion: Since the P-value is small (i.e., less than, say, 0.05), we will reject H0 with a P-value
of 0.0384 and conclude that a majority of adults in Florida favor strong support of Israel.
10.31 Hypotheses: H0: µ = 300,000 Ha: µ ≠ 300,000
Test Statistics: z = (x̄ − µ0)/(σ/√n) ≈ (295,000 − 300,000)/(10,000/√40) = −3.16
Rejection Region: |z| > z0.05 = 1.645
Conclusion: Reject H0 at α = 0.1; i.e., there is sufficient evidence to conclude that the mean
tensile strength of the wire fails to meet the specifications.
10.33 Summary Statistics: x̄ = 57.5/9 = 6.3889
s = ((368.93 − (57.5)²/9)/8)^{1/2} = 0.4428
Hypotheses: H0: µ = 6.5 Ha: µ ≠ 6.5
Test Statistics: Assuming the pH measurements are normally distributed,
t = (x̄ − µ0)/(s/√n) = (6.3889 − 6.5)/(0.4428/√9) = −0.753
Rejection Region: |t| > t0.025 = 2.306 (degrees of freedom = 8)
Conclusion: Fail to reject H0 at α = 0.05; i.e., there is insufficient evidence to conclude the
mean pH is different from the claimed value of 6.5 at α = 0.05.
10.35 Hypotheses: H0: σ² ≤ 100 Ha: σ² > 100
Test Statistics: χ² = (n − 1)s²/σ0² = 14(12)²/100 = 20.16
Rejection Region: χ² > χ²0.05 = 23.6848 (degrees of freedom = 14)
Conclusion: Fail to reject H0 at α = 0.05; i.e., there is insufficient evidence to conclude that
the standard deviation of haul times exceeds the claimed value of 10 minutes at α = 0.05.
10.37 Hypotheses: H0: σ² ≤ 400 Ha: σ² > 400
Test Statistics: Assuming a normal population of ranges, χ² = (n − 1)s²/σ0² = 5(79.3515)²/400 = 78.71.
Rejection Region: χ² > χ²0.05 = 11.0705 (degrees of freedom = 5)
Conclusion: Reject H0 at α = 0.05; i.e., there is strong evidence that storage significantly
increases the variability of the ranges at α = 0.05.
10.3 Hypothesis Testing: The Multiple-Sample Case
10.39 Hypotheses: H0: µ1 − µ2 = 0 Ha: µ1 − µ2 ≠ 0
Test Statistics: z = (x̄1 − x̄2)/√(σ1²/n1 + σ2²/n2) = (1.65 − 1.43)/√((0.26)²/30 + (0.22)²/35) = 3.65
Rejection Region: |z| > z0.005 = 2.575
Conclusion: Reject H0 at α = 0.01; i.e., there is sufficient evidence to conclude that the soils
significantly differ with respect to the mean shear strength at α = 0.01.
10.41 Hypotheses: H0: µ1 − µ2 = 0 Ha: µ1 − µ2 ≠ 0
Test Statistics: Assume independent samples and a common normal distribution.
sp = ((6(210)² + 9(190)²)/(7 + 10 − 2))^{1/2} = 198.2423
t = (3250 − 3240)/(198.2423 √(1/7 + 1/10)) = 0.10
Rejection Region: |t| > t0.025 = 2.131 (degrees of freedom = 15)
Conclusion: Fail to reject H0 at α = 0.05; i.e., there is insufficient evidence to conclude that
the methods produce concrete with significantly different mean strengths at α = 0.05.
10.43 Summary Statistics: 1: Natives 2: Nonnatives
n 10 10
x̄ 87 77.3
s 3.2318 4.0291
Hypotheses: H0: µ1 − µ2 = 0 Ha: µ1 − µ2 > 0
Test Statistics: Assume independent samples and a common normal population.
sp = ((9(3.2318)² + 9(4.0291)²)/(10 + 10 − 2))^{1/2} = 3.6522
t = (87 − 77.3)/(3.6522 √(1/10 + 1/10)) = 5.94
Rejection Region: t > t0.05 = 1.734 (degrees of freedom = 18)
Conclusion: Reject H0 at α = 0.05; i.e., there is sufficient evidence to conclude that the
nonnative English speakers have a significantly smaller mean percentage of correct responses at
α = 0.05.
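The pooled two-sample t statistic used in these problems can be sketched in Python (the function name is ours):

```python
import math

def pooled_t(x1, s1, n1, x2, s2, n2):
    """Two-sample t statistic assuming equal variances."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return (x1 - x2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# Exercise 10.43: natives vs nonnatives
t = pooled_t(87, 3.2318, 10, 77.3, 4.0291, 10)
print(round(t, 2))  # 5.94
```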
10.45 Hypotheses: H0: σ1² = σ2² Ha: σ1² ≠ σ2²
Test Statistics: Assuming normal populations, F = s2²/s1² = (16)²/(15.3)² = 1.09
Rejection Region: F < 1/F0.05(5, 5) = 1/5.05 = 0.1980 or F > F0.05(5, 5) = 5.05
Conclusion: Fail to reject H0 at α = 0.10; i.e., there is insufficient evidence to conclude that
the variances significantly differ at α = 0.10.
10.47 Hypotheses: H0: µ1 − µ2 = 0 Ha: µ1 − µ2 > 0, where '1' indicates 'original printer' and '2'
indicates 'modified printer'
Test Statistics: Assume a common normal population and independent samples.
sp = ((9(36) + 7(64))/(10 + 8 − 2))^{1/2} = 6.9462
t = (98 − 94)/(6.9462 √(1/10 + 1/8)) = 1.214
The assumption of normality needs to be checked. For example, the exponential distribution may
be more appropriate.
Rejection Region: t > t0.01 = 2.583 (degrees of freedom = 10 + 8 − 2 = 16)
Conclusion: Fail to reject H0 at α = 0.01; i.e., there is insufficient evidence to conclude that
the mean time between failures (MTBF) for the modified printer is less than the MTBF for the
original printer at α = 0.01.
10.49 Hypotheses: H0: µ1 − µ2 = 0 Ha: µ1 − µ2 > 0
Test Statistics: With sp = ((9(0.03)² + 9(0.02)²)/18)^{1/2} = 0.02550 and assuming a common
normal distribution for the resistances,
t = (0.19 − 0.11)/(0.02550 √(1/10 + 1/10)) = 7.015.
Rejection Region: t > t0.10 = 1.330 (d.f. = 10 + 10 − 2 = 18)
Conclusion: Reject H0 at α = 0.10; i.e., there is sufficient evidence to conclude that alloying
significantly reduces the mean resistance in the wire at α = 0.10.
10.51 Summary Statistics: 1: first group of rackets; 2: second group of rackets
n1 = 6, x̄1 = 2,418.33, s1 = 79.3515; n2 = 6, x̄2 = 2,368.33, s2 = 76.7898; sp = 78.0812.
Hypotheses: H0: µ1 − µ2 = 0 Ha: µ1 − µ2 ≠ 0
Test Statistics: t = (2418.33 − 2368.33)/(78.0812 √(1/6 + 1/6)) = 1.109
Rejection Region: |t| > t0.025 = 2.228 (degrees of freedom = 6 + 6 − 2 = 10)
Conclusion: Fail to reject H0 at α = 0.05; i.e. there is insufficient evidence to conclude that
the storage methods produce significantly different mean ranges at α = 0.05.
10.53 Since the brands of gasoline are common to both automobiles, we block on the brands and analyze
the experiment as a paired sample experiment. We compute the differences for auto A minus auto
B: −0.9, −1, 0.9, 0.7, −0.2.
Summary Statistics: x̄D = −0.1, sD = 0.8803, n = 5
Hypotheses: H0: µD = 0 Ha: µD ≠ 0
Test Statistics: Assuming a normal population of differences in gas mileage,
t = −0.1/(0.8803/√5) = −0.254.
Rejection Region : |t| > t0.025 = 2.776 (degrees of freedom = 4)
Conclusion: Fail to reject H0 at α = 0.05; i.e., there is insufficient evidence to conclude a
significant difference between mean mileage figures for the two automobiles at α = 0.05.
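Because the paired analysis reduces to a one-sample t test on the differences, it is a few lines of Python (a sketch using the sample standard deviation from the standard library):

```python
import math
from statistics import mean, stdev

def paired_t(diffs):
    """Paired-sample t statistic for H0: mu_D = 0."""
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Exercise 10.53: auto A minus auto B for the five gasoline brands
t = paired_t([-0.9, -1, 0.9, 0.7, -0.2])
print(round(t, 3))  # -0.254
```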
10.55 Since the type of powder is common to both procedures, we block on powder type and analyze
the experiment as a paired sample experiment. The differences for procedure I minus procedure
II are: −2, 1, −3, −2, 1, 3.
Summary Statistics: x̄D = −1/3, sD = 2.3381, n=6
Hypotheses: H0: µD = 0 Ha: µD ≠ 0
Test Statistics: Assuming a normal population for the porosity differences,
t = −0.3333/(2.3381/√6) = −0.349.
Rejection Region: |t| > t0.025 = 2.571 (degrees of freedom = 5)
Conclusion: Fail to reject H0 at α = 0.05; i.e., there is insufficient evidence to conclude that
the mean porosities of the two procedures significantly differ at α = 0.05.
10.57 Hypotheses: H0: σ1² = σ2² Ha: σ1² ≠ σ2²
Test Statistics: Assuming normal populations,
F = s1²/s2² = (79.3515)²/(76.7898)² = 1.07.
Rejection Region: F > F0.05(5, 5) = 5.05 or F < 1/F0.05(5, 5) = 1/5.05 = 0.198
Conclusion: Fail to reject H0 at α = 0.10; i.e., there is insufficient evidence to conclude that the
variance among range measurements significantly differs for the two storage methods at α = 0.10.
10.59 Hypotheses: H0 : µd = 0 and Ha : µd > 0
Since the data are paired, we should calculate the difference between each pair of measurements:
217 − 95 = 122
252 − 107 = 145
269 − 109 = 160
271 − 113 = 158
291 − 115 = 176
291 − 118 = 173
291 − 118 = 173
293 − 119 = 174
311 − 119 = 192
Total 1473
From this new data set we get:
x̄d = 163.67 sd = 20.51
Assuming that the population of difference measurements is normally distributed, we can calculate
the t-score for our data:
t = (x̄d − µd)/(sd/√n) = (163.67 − 0)/(20.51/√9) = 23.94
With 9 − 1 = 8 degrees of freedom our test statistic has a p-value of p = P (T > 23.94) ≈ 0
Because the p-value ≈ 0 is smaller than α = 0.05, reject the null hypothesis: there is sufficient
evidence to conclude, at the 5% significance level, that the mean difference between the number of
FETI iterations without reconjugation and the number with reconjugation is greater than 0.
10.61 Hypotheses: H0: µ1 − µ2 = 0 and Ha: µ1 − µ2 ≠ 0
Assuming the populations are normally distributed, we calculate our test statistic:
t = ((x̄1 − x̄2) − (µ1 − µ2))/√(s1²/n1 + s2²/n2) = ((1200 − 1526.67) − 0)/√(1142.23²/8 + 196.3²/3) = −0.779
Degrees of freedom:
v = (s1²/n1 + s2²/n2)²/((s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1))
= (1142.23²/8 + 196.3²/3)²/((1142.23²/8)²/(8 − 1) + (196.3²/3)²/(3 − 1)) = 7.973 ≈ 8
Our test statistic has a p-value of p = 2P (T < −0.779) = 2(0.2292) = 0.4584
Because the p-value of 0.4584 is larger than α = 0.05, do not reject the null hypothesis, as there
is not sufficient evidence to conclude that, at the 5% significance level, there is a difference in the
mean daily traffic on bridges with different numbers of spans.
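The unequal-variance (Welch) statistic and its approximate degrees of freedom can be computed with a short Python sketch (the function name is ours):

```python
import math

def welch_t(x1, s1, n1, x2, s2, n2):
    """Welch (unequal-variance) t statistic and approximate df."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (x1 - x2) / math.sqrt(v1 + v2)
    df = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Exercise 10.61: bridges with different numbers of spans
t, df = welch_t(1200, 1142.23, 8, 1526.67, 196.3, 3)
print(round(t, 3), round(df, 2))  # -0.779 7.97
```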
10.63 Hypotheses: H0: µd = 0 and Ha: µd ≠ 0
Since the data are paired, we should calculate the difference between each data pair. From this
new data set, we find that:
x̄d = −0.277, sd = 0.0344
Assuming that the population of differences is normally distributed, we can calculate our test
statistic:
t = (x̄d − µd)/(sd/√n) = (−0.277 − 0)/(0.0344/√11) = −26.75
With 11 − 1 = 10 degrees of freedom, our test statistic has a p-value of: p = 2P (T < −26.75) ≈ 0
Because p ≈ 0 is smaller than α = 0.05, reject the null hypothesis, as there is sufficient evidence
to conclude that, at the 5% significance level, the mean difference in the percentage of deviation
at 1400◦ C calibration and the deviation at 1300◦ C calibration is different from 0.
10.4 χ² Tests on Frequency Data
10.65 Design and Construction executives holding a Ph.D. in business were fewer than 5, so we must
group some of the categories together in order to use the χ2 test for independence.
Engineering Business Other Total
BS 172 28 50 250
M.S. or Ph.D. 62 57 21 140
Total 234 85 71 390
We can also calculate the expected value for each entry:
Engineering Business Other
BS (234)(250)/390 = 150 (85)(250)/390 = 54.49 (71)(250)/390 = 45.51
M.S. or Ph.D. (234)(140)/390 = 84 (85)(140)/390 = 30.51 (71)(140)/390 = 25.49
Hypotheses: H0 : Degree is independent of type of degree; and Ha : Degree is not independent of
type of degree.
Our test statistic can be calculated as:
X² = Σ_{j=1}^{c} Σ_{i=1}^{r} (Xij − E(Xij))²/E(Xij) = (172 − 150)²/150 + ... + (21 − 25.49)²/25.49 = 46.09
With (2 − 1)(3 − 1) = 2 degrees of freedom, we can calculate the p-value of our test statistic:
p = P (χ2 (2) > 46.09) ≈ 0
Because p ≈ 0 is less than α = 0.05, reject the null hypothesis, as there is sufficient evidence to
conclude that, at the 5% level of significance, the degree earned is not independent of the type of
degree.
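The expected counts and the test statistic follow mechanically from the row and column totals, so the whole computation can be sketched in Python (the function name is ours):

```python
def chi2_independence(table):
    """Pearson chi-square statistic for a two-way frequency table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    return sum((obs - rows[i] * cols[j] / total) ** 2
               / (rows[i] * cols[j] / total)
               for i, r in enumerate(table) for j, obs in enumerate(r))

# Exercise 10.65 after grouping: rows = degree level, columns = field
x2 = chi2_independence([[172, 28, 50], [62, 57, 21]])
print(round(x2, 2))  # 46.09
```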
10.67 Hypotheses: H0 : pS = 0.5, pL = 0.25, pR = 0.25
Ha : At least one equality fails to hold
Test Statistics: Observed and expected cell counts are listed in the table below.
S L R Totals
ni 28 12 10 50
E(ni) 25 12.5 12.5 50
χ² = (28 − 25)²/25 + (12 − 12.5)²/12.5 + (10 − 12.5)²/12.5 = 0.88
Rejection Region: χ2 > χ20.10 = 4.60517 (degrees of freedom = 2)
Conclusion: Fail to reject H0 at α = 0.10; i.e., there is insufficient evidence to conclude that
the proportions differ from those stated in the null hypothesis at α = 0.10.
10.69 Hypotheses: H 0 : p 1 = p2 Ha : p1 6= p2
Test Statistics: Observed and expected (values in parentheses) cell counts are given in the table
below.
Inspector
A B Totals
Top category 18(19) 20(19) 38
Lower category 7(6) 5(6) 12
Totals 25 25 50
χ² = (18 − 19)²/19 + · · · + (5 − 6)²/6 = 0.44
Rejection Region: χ² > χ²0.05 = 3.84146 (degrees of freedom = (2 − 1)(2 − 1) = 1)
Conclusion: Fail to reject H0 at α = 0.05; i.e., there is insufficient evidence to conclude that
the inspectors significantly differ in their assessments at α = 0.05.
10.71 a We can calculate the expected value for each entry:

Athletic Involvement
None 1-3 Semesters 4+ Semesters
GPA Below Mean (528)(426)/852 = 264 (219)(426)/852 = 109.5 (105)(426)/852 = 52.5
GPA Above Mean (528)(426)/852 = 264 (219)(426)/852 = 109.5 (105)(426)/852 = 52.5
Hypotheses: H0: GPA is independent of length of athletic involvement; and Ha: GPA is not
independent of length of athletic involvement.
Our test statistic can be calculated as:
X² = Σ_{j=1}^{c} Σ_{i=1}^{r} (Xij − E(Xij))²/E(Xij) = (290 − 264)²/264 + ... + (63 − 52.5)²/52.5 = 13.71
With (2 − 1)(3 − 1) = 2 degrees of freedom, we can calculate the p-value of our test statistic:
p = P (χ2 (2) > 13.71) = 0.0011
Because p = 0.0011 is less than α = 0.05, reject the null hypothesis, as there is sufficient evidence
to conclude that, at the 5% level of significance, GPA is not independent of the length of athletic
involvement.
b Hypotheses: H0 : p1 = p2 = 0.5 and Ha : p1 6= p2
We can calculate our test statistic:
X² = Σ_{j=1}^{c} Σ_{i=1}^{r} (Xij − E(Xij))²/E(Xij) = (42 − 105(0.5))²/(105(0.5)) + (63 − 105(0.5))²/(105(0.5)) = 4.2
With 2 − 1 = 1 degrees of freedom, we can calculate the p-value of our test statistic:
p = P (χ2 (1) > 4.2) = 0.0404
Because p = 0.0404 is less than α = 0.05, reject the null hypothesis, as there is sufficient evidence
to conclude that, at the 5% level of significance, for students with 4 or more semesters of athletic
involvement, the proportion of students with GPA below the mean is different than the proportion
of students with GPA above the mean.

10.73 Hypotheses: H0 : Sex and opinion are independent


Ha : Sex and opinion are not independent
Test Statistics: Observed and expected (values in parentheses) cell frequencies are presented in
the following table.
Favor Oppose Undecided Totals
Male 31(35.41) 44(38.80) 6(6.79) 81
Female 42(37.59) 36(41.20) 8(7.21) 86
Totals 73 80 14 167
χ² = (31 − 35.41)²/35.41 + ··· + (8 − 7.21)²/7.21 = 2.60
Rejection Region: χ² > χ²_0.05 = 5.99147
(degrees of freedom = (2 − 1)(3 − 1) = 2)
Conclusion: Fail to reject H0 at α = 0.05; i.e., there is insufficient evidence to conclude that
sex and opinion are dependent at α = 0.05.

10.75 Hypotheses: H0 : p2003 = p2007 and Ha : p2003 ≠ p2007


The numbers of people who feared aerosol effects on the ozone in 2003 and 2007 can be calculated
as:
Y2003 = 0.35(300) = 105 Y2007 = 0.29(300) = 87
Y = Y2003 + Y2007 = 105 + 87 = 192 n = n2003 + n2007 = 300 + 300 = 600
Our test statistic can be calculated as:
X² = Σi { (Yi − ni Y/n)²/(ni Y/n) + ((ni − Yi) − ni(n − Y)/n)²/(ni(n − Y)/n) }
= [ (105 − 300(192)/600)²/(300(192)/600) + ((300 − 105) − 300(600 − 192)/600)²/(300(600 − 192)/600) ]
+ [ (87 − 300(192)/600)²/(300(192)/600) + ((300 − 87) − 300(600 − 192)/600)²/(300(600 − 192)/600) ]
= 2.482
With 2 − 1 = 1 degree of freedom, our p-value can be calculated from our test statistic as:
p = P (χ2 (1) > 2.482) = 0.1152
Because p = 0.1152 is greater than α = 0.01, do not reject the null hypothesis, as there is not
sufficient evidence to conclude that, at the 1% level of significance, there is a difference in the
proportion of people who feared aerosol effects on the ozone in 2003 and in 2007.
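The same two-sample comparison of proportions can be run as a 2 × 2 chi-square test (a sketch assuming SciPy; the fear/no-fear counts are arranged as a contingency table):

```python
import numpy as np
from scipy import stats

# Exercise 10.75: 105 of 300 (2003) and 87 of 300 (2007) feared aerosol effects
observed = np.array([[105, 300 - 105],
                     [87, 300 - 87]])

# No continuity correction, to match the hand-computed X^2 = 2.482
chi2, p, dof, _ = stats.chi2_contingency(observed, correction=False)
print(round(chi2, 3), round(p, 4))
```

The p-value of about 0.115 exceeds α = 0.01, so H0 is not rejected, as above.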

10.77 Hypotheses: H0 : p1 = · · · = p7 = 0.012 Ha : At least one inequality holds


Test Statistics: Observed and expected (values in parentheses) cell frequencies are presented in
the following table.
Failures Successes
1:0-6 1(0.34) 27(27.66)
2:6-12 3(1.10) 89(90.90)
3:12-18 1(3.24) 269(266.76)
4:18-24 9(8.42) 693(693.58)
5:24-30 10(7.79) 639(641.21)
6:30-36 0(1.61) 134(132.39)
7:>36 0(1.12) 93(91.88)
χ² = (1 − 0.34)²/0.34 + ··· + (93 − 91.88)²/91.88 = 9.62
Rejection Region: χ² > χ²_0.05 = 12.5916
(degrees of freedom = (2 − 1)(7 − 1) = 6)
Conclusion: Fail to reject H0 at α = 0.05; i.e., there is insufficient evidence that the proportions
significantly differ at α = 0.05.

10.6 Using Computer Programs to Fit Distributions


10.79 Since we do not know λ, we must estimate it from the sample data: λ̂ = x̄ = 3.98
Since the 0 and 1 groups together contain fewer than 5 observations, we must group them with the next category:
Number of Defects Frequency
≤2 8
3 10
4 14
5 8
≥6 10
Hypotheses: H0 : Data are modeled by a Poisson distribution p(x) = λ^x e^(−λ)/x!;
Ha : Data are not modeled by a Poisson distribution.
Using a Poisson distribution, we can calculate the expected frequency of each outcome:
Number of Defects Expected Frequency
≤2 12.05
3 9.82
4 9.77
5 7.78
≥6 10.59
Since the sample size is large (n=50), we can compute our test statistic:
X² = Σ (Fi − E(yi))²/E(yi)
= (8 − 12.05)²/12.05 + ... + (10 − 10.59)²/10.59 = 3.239
Since we had to estimate λ, we have 5 − 1 − 1 = 3 degrees of freedom, and our p-value can be
calculated as:
p = P (χ2 (3) > 3.239) = 0.356
Because p = 0.356 is greater than α = 0.05, do not reject the null hypothesis, as there is not
enough evidence to show that the data did not come from a Poisson distribution.
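The pooled-cell goodness-of-fit calculation can be reproduced with SciPy (a sketch; `ddof=1` accounts for the estimated λ, giving 5 − 1 − 1 = 3 degrees of freedom):

```python
import numpy as np
from scipy import stats

lam, n = 3.98, 50                        # estimated Poisson mean, sample size
observed = np.array([8, 10, 14, 8, 10])  # counts for <=2, 3, 4, 5, >=6

# Cell probabilities under Poisson(3.98), with the tails pooled as in the text
probs = np.array([stats.poisson.cdf(2, lam),
                  stats.poisson.pmf(3, lam),
                  stats.poisson.pmf(4, lam),
                  stats.poisson.pmf(5, lam),
                  stats.poisson.sf(5, lam)])
expected = n * probs

# ddof=1 removes one extra degree of freedom for the estimated lambda
chi2, p = stats.chisquare(observed, expected, ddof=1)
print(round(chi2, 3), round(p, 3))
```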

10.81 Since we do not know λ, we must estimate it from the sample data: λ̂ = x̄ = 0.483
Since several groups have less than 5, we must group 8, 7, 6, 5 and 4 together into one group, ≥ 4
with 10 observations.
Hypotheses: H0 : The data are modeled by a Poisson distribution;
Ha : The data are not modeled by a Poisson distribution.
Using a Poisson distribution, we can calculate the expected frequency of each outcome:
Count Expected Frequency
0 255.39
1 123.37
2 29.80
3 4.80
≥4 0.64
Since the sample size is large (n=414), we can compute our test statistic:
X² = Σ (Fi − E(yi))²/E(yi)
= (296 − 255.39)²/255.39 + ... + (10 − 0.64)²/0.64 = 165.63
Since we had to estimate λ, we have 5 − 1 − 1 = 3 degrees of freedom, and our p-value can be
calculated as:
p = P (χ2 (3) > 165.63) ≈ 0
Because p ≈ 0 is less than α = 0.05, reject the null hypothesis, as there is sufficient evidence to
show that the data did not come from a Poisson distribution.
10.83 a The Kolmogorov-Smirnov test is appropriate. The maximum-likelihood estimate of θ from an
exponential distribution is θ̂ = x̄ = 187.8/16 = 11.7375.
xi   F(xi)   i/n   i/n − F(xi)   F(xi) − (i−1)/n
8.9 0.5315 0.0625 -0.4690 0.5315


9.0 0.5355 0.1250 -0.4105 0.4730
9.3 0.5472 0.1875 -0.3597 0.4222
9.5 0.5549 0.2500 -0.3049 0.3674
9.8 0.5661 0.3125 -0.2536 0.3161
10.0 0.5734 0.3750 -0.1984 0.2609
10.1 0.5770 0.4375 -0.1395 0.2020
10.2 0.5806 0.5000 -0.0806 0.1431
10.5 0.5912 0.5625 -0.0287 0.0912
10.6 0.5947 0.6250 0.0303 0.0322
11.1 0.6116 0.6875 0.0759 -0.0134
13.6 0.6861 0.7500 0.0639 -0.0014
14.2 0.7017 0.8125 0.1108 -0.0483
16.1 0.7463 0.8750 0.1287 -0.0662
16.8 0.7610 0.9375 0.1765 -0.1140
18.1 0.7861 1.0000 0.2139 -0.1514
where F (x) = 1 − e^(−x/11.7375).
Hypotheses: H0 : Population has exponential distribution
Ha : Population does not have an exponential distribution
Test Statistics: D = max (D+ , D− ) = max (0.2139, 0.5315) = 0.5315


  
modified D = (D − 0.2/16)(√16 + 0.26 + 0.5/√16) = 2.28
Rejection Region: modified D > 1.094 (from Table 8.5)
Conclusion: Reject H0 at α = 0.05; i.e., there is sufficient evidence to conclude that the
exponential distribution is inadequate at α = 0.05.
b Using F (x) = 1 − e^(−x/12), we have,
i   i/n − F(xi)   F(xi) − (i−1)/n

1 -0.4612 0.5237
2 -0.4026 0.4651
3 -0.3518 0.4143
4 -0.2969 0.3594
5 -0.2456 0.3081
6 -0.1904 0.2529
7 -0.1315 0.1940
8 -0.0726 0.1351
9 -0.0206 0.0831
10 0.0384 0.0241
11 0.0840 -0.0215
12 0.0820 -0.0095
13 0.1188 -0.0563
14 0.1364 -0.0739
15 0.1841 -0.1216
16 0.2213 -0.1588
Hypotheses: H0 : Population has an exponential (θ = 12) distribution
Ha : Population does not have an exponential (θ = 12) distribution
Test Statistics: D = max (D+ , D− ) = max (0.2213, 0.5237) = 0.5237

 
modified D = 0.5237 (√16 + 0.12 + 0.11/√16) = 2.1720
Rejection Region: modified D > 1.358 (from Table 8.5)
Conclusion: Reject H0 at α = 0.05; i.e., there is sufficient evidence to conclude that the
exponential (θ = 12) model is inadequate at α = 0.05.
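Part (b), where θ = 12 is fully specified, can be checked directly with SciPy's one-sample Kolmogorov-Smirnov test (a sketch; note `kstest` returns the unmodified statistic D, not the modified version compared against Table 8.5):

```python
import numpy as np
from scipy import stats

# Failure-time data from Exercise 10.83
x = np.array([8.9, 9.0, 9.3, 9.5, 9.8, 10.0, 10.1, 10.2,
              10.5, 10.6, 11.1, 13.6, 14.2, 16.1, 16.8, 18.1])

# Exponential with mean 12: scipy parameterizes expon by (loc, scale)
res = stats.kstest(x, "expon", args=(0, 12))
print(round(res.statistic, 4), res.pvalue)
```

The statistic agrees with the hand-computed D = 0.5237, and the small p-value again rejects the exponential (θ = 12) model.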
10.85 We wish to test the fit of a Weibull (γ = 2) distribution. Equivalently, we will work with yi = xi²
so that y will have an exponential (θ) distribution. The maximum-likelihood estimate of θ is given
by θ̂ = ȳ = 54,144.13 and F (y) = 1 − e^(−y/54,144.13).

yi   F(yi)   i/n   i/n − F(yi)   F(yi) − (i−1)/n

23317.29 0.3499 0.1 -0.2499 0.3499


29584.00 0.4210 0.2 -0.2210 0.3210
29756.25 0.4228 0.3 -0.1228 0.2228
30032.89 0.4257 0.4 -0.0257 0.1257
37249.00 0.4974 0.5 0.0026 0.0974
41902.09 0.5388 0.6 0.0612 0.0388
46872.25 0.5792 0.7 0.1208 -0.0208
55178.01 0.6391 0.8 0.1609 -0.0609
68958.76 0.7202 0.9 0.1798 -0.0798
178590.76 0.9631 1.0 0.0369 0.0631

Hypotheses: H0 : Population has a Weibull (γ = 2) distribution


Ha : Population does not have a Weibull (γ = 2) distribution
Test Statistics: D = max (D+ , D− ) = max (0.1798, 0.3499) = 0.3499

  
modified D = (0.3499 − 0.2/10)(√10 + 0.26 + 0.5/√10) = 1.1812
Rejection Region: modified D > 1.094 (from Table 8.5)
Conclusion: Reject H0 at α = 0.05; i.e., there is sufficient evidence to conclude that the Weibull
(γ = 2) model is inadequate at α = 0.05.
10.87 We use the Kolmogorov-Smirnov test. The maximum-likelihood estimate for θ is x̄ = 59.6.

yi   F(yi)   i/n   i/n − F(yi)   F(yi) − (i−1)/n

1 0.0166 0.0333 0.0167 0.0166


3 0.0491 0.0667 0.0176 0.0158
5 0.0805 0.1000 0.0195 0.0138
7 0.1108 0.1333 0.0225 0.0108
11 0.1685 0.1667 -0.0019 0.0352
11 0.1685 0.2000 0.0315 0.0019
11 0.1685 0.2333 0.0648 -0.0315
12 0.1824 0.2667 0.0843 -0.0510
14 0.2093 0.3000 0.0907 -0.0573
14 0.2093 0.3333 0.1240 -0.0907
14 0.2093 0.3667 0.1573 -0.1240
16 0.2354 0.4000 0.1646 -0.1312
16 0.2354 0.4333 0.1979 -0.1646
20 0.2851 0.4667 0.1816 -0.1483
21 0.2970 0.5000 0.2030 -0.1697
23 0.3202 0.5333 0.2132 -0.1798
42 0.5057 0.5667 0.0609 -0.0276
47 0.5455 0.6000 0.0545 -0.0212
52 0.5821 0.6333 0.0512 -0.0179
62 0.6466 0.6667 0.0200 0.0133
71 0.6962 0.7000 0.0038 0.0295
71 0.6962 0.7333 0.0372 -0.0038
87 0.7677 0.7667 -0.0010 0.0344
90 0.7791 0.8000 0.0209 0.0124
95 0.7969 0.8333 0.0365 -0.0031
120 0.8665 0.8667 0.0002 0.0331
120 0.8665 0.9000 0.0335 -0.0002
225 0.9771 0.9333 -0.0437 0.0771
246 0.9839 0.9667 -0.0172 0.0505
261 0.9875 1.0000 0.0125 0.0208

Hypotheses: H0 : Population has an exponential distribution


Ha : Population does not have an exponential distribution
Test Statistics: D = max (D+ , D− ) = max (0.2132, 0.0771) = 0.2132

  
modified D = (0.2132 − 0.2/30)(√30 + 0.26 + 0.5/√30) = 1.2038
Rejection Region: modified D > 1.094 (from Table 8.5)
Conclusion: Reject H0 at α = 0.05; i.e., there is sufficient evidence to conclude that the
exponential model is inadequate at α = 0.05.
Normal scores plot for time between failures:

The plot does not follow a straight line.

10.7 Acceptance Sampling


10.89 From Table 13, a lot size of 3,000 and a level II plan give a code letter K.
Sample size Acceptance number
(a) 125 10
(b) 125 8
(c) 50 5

10.91 Since (20 − 19.2)/0.3 = 2.67 > 1.57, we accept the lot.

10.8 Supplementary Exercises


10.93 Hypotheses: H0 : µ = 69 Ha : µ < 69
Test Statistics: z = (64.3 − 69)/(1.2/√40) = −24.77
P-value: P (Z < −24.77) ≈ 0
Conclusion: Since the P-value is so small, we have sufficient evidence to reject H0 and conclude
that the mean battery weight is significantly less than 69. The P-value gives strong evidence that
something is wrong with the production process.
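A quick numeric check of this z statistic from the summary statistics alone (a sketch assuming SciPy):

```python
import math
from scipy import stats

xbar, mu0, sigma, n = 64.3, 69.0, 1.2, 40

z = (xbar - mu0) / (sigma / math.sqrt(n))  # standardized sample mean
p_value = stats.norm.cdf(z)                # lower tail: Ha is mu < 69
print(round(z, 2), p_value)
```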
10.95 Hypotheses: H0 : µ ≥ 215 Ha : µ < 215
Test Statistics: z = (210 − 215)/(18/√50) = −1.964
P-value: P (Z < −1.964) = 0.025
Conclusion: Since the P-value is small (say, less than 0.05), we reject H0 and conclude that
the mean breaking strength of the garments is significantly less than 215. The threads probably
should not be used on the garments.

10.97 Hypotheses: H0 : p ≤ 0.08 Ha : p > 0.08


Test Statistics: z = (0.12 − 0.08)/√((0.08)(0.92)/100) = 1.47
Rejection Region: z > z0.10 = 1.28
Conclusion: Reject H0 at α = 0.10; i.e., there is sufficient evidence to conclude that significantly
more than 8% of the resistors fail to meet the tolerance specifications at α = 0.10. Thus, we have
evidence against the company’s claim.

10.99 Hypotheses: H0 : µ = 10    Ha : µ ≠ 10
Test Statistics: Assuming a normal population, t = (9.8 − 10)/(0.5/√15) = −1.55.
P-value: Let p = P (|t| > 1.55); then p/2 = P (t > 1.55), and hence 0.05 < p/2 < 0.1. Thus,
0.1 < p < 0.2 (degrees of freedom = 14).
Conclusion: Since the P-value is large (say, greater than 0.05), we fail to reject H0 and conclude
that the mean is not significantly different from 10. A two-tailed test is appropriate because the
specification calls for producing exactly 10-ohm resistors and deviations from this specification
should be detected in both directions.
10.101 Hypotheses: H0 : µ1 − µ2 = 0    Ha : µ1 − µ2 ≠ 0
Test Statistics: z = (92 − 98)/√(20/50 + 30/40) = −5.60
Rejection Region: |z| > z0.025 = 1.96
Conclusion: Reject H0 at α = 0.05; i.e., there is sufficient evidence to conclude that the mean
resistance to abrasion differs for the two coupling agents at α = 0.05.

10.103 Hypotheses: H0 : p1 − p2 = 0
Ha : p1 − p2 > 0 where ’1’ indicates ’before chemical’ and ’2’ indicates ’after chemical’
Test Statistics: p̂1 = 43/50 = 0.86    p̂2 = 22/50 = 0.44
z = (0.86 − 0.44)/√((0.86)(0.14)/50 + (0.44)(0.56)/50) = 4.91
Rejection Region: z > z0.025 = 1.96
Conclusion: Reject H0 at α = 0.025; i.e., there is sufficient evidence to conclude that the chemical
significantly reduces the number of samples containing the harmful bacteria at α = 0.025.
10.105 Hypotheses: H0 : µ1 − µ2 = 0
Ha : µ1 − µ2 6= 0 where ’1’ denotes ’wood’ and ’2’ denotes ’graphite’
Test Statistics: Assume normal populations with equal variances.
sp = [(2(0.02)² + 2(0.07)²)/(3 + 3 − 2)]^(1/2) = 0.05148
t = (2.41 − 2.22)/(0.05148 √(1/3 + 1/3)) = 4.52
Rejection Region: |t| > t0.025 = 2.776 (degrees of freedom = 3 + 3 − 2 = 4)
Conclusion: Reject H0 at α = 0.05, i.e., there is sufficient evidence to conclude that the mean
impulses differ for the two rackets at α = 0.05.

10.107 Hypotheses: H0 : Plant species is independent of the species of the nearest neighbor
Ha : Plant species is not independent of the species of the nearest neighbor
a Test Statistics: Observed and expected (values in parentheses) cell frequencies are presented in
the following table.
Nearest Neighbor
A B Totals
A 20(13.44) 4(10.56) 24
Plant
B 8(14.56) 18(11.44) 26
Totals 28 22 50
χ² = (20 − 13.44)²/13.44 + ··· + (18 − 11.44)²/11.44 = 13.99
Rejection Region: χ² > χ²_0.05 = 3.84146
(degrees of freedom = (2 − 1)(2 − 1) = 1)
Conclusion: Reject H0 at α = 0.05; i.e., there is sufficient evidence to conclude that plant species
and nearest neighbor plant species are dependent at α = 0.05.
b The test statistic is χ2 = 13.99 and the same rejection region and conclusion hold as in part (a).
c Test Statistics: Observed and expected (values in parentheses) cell frequencies are presented in
the following table.
Nearest Neighbor
A B Totals
Sampled A 20(13.44) 4(5.76) 24
Plant B 18(14.56) 8(6.24) 26
Totals 28 12 50
χ² = (20 − 18.24)²/18.24 + ··· + (8 − 6.24)²/6.24 = 5.05
Rejection Region: χ² > χ²_0.05 = 3.84146
(degrees of freedom = (2 − 1)(2 − 1) = 1)
Conclusion: Reject H0 at α = 0.05; i.e., there is sufficient evidence to conclude that plant species
and nearest neighbor plant species are dependent at α = 0.05.

10.109 Hypotheses: H0 : p1 = p2    Ha : p1 ≠ p2
Test Statistic: Observed and expected (values in parentheses) cell counts are listed in the table
below. 1 = Response to T-intersection question and 2 = Response to four-legged-intersection
question.
1
Correct Incorrect Totals
Correct 141(90.4394) 145(195.5606) 286
2 Incorrect 13(63.5606) 188(137.4394) 201
Totals 154 333 487
χ² = Σ (Xij − E(Xij))²/E(Xij)
= (141 − 90.4394)²/90.4394 + (145 − 195.5606)²/195.5606 + (13 − 63.5606)²/63.5606 + (188 − 137.4394)²/137.4394
= 100.1577
Rejection Region: χ² > χ²_0.05 = 3.84146
(degrees of freedom = (2 − 1)(2 − 1) = 1)
Conclusion: Reject H0 at α = 0.05; i.e., knowledge of T-intersections is not independent of
four-legged intersections.
10.111 a Hypotheses: H0 : µ1 − µ2 = 0 and Ha : µ1 − µ2 ≠ 0
We can calculate our sample means and sample standard deviations:
x̄1 = 0.0837 x̄2 = 0.1017
s1 = 0.0218 s2 = 0.0120
Assuming that the populations are normally distributed, we can calculate our test statistic:
t = [(x̄1 − x̄2) − (µ1 − µ2)] / √(s1²/n1 + s2²/n2) = [(0.0837 − 0.1017) − 0] / √(0.0218²/3 + 0.0120²/3) = −1.251
Degrees of freedom:
v = (s1²/n1 + s2²/n2)² / [(s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1)]
  = (0.0218²/3 + 0.0120²/3)² / [(0.0218²/3)²/(3 − 1) + (0.0120²/3)²/(3 − 1)] = 3.11 ≈ 4
Our test statistic has a p-value of p = 2P (T < −1.251) = 2(0.1396) = 0.2792
Because the p-value of 0.2792 is greater than α = 0.05, do not reject the null hypothesis, as there
is not sufficient evidence to conclude that, at the 5% significance level, there is a difference in the
mean distances for BDC and MR semiactive dampers.
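The same unequal-variance (Welch) t-test can be reproduced from the summary statistics alone (a sketch assuming SciPy; scipy carries the fractional degrees of freedom rather than rounding to 4, so its p-value differs slightly from the table-based 0.2792):

```python
from scipy import stats

# Summary statistics for Exercise 10.111(a): BDC vs. MR dampers, n = 3 each
res = stats.ttest_ind_from_stats(mean1=0.0837, std1=0.0218, nobs1=3,
                                 mean2=0.1017, std2=0.0120, nobs2=3,
                                 equal_var=False)  # Welch's test
print(round(res.statistic, 3), round(res.pvalue, 3))
```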

b Hypotheses: H0 : µ1 − µ2 = 0 and Ha : µ1 − µ2 ≠ 0
We can calculate our sample means and sample standard deviations:
x̄1 = 0.1510 x̄2 = 0.1703
s1 = 0.0475 s2 = 0.0506
Assuming that the populations are normally distributed, we can calculate our test statistic:
t = [(x̄1 − x̄2) − (µ1 − µ2)] / √(s1²/n1 + s2²/n2) = [(0.1510 − 0.1703) − 0] / √(0.0475²/3 + 0.0506²/3) = −0.4824
Degrees of freedom:
v = (s1²/n1 + s2²/n2)² / [(s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1)]
  = (0.0475²/3 + 0.0506²/3)² / [(0.0475²/3)²/(3 − 1) + (0.0506²/3)²/(3 − 1)] = 3.98 ≈ 4
Our test statistic has a p-value of p = 2P (T < −0.4824) = 2(0.3274) = 0.6548
Because the p-value of 0.6548 is greater than α = 0.05, do not reject the null hypothesis, as there
is not sufficient evidence to conclude that, at the 5% significance level, there is a difference in the
mean interstory drifts for BDC and MR dampers.
c Hypotheses: H0 : µ1 − µ2 = 0 and Ha : µ1 − µ2 ≠ 0
We can calculate our sample means and sample standard deviations:
x̄1 = 3.067 x̄2 = 7.127
s1 = 0.9556 s2 = 0.2307
Assuming that the populations are normally distributed, we can calculate our test statistic:
t = [(x̄1 − x̄2) − (µ1 − µ2)] / √(s1²/n1 + s2²/n2) = [(3.067 − 7.127) − 0] / √(0.9556²/3 + 0.2307²/3) = −7.1531
Degrees of freedom:
v = (s1²/n1 + s2²/n2)² / [(s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1)]
  = (0.9556²/3 + 0.2307²/3)² / [(0.9556²/3)²/(3 − 1) + (0.2307²/3)²/(3 − 1)] = 2.23 ≈ 3
Our test statistic has a p-value of p = 2P (T < −7.1531) = 2(0.001) = 0.002
Because the p-value of 0.002 is smaller than α = 0.05, reject the null hypothesis, as there is
sufficient evidence to conclude that, at the 5% significance level, there is a difference in the mean
absolute accelerations measured for BDC and MR dampers.
10.113 a Hypotheses: H0 : µd = 0 and Ha : µd > 0
Since the data are paired, we should calculate the difference between each data pair. From this
new data set, we find that:
x̄d = 15.41 sd = 11.71
Assuming that the population of differences is normally distributed, we can calculate our test
statistic:
t = (x̄d − µd)/(sd/√n) = (15.41 − 0)/(11.71/√7) = 3.482
With 7 − 1 = 6 degrees of freedom, our test statistic has a p-value of: p = P (T > 3.482) = 0.0066
Because p = 0.0066 is less than α = 0.05, reject the null hypothesis, as there is sufficient evidence
to conclude that, at the 5% significance level, the mean difference in the load at first yield and
the load at the first transverse crack is greater than 0.
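The paired test reduces to a one-sample t on the differences; from the summary statistics alone (a sketch; the seven individual differences are not reproduced here):

```python
import math
from scipy import stats

dbar, sd, n = 15.41, 11.71, 7         # mean and s.d. of the 7 differences

t = (dbar - 0) / (sd / math.sqrt(n))  # one-sample t on the differences
p_value = stats.t.sf(t, n - 1)        # upper tail, df = 6
print(round(t, 3), round(p_value, 4))
```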

10.115 a Hypotheses: H0 : µ1 − µ2 = 0 and Ha : µ1 − µ2 ≠ 0


Because the sample sizes are large, the sampling distributions will be approximately normally
distributed, and we can calculate our test statistic:
t = [(x̄1 − x̄2) − (µ1 − µ2)] / √(s1²/n1 + s2²/n2) = [(53.05 − 56.97) − 0] / √(9.66²/53 + 8.39²/65) = −2.325
Degrees of freedom:
v = (s1²/n1 + s2²/n2)² / [(s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1)]
  = (9.66²/53 + 8.39²/65)² / [(9.66²/53)²/(53 − 1) + (8.39²/65)²/(65 − 1)] = 103.75 ≈ 104
Our test statistic has a p-value of p = 2P (T < −2.325) = 2(0.011) = 0.022
Because the p-value of 0.022 is smaller than α = 0.05, reject the null hypothesis, as there is
sufficient evidence to conclude that, at the 5% significance level, there is a difference in the mean
pretest scores from San Bernardino and Orange Counties.
b Hypotheses: H0 : µ1 − µ2 = 0 and Ha : µ1 − µ2 ≠ 0
Because the sample sizes are large, the sampling distributions will be approximately normally
distributed, and we can calculate our test statistic:
t = [(x̄1 − x̄2) − (µ1 − µ2)] / √(s1²/n1 + s2²/n2) = [(74.39 − 76.28) − 0] / √(8.34²/53 + 7.31²/65) = −1.294
Degrees of freedom:
v = (s1²/n1 + s2²/n2)² / [(s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1)]
  = (8.34²/53 + 7.31²/65)² / [(8.34²/53)²/(53 − 1) + (7.31²/65)²/(65 − 1)] = 104.3 ≈ 105
Our test statistic has a p-value of p = 2P (T < −1.294) = 2(0.0993) = 0.1986
Because the p-value of 0.1986 is larger than α = 0.05, do not reject the null hypothesis, as there
is not sufficient evidence to conclude that, at the 5% significance level, there is a difference in the
mean final test scores from San Bernardino and Orange Counties.
c There is not enough information. The data from pretest and final test scores come from the same
group of people. Since the groups are not independent, we must use the paired samples technique,
which requires that we be able to find the differences in the individual data points.
d Yes. The techniques we use work well for randomly selected samples. If the engineers self-selected
(or by some other means) to take this training, then we cannot conclude that this is representative
of all engineers in San Bernardino and Orange counties.
e Yes. If these represent the population means, then hypothesis testing is unnecessary. It would be
clear that both mean pretest and mean final test results were higher in Orange county than in
San Bernardino.
Chapter 11

Inference For Regression Parameters

11.1 Regression Models with One Predictor Variable


11.1 a Σxi = 21    Σxi² = 91    Σxi yi = 78
Σyi = 18    Σyi² = 68    n = 6
SSxx = 91 − (21)²/6 = 17.5    SSyy = 68 − (18)²/6 = 14
SSxy = 78 − (21)(18)/6 = 15    β̂1 = SSxy/SSxx = 15/17.5 = 6/7 = 0.8571
β̂0 = ȳ − β̂1 x̄ = 18/6 − (6/7)(21/6) = 0
b Scatterplot of y vs x:

The least-squares line does not pass through the data points.
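The computations in this section all follow one template: form SSxx and SSxy from the sums, then β̂1 = SSxy/SSxx and β̂0 = ȳ − β̂1 x̄. A small helper (a sketch; the function name is ours) reproduces Exercise 11.1:

```python
def ls_from_sums(n, sx, sy, sxx, sxy):
    """Least-squares intercept and slope from summary sums.

    n: sample size; sx = sum(x), sy = sum(y),
    sxx = sum(x^2), sxy = sum(x*y).
    """
    SSxx = sxx - sx**2 / n
    SSxy = sxy - sx * sy / n
    b1 = SSxy / SSxx           # slope
    b0 = sy / n - b1 * sx / n  # intercept: ybar - b1 * xbar
    return b0, b1

# Exercise 11.1: n=6, sum x=21, sum y=18, sum x^2=91, sum xy=78
b0, b1 = ls_from_sums(6, 21, 18, 91, 78)
print(b0, b1)  # intercept 0, slope 6/7 = 0.8571...
```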


11.3 Σxi = 10    Σxi² = 22.5    Σxi yi = 61.95
Σyi = 27.5    Σyi² = 170.77    n = 5
SSxx = 22.5 − (10)²/5 = 2.5    SSyy = 170.77 − (27.5)²/5 = 19.52
SSxy = 61.95 − (10)(27.5)/5 = 6.95    β̂1 = SSxy/SSxx = 6.95/2.5 = 2.78
β̂0 = ȳ − β̂1 x̄ = 27.5/5 − (2.78)(10/5) = −0.06
Least-squares line: ŷ = −0.06 + (2.78)x
11.5 Σxi = 556    Σxi² = 39,080    Σxi yi = 120,399
Σyi = 1,756    Σyi² = 391,720    n = 8
SSxx = 39,080 − (556)²/8 = 438    SSyy = 391,720 − (1,756)²/8 = 6,278
SSxy = 120,399 − (556)(1,756)/8 = −1,643    β̂1 = SSxy/SSxx = −1,643/438 = −3.7511
β̂0 = ȳ − β̂1 x̄ = 1,756/8 − (−3.7511)(556/8) = 480.2043
Least-squares line: ŷ = 480.2043 − (3.7511)x
11.7 Σxi = 180    Σxi² = 2,900    Σxi yi = 7,420
Σyi = 469    Σyi² = 19,313    n = 12
SSxx = 2,900 − (180)²/12 = 200    SSyy = 19,313 − (469)²/12 = 982.9167
SSxy = 7,420 − (180)(469)/12 = 385    β̂1 = SSxy/SSxx = 385/200 = 1.925
β̂0 = ȳ − β̂1 x̄ = 469/12 − (1.925)(180/12) = 10.2083
Least-squares line: ŷ = 10.2083 + 1.925x

11.2 The Probability Distribution of the Random Error Component
11.9 x̄ = 3.5    ȳ = 3
SSxx = 17.5    SSxy = 15    SSyy = 14
SSE = SSyy − β̂1 (SSxy) = 14 − (15/17.5)(15) = 1.143
s² = SSE/(n − 2) = 1.143/(6 − 2) = 0.286
11.11 x̄ = 21.2831    ȳ = 1.157
SSxx = 1434.4    SSxy = 45.38    SSyy = 2.09
SSE = SSyy − β̂1 (SSxy) = 2.09 − (45.38/1434.4)(45.38) = 0.654
s² = SSE/(n − 2) = 0.654/(29 − 2) = 0.0242
s = √0.0242 = 0.1557
This value is, on average, how far the observed unleaded gas price is from the predicted value for
a given crude price.

11.13 ȳ = 9.448    x̄ = 15.504
SSyy = 1101.2    SSxy = 1546.6    SSxx = 2360.2
SSE = SSyy − β̂1 (SSxy) = 1101.2 − (1546.6/2360.2)(1546.6) = 87.74
s² = SSE/(n − 2) = 87.74/(10 − 2) = 10.97
s = √10.97 = 3.312
This value is, on average, how far the observed flowthrough LC50 is from the predicted value for
a given static LC50.

11.3 Making Inferences about Slope β1


11.15 Hypotheses: H0 : β1 = 0    Ha : β1 ≠ 0
Test Statistics: Assume that errors are independent, normal (0, σ 2 ).
t = β̂1/(s/√SSxx) = (6/7)/(√0.2857/√17.5) = 6.71
Rejection Region: |t| > t0.025 = 2.776 (degrees of freedom = 6 − 2 = 4)
Conclusion: Reject H0 at α = 0.05; i.e., there is sufficient evidence to conclude that the slope is
significantly different from zero at α = 0.05.
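The slope test can be verified numerically (a sketch assuming SciPy; s² = 0.2857 and SSxx = 17.5 come from Exercises 11.9 and 11.1):

```python
import math
from scipy import stats

b1 = 6 / 7                     # fitted slope from Exercise 11.1
s2, SSxx, n = 0.2857, 17.5, 6  # residual variance, SSxx, sample size

t = b1 / (math.sqrt(s2) / math.sqrt(SSxx))  # t = b1 / (s / sqrt(SSxx))
p_value = 2 * stats.t.sf(abs(t), n - 2)     # two-sided, df = 4
print(round(t, 2), round(p_value, 4))
```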

11.17 β̂1 ± t0.025 · s/√SSxx = 2.78 ± 3.182 (√0.0663/√2.5) = (2.2618, 3.2982)
(degrees of freedom = 5 − 2 = 3)
11.19 Hypotheses: H0 : β1 = 0    Ha : β1 ≠ 0
Test Statistic: Assume that errors are independent, normal (0, σ 2 ).
t = β̂1/(s/√SSxx) = −3.7511/(√19.1457/√438) = −17.94
Rejection Region: |t| > t0.025 = 2.447 (degrees of freedom = 8 − 2 = 6)
Conclusion: Reject H0 at α = 0.05; i.e., there is a significant linear relationship between Rockwell
hardness and abrasion loss at α = 0.05.

11.21 We seek a confidence interval for 5β1. If we assume that the errors are independent, normal
(0, σ²), then β̂1 is distributed normal(β1, σ²/SSxx). Hence 5β̂1 has mean E(5β̂1) = 5E(β̂1) = 5β1
and variance V(5β̂1) = 25V(β̂1) = 25σ²/SSxx, so 5β̂1 is distributed normal(5β1, 25σ²/SSxx).
Thus a 95% confidence interval for 5β1 is
5β̂1 ± t0.025 · 5s/√SSxx = 5(1.925) ± 2.228(5)(√24.1792/√200) = (5.752, 13.498)
(degrees of freedom = 12 − 2 = 10)

11.23 Σxi = 55    Σxi² = 385    Σxi yi = 1,889.09
Σyi = 348.47    Σyi² = 12,153.6443    n = 10
SSxx = 385 − (55)²/10 = 82.5    SSyy = 12,153.6443 − (348.47)²/10 = 10.5102
SSxy = 1,889.09 − (55)(348.47)/10 = −27.495
a β̂1 = SSxy/SSxx = −27.495/82.5 = −0.3333
β̂0 = ȳ − β̂1 x̄ = 348.47/10 − (−0.3333)(55/10) = 36.68
Least-squares line: ŷ = 36.68 − 0.3333x
b SSE = SSyy − (SSxy)²/SSxx = 10.5102 − (−27.495)²/82.5 = 1.3469
s² = SSE/(n − 2) = 1.3469/8 = 0.1684
c Hypotheses: H0 : β1 = 0 Ha : β 1 < 0
Test Statistics: Assume that errors are independent, normal (0, σ 2 ).
t = β̂1/(s/√SSxx) = −0.3333/(√0.1684/√82.5) = −7.38
Rejection Region: t < −t0.05 = −1.860 (degrees of freedom = 10 − 2 = 8)


Conclusion: Reject H0 at α = 0.05; i.e., the mean temperature decreases significantly as distance
from the lake increases at α = 0.05.

11.4 Using the Linear Model for Estimation and Prediction


11.25 ŷ ± t0.05 √(s²[1/n + (x − x̄)²/SSxx]) (degrees of freedom = 5 − 2 = 3)
= −0.06 + 2.78(1.8) ± 2.353 √(0.0663[1/5 + (1.8 − 2)²/2.5]) = (4.6623, 5.2257)
11.27 ŷ ± t0.025 √(s²[1/n + (x − x̄)²/SSxx]) (degrees of freedom = 8 − 2 = 6)
= 480.2043 − 3.7511(75) ± 2.447 √(19.1457[1/8 + (75 − 69.5)²/438]) = (194.1520, 203.5855)
Assume that the errors are independent, normal (0, σ 2 ).
11.29 ŷ ± t0.05 √(s²[1/n + (x − x̄)²/SSxx]) (degrees of freedom = 12 − 2 = 10)
= 10.2083 + 1.925(12) ± 1.812 √(24.1792[1/12 + (12 − 15)²/200]) = (30.1164, 36.5002)

11.31 Hypotheses: H0 : β0 = 0    Ha : β0 ≠ 0
Test Statistics: t = β̂0 / √(s²[1/n + x̄²/SSxx]) = −0.06 / √(0.0663[1/5 + (2)²/2.5]) = −0.17
Rejection Region: |t| > t0.025 = 3.182
(degrees of freedom = 5 − 2 = 3)
Conclusion: Fail to reject H0 at α = 0.05; i.e., there is insufficient evidence to conclude that β0 is
different from zero at α = 0.05.
11.33 a ȳ = 0.0616    x̄ = 874.42
SSxy = 54.15    SSxx = 159916    SSyy = 1.8357
β̂1 = SSxy/SSxx = 54.15/159916 = 0.000339
β̂0 = ȳ − β̂1 x̄ = 0.0616 − (0.000339)(874.42) = −0.2345
REGRESSION EQUATION: Wear rate = −0.2345 + 0.000339 ∗ (Temperature)
b Scatter plot with confidence and prediction bands:
c SSE = SSyy − β̂1 SSxy = 1.8357 − 0.000339(54.15) = 1.8173
s² = SSE/(n − 2) = 1.8173/(26 − 2) = 0.0757
ŷ ± t0.025 √(s²[1/n + (x − x̄)²/SSxx]) (degrees of freedom = 24)
= 0.0706 ± 2.064 √(0.0757[1/26 + (900 − 874.42)²/159916]) = (−0.047, 0.188)
d ŷ ± t0.025 √(s²[1 + 1/n + (x − x̄)²/SSxx]) (degrees of freedom = 24)
= 0.0706 ± 2.064 √(0.0757[1 + 1/26 + (900 − 874.42)²/159916]) = (−0.509, 0.650)
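The confidence and prediction intervals in parts (c) and (d) differ only by the extra "1 +" inside the radical; numerically (a sketch using the printed summary values):

```python
import math
from scipy import stats

n, s2, SSxx, xbar = 26, 0.0757, 159916, 874.42
x0 = 900
yhat = -0.2345 + 0.000339 * x0     # point estimate at x0 = 900

tcrit = stats.t.ppf(0.975, n - 2)  # ~2.064 with 24 df
half_ci = tcrit * math.sqrt(s2 * (1/n + (x0 - xbar)**2 / SSxx))
half_pi = tcrit * math.sqrt(s2 * (1 + 1/n + (x0 - xbar)**2 / SSxx))

print((round(yhat - half_ci, 3), round(yhat + half_ci, 3)))  # confidence interval
print((round(yhat - half_pi, 3), round(yhat + half_pi, 3)))  # prediction interval
```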

11.6 Inference in Multiple Regression


11.35 a A value of R2 = 0.89 means that 89% of the variation in the y’s is accounted for by the model.
This suggests that the model may provide a good fit to the data.
b Hypotheses: H0 : β1 = · · · = β5 = 0
Ha : At least one βi is nonzero, i = 1, ..., 5
Test Statistic: F = (R²/k) / [(1 − R²)/(n − (k + 1))] = (0.89/5)/(0.11/(30 − 6)) = 38.84
Rejection Region: F > F0.05 (k, n − (k + 1)) = F0.05 (5, 24) = 2.62
Conclusion: Reject H0 at α = 0.05; i.e., the fit of the model is adequate at α = 0.05 and hence
the model is of some use in predicting y.

11.37 Hypotheses: H0 : β1 = · · · = β18 = 0


Ha : At least one βi is nonzero, i = 1, ..., 18
Test Statistic: F = (0.95/18) / [(1 − 0.95)/(20 − 19)] = 1.06
Rejection Region: F > F0.05 (18, 1) ≈ 247.16 by interpolating from F table.
Conclusion: Fail to reject H0 at α = 0.05; i.e., none of the β parameters are significantly different
from zero, so the model is not adequate at α = 0.05.

11.39 a ŷ = 20.09111 − 0.6705x + 0.009535x2


b Scatterplot of completion time vs months of experience:

c Hypotheses: H0 : β2 = 0    Ha : β2 ≠ 0
Test Statistic: t = β̂2 /sβ̂2 = 0.009535/0.006326 = 1.51
Rejection Region: |t| > t0.01 = 3.055 (degrees of freedom = 12)
Conclusion: Fail to reject H0 at α = 0.01, i.e., the quadratic term does not make a significant
contribution to the model.

d Σxi = 151    Σxi² = 2,295    Σxi yi = 1,890
Σyi = 222    Σyi² = 3,456    n = 15
SSxx = 2,295 − (151)²/15 = 774.9333
SSyy = 3,456 − (222)²/15 = 170.4
SSxy = 1,890 − (151)(222)/15 = −344.8
β̂1 = SSxy/SSxx = −344.8/774.9333 = −0.4449
β̂0 = ȳ − β̂1 x̄ = 222/15 − (−0.4449)(151/15) = 19.2791
Reduced fitted model: ŷ = 19.2791 − 0.4449x

(To test the utility of the model, we compute s² = 1.3065, which yields t = β̂1/(s/√SSxx) =
−10.84 for testing H0 : β1 = 0 vs. Ha : β1 ≠ 0. This is highly significant, so the model is
adequate.)

e β̂1 ± t0.05 · s/√SSxx = −0.4449 ± 1.771 (√1.3065/√774.9333) = (−0.5177, −0.3722)
(degrees of freedom = 13)
In the context of this problem, β1 is the decrease in time required to complete the task per
additional month of experience.
11.41 a Hypotheses: H0 : β2 = 0    Ha : β2 ≠ 0
Test Statistic: t = β̂2 /sβ̂2 = 0.55/0.181 = 3.04
P-value: Let p = P (|t| > 3.04); then 0.005 < p/2 = P (t > 3.04) < 0.01 so that 0.01 < p < 0.02
(degrees of freedom = 8).
Conclusion: Since the P-value is small (say, less than 0.05), we reject H0 and conclude that the
mean Christmas sales are related to August sales in the proposed model.
11.43 If there are no interaction terms in the model and m independent variables, then the number of
degrees of freedom for error is n − (m + 1). If there are m independent variables and all possible
interactions are included in the full model, then (for n > 2m ) there will be n − 2m degrees of
freedom for error. The number of degrees of freedom for error is most easily related to k, the
number of parameters (excluding β0 ) giving n − (k + 1) degrees of freedom for error.
11.45 a Y = β0 + β1 x1 + β2 x1² + ε
b Hypotheses: H0 : β1 = β2 = 0 Ha : β1 or β2 is nonzero
Test Statistic: F = 44.50
Rejection Region: F > F0.05 (2, 27) = 3.35
Conclusion: Reject H0 at α = 0.05; i.e., the model is adequate at α = 0.05.
c The P-value of 0.0001 indicates that if this experiment were repeated, only about once in 10,000
trials would an F-value as high as 44.5 be observed by chance when, in fact, β1 = β2 = 0.
d Hypotheses: H0 : β2 = 0    Ha : β2 ≠ 0
Test Statistic: t = −4.68
Rejection Region: |t| > t0.025 = 2.052 (degrees of freedom = 27)
Conclusion: We reject H0 at α = 0.05 and conclude that the second-order term is significant.

e The P-value of 0.0001 indicates that if this experiment were repeated, only about once in 10,000
trials would a t-value at least 4.68 units from zero be observed by chance when, in fact, β2 = 0.

11.47 Minitab output for a linear model.


The regression equation is
Height = 0.0791 - 0.0227 Velocity

Predictor Coef SE Coef T P


Constant 0.079069 0.001346 58.73 0.000
Velocity -0.022742 0.001929 -11.79 0.000

S = 0.00108417 R-Sq = 92.1% R-Sq(adj) = 91.4%

Analysis of Variance

Source DF SS MS F P
Regression 1 0.00016332 0.00016332 138.95 0.000
Residual Error 12 0.00001411 0.00000118
Total 13 0.00017743

11.7 Model Building: A Test for a Portion of a Model


11.49 a We would fit the complete model

Y = β0 + β1 x1 + β2 x2 + β3 x3 + β4 x4 + ε

where k = 4 in order to determine SSE2. Then we would fit the reduced model Y = β0 + β1 x3 +
β2 x4 + ε where g = 2 in order to determine SSE1. These are the relevant quantities for the
F-statistic.
b ν1 = k − g = 4 − 2 = 2 ν2 = n − (k + 1) = 25 − (4 + 1) = 20

11.51 a H0 : β 1 = · · · = β 5 = 0
Ha : At least one of β1 , . . . , β5 is nonzero
b H0 : β3 = β4 = β5 = 0
Ha : At least one of β3, β4, or β5 is nonzero
c Test Statistic: F = (0.729/5) / [(1 − 0.729)/(40 − 6)] = 18.29
Rejection Region: F > F0.05 (5, 34) ≈ 2.50
Conclusion: Reject H0 at α = 0.05; i.e., the complete model is useful for prediction since at least
one of the β’s is significantly different from zero at α = 0.05.
d Test Statistic: F = [(3,197.16 − 1,830.44)/(5 − 2)] / [1,830.44/(40 − (5 + 1))] = 8.46
Rejection Region: F > F0.05 (3, 34) ≈ 2.89
Conclusion: Reject H0 at α = 0.05; i.e., the second order model is significant at α = 0.05.
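As a quick numerical check (not part of the printed solution), both F statistics above can be reproduced in Python; the helper names below are ours:

```python
# Hypothetical helpers (not from the text) reproducing Exercise 11.51's F statistics.

def f_from_r2(r2, k, n):
    # F for H0: beta_1 = ... = beta_k = 0, computed from R^2
    return (r2 / k) / ((1 - r2) / (n - (k + 1)))

def f_drop(sse_reduced, sse_complete, k, g, n):
    # F for the k - g terms dropped from the complete model
    return ((sse_reduced - sse_complete) / (k - g)) / (sse_complete / (n - (k + 1)))

f_c = f_from_r2(0.729, 5, 40)             # part (c): overall model test
f_d = f_drop(3197.16, 1830.44, 5, 2, 40)  # part (d): second-order terms
```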

11.53 Hypotheses: H0 : β2 = β3 = 0 Ha : β2 ≠ 0 or β3 ≠ 0
Test Statistic: F = [(795.23 − 783.90)/2] / [783.90/(200 − (4 + 1))] = 1.41
Rejection Region: F > F0.05 (2, 195) ≈ 3.06
Conclusion: Fail to reject H0 at α = 0.05; i.e., there is insufficient evidence that the mean faculty
salary is dependent on sex at α = 0.05.

11.11 Supplementary Exercises

11.55 a Time series plots of CO, SO2 , Pb, NO2 and Ozone levels (with different y scales for comparison):

CO, NO2 and SO2 levels have decreased linearly over the years, Pb levels have decreased drasti-
cally, and ozone levels have remained more or less constant.
b Scatterplot matrix of CO, Pb, NO2 , Ozone and SO2 levels:

With a correlation matrix:


CO Pb NO2 Ozone
Pb 0.858
NO2 0.939 0.706
Ozone 0.839 0.708 0.860
SO2 0.972 0.768 0.917 0.797
The strongest relationship is between CO levels and SO2 levels.
ȳ = 5.845 x̄ = 0.0077
SSxy = 0.048 SSxx = 0.000051 SSyy = 47.39
β̂1 = SSxy /SSxx = 0.048/0.000051 = 937.91 (from unrounded sums of squares)
β̂0 = ȳ − β̂1 x̄ = 5.845 − (937.91)(0.0077) = −1.339
REGRESSION EQUATION: CO = −1.339 + 937.91 · (SO2 )

Hypotheses: H0 : ρ = 0 and Ha : ρ ≠ 0
Assuming that the populations of CO and SO2 measurements are normally distributed, we can
calculate our test statistic:
t = r √[(n − 2)/(1 − r²)] = (0.972) √[(20 − 2)/(1 − (0.972)²)] = 17.55
With 20 − 2 = 18 degrees of freedom, we can calculate our p-value as:
p = 2P (t > 17.55) ≈ 0
Because p ≈ 0 is less than α = 0.05, reject the null hypothesis, as there is sufficient evidence to
conclude that the correlation coefficient is non-zero.
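A minimal sketch (ours, not the manual's) of the same t statistic for testing H0 : ρ = 0:

```python
import math

# t statistic for H0: rho = 0, from the sample correlation r and sample size n
def corr_t(r, n):
    return r * math.sqrt((n - 2) / (1 - r * r))

t = corr_t(0.972, 20)  # CO vs SO2, n = 20
```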
11.57 a Scatterplot of dampers vs height:

The scatterplot indicates a non-linear correlation between the dampers and the height of the
building. There is a possible quadratic relationship.

b Minitab output of dampers vs height and height2 :

The regression equation is


Dampers = 162 - 2.77 Height + 0.0147 Height^2

Predictor Coef SE Coef T P


Constant 162.48 46.26 3.51 0.013
Height -2.7741 0.6533 -4.25 0.005
Height^2 0.014671 0.002105 6.97 0.000

S = 20.0621 R-Sq = 97.2% R-Sq(adj) = 96.2%

Analysis of Variance

Source DF SS MS F P
Regression 2 82845 41423 102.92 0.000
Residual Error 6 2415 402
Total 8 85260
c An F of over 100 gives us a p-value ≈ 0. Since p ≈ 0 is less than α = 0.05, reject the null hypothesis
that all of the βi are 0. We can be very confident that the model contributes information for the
prediction of the number of dampers.
11.59 a Scatterplot of limit load vs PDA:

The scatterplot shows a strong negative linear correlation between limit load and PDA.

b Minitab output for limit load vs PDA:


The regression equation is
Limit load = 0.840 - 0.00825 PDA

Predictor Coef SE Coef T P


Constant 0.84028 0.02573 32.66 0.000
PDA -0.0082500 0.0004572 -18.04 0.000

S = 0.0354170 R-Sq = 97.9% R-Sq(adj) = 97.6%

Analysis of Variance

Source DF SS MS F P
Regression 1 0.40838 0.40838 325.56 0.000
Residual Error 7 0.00878 0.00125
Total 8 0.41716
c An F of over 300 gives us a p-value ≈ 0. Because p ≈ 0 is less than α = 0.05, reject the null
hypothesis that β1 is 0. We can be very confident that the model contributes information for the
prediction of the limit load.

11.61 a Minitab output for the year 1 cost of developing underground utilities vs C1-C8:
The regression equation is
Underground-Yr1 cost = 1279 + 301 C1 + 0.870 C2 - 121 C3 + 2.82 C4 - 274 C5
+ 1.24 C6 - 114 C7 + 928 C8

Predictor Coef SE Coef T P


Constant 1278.6 816.1 1.57 0.125
C1 301.10 15.91 18.92 0.000
C2 0.8698 0.4177 2.08 0.044
C3 -120.6 154.8 -0.78 0.441
C4 2.8238 0.6038 4.68 0.000
C5 -274.2 479.1 -0.57 0.570
C6 1.2374 0.4381 2.82 0.008
C7 -113.7 286.5 -0.40 0.694
C8 928.0 364.5 2.55 0.015

S = 1724.31 R-Sq = 99.0% R-Sq(adj) = 98.8%

Analysis of Variance

Source DF SS MS F P
Regression 8 11734462627 1466807828 493.34 0.000
Residual Error 38 112983254 2973244
Total 46 11847445882

b Minitab output for the year 1 cost of developing overhead utilities vs C1-C8:
The regression equation is
Overhead-Yr1cost = - 873 + 205 C1 + 2.21 C2 - 167 C3 + 1.14 C4 + 1182 C5
+ 0.687 C6 + 307 C7 + 412 C8

Predictor Coef SE Coef T P


Constant -872.9 902.6 -0.97 0.340
C1 204.78 17.60 11.63 0.000
C2 2.2084 0.4620 4.78 0.000
C3 -167.3 171.2 -0.98 0.335
C4 1.1361 0.6677 1.70 0.097
C5 1181.9 529.8 2.23 0.032
C6 0.6872 0.4846 1.42 0.164
C7 307.3 316.9 0.97 0.338
C8 412.2 403.1 1.02 0.313

S = 1907.08 R-Sq = 98.4% R-Sq(adj) = 98.1%

Analysis of Variance

Source DF SS MS F P
Regression 8 8537420756 1067177595 293.43 0.000
Residual Error 38 138204360 3636957
Total 46 8675625116
c Minitab output for the year 2 cost of developing underground utilities vs C1-C8:
The regression equation is
Underground-Yr2 cost = 1373 + 318 C1 + 0.879 C2 - 127 C3 + 2.99 C4 - 227 C5
+ 1.30 C6 - 102 C7 + 987 C8

Predictor Coef SE Coef T P


Constant 1372.8 897.5 1.53 0.134
C1 318.07 17.50 18.18 0.000
C2 0.8790 0.4594 1.91 0.063
C3 -127.4 170.2 -0.75 0.459
C4 2.9885 0.6639 4.50 0.000
C5 -226.7 526.8 -0.43 0.669
C6 1.3006 0.4818 2.70 0.010
C7 -102.2 315.1 -0.32 0.747
C8 986.8 400.8 2.46 0.018

S = 1896.15 R-Sq = 99.0% R-Sq(adj) = 98.7%



Analysis of Variance

Source DF SS MS F P
Regression 8 13039013684 1629876710 453.32 0.000
Residual Error 38 136624794 3595389
Total 46 13175638478
d Minitab output for the year 2 cost of developing overhead utilities vs C1-C8:
The regression equation is
Overhead-Yr2cost = 1147 + 186 C1 + 2.09 C2 - 189 C3 + 1.63 C4 + 2197 C5
- 0.09 C6 + 160 C7 + 374 C8

Predictor Coef SE Coef T P


Constant 1147 2948 0.39 0.699
C1 186.47 57.49 3.24 0.002
C2 2.093 1.509 1.39 0.173
C3 -188.6 559.3 -0.34 0.738
C4 1.627 2.181 0.75 0.460
C5 2197 1731 1.27 0.212
C6 -0.094 1.583 -0.06 0.953
C7 160 1035 0.15 0.878
C8 374 1317 0.28 0.778

S = 6228.74 R-Sq = 84.7% R-Sq(adj) = 81.5%

Analysis of Variance

Source DF SS MS F P
Regression 8 8158250097 1019781262 26.28 0.000
Residual Error 38 1474295399 38797247
Total 46 9632545496

11.63 a Scatterplots of percent deviation vs black body temperature for calibration at 1400 and 1500
degrees:

b Minitab output follows:


Regression Analysis: 1400 deviation versus temperature

The regression equation is


1400 deviation = 8.96 - 0.00627 temperature

Predictor Coef SE Coef T P


Constant 8.9636 0.7219 12.42 0.000
temperature -0.0062727 0.0005152 -12.18 0.000

S = 0.108059 R-Sq = 94.3% R-Sq(adj) = 93.6%



Analysis of Variance

Source DF SS MS F P
Regression 1 1.7313 1.7313 148.27 0.000
Residual Error 9 0.1051 0.0117
Total 10 1.8364
Regression Analysis: 1500 deviation versus temperature

The regression equation is


1500 deviation = 8.67 - 0.00586 temperature

Predictor Coef SE Coef T P


Constant 8.6682 0.7087 12.23 0.000
temperature -0.0058636 0.0005057 -11.59 0.000

S = 0.106078 R-Sq = 93.7% R-Sq(adj) = 93.0%

Analysis of Variance

Source DF SS MS F P
Regression 1 1.5128 1.5128 134.44 0.000
Residual Error 9 0.1013 0.0113
Total 10 1.6141
For both models, with p ≈ 0, we can reject the null hypotheses that β1 = 0 and conclude that the
models provide information for the prediction of the percent deviation.
c For the 1400 degree calibration, we can set the percent deviation in the regression equation equal
to 0 and solve for the temperature:
8.96 - 0.00627 temperature = 0
temperature = 8.96/0.00627 = 1429.027
Doing the same thing for the 1500 degree calibration we get:
temperature = 8.67/0.00586 = 1479.5
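The two crossover temperatures can be checked with a one-line computation (a sketch of ours, not part of the printed solution):

```python
# Temperature at which each fitted percent-deviation line crosses zero:
# solve b0 + b1 * temperature = 0, i.e. temperature = -b0 / b1.
t_1400 = -8.96 / -0.00627   # 1400-degree calibration
t_1500 = -8.67 / -0.00586   # 1500-degree calibration
```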

11.65 a Minitab output follows:


Regression Analysis: Core Velocity versus Impeller Diameter, Off-Bottom Clearance

The regression equation is


Core Velocity = 0.396 - 0.400 Impeller Diameter - 0.0183 Off-Bottom Clearance

Predictor Coef SE Coef T P


Constant 0.39588 0.03738 10.59 0.000
Impeller Diameter -0.40036 0.08137 -4.92 0.000
Off-Bottom Clearance -0.01827 0.06884 -0.27 0.794

S = 0.0424250 R-Sq = 57.4% R-Sq(adj) = 52.6%


The p-value for the off-bottom clearance term and a scatterplot of core velocity vs off-bottom
clearance show that the off-bottom clearance term does not contribute significant information, so
we reanalyze without this term. Minitab output follows:
The regression equation is
Core Velocity = 0.389 - 0.399 Impeller Diameter

Predictor Coef SE Coef T P


Constant 0.38860 0.02477 15.69 0.000
Impeller Diameter -0.39879 0.07915 -5.04 0.000

S = 0.0413741 R-Sq = 57.2% R-Sq(adj) = 54.9%


Model: Core Velocity = 0.389 − 0.399 Impeller Diameter
11.67 a ŷ = 0.04565 + 0.000785x1 + 0.23737x2 − 0.0000381x1 x2
b SSE = 2.7152, s2 = M SE = 0.1697
c The least-squares technique chooses estimates for β0 , β1 , β2 , β3 so as to minimize
SSE = Σi (yi − ŷi )².

11.69 From the SAS printout, a 95% confidence interval for the mean cost of computer jobs that require
42 seconds of CPU time and print 2,000 lines is ($7.32, $9.45).

11.71 a ŷ = 0.6013 + 0.5953x1 − 3.7254x2 − 16.2320x3 + 0.2349x1 x2 + 0.3081x1 x3


b The value of R2 = 0.9281 indicates that about 92.8% of the variability in the Y scores is accounted
for by the model.
Hypotheses: H0 : β1 = · · · = β5 = 0
Ha : At least one of β1 , . . . , β5 is nonzero
Test Statistic: F = 139.42
P-value: 0.0001
Conclusion: Since the P-value of 0.0001 is extremely small, we reject H0 and we have sufficient
evidence that the model is useful for predicting achievement test scores.

c Regression lines for high, medium and low SES:

d Hypotheses: H0 : β4 = β5 = 0
Ha : β4 ≠ 0 or β5 ≠ 0
Test Statistic: F = [(1216.0189 − 969.4831)/(5 − 3)] / [969.4831/(60 − (5 + 1))] = 6.87
Rejection Region: F > F0.05 (2, 54) ≈ 3.20
Conclusion: Reject H0 at α = 0.05; i.e., the mean increase in achievement test scores per unit
increase in IQ significantly differs for the three levels of SES at α = 0.05.
11.73 Hypotheses: H0 : β2 = 0 Ha : β 2 < 0
Test Statistic: t = −0.53/0.48 = −1.10
Rejection Region: t < −t0.01 = −2.326
(degrees of freedom = 42 − 3 = 39, so the ∞ row of the t table is used)
Conclusion: Fail to reject H0 at α = 0.01; i.e., there is insufficient evidence to conclude that after
allowing for the effect of initial assembly time, plant A had a lower mean assembly time than
plant B.
11.75 We test the hypotheses from Exercise 10.26, part (b).
Test Statistic: F = [(259.34 − 226.12)/(4 − 2)] / [226.12/(50 − (4 + 1))] = 3.31
Rejection Region: F > F0.05 (2, 45) ≈ 3.21
Conclusion: Reject H0 at α = 0.05; i.e., the mean delivery time significantly differs for mail and
truck deliveries at α = 0.05.
11.77 a Y = cost of material and labor
x1 = area
x2 = number of baths
x3 = 1 if central air, 0 if no central air
First-order model: Y = β0 + β1 x1 + β2 x2 + β3 x3 + ε

b Second-order model: Y = β0 + β1 x1 + β2 x2 + β3 x3 + β4 x1 x2 + β5 x1 x3 + β6 x2 x3 + β7 x1² + β8 x2² + ε
c H0 : β4 = · · · = β8 = 0
Ha : At least one of β4 , . . . , β8 is nonzero
11.79 Minitab output follows:
Regression Analysis: Time versus Brand, Experience

The regression equation is


Time = 1.68 + 0.444 Brand - 0.0793 Experience

Predictor Coef SE Coef T P


Constant 1.67939 0.04998 33.60 0.000
Brand 0.44414 0.04111 10.80 0.000
Experience -0.079322 0.005981 -13.26 0.000

S = 0.0649679 R-Sq = 97.7% R-Sq(adj) = 97.1%

Analysis of Variance

Source DF SS MS F P
Regression 2 1.27145 0.63573 150.62 0.000
Residual Error 7 0.02955 0.00422
Total 9 1.30100

Source DF Seq SS
Brand 1 0.52900
Experience 1 0.74245
a ŷ = 1.6794 + 0.4441x1 − 0.0793x2
b Hypotheses: H0 : β1 = β2 = 0 Ha : β1 ≠ 0 or β2 ≠ 0
Test Statistic: F = 150.62
P-value: 0.0001
Conclusion: Since the P-value is extremely small, we reject H0 and conclude that the model is
appropriate.
c Since R2 = 0.9773 is large, it tends to support the finding that the model is appropriate.
d (−0.0793 ± 1.895(0.005981)) = (−0.0907, −0.0680)
Since this interval does not include zero, we may conclude at a 90% confidence level that β2 is
not zero. Hence the service person’s number of months of experience in preventive maintenance
is useful in predicting time of preventive maintenance.
e ŷ = 1.6794 + 0.4441(0) − 0.0793(6) = 1.2036 (hours)
f Assuming that the service times are independent, the predicted mean time to service ten computers
is 12.036 hours.
g (1.6430, 1.9785)
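A small sketch (not from the text) evaluating the fitted equation for parts (e) and (f); the independence assumption for the ten service times is the manual's:

```python
# Fitted model from part (a): time = 1.6794 + 0.4441*brand - 0.0793*experience
def y_hat(x1, x2):
    return 1.6794 + 0.4441 * x1 - 0.0793 * x2

one_machine = y_hat(0, 6)        # hours for one computer (brand 0, 6 months experience)
ten_machines = 10 * one_machine  # total, assuming independent service times
```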

11.81 Minitab output follows:


The regression equation is
y = - 9.92 + 0.167 x1 + 0.138 x2 - 0.00111 x1^2 - 0.000843 x2^2 + 0.000241 x1x2

Predictor Coef SE Coef T P


Constant -9.917 1.354 -7.32 0.000
x1 0.16681 0.02124 7.85 0.000
x2 0.13760 0.02673 5.15 0.000
x1^2 -0.0011082 0.0001173 -9.45 0.000
x2^2 -0.0008433 0.0001594 -5.29 0.000
x1x2 0.0002411 0.0001440 1.67 0.103

S = 0.187142 R-Sq = 93.7% R-Sq(adj) = 92.7%

Analysis of Variance

Source DF SS MS F P
Regression 5 17.5827 3.5165 100.41 0.000
Residual Error 34 1.1908 0.0350
Total 39 18.7735
ŷ = −9.9168 + 0.1668x1 + 0.1376x2 − 0.001108x21 − 0.0008433x22 + 0.0002411x1 x2
a The value of R2 = 0.9365 indicates that about 93.65% of the variability in the GPA data is
accounted for by the model.
Hypotheses: H0 : β1 = · · · = β5 = 0
Ha : At least one of β1 , . . . , β5 is nonzero
Test Statistic: F = 100.41
P-value: 0.0001 < 0.05 = α
Conclusion: Since the P-value is extremely small, we reject H0 and conclude that the model is
useful in predicting mean freshman GPA values.
b Regression curves for x2 = 60, 75 and 90:

c Hypotheses: H0 : β5 = 0 Ha : β5 ≠ 0
Test Statistic: t = 1.67
P-value: 0.1032 > 0.10 = α
Conclusion: Fail to reject H0 at α = 0.10; i.e., there is insufficient evidence to conclude that the
interaction term is important for the prediction of GPA.
11.83 Minitab output follows:
Regression Analysis: SO2 emission versus Output

The regression equation is


SO2 emission = - 93.1 + 0.445 Output

Predictor Coef SE Coef T P


Constant -93.13 23.05 -4.04 0.005
Output 0.44458 0.03986 11.15 0.000

S = 10.7931 R-Sq = 94.7% R-Sq(adj) = 93.9%

Analysis of Variance

Source DF SS MS F P
Regression 1 14488 14488 124.37 0.000
Residual Error 7 815 116
Total 8 15304

Regression Analysis: SO2 emission versus Output, Output^2

The regression equation is


SO2 emission = 204 - 0.638 Output + 0.000959 Output^2

Predictor Coef SE Coef T P


Constant 204.46 82.95 2.46 0.049
Output -0.6380 0.2985 -2.14 0.076
Output^2 0.0009592 0.0002636 3.64 0.011

S = 6.50996 R-Sq = 98.3% R-Sq(adj) = 97.8%

Analysis of Variance

Source DF SS MS F P
Regression 2 15049.3 7524.6 177.55 0.000
Residual Error 6 254.3 42.4
Total 8 15303.6

a Scatterplot of sulfur dioxide emission vs output and regression line:

b ŷ = −93.1277 + 0.4446x
c ŷ = 204.4603 − 0.6380x + 0.0009593x2
d Hypotheses: H0 : β2 = 0 Ha : β2 ≠ 0
Test Statistic: t = 3.64
P-value: 0.0108
Conclusion: Since the P-value is small, we reject H0 and conclude that the quadratic model is
useful in describing the relationship between sulfur dioxide and output.
e ŷ = 204.4603 − 0.6380(500) + 0.0009593(500)2 = 125.2853
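The part (e) prediction can be checked directly (our sketch, not part of the printed solution):

```python
# Fitted quadratic from part (c), evaluated at an output of 500
def so2_hat(x):
    return 204.4603 - 0.6380 * x + 0.0009593 * x * x

pred = so2_hat(500)
```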
11.85 Hypotheses: H0 : β2 = 0 Ha : β2 < 0
Test Statistic: t = −6.60
Rejection Region: t < −t0.05 = −1.717 (degrees of freedom = 22)
Conclusion: Reject H0 at α = 0.05; i.e., there is sufficient evidence to conclude that the rate of
increase in output per unit increase of input decreases as the input increases.

11.87 a Hypotheses: H0 : β1 = β2 = 0 Ha : β1 ≠ 0 or β2 ≠ 0
Test Statistic: We fit the reduced model Y = β0 + β3 x2 + ε.
Summary Statistics: Σx2 = Σx2² = 10, Σy = 461, Σy² = 13,151, Σx2 y = 228. Then for
the reduced model SSE = SSyy − SSxy²/SSxx = 2,524.95 − (−2.5)²/5 = 2,523.70.

F = [(2,523.70 − 128.586)/2] / [128.586/(20 − 4)] = 149.01
Rejection Region: F > F0.05 (2, 16) = 3.63
b Conclusion: Reject H0 at α = 0.05; i.e., there is a significant quadratic relationship between age
of machine and time for repairs.
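The reduced-model SSE and the drop-test F statistic can be rebuilt from the summary sums (a check of ours, not part of the printed solution):

```python
# Exercise 11.87: reduced model uses x2 only; n = 20, complete model has k = 3 terms
n = 20
sum_x, sum_xsq = 10, 10             # sum of x2 and of x2 squared
sum_y, sum_ysq, sum_xy = 461, 13151, 228

ss_yy = sum_ysq - sum_y**2 / n      # 2,524.95
ss_xx = sum_xsq - sum_x**2 / n      # 5
ss_xy = sum_xy - sum_x * sum_y / n  # -2.5
sse_reduced = ss_yy - ss_xy**2 / ss_xx

sse_complete = 128.586              # given for the complete model
f = ((sse_reduced - sse_complete) / 2) / (sse_complete / (n - 4))
```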

11.89 Hypotheses: H0 : β3 = β4 = β5 = 0
Ha : At least one of β3 , β4 , or β5 is nonzero
Test Statistic: F = [(370.7911 − 164.9185)/3] / [164.9185/(30 − 6)] = 9.99
Rejection Region: F > F0.05 (3, 24) = 3.01
Conclusion: Reject H0 at α = 0.05; i.e., the inclusion of the variable for speed limit contributes
information for the prediction of number of highway deaths.
11.91 Minitab output follows:
Regression Analysis: reaction distance versus speed

The regression equation is


reaction distance = - 0.000000 + 1.10 speed

Predictor Coef SE Coef T P


Constant -0.00000000 0.00000000 * *
speed 1.10000 0.00000 * *

S = 0 R-Sq = 100.0% R-Sq(adj) = 100.0%

Analysis of Variance

Source DF SS MS F P
Regression 1 2117.5 2117.5 * *
Residual Error 4 0.0 0.0
Total 5 2117.5

Regression Analysis: braking distance versus speed

The regression equation is


braking distance = - 102 + 4.86 speed

Predictor Coef SE Coef T P


Constant -102.50 28.21 -3.63 0.022
speed 4.8629 0.5861 8.30 0.001

S = 24.5174 R-Sq = 94.5% R-Sq(adj) = 93.1%

Analysis of Variance

Source DF SS MS F P
Regression 1 41383 41383 68.84 0.001
Residual Error 4 2404 601
Total 5 43787

Regression Analysis: Total stopping distance versus speed

The regression equation is


Total stopping distance = - 102 + 5.96 speed

Predictor Coef SE Coef T P


Constant -102.50 28.21 -3.63 0.022
speed 5.9629 0.5861 10.17 0.001

S = 24.5174 R-Sq = 96.3% R-Sq(adj) = 95.3%

Analysis of Variance

Source DF SS MS F P
Regression 1 62222 62222 103.51 0.001
Residual Error 4 2404 601
Total 5 64627
a Reaction = 1.1 speed
Reaction = 1.1(55) = 60.5
b Braking = −102.4952 + 4.8629 speed
Braking = −102.4952 + 4.8629(55) = 164.9643
c Total = −102.4952 + 5.9629 speed
Total = −102.4952 + 5.9629(55) = 225.4643
11.93 Model for BaP: %CN = −25.5902 + 68.8217 mean Rf
Model for BaA: %CN = −20.6338 + 77.4706 mean Rf
Model for Phe: %CN = −16.0972 + 86.0228 mean Rf
Chapter 12

Analysis of Variance

12.1 Analysis of Variance (ANOVA) Technique


12.1 Source df SS MS F
Treatments 23-20 = 3 1580.35 1580.35/3 = 526.78 526.78/7.4 = 71.19
Error 20 1728.35-1580.35 = 148 148/20 = 7.4
Total 23 1728.35
a There are 3+1 = 4 treatments.
b There were 23+1 = 24 total observations.
c There were 24/4 = 6 observations per treatment.
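The table-filling arithmetic can be sketched as follows (our code, not the manual's):

```python
# Completing a one-way ANOVA table from the total and treatment lines (Exercise 12.1)
ss_total, ss_trt = 1728.35, 1580.35
df_total, df_trt = 23, 3

ss_err = ss_total - ss_trt
df_err = df_total - df_trt
ms_trt = ss_trt / df_trt
ms_err = ss_err / df_err
f = ms_trt / ms_err
```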
12.3 a Source df SS MS F
Treatments 4 24.7 24.7/4 = 6.175 6.175/1.257 = 4.914
Error 34-4=30 62.4-24.7 = 37.7 37.7/30 = 1.257
Total 34 62.4
b There are 4+1 = 5 treatments.

12.2 Analysis of Variance for the Completely Randomized Design
12.5 Summary Statistics: Σyi = 28.8, Σyi² = 106.32,
T1 = 8.6, T2 = 15.9, T3 = 4.3
a ANOVA Table
Source df SS MS F
Treatments 2 11.0752 5.5376 3.1513
Error 7 12.3008 1.7573
Total 9 23.3760

b Hypotheses: H0 : µ1 = µ2 = µ3 Ha : At least two means differ


Test Statistic: F = 3.15
Rejection Region: F > F0.05 (2, 7) = 4.74
Conclusion: Fail to reject H0 at α = 0.05; i.e., the treatment means are not significantly different
at α = 0.05.


12.7 Summary Statistics: Σyi = 1155, Σyi² = 103,083,
T1 = 244, T2 = 269, T5 = 381, T7 = 261
ANOVA Table
Source df SS MS F
Treatments 3 345.6089 115.2030 8.6342
Error 9 120.0833 13.3426
Total 12 465.6922

Hypotheses: H0 : µ1 = µ2 = µ5 = µ7 Ha : At least two means differ


Test Statistic: F = 8.63
Rejection Region: F > F0.05 (3, 9) = 3.86
Conclusion: Reject H0 at α = 0.05; i.e., the mean percentage of kill significantly differs for the
four rates of application of nematicide at α = 0.05.
12.9 (a) and (b):
Summary Statistics: ȳ = (42.960 + 145.222 + 92.249 + 73.224)/109 = 3.2445,
s²p = (2.0859 + 14.0186 + 7.9247 + 5.6812)/105 = 0.2830,
F = [15(2.864 − ȳ)² + 41(3.542 − ȳ)² + 29(3.181 − ȳ)² + 24(3.051 − ȳ)²] / (3s²p ) = 8.0283
ANOVA Table
Source df SS MS F
Treatments 3 6.8160 2.2720 8.0283
Error 105 29.7150 0.2830
Total 108 36.5310

c Hypotheses: H0 : µ1 = µ2 = µ3 = µ4 Ha : At least two means differ


Test Statistic: F = 8.0283
Rejection Region: F > F0.05 (3, 105) ≈ 2.70
Conclusion: Reject H0 at α = 0.05; i.e., the mean scores are significantly different for the four
academic ranks.
d 2.864 − 3.051 ± 1.96 √[0.2830(1/15 + 1/24)] = (−0.5302, 0.1562)
e [(42.960 + 145.222)/(15 + 41) − (92.249 + 73.224)/(29 + 24)] ± 1.96 √[0.2830(1/56 + 1/53)]
= (0.0384, 0.4381)
Since this interval does not contain zero, we may conclude at the 95% confidence level (α = 0.05)
that there is a significant difference between scores for tenured and nontenured faculty members.
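A sketch (ours, not the manual's) of the part (e) interval; the tenured/nontenured grouping of the first two and last two ranks follows the manual's sums:

```python
import math

# Pooled large-sample CI for the tenured-minus-nontenured contrast, s_p^2 = 0.2830
mean_tenured = (42.960 + 145.222) / (15 + 41)
mean_nontenured = (92.249 + 73.224) / (29 + 24)
half_width = 1.96 * math.sqrt(0.2830 * (1 / 56 + 1 / 53))
lo = mean_tenured - mean_nontenured - half_width
hi = mean_tenured - mean_nontenured + half_width
```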
12.11 a ANOVA Table
Source df SS MS F
Company 1 3237.2 3237.2 19.6222
Error 98 16167.7 164.9765
Total 99 19404.9

b Hypotheses: H0 : µ1 = µ2 Ha : µ1 ≠ µ2
Test Statistic: F = 19.62
Rejection Region: F > F0.05 (1, 98) ≈ 3.95
Conclusion: Reject H0 at α = 0.05; i.e., there is a significant difference between the number of
hours missed for the two companies.
12.13 Hypotheses: H0 : µ1 = µ2 = µ3 = µ4 Ha : At least two means differ
Test Statistics: ȳ = [8(80) + 8(81) + 8(86) + 8(90)]/32 = 84.25
s²p = 700/(32 − 4) = 25
F = [8(80 − 84.25)² + · · · + 8(90 − 84.25)²] / [(25)(3)] = 6.9067
Rejection Region: F > F0.05 (3, 28) = 2.95
Conclusion: Reject H0 at α = 0.05; i.e., there are significant differences among the mean percent-
ages of copper for the four castings.
12.15 Summary Statistics: Σyi = 772, Σyi² = 30,550,
TA = 174, TB = 208, TC = 231, TD = 159,
TSS = 30,550 − (772)²/20 = 750.8,
SST = [(174)² + · · · + (159)²]/5 − (772)²/20 = 637.2
ANOVA Table
Source df SS MS F
Treatments 3 637.2 212.4 29.9
Error 16 113.6 7.1
Total 19 750.8

Hypotheses: H0 : µA = µB = µC = µD Ha : At least two means differ


Test Statistic: F = 29.9
Rejection Region: F > F0.05 (3, 16) = 3.24
Conclusion: Reject H0 at α = 0.05; i.e., there are significant differences among the mean tensile
strengths for the four heat treatments.
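The same ANOVA can be reproduced from the four treatment totals (our sketch; 5 observations per treatment):

```python
# Exercise 12.15: CRD sums of squares from treatment totals
totals = [174, 208, 231, 159]   # T_A .. T_D
n, per = 20, 5
cm = sum(totals)**2 / n         # correction for the mean, (772)^2/20
tss = 30550 - cm
sst = sum(t**2 for t in totals) / per - cm
sse = tss - sst
f = (sst / 3) / (sse / 16)
```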

12.3 Relation of ANOVA for CRD with a t-Test and Regression
12.17 Model: Y = β0 + β1 x1 + β2 x2 + ε where
x1 = 1 if sample 1, 0 otherwise; x2 = 1 if sample 2, 0 otherwise
Sample x1 x2 E(Y)
1 1 0 β0 + β1 = µ1
2 0 1 β0 + β2 = µ2
3 0 0 β0 = µ3

Hypotheses: Since β1 = µ1 − µ3 and β2 = µ2 − µ3 , H0 : µ1 = µ2 = µ3 is equivalent to
H0 : β1 = β2 = 0 with Ha : β1 ≠ 0 or β2 ≠ 0.
The ANOVA table, test statistic, rejection region, and conclusion are exactly as given in Exercise
11.1.
12.19 Model: Y = β0 + β1 x1 + β2 x2 + ε where
x1 = 1 if thermometer 1, 0 otherwise; x2 = 1 if thermometer 2, 0 otherwise
Thermometer x1 x2 E(Y)
1 1 0 β0 + β1 = µ1
2 0 1 β0 + β2 = µ2
3 0 0 β0 = µ3

Hypotheses: Since β1 = µ1 − µ3 and β2 = µ2 − µ3 ,
H0 : µ1 = µ2 = µ3 is equivalent to H0 : β1 = β2 = 0 with Ha : β1 ≠ 0 or β2 ≠ 0.
The ANOVA table, test statistic, rejection region and conclusion are exactly as given in Exercise
12.14.

12.4 Estimation for Completely Randomized Design


12.21 a Hypotheses: H0 : µ1 = µ2 Ha : µ1 ≠ µ2
Test Statistic: t = (3.7 − 4.1) / √[1.2567(1/7 + 1/7)] = −0.67
Rejection Region: |t| > 1.645 (degrees of freedom = 30)
Conclusion: Fail to reject H0 at α = 0.10; i.e., there is not a significant difference between µ1 and
µ2 at α = 0.10.
b 3.7 − 4.1 ± 1.645 √[1.2567(1/7 + 1/7)] = (−1.3857, 0.5857)
c 3.7 ± 1.645 √(1.2567/7) = (3.0030, 4.3970)

12.23 a 33.6/5 − 44.1/5 ± 2.179 √[0.8620(1/5 + 1/5)] = (−3.3795, −0.8205)
b t0.1/6 ≈ 2.401 (degrees of freedom = 12)
i j ȳi − ȳj ± 2.401 √[0.8620(1/ni + 1/nj )]
1 2 (5.7231, 7.7169)
1 3 (7.1231, 9.1169)
2 3 (7.8231, 9.8169)

12.5 Analysis of Variance for the Randomized Block Design


12.25 Summary Statistics: Σyi = 43, Σyi² = 197, T1 = 12, T2 = 22, T3 = 9,
B1 = 10, B2 = 16, B3 = 7, B4 = 10, n = 12,
TSS = 197 − (43)²/12 = 42.9167,
SST = [(12)² + (22)² + (9)²]/4 − (43)²/12 = 23.1667,
SSB = [(10)² + (16)² + (7)² + (10)²]/3 − (43)²/12 = 14.25
a ANOVA Table
Source df SS MS F
Treatment 2 23.1667 11.5833 12.6333
Block 3 14.25 4.7500 5.1889
Error 6 5.5 0.9167
Total 11 42.9167

b Hypotheses: H0 : µ1 = µ2 = µ3 Ha : At least two means differ


Test Statistic: F = 12.63
Rejection Region: F > F0.05 (2, 6) = 5.14
Conclusion: Reject H0 at α = 0.05; i.e., there are significant differences among the treatment
means.
c Hypotheses: H0 : Block means are equal
Ha : At least two block means differ
Test Statistic: F = 5.19
Rejection Region: F > F0.05 (3, 6) = 4.76
Conclusion: Reject H0 at α = 0.05; i.e., we have sufficient evidence to conclude that blocking is
effective in reducing experimental error at α = 0.05.
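A sketch (ours) of the randomized-block decomposition above; the F ratios match the table up to rounding:

```python
# Exercise 12.25: RBD sums of squares from treatment and block totals
trt = [12, 22, 9]        # treatment totals, 4 blocks each
blk = [10, 16, 7, 10]    # block totals, 3 treatments each
n = 12
cm = sum(trt)**2 / n     # (43)^2 / 12
tss = 197 - cm
sst = sum(t**2 for t in trt) / 4 - cm
ssb = sum(b**2 for b in blk) / 3 - cm
sse = tss - sst - ssb
f_trt = (sst / 2) / (sse / 6)
f_blk = (ssb / 3) / (sse / 6)
```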

12.27 Summary Statistics: Σyi = 1520.3, Σyi² = 110,587.13, n = 21,
T1 = 497.7, T2 = 531.3, T3 = 491.3,
B1 = 211.1, B2 = 202.7, B3 = 233.1,
B4 = 218.1, B5 = 220.5, B6 = 205.3, B7 = 229.5,
TSS = 110,587.13 − (1520.3)²/21 = 524.6494,
SST = [(497.7)² + · · · + (491.3)²]/7 − (1520.3)²/21 = 131.9010,
SSB = [(211.1)² + · · · + (229.5)²]/3 − (1520.3)²/21 = 268.2894
ANOVA Table
Source df SS MS F
Treatment 2 131.9010 65.9505 6.3588
Block 6 268.2894 44.7149 4.3113
Error 12 124.4591 10.3716
Total 20 524.6494

Hypotheses: H0 : µ1 = µ2 = µ3 Ha : At least two means differ


Test Statistic: F = 6.3588
Rejection Region: F > F0.05 (2, 12) = 3.89
Conclusion: Reject H0 at α = 0.05; i.e., there are significant differences in pressures required to
separate the components among the three bonding agents.
12.29 Summary Statistics: Σyi = 437, Σyi² = 18,169, n = 12,
T1 = 110, T2 = 109, T3 = 218,
B1 = 112, B2 = 108, B3 = 123, B4 = 94,
TSS = 18,169 − (437)²/12 = 2,254.9167,
SST = [(110)² + · · · + (218)²]/4 − (437)²/12 = 1,962.1667,
SSB = [(112)² + · · · + (94)²]/3 − (437)²/12 = 143.5833
ANOVA Table
Source df SS MS F
Treatment 2 1962.1667 981.0833 39.4626
Block 3 143.5833 47.8611 1.9250
Error 6 149.1667 24.8611
Total 11 2254.9167

a Hypotheses: H0 : µ1 = µ2 = µ3 Ha : At least two means differ


Test Statistic: F = 39.46
Rejection Region: F > F0.05 (2, 6) = 5.14
Conclusion: Reject H0 at α = 0.05; i.e., there are significant differences between the mean numbers
of blades among the three stations.

b Hypotheses: H0 : Block means are equal


Ha : At least two block means differ
Test Statistic: F = 1.93
Rejection Region: F > F0.05 (3, 6) = 4.76
Conclusion: Fail to reject H0 at α = 0.05; i.e., there are no significant differences in the mean
number of blades among the four months.
12.31 Summary Statistics: Σyi = 236.2, Σyi² = 3,732.62, n = 15,
T1 = 73.3, T2 = 81.5, T3 = 81.4, B1 = 49.2,
B2 = 46.8, B3 = 46.3, B4 = 48.8, B5 = 45.1,
TSS = 3,732.62 − (236.2)²/15 = 13.2573,
SST = [(73.3)² + · · · + (81.4)²]/5 − (236.2)²/15 = 8.8573,
SSB = [(49.2)² + · · · + (45.1)²]/3 − (236.2)²/15 = 3.9773,
ANOVA Table
Source df SS MS F
Treatment 2 8.8573 4.4287 83.8239
Block 4 3.9773 0.9943 18.8203
Error 8 0.4227 0.0528
Total 14 13.2573

Hypotheses: H0 : µ1 = µ2 = µ3 Ha : At least two means differ


Test Statistic: F = 83.82
Rejection Region: F > F0.05 (2, 8) = 4.46
Conclusion: Reject H0 at α = 0.05; i.e., there are significant differences in mean delivery times
among the three carriers.
12.33 Summary Statistics: Σyi = 6,598, Σyi² = 3,632,768, n = 12,
T1 = 1,653, T2 = 1,702, T3 = 1,634, T4 = 1,609,
B1 = 2,265, B2 = 2,136, B3 = 2,197,
TSS = 3,632,768 − (6,598)²/12 = 4,967.6690,
SST = [(1,653)² + · · · + (1,609)²]/3 − (6,598)²/12 = 1,549.6680,
SSB = [(2,265)² + · · · + (2,197)²]/4 − (6,598)²/12 = 2,082.1690
ANOVA Table
Source df SS MS F
Treatment 3 1549.6680 516.5560 2.3202
Block 2 2082.1690 1041.0845 4.6761
Error 6 1335.8320 222.6387
Total 11 4967.6690

a Hypotheses: H0 : µ1 = µ2 = µ3 = µ4 Ha : At least two means differ


Test Statistic: F = 2.32
Rejection Region: F > F0.05 (3, 6) = 4.76
Conclusion: Fail to reject H0 at α = 0.05; i.e., there are not significant differences in mean
temperature among the four treatments.
b Hypotheses: H0 : Block means are equal Ha : At least two block means differ
Test Statistic: F = 4.68
Rejection Region: F > F0.05 (2, 6) = 5.14
Conclusion: Fail to reject H0 at α = 0.05; i.e., there are not significant differences among the
batch means.
12.35 a Summary Statistics: ȳ = 707/20 = 35.35, n = 20,
T1 = 182, T2 = 151, T3 = 135, T4 = 124, T5 = 115,
B1 = 169, B2 = 49, B3 = 243, B4 = 246,
TSS = (40 − 35.35)² + · · · + (39 − 35.35)² = 6034.55,
SST = 4[(182/4 − 35.35)² + · · · + (115/4 − 35.35)²] = 695.30,
SSB = 5[(169/5 − 35.35)² + · · · + (246/5 − 35.35)²] = 5112.95,
SSE = 6034.55 − 695.30 − 5112.95 = 226.30
ANOVA Table
Source DF SS MS F P
Treatment 4 695.30 173.83 9.22 0.001
Block 3 5112.95 1704.32 90.37 0.000
Error 12 226.30 18.86
Total 19 6034.55

Hypotheses: H0 : µ1 = · · · = µ5 Ha : At least two means differ


Test Statistic: F = 9.22
p-value: 0.001
Conclusion: Reject H0 at α = 0.05; i.e., there are significant differences in mean number of
circumferential waves for different shell thicknesses.
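The treatment and block sums of squares can be reproduced from the totals alone (our sketch; the grand mean is 707/20 = 35.35):

```python
# Exercise 12.35: sums of squares from treatment and block totals
trt = [182, 151, 135, 124, 115]  # treatment totals, 4 observations each
blk = [169, 49, 243, 246]        # block totals, 5 observations each
n = 20
ybar = sum(trt) / n              # grand mean
sst = 4 * sum((t / 4 - ybar)**2 for t in trt)
ssb = 5 * sum((b / 5 - ybar)**2 for b in blk)
```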
b The treatments should be randomly assigned to units within each block, the measurements for
each treatment/block population should be approximately normally distributed and the variances
of the probability distributions should be equal.
c One would need to check for normality and variance equality by taking random samples from each
treatment/block assignment.

12.37 Model: Y = β0 + β1 x1 + β2 x2 + β3 x3 + β4 x4 + β5 x5 + ε where
x1 = 1 if May, 0 otherwise; x2 = 1 if June, 0 otherwise; x3 = 1 if July, 0 otherwise;
x4 = 1 if Station 1, 0 otherwise; x5 = 1 if Station 2, 0 otherwise
Hypotheses: Testing differences among treatment means is equivalent to testing
H0 : β4 = β5 = 0 against Ha : β4 ≠ 0 or β5 ≠ 0

Minitab output follows:


Regression Analysis: grass blades versus x_1, x_2, x_3, x_4, x_5

The regression equation is


grass blades = 49.4 + 6.00 x_1 + 4.67 x_2 + 9.67 x_3 - 27.0 x_4 - 27.2 x_5

Predictor Coef SE Coef T P


Constant 49.417 3.526 14.02 0.000
x_1 6.000 4.071 1.47 0.191
x_2 4.667 4.071 1.15 0.295
x_3 9.667 4.071 2.37 0.055
x_4 -27.000 3.526 -7.66 0.000
x_5 -27.250 3.526 -7.73 0.000

S = 4.98609 R-Sq = 93.4% R-Sq(adj) = 87.9%

Analysis of Variance

Source DF SS MS F P
Regression 5 2105.75 421.15 16.94 0.002
Residual Error 6 149.17 24.86
Total 11 2254.92

Regression Analysis: grass blades versus x_1, x_2, x_3

The regression equation is


grass blades = 31.3 + 6.0 x_1 + 4.7 x_2 + 9.7 x_3

Predictor Coef SE Coef T P


Constant 31.333 9.379 3.34 0.010
x_1 6.00 13.26 0.45 0.663
x_2 4.67 13.26 0.35 0.734
x_3 9.67 13.26 0.73 0.487

S = 16.2455 R-Sq = 6.4% R-Sq(adj) = 0.0%

Analysis of Variance

Source DF SS MS F P
Regression 3 143.6 47.9 0.18 0.906
Residual Error 8 2111.3 263.9
Total 11 2254.9
Test Statistic: F = [(2,111.3333 − 149.1667)/2] / [149.1667/6] = 39.4626
Rejection Region: F > F0.05 (2, 6) = 5.14
Conclusion: Reject H0 at α = 0.05; i.e., there are significant differences among the treatment
means at α = 0.05.

12.39 a 497.7/7 − 531.3/7 ± 2.179 √[10.3716(1/7 + 1/7)] = (−8.551, −1.049)
Since the interval contains only negative values, we may conclude at a 95% confidence level that
the mean pressure of iron exceeds that of nickel.
b t0.1/6 ≈ 2.401 (degrees of freedom = 12)
497.7/7 − 531.3/7 ± 2.401 √[10.3716(1/7 + 1/7)] = −4.8 ± 4.1331 = (−8.9331, −0.6669)
497.7/7 − 491.3/7 ± 4.1331 = (−3.2188, 5.0474)
531.3/7 − 491.3/7 ± 4.1331 = (1.5812, 9.8474)
12.41 t0.1/6 ≈ 2.749 (degrees of freedom = 6)
Treatments 1 and 2: 110/4 − 109/4 ± 2.749·√(24.8611(1/4 + 1/4)) = 0.25 ± 9.6921 = (−9.4421, 9.9421). Since this interval includes zero, the difference between treatment 1 and treatment 2 is not significant at the 90% confidence level.
Treatments 1 and 3: 110/4 − 218/4 ± 9.6921 = (−36.6921, −17.3079). Since this interval does not include zero, the difference between treatment 1 and treatment 3 is significant at the 90% confidence level.
Treatments 2 and 3: 109/4 − 218/4 ± 9.6921 = (−36.9421, −17.5579). Since this interval does not include zero, the difference between treatment 2 and treatment 3 is significant at the 90% confidence level.
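These three pairwise intervals share one Bonferroni half-width, so they can be sketched in a few lines; the totals, MSE, and the t-value 2.749 are taken from the exercise above (variable names are ours):

```python
import math

# Simultaneous 90% Bonferroni intervals for the three pairwise
# treatment comparisons in 12.41 (4 observations per treatment).
t_crit, mse, n = 2.749, 24.8611, 4
totals = {1: 110, 2: 109, 3: 218}

margin = t_crit * math.sqrt(mse * (1 / n + 1 / n))  # about 9.69
intervals = {}
for i, j in [(1, 2), (1, 3), (2, 3)]:
    diff = totals[i] / n - totals[j] / n
    # an interval excluding zero flags a significant difference
    intervals[(i, j)] = (diff - margin, diff + margin)
```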
12.43 a 73.3/5 − 81.5/5 ± 3.355·√(0.0528(1/5 + 1/5)) = (−2.128, −1.152)
b Assume normal distributions with equal variances.

12.6 The Factorial Experiment


12.45 Summary Statistics: Let Ai = temperature i total and Bj = time j total.
Σyi = 206, Σyi² = 5358, n = 8,
A1 = 103, A2 = 103, B1 = 107, B2 = 99,
T11 = 49, T12 = 58, T21 = 54, T22 = 45,
TSS = 5358 − (206)²/8 = 53.5,
SST = (1/2)[(49)² + ··· + (45)²] − (206)²/8 = 48.5,
SS(A) = (1/4)[(103)² + (103)²] − (206)²/8 = 0,
SS(B) = (1/4)[(107)² + (99)²] − (206)²/8 = 8,
SS(A×B) = 48.5 − 0 − 8 = 40.5,
SSE = 53.5 − 48.5 = 5.
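The sums-of-squares bookkeeping above can be reproduced directly from the totals; a minimal check (variable names are ours):

```python
# Sums of squares for the 2x2 factorial in 12.45, from the grand,
# factor-level, and cell totals (two replicates per cell).
sum_y, sum_y2, n = 206, 5358, 8
cell_totals = [49, 58, 54, 45]
a_totals = [103, 103]  # temperature levels
b_totals = [107, 99]   # time levels
cm = sum_y ** 2 / n    # correction for the mean

tss = sum_y2 - cm
sst = sum(t ** 2 for t in cell_totals) / 2 - cm
ss_a = sum(t ** 2 for t in a_totals) / 4 - cm
ss_b = sum(t ** 2 for t in b_totals) / 4 - cm
ss_ab = sst - ss_a - ss_b
sse = tss - sst
print(tss, sst, ss_a, ss_b, ss_ab, sse)  # 53.5 48.5 0.0 8.0 40.5 5.0
```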

a ANOVA Table
Source df SS MS F
Treatments 3 48.5
A 1 0.0 0.0 0.0
B 1 8.0 8.0 6.4
A×B 1 40.5 40.5 32.4
Error 4 5.0 1.25
Total 7 53.5
Comparing to F0.05(1, 4) = 7.71, the interaction between temperature and time is significant at α = 0.05.
b t0.1/12 ≈ 3.966 (degrees of freedom = 4) and 3.966·√(1.25(1/2 + 1/2)) = 4.4341
Cell means (Time × Temperature):
Time Low: ȳ1 = 24.5 (Temperature Low), ȳ3 = 29 (Temperature High)
Time High: ȳ2 = 27 (Temperature Low), ȳ4 = 22.5 (Temperature High)

Means Simultaneous Confidence Intervals


1, 2 −2.5 ± 4.434 = (−6.934, 1.934)
1, 3 −4.5 ± 4.434 = (−8.934, −0.066)
1, 4 2.0 ± 4.434 = (−2.434, 6.434)
2, 3 −2.0 ± 4.434 = (−6.434, 2.434)
2, 4 4.5 ± 4.434 = (−0.066, 8.934)
3, 4 6.5 ± 4.434 = (2.066, 10.934)

12.47 a Summary Statistics: Let Ai = weight i total and Bj = sex j total.
Σyi = 464.4,
Σyi² = 3003.7527 + 1742.0943 + 1529.8112 + 1427.93 = 7703.5882, n = 32,
A1 = 250.4, A2 = 214, B1 = 262.4, B2 = 202,
T11 = 146.4, T12 = 116, T21 = 104, T22 = 98
b SST = (1/8)[(146.4)² + (116)² + (104)² + (98)²] − 6739.605 = 174.015
SS(A) = (1/16)[(250.4)² + (214)²] − 6739.605 = 41.405
SS(B) = (1/16)[(262.4)² + (202)²] − 6739.605 = 114.005
SS(A×B) = 174.015 − 41.405 − 114.005 = 18.605
c and d:
Weight Sex (ni − 1)s2i
L F 324.63
H F 60.0943
L M 177.8112
H M 227.43
789.9655 = SSE
e TSS = SST + SSE = 174.015 + 789.9655 = 963.9805

f ANOVA Table
Source df SS MS F
Treatments 3 174.0150
A 1 41.4050 41.4050 1.4676
B 1 114.0050 114.0050 4.0409
A×B 1 18.6050 18.6050 0.6594
Error 28 789.9655 28.2131
Total 31 963.9805
g Comparing to F0.05 (1, 28) = 4.20, the interaction term is not significant. Thus, the length of
time to complete a task decreases by about the same amount for men and women as their weights
increase.
h 18.30 − 13.00 ± 2.048·√(28.2131(1/8 + 1/8)) = 5.3 ± 5.4391 = (−0.139, 10.739). Since the interval
includes zero, there is not a significant difference in time to complete the task between light men
and women at α = 0.05.
i 14.50 − 12.25 ± 5.4391 = (−3.189, 7.689). Since the interval includes zero, there is not a significant
difference in time to complete the task between heavy men and women at α = 0.05.
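The comparisons in parts h and i share the same half-width, which is easy to verify numerically; t = 2.048 and MSE = 28.2131 are taken from above (names are ours):

```python
import math

# Intervals for 12.47(h, i): women vs. men within the light and heavy
# weight groups (8 observations per sex-weight combination).
t_crit, mse, n = 2.048, 28.2131, 8
half = t_crit * math.sqrt(mse * (1 / n + 1 / n))  # about 5.439

light = (18.30 - 13.00 - half, 18.30 - 13.00 + half)
heavy = (14.50 - 12.25 - half, 14.50 - 12.25 + half)
# both intervals cover zero, so neither difference is significant
```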
12.49 Summary Statistics: Let Ai = crucible i total and Bj = temperature j total.
Σyi = 114.2, Σyi² = 1096.18, n = 12,
A1 = 55.2, A2 = 59, B1 = 34.3, B2 = 38.5, B3 = 41.4,
T11 = 15.9, T12 = 19, T13 = 20.3,
T21 = 18.4, T22 = 19.5, T23 = 21.1
a ANOVA Table
Source df SS MS F
Treatments 5 8.1567
A 1 1.2033 1.2033 5.9180
B 2 6.3717 3.1858 15.6680
A×B 2 0.5817 0.2908 1.4303
Error 6 1.2200 0.2033
Total 11 9.3767
Comparing the interaction F-value 1.4303 to F0.05(2, 6) = 5.14, the interaction term is not significant at α = 0.05. Then, comparing to F0.05(2, 6) = 5.14 and F0.05(1, 6) = 5.99, we find that temperature is significant at α = 0.05.
There does not appear to be a significant difference between the two types of crucible with respect
to average ash content at α = 0.05.

12.51 Summary Statistics: Let Ai = carbon level i total and Bj = manganese level j total.
Σyi = 309.3, Σyi² = 12024.87, n = 8,
A1 = 145.5, A2 = 163.8, B1 = 149.6, B2 = 159.7,
T11 = 72, T12 = 73.5, T21 = 77.6, T22 = 86.2
a ANOVA Table
Source df SS MS F
Treatments 3 60.9137
A 1 41.8612 41.8612 29.6625
B 1 12.7512 12.7512 9.0354
A×B 1 6.3013 6.3013 4.4650
Error 4 5.6450 1.4113
Total 7 66.5588

Comparing to F0.05 (1, 4) = 7.71, we find that the interaction is not significant at α = 0.05.
Thus, we test "main effects" and find that both factors, carbon and manganese, are significant at
α = 0.05.
b From part (a), we see that average breaking strength significantly increases as both percentage
carbon and percentage manganese increase. Therefore, we choose the treatment combination of
0.5% carbon and 1.0% manganese.

12.7 Supplementary Exercises


12.53 a Completely Randomized Design.
b Summary Statistics: Σyi = 117.9, Σyi² = 933.33,
TA = 40.8, TB = 48.9, TC = 28.2,
nA = 5, nB = 6, nC = 4,
TSS = 933.33 − (117.9)²/15 = 6.636,
SST = (40.8)²/5 + (48.9)²/6 + (28.2)²/4 − (117.9)²/15 = 3.579
ANOVA Table
Source df SS MS F
Treatments 2 3.5790 1.7895 7.0245
Error 12 3.0570 0.2547
Total 14 6.6360

Hypotheses: H0 : µA = µB = µC Ha : At least two means differ


Test Statistic: F = 7.02
Rejection Region: F > F0.01 (2, 12) = 6.93
Conclusion: Reject H0 at α = 0.01; i.e., the mean time to completion of the task significantly
differs for the three methods at α = 0.01.
c 48.9/6 ± 2.179·√(0.2547/6) = (7.701, 8.599)
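The ANOVA table and the interval in part c can both be reproduced from the group totals; a short sketch (names are ours):

```python
import math

# One-way ANOVA for 12.53 from group totals, plus the 95% CI for
# method B's mean time (t = 2.179 with 12 error df, as quoted above).
totals = {"A": 40.8, "B": 48.9, "C": 28.2}
ns = {"A": 5, "B": 6, "C": 4}
sum_y2, n = 933.33, 15
cm = sum(totals.values()) ** 2 / n   # correction for the mean

tss = sum_y2 - cm
sst = sum(totals[g] ** 2 / ns[g] for g in totals) - cm
sse = tss - sst
f_stat = (sst / 2) / (sse / 12)      # about 7.02

half = 2.179 * math.sqrt((sse / 12) / ns["B"])
ci = (totals["B"] / ns["B"] - half, totals["B"] / ns["B"] + half)
```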
6 6

12.55 Summary Statistics: Σyi = 230.3, Σyi² = 4431.17, n = 12,
TA = 76.5, TB = 78, TC = 75.8,
B1 = 58.2, B2 = 56.2, B3 = 57.9, B4 = 58,
TSS = 4431.17 − (230.3)²/12 = 11.3292,
SST = (1/4)[(76.5)² + (78)² + (75.8)²] − (230.3)²/12 = 0.6317,
SSB = (1/3)[(58.2)² + (56.2)² + (57.9)² + (58)²] − (230.3)²/12 = 0.8558
ANOVA Table
Source df SS MS F
Treatments 2 0.6317 0.3158 0.1925
Blocks 3 0.8558 0.2853 0.1739
Error 6 9.8417 1.6403
Total 11 11.3292

b Comparing the F-value of 0.1925 to F0.05(2, 6) = 5.14, we find no significant differences in the
mean mileage ratings among the three brands of gasoline at α = 0.05.
c Comparing the F-value of 0.1739 to F0.05 (3, 6) = 4.76, we find no significant difference in the
mean mileage for the four models.
d 78/4 − 75.8/4 ± 3.707·√(1.6403(1/4 + 1/4)) = (−2.807, 3.907)
e t0.1/6 ≈ 2.749 (degrees of freedom = 6) and 2.749·√(1.6403(1/4 + 1/4)) = 2.4896
Brands Simultaneous Confidence Intervals
A and B: 76.5/4 − 78/4 ± 2.4896 = (−2.865, 2.115)
A and C: 76.5/4 − 75.8/4 ± 2.4896 = (−2.315, 2.665)
B and C: 78/4 − 75.8/4 ± 2.4896 = (−1.940, 3.040)
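The randomized-block decomposition for this exercise can be checked from the brand and model totals; a sketch against the ANOVA table above (names are ours):

```python
# Randomized-block sums of squares for 12.55: three gasoline brands
# (4 observations each) blocked on four car models (3 each).
brand_totals = [76.5, 78.0, 75.8]
block_totals = [58.2, 56.2, 57.9, 58.0]
sum_y2, n = 4431.17, 12
cm = sum(brand_totals) ** 2 / n      # correction for the mean

tss = sum_y2 - cm
sst = sum(t ** 2 for t in brand_totals) / 4 - cm
ssb = sum(t ** 2 for t in block_totals) / 3 - cm
sse = tss - sst - ssb
f_brands = (sst / 2) / (sse / 6)     # about 0.19
f_blocks = (ssb / 3) / (sse / 6)     # about 0.17
```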

12.57 a Summary Statistics: Σyi = 128, Σyi² = 948, n = 24,
A1 = 21, A2 = 32, A3 = 26, A4 = 49,
B1 = 47, B2 = 39, B3 = 42,
T11 = 6, T12 = 11, T13 = 4, T21 = 9, T22 = 4, T23 = 19,
T31 = 17, T32 = 1, T33 = 8, T41 = 15, T42 = 23, T43 = 11,
TSS = 948 − (128)²/24 = 265.3333,
SST = (1/2)[(6)² + ··· + (11)²] − (128)²/24 = 247.3333,
SS(A) = (1/6)[(21)² + ··· + (49)²] − (128)²/24 = 74.3333,
SS(B) = (1/8)[(47)² + ··· + (42)²] − (128)²/24 = 4.0833,
SS(A×B) = 247.3333 − 74.3333 − 4.0833 = 168.9167,
SSE = 265.3333 − 247.3333 = 18
ANOVA Table
Source df SS MS F
Treatments 11 247.3333
A 3 74.3333 24.7778 16.5185
B 2 4.0833 2.0417 1.3611
A×B 6 168.9167 28.1528 18.7685
Error 12 18.0000 1.500
Total 23 265.3333

b Comparing the F-value 18.77 to F0.05 (6, 12) = 3.00 indicates that there is a significant interaction
between the factors at α = 0.05.
c t0.1/18 ≈ 2.998 (degrees of freedom = 12), 2.998·√(1.5(1/6 + 1/6)) = 2.1199 and
2.998·√(1.5(1/8 + 1/8)) = 1.8359. Let Ai = Ai/6 and Bj = Bj/8 (the level means).
Simultaneous Confidence Intervals
A1 − A2 ± 2.1199 = (−3.959, 0.292)
A1 − A3 ± 2.1199 = (−2.959, 1.292)
A1 − A4 ± 2.1199 = (−6.792, −2.541)
A2 − A3 ± 2.1199 = (−1.126, 3.126)
A2 − A4 ± 2.1199 = (−4.959, −0.708)
A3 − A4 ± 2.1199 = (−5.959, −1.708)
B1 − B2 ± 1.8359 = (−0.840, 2.840)
B1 − B3 ± 1.8359 = (−1.215, 2.465)
B2 − B3 ± 1.8359 = (−2.215, 1.465)

The intervals that contain zero indicate a nonsignificant difference. Hence, levels 1, 2, and 3 of
factor A are not significantly different, whereas level 4 is significantly different from the others
at α = 0.05. Also, the levels of factor B are not significantly different. To select the treatment
combination with the largest mean, we take level 4 of factor A and any level of factor B.
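The two half-widths used above (one for factor-A level means based on 6 observations each, one for factor-B means based on 8) can be verified in a couple of lines:

```python
import math

# Bonferroni half-widths for the simultaneous intervals in 12.57c:
# t = 2.998 with 12 error df, MSE = 1.5 from the ANOVA table.
t_crit, mse = 2.998, 1.5
half_a = t_crit * math.sqrt(mse * (1 / 6 + 1 / 6))  # factor A means
half_b = t_crit * math.sqrt(mse * (1 / 8 + 1 / 8))  # factor B means
print(round(half_a, 4), round(half_b, 4))  # 2.1199 1.8359
```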

12.59 a Completely Randomized Design.
b Summary Statistics: Σyi = 1311, Σyi² = 108587,
T1 = 506, T2 = 400, T3 = 405,
n1 = 6, n2 = n3 = 5,
TSS = 108587 − (1311)²/16 = 1166.9374,
SST = (506)²/6 + (400)²/5 + (405)²/5 − (1311)²/16 = 57.6042
ANOVA Table
Source df SS MS F
Treatments 2 57.6042 28.8021 0.3375
Error 13 1109.3333 85.3333
Total 15 1166.9374
c Comparing the F-value 0.34 to F0.05 (2, 13) = 3.81, we see that there is not a significant difference
in the mean productivities for the three lengths of workdays at α = 0.05.
d 405/5 ± 1.771·√(85.3333/5) = (73.684, 88.316)
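The single-mean interval in part d follows the usual T/n ± t√(MSE/n) pattern; a quick check (names are ours):

```python
import math

# 90% CI for the third treatment mean in 12.59(d): total 405 over
# n = 5 observations, MSE = 85.3333, t = 1.771 with 13 error df.
t_crit, mse, total, n = 1.771, 85.3333, 405, 5
mean = total / n
half = t_crit * math.sqrt(mse / n)
ci = (mean - half, mean + half)
print(tuple(round(x, 3) for x in ci))  # (73.684, 88.316)
```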
5 5
12.61 Summary Statistics: Let Ai = payment method i total and Bj = scheduling type j total.
Σyi = 1192, Σyi² = 91198, n = 16,
A1 = 520, A2 = 672, B1 = 558, B2 = 634,
T11 = 240, T12 = 280, T21 = 318, T22 = 354,
TSS = 91198 − (1192)²/16 = 2394,
SST = (1/4)[(240)² + ··· + (354)²] − (1192)²/16 = 1806,
SS(A) = (1/8)[(520)² + (672)²] − (1192)²/16 = 1444,
SS(B) = (1/8)[(558)² + (634)²] − (1192)²/16 = 361,
SS(A×B) = 1806 − 1444 − 361 = 1,
SSE = 2394 − 1806 = 588
a ANOVA Table
Source df SS MS F
Treatments 3 1806
A 1 1444 1444 29.47
B 1 361 361 7.37
A×B 1 1 1 0.02
Error 12 588 49
Total 15 2394
b Comparing the F-value 0.02 to F0.05 (1, 12) = 4.75, we find that the interaction is not significant
at α = 0.05.

c t0.05/4 ≈ 2.560 (degrees of freedom = 12) and 2.560·√(49(1/4 + 1/4)) = 12.6714
A1 − A2: 520/4 − 672/4 ± 12.6714 = (−50.671, −25.329)
B1 − B2: 558/4 − 634/4 ± 12.6714 = (−31.671, −6.329)
The hourly and piece rate is significantly higher than the hourly rate and the worker-modified
schedule is significantly higher than the 8−5 schedule. Thus, we recommend the hourly and piece
rate and the worker-modified schedule.

12.63 a Randomized Block Design.
b Summary Statistics: Σyi = 57.4, Σyi² = 222, n = 15,
T1 = 10.3, T2 = 10.2, T3 = 12.1, T4 = 12.2, T5 = 12.6,
B1 = 18.8, B2 = 20.1, B3 = 18.5,
TSS = 222 − (57.4)²/15 = 2.3493,
SST = (1/3)[(10.3)² + ··· + (12.6)²] − (57.4)²/15 = 1.7293,
SSB = (1/5)[(18.8)² + (20.1)² + (18.5)²] − (57.4)²/15 = 0.2893
5 15
ANOVA Table
Source df SS MS F
Treatments 4 1.7293 0.4323 10.4673
Blocks 2 0.2893 0.1447 3.5036
Error 8 0.3307 0.0413
Total 14 2.3493
Comparing the F-value 10.47 with F0.05 (4, 8) = 3.84 indicates that there are significant differences
in the mean soil pH levels among the five concentrations of lime at α = 0.05.
c Comparing the F-value 3.50 with F0.05 (2, 8) = 4.46 indicates that there is not a significant
difference in soil pH levels among the locations at α = 0.05.
12.65 a Summary Statistics: Σyi = 450, Σyi² = 13876, n = 15,
T1 = 138, T2 = 179, T3 = 133,
TSS = 13876 − (450)²/15 = 376,
SST = (138)²/5 + (179)²/5 + (133)²/5 − (450)²/15 = 254.8
ANOVA Table
Source df SS MS F
Treatments 2 254.8 127.4 12.61
Error 12 121.2 10.1
Total 14 376

Comparing the F-value 12.61 with F0.05 (2, 12) = 3.89, we see that there are significant differences
among the three batches.

b To compare the three batches, we use 90% simultaneous confidence intervals with t0.1/6 ≈ 2.403
(degrees of freedom = 12) and 2.403·√(10.1(1/5 + 1/5)) = 4.8300.
Batches Simultaneous Confidence Intervals
1 and 2: 138/5 − 179/5 ± 4.8300 = (−13.03, −3.37)
1 and 3: 138/5 − 133/5 ± 4.8300 = (−3.83, 5.83)
2 and 3: 179/5 − 133/5 ± 4.8300 = (4.37, 14.03)
Since batch 2 is significantly different from batches 1 and 3, we select batch 2 to give the largest
mean brightness.
12.67 ANOVA Table
Source df SS MS F p
Treatments 7 2341400.0
Diameter 1 530450.0 530450.0 19.57 0.0013
Thickness 2 1352744.4 676372.2 24.95 0.0001
Temp. 2 201036.1 100518.1 3.71 0.0624
RH 2 257169.4 128584.7 4.74 0.0356
Error 10 271061.1 27106.1
Total 17 2612461.1

At the 0.05 significance level, the factor Temperature is the only one not significant.
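The F column of the table can be reproduced from the SS and df entries; a quick check (names are ours):

```python
# F ratios for 12.67, recomputed from sums of squares and their df;
# MS error = 271061.1 / 10.
factors = {"Diameter": (530450.0, 1), "Thickness": (1352744.4, 2),
           "Temp.": (201036.1, 2), "RH": (257169.4, 2)}
mse = 271061.1 / 10
f_vals = {name: (ss / df) / mse for name, (ss, df) in factors.items()}
print({k: round(v, 2) for k, v in f_vals.items()})
```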
12.69 For As/Cu:
Hypotheses: H0 : µ1 = µ2 = µ3 = µ4   Ha : At least one µi is different
Test Statistic: F = [Σ ni(ȳi − ȳ)²/(k − 1)] / Sp², where Sp² = Σ (ni − 1)Si²/(n − k)
In this case,
Sp² = [(7 − 1)(0.15)² + (11 − 1)(0.12)² + (31 − 1)(0.067)² + (5 − 1)(0.37)²]/(54 − 4) = 0.0192
and
F = [7(0.46 − 0.5456)² + 11(0.48 − 0.5456)² + 31(0.56 − 0.5456)² + 5(0.72 − 0.5456)²]/[(4 − 1)(0.0192)] = 4.4641.
Rejection Region: F > F0.05 (3, 50) = 2.79
Conclusion: Reject H0 at α = 0.05; i.e., the mean mass ratio of arsenic to copper is significantly
higher at the plume than at the other sites.
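The same summary-statistic F computation repeats for each element ratio below, so it is worth a small helper; a sketch (the function name is ours), checked against the As/Cu numbers, which it matches up to the text's rounding of the grand mean and pooled variance:

```python
# One-way ANOVA F statistic from group sizes, means, and standard
# deviations, as used for the element/Cu mass-ratio comparisons.
def f_from_summary(ns, means, sds):
    k, n = len(ns), sum(ns)
    grand = sum(ni * m for ni, m in zip(ns, means)) / n
    sp2 = sum((ni - 1) * s ** 2 for ni, s in zip(ns, sds)) / (n - k)
    between = sum(ni * (m - grand) ** 2 for ni, m in zip(ns, means))
    return (between / (k - 1)) / sp2

# As/Cu: close to the 4.4641 computed above with rounded intermediates
f_as_cu = f_from_summary([7, 11, 31, 5],
                         [0.46, 0.48, 0.56, 0.72],
                         [0.15, 0.12, 0.067, 0.37])
```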

For Cd/Cu:
Hypotheses: H0 : µ1 = µ2 = µ3 = µ4   Ha : At least one µi is different
Test Statistic:
Sp² = [(10 − 1)(0.017)² + (11 − 1)(0.024)² + (31 − 1)(0.011)² + (5 − 1)(0.022)²]/(57 − 4) = 0.0003
F = [10(0.068 − 0.0757)² + 11(0.087 − 0.0757)² + 31(0.074 − 0.0757)² + 5(0.077 − 0.0757)²]/[(4 − 1)(0.0003)] = 2.3317
Rejection Region: F > F0.05(3, 53) = 2.7791
Conclusion: Do not reject H0 at α = 0.05; i.e., there is no significant difference in the mean mass ratio of cadmium to copper among the four sites.
For Pb/Cu:
Hypotheses: H0 : µ1 = µ2 = µ3 = µ4   Ha : At least one µi is different
Test Statistic:
Sp² = [(13 − 1)(0.16)² + (11 − 1)(0.17)² + (49 − 1)(0.07)² + (4 − 1)(0.23)²]/(77 − 4) = 0.0136
F = [13(1.03 − 0.8786)² + 11(0.94 − 0.8786)² + 49(0.82 − 0.8786)² + 4(0.90 − 0.8786)²]/[(4 − 1)(0.0136)] = 12.4890
Rejection Region: F > F0.05(3, 73) = 2.7300
Conclusion: Reject H0 at α = 0.05; i.e., the mean mass ratio of lead to copper differs significantly between the Tucson Research Ranch site (8/84−10/84) and the Bisbee site.

For Sb/Cu:
Hypotheses: H0 : µ1 = µ2 = µ3 = µ4   Ha : At least one µi is different
Test Statistic:
Sp² = [(3 − 1)(0.019)² + (7 − 1)(0.018)² + (11 − 1)(0.016)² + (5 − 1)(0.034)²]/(26 − 4) = 0.0004
F = [3(0.073 − 0.0821)² + 7(0.078 − 0.0821)² + 11(0.10 − 0.0821)² + 5(0.054 − 0.0821)²]/[(4 − 1)(0.0004)] = 6.5315
Rejection Region: F > F0.05(3, 22) = 3.0491
Conclusion: Reject H0 at α = 0.05; i.e., the mean mass ratio of antimony to copper differs significantly between the plume site and the Tucson Research Ranch site (8/84−9/85).

For Zn/Cu:
Hypotheses: H0 : µ1 = µ2 = µ3 = µ4   Ha : At least one µi is different
Test Statistic:
Sp² = [(13 − 1)(0.31)² + (11 − 1)(1.1)² + (49 − 1)(1.6)² + (5 − 1)(0.67)²]/(78 − 4) = 1.8639
F = [13(1.64 − 7.0682)² + 11(4.2 − 7.0682)² + 49(9.7 − 7.0682)² + 5(1.7 − 7.0682)²]/[(4 − 1)(1.8639)] = 171.1504

Rejection Region: F > F0.05 (3, 74) = 2.7283

Conclusion: Reject H0 at α = 0.05; i.e., there is a significant difference in the mean mass ratio of
zinc to copper for the Tucson Research ranch site (8/84−9/85) compared with the other sites.

12.71 Exercise for student.


12.73 Let
X1 = 1 if in Basin EE, 0 if not in Basin EE;
X2 = 1 if in Basin F, 0 if not in Basin F;
X3 = 1 if in Basin G, 0 if not in Basin G;
X4 = 1 if in Basin M, 0 if not in Basin M.
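The indicator coding can be sketched as a small helper (the function name is ours; any basin other than EE, F, G, and M serves as the baseline, coded all zeros):

```python
# Indicator (dummy) coding for basin membership, as defined above.
def basin_dummies(basin):
    return tuple(int(basin == b) for b in ("EE", "F", "G", "M"))

print(basin_dummies("F"))  # (0, 1, 0, 0)
```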

Minitab output follows:


Regression Analysis: mean versus x_1, x_2, x_3, x_4, depth

The regression equation is


mean = - 0.0067 + 0.054 x_1 + 0.391 x_2 + 0.086 x_3 + 0.103 x_4 + 0.00619 depth

Predictor Coef SE Coef T P


Constant -0.00672 0.09014 -0.07 0.942
x_1 0.0545 0.1027 0.53 0.606
x_2 0.3910 0.1106 3.53 0.005
x_3 0.0862 0.1055 0.82 0.431
x_4 0.1032 0.1103 0.94 0.370
depth 0.006195 0.004644 1.33 0.209

S = 0.133983 R-Sq = 63.6% R-Sq(adj) = 47.1%

Analysis of Variance

Source DF SS MS F P
Regression 5 0.34549 0.06910 3.85 0.029
Residual Error 11 0.19747 0.01795
Total 16 0.54296
12.75 Standardize all the means, and test for outliers. (Exercise for student.)

12.77 a Minitab output follows:


Source DF SS MS F P
Device 2 0.000897 0.000448 1.42 0.312
Error 6 0.001889 0.000315
Total 8 0.002786
Since p = 0.312 is greater than α = 0.05, we cannot reject the null hypothesis that there is no
difference in the mean distances for each semi-active device.
b Minitab output follows:
Source DF SS MS F P
Device 2 0.00248 0.00124 0.31 0.745
Error 6 0.02409 0.00401
Total 8 0.02657
Since p = 0.745 is greater than α = 0.05, we cannot reject the null hypothesis that there is no
difference in the mean interstory drifts for each semi-active device.
c Minitab output follows:
Source DF SS MS F P
Device 2 25.92 12.96 10.39 0.011
Error 6 7.48 1.25
Total 8 33.40
Since p = 0.011 is less than α = 0.05, we reject the null hypothesis that there is no difference
in the mean acceleration for each semi-active device and conclude that at least one of them is
different from the other two.

d Minitab output follows:


Source DF SS MS F P
Device 4 0.16748 0.04187 4.21 0.030
Error 10 0.09937 0.00994
Total 14 0.26684
Since p = 0.030 is less than α = 0.05, we reject the null hypothesis that there is no difference in
the mean distance for each device, and conclude that at least one is different from the other four.
e Minitab output follows:
Source DF SS MS F P
Device 4 0.9136 0.2284 16.34 0.000
Error 10 0.1398 0.0140
Total 14 1.0534
Since p ≈ 0 is less than α = 0.05, we reject the null hypothesis that there is no difference in the
mean interstory drift for each device, and conclude that at least one is different from the other
four.
f Minitab output follows:
Source DF SS MS F P
Device 4 115.76 28.94 9.62 0.002
Error 10 30.09 3.01
Total 14 145.85
Since p = 0.002 is less than α = 0.05, we reject the null hypothesis that there is no difference in
the mean acceleration for each device, and conclude that at least one is different from the other
four.
g Minitab output follows:
Source DF SS MS F P
Device 1 0.5866 0.5866 24.44 0.008
Error 4 0.0960 0.0240
Total 5 0.6826
Comparing the interstory drifts from buildings equipped with brushless DC dampers and those
equipped with none at all, with p = 0.008, we can conclude that it is better to have brushless DC
dampers.
12.79 a Hypotheses: H0 : The different materials all have the same mean Bauschinger modulus reduction.
Ha : at least one material has a different mean Bauschinger modulus reduction.
Minitab output follows:
Source DF SS MS F P
Total strain 3 0.0393587 0.0131196 67.75 0.000
Material 3 0.0009202 0.0003067 1.58 0.260
Error 9 0.0017428 0.0001936
Total 15 0.0420218
Since the p-value for the material factor, p = 0.260, is greater than α = 0.05, we cannot reject
the null hypothesis that the mean modulus reduction is the same for each material.
b In order for this type of analysis to be valid, the test runs must be randomized, so some other
method must be used to analyze the data.
