2 Chapter 1: Data Collection and Exploring Univariate Distributions
1.7 a Pareto charts for lead pollution sources in 1980, 1990 and 2000:
b Lead emissions seem to have decreased since 1980, especially in the areas of transportation and
miscellaneous fuel combustion sources.
c The evidence seems to suggest that we are releasing lead pollutants into our environment at a
decreased rate since 1980.
The largest number were employed by information technology services, followed by engineering
and R&D services, and missile space vehicle manufacturing.
b Bar Chart of number of employees per company amongst Alabama Aerospace fields:
Although information technology services employed the largest number of employees, they were
not, on average, large employers. Engineering R&D services and missile space vehicle manufacturing
employed fewer people than the information technology services, yet they employed far more
people on average per company.
1.11 a Bar chart of crude Steel production by region in 2004 and 2003:
b In general, crude steel production increased from 2003 to 2004.
1.13 The data are grouped together in a histogram, losing identity of individual observations, which are still
retained by dotplot. A small number of observations makes it difficult to notice any patterns. Gaps in
the data are visible from a dotplot but are not identified from a histogram.
1.15 a $\frac{0 + 27 + 12 + 0}{500} = .078 = 7.8\%$
b There were no rods of length .999, and an abnormally large number of rods of length 1.000.
This may indicate that someone has been inappropriately placing .999 rods into the 1.000
category to prevent them from being declared defective.
1.17 a Yes. In 1890, most of the population was in the younger age ranges. In 2005, a larger percentage
of the population is in the upper age ranges. This might suggest that medical advances have
improved life expectancy and quality of life over time.
b Percent of population under 30 in 1890
= 25% + 22% + 18% = 65%.
Percent of population under 30 in 2005
= 13.3% + 14.5% + 13.4% = 41.2%
c The percentage of older population has increased. In 1890, the percentage of population in
different age categories decreased steadily with the increasing age. In 2005, it is fairly evenly
distributed across different age groups except for the two oldest age groups.
1.19 a Histograms and dotplots displaying indices of industrial production in 1990 and 1998:
There is a considerable increase in average index from 1990 to 1998, indicating an increase in
industrial production by most of the countries in general. The indexes were more spread out in 1990
compared to 1998. The distribution is right-skewed in 1998. There might be an outlier on the
upper end in 1990.
b Histogram and dotplot of the difference in production from 1990 to 1998:
Most countries showed improvement (an increase of up to 40 points) in the industrial production;
one country in particular showed a tremendous amount of improvement. Only two countries
showed a decrease.
b $\bar{x} = \frac{(-38.39) + (-20.34) + (-46.42) + \cdots + (-11.54)}{10} = \frac{-6.99}{10} = -.699$
$s = \sqrt{\frac{(-38.39 - (-.699))^2 + \cdots + (-11.54 - (-.699))^2}{9}} = \sqrt{\frac{22376.5}{9}} = 49.86$
c Minitab output follows:
Descriptive Statistics: % Change
Variable N Mean SE Mean StDev Minimum Q1 Median Q3 Maximum
% Change 10 -0.699 15.8 49.9 -46.4 -36.9 -24.5 56.7 92.4
$\text{Median} = \frac{(-20.34) + (-28.59)}{2} = -24.47$
IQR = 56.7 − (−36.9) = 93.6
d Probably not, because the distribution is skewed with outliers on the higher end that affect the
values of mean and standard deviation. Minitab output follows:
Descriptive Statistics: % Change
Variable N Mean SE Mean StDev Minimum Q1 Median Q3 Maximum
% Change 9 -11.0 13.3 39.9 -46.4 -37.4 -28.6 22.4 57.9
1.23 The word ‘average’ here probably refers to the mean, in which case a few winters with very deep snow
pack (right-skewed data) would have made the mean larger than the median (and hence more than
50% of the data would be below the mean).
1.25 a $\text{Composite Mean} = \frac{(4.0)(30) + (4.2)(33)}{63} = 4.10476$
b $\text{Composite Mean} = \frac{(4.2)(30) + (2.7)(29) + (3.0)(29) + (4.2)(30) + (3.0)(30)}{30 + 29 + 29 + 30 + 30} = 3.4277$
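The composite means in 1.25 are size-weighted averages of group means, which can be checked with a few lines of Python. This is only an illustrative sketch; the function name `composite_mean` is an assumption and does not come from the text.

```python
def composite_mean(means, sizes):
    """Weighted mean of group means, weighting each group by its size."""
    weighted_total = sum(m * n for m, n in zip(means, sizes))
    return weighted_total / sum(sizes)

# Part (a): two groups of sizes 30 and 33.
part_a = composite_mean([4.0, 4.2], [30, 33])
# Part (b): five groups with the sizes given in the solution.
part_b = composite_mean([4.2, 2.7, 3.0, 4.2, 3.0], [30, 29, 29, 30, 30])

print(round(part_a, 5), round(part_b, 4))
```

Weighting by group size matters here: a plain average of the group means would ignore that the groups differ in size.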
1.27 a $\bar{x}_{MP} = \frac{2.8 + 3.1 + 4.7 + \cdots + 9.0}{12} = 5.78$
$s_{MP} = \sqrt{\frac{(2.8 - 5.78)^2 + (3.1 - 5.78)^2 + \cdots + (9.0 - 5.78)^2}{11}} = \sqrt{\frac{40.6625}{11}} = 1.923$
$\bar{x}_{FOB} = \frac{14 + 15 + 15 + \cdots + 22}{12} = 15$
$s_{FOB} = \sqrt{\frac{(14 - 15)^2 + (15 - 15)^2 + \cdots + (22 - 15)^2}{11}} = \sqrt{\frac{182}{11}} = 4.068$
b The variation in percent bridges recorded among southeastern states seems to be comparable
for structurally deficient and functionally obsolete bridges; however, there is a higher mean
percentage of functionally obsolete bridges than structurally deficient ones.
Both the arrival and departure time distributions are left-skewed, arrival times more so than
departure times. The median percentage of on-time departures is higher than the median
percentage of on-time arrivals. Both distributions have about the same range. Both
distributions have outliers on the lower end, indicating a low-performing airport (or airports).
b $\frac{1}{32} = 3.125\%$
c $\frac{0}{32} = 0\%$
d For arrival data, we find that x̄ = 81.33 and s = 4.558. For departure data, we find that x̄ = 85.23
and s = 3.417.
k   Arrivals (x̄ − ks, x̄ + ks)   % Data in Interval   Departures (x̄ − ks, x̄ + ks)   % Data in Interval
1   (76.772, 85.888)             78.1%                (81.813, 88.647)               71.9%
2   (72.214, 90.446)             93.75%               (78.396, 92.064)               96.9%
Departure data seems to agree more strongly with the empirical rule, which says that around
68% should lie within 1 standard deviation of the mean and 95% should lie within 2 standard
deviations of the mean.
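The interval endpoints used in part (d) follow directly from the reported means and standard deviations. The sketch below recomputes them; the helper name `k_sigma_interval` is hypothetical, and the percentages of data in each interval are not recomputed because the raw airport data are not reproduced here.

```python
def k_sigma_interval(mean, sd, k):
    """Return the interval (mean - k*sd, mean + k*sd)."""
    return (mean - k * sd, mean + k * sd)

# Summary statistics quoted in part (d).
summaries = {"arrivals": (81.33, 4.558), "departures": (85.23, 3.417)}

intervals = {
    (name, k): k_sigma_interval(mean, sd, k)
    for name, (mean, sd) in summaries.items()
    for k in (1, 2)
}

for (name, k), (low, high) in intervals.items():
    print(f"{name:10s} k={k}: ({low:.3f}, {high:.3f})")
```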
e The range representing values within 5% of the mean is
(81.33 − 0.05(81.33), 81.33 + .05(81.33)) = (77.2635, 85.3965). 24 or 75% of airports have percent
on-time arrivals in this range.
f The range representing values within 5% of the mean is
(85.23 − 0.05(85.23), 85.23 + 0.05(85.23)) = (80.9685, 89.4915). 28 or 87.5% of airports have percent
on-time departures in this range.
g Looking at the boxplots, we see that for arrival times, the three lowest: Chicago O’Hare, Newark
Int and New York LaGuardia qualify as outliers, and for departure times, Chicago O’Hare is an
outlier.
h $z_{ATL\,arrivals} = \frac{80.5 - 81.33}{4.558} = -0.1821$
$z_{ATL\,departures} = \frac{83.5 - 85.23}{3.417} = -0.5063$
$z_{CHI\,arrivals} = \frac{67 - 81.33}{4.558} = -3.1439$
$z_{CHI\,departures} = \frac{73.4 - 85.23}{3.417} = -3.4621$
Atlanta is 0.1821 standard deviations below the mean for percent on-time arrivals and 0.5063
standard deviations below the mean for percent on-time departures. Relative to the other airports,
Atlanta does better with on-time arrivals than with on-time departures. Chicago O’Hare is 3.1439
standard deviations below the mean for percent on-time arrivals and 3.4621 standard deviations
below the mean for percent on-time departures. O’Hare also does relatively better with on-time
arrivals than with on-time departures.
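The z-scores in part (h) can be reproduced from the quoted means and standard deviations. The following is a minimal sketch; the function name `z_score` and the dictionary layout are assumptions made for illustration.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above the mean."""
    return (value - mean) / sd

ARRIVALS = (81.33, 4.558)    # mean and standard deviation, from part (d)
DEPARTURES = (85.23, 3.417)

z = {
    "ATL arrivals": z_score(80.5, *ARRIVALS),
    "ATL departures": z_score(83.5, *DEPARTURES),
    "CHI arrivals": z_score(67, *ARRIVALS),
    "CHI departures": z_score(73.4, *DEPARTURES),
}

for label, value in z.items():
    print(f"{label}: {value:.4f}")
```

A negative z-score already means "below the mean", so its magnitude is what should be quoted as the number of standard deviations below.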
1.33 a Histograms and Boxplots for motor vehicle deaths in 1980 and 2002:
The skewness in the distribution and the abundance of outliers in the 1980 data indicate that the
median and IQR will describe these datasets better than the mean and standard deviation.
b Washington, DC; Idaho; Montana; West Virginia; Wyoming; Arizona; New Mexico; Louisiana;
and Nevada. These states, except for DC, have low population densities, which may mean that
medical teams must travel large distances to provide help to accident victims. In DC, medical
teams should be able to arrive at accidents much more quickly.
c Based on the data, even though more vehicles are probably using the highways in 2002 than
in 1980, the median rate of motor vehicle deaths has decreased, which may indicate that safety
measures have improved in that time.
The distribution of annual temperatures in Central Park is slightly left-skewed. The temperatures
ranged from about 50°F to 57°F with a mean of about 54°F. There are no outliers. The distribution
of temperatures at Newnan is slightly right-skewed. The temperatures ranged from about 58°F
to 66°F with a mean of about 62°F. There are no outliers.
c The shapes of the two distributions indicate that Central Park has seen more years with warmer
temperatures and Newnan more years with cooler temperatures during the last century. On
average, Newnan is warmer than Central Park. The range of temperatures is about the same
at both locations.
1.37 Bar chart of classification of voters by income:
The percentage of eligible voters who voted in the 2000 presidential election increased steadily with
the household income group. From the lowest income group, the lowest percentage of voters voted,
whereas from the highest income group the highest percentage of voters voted in this election.
The median number of people on collapsing bridges was between 26 and 150. The data are skewed to
the right, so most of the bridges had a relatively small crowd when they collapsed. The spread is small:
a vast majority of the collapses occurred with a relatively small number of people on the bridge. The
crowd size on collapsing bridges ranged from fewer than 26 to more than 750.
1.41 a Bar Chart showing energy, max peak demand, and thermal savings over time:
b Every month the energy savings are the highest and the thermal savings are the lowest. The
energy savings show a cycle with highest savings during the summer months and lowest savings
during the winter months. On the other hand, thermal savings are highest during the winter
months and lowest during the summer months, showing exactly opposite cycles. The maximum
peak demand savings are higher in general during summer months and lower in the winter months.
No outliers.
Fuel combustion is the largest contributor of sulfur dioxide emissions. Although its contribution
decreased over the years, it is still a major contributor. The amount contributed by industrial
processes decreased over the years, but its percentage of total emissions increased. The
percent contribution by transportation increased slightly.
18 Chapter 2: Exploring Bivariate Distributions and Estimating Relations
Chapter 2
b The mortality rate is highest among the middle-seat passengers compared to the front or rear-seat
passengers. The mortality rate is lowest among the rear-seat passengers.
A public sewer is used most commonly in all four geographical areas. Other methods of sewage disposal
are the least used in all four geographical areas. The western region has the highest percentage of public
sewage users and the southern region the lowest. The southern region has the highest rate of septic
system users.
The total sulfur dioxide estimates decreased steadily until 1994; then a sudden drop was observed
from 1994 to 1995. After 1995, they increased steadily till 1998 and then decreased.
The sulfur dioxide emission estimates from fuel combustion, particularly from electrical utilities,
decreased steadily until 1994 and dropped considerably in 1995. They increased for the next three
years and then decreased. All other categories are minor contributors of sulfur dioxide emissions,
and their amounts decreased slightly over the years, except for transportation, which showed an
increase from 1995 to 1996 and increased slightly thereafter.
2.7 a Time series plot of energy cost for each sector:
b The average fuel price showed an increasing trend in residential and commercial sectors, with sud-
den increases in the early 1980s. Both sectors showed very similar trends with similar prices. The
transportation sector has higher average fuel prices. Although the industrial and transportation
sectors showed similar trends in average fuel prices over the years, the difference in the average
fuel prices increased slightly. In both of these sectors, the prices increased till the mid-80s and
then experienced a drop followed by a decade of stable prices. Then the prices showed a steady
increasing trend.
b Total U.S. energy consumption rose from 1990 to 1996, then fell until 1998, then rose again to
1999.
c Time series plot of hydroelectric power consumption as a percentage of total consumption:
The percentage of hydroelectric power fell from 1990 to 1994, then rose again until 1997, after
which it declined again.
2.11 a Time series plot for urban and rural federal-aid highways:
b From 1980 to 1998, the highways constructed in urban areas (in thousand miles) have increased
steadily and those in the rural areas have decreased steadily. However, rural area highways still
represent a major portion of federally funded highways, consisting of about four times that of
urban miles.
2.13 Time series plots for age-adjusted deaths caused by various diseases:
The age-adjusted death rate per 100,000 population due to heart disease decreased steadily until the
year 2001, reducing it to about half of what it was in 1960. The death rate due to cancer, on the other
hand, showed a slight increase until 1995 and then a steady decrease. The death rate due to diabetes
showed a slight decrease with some fluctuations. The death rates due to influenza and liver disease
showed a slight decrease over the years.
There is an extremely strong positive correlation between median weekly earnings and years of schooling.
As the years of schooling increase, so do the median earnings. Although there is a slight
curvature, a linear model seems to fit very well.
b Scatterplot of unemployment vs years of schooling:
There is a very strong negative correlation between the years of schooling and the unemployment rate.
The higher the number of years of schooling, the lower the unemployment rate. A linear model will be a
good fit, but a nonlinear model (such as a quadratic) will be even better.
b There is a moderate positive correlation between the per capita expenditures and per capita tax
revenue. Generally, higher per capita tax revenue resulted in an increased per capita expenditure.
There is one outlier that distorts the data.
c Scatterplot of the difference in expenditures and taxes vs expenditures:
There does not seem to be any relation between expenditure and the differences. There is one
outlier state that gives a wrong impression of positive relation.
d Scatterplot of per capita expenditures vs per capita tax revenue with Alaska removed:
The scatterplot shows a positive relation between expenditure and tax revenue.
2.19 a Scatterplot of rankings by executives versus rankings by professors:
b There is a strong positive relation between rankings by executives and by professors. In general, skills
ranked higher by professors are also ranked higher by executives.
d Strength of relation is easily determined from the value of the correlation coefficient but not
necessarily from the scatterplot.
2.23 a $r_{REV/EXP} = \frac{-0.993}{49} = -0.020$
$r_{REV/TAX} = \frac{-4.437}{49} = -0.091$
$r_{TAX/EXP} = \frac{32.479}{49} = 0.663$
The correlation coefficient of 0.789 and the scatterplot from exercise 2.19 indicate a strong positive
relationship between professors’ responses and executives’ responses.
2.27 $r_{mesh/wout} = \frac{-7.87}{8} = -0.984$
$r_{mesh/with} = \frac{-7.94}{8} = -0.993$
$r_{mesh/diff} = \frac{-7.72}{8} = -0.965$
a r = −0.984. There is a very strong negative correlation between mesh size and iterations without
reconjugation. (see tables)
b r = −0.993. There is a very strong negative correlation between mesh size and iterations with
reconjugation. (see tables)
c r = −0.965. There is a very strong negative correlation between mesh size and the difference in
iterations without and with reconjugation.
b A 1 dollar increase in per capita tax revenue tended to result in an increased per capita expenditure
of about 35 cents.
There is a very strong negative linear relationship between body fat and body density.
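b $\bar{x}_{density} = 1.0449$, $\bar{y}_{\%fat} = 23.88$
$\sum (x_i - \bar{x})(y_i - \bar{y}) = -2.619$, $\sum (x_i - \bar{x})^2 = 0.00568$
$\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{-2.619}{0.00568} = -461$
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 23.88 - (-461)(1.0449) = 505$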
From the scatterplot we can see that the data are separated into 3 distinct clusters. It is difficult
to determine if the relationship is a linear one because of the lack of data in between the clusters.
b The weigh-in-motion readings seem to have a much stronger linear relationship with the static
weights of vehicles after calibration than before.
c $r_{y_1 \text{ vs } x} = \frac{1}{n - 1} \sum z_x z_y = \frac{8.685}{9} = 0.965$
$r_{y_2 \text{ vs } x} = \frac{1}{n - 1} \sum z_x z_y = \frac{8.964}{9} = 0.996$
The correlation of weigh-in-motion readings and static weight is much stronger after calibra-
tion than before calibration. This means that the weigh-in-motion readings after calibration are
better predictors of the static weight.
d Yes. If the r-value is 1, then there is an exact linear relationship between y2 and x; i.e. y2 = a+bx.
However, unless a = 0 and b = 1, the readings could still disagree. (e.g. if the weigh-in-motion
always gave exactly half of the static weight.)
b Most of the residuals seem to have a negative linear correlation. Also, there is one distant outlier.
c Because the residuals are not randomly distributed around 0, a linear relationship may not be
appropriate.
b The residual plot indicates the possibility of two groups. The first part indicates a decreasing
trend among the residuals. The latter part shows no pattern.
c A linear model may not be appropriate. The possibility of groups must be investigated.
2.49 a Residual plot of percent body fat vs body density:
There is a fairly strong positive linear relation between crude oil and gasoline prices.
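b $\bar{x}_{crude} = 19.53$, $\bar{y}_{gas} = 86.87$
$\sum (x_i - \bar{x})(y_i - \bar{y}) = 1893.59$, $\sum (x_i - \bar{x})^2 = 642.22$
$\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{1893.59}{642.22} = 2.95$
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 86.87 - (2.95)(19.53) = 29.3$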
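The least-squares slope and intercept for the crude oil and gasoline data depend only on the two means and the two sums quoted above, so the arithmetic can be checked with a short sketch. The function name `least_squares_from_sums` is an assumption made for illustration; only the four summary numbers come from the solution.

```python
def least_squares_from_sums(sxy, sxx, x_bar, y_bar):
    """Slope and intercept of the least-squares line from summary sums.

    sxy: sum of (x_i - x_bar)(y_i - y_bar); sxx: sum of (x_i - x_bar)^2.
    """
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

slope, intercept = least_squares_from_sums(1893.59, 642.22, 19.53, 86.87)
print(f"slope = {slope:.2f}, intercept = {intercept:.1f}")
```

The intercept here is computed from the unrounded slope, so in other exercises it may differ slightly from a value obtained by rounding the slope first.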
2.8 Transformations
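2.53 a $\bar{x}_{x/a} = 0.337$, $\bar{y}_{Stress} = 289.4$
$\sum (x_i - \bar{x})(y_i - \bar{y}) = -229.02$, $\sum (x_i - \bar{x})^2 = 0.93$
$\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{-229.02}{0.93} = -247$
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 289.4 - (-247)(0.337) = 343$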
83.2 percent of the variation in hoop stress can be explained as a linear relationship with x/a
c Scatterplot of hoop stress vs x/a:
The scatterplot clearly indicates a curvature in the data. A nonlinear model is more appropriate.
d Of the transformations discussed in this chapter, power transformation seems to work best. Com-
paring the ln of the hoop stress measurements with the ln of the x/a measurements, we get:
So, when x/a is 0.30, the hoop stress should be around 244.45, according to our model.
2.55 a Scatterplots of relative water depth vs tidal velocity at stations 16 and 18:
CO and SO2 show steady decline. Pb declined sharply during the first decade and then declined
slowly over the next decade. NO2 remained fairly constant during the first decade and declined
over the next. Overall, ozone level declined, with a few ups and downs.
2.59 Yes. The number of semiactive dampers used increased with the height as well as the number of stories:
2.63 a Time series plot of number of households in the US from 1970 to 2002:
The number of households in the US has increased steadily from 1970 to 2002.
b Bar chart of percentage of household sizes in 1900 and 2002:
The distribution of household sizes in 2000 is right skewed with a median household size of 2
persons. Household sizes in 1900 are less skewed with a median household size of 4 persons, and
a large number of households with 7 or more persons. In 1900 about 20% of households had 7 or
more persons, compared to 1.4% in 2000. In 1900, only about 5% of households had 1 person,
compared to over 25% in 2000. Overall, household size has decreased considerably.
c The number of households has increased, but the number of persons per household has decreased.
2.65 a Scatterplot of zeta potential against the solution pH for each NaNO3 concentration:
b The pH and the zeta potential display strong negative linear correlations for both concentrations.
Obtaining Data
3.3 a Experiment - The data was gathered in a controlled environment to compare before and after the
classes.
b Different engineers and inspectors may have different levels of knowledge. The pretest was given
to determine the existing level of knowledge, in order to measure the level of improvement as a
result of training.
3.5 Experiment - The data was gathered in a controlled manner to compare the four thread types.
3.11 e.g. the computer system may help alleviate bias from people who may be embarrassed or otherwise
unwilling to report having watched certain shows.
42 Chapter 3: Obtaining Data
3.17 a e.g. Divide the 30 small businesses into two groups randomly. Tell one group that they will be
charged peak-hour rates and tell the other group that they will be charged the regular rates.
Measure their electricity usage and compare.
b No. The businesses would need to be informed of the charges the utility was imposing.
Probability
4.3
                 Printer
                 No    Yes   Total
Modem   No       13     7     20
        Yes       2     3      5
        Total    15    10     25
44 Chapter 4: Probability
Shaded region is AB ∪ AC
$\frac{8!}{2!\,5!\,1!} = \frac{8 \cdot 7 \cdot 6 \cdot 5!}{2 \cdot 5!} = \frac{8 \cdot 7 \cdot 6}{2} = 168.$
b There are eight possible choices for the taxi at airport C. Therefore, if the choice is made at
random, P(Jones ends up at airport C) = 1/8.
4.23 a $P(\text{employee ranked number one is selected})$
$= \frac{(\text{number of ways employee ranked number one can be chosen}) \times (\text{number of ways two other employees can be chosen from the remaining four})}{\text{total number of ways a sample of three can be chosen}}$
$= \frac{\binom{1}{1}\binom{4}{2}}{\binom{5}{3}} = \frac{3}{5}$
b $P(\text{highest ranked employee among those selected has rank} \le 2) = 1 - P(\text{employee ranked number one is selected})$
$= 1 - \frac{3}{5} = \frac{2}{5}$
c $P(\text{employees ranked 4 and 5 are selected})$
$= \frac{(\text{number of ways employees ranked 4 and 5 can be chosen}) \times (\text{number of ways one other employee can be chosen from the remaining three})}{\text{total number of ways a sample of three can be chosen}}$
$= \frac{\binom{2}{2}\binom{3}{1}}{\binom{5}{3}} = \frac{3}{10}$
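The counting arguments in 4.23 can be checked with binomial coefficients from the standard library. This is a small sketch under the same equally-likely-samples assumption used in the solution; the variable names are illustrative only.

```python
from fractions import Fraction
from math import comb

# Total number of ways to choose a sample of three employees from five.
total_samples = comb(5, 3)

# (a) Employee ranked number one is in the sample: fix that employee,
# then choose the other two from the remaining four.
p_rank_one_selected = Fraction(comb(1, 1) * comb(4, 2), total_samples)

# (b) Complement of (a): employee ranked number one is not selected.
p_rank_one_not_selected = 1 - p_rank_one_selected

# (c) Employees ranked 4 and 5 are both in the sample: fix both,
# then choose the third from the remaining three.
p_ranks_4_and_5 = Fraction(comb(2, 2) * comb(3, 1), total_samples)

print(p_rank_one_selected, p_rank_one_not_selected, p_ranks_4_and_5)
```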
4.25 a $P(\text{all orders go to different distributors}) = \frac{\text{number of ways three orders may go to different distributors}}{\text{total number of outcomes}}$
$= \frac{P^5_3}{5^3} = \frac{5 \cdot 4 \cdot 3}{125} = \frac{60}{125} = 0.48$
b $P(\text{all orders go to same distributor}) = \frac{\text{number of ways to choose the distributor who will receive all three orders}}{\text{total number of outcomes}}$
$= \frac{\binom{5}{1}}{5^3} = \frac{5}{125} = \frac{1}{25} = 0.04$
c $P(\text{exactly two of the three orders go to one particular distributor})$
$= \frac{(\text{number of ways to choose the distributor who will receive the two orders}) \times (\text{number of ways the other order may go to any of the other four distributors}) \times (\text{number of ways of choosing the two orders going to the particular distributor from the three orders})}{\text{total number of outcomes}}$
$= \frac{\binom{5}{1}\binom{4}{1}\binom{3}{2}}{5^3} = \frac{5 \cdot 4 \cdot 3}{125} = \frac{60}{125} = 0.48$
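Because there are only 5³ = 125 equally likely assignments of three orders to five distributors, the three probabilities in 4.25 can also be confirmed by brute-force enumeration. The sketch below is illustrative; labeling distributors 0 to 4 and the helper name `largest_share` are assumptions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Each outcome assigns one of five distributors (labeled 0..4) to each of three orders.
outcomes = list(product(range(5), repeat=3))

def largest_share(assignment):
    """Largest number of orders received by any single distributor."""
    return max(Counter(assignment).values())

counts = Counter(largest_share(a) for a in outcomes)

p_all_different = Fraction(counts[1], len(outcomes))  # every distributor gets at most one order
p_exactly_two = Fraction(counts[2], len(outcomes))    # some distributor gets exactly two orders
p_all_same = Fraction(counts[3], len(outcomes))       # one distributor gets all three orders

print(float(p_all_different), float(p_exactly_two), float(p_all_same))
```

The three cases are exhaustive and mutually exclusive, so their probabilities sum to 1, which is a useful sanity check on the counting.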
4.27 a The number of ways of partitioning nine wrenches into three groups, each containing 3 wrenches,
is
$\frac{9!}{3!\,3!\,3!} = 1{,}680$
b $\text{Required probability} = \frac{\text{number of ways of partitioning the seven new wrenches into groups of 1, 3, and 3 wrenches}}{\text{number of ways of partitioning nine wrenches into three equal groups}}$
$= \frac{\frac{7!}{1!\,3!\,3!}}{\frac{9!}{3!\,3!\,3!}} = \frac{140}{1{,}680} = 0.0833$
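The partition counts in 4.27 are multinomial coefficients. A minimal sketch of that calculation follows; the function name `multinomial` is an assumption, and it uses only `math.factorial` from the standard library.

```python
from math import factorial

def multinomial(*group_sizes):
    """Number of ways to split sum(group_sizes) distinct items into labeled groups of the given sizes."""
    result = factorial(sum(group_sizes))
    for size in group_sizes:
        result //= factorial(size)
    return result

ways_nine_into_threes = multinomial(3, 3, 3)  # denominator in part (b)
ways_seven_new = multinomial(1, 3, 3)         # numerator in part (b)
probability = ways_seven_new / ways_nine_into_threes

print(ways_nine_into_threes, ways_seven_new, round(probability, 4))
```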
4.35 Let L = the event that a selected keyboard has a faulty letter key.
Let S = the event that a selected keyboard is produced in the South Carolina facility.
$P(L) = \frac{15 + 75}{15 + 75 + 45 + 30 + 40 + 45} = \frac{90}{250} = 0.36$
$P(L \mid S) = \frac{P(LS)}{P(S)} = \frac{75/250}{(75 + 30 + 45)/250} = \frac{75}{150} = 0.50$
Since $P(L) \ne P(L \mid S)$, we can conclude that the events of selecting a keyboard with a faulty letter key
and selecting a keyboard produced in South Carolina are not independent.
b Let $E_i$ = event that the $i$th resistor has resistance in excess of 10.5 ohms,
$i = 1, 2$. Then P(at least one has an actual value greater than 10.5)
$= P(E_1 E_2 \cup E_1 \bar{E}_2 \cup \bar{E}_1 E_2) = P(E_1)P(E_2) + P(E_1)P(\bar{E}_2) + P(\bar{E}_1)P(E_2)$
$= (0.1)(0.1) + 2(0.1)(0.9) = 0.19$, or
P(at least one has an actual value greater than 10.5)
$= 1 - P(\bar{E}_1 \bar{E}_2) = 1 - (0.9)^2 = 0.19$.
4.45 Let Ci = event that relay i closes properly, i = 1, 2. Then
P (current flows in series system)
= P (both relays are closed)
= P (C1 C2 ) = P (C1 )P (C2 ) = (0.9)(0.9) = 0.81
P (current flows in parallel system)
= P (at least one of the relays is closed)
= P (C1 ∪ C2 ) = P (C1 ) + P (C2 ) − P (C1 C2 )
= 0.9 + 0.9 − (0.9)(0.9) = 0.99
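The series and parallel calculations in 4.45 generalize to any closing probability, which the short sketch below illustrates. It assumes, as the solution does, that the two relays close independently; the function names are hypothetical.

```python
def series_reliability(p1, p2):
    """Current flows only if both independent relays close."""
    return p1 * p2

def parallel_reliability(p1, p2):
    """Current flows if at least one of two independent relays closes."""
    return p1 + p2 - p1 * p2

p_close = 0.9
series = series_reliability(p_close, p_close)
parallel = parallel_reliability(p_close, p_close)

print(round(series, 2), round(parallel, 2))
```

The parallel result can equivalently be written as 1 − (1 − p)², the complement of both relays failing.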
4.47 Let F denote the event that a worker fails to learn the skill correctly.
$P(A \mid F) = \frac{P(A)P(F \mid A)}{P(F \mid A)P(A) + P(F \mid B)P(B)} = \frac{(0.70)(0.20)}{(0.20)(0.70) + (0.10)(0.30)}$
$= 0.8235$
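The Bayes' rule computation in 4.47 can be written as a small function for the two-group case. This is a sketch only; the function and parameter names are assumptions, and the numbers are those used in the solution.

```python
def posterior_two_groups(prior_a, fail_given_a, prior_b, fail_given_b):
    """P(A | F) by Bayes' rule when A and B partition the sample space."""
    joint_a = prior_a * fail_given_a
    joint_b = prior_b * fail_given_b
    return joint_a / (joint_a + joint_b)

p_a_given_f = posterior_two_groups(0.70, 0.20, 0.30, 0.10)
print(round(p_a_given_f, 4))
```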
4.49 a The column percentages are based on the column totals and hence do not predict the percentages
of the work force as a whole.
b 2.385 million.
c Combined category
Type of Employer Percentage
Industry 59.25
Educational institution 21.57
Nonprofit organization 3.72
Federal government 8.84
Military 0.62
Other government 4.46
Other and unknown 1.53
This distribution is not as skewed as the one for engineers. It is similar to the overall distribution
of employment by type of employer.
d 0.514 million.
e The distributions for physical scientists and for social scientists are similar, whereas the one for
engineers is skewed.
≤ 159        12/788
160 − 209    49/3,098
210 − 259    69/2,879
≥ 260        37/1,152
c It appears that the aspirin helps to prevent M.I.’s as long as the subject has a low cholesterol
level. Otherwise, as the cholesterol level increases, aspirin appears to lose its effectiveness.
4.53 a $P(P \mid M) = \frac{24}{24 + 16} = \frac{3}{5}$
b $P(M \mid P) = \frac{24}{24 + 36} = \frac{2}{5}$
c $P(PM) = \frac{24}{100} = \frac{24 + 36}{100} \cdot \frac{24 + 16}{100} = P(P)P(M)$. Therefore, events P and M are independent.
d $P(PF) = \frac{36}{100} = \frac{60}{100} \cdot \frac{36 + 24}{100} = P(P)P(F)$. Therefore, events P and F are independent.
4.55 Denote a sample consisting of the $i$th defective and the $j$th nondefective as
$D_i N_j$ ($i = 1, 2, 3$, $j = 1, 2, 3, 4$), two nondefectives as
$N_j N_{j'}$ ($j, j' = 1, 2, 3, 4$, $j \ne j'$), and two defectives as
$D_i D_{i'}$ ($i, i' = 1, 2, 3$, $i \ne i'$).
a The outcomes are
N1N2   N2N3   N3N4   N4D1   D1D2
N1N3   N2N4   N3D1   N4D2   D1D3
N1N4   N2D1   N3D2   N4D3   D2D3
N1D1   N2D2   N3D3
N1D2   N2D3
N1D3
b A = {N1N2, N1N3, N1N4, N2N3, N2N4, N3N4}
c Assigning equal probabilities to the 21 outcomes, we have
$P(A) = 6 \cdot \frac{1}{21} = \frac{2}{7}$
4.57 a Using the multiplication rule, there are n1 n2 = 6 · 6 = 36 outcomes in S.
b Denote the event of rolling i on the first die and j on the second die as (i, j), i = 1, ..., 6, j = 1, ..., 6,
and denote the event of rolling a seven as A. Then A = {(1,6), (6,1), (5,2), (2,5), (3,4), (4,3)}.
Assigning equal probabilities to each of the elements in S, we then have
$P(A) = 6 \cdot \frac{1}{36} = \frac{1}{6}.$
4.59 Denote the event that the person is traveling on business as B, on a major airline as M, on a
private airline as R, and on a commercially owned plane not belonging to an airline as C. Note
that P(M) + P(R) + P(C) = 1
a P(B) = P(BM) + P(BR) + P(BC)
= P(M)P(B|M) + P(R)P(B|R) + P(C)P(B|C)
= (0.6)(0.5) + (0.3)(0.6) + (0.1)(0.9) = 0.57
b P(BR) = P(R)P(B|R) = (0.3)(0.6) = 0.18
c P(B|C) = 0.9
d From parts (a) and (b) we have
$P(R \mid B) = \frac{P(BR)}{P(B)} = \frac{0.18}{0.57} = 0.3158.$
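Parts (a), (b), and (d) of 4.59 combine the law of total probability with Bayes' rule. The sketch below reproduces those numbers; the dictionary keys M, R, and C follow the event names in the solution, while the variable names themselves are assumptions.

```python
# P(type of plane) and P(business | type of plane), as given in the solution.
p_plane = {"M": 0.6, "R": 0.3, "C": 0.1}
p_business_given_plane = {"M": 0.5, "R": 0.6, "C": 0.9}

# (a) Law of total probability over the three plane types.
p_business = sum(p_plane[k] * p_business_given_plane[k] for k in p_plane)

# (b) Joint probability of business travel on a private plane.
p_business_and_private = p_plane["R"] * p_business_given_plane["R"]

# (d) Bayes' rule: P(R | B) = P(BR) / P(B).
p_private_given_business = p_business_and_private / p_business

print(round(p_business, 2), round(p_business_and_private, 2), round(p_private_given_business, 4))
```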
4.61 Using the multiplication rule,
9 · 10 · 10 · 10 · 10 · 10 · 10 = 9,000,000
4.63 Using the multiplication rule, 3 · 3 · 2 = 18 experimental runs are needed.
4.65 a $P(\text{draw an ace and face card}) = \frac{(\text{number of ways of drawing an ace}) \times (\text{number of ways of drawing a face card})}{\text{total number of ways of drawing two cards}}$
$= \frac{4 \cdot 12}{\binom{52}{2}} = 0.0362$
b $P(\text{draw 5 spades}) = \frac{\text{number of ways of drawing 5 spades}}{\text{total number of ways of drawing 5 cards}} = \frac{\binom{13}{5}}{\binom{52}{5}} = 0.000495$
c $P(\text{draw 5 cards of the same suit}) = \frac{(\text{number of suits}) \times (\text{number of ways of drawing 5 cards from a given suit})}{\text{total number of ways of drawing 5 cards}}$
$= \frac{4\binom{13}{5}}{\binom{52}{5}} = 0.001981$
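The card probabilities in 4.65 are ratios of binomial coefficients and are easy to verify numerically. The following sketch uses `math.comb`; the variable names are illustrative. For part (c), the denominator is the number of five-card hands, since five cards are drawn.

```python
from math import comb

# (a) One ace (4 choices) and one face card (12 choices) in a two-card draw.
p_ace_and_face = 4 * 12 / comb(52, 2)

# (b) All five cards are spades.
p_five_spades = comb(13, 5) / comb(52, 5)

# (c) All five cards share a suit: four suits, each with comb(13, 5) hands.
p_five_same_suit = 4 * comb(13, 5) / comb(52, 5)

print(round(p_ace_and_face, 4), round(p_five_spades, 6), round(p_five_same_suit, 6))
```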
4.67 $P(\text{match}) = P(\text{two tails}) + P(\text{two heads}) = \frac{1}{2} \cdot \frac{1}{2} + \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{2}$
4.71 Let $N_i$, $S_i$, $E_i$, $W_i$ denote, respectively, the events that the patrolman goes north, south, east, and
west at the $i$th intersection, $i = 1, 2, \ldots$. Then $P(N_i) = P(S_i) = P(E_i) = P(W_i) = \frac{1}{4}$.
a P(patrolman reaches boundary in eight blocks)
$= P[(N_1 N_2 N_3 N_4 N_5 N_6 N_7 N_8) \cup (S_1 S_2 S_3 S_4 S_5 S_6 S_7 S_8) \cup (E_1 E_2 E_3 E_4 E_5 E_6 E_7 E_8) \cup (W_1 W_2 W_3 W_4 W_5 W_6 W_7 W_8)]$
$= P(N_1 N_2 N_3 N_4 N_5 N_6 N_7 N_8) + P(S_1 S_2 S_3 S_4 S_5 S_6 S_7 S_8) + P(E_1 E_2 E_3 E_4 E_5 E_6 E_7 E_8) + P(W_1 W_2 W_3 W_4 W_5 W_6 W_7 W_8)$
$= 4\left(\frac{1}{4}\right)^8 = \left(\frac{1}{4}\right)^7$
b If the patrolman initially goes north, there are nine unique routes by which he can return to the
starting point after walking exactly four blocks; i.e., he can take any of the following routes:
N1N2S3S4   N1E2S3W4   N1W2S3E4
N1S2N3S4   N1S2S3N4   N1S2E3W4
N1S2W3E4   N1E2W3S4   N1W2E3S4
Similarly, for each of the other three directions he can take initially, there are nine unique routes
that will return him to the starting point. Therefore, P(returning to the starting point in four
blocks)
$= (\text{number of distinct routes}) \times P(\text{given distinct route}) = 4 \cdot 9 \cdot \left(\frac{1}{4}\right)^4 = 9\left(\frac{1}{4}\right)^3.$
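The route count in part (b) can be confirmed by enumerating all 4⁴ = 256 equally likely four-block routes and keeping those that end where they started. This is an illustrative sketch; the coordinate convention and the helper name `returns_to_start` are assumptions.

```python
from fractions import Fraction
from itertools import product

# Displacement (east-west, north-south) of one block in each direction.
STEPS = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}

def returns_to_start(route):
    """True if the net displacement of the route is zero."""
    dx = sum(STEPS[d][0] for d in route)
    dy = sum(STEPS[d][1] for d in route)
    return dx == 0 and dy == 0

routes = ["".join(r) for r in product("NSEW", repeat=4)]
closed_routes = [r for r in routes if returns_to_start(r)]
north_first = [r for r in closed_routes if r[0] == "N"]

p_return = Fraction(len(closed_routes), len(routes))
print(len(north_first), len(closed_routes), p_return)
```

The nine routes beginning with N found this way are the same nine listed in the solution.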
4.73 Eight minutes are available for typing blood, allowing time for screening up to four donors. Denote
the event of the ith donor having Rh-positive blood as Ai , i = 1, 2, 3, 4. Then the probability that the
victim will be saved
$= P(A_1 \cup \bar{A}_1 A_2 \cup \bar{A}_1 \bar{A}_2 A_3 \cup \bar{A}_1 \bar{A}_2 \bar{A}_3 A_4)$
$= P(A_1) + P(\bar{A}_1 A_2) + P(\bar{A}_1 \bar{A}_2 A_3) + P(\bar{A}_1 \bar{A}_2 \bar{A}_3 A_4)$
$= (0.4) + (0.6)(0.4) + (0.6)^2(0.4) + (0.6)^3(0.4) = 0.8704.$
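The sum in 4.73 is the probability that the first Rh-positive donor appears among the first four screened, assuming donors are independent. A brief sketch, with illustrative variable names, computes it both term by term and through the complement.

```python
p_positive = 0.4   # probability a donor is Rh-positive, as in the solution
max_donors = 4     # donors that can be screened in the available time

# Term k: the first k donors are not Rh-positive and donor k + 1 is.
p_saved = sum((1 - p_positive) ** k * p_positive for k in range(max_donors))

# Equivalent complement form: not all four donors fail to be Rh-positive.
p_saved_by_complement = 1 - (1 - p_positive) ** max_donors

print(round(p_saved, 4), round(p_saved_by_complement, 4))
```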
4.75 $P[(A \cup B) \mid C] = \frac{P[(A \cup B) \cap C]}{P(C)} = \frac{P(AC \cup BC)}{P(C)}$ (by the distributive law)
$= \frac{P(AC) + P(BC) - P(ABC)}{P(C)} = \frac{P(AC)}{P(C)} + \frac{P(BC)}{P(C)} - \frac{P(ABC)}{P(C)}$
$= P(A \mid C) + P(B \mid C) - P(AB \mid C)$
For design B, the current flows if relays 1 and 2 are closed or if relays 3 and 4 are closed. Thus,
for design B,
Therefore, design A has a higher probability of current flowing when the switch is thrown.
4.83 a P (A ∪ B) ≤ 1
i.e., P (A) + P (B) − P (AB) ≤ 1
i.e., P (AB) ≥ P (A) + P (B) − 1.
$= P(A\bar{B} \cup B\bar{A})$
$= P(A\bar{B}) + P(B\bar{A})$
$= [P(A) - P(AB)] + [P(B) - P(AB)]$
$= P(A) + P(B) - 2P(AB)$
4.85 $P(A)P(B \mid A)P(C \mid AB) = P(A) \cdot \frac{P(AB)}{P(A)} \cdot \frac{P(ABC)}{P(AB)} = P(ABC)$
4.87 Let N = the event that a selected individual knows nothing at all about engineering.
Let V = the event that a selected individual is very unlikely to become an engineer.
Let A = the event that a selected individual knows a lot about engineering.
Let L = the event that a selected individual is very likely to become an engineer.
a $P(V) = \frac{179 + 132 + 53 + 8 + 4}{566} = \frac{376}{566} = 0.664$
b $P(N \mid V) = \frac{179/566}{(179 + 132 + 53 + 8 + 4)/566} = \frac{179}{376} = 0.476$
c $P(L \mid A) = \frac{10/566}{24/566} = 0.417$
$P(L \mid N) = \frac{3/566}{190/566} = 0.016$
So, there is a greater chance that one is very likely to become an engineer if one knows a lot
about engineering, than given that one knows nothing at all about engineering.
56 Chapter 5: Discrete Probability Distributions
Chapter 5
5.3 $P(X = 0) = \binom{3}{0}(0.363)^0(0.637)^3 = 0.2585$
$P(X = 1) = \binom{3}{1}(0.363)^1(0.637)^2 = 0.4419$
$P(X = 2) = \binom{3}{2}(0.363)^2(0.637)^1 = 0.2518$
$P(X = 3) = \binom{3}{3}(0.363)^3(0.637)^0 = 0.0478$
This answer assumes independence of at-bats. This assumption may not be reasonable, since
pitchers may change between at-bats, Boggs might get tired as the game progresses, etc. It
appears that it is not unusual for a good hitter to go 0 for 3 in one game, since the probability of
this for Boggs is more than $\frac{1}{4}$.
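The four probabilities in 5.3 can be checked numerically. The following is a minimal sketch, assuming the binomial model with n = 3 and p = 0.363 used above; the helper name `binom_pmf` is an illustrative choice, not part of the text.

```python
from math import comb

def binom_pmf(x, n, p):
    """Binomial probability P(X = x) for n independent trials with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Hits in three at-bats, with the per-at-bat success probability 0.363 used in 5.3.
probs = [binom_pmf(x, 3, 0.363) for x in range(4)]
print([round(q, 4) for q in probs])
```

The rounded values should agree with the four-decimal figures listed in the solution, and the list sums to 1.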
$P(X + Y = 0) = P(X = 0)P(Y = 0) = \frac{8}{27} \cdot \frac{2744}{3375} = 0.24090$
$P(X + Y = 1) = P(X = 0)P(Y = 1) + P(X = 1)P(Y = 0) = \frac{8}{27} \cdot \frac{588}{3375} + \frac{12}{27} \cdot \frac{2744}{3375} = 0.41297$
$P(X + Y = 2) = P(X = 1)P(Y = 1) + P(X = 2)P(Y = 0) + P(Y = 2)P(X = 0) = \frac{12}{27} \cdot \frac{588}{3375} + \frac{6}{27} \cdot \frac{2744}{3375} + \frac{8}{27} \cdot \frac{42}{3375} = 0.26179$
$P(X + Y = 3) = P(X = 0)P(Y = 3) + P(X = 1)P(Y = 2) + P(X = 2)P(Y = 1) + P(X = 3)P(Y = 0) = \frac{8}{27} \cdot \frac{1}{3375} + \frac{12}{27} \cdot \frac{42}{3375} + \frac{6}{27} \cdot \frac{588}{3375} + \frac{1}{27} \cdot \frac{2744}{3375} = 0.07445$
$P(X + Y = 4) = P(X = 1)P(Y = 3) + P(X = 2)P(Y = 2) + P(X = 3)P(Y = 1) = \frac{12}{27} \cdot \frac{1}{3375} + \frac{6}{27} \cdot \frac{42}{3375} + \frac{1}{27} \cdot \frac{588}{3375} = 0.00935$
$P(X + Y = 5) = P(X = 2)P(Y = 3) + P(X = 3)P(Y = 2) = \frac{6}{27} \cdot \frac{1}{3375} + \frac{1}{27} \cdot \frac{42}{3375} = 0.00053$
$P(X + Y = 6) = P(X = 3)P(Y = 3) = \frac{1}{27} \cdot \frac{1}{3375} = 0.00001$
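The distribution of X + Y above is a discrete convolution of the two marginal distributions. A minimal sketch, assuming independence of X and Y (as the products above do) and taking the marginal probabilities directly from the fractions in the solution; `convolve` is an illustrative helper name.

```python
from fractions import Fraction

# Marginal probabilities read off the solution: P(X = 0..3) and P(Y = 0..3).
px = [Fraction(k, 27) for k in (8, 12, 6, 1)]
py = [Fraction(k, 3375) for k in (2744, 588, 42, 1)]

def convolve(pa, pb):
    """Distribution of the sum of two independent variables on 0, 1, 2, ..."""
    out = [Fraction(0)] * (len(pa) + len(pb) - 1)
    for i, a in enumerate(pa):
        for j, b in enumerate(pb):
            out[i + j] += a * b
    return out

total = convolve(px, py)
for s, prob in enumerate(total):
    print(s, round(float(prob), 5))
```

Using exact fractions avoids rounding drift, so the seven probabilities sum to exactly 1.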
5.9 a. $p(x) = \frac{(\text{number of ways of choosing } x \text{ from } 2)(\text{number of ways of choosing } 2 - x \text{ from } 2)}{\text{total number of ways of choosing a sample of 2 from 4}} = \frac{\binom{2}{x}\binom{2}{2-x}}{\binom{4}{2}}, \quad x = 0, 1, 2$; i.e.,
x    p(x)
0    1/6
1    2/3
2    1/6
b. $p(x) = \frac{\binom{1}{x}\binom{3}{2-x}}{\binom{4}{2}}, \quad x = 0, 1$; i.e.,
x    p(x)
0    1/2
1    1/2
c. P(X = 0) = 1
c. Box II, since for Box I the highest possible net gain is $1 with a probability of 1/3, but for Box
II the highest possible gain is $3 with a probability of 1/5. Note that V (GI ) < V (GII ); i.e., Box
I net gain varies less from E(Gi ) = 0 than Box II net gain.
5.13 Let X = age of death of a person infected with the AIDS virus through 1995. Then we can
estimate the mean of X from Figure 5.8 by letting the possible values of X be the mean ages for
the different age groups in the pie chart. Hence,
E(X) = 6P (X = 6) + 21P (X = 21) + 34.5P (X = 34.5)
+ 44.5P (X = 44.5) + 54.5P (X = 54.5) + 60P (X = 60)
= 6(0.01) + 21(0.18) + 34.5(0.45) + 44.5(0.25) + 54.5(0.08) + 60(0.04)
= 37.25 (Answers may vary.)
$\mathrm{Var}(X) = E(X^2) - (E(X))^2$
$E(X^2) = 36(0.01) + 441(0.18) + 1190.25(0.45) + 1980.25(0.25) + 2970(0.08) + 3600(0.04) = 1492.02$
$\mathrm{Var}(X) = 1492.02 - (37.25)^2 = 104.458$ (Answers may vary.)
$\mathrm{Std}(X) = (\mathrm{Var}(X))^{1/2} = (104.46)^{1/2} = 10.22$ (Answers may vary.)
The median should be toward the upper end of the 30−39 age spectrum, thus the median and
mean are comparable.
5.15 E(number of sales) = 0 · p(0) + 1 · p(1) + 2 · p(2) = 0(0.7) + 1(0.2) + 2(0.1) = 0.4
$V(\text{number of sales}) = 0^2 \cdot p(0) + 1^2 \cdot p(1) + 2^2 \cdot p(2) - [E(\text{number of sales})]^2 = 0(0.7) + 1(0.2) + 4(0.1) - (0.4)^2 = 0.44$
Standard deviation of sales $= \sqrt{V(\text{sales})} = \sqrt{0.44} = 0.6633$.
5.17 a. Let X = weekly number of breakdowns. Using Tchebysheff's theorem, we have
$P(\mu - k\sigma < X < \mu + k\sigma) \ge 1 - \frac{1}{k^2}.$
For $1 - \frac{1}{k^2} = 0.9$, we have $k = \sqrt{10}$, and thus the desired interval is
$(\mu - k\sigma, \mu + k\sigma) = [4 - \sqrt{10}(0.8), 4 + \sqrt{10}(0.8)] = (1.4702, 6.5298).$
b. Eight breakdowns is $\frac{8 - \mu}{\sigma} = \frac{8 - 4}{0.8} = 5$ standard deviations from the mean. The interval
$(\mu - 5\sigma, \mu + 5\sigma)$ or $(0, 8)$ must contain at least $1 - \frac{1}{5^2} = 0.96$ of the probability. Thus, at
most 4% of the probability mass can exceed 8 breakdowns and the director is safe in his claim.
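The interval in part (a) follows from solving 1 - 1/k^2 = coverage for k. A minimal sketch of that calculation, with µ = 4 and σ = 0.8 as in 5.17; the function name `tchebysheff_interval` is an illustrative choice.

```python
from math import sqrt

def tchebysheff_interval(mu, sigma, coverage):
    """Interval (mu - k*sigma, mu + k*sigma) with k chosen so that 1 - 1/k**2 = coverage."""
    k = 1 / sqrt(1 - coverage)
    return mu - k * sigma, mu + k * sigma

low, high = tchebysheff_interval(4, 0.8, 0.9)
print(round(low, 4), round(high, 4))

# Part (b): with k = 5 the bound on the probability inside the interval is 1 - 1/25.
print(1 - 1 / 5**2)
```

The bound holds for any distribution with the stated mean and standard deviation, which is why no distributional form is passed to the function.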
b. No, since $80 \notin (84.1886, 115.8114)$. Also, note that 80 is $\frac{80 - 100}{5} = -4$ standard deviations
from the mean. Then $P(X \le \mu - 4\sigma) \le P(|X - \mu| \ge 4\sigma) \le \frac{1}{4^2} = 0.0625$. Therefore, one would expect less than 6.25% of batteries to die out in less
than 80 minutes.
5.23 Let X = number of underfilled boxes. Then X has a binomial distribution with parameters
n = 25, p as given.
a. P (X ≤ 2) = 0.537
b. P (X ≤ 2) = 0.098
Therefore, at least n = 8 people must donate blood for the probability of having at least 5 Rh+
donors to be greater than 0.9.
5.29 Let X = number of firms out of a sample of five that say "quality of life" is an important factor.
Then, assuming independence among firms, X has a binomial distribution with parameters n = 5,
p = 0.55, and
$P(X \ge 3) = \sum_{x=3}^{5} \binom{5}{x}(0.55)^x(0.45)^{5-x}$
$= \binom{5}{3}(0.55)^3(0.45)^2 + \binom{5}{4}(0.55)^4(0.45)^1 + \binom{5}{5}(0.55)^5$
$= 0.3369 + 0.2058 + 0.0503 = 0.5931.$
5.31 Let X = number of radar sets out of n that detect an intruding aircraft. Then X has binomial
distribution with parameters n, and p = 0.9.
a. $P(X \ge 1) = 1 - P(X = 0) = 1 - \binom{2}{0}(0.9)^0(0.1)^2 = 0.99$
b. $P(X \ge 1) = 1 - P(X = 0) = 1 - \binom{4}{0}(0.9)^0(0.1)^4 = 0.9999$
5.33 Let X = number of components out of the four that last longer than 1000 hours. The probability
that a given component lasts longer than 1000 hours is 0.8; thus X has a binomial distribution
with parameters n = 4, p = 0.8.
a. $P(X = 2) = \binom{4}{2}(0.8)^2(0.2)^2 = 0.1536$
b. $P(X \ge 2) = 1 - P(X \le 1) = 1 - \binom{4}{0}(0.8)^0(0.2)^4 - \binom{4}{1}(0.8)^1(0.2)^3$
$= 1 - 0.0016 - 0.0256 = 0.9728$
5.37 Let X = number of defective motors out of ten in the warehouse. Then X is binomially distributed
with n = 10, p = 0.08. Let Y = net gain = (selling price for the ten motors) − (twice the selling
price of a motor) · X = 10(100) − 200X = 1,000 − 200X.
$E(Y) = E(1{,}000 - 200X) = 1{,}000 - 200E(X) = 1{,}000 - 200np = 1{,}000 - 200(10)(0.08) = 840$
5.39 a. $P(Y \ge 4) = 1 - P(Y \le 3) = 1 - P(Y = 2) - P(Y = 3)$
$= 1 - \binom{2-1}{2-1}(0.4)^2(0.6)^0 - \binom{3-1}{2-1}(0.4)^2(0.6)^1 = 1 - 0.16 - 2(0.4)^2(0.6)$
$= 0.648$
5.41 Let Y = the trial on which the third nondefective engine is found. Then Y has a negative binomial
distribution, with p = 0.9, r = 3.
a. $P(Y = 5) = p(5) = \binom{y-1}{r-1}p^r(1 - p)^{y-r} = \binom{4}{2}(0.9)^3(0.1)^2 = 0.04374$
b. $P(Y \le 5) = P(Y = 3) + P(Y = 4) + P(Y = 5) = p(3) + p(4) + p(5)$
$= \binom{3-1}{3-1}(0.9)^3(0.1)^0 + \binom{4-1}{3-1}(0.9)^3(0.1)^1 + \binom{5-1}{3-1}(0.9)^3(0.1)^2$
5.43 a. Let Y be defined as in Exercise 3.40. Then Y has a geometric distribution with p = 0.9, and
$E(Y) = \frac{1}{p} = \frac{10}{9}$
$V(Y) = \frac{1 - p}{p^2} = \frac{0.1}{(0.9)^2} = 0.1235.$
b. Let Y be defined as in Exercise 3.41. Then Y has a negative binomial distribution with parameters
p = 0.9, r = 3, and
$E(Y) = \frac{r}{p} = \frac{30}{9} = 3.33$
$V(Y) = \frac{r(1 - p)}{p^2} = \frac{0.3}{(0.9)^2} = 0.3704.$
5.47 a. Let Y = number of the well in which oil was first struck. Then Y has a geometric distribution
with parameter p = 0.2, so
$P(Y = 3) = (1 - p)^{(3-1)}p^1 = (0.8)^2(0.2)^1 = 0.128.$
b. Let Y = number of the well in which the third oil strike occurs. Then Y has a negative binomial
distribution with parameters p = 0.2, r = 3, so
$P(Y = 5) = \binom{5-1}{3-1}(0.2)^3(0.8)^2 = 0.03072.$
The solutions to parts (a) and (b) require the assumption of independence of the wells.
5.49 Let Y = number of tires that must be selected in order to get four good ones. Then Y has a
negative binomial distribution with parameters p = 0.9, r = 4.
a. $P(Y = 6) = \binom{6-1}{4-1}(0.9)^4(0.1)^2 = 0.06561$
b. $E(Y) = \frac{r}{p} = \frac{4}{0.9} = 4.4444$
c. $V(Y) = \frac{r(1 - p)}{p^2} = \frac{4(0.1)}{(0.9)^2} = 0.4938$
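The negative binomial calculations in 5.41 through 5.49 all use the same probability function and moment formulas. A minimal sketch under the parameterization used above (Y = trial on which the r-th success occurs); the helper names are illustrative choices.

```python
from math import comb

def negbin_pmf(y, r, p):
    """P(Y = y), where Y is the trial on which the r-th success occurs."""
    return comb(y - 1, r - 1) * p**r * (1 - p) ** (y - r)

def negbin_mean_var(r, p):
    """Mean r/p and variance r(1 - p)/p**2 of the same distribution."""
    return r / p, r * (1 - p) / p**2

print(round(negbin_pmf(6, 4, 0.9), 5))   # 5.49 a
print(round(negbin_pmf(5, 3, 0.9), 5))   # 5.41 a
print(round(negbin_pmf(5, 3, 0.2), 5))   # 5.47 b
mean, var = negbin_mean_var(4, 0.9)      # 5.49 b, c
print(round(mean, 4), round(var, 4))
```

Setting r = 1 reduces `negbin_pmf` to the geometric probability function used in 5.43 a and 5.47 a.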
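5.51 a. Let Y = number of customers it takes to sell the three white appliances. Then Y has a negative
binomial distribution with parameters $p = \frac{1}{2}$, r = 3, and
$P(Y = 5) = \binom{5-1}{3-1}\left(\frac{1}{2}\right)^3\left(\frac{1}{2}\right)^{5-3} = 6\left(\frac{1}{8}\right)\left(\frac{1}{4}\right) = \frac{3}{16}.$
b. Let X = number of customers it takes to sell the brown appliances. Then X has the same
distribution as Y, a negative binomial distribution with parameters $p = \frac{1}{2}$, r = 3, and
$P(X = 5) = P(Y = 5) = \frac{3}{16}.$
c. $P(Y = 3) = \binom{3-1}{3-1}\left(\frac{1}{2}\right)^3\left(\frac{1}{2}\right)^{3-3} = \left(\frac{1}{2}\right)^3 = \frac{1}{8}$
d. P(all the whites ordered before all browns) = P(Y ≤ 5)
$= p(3) + p(4) + p(5) = \frac{3}{16} + p(4) + \frac{1}{8}$
$= \frac{3}{16} + \binom{4-1}{3-1}\left(\frac{1}{2}\right)^3\left(\frac{1}{2}\right) + \frac{1}{8}$
$= \frac{3}{16} + \frac{3}{16} + \frac{1}{8} = \frac{1}{2}$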
5.55 Let Y = number of fatalities per $10^9$ vehicle miles with NMSL in effect. Then Y has a Poisson
distribution with parameter λ = 16.
a. P (Y ≤ 15) = F (15) = 0.467
b. P (Y ≥ 20) = 1 − P (Y ≤ 19) = 1 − F (19) = 1 − 0.812 = 0.188
5.57 a. Let Y = number of teleport inquiries in one millisecond. Then Y has a Poisson distribution with
parameter λ = 0.2 and
$P(Y = 0) = \frac{(0.2)^0}{0!}e^{-0.2} = e^{-0.2} = 0.8187.$
b. Let X = number of teleport inquiries in three milliseconds. Then X has a Poisson distribution
with parameter λ = 3(0.2) = 0.6 and
$P(X = 0) = \frac{(0.6)^0}{0!}e^{-0.6} = e^{-0.6} = 0.5488.$
5.59 Let Y = number of customer arrivals in a given hour. Then Y has a Poisson distribution with
λ = 8.
a. P (Y = 8) = P (Y ≤ 8) − P (Y ≤ 7) = 0.593 − 0.453 = 0.140
b. P (Y ≤ 3) = 0.042
c. P (Y ≥ 2) = 1 − P (Y ≤ 1) = 1 − 0.003 = 0.997
5.61 a. Let X = number of customers that arrive in a given two-hour period of time. Then X has a
Poisson distribution with λ = 2(8) = 16 and
$P(X = 2) = \frac{16^2}{2!}e^{-16} = 128e^{-16} = 1.44 \times 10^{-5}.$
b. The two one-hour time periods are nonoverlapping, and therefore X = total number of customers
that arrive in the given two-hour time period has a Poisson distribution with λ = 2(8) = 16, and,
as for part (a), $P(X = 2) = 1.44 \times 10^{-5}$.
Consistent with this answer, note the following. Let Y1 = number of customers that arrive 1−2
pm, and Y2 = number of customers that arrive 3−4 pm. Then Y1 and Y2 are each distributed as
Poisson with λ = 8 and
5.63 Let X = number of imperfections in an eight-square yard sample. Then X has a Poisson distri-
bution with λ = 8(4) = 32. Let C = 10X = cost of repair. Then
E(C) = 10E(X) = 10λ = 10(32) = 320
V (C) = (10)2 V (X) = 100λ = 100(32) =3,200
The standard deviation of C is $\sqrt{V(C)} = \sqrt{3{,}200} = 40\sqrt{2} = 56.5685$.
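5.65 $E[Y(Y - 1)] = \sum_{y=0}^{\infty} y(y - 1)\frac{\lambda^y e^{-\lambda}}{y!} = \lambda^2 e^{-\lambda}\sum_{y=2}^{\infty}\frac{\lambda^{(y-2)}}{(y - 2)!} = \lambda^2 e^{-\lambda}\sum_{x=0}^{\infty}\frac{\lambda^x}{x!}$
$= \lambda^2 e^{-\lambda}e^{\lambda} = \lambda^2$
$V(Y) = E(Y^2) - [E(Y)]^2 = E(Y^2) - E(Y) + E(Y) - [E(Y)]^2$
$= E[Y(Y - 1)] + E(Y) - [E(Y)]^2$
$= \lambda^2 + \lambda - (\lambda)^2 = \lambda$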
b. Let X = the number of cars arriving in a given eight hours. Then X has a Poisson distribution
with λ = 8(4) = 32 and
$P(X \le 11) = \sum_{x=0}^{11}\frac{32^x e^{-32}}{x!} = e^{-32}\sum_{x=0}^{11}\frac{32^x}{x!}$
$= e^{-32}\left(1 + 32 + \frac{32^2}{2} + \frac{32^3}{6} + \ldots + \frac{32^{11}}{11!}\right)$
$= e^{-32}(1.345732 \times 10^9) = 0.000017.$
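The Poisson probabilities in this group of exercises can be evaluated directly from the probability function rather than from tables. A minimal sketch using only the standard library; `poisson_pmf` and `poisson_cdf` are illustrative helper names.

```python
from math import exp, factorial

def poisson_pmf(y, lam):
    """P(Y = y) for a Poisson random variable with mean lam."""
    return lam**y * exp(-lam) / factorial(y)

def poisson_cdf(y, lam):
    """P(Y <= y), summing the probability function term by term."""
    return sum(poisson_pmf(k, lam) for k in range(y + 1))

print(round(poisson_pmf(0, 0.2), 4))   # 5.57 a
print(round(poisson_pmf(0, 0.6), 4))   # 5.57 b
print(poisson_pmf(2, 16))              # 5.61 a
print(poisson_cdf(11, 32))             # cars arriving in eight hours
```

The last two values are of order 10^-5, so they are printed unrounded.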
5.71 Let Y = number of local firms selected. Then Y has a hypergeometric distribution with parameters
k = 4, n = 3, N = 6.
a. P (at least one not local) = P (not all local) = 1 − P (Y = 3) = 1 − p(3)
$= 1 - \frac{\binom{4}{3}\binom{2}{0}}{\binom{6}{3}} = 1 - \frac{4}{20} = \frac{4}{5}$
b. $P(Y = 3) = \frac{\binom{4}{3}\binom{2}{0}}{\binom{6}{3}} = \frac{1}{5}$
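b. k = 4
y    p(y)
0    $\binom{4}{0}\binom{6}{3} / \binom{10}{3} = \frac{1}{6}$
1    $\binom{4}{1}\binom{6}{2} / \binom{10}{3} = \frac{1}{2}$
2    $\binom{4}{2}\binom{6}{1} / \binom{10}{3} = \frac{3}{10}$
3    $\binom{4}{3}\binom{6}{0} / \binom{10}{3} = \frac{1}{30}$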
5.75 Let Y = number of misfiring plugs among the four removed. Then Y has a hypergeometric
distribution with N = 8, n = 4, k = 2, and
$P(Y = 2) = \frac{\binom{2}{2}\binom{6}{2}}{\binom{8}{4}} = \frac{3}{14}.$
5.77 Let Y = number of accounts past due that the auditor sees. Then Y has a hypergeometric
distribution with N = 8, n = 3, k as given, and
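$P(Y \ge 1) = 1 - P(Y = 0) = 1 - \frac{\binom{k}{0}\binom{8-k}{3}}{\binom{8}{3}} = 1 - \frac{\binom{8-k}{3}}{56}.$
a. k = 2; $P(Y \ge 1) = 1 - \frac{\binom{8-2}{3}}{56} = 1 - \frac{20}{56} = \frac{9}{14}$
b. k = 4; $P(Y \ge 1) = 1 - \frac{\binom{8-4}{3}}{56} = 1 - \frac{4}{56} = \frac{13}{14}$
c. k = 7; $P(Y \ge 1) = 1 - \frac{\binom{8-7}{3}}{56} = 1 - 0 = 1$, since the auditor must choose at least two
past-due accounts.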
5.79 Note that Y has a hypergeometric distribution with parameters N = 20, n = 5, and k as given.
a. k = 0: P (Y ≤ 1) = 1
b. k = 1: P (Y ≤ 1) = 1
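c. k = 2:
$P(Y \le 1) = p(0) + p(1) = \frac{\binom{2}{0}\binom{20-2}{5-0}}{\binom{20}{5}} + \frac{\binom{2}{1}\binom{20-2}{5-1}}{\binom{20}{5}} = \frac{21}{38} + \frac{15}{38} = \frac{18}{19}$
d. k = 3:
$P(Y \le 1) = p(0) + p(1) = \frac{\binom{3}{0}\binom{17}{5}}{\binom{20}{5}} + \frac{\binom{3}{1}\binom{17}{4}}{\binom{20}{5}} = \frac{91}{228} + \frac{105}{228} = \frac{49}{57}$
e. k = 4:
$P(Y \le 1) = p(0) + p(1) = \frac{\binom{4}{0}\binom{16}{5}}{\binom{20}{5}} + \frac{\binom{4}{1}\binom{16}{4}}{\binom{20}{5}} = \frac{1092}{3876} + \frac{1820}{3876} = \frac{728}{969}$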
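The hypergeometric fractions in 5.75 through 5.79 can be reproduced exactly with integer arithmetic. A minimal sketch, assuming the parameterization used above (population size N, sample size n, k items of the type being counted); `hypergeom_pmf` is an illustrative helper name.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(y, N, n, k):
    """P(Y = y): y of the k marked items appear in a sample of n drawn from N."""
    return Fraction(comb(k, y) * comb(N - k, n - y), comb(N, n))

# 5.79: P(Y <= 1) for N = 20, n = 5 and k = 2, 3, 4.
for k in (2, 3, 4):
    print(k, hypergeom_pmf(0, 20, 5, k) + hypergeom_pmf(1, 20, 5, k))

# 5.75: two misfiring plugs among four removed from eight.
print(hypergeom_pmf(2, 8, 4, 2))
```

`math.comb` returns 0 when the lower index exceeds the upper one, which matches the convention used in 5.77 c.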
5.81 Let Y = number of defectives from line I. Then Y has a hypergeometric distribution with param-
eters N = 10, n = 5, and
$P(Y = 2) = \frac{\binom{4}{2}\binom{6}{3}}{\binom{10}{5}} = \frac{10}{21}.$
Therefore,
$E(Y) = M'(0) = n[pe^t + (1 - p)]^{n-1}pe^t\big|_{t=0} = np$
$E(Y^2) = M''(0) = \left\{n(n - 1)[pe^t + (1 - p)]^{n-2}(pe^t)^2 + n[pe^t + (1 - p)]^{n-1}pe^t\right\}\big|_{t=0}$
$= n(n - 1)p^2 + np$
$V(Y) = E(Y^2) - [E(Y)]^2 = n(n - 1)p^2 + np - (np)^2 = np(1 - p)$
5.85 $M_Y(t) = E(e^{tY}) = E(e^{t(aX+b)}) = E[e^{tb}e^{(at)X}] = e^{tb}E[e^{(at)X}] = e^{tb}M_X(at)$
5.87 Binomial probability histograms for n=5 and p=0.1, 0.5 and 0.9
5.89 Let Y = number of radar sets out of the five that detect the plane. Then Y has a binomial
distribution with parameters n = 5, p = 0.9.
$P(Y = 4) = \binom{5}{4}(0.9)^4(0.1)^1 = 0.32805$
and
$P(Y \ge 1) = 1 - P(Y = 0) = 1 - \binom{5}{0}(0.9)^0(0.1)^5 = 0.99999$
5.91 $P(Y \le a) = \binom{5}{0}p^0(1 - p)^5 = (1 - p)^5$
a. (1 − p)5 = (1)5 = 1
b. (1 − p)5 = (0.9)5 = 0.5905
c. (1 − p)5 = (0.7)5 = 0.1681
d. (1 − p)5 = (0.5)5 = 0.03125
e. (1 − p)5 = (0)5 = 0
p P (Y ≤ a)
0.05 0.9774
0.10 0.9185
0.20 0.7373
0.30 0.5282
0.40 0.3370
Operating characteristic curve for n=25 and a=5:
p P (Y ≤ a)
0.05 0.9988
0.10 0.9666
0.20 0.6167
0.30 0.1935
0.40 0.0294
a. n = 25, a = 5
b. n = 25, a = 5
5.95 Let Y = number of colonies in a given dish. Then Y has a Poisson distribution with a mean of
λ = 12
a. P (Y ≥ 10) = 1 − P (Y ≤ 9) = 1 − 0.242 = 0.758
b. $E(Y) = \lambda = 12$, and the standard deviation of Y is $\sqrt{V(Y)} = \sqrt{\lambda} = \sqrt{12} = 3.4641$
c. Using Tchebysheff's theorem, we have
$P[E(Y) - 2\sqrt{V(Y)} < Y < E(Y) + 2\sqrt{V(Y)}] \ge 1 - \frac{1}{2^2} = 0.75$
I.e., the desired interval is [12 − 2(3.4641), 12 + 2(3.4641)] = (5.0718, 18.9282).
5.99 Let Y = number of left-turning vehicles out of n vehicles arriving while the light is red. Then Y
has a binomial distribution with parameters n = 5, p = 0.2, P (Y ≤ 3) = 0.993. This number may
be computed directly as
P (Y ≤ 3) = 1 − P (Y ≥ 4) = 1 − P (Y = 4) − P (Y = 5)
$= 1 - \binom{5}{4}(0.2)^4(0.8)^1 - \binom{5}{5}(0.2)^5(0.8)^0 = 0.993.$
$= e^{-\lambda}e^{\lambda} = 1.$
5.103 Let Y = total number of requests for welding units until the third brand-A unit is used. Then Y
has a negative binomial distribution with parameters r = 3, p = 0.7, and
$P(Y = 5) = \binom{5-1}{3-1}(0.7)^3(0.3)^{(5-3)} = 6(0.7)^3(0.3)^2 = 0.18522.$
5.105 Let Y = number of people that have to be interviewed before encountering a consumer who prefers
brand A. Then Y has a geometric distribution with parameter p = 0.6.
$P(Y = 5) = (1 - p)^{5-1}p = (0.4)^4(0.6) = 0.01536$
and
$P(Y \ge 5) = 1 - P(Y \le 4)$
$= 1 - p(1) - p(2) - p(3) - p(4)$
$= 1 - (0.6) - (0.4)(0.6) - (0.4)^2(0.6) - (0.4)^3(0.6)$
= 1 − (0.6) − (0.24) − (0.096) − (0.0384)
= 0.0256
5.107 Note that Y has a binomial distribution with parameters n = 1,000, p = 0.9, so that
E(Y) = np = 1,000(0.9) = 900, and
V(Y) = np(1 − p) = 1,000(0.9)(0.1) = 90.
Using Tchebysheff's theorem with k = 2, we have $P(\mu - 2\sigma < Y < \mu + 2\sigma) \ge 1 - \frac{1}{2^2}$; i.e.,
$P(900 - 2\sqrt{90} < Y < 900 + 2\sqrt{90}) = P(881.026 < Y < 918.974) \ge 0.75.$
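5.109 a. Note that Y has a binomial distribution with parameters n = 4, $p = \frac{1}{3}$, and probability function
$p(y) = \binom{4}{y}\left(\frac{1}{3}\right)^y\left(\frac{2}{3}\right)^{4-y}.$
y    p(y)
0    $\left(\frac{2}{3}\right)^4$
1    $4\left(\frac{1}{3}\right)\left(\frac{2}{3}\right)^3 = 2\left(\frac{2}{3}\right)^4$
2    $6\left(\frac{1}{3}\right)^2\left(\frac{2}{3}\right)^2 = \left(\frac{2}{3}\right)^3$
3    $4\left(\frac{1}{3}\right)^3\left(\frac{2}{3}\right)^1 = \frac{1}{3}\left(\frac{2}{3}\right)^3$
4    $\left(\frac{1}{3}\right)^4$
b. $P(Y \ge 3) = p(3) + p(4) = \frac{1}{3}\left(\frac{2}{3}\right)^3 + \left(\frac{1}{3}\right)^4 = \frac{1}{9}$
c. $E(Y) = np = 4\left(\frac{1}{3}\right) = \frac{4}{3}$
d. $V(Y) = np(1 - p) = 4\left(\frac{1}{3}\right)\left(\frac{2}{3}\right) = \frac{8}{9}$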
5.111 a. Here, Y has a hypergeometric distribution with parameters N = 100, n = 20, k = 40, and
$p(10) = \frac{\binom{40}{10}\binom{60}{10}}{\binom{100}{20}} = 0.1192.$
b. Here, Y has a binomial distribution with parameters n = 20, p = 0.40.
p(10) = F (10) − F (9) = 0.872 − 0.755 = 0.117
Thus it appears that N is large enough that the binomial probability function is a good approxi-
mation to the hypergeometric probability function.
5.113 Let Y = number of items sold on a given day, P = daily profit, and X = number of items stocked.
Note that for Y ≤ X
P = 1.2Y − X
E(P ) = 1.2E(Y ) − X
and
E(Y |X = 1) =1
E(Y |X = 2) =2
E(Y |X = 3) = 2p(2) + 3P (Y ≥ 3) = 2(0.1) + 3(0.9) = 2.9
E(Y |X = 4) = 2p(2) + 3p(3) + 4p(4) = 2(0.1) + 3(0.4) + 4(0.5) = 3.4.
Hence
E(P |X = 1) = 1.2(1) − 1 = 0.2
E(P |X = 2) = 1.2(2) − 2 = 0.4
E(P |X = 3) = 1.2(2.9) − 3 = 0.48
E(P |X = 4) = 1.2(3.4) − 4 = 0.08.
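The comparison of stocking levels in 5.113 can be tabulated in a few lines. This is a minimal sketch, assuming the demand probabilities p(2) = 0.1, p(3) = 0.4, p(4) = 0.5 that appear in the arithmetic above and the profit rule P = 1.2(items sold) − (items stocked); the names `demand_pmf` and `expected_profit` are illustrative.

```python
# Demand distribution as read off the solution's arithmetic (an assumption of this sketch).
demand_pmf = {2: 0.1, 3: 0.4, 4: 0.5}

def expected_profit(stock, price=1.2, cost=1.0):
    """Expected daily profit when `stock` items are stocked; sales cannot exceed stock."""
    expected_sold = sum(min(d, stock) * p for d, p in demand_pmf.items())
    return price * expected_sold - cost * stock

for x in range(1, 5):
    print(x, round(expected_profit(x), 2))

best = max(range(1, 5), key=expected_profit)
print("best stock level:", best)
```

Under these assumptions, stocking three items gives the largest expected profit, in line with the four values computed above.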
78 Chapter 6: Continuous Probability Distributions
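b $F(x) = \begin{cases} 0, & x < 5 \\ \int_5^x \frac{3}{8}(7 - y)^2\,dy = \int_{-2}^{x-7}\frac{3}{8}w^2\,dw = \left.\frac{w^3}{8}\right|_{-2}^{x-7} = \frac{(x - 7)^3}{8} + 1, & 5 \le x \le 7 \\ 1, & x > 7 \end{cases}$
c $P(X < 6) = F(6) = \frac{(6 - 7)^3}{8} + 1 = \frac{7}{8}$
d $P(X < 5.5 \mid X < 6) = \frac{P(X < 5.5)}{P(X < 6)} = \frac{(5.5 - 7)^3 + 8}{8} \Big/ \frac{7}{8} = \frac{37}{56}$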
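6.7 a $P\left(X > \frac{1}{2}\right) = \int_{1/2}^{1} 2x\,dx = x^2\Big|_{1/2}^{1} = 1 - \frac{1}{4} = \frac{3}{4}$
b $P\left(X > \frac{1}{2} \,\Big|\, X > \frac{1}{4}\right) = \frac{P\left(X > \frac{1}{2}, X > \frac{1}{4}\right)}{P\left(X > \frac{1}{4}\right)} = \frac{P\left(X > \frac{1}{2}\right)}{P\left(X > \frac{1}{4}\right)} = \frac{3/4}{\int_{1/4}^{1} 2x\,dx} = \frac{4}{5}$
c $P\left(X > \frac{1}{4} \,\Big|\, X > \frac{1}{2}\right) = \frac{P\left(X > \frac{1}{4}, X > \frac{1}{2}\right)}{P\left(X > \frac{1}{2}\right)} = \frac{P\left(X > \frac{1}{2}\right)}{P\left(X > \frac{1}{2}\right)} = 1$
d $F(x) = \begin{cases} 0, & x < 0 \\ \int_0^x 2y\,dy = x^2, & 0 \le x \le 1 \\ 1, & x > 1 \end{cases}$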
a $E(X) = \frac{a + b}{2} = \frac{50 + 70}{2} = 60$
$V(X) = \frac{(b - a)^2}{12} = \frac{(70 - 50)^2}{12} = \frac{100}{3}$
b Let T = number of trucks needed. Then we have E(X)/T = 15; i.e., 60/15 = 4 trucks are needed.
= 0.1481.
6.33 Let X = water demand. Then X has an exponential distribution with parameter θ = 100.
a $P(X > 200) = 1 - F(200) = 1 - \left(1 - e^{-200/100}\right) = e^{-2} = 0.1353$
= 460.52 cfs.
6.35 a Note that since $\int_0^{\infty} x^{(n-1)}e^{-(x/\theta)}\,dx = \Gamma(n)\theta^n$, then for k integer valued we have
$E(X^k) = \frac{1}{\theta}\int_0^{\infty} x^k e^{-x/\theta}\,dx = \frac{1}{\theta}\Gamma(k + 1)\theta^{(k+1)} = \theta^k k!$
So
$E(X) = (10)1! = 10$
$E(X^2) = (10)^2 2! = 200$
$E(X^3) = (10)^3 3! = 6{,}000$
$E(X^4) = (10)^4 4! = 240{,}000$
and
$E(C) = 100 + 40E(X) + 3E(X^2) = 100 + 40(10) + 3(200) = 1{,}100$
$E(C^2) = E\left\{[100 + 40(X) + 3(X^2)]^2\right\}$
6.49 Let Xi = time to completion of the given task, i = 1, 2. Then Xi has a Gamma distribution
with parameters α = 1, β = 10, and Y = X1 + X2 has a Gamma distribution with parameters
α = 2(1) = 2, β = 10.
6.59 Let X = diameter. Then X has a normal distribution with parameters µ = 1.005, σ = 0.01, and
$P(X < 0.98) + P(X > 1.02) = P\left(Z < \frac{0.98 - 1.005}{0.01}\right) + P\left(Z > \frac{1.02 - 1.005}{0.01}\right)$
$= P(Z < -2.5) + P(Z > 1.5) = (0.5 - 0.4938) + (0.5 - 0.4332)$
$= 0.0730.$
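The table lookups in 6.59 can be cross-checked with the normal distribution class in Python's standard library. A minimal sketch, using µ = 1.005 and σ = 0.01 from the solution; small differences from the text arise because the text rounds z-table entries to four decimals.

```python
from statistics import NormalDist

diameter = NormalDist(mu=1.005, sigma=0.01)

# Probability that a diameter falls outside the limits 0.98 and 1.02.
p_outside = diameter.cdf(0.98) + (1 - diameter.cdf(1.02))
print(round(p_outside, 4))

# The same quantity after standardizing to Z, as done in the solution.
z = NormalDist()
p_standardized = z.cdf(-2.5) + (1 - z.cdf(1.5))
print(round(p_standardized, 4))
```

Both routes give the same number, which illustrates that standardizing changes only the scale on which the area is computed.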
6.61 Let X = resistances of wires produced by Company A. Then X has a normal distribution with
parameters µ = 0.13, σ = 0.005.
a $P(0.12 < X < 0.14) = P\left(\frac{0.12 - 0.13}{0.005} < \frac{X - 0.13}{0.005} < \frac{0.14 - 0.13}{0.005}\right)$
$= P(-2 < Z < 2) = 2P(0 < Z < 2) = 2(0.4772) = 0.9544$
b Let Y = number of wires of a sample of four from Company A that meet specifications. Then Y
has a binomial distribution with parameters n = 4, p = 0.9544, and
$P(Y = 4) = \binom{4}{4}(0.9544)^4(1 - 0.9544)^0 = 0.8297.$
6.63 $P(|X| > 5) = P(X < -5) + P(X > 5) = P\left(Z < \frac{-5 - 0}{10}\right) + P\left(Z > \frac{5 - 0}{10}\right)$
$= 2P(Z > 0.5) = 2(0.5 - 0.1915) = 0.6170$
$P(|X| > 10) = P(X < -10) + P(X > 10) = P\left(Z < \frac{-10 - 0}{10}\right) + P\left(Z > \frac{10 - 0}{10}\right)$
$= 2P(Z > 1) = 2(0.5 - 0.3413) = 0.3174$
6.65 Let X = monthly sickleave time. Then X has a normal distribution with parameters µ =
200, σ = 20.
a $P(X < 150) = P\left(\frac{X - 200}{20} < \frac{150 - 200}{20}\right) = P(Z < -2.5) = 0.5 - 0.4938 = 0.0062$
b Let $x_0$ = desired time budgeted. Then set $P(X > x_0) = P\left(Z > \frac{x_0 - 200}{20}\right) = 0.1$. Note that
$P(Z > 1.28) = 0.1$, so we have $\frac{x_0 - 200}{20} = 1.28$, i.e., $x_0 = 225.6$ hours.
6.67 Let X = amount of fill per box. Then X has a normal distribution with parameters µ, σ = 1, and we set
$P(X > 16) = P\left(Z > \frac{16 - \mu}{1}\right) = 0.01$. Note that $P(Z > 2.33) = 0.01$, so we have $\frac{16 - \mu}{1} = 2.33$;
i.e., µ = 13.67 ounces.
6.69 a Yes, it does appear that the total points can be modeled by a normal distribution.
b According to the empirical rule, 68% of the data should lie one standard deviation above and
below the mean and 95% of the data should lie within two standard deviations above and below
the mean. Hence, consider the interval (x̄ − s, x̄ + s) = (143 − 26, 143 + 26) = (117, 169).
Notice that more than 77% of the games had total scores within (117, 169). Now consider the
interval (x̄ − 2s, x̄ + 2s) = (143 − 2(26), 143 + 2(26)) = (91, 195). Notice that less than 5%
of the total scores fall outside of this region.
c No and no. A score of 200 is greater than two standard deviations away from the mean. Such a
score should occur less than 2.5% of the time, according to the empirical rule. A score of 250 is
greater than three standard deviations away from the mean, making it even less likely to occur.
d About 4 games.
6.71 a Q-Q plots for male and female distributions:
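6.79 We were shown in the previous section that $E(X) = \frac{\alpha}{(\alpha + \beta)}$.
$E(X^2) = \int_0^1 x^2\frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1 - x)^{\beta-1}\,dx = \int_0^1\frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha+2-1}(1 - x)^{\beta-1}\,dx$
$= \frac{\Gamma(\alpha + \beta)\Gamma(\alpha + 2)}{\Gamma(\alpha)\Gamma(\alpha + 2 + \beta)}\int_0^1\frac{\Gamma(\alpha + \beta + 2)}{\Gamma(\alpha + 2)\Gamma(\beta)}x^{\alpha+2-1}(1 - x)^{\beta-1}\,dx$
$= \frac{\Gamma(\alpha + \beta)\Gamma(\alpha + 2)}{\Gamma(\alpha)\Gamma(\alpha + 2 + \beta)} = \frac{\Gamma(\alpha + \beta)(\alpha + 1)\alpha\Gamma(\alpha)}{\Gamma(\alpha)(\alpha + \beta + 1)(\alpha + \beta)\Gamma(\alpha + \beta)} = \frac{\alpha(\alpha + 1)}{(\alpha + \beta)(\alpha + \beta + 1)}$
Then we have
$V(X) = E(X^2) - [E(X)]^2 = \frac{\alpha(\alpha + 1)}{(\alpha + \beta)(\alpha + \beta + 1)} - \left(\frac{\alpha}{\alpha + \beta}\right)^2$
$= \frac{\alpha(\alpha + 1)(\alpha + \beta) - \alpha^2(\alpha + \beta + 1)}{(\alpha + \beta)^2(\alpha + \beta + 1)} = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}.$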
6.81 a $P(X > 0.4) = \int_{0.4}^{1} 12x^2(1 - x)\,dx = (4x^3 - 3x^4)\Big|_{0.4}^{1} = 0.8208$
b $E(V) = 5 - 0.5E(X) = 5 - (0.5)\frac{\alpha}{\alpha + \beta} = 5 - (0.5)\frac{3}{(3 + 2)} = 4.7$
$V(V) = (0.5)^2V(X) = (0.25)\frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)} = (0.25)\frac{(3)(2)}{(3 + 2)^2(3 + 2 + 1)}$
$= 0.01$
6.83 Let X = measurement error. Then X has a Beta distribution with parameters α = 1, β = 2.
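a $P(X < 0.5) = \int_0^{0.5}\frac{\Gamma(1 + 2)}{\Gamma(1)\Gamma(2)}x^{(1-1)}(1 - x)^{(2-1)}\,dx = \int_0^{0.5} 2(1 - x)\,dx = (2x - x^2)\Big|_0^{0.5} = 0.75$
b $E(X) = \frac{\alpha}{\alpha + \beta} = \frac{1}{3}$
$V(X) = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)} = \frac{2}{(1 + 2)^2(1 + 2 + 1)} = \frac{1}{18}$
So the standard deviation of X is $\sqrt{V(X)} = \sqrt{\frac{1}{18}} = 0.2357$.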
6.85 Let X = proportion of pure iron, with X having a Beta distribution with parameters α = 3, β =
1.
a $P(X > 0.5) = \int_{0.5}^{1}\frac{\Gamma(4)}{\Gamma(3)\Gamma(1)}x^{(3-1)}(1 - x)^{(1-1)}\,dx = \int_{0.5}^{1} 3x^2\,dx = x^3\Big|_{0.5}^{1} = \frac{7}{8}$
b Let Y = number of samples out of three that have less than 30% pure iron. Then Y has a binomial
distribution with parameters n = 3, $p = P(X < 0.3) = \int_0^{0.3} 3x^2\,dx = x^3\Big|_0^{0.3} = 0.027$, and
$P(Y = 2) = \binom{3}{2}(0.027)^2(1 - 0.027)^1 = 0.002128.$
6.89 Let X = the ultimate tensile strength of the steel wire. Then X has a Weibull distribution with
parameters γ = 1.2, θ = 270, and
$P(X > 300) = 1 - F(300) = 1 - \left(1 - e^{-(300)^{1.2}/270}\right) = 0.03091.$
6.91 Let X = pressure in thousands of pounds exerted on the tank. Then X has a Weibull distribution
with parameters γ = 1.8, θ = 1.5, and
$P(X > 2) = 1 - F(2) = 1 - \left(1 - e^{-(2)^{1.8}/1.5}\right) = 0.09813.$
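The two Weibull tail probabilities in 6.89 and 6.91 use the distribution function F(x) = 1 − exp(−x^γ/θ). A minimal sketch under that parameterization (other references scale the Weibull differently, so the formula is an assumption tied to this text); `weibull_survival` is an illustrative helper name.

```python
from math import exp

def weibull_survival(x, gamma, theta):
    """P(X > x) when F(x) = 1 - exp(-x**gamma / theta), the form used in this chapter."""
    return exp(-(x**gamma) / theta)

print(round(weibull_survival(300, 1.2, 270), 5))  # 6.89
print(round(weibull_survival(2, 1.8, 1.5), 5))    # 6.91
```

Working with the survival function directly avoids the 1 − (1 − ...) step written out in the solutions.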
Estimating the slope and intercept from the scatter plot, we see that the slope = γ ≈ 5.13, and
the y-intercept = ln(θ) ≈ 35.9
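6.95 a $E(V) = \int_0^{\infty} v\,4\pi\left(\frac{m}{2\pi KT}\right)^{3/2}v^2e^{-v^2(m/2KT)}\,dv = 2\sqrt{\frac{1}{\pi\theta}}\int_0^{\infty} v^2\,\frac{2}{\theta}v^{(2-1)}e^{-v^2/\theta}\,dv$ for $\theta = \frac{2KT}{m}$;
i.e.,
$E(V) = 2\sqrt{\frac{1}{\pi\theta}}\,E(X^2)$
where X has a Weibull distribution with parameters γ = 2, $\theta = \frac{2KT}{m}$; i.e.,
$E(V) = 2\sqrt{\frac{1}{\pi\theta}}\left\{V(X) + [E(X)]^2\right\}$
$= 2\sqrt{\frac{1}{\pi\theta}}\left\{\left(\frac{2KT}{m}\right)^{2/2}\left[\Gamma\left(1 + \frac{2}{2}\right) - \Gamma\left(1 + \frac{1}{2}\right)^2\right] + \frac{2KT}{m}\,\Gamma\left(1 + \frac{1}{2}\right)^2\right\}$
$= 2\sqrt{\frac{m}{2\pi KT}}\,\frac{2KT}{m}\left(1 - \frac{\pi}{4} + \frac{\pi}{4}\right) = 2\sqrt{\frac{2KT}{m\pi}}.$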
6.10 Reliability
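6.97 $R_s(t) = P(X_1 > t, X_2 > t, \ldots, X_n > t)$
$= P(X_1 > t)P(X_2 > t) \cdots P(X_n > t)$
$= [P(X > t)]^n$
$= [e^{-t/\theta}]^n$
$= e^{-nt/\theta}, \quad t > 0$
$E(S) = \int_0^{\infty} e^{-nt/\theta}\,dt = \left.-\frac{\theta}{n}e^{-nt/\theta}\right|_0^{\infty} = \frac{\theta}{n}$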
6.99 Consider a parallel system. Then, we need the number of relays n such that
.999 = 1 − [1 − .9]n = 1 − (.1)n .
So, n = 3.
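The value n = 3 in 6.99 can also be found by a direct search over the number of relays. A minimal sketch, assuming independent relays with common reliability 0.9 as in the solution; exact fractions are used so that the comparison with 0.999 is not affected by floating-point rounding, and `relays_needed` is an illustrative name.

```python
from fractions import Fraction

def relays_needed(target, reliability):
    """Smallest n with 1 - (1 - reliability)**n >= target for a parallel system."""
    n = 1
    while 1 - (1 - reliability) ** n < target:
        n += 1
    return n

n = relays_needed(Fraction(999, 1000), Fraction(9, 10))
print(n)
```

The loop stops at the first n for which the probability that all relays fail, (0.1)^n, is at most 0.001.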
6.103 $M_{Z^2}(t) = E(e^{tZ^2}) = \int_{-\infty}^{\infty} e^{tz^2}\frac{1}{\sqrt{2\pi}}e^{-z^2/2}\,dz = \int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}e^{-z^2(1-2t)/2}\,dz$
$= (1 - 2t)^{-1/2}\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}\,(1 - 2t)^{-1/2}}e^{-z^2(1-2t)/2}\,dz$
Since the integrand is a normal density function with parameters µ = 0,
$\sigma = (1 - 2t)^{-1/2}$, we have
$M_{Z^2}(t) = (1 - 2t)^{-1/2}.$
Therefore, using the result of Exercise 4.95 and the uniqueness property of the moment-generating
function, $Z^2$ has a Gamma distribution with parameters $\alpha = \frac{1}{2}$, β = 2.
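6.105 a $1 = \int_0^2 (cy^2 + y)\,dy = \left.\left(\frac{cy^3}{3} + \frac{y^2}{2}\right)\right|_0^2 = c\,\frac{8}{3} + 2$
i.e., $c = -\frac{3}{8}$
b $F(y) = \begin{cases} 0, & y < 0 \\ \int_0^y (x - (3x^2/8))\,dx = (y^2/2) - (y^3/8), & 0 \le y \le 2 \\ 1, & y > 2 \end{cases}$
d $F(-1) = 0$
$F(0) = 0$
$F(1) = \frac{1}{2} - \frac{1}{8} = \frac{3}{8}$
e $P\left(0 < Y < \frac{1}{2}\right) = F\left(\frac{1}{2}\right) - F(0) = \frac{(1/2)^2}{2} - \frac{(1/2)^3}{8} = \frac{1}{8} - \frac{1}{64} = \frac{7}{64}$
f $E(Y) = \int_0^2 y\left(y - \frac{3y^2}{8}\right)dy = \left.\left(\frac{y^3}{3} - \frac{3y^4}{32}\right)\right|_0^2 = \frac{8}{3} - \frac{3}{2} = \frac{7}{6}$
$E(Y^2) = \int_0^2 y^2\left(y - \frac{3y^2}{8}\right)dy = \left.\left(\frac{y^4}{4} - \frac{3y^5}{40}\right)\right|_0^2 = 4 - \frac{12}{5} = \frac{8}{5}$
$V(Y) = E(Y^2) - [E(Y)]^2 = \frac{8}{5} - \left(\frac{7}{6}\right)^2 = \frac{43}{180} = 0.2389$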
6.107 Let X = student GPA. $P(X > 3) = P\left(\frac{X - 2.4}{0.5} > \frac{3 - 2.4}{0.5}\right) = P(Z > 1.2) = 0.5 - 0.3849 = 0.1151$
6.109 Let Y = number of students out of three that possess a GPA in excess of 3.0. Then Y has a
binomial distribution with parameters n = 3, p = P(X > 3) = 0.1151, and
$P(Y = 3) = \binom{3}{3}(0.1151)^3(1 - 0.1151)^0 = (0.1151)^3 = 0.001525.$
6.111 Let Y = number of defective bearings out of a sample of five. Then Y has a binomial distribution
with parameters n = 5, p = P(bearing is scrap) = 0.073 (from Exercise 4.105), and
$P(Y \ge 1) = 1 - P(Y = 0) = 1 - \binom{5}{0}(0.073)^0(1 - 0.073)^5 = 1 - (0.927)^5 = 0.3155.$
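6.113 $E(X^k) = \int_0^1 x^k\frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1 - x)^{\beta-1}\,dx = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\int_0^1 x^{\alpha+k-1}(1 - x)^{\beta-1}\,dx$
$= \frac{\Gamma(\alpha + \beta)\Gamma(\alpha + k)}{\Gamma(\alpha + \beta + k)\Gamma(\alpha)}\int_0^1\frac{\Gamma(\alpha + \beta + k)}{\Gamma(\alpha + k)\Gamma(\beta)}x^{\alpha+k-1}(1 - x)^{\beta-1}\,dx$
$= \frac{\Gamma(\alpha + \beta)\Gamma(\alpha + k)}{\Gamma(\alpha + \beta + k)\Gamma(\alpha)}$
Therefore,
$E(X) = \frac{\Gamma(\alpha + \beta)\Gamma(\alpha + 1)}{\Gamma(\alpha + \beta + 1)\Gamma(\alpha)} = \frac{\alpha}{\alpha + \beta}$
$E(X^2) = \frac{\Gamma(\alpha + \beta)\Gamma(\alpha + 2)}{\Gamma(\alpha + \beta + 2)\Gamma(\alpha)} = \frac{(\alpha + 1)\alpha}{(\alpha + \beta + 1)(\alpha + \beta)}$
$V(X) = E(X^2) - [E(X)]^2 = \frac{(\alpha + 1)\alpha}{(\alpha + \beta + 1)(\alpha + \beta)} - \left(\frac{\alpha}{\alpha + \beta}\right)^2$
$= \frac{(\alpha + \beta)(\alpha + 1)\alpha - \alpha^2(\alpha + \beta + 1)}{(\alpha + \beta + 1)(\alpha + \beta)^2}$
$= \frac{\alpha\beta}{(\alpha + \beta + 1)(\alpha + \beta)^2}.$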
6.115 Let Xi = lifetime of component i, i = 1, 2, 3.
$P(X_i < 200) = \int_0^{200}\frac{1}{100}e^{-x/100}\,dx = \left.-e^{-x/100}\right|_0^{200} = 1 - e^{-2}$
Then Y, the number of components that fail in 200 hours, has a binomial distribution with
parameters n = 3, $p = P(X_i < 200) = 1 - e^{-2}$, and
$P(Y \le 1) = \binom{3}{0}p^0(1 - p)^3 + \binom{3}{1}p^1(1 - p)^2 = (e^{-2})^3 + 3(1 - e^{-2})(e^{-2})^2$
$= 3e^{-4} - 2e^{-6} = 0.04999.$
6.121 Let Y = waiting time for supplies. Then Y has a uniform distribution with parameters a = 1, b
= 4. Let C = cost of delay. Then
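$C = \begin{cases} 100, & 1 \le y \le 2 \\ 100 + 20(y - 2), & 2 < y \le 4 \end{cases}$
and
$E(C) = \int_1^2 100\,\frac{1}{4 - 1}\,dy + \int_2^4 [100 + 20(y - 2)]\frac{1}{4 - 1}\,dy = 100 + 20\int_2^4\frac{y - 2}{3}\,dy$
$= 100 + \frac{20}{3}\left.\left(\frac{y^2}{2} - 2y\right)\right|_2^4 = 100 + \frac{20}{3}(2) = 113.33$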
6.123 Let Y = weekly downtime. Then Y has a gamma distribution with parameters α = 3, β = 2, and
$P(Y \le 10) = \int_0^{10}\frac{1}{\Gamma(3)2^3}y^2e^{-y/2}\,dy = \int_0^5\frac{1}{\Gamma(3)}w^2e^{-w}\,dw = P(W \le 5)$
where $W = \frac{Y}{2}$ has a gamma distribution with parameters α = 3, β = 1. Then, for X a random
variable having a Poisson distribution with parameter λ = 5, by the result presented in Exercise
4.116, we have
$P(Y \le 10) = P(W \le 5) = 1 - P(X \le 2) = 1 - 0.125 = 0.875.$
6.125 a $P(X \le 4) = P(Y \le \ln 4) = P\left(Z \le \frac{\ln 4 - 4}{1}\right) = P(Z \le -2.61) = 0.5 - 0.4955 = 0.0045$
b $P(X > 8) = P(Y > \ln 8) = P\left(Z > \frac{\ln 8 - 4}{1}\right) = P(Z > -1.92) = 0.5 + 0.4726 = 0.9726$
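6.127 $M(t) = E(e^{tY}) = \int_{-\infty}^{\infty} e^{ty}f(y)\,dy = \frac{1}{2}\int_{-\infty}^{\infty} e^{ty}e^{-|y|}\,dy = \frac{1}{2}\int_{-\infty}^{0} e^{y(t+1)}\,dy + \frac{1}{2}\int_0^{\infty} e^{-y(1-t)}\,dy$
$= \frac{1}{2}\left(\left.\frac{1}{t + 1}e^{y(t+1)}\right|_{-\infty}^{0} + \left.\frac{-1}{1 - t}e^{-y(1-t)}\right|_0^{\infty}\right) = \frac{1}{2}\left(\frac{1}{t + 1} + \frac{1}{1 - t}\right) = \frac{1}{1 - t^2}$
Thus,
$E(Y) = M'(0) = \left.\frac{2t}{(1 - t^2)^2}\right|_{t=0} = 0$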
Chapter 7
7.1
Firms       (x1, x2)   Probability
I, I        (2, 0)     1/9
I, II       (1, 1)     1/9
I, III      (1, 0)     1/9
II, I       (1, 1)     1/9
II, II      (0, 2)     1/9
II, III     (0, 1)     1/9
III, I      (1, 0)     1/9
III, II     (0, 1)     1/9
III, III    (0, 0)     1/9
96 Chapter 7: Multivariate Probability Distributions
                 x1
            0      1      2
      0    1/9    2/9    1/9
x2    1    2/9    2/9     0
      2    1/9     0      0
p(x1)      4/9    4/9    1/9
c $P(X_1 = 1 \mid X_2 = 1) = \frac{P(X_1 = 1, X_2 = 1)}{P(X_2 = 1)} = \frac{2/9}{2/9 + 2/9} = \frac{1}{2}$
7.3 a
               X1
            0         1
      0   0.0635    0.0775
      1   0.1007    0.0556
X2    2   0.1630    0.0653
      3   0.1691    0.0549
      4   0.1929    0.0574
b
X1 P (X1 |X2 = 0) P (X1 |X2 = 1) P (X1 |X2 = 2)
0 0.4502 0.6455 0.7139
1 0.5498 0.3558 0.2861
Obviously, the older the child is, the better his/her chance of survival in a car accident without
wearing a seat belt.
c
X2 P (X2 |X1 = 0) P (X2 |X1 = 1)
0 0.0921 0.2495
1 0.1461 0.1788
2 0.2365 0.2102
3 0.2453 0.1768
4 0.2799 0.1847
No, this implies that if a child survives, then s/he will probably be older.
7.5 a $f_1(x_1) = \begin{cases} \int_0^1 1\,dx_2 = 1, & 0 \le x_1 \le 1 \\ 0, & \text{otherwise} \end{cases}$
b $P\left(X_1 \le \frac{1}{2}\right) = \int_0^{1/2} 1\,dx_1 = \frac{1}{2}$
c $f_1(x_1)f_2(x_2) = 1 = f(x_1, x_2)$, for $0 \le x_1 \le 1$, $0 \le x_2 \le 1$. Therefore, $X_1$ and $X_2$ are independent.
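7.7 a $P\left(X_1 \le \frac{3}{4}, X_2 \le \frac{3}{4}\right) = \int_0^{1/4}\int_0^{3/4} 2\,dx_2\,dx_1 + \int_{1/4}^{3/4}\int_0^{1-x_1} 2\,dx_2\,dx_1$
$= 2\left(\frac{1}{4}\right)\left(\frac{3}{4}\right) + \int_{1/4}^{3/4} 2(1 - x_1)\,dx_1 = \frac{3}{8} + \frac{1}{2} = \frac{7}{8}$
b $P\left(X_1 \le \frac{1}{2}, X_2 \le \frac{1}{2}\right) = 2\left(\frac{1}{2}\right)\left(\frac{1}{2}\right) = \frac{1}{2}$
c $P\left(X_1 \le \frac{1}{2} \,\Big|\, X_2 \le \frac{1}{2}\right) = \frac{P\left(X_1 \le \frac{1}{2}, X_2 \le \frac{1}{2}\right)}{P\left(X_2 \le \frac{1}{2}\right)} = \frac{1/2}{\int_0^{1/2}\int_0^{1-x_2} 2\,dx_1\,dx_2}$
$= \frac{1/2}{\int_0^{1/2} 2(1 - x_2)\,dx_2} = \frac{2}{3}$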
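7.9 a $P\left(X_1 < \frac{1}{2}, X_2 > \frac{1}{4}\right) = \int_0^{1/2}\int_{1/4}^{1}(x_1 + x_2)\,dx_2\,dx_1 = \int_0^{1/2}\left(\frac{3x_1}{4} + \frac{15}{32}\right)dx_1 = \frac{21}{64}$
b $P(X_1 + X_2 \le 1) = \int_0^1\int_0^{1-x_1}(x_1 + x_2)\,dx_2\,dx_1 = \int_0^1\left.\left(x_1x_2 + \frac{x_2^2}{2}\right)\right|_0^{1-x_1}dx_1$
$= \int_0^1\frac{1}{2}(1 - x_1^2)\,dx_1 = \frac{1}{3}$
c Note that
$f_1(x_1)f_2(x_2) = \left(x_1 + \frac{1}{2}\right)\left(x_2 + \frac{1}{2}\right) \ne x_1 + x_2 = f(x_1, x_2)$
for $0 \le x_1 \le 1$, $0 \le x_2 \le 1$.
Therefore, $X_1$ and $X_2$ are not independent.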
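So $\rho = \frac{\frac{1}{12} - \left(\frac{1}{3}\right)^2}{\sqrt{\left(\frac{1}{18}\right)\left(\frac{1}{18}\right)}} = \frac{\frac{1}{12} - \frac{1}{9}}{\frac{1}{18}} = \frac{\frac{3 - 4}{36}}{\frac{1}{18}} = -\frac{1}{2}$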
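7.21 a $P(Y_1 < 2, Y_2 > 1) = \int_1^2\int_{y_2}^2 e^{-y_1}\,dy_1\,dy_2 = \int_1^2 (e^{-y_2} - e^{-2})\,dy_2 = e^{-1} - 2e^{-2} = 0.0972$
b $P(Y_1 > 2Y_2) = \int_0^{\infty}\int_{2y_2}^{\infty} e^{-y_1}\,dy_1\,dy_2 = \int_0^{\infty} e^{-2y_2}\,dy_2 = \frac{1}{2}$
c $P(Y_1 - Y_2 \ge 1) = \int_0^{\infty}\int_{1+y_2}^{\infty} e^{-y_1}\,dy_1\,dy_2 = \int_0^{\infty} e^{-(1+y_2)}\,dy_2 = e^{-1}$
d $f_1(y_1) = \begin{cases} \int_0^{y_1} e^{-y_1}\,dy_2 = y_1e^{-y_1}, & 0 \le y_1 < \infty \\ 0, & \text{otherwise} \end{cases}$
$f_2(y_2) = \begin{cases} \int_{y_2}^{\infty} e^{-y_1}\,dy_1 = e^{-y_2}, & 0 \le y_2 < \infty \\ 0, & \text{otherwise} \end{cases}$
7.23 a $E(Y_1 - Y_2) = \int_0^{\infty}\int_{y_2}^{\infty}(y_1 - y_2)e^{-y_1}\,dy_1\,dy_2 = \int_0^{\infty} e^{-y_2}\,dy_2 = 1$
b $E[(Y_1 - Y_2)^2] = \int_0^{\infty}\int_{y_2}^{\infty}(y_1 - y_2)^2e^{-y_1}\,dy_1\,dy_2 = \int_0^{\infty} 2e^{-y_2}\,dy_2 = 2$
$V(Y) = E(Y^2) - [E(Y)]^2 = 2 - 1^2 = 1$
c No, since
$P(Y_1 - Y_2 > 2) = \int_0^{\infty}\int_{2+y_2}^{\infty} e^{-y_1}\,dy_1\,dy_2 = \int_0^{\infty} e^{-(2+y_2)}\,dy_2 = e^{-2} = 0.1353.$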
7.29 Let Y1 = number of persons between the ages of 18 and 24, Y2 = number of persons between
the ages of 25 and 44, and Y3 = number of persons between the ages of 45 and 64. Then
(Y1 , Y2 , Y3 ) has a multinomial distribution with parameters n = 5, p1 = 0.21, p2 =
0.28 + 0.19, p3 = 0.32. Therefore,
$P(Y_1 = 2, Y_2 = 2, Y_3 = 1) = \frac{5!}{2!\,2!\,1!}(0.21)^2(0.47)^2(0.32)^1 = 0.09352.$
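The multinomial probability in 7.29 is a multinomial coefficient times a product of powers. A minimal sketch using the three class probabilities from the solution; `multinomial_pmf` is an illustrative helper name.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """P(Y1 = counts[0], ..., Yk = counts[k-1]) for n = sum(counts) independent trials."""
    coefficient = factorial(sum(counts))
    for c in counts:
        coefficient //= factorial(c)
    return coefficient * prod(p**c for p, c in zip(probs, counts))

# Two persons aged 18-24, two aged 25-44, one aged 45-64, out of five sampled.
print(round(multinomial_pmf((2, 2, 1), (0.21, 0.47, 0.32)), 5))
```

With two categories the same function reduces to the binomial probability function used in 7.31 and 7.33.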
7.31 Let Y = number of applicants that have a college degree. Then, assuming applicants are selected
independently, Y has a binomial distribution with parameters n = 5, p = 0.10, and
$P(Y \ge 1) = 1 - P(Y = 0) = 1 - \binom{5}{0}(0.10)^0(0.90)^5 = 1 - (0.90)^5$
$= 1 - 0.59049 = 0.40951.$
7.33 Let Y = number of items containing at least one defect. Then Y has a binomial distribution with
parameters n = 10, p = 0.10 + 0.05 = 0.15.
a $P(Y = 2) = \binom{10}{2}(0.15)^2(1 - 0.15)^{10-2} = 45(0.15)^2(0.85)^8 = 0.2759$
b $P(Y \ge 1) = 1 - P(Y = 0) = 1 - \binom{10}{0}(0.15)^0(0.85)^{10} = 1 - (0.85)^{10}$
$= 1 - 0.19687 = 0.80313$
$= \frac{r^2 + r - rp}{p^2}$
$V(Y) = E(Y^2) - [E(Y)]^2 = \frac{r^2 + r - rp}{p^2} - \left(\frac{r}{p}\right)^2 = \frac{r(1 - p)}{p^2}.$
7.37 Let $\mu_i$, $\sigma_i^2$ denote the mean and variance of $X_i$, i = 1, 2. Then
$M_Y(t) = E\left(e^{t(aX_1 + bX_2)}\right) = E\left(e^{taX_1}\right)E\left(e^{tbX_2}\right) = M_{X_1}(ta)M_{X_2}(tb)$
$= e^{ta\mu_1 + (ta)^2\sigma_1^2/2}\,e^{tb\mu_2 + (tb)^2\sigma_2^2/2} = e^{t(a\mu_1 + b\mu_2) + t^2(a^2\sigma_1^2 + b^2\sigma_2^2)/2}$
which is the moment-generating function of a normally distributed random variable with mean
$\mu_Y = a\mu_1 + b\mu_2$, and variance $\sigma_Y^2 = a^2\sigma_1^2 + b^2\sigma_2^2$.
7.39 Note that $X_1$ has a normal distribution with parameters $\mu_1 = 5{,}000$, $\sigma_1^2 = (300)^2$, and $X_2$
has a normal distribution with parameters $\mu_2 = 4{,}000$, $\sigma_2^2 = (400)^2$. From Exercise 5.38,
$Y = X_2 - X_1$ has a normal distribution with parameters $\mu_Y = \mu_2 - \mu_1 = 4{,}000 - 5{,}000 = -1{,}000$,
$\sigma_Y^2 = \sigma_1^2 + \sigma_2^2 = (300)^2 + (400)^2 = 250{,}000 = (500)^2$.
$P(\text{overload}) = P(X_2 > X_1) = P(X_2 - X_1 > 0) = P(Y > 0)$
$= P\left(\frac{Y - (-1{,}000)}{500} > \frac{0 - (-1{,}000)}{500}\right) = P(Z > 2) = 0.5 - 0.4772 = 0.0228$
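b $P\left(X_2 \ge \frac{1}{4} \,\Big|\, X_1 = \frac{1}{2}\right) = \int_{1/4}^{\infty} f\left(x_2 \,\Big|\, x_1 = \frac{1}{2}\right)dx_2 = \int_{1/4}^{1/2}\frac{1}{1/2}\,dx_2 = \frac{1}{2}$
c $f(x_1 \mid x_2) = \frac{f(x_1, x_2)}{f(x_2)} = \begin{cases} \frac{1/x_1}{\int_{x_2}^{1}\frac{1}{x_1}\,dx_1} = -\frac{1}{x_1\ln x_2}, & 0 \le x_2 \le x_1 \le 1 \\ 0, & \text{otherwise} \end{cases}$
So
$P\left(X_1 \ge \frac{1}{2} \,\Big|\, X_2 = \frac{1}{4}\right) = \int_{1/2}^{1} -\frac{1}{x_1\ln\frac{1}{4}}\,dx_1 = \frac{1}{\ln 4}\left(\ln x_1\Big|_{1/2}^{1}\right) = \frac{\ln 2}{\ln 4}$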
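7.47 a $f(x_1) = \begin{cases} \int_0^1 (x_1 + x_2)\,dx_2 = x_1 + \frac{1}{2}, & 0 \le x_1 \le 1 \\ 0, & \text{otherwise} \end{cases}$
$f(x_2) = \begin{cases} \int_0^1 (x_1 + x_2)\,dx_1 = x_2 + \frac{1}{2}, & 0 \le x_2 \le 1 \\ 0, & \text{otherwise} \end{cases}$
b $f(x_1)f(x_2) = \left(x_1 + \frac{1}{2}\right)\left(x_2 + \frac{1}{2}\right) \ne f(x_1, x_2) = x_1 + x_2$, for $0 \le x_1, x_2 \le 1$. Therefore, $X_1$ and $X_2$ are not independent.
c $f(x_1 \mid x_2) = \frac{f(x_1, x_2)}{f(x_2)} = \begin{cases} (x_1 + x_2)/\left(x_2 + \frac{1}{2}\right), & 0 \le x_1, x_2 \le 1 \\ 0, & \text{otherwise} \end{cases}$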
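7.53 a $$E(X_1 X_2) = \int_0^1\!\!\int_0^1 x_1 x_2 (x_1 + x_2)\, dx_1\, dx_2 = \int_0^1 x_2\left(\frac{1}{3} + \frac{x_2}{2}\right) dx_2 = \frac{1}{6} + \frac{1}{6} = \frac{1}{3}$$
$$E(X_i) = \int_0^1 x_i\left(x_i + \frac{1}{2}\right) dx_i = \frac{1}{3} + \frac{1}{4} = \frac{7}{12} \quad \text{for } i = 1, 2$$
$$\mathrm{Cov}(X_1, X_2) = E(X_1 X_2) - E(X_1)E(X_2) = \frac{1}{3} - \left(\frac{7}{12}\right)\left(\frac{7}{12}\right) = -\frac{1}{144}$$
b $$E(3X_1 - 2X_2) = 3E(X_1) - 2E(X_2) = 3\left(\frac{7}{12}\right) - 2\left(\frac{7}{12}\right) = \frac{7}{12}$$
c Note that
$$E(X_i^2) = \int_0^1 x_i^2\left(x_i + \frac{1}{2}\right) dx_i = \frac{1}{4} + \frac{1}{6} = \frac{5}{12}, \quad i = 1, 2$$
$$V(X_i) = \frac{5}{12} - \left(\frac{7}{12}\right)^2 = \frac{11}{144}, \quad i = 1, 2$$
so we have
$$V(3X_1 - 2X_2) = 9V(X_1) + 4V(X_2) + 2(3)(-2)\,\mathrm{Cov}(X_1, X_2) = 9\left(\frac{11}{144}\right) + 4\left(\frac{11}{144}\right) - 12\left(-\frac{1}{144}\right) = \frac{155}{144} = 1.0764.$$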
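The moments in 7.53 can be checked with exact rational arithmetic. The sketch below assumes the joint density f(x1, x2) = x1 + x2 on the unit square (as in 7.47); the helper name `moment` is introduced here only for illustration.

```python
from fractions import Fraction

def moment(a, b):
    """E(X1^a * X2^b) for the density f(x1, x2) = x1 + x2 on [0, 1]^2."""
    # Integrate x1^(a+1) x2^b and x1^a x2^(b+1) term by term over the unit square.
    return Fraction(1, (a + 2) * (b + 1)) + Fraction(1, (a + 1) * (b + 2))

mean = moment(1, 0)                      # E(X1) = E(X2) by symmetry
var = moment(2, 0) - mean ** 2           # V(X1) = V(X2)
cov = moment(1, 1) - mean * mean         # Cov(X1, X2)
var_combo = 9 * var + 4 * var + 2 * 3 * (-2) * cov   # V(3*X1 - 2*X2)
print(mean, var, cov, var_combo)
```

Using `Fraction` avoids rounding, so the results can be compared directly with the fractions in the solution.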
7.55 Let X = number of defectives selected. Then X|p has a binomial distribution with parameters
n = 3, p, and
$$P(X = x \mid p) = \binom{n}{x} p^x (1 - p)^{n-x}$$
$$P(X = x, p) = \binom{n}{x} p^x (1 - p)^{n-x}$$
$$P(X = x) = \int_0^1 \binom{n}{x} p^x (1 - p)^{n-x}\, dp.$$
Therefore,
$$P(X = 2) = \int_0^1 \binom{3}{2} p^2 (1 - p)^{3-2}\, dp = \int_0^1 3p^2(1 - p)\, dp = \int_0^1 3(p^2 - p^3)\, dp = 3\left(\frac{1}{3} - \frac{1}{4}\right) = \frac{1}{4}.$$
7.57 Let G = net daily gain = X − Y. Then
$$E(G) = E(X - Y) = E(X) - E(Y) = \mu - \alpha\beta = 50 - 4(2) = 42$$
$$V(G) = V(X) + V(Y) - 2\,\mathrm{Cov}(X, Y) = V(X) + V(Y) = \sigma^2 + \alpha\beta^2 = 10 + 4(2)^2 = 26$$
and, using Tchebysheff's theorem,
$$P(G > 70) = P[G - E(G) \ge 70 - E(G)] \le P\left(|G - E(G)| \ge \frac{70 - E(G)}{\sqrt{V(G)}} \times \sqrt{V(G)}\right) \le \left(\frac{\sqrt{V(G)}}{70 - E(G)}\right)^2 = \frac{26}{(70 - 42)^2} = \frac{26}{784} = 0.03.$$
Therefore, it is unlikely that her net gain for tomorrow will exceed $70.
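7.59 We are given that $f(x_2 \mid X_1 = x_1) = \frac{1}{x_1}$ for $0 \le x_2 \le x_1 \le 1$. Then
$$E(X_2 \mid X_1 = x_1) = \int_{-\infty}^{\infty} x_2 f(x_2 \mid x_1)\, dx_2 = \int_0^{x_1} x_2 \frac{1}{x_1}\, dx_2 = \frac{x_1}{2}.$$
Therefore,
$$E\left(X_2 \,\middle|\, X_1 = \frac{3}{4}\right) = \frac{3}{8}.$$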
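7.61 $$E(X) = E\left(\sum_{i=1}^{r} X_i\right) = \sum_{i=1}^{r} E(X_i) = \sum_{i=1}^{r} \frac{1}{p} = \frac{r}{p}$$
and, since the $X_i$'s are independent, we have
$$V(X) = V\left(\sum_{i=1}^{r} X_i\right) = \sum_{i=1}^{r} V(X_i) = \sum_{i=1}^{r} \frac{1 - p}{p^2} = \frac{r(1 - p)}{p^2}.$$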
Chapter 8: Statistics, Sampling Distributions and Control Charts
b $\bar{x} = 19.96/12 = 1.6633$
$s^2 = (35.7092 - (19.96)^2/12)/11 = 0.2281$
c Using Tchebysheff's theorem with k = 2, we estimate µ and σ by $\bar{x}$ and s, respectively, to get
$(\bar{x} - 2s, \bar{x} + 2s) = (0.7081, 2.6185)$.
8.5
Sample      Median
sample 1    111.4
sample 2    323.2
sample 3    169.3
sample 4    129.9
The data for single-family housing prices are right-skewed, so the median may provide
a better estimate of the “center” of the data than the mean.
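8.15 $$P\left(\sum_{i=1}^{50} Y_i > 200\right) = P\left(\bar{Y} > \frac{200}{50}\right) = P(\bar{Y} > 4) = P\left(\frac{\sqrt{n}(\bar{Y} - \mu)}{\sigma} > \frac{\sqrt{50}(4 - \mu)}{2}\right) \approx P\left(Z > \frac{\sqrt{50}(4 - \mu)}{2}\right) = 0.95$$
This equation is true for $\frac{\sqrt{50}(4 - \mu)}{2} = -1.645$, i.e.,
$$\mu = 4 + \frac{(1.645)(2)}{\sqrt{50}} = 4.4653$$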
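The value of µ in 8.15 can be recomputed numerically. This is a minimal sketch using the standard library's `statistics.NormalDist`; the variable names are illustrative, and the quantile is computed rather than read from a table, so the last digit may differ slightly from 4.4653.

```python
from math import sqrt
from statistics import NormalDist

n, sigma, cutoff = 50, 2.0, 4.0          # values taken from the solution to 8.15
z = NormalDist().inv_cdf(0.05)           # lower 5% point, about -1.645
mu = cutoff - z * sigma / sqrt(n)        # solves sqrt(n) * (4 - mu) / sigma = z

# Sanity check: with this mu, P(Ybar > 4) should be 0.95 under the normal approximation.
check = 1 - NormalDist(mu, sigma / sqrt(n)).cdf(cutoff)
print(round(mu, 4), round(check, 4))
```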
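8.17 $$P\left(\sum_{i=1}^{n} X_i < 120\right) = P\left(\frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} < \frac{\sqrt{n}((120/n) - 1.5)}{1}\right) \approx P\left(Z < \frac{\sqrt{n}((120/n) - 1.5)}{1}\right)$$
This probability equals 0.1 for $\frac{\sqrt{n}((120/n) - 1.5)}{1} = -1.28$, since
P(Z < −1.28) = 0.1.
Multiplying through by $\sqrt{n}$ and rearranging, we have $1.5n - 1.28\sqrt{n} - 120 = 0$. Using the quadratic formula in $\sqrt{n}$, we have:
$$\sqrt{n} = \frac{1.28 \pm \sqrt{(-1.28)^2 - 4(1.5)(-120)}}{2(1.5)} = -8.5278 \text{ or } 9.3811$$
Using the positive root, we have $n = (9.3811)^2 = 88.0052 \approx 88$.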
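The quadratic in √n from 8.17 can be solved and checked in a few lines. This is a sketch with illustrative variable names; it assumes µ = 1.5 and σ = 1 per observation, as used in the solution.

```python
from math import sqrt
from statistics import NormalDist

# 1.5*n - 1.28*sqrt(n) - 120 = 0 is a quadratic a*u^2 + b*u + c = 0 in u = sqrt(n).
a, b, c = 1.5, -1.28, -120.0
disc = b * b - 4 * a * c
roots = [(-b + sign * sqrt(disc)) / (2 * a) for sign in (1, -1)]
root_n = max(roots)                      # sqrt(n) must be positive
n = round(root_n ** 2)

# Plug n back in: the sum is approximately normal with mean 1.5*n and sd sqrt(n).
prob = NormalDist(1.5 * n, sqrt(n)).cdf(120)
print(n, round(prob, 3))
```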
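8.19 $$P(|\bar{X} - \bar{Y} - (\mu_1 - \mu_2)| \le 0.04) = P\left(\frac{|\bar{X} - \bar{Y} - (\mu_1 - \mu_2)|}{\sqrt{(\sigma_1^2/n_1) + (\sigma_2^2/n_2)}} \le \frac{0.04}{\sqrt{(0.01/n) + (0.02/n)}}\right) = P\left(|Z| \le \frac{\sqrt{n}(0.04)}{\sqrt{0.03}}\right)$$
This probability equals 0.90 for $\frac{\sqrt{n}(0.04)}{\sqrt{0.03}} = 1.645$, i.e.,
$$n = 0.03\left(\frac{1.645}{0.04}\right)^2 = 50.74 \approx 51$$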
8.23 a The t-score for a sample mean resistance of 202 would be:
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{202 - 200}{10/\sqrt{15}} = 0.775$$
The t-score for a sample mean resistance of 199 would be:
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{199 - 200}{10/\sqrt{15}} = -0.387$$
With 15 − 1 = 14 degrees of freedom, our problem then becomes:
$$P(199 < \bar{X} < 202) = P(-0.387 < T < 0.775) = 0.4221$$
b If the total resistance of the 15 resistors is 5100 ohms, then the mean resistance would be
5100/15 = 340.
The t-score for a sample mean resistance of 340 would be:
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{340 - 200}{10/\sqrt{15}} = 54.22$$
With 14 degrees of freedom, our problem then becomes: $P(\bar{X} < 340) = P(T < 54.22) \approx 1$
8.25 a Assuming that the population of downtimes each day is normally distributed, the t-score for
a mean downtime of 5 hours would be:
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{5 - 4}{0.8/\sqrt{20}} = 5.59$$
The t-score for a mean downtime of 1 hour would be:
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{1 - 4}{0.8/\sqrt{20}} = -16.77$$
With 20 − 1 = 19 degrees of freedom, our problem then becomes:
$$P(1 < \bar{X} < 5) = P(-16.77 < T < 5.59) \approx 1$$
b If the total downtime over 20 days is 115 hours, then the mean downtime is 115/20 = 5.75. The
t-score for a mean downtime of 5.75 hours would be:
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{5.75 - 4}{0.8/\sqrt{20}} = 9.78$$
With 19 degrees of freedom, our problem then becomes: $P(\bar{X} < 5.75) = P(T < 9.78) \approx 1$
c We must assume that the population of downtimes each day is normally distributed and that the
20 days are chosen randomly (as opposed to consecutively, etc.).
8.29 Let Y = number of customers out of the sample of 40 who make a purchase. Then Y has a
binomial distribution with parameters n = 40, p = 0.3. Let X be a normally distributed random
variable with parameters:
$\mu = np = 40(0.3) = 12$ and $\sigma^2 = np(1 - p) = 40(0.3)(0.7) = 8.4$
$$P(Y \ge 15) \approx P(X \ge 14.5) = P\left(\frac{X - \mu}{\sigma} \ge \frac{14.5 - 12}{\sqrt{8.4}}\right) = P(Z \ge 0.86) = 0.5 - 0.3051 = 0.1949$$
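To see how close the continuity-corrected normal approximation in 8.29 is, it can be compared with the exact binomial tail. The following is a sketch using only the standard library; the names `approx` and `exact` are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, k = 40, 0.3, 15                    # sample size, purchase probability, threshold
mu, sigma = n * p, sqrt(n * p * (1 - p))

# Normal approximation with continuity correction: P(Y >= 15) ~ P(X >= 14.5).
approx = 1 - NormalDist(mu, sigma).cdf(k - 0.5)

# Exact binomial tail probability P(Y >= 15).
exact = sum(comb(n, y) * p**y * (1 - p)**(n - y) for y in range(k, n + 1))
print(round(approx, 4), round(exact, 4))
```

The approximation differs slightly from the table-based 0.1949 because the z-score is not rounded to two decimals here.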
8.31 Let C = capacitor capacitance. Then
$$P(C < 50) = P\left(Z < \frac{50 - 53}{2}\right) = P(Z < -1.5) = 0.0668$$
Let Y = the number of capacitors with capacitances below 50 µF out of the sample of 64. Then
Y has a binomial distribution with parameters n = 64, p = P(C < 50) = 0.0668. Let X be a
normally distributed random variable with parameters:
$\mu = np = 64(0.0668) = 4.2752$ and $\sigma^2 = np(1 - p) = 64(0.0668)(0.9332) = 3.9896$
$$P(Y \ge 12) \approx P(X \ge 11.5) = P\left(\frac{X - \mu}{\sigma} \ge \frac{11.5 - 4.2752}{\sqrt{3.9896}}\right) = P(Z \ge 3.62) \approx 0$$
8.33 a Let Y = number of right-turning vehicles out of the sample of 500. Then Y has a binomial distribution
with parameters n = 500, p = 1/3. Let X be a normally distributed random variable with
parameters:
$\mu = np = 500/3$ and $\sigma^2 = np(1 - p) = 500(1/3)(2/3) = 1000/9$
$$P(Y \le 150) \approx P(X \le 150.5) = P\left(\frac{X - \mu}{\sigma} \le \frac{150.5 - 500/3}{\sqrt{1000/9}}\right) = P(Z \le -1.53) = 0.5 - 0.4370 = 0.0630$$
b Let U = number of vehicles out of the sample of 500 that proceed straight ahead. Then U has
the same distribution as Y in part (a), so:
P(at least 350 turn) = P(U ≤ 150) = P(Y ≤ 150) = 0.0630
8.35 Let Y = number of bids the firm wins out of the sample of 25. Then Y has a binomial distribution
with parameters n = 25 and p = 0.6. Let X be a normally distributed random variable with
parameters:
$\mu = np = 25(0.6) = 15$ and $\sigma^2 = np(1 - p) = 25(0.6)(0.4) = 6$
a $$P(Y \ge 20) \approx P(X \ge 19.5) = P\left(\frac{X - \mu}{\sigma} \ge \frac{19.5 - 15}{\sqrt{6}}\right) = P(Z \ge 1.84) = 0.5 - 0.4671 = 0.0329$$
b $P(Y \ge 20) = 1 - F(19) = 1 - 0.971 = 0.029$
c The assumption that contracts are awarded independently of each other is necessary for the
answers in (a) and (b) to be valid.
8.53 Because the sample sizes are large, the sampling distributions of the mean trial times for operators A and
B will be approximately normal. Thus, the difference in the sample means will be approximately
normally distributed with a mean of $\mu_1 - \mu_2 = 15 - 15 = 0$ and a standard deviation of:
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{2^2}{75} + \frac{2^2}{75}} = 0.3266$$
The z-score for a difference in sample means of $\bar{x}_1 - \bar{x}_2 = 5$ seconds would be:
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}} = \frac{5 - 0}{0.3266} = 15.31$$
Our problem then becomes: $P(\bar{X}_1 - \bar{X}_2 > 5) = P(Z > 15.31) \approx 0$
8.65 Since λ is both the mean and the variance of the Poisson distribution, control limits for λ may
be viewed simultaneously as limits for the mean or the variance. Hence, a separate method is not
necessary.
Chapter 9: Estimation
9.15 We need to estimate the percent of shrinkage to within B = 0.2 with confidence coefficient
1 − α = 0.98. From Exercise 7.11, the standard deviation is given as 1.2. Using this information,
we get
$n = (z_{0.01}\,\sigma/B)^2 \approx (2.33(1.2)/0.2)^2 = (13.98)^2 = 195.44$ or n = 196.
9.17 The sample size for estimating the proportion of failing resistors is desired with B = 0.05 and
1 − α = 0.95. From Exercise 7.15, we estimate p to be approximately 0.12. Then,
$n = \left(z_{0.025}\sqrt{p(1 - p)}/B\right)^2 \approx \left(1.96\sqrt{(0.12)(0.88)}/0.05\right)^2 = 162.27$ or n = 163.
9.19 To estimate the proportion of cracked supports to within B = 0.1 with 1 − α = 0.98 and p
estimated by 0.4, we have
$n = \left(z_{0.01}\sqrt{p(1 - p)}/B\right)^2 = \left(2.33\sqrt{(0.4)(0.6)}/0.1\right)^2 = 130.29$ or n = 131.
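The sample-size calculations in 9.15 through 9.19 follow one pattern: square the ratio of the critical value times the standard deviation to the bound, then round up. A minimal sketch, with hypothetical helper names and the table values of z passed in explicitly:

```python
from math import ceil, sqrt

def n_for_mean(z, sigma, bound):
    """Smallest n with z * sigma / sqrt(n) <= bound."""
    return ceil((z * sigma / bound) ** 2)

def n_for_proportion(z, p, bound):
    """Smallest n with z * sqrt(p * (1 - p) / n) <= bound."""
    return ceil((z * sqrt(p * (1 - p)) / bound) ** 2)

n_915 = n_for_mean(2.33, 1.2, 0.2)           # Exercise 9.15
n_917 = n_for_proportion(1.96, 0.12, 0.05)   # Exercise 9.17
n_919 = n_for_proportion(2.33, 0.4, 0.1)     # Exercise 9.19
print(n_915, n_917, n_919)
```

Rounding up (rather than to the nearest integer) guarantees the bound is met.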
9.21 From the LC50 measurements, we find $\sum_{i=1}^{n} x_i = 108$, $\sum_{i=1}^{n} x_i^2 = 1{,}426$, and n = 12. Thus,
$\bar{x} = \frac{108}{12} = 9$ and $s = \left[\frac{1426 - (108)^2/12}{11}\right]^{1/2} = 6.4244$. With 11 degrees of freedom, a 90%
9.25 A 95% confidence interval for the population variance of LC50 measurements, where from Exercise
6.20, n = 12 and $s^2 = 41.2727$, is
$$\left(\frac{(n - 1)s^2}{\chi^2_{0.025}(11)}, \frac{(n - 1)s^2}{\chi^2_{0.975}(11)}\right) = \left(\frac{11(41.2727)}{21.9200}, \frac{11(41.2727)}{3.81575}\right) = (20.7117, 118.9805).$$
The chi-square values are taken from the $\chi^2$ table, with 11 degrees of freedom.
9.27 Assuming a normal population of TSP measurements, then, with $\bar{x} = 72$, s = 23, and n = 9, a
95% confidence interval for the true mean TSP is
$$\bar{x} \pm t_{0.025}\, s/\sqrt{n} = 72 \pm 2.306(23)/\sqrt{9} = (54.3207, 89.6793).$$
The t-table is used to find $t_{0.025}$ with 8 degrees of freedom.
9.29 Given $\bar{x} = 585$, s = 38, and n = 12, we seek a 95% confidence interval for the mean tensile
strength. Assuming a normal population of tensile strength measurements and using the t-table (11 degrees of freedom), we
find
$$\bar{x} \pm t_{0.025}\, s/\sqrt{n} = 585 \pm 2.201(38)/\sqrt{12} = (560.8558, 609.1442).$$
9.31 From Exercise 7.29, $s^2 = 59117.75$ and n = 9. Then, assuming a normal population and using the $\chi^2$
table, we compute a 90% confidence interval for the variance of the number of cycles to failure as
$$\left(\frac{(n - 1)s^2}{\chi^2_{0.05}(8)}, \frac{(n - 1)s^2}{\chi^2_{0.95}(8)}\right) = \left(\frac{8(59117.75)}{15.5073}, \frac{8(59117.75)}{2.73264}\right) = (30{,}498.0235,\ 173{,}071.4620).$$
9.33 No, the statement refers to a particular confidence interval. The probability that this particular
confidence interval contains the true population proportion, p, is 1 (if it in fact does contain p)
or 0 (if it does not contain p). The correct interpretation is that, in repeated sampling, 95% of
similarly constructed confidence intervals will contain the true population proportion of Americans
that list football as their favorite sport.
= (−0.0406, −0.0194).
If coating B were superior in inhibiting corrosion, then it should have shallower pit depths. Thus,
µA − µB would be positive. Since the confidence interval includes negative values (in fact the
entire interval is composed of negative values), we cannot conclude that coating B is better than
coating A.
9.39 The largest possible variance would correspond to $p_1 = p_2 = 0.5$. Then we require
(half the width of confidence interval) ≤ Bound = B,
$$1.96\sqrt{\frac{(0.5)(0.5)}{n} + \frac{(0.5)(0.5)}{n}} \le 0.1.$$
Solving for n yields
$$n \ge (1.96)^2\, \frac{(0.5)(0.5) + (0.5)(0.5)}{(0.1)^2} = 192.08, \text{ so that } n = 193.$$
9.41 For a 95% confidence interval for the difference between the population means, first we compute
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(8)(5.88)^2 + (6)(7.68)^2}{9 + 7 - 2} = 45.0350$$
and $s_p = 6.7108$ so that with degrees of freedom = 14, the required interval is
$$\bar{y}_1 - \bar{y}_2 \pm t_{0.025}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 43.71 - 39.63 \pm 2.145(6.7108)\sqrt{\frac{1}{9} + \frac{1}{7}} = (-3.1742, 11.3342).$$
Using the F table, a 90% confidence interval for the ratio of the true variances is
$$\left(\frac{s_2^2}{s_1^2}\,\frac{1}{F_{0.05}(6, 8)},\ \frac{s_2^2}{s_1^2}\, F_{0.05}(8, 6)\right) = \left(\frac{(7.68)^2}{(5.88)^2}\,\frac{1}{3.58},\ \frac{(7.68)^2}{(5.88)^2}(4.15)\right) = (0.4765, 7.0797).$$
If intermittent training gives more variable results, then $\sigma_2^2/\sigma_1^2 > 1$. Since this confidence interval
includes the value 1, we cannot conclude that intermittent training gives more variable results.
9.43 First we compute $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{2(0.02)^2 + 2(0.07)^2}{3 + 3 - 2} = 0.00265$ so that
$s_p = 0.05148$. Assuming normal populations with equal variance, a 95% confidence interval for
the difference between mean impulses for the two rackets is
$$\bar{x}_1 - \bar{x}_2 \pm t_{0.025}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \quad (\text{degrees of freedom} = 4)$$
$$= 2.41 - 2.22 \pm 2.776(0.05148)\sqrt{\frac{1}{3} + \frac{1}{3}} = (0.0733, 0.3067).$$
9.45 We compute $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{14(0.04)^2 + 6(0.05)^2}{15 + 7 - 2} = 0.00187$ with
$s_p = 0.04324$. Then a 98% confidence interval for the difference between true mean H/C ratios
for altered and partly altered bitumen is
$$\bar{x}_1 - \bar{x}_2 \pm t_{0.01}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \quad (\text{degrees of freedom} = 20)$$
$$= 1.02 - 1.16 \pm 2.528(0.04324)\sqrt{\frac{1}{15} + \frac{1}{7}} = (-0.1900, -0.08996).$$
9.47 From the given data, $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{39(0.14)^2 + 19(0.04)^2}{40 + 20 - 2} = 0.01370$ so that
$s_p = 0.1171$. Assuming normal populations and equal variances, we compute a 98% confidence
interval for the difference between mean HCl amounts as follows:
$$\bar{x}_1 - \bar{x}_2 \pm t_{0.01}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 1.26 - 1.40 \pm 2.33(0.1171)\sqrt{\frac{1}{40} + \frac{1}{20}} = (-0.2147, -0.0653).$$
9.49 With the aid of the F table, a 90% confidence interval for the ratio of variances in Exercise 7.47 is
$$\left(\frac{s_2^2}{s_1^2}\,\frac{1}{F_{0.05}(7, 6)},\ \frac{s_2^2}{s_1^2}\, F_{0.05}(6, 7)\right) = \left(\frac{(0.6990)^2}{(0.2059)^2}\,\frac{1}{4.21},\ \frac{(0.6990)^2}{(0.2059)^2}(3.87)\right) = (2.7375, 44.6018).$$
9.51 Let $\mu_1$ be the mean yield of 14mm rods and $\mu_2$ the mean yield of 10mm rods. We seek to estimate
$\mu_1 - \mu_2$, the increase in the mean yield stress, where
$$s_p^2 = \frac{50(17.2)^2 + 11(26.7)^2}{51 + 12 - 2} = 371.0457, \qquad s_p = 19.2625.$$
Assuming normal populations and equal variances, a 90% confidence interval with 61 degrees of
freedom is
$$\bar{x}_1 - \bar{x}_2 \pm t_{0.05}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 499 - 485 \pm 1.645(19.2625)\sqrt{\frac{1}{51} + \frac{1}{12}} = (3.834, 24.1666).$$
9.53 The sampling error represents half the width of a confidence interval. In this case,
$$z_{\alpha/2}\sqrt{\frac{(0.6)(0.4)}{1000} + \frac{(0.5)(0.5)}{1000}} = 0.045.$$
Thus $z_{\alpha/2} = 2.033$ so that $1 - \alpha \approx 2(0.4788) = 0.9576$. Hence, a sampling error of 0.045 corresponds
approximately to a 95% confidence interval for $p_1 - p_2$. An approximate 95% confidence
interval for $p_1 - p_2$ is
$$\hat{p}_1 - \hat{p}_2 \pm z_{0.025}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} = 0.6 - 0.5 \pm 1.96\sqrt{\frac{(0.6)(0.4)}{1000} + \frac{(0.5)(0.5)}{1000}} = (0.0566, 0.1434).$$
2πσ σ
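Instead, we maximize $\ln L(\mu, \sigma^2) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2}\sum_{i=1}^{n}\left(\frac{x_i - \mu}{\sigma}\right)^2$.
Then,
$$\left.\frac{\partial \ln L(\mu, \sigma^2)}{\partial \mu}\right|_{\mu = \hat{\mu},\ \sigma^2 = \hat{\sigma}^2} = \sum_{i=1}^{n}\frac{x_i - \hat{\mu}}{\hat{\sigma}^2} = 0$$
so that $\sum_{i=1}^{n} x_i - n\hat{\mu} = 0$ and hence $\hat{\mu} = \sum_{i=1}^{n}\frac{x_i}{n} = \bar{x}$. Next,
$$\left.\frac{\partial \ln L(\mu, \sigma^2)}{\partial \sigma^2}\right|_{\mu = \hat{\mu},\ \sigma^2 = \hat{\sigma}^2} = -\frac{n}{2}\,\frac{2\pi}{2\pi\hat{\sigma}^2} + \frac{1}{2\hat{\sigma}^4}\sum_{i=1}^{n}(x_i - \hat{\mu})^2 = 0$$
Simplifying,
$-n\hat{\sigma}^2 + \sum_{i=1}^{n}(x_i - \bar{x})^2 = 0$ and hence
$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n} = \frac{n - 1}{n}\, s^2.$$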
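The relationship between the maximum likelihood estimate of the variance (divisor n) and the sample variance s² (divisor n − 1) is easy to confirm numerically. The data below are hypothetical and serve only as an illustration.

```python
from statistics import mean, pvariance, variance

data = [2.1, 3.4, 1.9, 5.0, 4.2]         # hypothetical sample, not from the text
n = len(data)

mu_hat = mean(data)                      # MLE of mu is the sample mean
sigma2_hat = pvariance(data)             # divides by n: the MLE of sigma^2
s2 = variance(data)                      # divides by n - 1: the sample variance

# The two variance estimates differ only by the factor (n - 1) / n.
print(sigma2_hat, (n - 1) / n * s2)
```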
9.71 Since percentage points of the gamma distribution are not generally available, we will attempt
to express the distribution of $X_i$ (gamma (α = 2, β)) in terms of the chi-square distribution
for which tables are available. Recall that the moment-generating function for the chi-square (ν)
distribution is $(1 - 2t)^{-\nu/2}$.
Since $M_{X_i}(t) = (1 - \beta t)^{-\alpha} = \left(1 - 2\,\frac{t\beta}{2}\right)^{-2}$, we have $M_{X_i}\!\left(\frac{2t}{\beta}\right) = (1 - 2t)^{-2}$. But
$M_{X_i}\!\left(\frac{2t}{\beta}\right) = E\left(e^{(2t/\beta)X_i}\right) = E\left(e^{t(2X_i/\beta)}\right) = M_{2X_i/\beta}(t)$. This suggests defining $U_i = 2X_i/\beta$ so that
$M_{U_i}(t) = M_{X_i}\!\left(\frac{2t}{\beta}\right) = (1 - 2t)^{-\alpha} = (1 - 2t)^{-2\alpha/2}$. Hence $U_i$ is chi-square (ν = 2α = 2(2) = 4). The
sum of independent chi-square ($\nu_i$) random variables is again chi-square ($\nu = \sum_{i=1}^{n}\nu_i$). Thus, for
$X_1, \ldots, X_n$, a random sample from a gamma (α = 2, β) distribution,
$$\sum_{i=1}^{n} U_i = \frac{2}{\beta}\sum_{i=1}^{n} X_i = \frac{2n\bar{x}}{\beta}$$
The slope of L(θ) ≠ 0 for any θ > 0, so there is no point in taking the derivative of L(θ).
However, L(θ) increases as θ decreases to zero. Thus we take θ as small as possible. But since
$0 < X_i < \theta$, θ cannot be smaller than the largest value of $X_i$. Hence $\hat{\theta} = \max_{1 \le i \le 3} X_i$. In this case,
the maximum likelihood estimate of θ is $\hat{\theta} = \max_{1 \le i \le 3} X_i = 0.06$.
9.77 A 95% confidence interval for the mean hardness is (degrees of freedom = 14)
$$\bar{x} \pm \frac{t_{0.025}\, s}{\sqrt{n}} = 65 \pm \frac{2.145\sqrt{90}}{\sqrt{15}} = (59.7458, 70.2542).$$
9.79 a We compute $\bar{x} = \frac{23.85}{10} = 2.385$ and the standard deviation
$$s = \left[\frac{58.323 - (23.85)^2/10}{9}\right]^{1/2} = 0.4001.$$
Assuming a normal population, a 95% confidence interval for the mean pre-etch window
width is (degrees of freedom = 9)
$$\bar{x} \pm \frac{t_{0.025}\, s}{\sqrt{n}} = 2.385 \pm \frac{2.262(0.4001)}{\sqrt{10}} = (2.0988, 2.6712).$$
b We compute $\bar{x} = \frac{34.26}{10} = 3.426$ and the standard deviation
$$s = \left[\frac{120.5412 - (34.26)^2/10}{9}\right]^{1/2} = 0.5931.$$
Assuming a normal population, a 95% confidence interval for the mean post-etch
window width is
$$\bar{x} \pm \frac{t_{0.025}\, s}{\sqrt{n}} = 3.426 \pm \frac{2.262(0.5931)}{\sqrt{10}} = (3.0018, 3.8502).$$
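9.81 a $$\left(\frac{(n - 1)s^2}{\chi^2_{0.025}(n - 1)},\ \frac{(n - 1)s^2}{\chi^2_{0.975}(n - 1)}\right) = \left(\frac{9(0.5931)^2}{19.0228},\ \frac{9(0.5931)^2}{2.70039}\right) = (0.1664, 1.1724)$$
b $$\left(\frac{s_2^2}{s_1^2}\,\frac{1}{F_{0.05}(9, 9)},\ \frac{s_2^2}{s_1^2}\, F_{0.05}(9, 9)\right) = \left(\frac{(0.5931)^2}{(0.4001)^2}\,\frac{1}{3.18},\ \frac{(0.5931)^2}{(0.4001)^2}(3.18)\right) = (0.6910, 6.9879)$$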
c Assume both samples are independent and observations come from normal populations.
9.83 a We calculate $s = \left[\frac{2.6779 - (5.17)^2/10}{9}\right]^{1/2} = 0.02359$. Then a 95% confidence interval for the
population variance is
$$\left(\frac{(n - 1)s^2}{\chi^2_{0.025}(n - 1)},\ \frac{(n - 1)s^2}{\chi^2_{0.975}(n - 1)}\right) = \left(\frac{9(0.02359)^2}{19.0228},\ \frac{9(0.02359)^2}{2.70039}\right) = (0.000263, 0.001855).$$
b Assume that weight proportions are normally distributed and the samples are independent. The
independence of the samples and the normality of the weight proportions need to be verified.
9.85 Let m be the number of observations allocated to sample 1 and let n − m be the number allocated
to sample 2. We need to minimize the length of the confidence interval for $\mu_1 - \mu_2$ given by
$$2z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n - m}}.$$
Equivalently, we will choose m to minimize $V(\bar{x}_1 - \bar{x}_2) = \frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n - m}$. Taking the derivative of
V with respect to m and setting it to zero, we get
$$\frac{dV}{dm} = -\frac{\sigma_1^2}{m^2} + \frac{\sigma_2^2}{(n - m)^2} = 0.$$
Rearranging, we have $\sigma_1^2(n - m)^2 = m^2\sigma_2^2$ so that $m = \frac{n\sigma_1}{\sigma_1 + \sigma_2}$ and hence $n - m = \frac{n\sigma_2}{\sigma_1 + \sigma_2}$.
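The closed-form allocation in 9.85 can be checked against a brute-force search over integer allocations. The values of n, σ1, and σ2 below are hypothetical, chosen so that the optimum is an integer.

```python
def variance_of_difference(m, n, s1, s2):
    """V(xbar1 - xbar2) when m observations go to sample 1 and n - m to sample 2."""
    return s1**2 / m + s2**2 / (n - m)

n, s1, s2 = 90, 2.0, 1.0                 # hypothetical total sample size and sigmas

m_star = n * s1 / (s1 + s2)              # closed form from the solution
best_m = min(range(1, n), key=lambda m: variance_of_difference(m, n, s1, s2))
print(m_star, best_m)
```

The optimum allocates observations in proportion to the standard deviations, so the noisier population receives the larger sample.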
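9.87 A point estimate of $2\mu_1 + \mu_2$ is $2\bar{y} + \bar{x}$. Then
$$V(2\bar{y} + \bar{x}) = 4V(\bar{y}) + V(\bar{x}) = 4\left(\frac{\sigma^2}{n}\right) + \frac{3\sigma^2}{m} = \sigma^2\left(\frac{4}{n} + \frac{3}{m}\right).$$
An estimate of the common variance $\sigma^2$ is given by
$$s_p^2 = \frac{(n - 1)s_y^2 + \dfrac{(m - 1)s_x^2}{3}}{n + m - 2}.$$
A 95% confidence interval for $2\mu_1 + \mu_2$ is then
$$2\bar{y} + \bar{x} \pm t_{\alpha/2}\, s_p\sqrt{\frac{4}{n} + \frac{3}{m}}$$
where $s_p$ is the square root of $s_p^2$ defined above and degrees of freedom = n + m − 2.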
9.89 The number of defectives, X, is distributed binomial (n, p). The maximum likelihood estimate of
p for fixed n is $\hat{p} = \frac{x}{n}$. We seek the maximum likelihood estimate for $r = \frac{\text{number of defectives}}{\text{number of good items}}$.
Dividing numerator and denominator of r by the total number of items gives $r = \frac{p}{1 - p}$. Hence
r = g(p) is a function of p. The maximum likelihood estimate of r is then given by
$$\hat{r} = g(\hat{p}) = \frac{x/n}{1 - x/n} = \frac{x}{n - x}.$$
Chapter 10: Hypothesis Testing
10.19 Hypotheses: $H_0: \mu = 7$   $H_a: \mu \ne 7$
Test Statistics: $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \approx \frac{6.8 - 7}{0.9/\sqrt{30}} = -1.22$
Rejection Region: $|z| > z_{0.025} = 1.96$
Conclusion: Fail to reject $H_0$ at α = 0.05; i.e., there is insufficient evidence to conclude that
the mean pH is significantly different from 7 at α = 0.05.
P-value: $P(|Z| \ge 1.22) = 2P(Z \ge 1.22) = 2(0.5 - 0.3888) = 0.2224$
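The z statistic and two-sided P-value in 10.19 can be reproduced with the standard library. The function name is illustrative; because z is not rounded to two decimals before the lookup, the P-value differs from 0.2224 in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

def z_test_two_sided(xbar, mu0, sigma, n):
    """Return the z statistic and two-sided P-value for H0: mu = mu0."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p = z_test_two_sided(6.8, 7.0, 0.9, 30)   # numbers from Exercise 10.19
print(round(z, 2), round(p, 4))
```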
10.21 Hypotheses: $H_0: p \ge 0.9$   $H_a: p < 0.9$
Test Statistics: $z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{35/40 - 0.9}{\sqrt{(0.9)(0.1)/40}} = -0.53$
Rejection Region: $z < -z_{0.01} = -2.33$
Conclusion: Fail to reject $H_0$ at α = 0.01; i.e., there is insufficient evidence to conclude that
the specification is not being met at α = 0.01.
10.23 Summary Statistics: $\bar{x} = 14{,}510/6 = 2{,}418.33$
$s = \left[(35{,}121{,}500 - (14{,}510)^2/6)/5\right]^{1/2} = 79.3515$
Hypotheses: $H_0: \mu = 2500$   $H_a: \mu < 2500$
Test Statistics: Assuming a normal population, $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{2{,}418.33 - 2{,}500}{79.3515/\sqrt{6}} = -2.52$
Rejection Region: $t < -t_{0.01} = -3.365$ (degrees of freedom = 5)
Conclusion: Fail to reject $H_0$ at α = 0.01; i.e., there is insufficient evidence to conclude that
the mean range of the rockets is less than 2500 after storage.
10.25 Hypotheses: $H_0: \mu = 30$   $H_a: \mu \ne 30$
Test Statistics: Assuming that the stress resistance measurements are normally distributed,
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{27.4 - 30}{1.1/\sqrt{10}} = -7.47$$
Rejection Region: $|t| > t_{0.025} = 2.262$ (degrees of freedom = 9)
Conclusion: Reject $H_0$ at α = 0.05; i.e., there is sufficient evidence to doubt the specification
for stress resistance of the plastic at α = 0.05.
10.27 Summary Statistics: $\bar{x} = 34.26/10 = 3.426$
$s = \left[(120.5412 - (34.26)^2/10)/9\right]^{1/2} = 0.5931$
Hypotheses: $H_0: \mu = 3.5$   $H_a: \mu \ne 3.5$
Test Statistics: Assuming the post-etch window widths are normally distributed,
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{3.426 - 3.5}{0.5931/\sqrt{10}} = -0.3945$$
Rejection Region: $|t| > t_{0.025} = 2.262$ (degrees of freedom = 9)
Conclusion: Fail to reject $H_0$ at α = 0.05; i.e., there is insufficient evidence to conclude that
the specifications are being violated at α = 0.05.
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{6.3889 - 6.5}{0.4428/\sqrt{9}} = -0.753$$
Rejection Region: |t| > t0.025 = 2.306 (degrees of freedom = 8)
Conclusion: Fail to reject H0 at α = 0.05; i.e., there is insufficient evidence to conclude the
mean pH is different from the claimed value of 6.5 at α = 0.05.
10.35 Hypotheses: $H_0: \sigma^2 \le 100$   $H_a: \sigma^2 > 100$
Test Statistics: $\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2} = \frac{14(12)^2}{100} = 20.16$
Rejection Region: $\chi^2 > \chi^2_{0.05} = 23.6848$ (degrees of freedom = 14)
Conclusion: Fail to reject $H_0$ at α = 0.05; i.e., there is insufficient evidence to conclude that
the standard deviation of haul times exceeds the claimed value of 10 minutes at α = 0.05.
10.53 Since the brands of gasoline are common to both automobiles, we block on the brands and analyze
the experiment as a paired sample experiment. We compute the differences for auto A minus auto
B: −0.9, −1, 0.9, 0.7, −0.2.
Summary Statistics: $\bar{x}_D = -0.1$, $s_D = 0.8803$, n = 5
Hypotheses: $H_0: \mu_D = 0$   $H_a: \mu_D \ne 0$
Test Statistics: Assuming a normal population of differences in gas mileage,
$$t = \frac{-0.1}{0.8803/\sqrt{5}} = -0.254.$$
Rejection Region: $|t| > t_{0.025} = 2.776$ (degrees of freedom = 4)
Conclusion: Fail to reject H0 at α = 0.05; i.e., there is insufficient evidence to conclude a
significant difference between mean mileage figures for the two automobiles at α = 0.05.
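The paired-sample computation in 10.53 reduces to a one-sample t statistic on the differences. A minimal sketch, using the differences listed in the solution and the tabled critical value 2.776 (hard-coded because the standard library has no t quantile function):

```python
from math import sqrt
from statistics import mean, stdev

diffs = [-0.9, -1.0, 0.9, 0.7, -0.2]     # auto A minus auto B, from the solution
n = len(diffs)
d_bar = mean(diffs)                      # mean difference
s_d = stdev(diffs)                       # sample standard deviation of differences
t = d_bar / (s_d / sqrt(n))

reject = abs(t) > 2.776                  # t_{0.025} with 4 df, from the t-table
print(round(d_bar, 4), round(s_d, 4), round(t, 3), reject)
```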
10.55 Since the type of powder is common to both procedures, we block on powder type and analyze
the experiment as a paired sample experiment. The differences for procedure I minus procedure
II are: −2, 1, −3, −2, 1, 3.
Summary Statistics: $\bar{x}_D = -1/3$, $s_D = 2.3381$, n = 6
Hypotheses: $H_0: \mu_D = 0$   $H_a: \mu_D \ne 0$
Test Statistics: Assuming a normal population for the porosity differences,
$$t = \frac{-0.3333}{2.3381/\sqrt{6}} = -0.349.$$
Rejection Region: $|t| > t_{0.025} = 2.571$ (degrees of freedom = 5)
Conclusion: Fail to reject H0 at α = 0.05; i.e., there is insufficient evidence to conclude that
the mean porosities of the two procedures significantly differ at α = 0.05.
10.57 Hypotheses: $H_0: \sigma_1^2 = \sigma_2^2$   $H_a: \sigma_1^2 \ne \sigma_2^2$
Test Statistics: Assuming normal populations,
$F = s_1^2/s_2^2 = (79.3515)^2/(76.7898)^2 = 1.07.$
Rejection Region: $F > F_{0.05}(5, 5) = 5.05$ or $F < 1/F_{0.05}(5, 5) = 1/5.05 = 0.198$
Conclusion: Fail to reject $H_0$ at α = 0.10; i.e., there is insufficient evidence to conclude that the
variance among range measurements significantly differs for the two storage methods at α = 0.10.
10.59 Hypotheses: $H_0: \mu_d = 0$ and $H_a: \mu_d > 0$
Since the data are paired, we should calculate the difference between each pair of measurements:
217 − 95 = 122
252 − 107 = 145
269 − 109 = 160
271 − 113 = 158
291 − 115 = 176
291 − 118 = 173
291 − 118 = 173
293 − 119 = 174
311 − 119 = 192
Total 1473
From this new data set we get:
$\bar{x}_d = 163.67$   $s_d = 20.51$
Assuming that the population of difference measurements is normally distributed, we can calculate
the t-score for our data:
$$t = \frac{\bar{x}_d - \mu_d}{s_d/\sqrt{n}} = \frac{163.67 - 0}{20.51/\sqrt{9}} = 23.94$$
With 9 − 1 = 8 degrees of freedom our test statistic has a p-value of $p = P(T > 23.94) \approx 0$.
Because the p-value ≈ 0 is smaller than α = 0.05, reject the null hypothesis, as there is sufficient
evidence to conclude that, at the 5% significance level, the mean difference in the number of FETI
iterations without reconjugation and those with reconjugation is greater than 0.
10.69 Hypotheses: $H_0: p_1 = p_2$   $H_a: p_1 \ne p_2$
Test Statistics: Observed and expected (values in parentheses) cell counts are given in the table
below.
                    Inspector
                    A          B          Totals
Top category        18 (19)    20 (19)    38
Lower category       7 (6)      5 (6)     12
Totals              25         25         50
$$\chi^2 = \frac{(18 - 19)^2}{19} + \cdots + \frac{(5 - 6)^2}{6} = 0.44$$
Rejection Region: $\chi^2 > \chi^2_{0.05} = 3.84146$ (degrees of freedom = (2 − 1)(2 − 1) = 1)
Conclusion: Fail to reject $H_0$ at α = 0.05; i.e., there is insufficient evidence to conclude that
the inspectors significantly differ in their assessments at α = 0.05.
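The expected counts and the chi-square statistic for a contingency table such as the one in 10.69 can be computed directly from the observed counts. The function below is an illustrative sketch, not a library routine.

```python
def chi_square_stat(table):
    """Pearson chi-square statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: (row total)(column total)/(grand total).
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

stat = chi_square_stat([[18, 20], [7, 5]])   # inspectors A and B, Exercise 10.69
print(round(stat, 2), stat > 3.84146)        # compare with chi-square(1) critical value
```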
10.71 a We can calculate the expected value for each entry:
                  Athletic Involvement
                  None                     1-3 Semesters              4+ Semesters
GPA Below Mean    (528)(426)/852 = 264     (219)(426)/852 = 109.5     (105)(426)/852 = 52.5
GPA Above Mean    (528)(426)/852 = 264     (219)(426)/852 = 109.5     (105)(426)/852 = 52.5
Hypotheses: $H_0$: GPA is independent of length of athletic involvement; and $H_a$: GPA is not
independent of length of athletic involvement.
Our test statistic can be calculated as:
$$X^2 = \sum_{j=1}^{c}\sum_{i=1}^{r}\frac{(X_{ij} - E(X_{ij}))^2}{E(X_{ij})} = \frac{(290 - 264)^2}{264} + \cdots + \frac{(63 - 52.5)^2}{52.5} = 13.71$$
With (2 − 1)(3 − 1) = 2 degrees of freedom, we can calculate the p-value of our test statistic:
$p = P(\chi^2(2) > 13.71) = 0.0011$
Because p = 0.0011 is less than α = 0.05, reject the null hypothesis, as there is sufficient evidence
to conclude that, at the 5% level of significance, GPA is not independent of the length of athletic
involvement.
b Hypotheses: $H_0: p_1 = p_2 = 0.5$ and $H_a: p_1 \ne p_2$
We can calculate our test statistic:
$$X^2 = \sum_{j=1}^{c}\sum_{i=1}^{r}\frac{(X_{ij} - E(X_{ij}))^2}{E(X_{ij})} = \frac{(42 - 105(0.5))^2}{105(0.5)} + \frac{(63 - 105(0.5))^2}{105(0.5)} = 4.2$$
With 2 − 1 = 1 degrees of freedom, we can calculate the p-value of our test statistic:
$p = P(\chi^2(1) > 4.2) = 0.0404$
Because p = 0.0404 is less than α = 0.05, reject the null hypothesis, as there is sufficient evidence
to conclude that, at the 5% level of significance, for students with 4 or more semesters of athletic
involvement, the proportion of students with GPA below the mean is different from the proportion
of students with GPA above the mean.
10.81 Since we do not know λ, we must estimate it from the sample data: $\hat{\lambda} = \bar{x} = 0.483$
Since several groups have less than 5, we must group 8, 7, 6, 5 and 4 together into one group, ≥ 4,
with 10 observations.
Hypotheses: $H_0$: The data are modeled by a Poisson distribution;
$H_a$: The data are not modeled by a Poisson distribution.
Using a Poisson distribution, we can calculate the expected value of each outcome:
Outcome    Expected count
0          255.39
1          123.37
2          29.80
3          4.80
≥ 4        0.64
Since the sample size is large (n = 414), we can compute our test statistic:
$$X^2 = \sum\frac{(F_i - E(y_i))^2}{E(y_i)} = \frac{(296 - 255.39)^2}{255.39} + \cdots + \frac{(10 - 0.64)^2}{0.64} = 165.63$$
Since we had to estimate λ, we have 5 − 1 − 1 = 3 degrees of freedom, and our p-value can be
calculated as:
$p = P(\chi^2(3) > 165.63) \approx 0$
Because p ≈ 0 is less than α = 0.05, reject the null hypothesis, as there is sufficient evidence to
show that the data did not come from a Poisson distribution.
10.83 a The Kolmogorov-Smirnov test is appropriate. The maximum-likelihood estimate of θ from an
exponential distribution is θ̂ = x̄ = 187.8/16 = 11.7375.
$x_i$    $F(x_i)$    $i/n - F(x_i)$    $F(x_i) - \frac{i - 1}{n}$
$$\text{modified } D = \left(D - \frac{0.2}{16}\right)\left(\sqrt{16} + 0.26 + \frac{0.5}{\sqrt{16}}\right) = 2.28$$
Rejection Region: modified D > 1.094 (from Table 8.5)
Conclusion: Reject H0 at α = 0.05; i.e., there is sufficient evidence to conclude that the
exponential distribution is inadequate at α = 0.05.
b Using $F(x) = 1 - e^{-x/12}$, we have,
i     $i/n - F(x_i)$    $F(x_i) - \frac{i - 1}{n}$
1     -0.4612     0.5237
2     -0.4026     0.4651
3     -0.3518     0.4143
4     -0.2969     0.3594
5     -0.2456     0.3081
6     -0.1904     0.2529
7     -0.1315     0.1940
8     -0.0726     0.1351
9     -0.0206     0.0831
10     0.0384     0.0241
11     0.0840    -0.0215
12     0.0820    -0.0095
13     0.1188    -0.0563
14     0.1364    -0.0739
15     0.1841    -0.1216
16     0.2213    -0.1588
Hypotheses: $H_0$: Population has an exponential (θ = 12) distribution
$H_a$: Population does not have an exponential (θ = 12) distribution
Test Statistics: $D = \max(D^+, D^-) = \max(0.2213, 0.5237) = 0.5237$
$$\text{modified } D = 0.5237\left(\sqrt{16} + 0.12 + \frac{0.11}{\sqrt{16}}\right) = 2.1720$$
Rejection Region: modified D > 1.358 (from Table 8.5)
Conclusion: Reject H0 at α = 0.05; i.e., there is sufficient evidence to conclude that the
exponential (θ = 12) model is inadequate at α = 0.05.
10.85 We wish to test the fit of a Weibull (γ = 2) distribution. Equivalently, we will work with $y_i = x_i^2$
so that y will have an exponential (θ) distribution. The maximum-likelihood estimate of θ is given
by $\hat{\theta} = \bar{y} = 54{,}144.13$ and $F(y) = 1 - e^{-y/54{,}144.13}$.
$y_i$    $F(y_i)$    $i/n - F(y_i)$    $F(y_i) - \frac{i - 1}{n}$
10.91 Since $\frac{20 - 19.2}{0.3} = 2.67 > 1.57$, we accept the lot.
10.99 Hypotheses: $H_0: \mu = 10$   $H_a: \mu \ne 10$
Test Statistics: Assuming a normal population, $t = \frac{9.8 - 10}{0.5/\sqrt{15}} = -1.55$.
P-value: Let p = P(|t| > 1.55); then p/2 = P(t > 1.55), and hence 0.05 < p/2 < 0.1. Thus,
0.1 < p < 0.2 (degrees of freedom = 14).
Conclusion: Since the P-value is large (say, greater than 0.05), we fail to reject H0 and conclude
that the mean is not significantly different from 10. A two-tailed test is appropriate because the
specification calls for producing exactly 10-ohm resistors and deviations from this specification
should be detected in both directions.
10.101 Hypotheses: $H_0: \mu_1 - \mu_2 = 0$   $H_a: \mu_1 - \mu_2 \ne 0$
Test Statistics: $z = \frac{92 - 98}{\sqrt{20/50 + 30/40}} = -5.60$
Rejection Region: $|z| > z_{0.025} = 1.96$
Conclusion: Reject $H_0$ at α = 0.05; i.e., there is sufficient evidence to conclude that the mean
resistance to abrasion differs for the two coupling agents at α = 0.05.
10.103 Hypotheses: $H_0: p_1 - p_2 = 0$
$H_a: p_1 - p_2 > 0$, where '1' indicates 'before chemical' and '2' indicates 'after chemical'
Test Statistics: $\hat{p}_1 = \frac{43}{50} = 0.86$   $\hat{p}_2 = \frac{22}{50} = 0.44$
$$z = \frac{0.86 - 0.44}{\sqrt{\dfrac{(0.86)(0.14)}{50} + \dfrac{(0.44)(0.56)}{50}}} = 4.91$$
Rejection Region: $z > z_{0.025} = 1.96$
Conclusion: Reject $H_0$ at α = 0.025; i.e., there is sufficient evidence to conclude that the chemical
significantly reduces the number of samples containing the harmful bacteria at α = 0.025.
10.105 Hypotheses: $H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 \ne 0$, where '1' denotes 'wood' and '2' denotes 'graphite'
Test Statistics: Assume normal populations with equal variances.
$$s_p = \left[\frac{2(0.02)^2 + 2(0.07)^2}{3 + 3 - 2}\right]^{1/2} = 0.05148$$
$$t = \frac{2.41 - 2.22}{0.05148\sqrt{\dfrac{1}{3} + \dfrac{1}{3}}} = 4.52$$
Rejection Region: $|t| > t_{0.025} = 2.776$ (degrees of freedom = 3 + 3 − 2 = 4)
Conclusion: Reject $H_0$ at α = 0.05; i.e., there is sufficient evidence to conclude that the mean
impulses differ for the two rackets at α = 0.05.
10.107 Hypotheses: H0 : Plant species is independent of the species of the nearest neighbor
Ha : Plant species is not independent of the species of the nearest neighbor
a Test Statistics: Observed and expected (values in parentheses) cell frequencies are presented in
the following table.
                Nearest Neighbor
                A             B             Totals
Plant   A       20 (13.44)     4 (10.56)    24
        B        8 (14.56)    18 (11.44)    26
Totals          28            22            50
$$\chi^2 = \frac{(20 - 13.44)^2}{13.44} + \cdots + \frac{(18 - 11.44)^2}{11.44} = 13.99$$
Rejection Region: $\chi^2 > \chi^2_{0.05} = 3.84146$ (degrees of freedom = (2 − 1)(2 − 1) = 1)
Conclusion: Reject H0 at α = 0.05; i.e., there is sufficient evidence to conclude that plant species
and nearest neighbor plant species are dependent at α = 0.05.
b The test statistic is χ2 = 13.99 and the same rejection region and conclusion hold as in part (a).
c Test Statistics: Observed and expected (values in parentheses) cell frequencies are presented in
the following table.
                   Nearest Neighbor
                   A             B            Totals
Sampled   A        20 (18.24)    4 (5.76)     24
Plant     B        18 (19.76)    8 (6.24)     26
Totals             38           12            50
$$\chi^2 = \frac{(20 - 18.24)^2}{18.24} + \cdots + \frac{(8 - 6.24)^2}{6.24} = 1.36$$
Rejection Region: $\chi^2 > \chi^2_{0.05} = 3.84146$ (degrees of freedom = (2 − 1)(2 − 1) = 1)
Conclusion: Fail to reject $H_0$ at α = 0.05; i.e., with these cell counts there is insufficient evidence to conclude that plant species
and nearest neighbor plant species are dependent at α = 0.05.
10.109 Hypotheses: $H_0: p_1 = p_2$   $H_a: p_1 \ne p_2$
Test Statistic: Observed and expected (values in parentheses) cell counts are listed in the table
below. 1 = Response to T-intersection question and 2 = Response to four-legged-intersection
question.
                     1
                     Correct           Incorrect          Totals
2   Correct          141 (90.4394)     145 (195.5606)     286
    Incorrect         13 (63.5606)     188 (137.4394)     201
Totals               154               333                487
$$\chi^2 = \sum_{ij}\frac{(X_{ij} - E(X_{ij}))^2}{E(X_{ij})} = \frac{(141 - 90.4394)^2}{90.4394} + \frac{(145 - 195.5606)^2}{195.5606} + \frac{(13 - 63.5606)^2}{63.5606} + \frac{(188 - 137.4394)^2}{137.4394} = 100.1577$$
Rejection Region: $\chi^2 > \chi^2_{0.05} = 3.84146$ (degrees of freedom = (2 − 1)(2 − 1) = 1)
Conclusion: Reject $H_0$ at α = 0.05; i.e., knowledge of T-intersections is not independent of
knowledge of four-legged intersections.
10.111 a Hypotheses: H0 : µ1 − µ2 = 0 and Ha : µ1 − µ2 ≠ 0
We can calculate our sample means and sample standard deviations:
x̄1 = 0.0837 x̄2 = 0.1017
s1 = 0.0218 s2 = 0.0120
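Assuming that the populations are normally distributed, we can calculate our test statistic:
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{(0.0837 - 0.1017) - 0}{\sqrt{\frac{0.0218^2}{3} + \frac{0.0120^2}{3}}} = -1.251 \]
Degrees of freedom:
\[ \nu = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}} = \frac{\left(\frac{0.0218^2}{3} + \frac{0.0120^2}{3}\right)^2}{\frac{(0.0218^2/3)^2}{3 - 1} + \frac{(0.0120^2/3)^2}{3 - 1}} = 3.11 \approx 4 \]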
Our test statistic has a p-value of p = 2P (T < −1.251) = 2(0.1396) = 0.2792
Because the p-value of 0.2792 is greater than α = 0.05, do not reject the null hypothesis, as there
is not sufficient evidence to conclude that, at the 5% significance level, there is a difference in the
mean distances for BDC and MR semiactive dampers.
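The unequal-variance t statistic and its approximate degrees of freedom used in part (a) can be reproduced from the summary statistics alone. This is a hedged sketch: the helper name `welch_t` is an assumption made for illustration, and the p-value is not computed here because that would need a t-distribution table or a statistics library.

```python
import math


def welch_t(mean1, s1, n1, mean2, s2, n2):
    """Two-sample t statistic with unpooled variances and Satterthwaite df."""
    v1 = s1 ** 2 / n1  # estimated variance of the first sample mean
    v2 = s2 ** 2 / n2  # estimated variance of the second sample mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Summary statistics from part (a)
t_stat, df = welch_t(0.0837, 0.0218, 3, 0.1017, 0.0120, 3)
print(round(t_stat, 3), round(df, 2))
```

Because the sample means and standard deviations above are already rounded, the computed t may differ from the printed value in the third decimal place.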
b Hypotheses: H0 : µ1 − µ2 = 0 and Ha : µ1 − µ2 ≠ 0
We can calculate our sample means and sample standard deviations:
x̄1 = 0.1510 x̄2 = 0.1703
s1 = 0.0475 s2 = 0.0506
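Assuming that the populations are normally distributed, we can calculate our test statistic:
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{(0.1510 - 0.1703) - 0}{\sqrt{\frac{0.0475^2}{3} + \frac{0.0506^2}{3}}} = -0.4824 \]
Degrees of freedom:
\[ \nu = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}} = \frac{\left(\frac{0.0475^2}{3} + \frac{0.0506^2}{3}\right)^2}{\frac{(0.0475^2/3)^2}{3 - 1} + \frac{(0.0506^2/3)^2}{3 - 1}} = 3.98 \approx 4 \]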
Our test statistic has a p-value of p = 2P (T < −0.4824) = 2(0.3274) = 0.6548
Because the p-value of 0.6548 is greater than α = 0.05, do not reject the null hypothesis, as there
is not sufficient evidence to conclude that, at the 5% significance level, there is a difference in the
mean interstory drifts for BDC and MR dampers.
c Hypotheses: H0 : µ1 − µ2 = 0 and Ha : µ1 − µ2 ≠ 0
We can calculate our sample means and sample standard deviations:
x̄1 = 3.067 x̄2 = 7.127
s1 = 0.9556 s2 = 0.2307
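Assuming that the populations are normally distributed, we can calculate our test statistic:
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{(3.067 - 7.127) - 0}{\sqrt{\frac{0.9556^2}{3} + \frac{0.2307^2}{3}}} = -7.1531 \]
Degrees of freedom:
\[ \nu = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}} = \frac{\left(\frac{0.9556^2}{3} + \frac{0.2307^2}{3}\right)^2}{\frac{(0.9556^2/3)^2}{3 - 1} + \frac{(0.2307^2/3)^2}{3 - 1}} = 2.23 \approx 3 \]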
Our test statistic has a p-value of p = 2P (T < −7.1531) = 2(0.001) = 0.002
Because the p-value of 0.002 is smaller than α = 0.05, reject the null hypothesis, as there is
sufficient evidence to conclude that, at the 5% significance level, there is a difference in the mean
absolute accelerations measured for BDC and MR dampers.
10.113 a Hypotheses: H0 : µd = 0 and Ha : µd > 0
Since the data are paired, we should calculate the difference between each data pair. From this
new data set, we find that:
x̄d = 15.41 sd = 11.71
Assuming that the population of differences is normally distributed, we can calculate our test
statistic:
\[ t = \frac{\bar{x}_d - \mu_d}{s_d / \sqrt{n}} = \frac{15.41 - 0}{11.71 / \sqrt{7}} = 3.482 \]
With 7 − 1 = 6 degrees of freedom, our test statistic has a p-value of: p = P (T > 3.482) = 0.0066
Because p = 0.0066 is less than α = 0.05, reject the null hypothesis, as there is sufficient evidence
to conclude that, at the 5% significance level, the mean difference in the load at first yield and
the load at the first traverse crack is greater than 0.
The least-squares line does not pass through the data points.
150 Chapter 11: Inference For Regression Parameters
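11.3 \[ \sum x_i = 10, \quad \sum x_i^2 = 22.5, \quad \sum x_i y_i = 61.95, \quad \sum y_i = 27.5, \quad \sum y_i^2 = 170.77, \quad n = 5 \]
\[ SS_{xx} = 22.5 - \frac{(10)^2}{5} = 2.5, \qquad SS_{yy} = 170.77 - \frac{(27.5)^2}{5} = 19.52 \]
\[ SS_{xy} = 61.95 - \frac{(10)(27.5)}{5} = 6.95, \qquad \hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{6.95}{2.5} = 2.78 \]
\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \frac{27.5}{5} - (2.78)\frac{10}{5} = -0.06 \]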
Least-squares line: ŷ = −0.06 + (2.78)x
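Exercises 11.3 through 11.7 all follow the same recipe: form SSxx, SSyy and SSxy from the raw sums, then divide. A small sketch of that recipe is given below, assuming only the summary sums are available; the function name `least_squares_from_sums` and the dictionary keys are illustrative choices, not notation from the text.

```python
def least_squares_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Fit y = b0 + b1*x from summary sums using the SS shortcut formulas."""
    ss_xx = sum_x2 - sum_x ** 2 / n
    ss_yy = sum_y2 - sum_y ** 2 / n
    ss_xy = sum_xy - sum_x * sum_y / n
    slope = ss_xy / ss_xx
    intercept = sum_y / n - slope * sum_x / n
    return {"ss_xx": ss_xx, "ss_yy": ss_yy, "ss_xy": ss_xy,
            "slope": slope, "intercept": intercept}


# Summary statistics from Exercise 11.3
fit = least_squares_from_sums(n=5, sum_x=10, sum_y=27.5,
                              sum_x2=22.5, sum_y2=170.77, sum_xy=61.95)
print({name: round(value, 4) for name, value in fit.items()})
```

Calling the same function with the sums listed for 11.5 or 11.7 should reproduce those least-squares lines as well.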
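11.5 \[ \sum x_i = 556, \quad \sum x_i^2 = 39{,}080, \quad \sum x_i y_i = 120{,}399, \quad \sum y_i = 1{,}756, \quad \sum y_i^2 = 391{,}720, \quad n = 8 \]
\[ SS_{xx} = 39{,}080 - \frac{(556)^2}{8} = 438, \qquad SS_{yy} = 391{,}720 - \frac{(1{,}756)^2}{8} = 6{,}278 \]
\[ SS_{xy} = 120{,}399 - \frac{(556)(1{,}756)}{8} = -1{,}643, \qquad \hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{-1{,}643}{438} = -3.7511 \]
\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \frac{1{,}756}{8} - (-3.7511)\frac{556}{8} = 480.2043 \]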
Least-squares line: ŷ = 480.2043 − (3.7511)x
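11.7 \[ \sum x_i = 180, \quad \sum x_i^2 = 2{,}900, \quad \sum x_i y_i = 7{,}420, \quad \sum y_i = 469, \quad \sum y_i^2 = 19{,}313, \quad n = 12 \]
\[ SS_{xx} = 2{,}900 - \frac{(180)^2}{12} = 200, \qquad SS_{yy} = 19{,}313 - \frac{(469)^2}{12} = 982.9167 \]
\[ SS_{xy} = 7{,}420 - \frac{(180)(469)}{12} = 385, \qquad \hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{385}{200} = 1.925 \]
\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \frac{469}{12} - (1.925)\frac{180}{12} = 10.2083 \]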
Least-squares line: ŷ = 10.2083 + 1.925x
11.13 \[ \bar{y} = 9.448, \quad \bar{x} = 15.504, \quad SS_{yy} = 1101.2, \quad SS_{xy} = 1546.6, \quad SS_{xx} = 2360.2 \]
\[ SSE = SS_{yy} - \hat{\beta}_1 (SS_{xy}) = 1101.2 - (1546.6/2360.2)(1546.6) = 87.74 \]
\[ s^2 = \frac{SSE}{n - 2} = \frac{87.74}{10 - 2} = 10.97 \]
\[ s = \sqrt{10.97} = 3.312 \]
This value is, on average, how far the observed flowthrough LC50 is from the predicted value for
a given static LC50.
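The chain SSE, then s², then s can be verified directly from the three sums of squares quoted in 11.13. The snippet below is a sketch under the assumption that n = 10, as used in the solution; `residual_standard_deviation` is a hypothetical helper name.

```python
import math


def residual_standard_deviation(ss_yy, ss_xy, ss_xx, n):
    """Return (SSE, s^2, s) for a simple linear regression."""
    slope = ss_xy / ss_xx
    sse = ss_yy - slope * ss_xy  # SSE = SSyy - b1 * SSxy
    s2 = sse / (n - 2)           # two parameters estimated, so n - 2 df
    return sse, s2, math.sqrt(s2)


sse, s2, s = residual_standard_deviation(1101.2, 1546.6, 2360.2, n=10)
print(round(sse, 2), round(s2, 2), round(s, 3))
```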
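11.21 We seek a confidence interval for $5\beta_1$. If we assume that the errors are independent, normal
$(0, \sigma^2)$, then $\hat{\beta}_1$ is distributed normal $\left(\beta_1, \frac{\sigma^2}{SS_{xx}}\right)$. Hence $5\hat{\beta}_1$ has mean $E(5\hat{\beta}_1) = 5E(\hat{\beta}_1) = 5\beta_1$
and variance $V(5\hat{\beta}_1) = 25V(\hat{\beta}_1) = \frac{25\sigma^2}{SS_{xx}}$. Then $5\hat{\beta}_1$ is distributed normal $\left(5\beta_1, \frac{25\sigma^2}{SS_{xx}}\right)$. Thus a
95% confidence interval for $5\beta_1$ is
\[ 5\hat{\beta}_1 \pm t_{0.025} \frac{5s}{\sqrt{SS_{xx}}} = 5(1.925) \pm 2.228(5)\frac{\sqrt{24.1792}}{\sqrt{200}} = (5.752, 13.498) \]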
(degrees of freedom = 12 − 2 = 10)
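The interval for a multiple of the slope is only a rescaling of the usual slope interval, which the following sketch makes explicit. The function name `scaled_slope_interval` is an assumption, and the critical value 2.228 is taken from the solution rather than looked up in code.

```python
import math


def scaled_slope_interval(c, slope, s2, ss_xx, t_crit):
    """Confidence interval for c * beta1, given s^2 and SSxx."""
    half_width = t_crit * c * math.sqrt(s2) / math.sqrt(ss_xx)
    center = c * slope
    return center - half_width, center + half_width


# Values from Exercise 11.21: slope 1.925, s^2 = 24.1792, SSxx = 200, t(0.025, 10 df) = 2.228
lower, upper = scaled_slope_interval(5, 1.925, 24.1792, 200, 2.228)
print(round(lower, 3), round(upper, 3))
```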
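11.23 \[ \sum x_i = 55, \quad \sum x_i^2 = 385, \quad \sum x_i y_i = 1{,}889.09, \quad \sum y_i = 348.47, \quad \sum y_i^2 = 12{,}153.6443, \quad n = 10 \]
\[ SS_{xx} = 385 - \frac{(55)^2}{10} = 82.5, \qquad SS_{yy} = 12{,}153.6443 - \frac{(348.47)^2}{10} = 10.5102 \]
\[ SS_{xy} = 1{,}889.09 - \frac{(55)(348.47)}{10} = -27.495 \]
a \[ \hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{-27.495}{82.5} = -0.3333 \]
\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \frac{348.47}{10} - (-0.3333)\frac{55}{10} = 36.68 \]
Least-squares line: ŷ = 36.68 − 0.3333x
b \[ SSE = SS_{yy} - \frac{(SS_{xy})^2}{SS_{xx}} = 10.5102 - \frac{(-27.495)^2}{82.5} = 1.3469 \]
\[ s^2 = \frac{SSE}{n - 2} = \frac{1.3469}{8} = 0.1684 \]
c Hypotheses: H0 : β1 = 0 Ha : β1 < 0
Test Statistic: Assume that errors are independent, normal (0, σ²).
\[ t = \frac{\hat{\beta}_1}{s / \sqrt{SS_{xx}}} = \frac{-0.3333}{\sqrt{0.1684} / \sqrt{82.5}} = -7.38 \]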
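11.31 Hypotheses: H0 : β0 = 0 Ha : β0 ≠ 0
Test Statistic:
\[ t = \frac{\hat{\beta}_0}{\sqrt{s^2\left(\frac{1}{n} + \frac{\bar{x}^2}{SS_{xx}}\right)}} = \frac{-0.06}{\sqrt{(0.0663)\left(\frac{1}{5} + \frac{(2)^2}{2.5}\right)}} = -0.17 \]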
Rejection Region: |t| > t0.025 = 3.182
(degrees of freedom = 5 − 2 = 3)
Conclusion: Fail to reject H0 at α = 0.05; i.e., there is insufficient evidence to conclude that β0 is
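different from zero at α = 0.05.
11.33 a \[ \bar{y} = 0.0616, \quad \bar{x} = 874.42, \quad SS_{xy} = 54.15, \quad SS_{xx} = 159916, \quad SS_{yy} = 0.0339 \]
\[ \hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{54.15}{159916} = 0.000339 \]
\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 0.0616 - (0.000339)(874.42) = -0.2345 \]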
REGRESSION EQUATION: Wear rate = −0.2345 + 0.000339 ∗ (Temperature)
b Scatter plot with confidence and prediction bands:
c Hypotheses: H0 : β2 = 0 Ha : β2 ≠ 0
Test Statistic:
\[ t = \frac{\hat{\beta}_2}{s_{\hat{\beta}_2}} = \frac{0.009535}{0.006326} = 1.51 \]
Rejection Region: |t| > t0.005 = 3.055 (degrees of freedom = 12)
Conclusion: Fail to reject H0 at α = 0.01, i.e., the quadratic term does not make a significant
contribution to the model.
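d \[ \sum x_i = 151, \quad \sum x_i^2 = 2{,}295, \quad \sum x_i y_i = 1{,}890, \quad \sum y_i = 222, \quad \sum y_i^2 = 3{,}456, \quad n = 15 \]
\[ SS_{xx} = 2{,}295 - \frac{(151)^2}{15} = 774.9333 \]
\[ SS_{yy} = 3{,}456 - \frac{(222)^2}{15} = 170.4 \]
\[ SS_{xy} = 1{,}890 - \frac{(151)(222)}{15} = -344.8 \]
\[ \hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{-344.8}{774.9333} = -0.4449 \]
\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \frac{222}{15} - (-0.4449)\frac{151}{15} = 19.2791 \]
Reduced fitted model: ŷ = 19.2791 − 0.4449x
(To test the utility of the model, we compute $s^2 = 1.3065$, which yields a $t = \hat{\beta}_1 / (s/\sqrt{SS_{xx}})$ =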
e The P-value of 0.0001 indicates that if this experiment were repeated, only about once in 10,000
trials would a t-value at least 4.68 units from zero be observed by chance when, in fact, β2 = 0.
Analysis of Variance
Source DF SS MS F P
Regression 1 0.00016332 0.00016332 138.95 0.000
Residual Error 12 0.00001411 0.00000118
Total 13 0.00017743
Y = β0 + β1 x1 + β2 x2 + β3 x3 + β4 x4 + ε
where k = 4 in order to determine SSE2. Then we would fit the reduced model Y = β0 + β1 x3 +
β2 x4 + ε where g = 2 in order to determine SSE1. These are the relevant quantities for the
F-statistic.
b ν1 = k − g = 4 − 2 = 2 ν2 = n − (k + 1) = 25 − (4 + 1) = 20
11.51 a H0 : β1 = · · · = β5 = 0
Ha : At least one of β1, . . . , β5 is nonzero
b H0 : β3 = β4 = β5 = 0
Ha : At least one of β3, β4, or β5 is nonzero
c Test Statistic:
\[ F = \frac{0.729/5}{(1 - 0.729)/(40 - 6)} = 18.29 \]
Rejection Region: F > F0.05 (5, 34) ≈ 2.50
Conclusion: Reject H0 at α = 0.05; i.e., the complete model is useful for prediction since at least
one of the β’s is significantly different from zero at α = 0.05.
d Test Statistic:
\[ F = \frac{(3{,}197.16 - 1{,}830.44)/(5 - 2)}{1{,}830.44/(40 - (5 + 1))} = 8.46 \]
Rejection Region: F > F0.05 (3, 34) ≈ 2.89
Conclusion: Reject H0 at α = 0.05; i.e., the second order model is significant at α = 0.05.
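Parts (c) and (d) use the two standard F statistics for regression: the overall F built from R², and the partial F that compares a reduced model with the complete one. A brief sketch of both follows, with illustrative function names (`overall_f_from_r2`, `nested_f`) and with k, g and n meaning what they do in the text: number of terms in the complete model, number of terms in the reduced model, and sample size.

```python
def overall_f_from_r2(r2, k, n):
    """F statistic for H0: all k slope coefficients are zero, from R^2."""
    return (r2 / k) / ((1 - r2) / (n - (k + 1)))


def nested_f(sse_reduced, sse_complete, k, g, n):
    """Partial F statistic comparing a reduced model (g terms) to a complete one (k terms)."""
    numerator = (sse_reduced - sse_complete) / (k - g)
    denominator = sse_complete / (n - (k + 1))
    return numerator / denominator


f_overall = overall_f_from_r2(0.729, k=5, n=40)             # part (c)
f_partial = nested_f(3197.16, 1830.44, k=5, g=2, n=40)      # part (d)
print(round(f_overall, 2), round(f_partial, 2))
```

The same `nested_f` call pattern applies to the other reduced-versus-complete comparisons in this chapter, given their SSE values.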
11.53 Hypotheses: H0 : β2 = β3 = 0 Ha : β2 ≠ 0 or β3 ≠ 0
Test Statistic:
\[ F = \frac{(795.23 - 783.90)/2}{783.90/(200 - (4 + 1))} = 1.41 \]
Rejection Region: F > F0.05 (2, 195) ≈ 3.06
Conclusion: Fail to reject H0 at α = 0.05; i.e., there is insufficient evidence that the mean faculty
salary is dependent on sex at α = 0.05.
11.55 a Time series plots of CO, SO2 , Pb, NO2 and Ozone levels (with different y scales for comparison):
CO, NO2 and SO2 levels have decreased linearly over the years, Pb levels have decreased drasti-
cally, and ozone levels have remained more or less constant.
b Scatterplot matrix of CO, Pb, NO2 , Ozone and SO2 levels:
Hypotheses: H0 : ρ = 0 and Ha : ρ ≠ 0
Assuming that the populations of CO and SO2 measurements are normally distributed, we can
calculate our test statistic:
\[ t = r\sqrt{\frac{n - 2}{1 - r^2}} = (0.972)\sqrt{\frac{20 - 2}{1 - (0.972)^2}} = 17.55 \]
With 20 − 2 = 18 degrees of freedom, we can calculate our p-value as:
p = 2P(T > 17.55) ≈ 0
Because p ≈ 0 is less than α = 0.05, reject the null hypothesis, as there is sufficient evidence to
conclude that the correlation coefficient is non-zero.
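The t statistic for testing ρ = 0 depends only on r and n, so it is easy to confirm. The following lines are a sketch with a hypothetical helper name, `correlation_t`; they reproduce the statistic but do not compute the p-value.

```python
import math


def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


t_corr = correlation_t(0.972, 20)
print(round(t_corr, 2))
```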
11.57 a Scatterplot of dampers vs height:
The scatterplot indicates a non-linear correlation between the dampers and the height of the
building. There is a possible quadratic relationship.
Analysis of Variance
Source DF SS MS F P
Regression 2 82845 41423 102.92 0.000
Residual Error 6 2415 402
Total 8 85260
c An F of over 100 gives us a p-value ≈ 0. Since p ≈ 0 is less than α = 0.05, reject the null hypothesis
that all of the βi are 0. We can be very confident that the model contributes information for the
prediction of the height of the building.
11.59 a Scatterplot of limit load vs PDA:
The scatterplot shows a strong negative linear correlation between limit load and PDA.
Analysis of Variance
Source DF SS MS F P
Regression 1 0.40838 0.40838 325.56 0.000
Residual Error 7 0.00878 0.00125
Total 8 0.41716
c An F of over 300 gives us a p-value ≈ 0. Because p ≈ 0 is less than α = 0.05, reject the null
hypothesis that β1 is 0. We can be very confident that the model contributes information for the
prediction of the limit load.
11.61 a Minitab output for number of year 1 cost of developing underground utilities vs C1-C8:
The regression equation is
Underground-Yr1 cost = 1279 + 301 C1 + 0.870 C2 - 121 C3 + 2.82 C4 - 274 C5
+ 1.24 C6 - 114 C7 + 928 C8
Analysis of Variance
Source DF SS MS F P
Regression 8 11734462627 1466807828 493.34 0.000
Residual Error 38 112983254 2973244
Total 46 11847445882
b Minitab output for number of year 1 cost of developing overhead utilities vs C1-C8:
The regression equation is
Overhead-Yr1cost = - 873 + 205 C1 + 2.21 C2 - 167 C3 + 1.14 C4 + 1182 C5
+ 0.687 C6 + 307 C7 + 412 C8
Analysis of Variance
Source DF SS MS F P
Regression 8 8537420756 1067177595 293.43 0.000
Residual Error 38 138204360 3636957
Total 46 8675625116
c Minitab output for number of year 2 cost of developing underground utilities vs C1-C8:
The regression equation is
Underground-Yr2 cost = 1373 + 318 C1 + 0.879 C2 - 127 C3 + 2.99 C4 - 227 C5
+ 1.30 C6 - 102 C7 + 987 C8
Analysis of Variance
Source DF SS MS F P
Regression 8 13039013684 1629876710 453.32 0.000
Residual Error 38 136624794 3595389
Total 46 13175638478
d Minitab output for number of year 2 cost of developing overhead utilities vs C1-C8:
The regression equation is
Overhead-Yr2cost = 1147 + 186 C1 + 2.09 C2 - 189 C3 + 1.63 C4 + 2197 C5
- 0.09 C6 + 160 C7 + 374 C8
Analysis of Variance
Source DF SS MS F P
Regression 8 8158250097 1019781262 26.28 0.000
Residual Error 38 1474295399 38797247
Total 46 9632545496
11.63 a Scatterplots of percent deviation vs black body temperature for calibration at 1400 and 1500
degrees:
Analysis of Variance
Source DF SS MS F P
Regression 1 1.7313 1.7313 148.27 0.000
Residual Error 9 0.1051 0.0117
Total 10 1.8364
Regression Analysis: 1500 deviation versus temperature
Analysis of Variance
Source DF SS MS F P
Regression 1 1.5128 1.5128 134.44 0.000
Residual Error 9 0.1013 0.0113
Total 10 1.6141
For both models, with p ≈ 0, we can reject the null hypotheses that β1 = 0 and conclude that the
models provide information for the prediction of the percent deviation.
c For the 1400 degree calibration, we can set the percent deviation in the regression equation equal
to 0 and solve for the temperature:
8.96 - 0.00627 temperature = 0
temperature = 8.96/0.00627 = 1429.027
Doing the same thing for the 1500 degree calibration we get:
temperature = 8.67/0.00586 = 1479.5
11.69 From the SAS printout, a 95% confidence interval for the mean cost of computer jobs that require
42 seconds of CPU time and print 2,000 lines is ($7.32, $9.45).
d Hypotheses: H0 : β4 = β5 = 0
Ha : β4 ≠ 0 or β5 ≠ 0
Test Statistic:
\[ F = \frac{(1216.0189 - 969.4831)/(5 - 3)}{969.4831/(60 - (5 + 1))} = 6.87 \]
Rejection Region: F > F0.05 (2, 54) ≈ 3.20
Conclusion: Reject H0 at α = 0.05; i.e., the mean increase in achievement test scores per unit
increase in IQ significantly differs for the three levels of SES at α = 0.05.
11.73 Hypotheses: H0 : β2 = 0 Ha : β2 < 0
Test Statistic:
\[ t = \frac{-0.53}{0.48} = -1.10 \]
Rejection Region: t < −t0.01 = −2.326
(degrees of freedom = 42 − 3 = 39(∞))
Conclusion: Fail to reject H0 at α = 0.01; i.e., there is insufficient evidence to conclude that after
allowing for the effect of initial assembly time, plant A had a lower mean assembly time than
plant B.
11.75 We test the hypotheses from Exercise 10.26, part (b).
Test Statistic:
\[ F = \frac{(259.34 - 226.12)/(4 - 2)}{226.12/(50 - (4 + 1))} = 3.31 \]
Rejection Region: F > F0.05 (2, 45) ≈ 3.21
Conclusion: Reject H0 at α = 0.05; i.e., the mean delivery time significantly differs for mail and
truck deliveries at α = 0.05.
11.77 a Y = cost of material and labor
x1 = area
x2 = number of baths
x3 = 1 if central air, 0 if no central air
First-order model: Y = β0 + β1 x1 + β2 x2 + β3 x3 + ε
b Second-order model:
\[ Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_2 + \beta_5 x_1 x_3 + \beta_6 x_2 x_3 + \beta_7 x_1^2 + \beta_8 x_2^2 + \varepsilon \]
c H0 : β4 = · · · = β8 = 0
Ha : At least one of β4 , . . . , β8 is nonzero
11.79 Minitab output follows:
Regression Analysis: Time versus Brand, Experience
Analysis of Variance
Source DF SS MS F P
Regression 2 1.27145 0.63573 150.62 0.000
Residual Error 7 0.02955 0.00422
Total 9 1.30100
Source DF Seq SS
Brand 1 0.52900
Experience 1 0.74245
a ŷ = 1.6794 + 0.4441x1 − 0.0793x2
b Hypotheses: H0 : β1 = β2 = 0 Ha : β1 ≠ 0 or β2 ≠ 0
Test Statistic: F = 150.62
P-value: 0.0001
Conclusion: Since the P-value is extremely small, we reject H0 and conclude that the model is
appropriate.
c Since R2 = 0.9773 is large, it tends to support the finding that the model is appropriate.
d (−0.0793 ± 1.895(0.005981)) = (−0.0907, −0.0680)
Since this interval does not include zero, we may conclude at a 90% confidence level that β2 is
not zero. Hence the service person’s number of months of experience in preventive maintenance
is useful in predicting time of preventive maintenance.
e ŷ = 1.6794 + 0.4441(0) − 0.0793(6) = 1.2036 (hours)
f Assuming that the service times are independent, the predicted mean time to service ten computers
is 12.036 hours.
g (1.6430, 1.9785)
Analysis of Variance
Source DF SS MS F P
Regression 5 17.5827 3.5165 100.41 0.000
Residual Error 34 1.1908 0.0350
Total 39 18.7735
\[ \hat{y} = -9.9168 + 0.1668x_1 + 0.1376x_2 - 0.001108x_1^2 - 0.0008433x_2^2 + 0.0002411x_1 x_2 \]
a The value of R2 = 0.9365 indicates that about 93.65% of the variability in the GPA data is
accounted for by the model.
Hypotheses: H0 : β1 = · · · = β5 = 0
Ha : At least one of β1 , . . . , β5 is nonzero
Test Statistic: F = 100.41
P-value: 0.0001 < 0.05 = α
Conclusion: Since the P-value is extremely small, we reject H0 and conclude that the model is
useful in predicting mean freshman GPA values.
b Regression curves for x2 = 60, 75 and 90:
c Hypotheses: H0 : β5 = 0 Ha : β5 ≠ 0
Test Statistic: t = 1.67
P-value: 0.1032 > 0.10 = α
Conclusion: Fail to reject H0 at α = 0.10; i.e., there is insufficient evidence to conclude that the
interaction term is important for the prediction of GPA.
11.83 Minitab output follows:
Regression Analysis: SO2 emission versus Output
Analysis of Variance
Source DF SS MS F P
Regression 1 14488 14488 124.37 0.000
Residual Error 7 815 116
Total 8 15304
Analysis of Variance
Source DF SS MS F P
Regression 2 15049.3 7524.6 177.55 0.000
Residual Error 6 254.3 42.4
Total 8 15303.6
b ŷ = −93.1277 + 0.4446x
c $\hat{y} = 204.4603 - 0.6380x + 0.0009593x^2$
d Hypotheses: H0 : β2 = 0 Ha : β2 ≠ 0
Test Statistic: t = 3.64
P-value: 0.0108
Conclusion: Since the P-value is small, we reject H0 and conclude that the quadratic model is
useful in describing the relationship between sulfur dioxide and output.
e $\hat{y} = 204.4603 - 0.6380(500) + 0.0009593(500)^2 = 125.2853$
11.85 Hypotheses: H0 : β2 = 0 Ha : β2 < 0
Test Statistic: t = −6.60
Rejection Region: t < −t0.05 = −1.717 (degrees of freedom = 22)
Conclusion: Reject H0 at α = 0.05; i.e., there is sufficient evidence to conclude that the rate of
increase in output per unit increase of input decreases as the input increases.
11.87 a Hypotheses: H0 : β1 = β2 = 0 Ha : β1 ≠ 0 or β2 ≠ 0
Test Statistics: We fit the reduced model Y = β0 + β3 x2 + ε.
Summary Statistics: $\sum x_2 = \sum x_2^2 = 10$, $\sum y = 461$, $\sum y^2 = 13151$, $\sum x_2 y = 228$. Then for
the reduced model
\[ SSE = SS_{yy} - \frac{SS_{xy}^2}{SS_{xx}} = 2524.95 - \frac{(-2.5)^2}{5} = 2523.70 \]
\[ F = \frac{(2523.7 - 128.586)/2}{128.586/(20 - 4)} = 149.01 \]
Rejection Region: $F > F_{0.01}(2, 16) > F_{0.05}(2, 16) = 3.63$
b Conclusion: Reject H0 at α = 0.05; i.e., there is a significant quadratic relationship between age
of machine and time for repairs.
11.89 Hypotheses: H0 : β3 = β4 = β5 = 0
Ha : At least one of β3, β4, or β5 is nonzero
Test Statistic:
\[ F = \frac{(370.7911 - 164.9185)/3}{164.9185/(30 - 6)} = 9.99 \]
Rejection Region: F > F0.05 (3, 24) = 3.01
Conclusion: Reject H0 at α = 0.05; i.e., the inclusion of the variable for speed limit contributes
information for the prediction of number of highway deaths.
11.91 Minitab output follows:
Regression Analysis: reaction distance versus speed
Analysis of Variance
Source DF SS MS F P
Regression 1 2117.5 2117.5 * *
Residual Error 4 0.0 0.0
Total 5 2117.5
Analysis of Variance
Source DF SS MS F P
Regression 1 41383 41383 68.84 0.001
Residual Error 4 2404 601
Total 5 43787
Analysis of Variance
Source DF SS MS F P
Regression 1 62222 62222 103.51 0.001
Residual Error 4 2404 601
Total 5 64627
a Reaction = 1.1 speed
Reaction = 1.1(55) = 60.5
b Braking = −102.4952 + 4.8629 speed
Braking = −102.4952 + 4.8629(55) = 164.9643
c Total = −102.4952 + 5.9629 speed
Total = −102.4952 + 5.9629(55) = 225.4643
11.93 Model for BaP: %CN = −25.5902 + 68.8217 mean Rf
Model for BaA: %CN = −20.6338 + 77.4706 mean Rf
Model for Phe: %CN = −16.0972 + 86.0228 mean Rf
174 Chapter 12: Analysis of Variance
Chapter 12
Analysis of Variance
12.7 Summary Statistics: $\sum y_i = 1155$, $\sum y_i^2 = 103083$,
T1 = 244, T2 = 269, T5 = 381, T7 = 261
ANOVA Table
Source df SS MS F
Treatments 3 345.6089 115.2030 8.6342
Error 9 120.0833 13.3426
Total 12 465.6922
b Hypotheses: H0 : µ1 = µ2 Ha : µ1 ≠ µ2
Test Statistic: F = 19.62
Rejection Region: F > F0.05 (1, 98) ≈ 3.95
Conclusion: Reject H0 at α = 0.05; i.e., there is a significant difference between the number of
hours missed for the two companies.
12.13 Hypotheses: H0 : µ1 = µ2 = µ3 = µ4 Ha : At least two means differ
Test Statistics:
\[ \bar{y} = \frac{8(80) + 8(81) + 8(86) + 8(90)}{32} = 84.25 \]
\[ s_p^2 = \frac{700}{32 - 4} = 25 \]
\[ F = \frac{8(80 - 84.25)^2 + \cdots + 8(90 - 84.25)^2}{(25)(3)} = 6.9067 \]
Rejection Region: F > F0.05 (3, 28) = 2.95
Conclusion: Reject H0 at α = 0.05; i.e., there are significant differences among the mean percent-
ages of copper for the four castings.
12.15 Summary Statistics: $\sum y_i = 772$, $\sum y_i^2 = 30{,}550$,
TA = 174, TB = 208, TC = 231, TD = 159,
\[ TSS = 30{,}550 - \frac{(772)^2}{20} = 750.8 \]
\[ SST = \frac{(174)^2}{5} + \cdots + \frac{(159)^2}{5} - \frac{(772)^2}{20} = 637.2 \]
ANOVA Table
Source df SS MS F
Treatments 3 637.2 212.4 29.9
Error 16 113.6 7.1
Total 19 750.8
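12.23 a \[ \frac{33.6}{5} - \frac{44.1}{5} \pm 2.179\sqrt{0.8620\left(\frac{1}{5} + \frac{1}{5}\right)} = (-3.3795, -0.8205) \]
b t0.1/6 ≈ 2.401 (degrees of freedom = 12)
i j $\bar{y}_i - \bar{y}_j \pm 2.401\sqrt{0.8620\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$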
1 2 (5.7231, 7.7169)
1 3 (7.1231, 9.1169)
2 3 (7.8231, 9.8169)
12.27 Summary Statistics: $\sum y_i = 1520.3$, $\sum y_i^2 = 110587.13$, n = 21,
T1 = 497.7, T2 = 531.3, T3 = 491.3,
B1 = 211.1, B2 = 202.7, B3 = 233.1,
B4 = 218.1, B5 = 220.5, B6 = 205.3, B7 = 229.5,
\[ TSS = 110587.13 - \frac{(1520.3)^2}{21} = 524.6494 \]
\[ SST = \frac{1}{7}\left[(497.7)^2 + \cdots + (491.3)^2\right] - \frac{(1520.3)^2}{21} = 131.9010 \]
\[ SSB = \frac{1}{3}\left[(211.1)^2 + \cdots + (229.5)^2\right] - \frac{(1520.3)^2}{21} = 268.2894 \]
ANOVA Table
Source df SS MS F
Treatment 2 131.9010 65.9505 6.3588
Block 6 268.2894 44.7149 4.3113
Error 12 124.4591 10.3716
Total 20 524.6494
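The randomized block ANOVA table above can be rebuilt from the treatment totals, block totals and the sum of squared observations. The sketch below assumes one observation per treatment-block cell, as in this exercise; the function name `randomized_block_anova` and the dictionary keys are illustrative.

```python
def randomized_block_anova(treatment_totals, block_totals, sum_y2):
    """Sums of squares and F ratios for a randomized block design
    with one observation per treatment-block combination."""
    k = len(treatment_totals)  # number of treatments
    b = len(block_totals)      # number of blocks
    n = k * b
    correction = sum(treatment_totals) ** 2 / n  # (sum of y)^2 / n
    tss = sum_y2 - correction
    sst = sum(t ** 2 for t in treatment_totals) / b - correction
    ssb = sum(bl ** 2 for bl in block_totals) / k - correction
    sse = tss - sst - ssb
    mse = sse / ((k - 1) * (b - 1))
    return {
        "TSS": tss, "SST": sst, "SSB": ssb, "SSE": sse, "MSE": mse,
        "F_treatment": (sst / (k - 1)) / mse,
        "F_block": (ssb / (b - 1)) / mse,
    }


# Totals from Exercise 12.27
anova = randomized_block_anova(
    treatment_totals=[497.7, 531.3, 491.3],
    block_totals=[211.1, 202.7, 233.1, 218.1, 220.5, 205.3, 229.5],
    sum_y2=110587.13,
)
print({name: round(value, 4) for name, value in anova.items()})
```

Small differences in the fourth decimal place relative to the printed table come from rounding of intermediate values.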
Analysis of Variance
Source DF SS MS F P
Regression 5 2105.75 421.15 16.94 0.002
Residual Error 6 149.17 24.86
Total 11 2254.92
Analysis of Variance
Source DF SS MS F P
Regression 3 143.6 47.9 0.18 0.906
Residual Error 8 2111.3 263.9
Total 11 2254.9
Test Statistic:
\[ F = \frac{(2{,}111.3333 - 149.1667)/2}{149.1667/6} = 39.4626 \]
Rejection Region: F > F0.05 (2, 6) = 5.14
Conclusion: Reject H0 at α = 0.05; i.e., there are significant differences among the treatment
means at α = 0.05.
12.39 a \[ \frac{497.7}{7} - \frac{531.3}{7} \pm 2.179\sqrt{10.3716\left(\frac{1}{7} + \frac{1}{7}\right)} = (-8.551, -1.049) \]
Since the interval contains only negative values, we may conclude at a 95% confidence level that
the mean pressure of iron exceeds that of nickel.
b t0.1/6 ≈ 2.401 (degrees of freedom = 12)
\[ \frac{497.7}{7} - \frac{531.3}{7} \pm 2.401\sqrt{10.3716\left(\frac{1}{7} + \frac{1}{7}\right)} = (-4.8 \pm 4.1331) = (-8.9331, -0.6669) \]
\[ \frac{497.7}{7} - \frac{491.3}{7} \pm 4.1331 = (-3.2188, 5.0474) \]
\[ \frac{531.3}{7} - \frac{491.3}{7} \pm 4.1331 = (1.5812, 9.8474) \]
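All three simultaneous intervals in part (b) share the same half-width, so they can be generated in one pass. This is a sketch assuming equal group sizes; `pairwise_intervals` is a hypothetical helper, and the Bonferroni critical value 2.401 is taken from the solution rather than computed.

```python
import math
from itertools import combinations


def pairwise_intervals(totals, n_per_group, mse, t_crit):
    """Simultaneous intervals for all pairwise differences of treatment means."""
    half_width = t_crit * math.sqrt(mse * (1 / n_per_group + 1 / n_per_group))
    means = {label: total / n_per_group for label, total in totals.items()}
    return {
        (a, b): (means[a] - means[b] - half_width, means[a] - means[b] + half_width)
        for a, b in combinations(means, 2)
    }


# Treatment totals, MSE and critical value from Exercise 12.39(b)
intervals = pairwise_intervals({1: 497.7, 2: 531.3, 3: 491.3},
                               n_per_group=7, mse=10.3716, t_crit=2.401)
for pair, (low, high) in intervals.items():
    print(pair, round(low, 4), round(high, 4))
```

An interval that excludes zero marks a pair of treatments whose means differ at the stated simultaneous confidence level.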
12.41 t0.1/6 ≈ 2.749 (degrees of freedom = 6)
Treatments 1 and 2:
\[ \frac{110}{4} - \frac{109}{4} \pm 2.749\sqrt{24.8611\left(\frac{1}{4} + \frac{1}{4}\right)} = 0.25 \pm 9.6921 = (-9.4421, 9.9421) \]
Since this interval includes zero, the difference between treatment 1 and treatment 2 is not significant
at the 90% confidence level.
Treatments 1 and 3:
\[ \frac{110}{4} - \frac{218}{4} \pm 9.6921 = (-36.6921, -17.3079) \]
Since this interval does not include zero, the difference between treatment 1 and treatment 3 is
significant at the 90% confidence level.
Treatments 2 and 3:
\[ \frac{109}{4} - \frac{218}{4} \pm 9.6921 = (-36.9421, -17.5579) \]
Since this interval does not include zero, the difference between treatment 2 and treatment 3 is
significant at the 90% confidence level.
12.43 a \[ \frac{73.3}{5} - \frac{81.5}{5} \pm 3.355\sqrt{0.0528\left(\frac{1}{5} + \frac{1}{5}\right)} = (-2.128, -1.152) \]
b Assume normal distributions with equal variances.
a ANOVA Table
Source df SS MS F
Treatments 3 48.5
A 1 0.0 0.0 0.0
B 1 8.0 8.0 6.4
A×B 1 40.5 40.5 32.4
Error 4 5.0 1.25
Total 7 53.5
Comparing to F0.05 (1, 4) = 7.71, the interaction between temperature and time is significant at
α = 0.05.
b t0.1/12 ≈ 3.966 (degrees of freedom = 4) and
\[ 3.966\sqrt{1.25\left(\frac{1}{2} + \frac{1}{2}\right)} = 4.4341 \]
                   Temperature
                 Low            High
Time   Low    ȳ1 = 24.5      ȳ3 = 29
       High   ȳ2 = 27        ȳ4 = 22.5
f ANOVA Table
Source df SS MS F
Treatments 3 174.0150
A 1 41.4050 41.4050 1.4676
B 1 114.1060 114.1060 4.0444
A×B 1 18.6050 18.6050 0.6594
Error 28 789.9655 28.2131
Total 31 963.9805
g Comparing to F0.05 (1, 28) = 4.20, the interaction term is not significant. Thus, the length of
time to complete a task decreases by about the same amount for men and women as their weights
increase.
h \[ 18.30 - 13.00 \pm 2.048\sqrt{28.2131\left(\frac{1}{8} + \frac{1}{8}\right)} = 5.3 \pm 5.4391 = (-0.139, 10.739) \]
Since the interval includes zero, there is not a significant difference in time to complete the task
between light men and women at α = 0.05.
i 14.50 − 12.25 ± 5.4391 = (−3.189, 7.689). Since the interval includes zero, there is not a significant
difference in time to complete the task between heavy men and women at α = 0.05.
12.49 Summary Statistics: Let Ai = crucible i total and Bj = temperature j total.
$\sum y_i = 114.2$, $\sum y_i^2 = 1096.18$, n = 12,
12.51 Summary Statistics: Let Ai = carbon level i total and Bj = manganese level j total.
$\sum y_i = 309.3$, $\sum y_i^2 = 12024.87$, n = 8,
Comparing to F0.05 (1, 4) = 7.71, we find that the interaction is not significant at α = 0.05.
Thus, we test "main effects" and find that both factors, carbon and manganese, are significant at
α = 0.05.
b From part (a), we see that average breaking strength significantly increases as both percentage
carbon and percentage manganese increase. Therefore, we choose the treatment combination of
0.5% carbon and 1.0% manganese.
Comparing the F-value of 0.1925 to F0.05 (2, 6) = 5.14, we find no significant differences in the
mean mileage ratings among the three brands of gasoline at α = 0.05.
c Comparing the F-value of 0.1739 to F0.05 (3, 6) = 4.76, we find no significant difference in the
mean mileage for the four models.
d \[ \frac{78}{4} - \frac{75.8}{4} \pm 3.707\sqrt{1.6403\left(\frac{1}{4} + \frac{1}{4}\right)} = (-2.807, 3.907) \]
e t0.1/6 ≈ 2.749 (degrees of freedom = 6) and
\[ 2.749\sqrt{1.6403\left(\frac{1}{4} + \frac{1}{4}\right)} = 2.4896 \]
Brands      Simultaneous Confidence Intervals
A and B     $\frac{76.5}{4} - \frac{78}{4} \pm 2.4896 = (-2.865, 2.115)$
A and C     $\frac{76.5}{4} - \frac{75.8}{4} \pm 2.4896 = (-2.315, 2.665)$
B and C     $\frac{78}{4} - \frac{75.8}{4} \pm 2.4896 = (-1.940, 3.040)$
b Comparing the F-value 18.77 to F0.05 (6, 12) = 3.00 indicates that there is a significant interaction
between the factors at α = 0.05.
c t0.1/18 ≈ 2.998 (degrees of freedom = 12),
\[ 2.998\sqrt{1.5\left(\frac{1}{6} + \frac{1}{6}\right)} = 2.1199 \quad \text{and} \quad 2.998\sqrt{1.5\left(\frac{1}{8} + \frac{1}{8}\right)} = 1.8359. \]
Let $\bar{A}_i = \frac{A_i}{6}$ and $\bar{B}_j = \frac{B_j}{8}$.
Simultaneous Confidence Intervals
$\bar{A}_1 - \bar{A}_2 \pm 2.1199 = (-3.959, 0.292)$
$\bar{A}_1 - \bar{A}_3 \pm 2.1199 = (-2.959, 1.292)$
$\bar{A}_1 - \bar{A}_4 \pm 2.1199 = (-6.792, -2.541)$
$\bar{A}_2 - \bar{A}_3 \pm 2.1199 = (-1.126, 3.126)$
$\bar{A}_2 - \bar{A}_4 \pm 2.1199 = (-4.959, -0.708)$
$\bar{A}_3 - \bar{A}_4 \pm 2.1199 = (-5.959, -1.708)$
$\bar{B}_1 - \bar{B}_2 \pm 1.8359 = (-0.840, 2.840)$
$\bar{B}_1 - \bar{B}_3 \pm 1.8359 = (-1.215, 2.465)$
$\bar{B}_2 - \bar{B}_3 \pm 1.8359 = (-2.215, 1.465)$
The intervals that contain zero indicate a nonsignificant difference. Hence, levels 1, 2, and 3 of
factor A are not significantly different, whereas level 4 is significantly different from the others
at α = 0.05. Also, the levels of factor B are not significantly different. To select the treatment
combination with the largest mean, we take level 4 of factor A and any level of factor B.
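c t0.05/4 ≈ 2.560 (degrees of freedom = 12) and
\[ 2.560\sqrt{49\left(\frac{1}{4} + \frac{1}{4}\right)} = 12.6714 \]
\[ \bar{A}_1 - \bar{A}_2 \pm 12.6714 = \frac{520}{4} - \frac{672}{4} \pm 12.6714 = (-50.671, -25.329) \]
\[ \bar{B}_1 - \bar{B}_2 \pm 12.6714 = \frac{558}{4} - \frac{634}{4} \pm 12.6714 = (-31.671, -6.329) \]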
The hourly and piece rate is significantly higher than the hourly rate and the worker-modified
schedule is significantly higher than the 8−5 schedule. Thus, we recommend the hourly and piece
rate and the worker-modified schedule.
Comparing the F-value 12.61 with F0.05 (2, 12) = 3.89, we see that there are significant differences
among the three batches.
b To compare the three batches, we use 90% simultaneous confidence intervals with t0.1/6 ≈ 2.403
(degrees of freedom = 12) and
\[ 2.403\sqrt{10.1\left(\frac{1}{5} + \frac{1}{5}\right)} = 4.8300. \]
Batches     Simultaneous Confidence Intervals
1 and 2     $\frac{138}{5} - \frac{179}{5} \pm 4.8300 = (-13.03, -3.37)$
1 and 3     $\frac{138}{5} - \frac{133}{5} \pm 4.8300 = (-3.83, 5.83)$
2 and 3     $\frac{179}{5} - \frac{133}{5} \pm 4.8300 = (4.37, 14.03)$
Since batch 2 is significantly different from batches 1 and 3, we select batch 2 to give the largest
mean brightness.
12.67 ANOVA Table
Source df SS MS F p
Treatments 7 2341400.0
Diameter 1 530450.0 530450.0 19.57 0.0013
Thickness 2 1352744.4 676372.2 24.95 0.0001
Temp. 2 201036.1 100518.1 3.71 0.0624
RH 2 257169.4 128584.7 4.74 0.0356
Error 10 271061.1
Total 17 2612461.1
At the 0.05 significance level, the factor Temperature is the only one not significant.
12.69 For As /Cu :
Hypotheses: H0 : µ1 = µ2 = µ3 = µ4 Ha : At least one µi is different
Test Statistic:
\[ F = \frac{\sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2 / (k - 1)}{S_p^2} \quad \text{where} \quad S_p^2 = \frac{\sum_{i=1}^{k} (n_i - 1) S_i^2}{n - k} \]
In this case,
\[ S_p^2 = \frac{(7 - 1)(0.15)^2 + (11 - 1)(0.12)^2 + (31 - 1)(0.067)^2 + (5 - 1)(0.37)^2}{54 - 4} = 0.0192 \]
and
\[ F = \frac{7(0.46 - 0.5456)^2 + 11(0.48 - 0.5456)^2 + 31(0.56 - 0.5456)^2 + 5(0.72 - 0.5456)^2}{(4 - 1)0.0192} = 4.4641. \]
Rejection Region: F > F0.05 (3, 50) = 2.79
Conclusion: Reject H0 at α = 0.05; i.e., the mean mass ratio of arsenic to copper is significantly
higher at the plume than at the other sites.
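Because 12.69 works from group sizes, means and standard deviations rather than raw data, the F statistic can be recomputed with a short helper. The sketch below uses an illustrative function name, `anova_from_summary`, and the As/Cu summary values quoted above. It carries the pooled variance at full precision, so the resulting F is slightly smaller than the printed 4.4641, which uses the rounded value 0.0192.

```python
def anova_from_summary(groups):
    """One-way ANOVA F statistic from (n_i, mean_i, sd_i) triples.

    Returns (grand mean, pooled variance, F).
    """
    n = sum(size for size, _, _ in groups)
    k = len(groups)
    grand_mean = sum(size * mean for size, mean, _ in groups) / n
    pooled_var = sum((size - 1) * sd ** 2 for size, _, sd in groups) / (n - k)
    between_ms = sum(size * (mean - grand_mean) ** 2
                     for size, mean, _ in groups) / (k - 1)
    return grand_mean, pooled_var, between_ms / pooled_var


# (n_i, mean_i, s_i) for the four sites, As/Cu ratio
as_cu = [(7, 0.46, 0.15), (11, 0.48, 0.12), (31, 0.56, 0.067), (5, 0.72, 0.37)]
grand_mean, pooled_var, f_stat = anova_from_summary(as_cu)
print(round(grand_mean, 4), round(pooled_var, 4), round(f_stat, 2))
```

The other mass ratios in this exercise can be checked the same way by substituting their group sizes, means and standard deviations.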
For Cd /Cu :
Hypotheses: H0 : µ1 = µ2 = µ3 = µ4 Ha : At least one µi is different
Test Statistic:
\[ S_p^2 = \frac{(10 - 1)(0.017)^2 + (11 - 1)(0.024)^2 + (31 - 1)(0.011)^2 + (5 - 1)(0.022)^2}{57 - 4} = 0.0003 \]
\[ F = \frac{10(0.068 - 0.0757)^2 + 11(0.087 - 0.0757)^2 + 31(0.074 - 0.0757)^2 + 5(0.077 - 0.0757)^2}{(4 - 1)0.0003} = 2.3317 \]
Rejection Region: F > F0.05 (3, 53) = 2.7791
Conclusion: Do not reject H0 at α = 0.05; i.e., there is no significant difference in the mean mass
ratio of cadmium to copper at any of the four sites.
For Pb /Cu :
Hypotheses: H0 : µ1 = µ2 = µ3 = µ4 Ha : At least one µi is different
Test Statistic:
\[ S_p^2 = \frac{(13 - 1)(0.16)^2 + (11 - 1)(0.17)^2 + (49 - 1)(0.07)^2 + (4 - 1)(0.23)^2}{77 - 4} = 0.0136 \]
\[ F = \frac{13(1.03 - 0.8786)^2 + 11(0.94 - 0.8786)^2 + 49(0.82 - 0.8786)^2 + 4(0.90 - 0.8786)^2}{(4 - 1)0.0136} = 12.4890 \]
Rejection Region: F > F0.05 (3, 73) = 2.7300
Conclusion: Reject H0 at α = 0.05; i.e., there is a significant difference in the mean mass ratio of
lead to copper at the Tucson Research ranch site from 8/84−10/84 and the Bisbee site.
For Sb /Cu :
Hypotheses: H0 : µ1 = µ2 = µ3 = µ4 Ha : At least one µi is different
Test Statistic:
\[ S_p^2 = \frac{(3 - 1)(0.019)^2 + (7 - 1)(0.018)^2 + (11 - 1)(0.016)^2 + (5 - 1)(0.034)^2}{26 - 4} = 0.0004 \]
\[ F = \frac{3(0.073 - 0.0821)^2 + 7(0.078 - 0.0821)^2 + 11(0.10 - 0.0821)^2 + 5(0.054 - 0.0821)^2}{(4 - 1)(0.0004)} = 6.5315 \]
Conclusion: Reject H0 at α = 0.05; i.e., there is a significant difference in the mean mass ratio of
antimony to copper at the plume site and the Tucson Research Ranch site (8/84−9/85).
For Zn /Cu :
Test Statistic:
Conclusion: Reject H0 at α = 0.05; i.e., there is a significant difference in the mean mass ratio of
zinc to copper for the Tucson Research ranch site (8/84−9/85) compared with the other sites.
Analysis of Variance
Source DF SS MS F P
Regression 5 0.34549 0.06910 3.85 0.029
Residual Error 11 0.19747 0.01795
Total 16 0.54296
12.75 Standardize all the means, and test for outliers. (Exercise for student.)