Statistics 4101 Lab Report 2
Trevor Purchase

Problem 1
Is the mean amount of TCDD that collects in the frog ovary significantly greater than the mean amount that collects in the frog liver?
H0: μ_ovary = μ_liver
Ha: μ_ovary > μ_liver
Level of significance: α = 5%
Reject the null hypothesis if the probability value is less than 5%.
Computed results (Minitab):
Paired T-Test and CI: liver, ovaries
Paired T for liver - ovaries
                N      Mean   StDev   SE Mean
  liver         4     13.03    1.72      0.86
  ovaries       4     33.53    6.17      3.08
  Difference    4    -20.50    5.54      2.77
95% CI for mean difference: (-29.31, -11.69)
T-Test of mean difference = 0 (vs not = 0): T-Value = -7.40  P-Value = 0.005
Minitab's P-value of 0.005 (0.5%) is two-sided. Because Ha is one-sided and the mean difference (liver - ovaries) is negative, as Ha predicts, the one-sided probability value is half of this, about 0.25%. Either way it is below 5%, so reject the null hypothesis. At the 5% level of significance, there is sufficient evidence to claim that the average amount of TCDD that collects in the frog ovary is greater than the amount that collects in the frog liver.

Problem 2
Is the percentage of fat in wieners significantly different when determined by two different methods?
H0: μ_A = μ_B
Ha: μ_A ≠ μ_B
(μ_A and μ_B are the mean fat percentages reported by Method A and Method B.)
Level of significance: α = 5%
Reject the null hypothesis if the probability value is less than 5%.
Computed results (Minitab):
Two-Sample T-Test and CI: METHOD A, METHOD B
Two-sample T for METHOD A vs METHOD B
                N      Mean   StDev   SE Mean
  METHOD A      5     7.220   0.172     0.077
  METHOD B      2     9.467   0.234      0.17
Difference = mu (METHOD A) - mu (METHOD B)
Estimate for difference: -2.246
95% CI for difference: (-4.566, 0.073)
T-Test of difference = 0 (vs not =): T-Value = -12.30  P-Value = 0.052  DF = 1
P-value is 5.2%, which is not less than 5%. Fail to reject the null hypothesis. The result is borderline: the t-value is large, but Method B has only two observations, so Minitab's unpooled test has just 1 degree of freedom and the p-value stays slightly above 5%. At the 5% level of significance, there is insufficient evidence to claim that the two test methods report different percentages of fat in wieners.

Problem 3
Is the average percentage of fat in the samples of pork significantly different when determined by a commercially available instrument than when determined by the AOAC method?
H0: μ_Inst = μ_AOAC
Ha: μ_Inst ≠ μ_AOAC
Level of significance: α = 5%
Reject the null hypothesis if the probability value is less than 5%.
Computed results (Minitab):
Paired T-Test and CI: Instrument, AOAC
Paired T for Instrument - AOAC
                N       Mean    StDev   SE Mean
  Instrument   19      30.29    20.30      4.66
  AOAC         19      30.39    20.45      4.69
  Difference   19    -0.0995   0.3505    0.0804
95% CI for mean difference: (-0.2684, 0.0695)
T-Test of mean difference = 0 (vs not = 0): T-Value = -1.24  P-Value = 0.232
P-value is 23.2%. Fail to reject the null hypothesis. At the 5% level of significance, there is insufficient evidence to claim that the two test methods report different percentages of fat in samples of pork, on average.

Problem 4
95% confidence intervals for the mean area of the rectangles, Food Safety data:
  Sample                   95% CI             Width
  Guess                    (6.69, 14.81)      8.12
  Representative sample    (6.189, 9.192)     3.003
  Random sample            (5.310, 7.472)     2.162
Mean area of the population of 100 rectangles: 7.420
All three confidence intervals contain the true population mean of 7.420. The interval created by guessing was the widest, with a width of 8.12, while the interval from the random sample was the narrowest, with a width of 2.162. The lower bounds of the three intervals were similar, spanning a range of only 1.38, but the upper bounds varied much more, spanning a range of 7.338. From this single example, the Student's t procedure seems fairly robust: only the random sample satisfies the random-sampling assumption, yet all three intervals contained the population mean.

Problem 5
In a given test, the null hypothesis may or may not be false, and the purpose of a hypothesis test is to decide, from sample data, whether there is enough evidence to reject it. The power of a hypothesis test is the probability of rejecting a false null hypothesis. The power of a test can be estimated, and its value is important information for the experimenter. A test with high power is likely to reject a false null hypothesis; the chance of wrongly rejecting a true null hypothesis is controlled separately, by the significance level α.
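To make "the power of a test can be estimated" concrete, here is a minimal sketch under a simplifying assumption: it uses a one-sided z-test with a known population standard deviation as a stand-in for the t-tests in this report, so that only the Python standard library is needed. The function names, the effect size, and the sample sizes are hypothetical. The sketch computes power exactly from the normal distribution and also estimates it by simulation, as the fraction of simulated experiments that reject H0.

```python
import math
import random
from statistics import NormalDist

def z_power(delta, sigma, n, alpha=0.05):
    """Power of a one-sided z-test of H0: mu = 0 vs Ha: mu > 0 when the true
    mean is delta and the population SD sigma is known (a simplifying
    ASSUMPTION; the lab's t-tests estimate sigma from the data)."""
    z_crit = NormalDist().inv_cdf(1 - alpha)  # reject H0 when z > z_crit
    # Under the true mean delta, z is Normal(delta * sqrt(n) / sigma, 1).
    return 1 - NormalDist().cdf(z_crit - delta * math.sqrt(n) / sigma)

def simulated_power(delta, sigma, n, alpha=0.05, reps=10_000, seed=0):
    """Estimate the same power by simulation: the fraction of simulated
    experiments (true mean delta) in which the test rejects H0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(reps):
        sample_mean = sum(rng.gauss(delta, sigma) for _ in range(n)) / n
        if sample_mean * math.sqrt(n) / sigma > z_crit:
            rejections += 1
    return rejections / reps

if __name__ == "__main__":
    # Hypothetical design: true shift of half a standard deviation.
    for n in (10, 25, 50):
        print(n, round(z_power(0.5, 1.0, n), 3), simulated_power(0.5, 1.0, n))
```

With a true shift of half a standard deviation, `z_power` gives about 0.47 for n = 10 and about 0.80 for n = 25, and the simulated estimates should land close to these values. When delta is 0 (the null hypothesis is true), the rejection rate falls back to α.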
A test with low power is unlikely to reject the null hypothesis even when the null hypothesis is false. Such a test should be redesigned (for example, with a larger sample) so that time and resources are not wasted on a test that will not give results of any value.

Problem 6
Write a short essay describing the relationship between confidence intervals, t-statistics, critical values, and p-values for 2-sample t-tests.
A confidence interval is a range of values that has a specified probability of containing the parameter being estimated. The 95% and 99% confidence intervals, which have .95 and .99 probabilities of containing the parameter respectively, are the most commonly used. If the parameter being estimated were μ, a 95% confidence interval might look like the following:
12.5 ≤ μ ≤ 30.2
This means that the interval between 12.5 and 30.2 has a .95 probability of containing μ. A confidence interval has the specified probability of containing the parameter only if the sample data on which it is based are the only information available about the value of the parameter. As an extreme example, consider the case in which 1000 studies estimating the value of μ in a certain population all resulted in estimates between 25 and 30. If one more study were conducted and the 95% confidence interval on μ, computed from that one study alone, were
35 ≤ μ ≤ 45
then it would not be reasonable to say that this interval has a .95 probability of containing μ, because the 1000 earlier studies strongly suggest that μ lies between 25 and 30.
A critical value is used in significance testing. It is the value that a test statistic must equal or exceed in order for the null hypothesis to be rejected. For example, for a two-tailed test at the .05 significance level, the critical value of t with 12 degrees of freedom is 2.18. This means that for the probability value to be less than or equal to .05, the absolute value of the t statistic must be 2.18 or greater.
In hypothesis testing, the probability value (sometimes called the p-value) is the probability of obtaining a statistic as different from, or more different from, the parameter specified in the null hypothesis as the statistic obtained in the experiment.
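The critical value and probability value defined in this essay can be checked numerically. The sketch below is a standard-library-only approximation written for illustration: the function names are hypothetical, and the tail areas come from Simpson's rule applied to the t density rather than from a statistics package. `two_sided_p` computes, assuming the null hypothesis, the probability of a t statistic at least as far from 0 as the observed one, and `critical_value` searches for the |t| at which that probability equals α.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t: float, df: int, steps: int = 4000) -> float:
    """P(T >= |t|) under H0: 0.5 minus the area from 0 to |t|,
    integrated with Simpson's rule (steps must be even)."""
    b = abs(t)
    h = b / steps
    area = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - area * h / 3

def two_sided_p(t: float, df: int) -> float:
    """Probability of a t statistic at least as far from 0 as the observed t."""
    return min(1.0, 2 * t_upper_tail(t, df))

def critical_value(df: int, alpha: float = 0.05) -> float:
    """|t| at which the two-sided p-value equals alpha, found by bisection.
    Assumes the answer lies in [0, 100] (true for alpha = .05 and any df >= 1)."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if two_sided_p(mid, df) > alpha:
            lo = mid  # p-value still too large: critical value is further out
        else:
            hi = mid
    return (lo + hi) / 2

if __name__ == "__main__":
    print(critical_value(12))                          # value quoted in Problem 6
    print(two_sided_p(-7.40, 3))                       # Problem 1: df = 4 - 1
    print(critical_value(1), two_sided_p(-12.30, 1))   # Problem 2: DF = 1
```

`critical_value(12)` comes out near 2.179, matching the 2.18 quoted above, and `two_sided_p(-7.40, 3)` is about 0.005, matching Problem 1's output. For Problem 2, `critical_value(1)` is about 12.71, which exceeds |-12.30|, so that test falls just short of significance at the 5% level.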
The probability value is computed assuming the null hypothesis is true. If the probability value is below the significance level, the null hypothesis is rejected.
For a two-sample t-test these quantities fit together. The t-statistic is the difference between the two sample means divided by its estimated standard error. At significance level α (two-tailed), |t| equals or exceeds the critical value exactly when the p-value is at most α, and this happens exactly when the (1 - α) confidence interval for the difference in means excludes 0. Problem 2 shows the same agreement: its p-value of 0.052 is above .05, and its 95% confidence interval, (-4.566, 0.073), contains 0.

Problem 7
Use the questions that you find by using the "Exercises" link from the Rice site as a basis for a short commentary on confidence intervals. You may answer them question by question, but it is better to put your answer in essay form. Do not be too concerned if your commentary does not explicitly answer each question.
1. What percent of the 95% confidence intervals would you expect to contain 50?
2. Which is wider, 95% or 99% confidence intervals? Why?
3. How does sample size affect the number of intervals that contain 50? Explain.
4. How does sample size affect the width of the intervals?
5. The widths of the intervals vary somewhat even for a given sample size. Why?
6. Is there a tendency for intervals not containing 50 to differ in length from those that do? If so, why?
7. If you already knew that the population mean were 50, what value would there be in computing a confidence interval?
8. There is an important determinant of the width of confidence intervals that you cannot modify in this simulation. What is it?
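Several of these questions can be explored with a small simulation in the spirit of the Rice exercise. The sketch below is not the Rice applet's code: it assumes a normal population with mean 50 and standard deviation 10 (the standard deviation is an assumption), builds a t-based interval from each of many samples, and uses t critical values copied from a standard t table for n = 10 and n = 20 only. The function name `simulate_intervals` is hypothetical. It reports the fraction of intervals that contain 50, the average width, and the average widths of intervals that miss 50 and of those that contain it.

```python
import math
import random
import statistics

# Two-sided t critical values t(1 - alpha/2, n - 1), copied from a standard
# t table; only these (sample size, confidence level) pairs are supported.
T_CRIT = {
    (10, 0.95): 2.262, (10, 0.99): 3.250,
    (20, 0.95): 2.093, (20, 0.99): 2.861,
}

def simulate_intervals(n, level, reps=2000, mu=50.0, sigma=10.0, seed=1):
    """Build `reps` t intervals from samples of size n drawn from a normal
    population (ASSUMED mean mu = 50, SD sigma = 10).

    Returns (fraction containing mu, mean width,
             mean width of intervals that miss mu,
             mean width of intervals that contain mu).
    """
    rng = random.Random(seed)  # same seed and n -> same samples at every level
    t_crit = T_CRIT[(n, level)]
    hit_widths, miss_widths = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = statistics.fmean(sample)
        # Width depends on the sample SD, so it varies from sample to sample.
        half_width = t_crit * statistics.stdev(sample) / math.sqrt(n)
        if abs(mean - mu) <= half_width:
            hit_widths.append(2 * half_width)
        else:
            miss_widths.append(2 * half_width)
    coverage = len(hit_widths) / reps
    return (coverage,
            statistics.fmean(hit_widths + miss_widths),
            statistics.fmean(miss_widths),
            statistics.fmean(hit_widths))

if __name__ == "__main__":
    for n in (10, 20):
        for level in (0.95, 0.99):
            cov, width, miss_w, hit_w = simulate_intervals(n, level)
            print(f"n={n} level={level}: coverage={cov:.3f} "
                  f"mean width={width:.2f} (misses {miss_w:.2f}, hits {hit_w:.2f})")
```

Comparing runs should show coverage staying near the nominal level as n changes while the intervals get narrower, 99% intervals being wider than 95% intervals built from the same samples, and intervals that miss 50 tending to be narrower, because they tend to come from samples with unusually small standard deviations. In this sketch the population standard deviation can be changed through `sigma`, and interval widths scale in proportion to it.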