See the notes from Monday and the Wednesday before
• There is a new DataCamp assignment from a new course, Working in RStudio. We probably won’t do the whole course (although you can if you want to!), but this will help with a future assignment that you will have to work out in RStudio and “hand in” via Piazza. (Coming soon!)
• Problems to work on, covering the expected value, variance, and standard deviation of RVs. Remember, the standard deviation is just the square root of the variance!
Find the expected value, variance, and standard deviation for the RVs of each of these previous homework problems (for which you have already found the probability distribution function):
p. 128 #3.3.1, 3.3.2, 3.3.3, 3.3.5, 3.3.7
Also do the following:
p. 159 #3.6.5 (The theorem they mention is the “computational formula” we used in class.)
For X with exponential probability density $f(x) = 3e^{-3x}$ for $x \ge 0$, compute the mean, the variance, and the standard deviation. You will have to use integration by parts, but it’s not too hard. A nice little review is here: there is an example which integrates $x\cdot e^{x}$, which is more or less what you will be doing. Also check out Question 6 at the bottom: once you’ve chosen your answer, it will show you a step-by-step solution.
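As a warm-up for the integration by parts, here is the $x\cdot e^{x}$ example worked out. The choice of $u$ and $dv$ below is one standard option, not the only one. Taking $u = x$ and $dv = e^{x}\,dx$ gives $du = dx$ and $v = e^{x}$, so
$\int x e^{x}\,dx = x e^{x} - \int e^{x}\,dx = x e^{x} - e^{x} + C$
The homework integrals have $e^{-3x}$ in place of $e^{x}$, so the signs and constants will come out differently, and the integral for $E(X^{2})$ needs integration by parts twice. For a continuous RV with density $f$, the quantities to compute are
$E(X) = \int x\, f(x)\,dx, \qquad E(X^{2}) = \int x^{2} f(x)\,dx, \qquad \text{Var}(X) = E(X^{2}) - [E(X)]^{2}$
with the integrals taken over the range where $f(x)$ is nonzero. The standard deviation is then $\sqrt{\text{Var}(X)}$.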
Please read the following. There is a problem for you to do at the end.
• Go back to problems 3.3.1 and 3.3.2 on p. 128.
In problem 3.3.1, we have five balls numbered 1 through 5, and we select two of them successively (without replacement). We define the RV X = the larger of the two numbers, so its possible values are 2, 3, 4, 5. We are told to find the pdf for X.
If you look at the answers to 3.3.1 in the book (we also did this one in class), they all have denominator 10, which suggests that the sample space had 10 outcomes in it; in other words, that the sample space was
$S = \{(1,2), (1,3), (1,4), (1,5), (2,3), (2,4), (2,5), (3,4), (3,5), (4,5)\}$
As far as the value of X is concerned, the order in which we choose the numbered balls does not matter, so this seems fine. But is it?
This sample space, assuming the outcomes are equally likely, does give the correct pdf for X. But there is a little cheating going on, which catches up to us if we try to extend this sample space to use it in problem 3.3.2.
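To make the counting for 3.3.1 concrete, here is a minimal sketch that lists the 10 unordered pairs and tallies the larger number in each. It is written in Python purely as an illustration (it is not the R code we will use later), and the variable names are made up for this example. It assumes, as above, that the 10 outcomes are equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# The 10 unordered pairs of distinct balls from problem 3.3.1.
sample_space = list(combinations(range(1, 6), 2))

# X = the larger of the two numbers; count the outcomes giving each value.
counts = Counter(max(pair) for pair in sample_space)

# Assuming equally likely outcomes: P(X = k) = (outcomes with larger number k) / 10.
pdf = {k: Fraction(counts[k], len(sample_space)) for k in sorted(counts)}

for k in pdf:
    print(f"P(X = {k}) = {counts[k]}/{len(sample_space)}")
```

This prints 1/10, 2/10, 3/10, 4/10 for X = 2, 3, 4, 5, matching the denominator-10 answers in the book.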
In 3.3.2, we have the same problem except that now we select with replacement. That means that the sample space must include outcomes where both numbers are the same: it must include (1,1), (2,2), and so on. If we just throw these into our previous sample space, we come up with the sample space
? $S_{1} = \{(1,1), (1,2), (1,3), (1,4), (1,5), (2,2), (2,3), (2,4), (2,5), (3,3), (3,4), (3,5), (4,4), (4,5), (5,5)\}$
I’ve put a question mark in front because I suspect that this sample space does not have equally likely outcomes. (I will explain next time, but you may already see why.) This sample space has 15 outcomes in it, so if they are indeed equally likely outcomes, the probability distribution function would have denominators 15.
Suppose that we had never heard of problem 3.3.1. Working from scratch, if we are selecting two things with replacement from a set of five objects, there should be $5^{2} = 25$ possible outcomes, and the sample space would be similar to what we used for the “rolling two dice” example:
$S_{2} = \{(1,1), (1,2), (1,3), (1,4), (1,5),
(2,1), (2,2), (2,3), (2,4), (2,5),
(3,1), (3,2), (3,3), (3,4), (3,5),
(4,1), (4,2), (4,3), (4,4), (4,5),
(5,1), (5,2), (5,3), (5,4), (5,5)\}$
This sample space has 25 outcomes, so if they are indeed equally likely outcomes, the probability distribution function would have denominators 25.
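If you want to double-check the two outcome counts without writing the lists out by hand, here is a small sketch. It is Python used only for illustration, and the names `s1` and `s2` are just labels for the two candidate sample spaces above. It only counts outcomes; it says nothing about which space has equally likely outcomes.

```python
from itertools import combinations_with_replacement, product

balls = range(1, 6)

# S_1: unordered pairs with repeats allowed, e.g. (1,1), (1,2), ..., (5,5).
s1 = list(combinations_with_replacement(balls, 2))

# S_2: ordered pairs with repeats allowed, so (1,2) and (2,1) are both listed.
s2 = list(product(balls, repeat=2))

print(len(s1), len(s2))  # 15 25
```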
These two claims cannot both be true; at most one of them is. You might think that the pdfs would come out the same after reducing to lowest terms, but when you work them out you will see that they are not the same.
How can we decide? There are two ways to think about that question: is there a mathematical way to show that one of them has equally likely outcomes and the other does not? Or we could ask, if we do this experiment in the real world, which one gives the actual probabilities?
I’ll give a mathematical answer, but the second question is the more interesting one, because if there is no way to test probability theory in the real world, that is a very sad state of affairs!
We will test these two models against each other by using the frequentist approach: we will repeat the experiment a very large number of times, and see what proportion (relative frequency) of the time each possible value of X shows up. According to the frequentist interpretation of probability, if we repeat the experiment a very large number of times, those relative frequencies should be close to the actual probabilities.
In fact, we won’t do the experiment in real life (by drawing actual physical numbered balls), but we will use R to simulate it. That will mean that we can easily repeat the “experiment” (the simulation) 1000 times or more if we want! That’s a pretty large number.
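To give a feel for what such a simulation looks like, here is a minimal sketch applied to problem 3.3.1 (selection without replacement), where we already know the pdf. We will do the real thing in R; this version is in Python purely as an illustration, and the function name, the seed, and the number of trials are arbitrary choices for this example.

```python
import random
from collections import Counter

def simulate_larger_of_two(trials, seed=0):
    """Relative frequencies of X = larger of two balls drawn
    without replacement from balls numbered 1 through 5."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter()
    for _ in range(trials):
        first, second = rng.sample(range(1, 6), 2)  # two distinct balls
        counts[max(first, second)] += 1
    return {k: counts[k] / trials for k in range(2, 6)}

freqs = simulate_larger_of_two(100_000)
for k, f in freqs.items():
    print(f"X = {k}: relative frequency {f:.3f}")
```

With this many repetitions, the relative frequencies should land close to 0.1, 0.2, 0.3, 0.4, the probabilities we found for 3.3.1.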
Your job (for now):
Find the pdf for X using $S_{1}$ and assuming the outcomes are equally likely. Then do the same for $S_{2}$ assuming its outcomes are equally likely. Verify that the two pdfs are not the same.