Monthly Archives: March 2018

Homework for Wednesday 28 March

See the notes from Monday and the Wednesday before

• There is a new DataCamp assignment from a new course, Working in RStudio. We probably won’t do the whole course (although you can if you want to!), but this will help with the future assignment that you will have to work out in RStudio and “hand in” via Piazza. (Coming soon!)

• Problems to work on expected value and variance (and standard deviation) of RVs: remember, the standard deviation is just the square root of the variance!

Find the expected value, variance, and standard deviation for the RVs of each of these previous homework problems (for which you have already found the probability distribution function):

p. 128 #3.3.1, 3.3.2,  3.3.3, 3.3.5, 3.3.7

Also do the following:

p.159 #3.6.5 (The theorem they mention is the “computational formula” we used in class.)

For X with exponential probability density $f(x) = 3e^{-3x}$, compute the mean, the variance, and the standard deviation. You will have to use integration by parts, but it’s not too hard. A nice little review is here: it has an example that integrates $x\cdot e^{x}$, which is more or less what you will be doing. Also check out Question 6 at the bottom: once you’ve chosen your answer, it will show you a step-by-step solution.
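Once you have worked the integrals by hand, you can check your answers numerically. Here is a minimal sketch using R's built-in integrate function. It assumes the density is 0 for negative x, so every integral runs from 0 to infinity; the names f, mu, sigma2, and sigma are chosen only for this illustration.

```r
# Density from the problem: f(x) = 3 * exp(-3 * x) for x >= 0 (assumed 0 elsewhere)
f <- function(x) 3 * exp(-3 * x)

# Sanity check: the total probability should come out very close to 1
total <- integrate(f, lower = 0, upper = Inf)$value

# Mean: integral of x * f(x)
mu <- integrate(function(x) x * f(x), lower = 0, upper = Inf)$value

# Variance from the definition: integral of (x - mu)^2 * f(x)
sigma2 <- integrate(function(x) (x - mu)^2 * f(x), lower = 0, upper = Inf)$value

# Standard deviation is the square root of the variance
sigma <- sqrt(sigma2)

print(c(total = total, mean = mu, variance = sigma2, sd = sigma))
```

The numbers are approximations, so expect them to agree with your exact answers only to several decimal places.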


Please read the following. There is a problem for you to do at the end.

• Go back to problems 3.3.1 and 3.3.2 on p. 128.

In problem 3.3.1, we have five balls numbered 1 through 5, and we select two of them successively (without replacement). We define the RV X = the larger of the two numbers, so its possible values are 2, 3, 4, 5. We are told to find the pdf for X.

If you look at the answers to 3.3.1 in the book (and we did it in class), they all have denominator 10, which suggests that the sample space had 10 outcomes in it, in other words that the sample space was

$S = \{(1,2), (1,3), (1,4), (1,5), (2,3), (2,4), (2,5), (3,4), (3,5), (4,5)\}$

As far as the value of X is concerned, the order in which we choose the numbered balls does not matter, so this seems fine. Or is it?

This sample space, assuming the outcomes are equally likely, does give the correct pdf for X. But there is a little cheating going on, which catches up to us if we try to extend this sample space to use it in problem 3.3.2.
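As a check on the pdf for 3.3.1, the 10-outcome sample space can be listed in R and the values of X counted. This is only a sketch: it treats the 10 unordered pairs as equally likely, as described above, and the names S, X, and pdf_X are made up for the example.

```r
# All unordered pairs from the balls 1..5; each column of combn() is one outcome
S <- combn(5, 2)
n_outcomes <- ncol(S)

# X = the larger of the two numbers in each outcome
X <- apply(S, 2, max)

# Count the outcomes giving each value of X, then divide by the total
pdf_X <- table(X) / n_outcomes
print(pdf_X)
```

Every probability printed should be a multiple of 1/10, matching the denominators in the book's answers.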

In 3.3.2, we have the same problem except that now we select with replacement. That means that the sample space must include outcomes where both numbers are the same: it must include (1,1), (2,2), and so on. If we just throw these into our previous sample space, we come up with the sample space

? $S_{1} =  \{(1,1), (1,2), (1,3), (1,4), (1,5), (2,2), (2,3), (2,4), (2,5), (3,3), (3,4), (3,5), (4,4), (4,5), (5,5)\}$

I’ve put a question mark in front because I suspect that this sample space does not have equally likely outcomes. (I will explain next time, but you may already see why.) This sample space has 15 outcomes in it, so if they are indeed equally likely outcomes, the probability distribution function would have denominators 15.

Suppose that we had never heard of problem 3.3.1. Working from scratch, if we are selecting two things with replacement from a set of five objects, there should be $5^{2} = 25$ possible outcomes, and the sample space would be similar to what we used for the “rolling two dice” example:

$S_{2} = \{(1,1), (1,2), (1,3), (1,4), (1,5), (2,1), (2,2), (2,3), (2,4), (2,5), (3,1), (3,2), (3,3), (3,4), (3,5), (4,1), (4,2), (4,3), (4,4), (4,5), (5,1), (5,2), (5,3), (5,4), (5,5)\}$

This sample space has 25 outcomes, so if they are indeed equally likely outcomes, the probability distribution function would have denominators 25.
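If you would rather not type the outcomes by hand, both candidate sample spaces can be built in R. The sketch below uses expand.grid for the ordered pairs and then keeps only the pairs with first number at most the second to get the unordered version. The column names first and second are just labels chosen here.

```r
# Ordered pairs (first, second) drawn with replacement from 1..5
S2 <- expand.grid(first = 1:5, second = 1:5)
n_S2 <- nrow(S2)

# Unordered version: keep only the pairs with first <= second
S1 <- S2[S2$first <= S2$second, ]
n_S1 <- nrow(S1)

# Sizes of the two candidate sample spaces
print(c(S1 = n_S1, S2 = n_S2))
```

This only counts the outcomes. It says nothing yet about which sample space has equally likely outcomes.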

Both of these things cannot be true. Only one can be true (at the most). You might think that the pdfs would come out the same after reducing to lowest terms, but when you go and do it you will see that they are not the same.

How can we decide? There are two ways to think about that question: is there a mathematical way to show that one of them has equally likely outcomes and the other does not? Or we could ask, if we do this experiment in the real world, which one gives the actual probabilities?

I’ll give a mathematical answer, but the second question is the most interesting, because if there is no way to test probability theory in the real world, that is a very sad state of affairs!

We will test these two models against each other by using the frequentist approach: we will repeat the experiment a very large number of times, and see what proportion (relative frequency) of the time each possible value of X shows up. According to the frequentist interpretation of probability, if we repeat the experiment a very large number of times, those relative frequencies should be close to the actual probabilities.

In fact, we won’t do the experiment in real life (by drawing actual physical numbered balls), but we will use R to simulate it. That will mean that we can easily repeat the “experiment” (the simulation) 1000 times or more if we want! That’s a pretty large number.
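Here is one way such a simulation might look. It is a rough sketch, not necessarily the version we will use in class: the seed, the number of trials, and the names one_trial, X_sim, and rel_freq are all arbitrary choices for illustration.

```r
set.seed(2018)    # arbitrary seed, used only so that a run can be reproduced
n_trials <- 1000

# One trial: draw two balls from 1..5 with replacement, record the larger number
one_trial <- function() max(sample(1:5, size = 2, replace = TRUE))

# Repeat the trial many times
X_sim <- replicate(n_trials, one_trial())

# Relative frequency of each possible value of X over all the trials
rel_freq <- table(factor(X_sim, levels = 1:5)) / n_trials
print(rel_freq)
```

Compare the relative frequencies with the two candidate pdfs, and try a larger n_trials to see how much the frequencies move from run to run.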

Your job: (for now)

Find the pdf for X using $S_{1}$ and assuming the outcomes are equally likely. Then do the same for $S_{2}$ assuming its outcomes are equally likely. Verify that the two pdfs are not the same.

 

Notes for Monday 26 March (includes Monday 19 March)

(after Test 2)

Last Monday we discussed:

• Computing the expected value for a continuous RV

• Define the variance and standard deviation for discrete and for continuous RVs

• Computing the variance for discrete and continuous RVs, using the definition and also the “computational formula”.

All definitions and many of the computations are in the slideshows:

Math2501ExpectedValueForRVs-slideshow

Math2501VarianceForRVs-slideshow

Make sure that you know the definitions and the notations! There is a pretty good summary of the notation, facts, and definitions about random variables in the box on the first page of these notes

(which are on the distributions and densities we will study next, so you may want to look through them.)

 

Another example worked in class: We also computed the expected value and variance for a continuous RV with pdf $f(x) = 3x^{2}$ for $0 < x < 1$. Here are the computations: (I omit a few details of the computation which you can probably fill in without much trouble)

Computing the expected value (the mean) of X:

$E(X)$ or $\mu_{X}$ = $\displaystyle \int_{0}^{1}x\cdot3x^{2}\textrm{d}x = 3\int_{0}^{1}x^{3}\textrm{d}x$

$= 3\left[\frac{x^{4}}{4}\right]_{0}^{1}$

$= 3\cdot \frac{1}{4} = \frac{3}{4}$

Computing the variance of X using the definition of the variance:

$Var(X)$ or $\sigma_{X}^{2}$ = $\displaystyle \int_{0}^{1}\left(x -\frac{3}{4}\right)^{2}\cdot 3x^{2}\textrm{d}x$

$= 3\displaystyle \int_{0}^{1}\left(x^{2} - \frac{3}{2}x + \frac{9}{16}\right)\cdot x^{2}\textrm{d}x$

$= 3\displaystyle \int_{0}^{1}\left(x^{4} - \frac{3}{2}x^{3} + \frac{9}{16}x^{2}\right)\textrm{d}x$

$= 3\displaystyle \left[\frac{x^{5}}{5} - \frac{3}{2}\cdot\frac{x^{4}}{4} + \frac{9}{16}\cdot\frac{x^{3}}{3}\right]_{0}^{1}$

$= 3\displaystyle \left[\frac{1}{5} - \frac{3}{8} + \frac{3}{16}\right] = \frac{3}{80}$

So the variance is $\frac{3}{80}$

NOTE: And the standard deviation is $\sqrt{\frac{3}{80}}$
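The hand computation above can be checked numerically in R. This is a minimal sketch using integrate; the function name f and the variable names are chosen just for this example, and the results are approximations to the exact fractions.

```r
# Density from the example: f(x) = 3x^2 on 0 < x < 1
f <- function(x) 3 * x^2

# Mean: integral of x * f(x) over (0, 1)
mu <- integrate(function(x) x * f(x), lower = 0, upper = 1)$value

# Variance from the definition: integral of (x - mu)^2 * f(x) over (0, 1)
sigma2 <- integrate(function(x) (x - mu)^2 * f(x), lower = 0, upper = 1)$value

# Standard deviation
sigma <- sqrt(sigma2)

# Numerical results next to the exact values found by hand
print(c(mean = mu, variance = sigma2, sd = sigma))
print(c(mean = 3/4, variance = 3/80, sd = sqrt(3/80)))
```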

 

We can simplify this computation of the variance somewhat by using the computational formula (a result of a theorem)

$\sigma^{2}_{X} = E(X^{2}) - \mu_{X}^{2}$
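In case you want to see where this formula comes from, here is a sketch of the usual argument. It expands the square in the definition of the variance and uses linearity of expected value, together with the fact that $\mu_{X} = E(X)$ is a constant:

$\sigma^{2}_{X} = E\left[(X - \mu_{X})^{2}\right] = E\left(X^{2} - 2\mu_{X}X + \mu_{X}^{2}\right)$

$= E(X^{2}) - 2\mu_{X}E(X) + \mu_{X}^{2}$

$= E(X^{2}) - 2\mu_{X}^{2} + \mu_{X}^{2}$

$= E(X^{2}) - \mu_{X}^{2}$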

Applying it to our RV, we already know $\mu_{X} = \frac{3}{4}$. We need $E(X^{2})$:

$E(X^{2}) = \displaystyle \int_{0}^{1}x^{2}\cdot 3x^{2}\textrm{d}x$

$= 3 \displaystyle \int_{0}^{1}x^{4}\textrm{d}x$

$= 3 \displaystyle \left[\frac{x^{5}}{5}\right]_{0}^{1}$

$ = \frac{3}{5}$

 

Now to compute the variance:

$\sigma^{2}_{X} = E(X^{2}) - \mu_{X}^{2} = \frac{3}{5} - \left(\frac{3}{4}\right)^{2} = \frac{3}{5} - \frac{9}{16} = \frac{3}{80}$

We get the same answer as using the definition (as we should, since this is a mathematical theorem).
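The computational formula is also easy to try in R. The sketch below is one possible way to do it, with variable names (EX, EX2, var_formula) invented for the example. It computes $E(X)$ and $E(X^{2})$ numerically for the same density and then subtracts.

```r
# Density from the example: f(x) = 3x^2 on 0 < x < 1
f <- function(x) 3 * x^2

# E(X) and E(X^2) by numerical integration
EX  <- integrate(function(x) x * f(x),   lower = 0, upper = 1)$value
EX2 <- integrate(function(x) x^2 * f(x), lower = 0, upper = 1)$value

# Computational formula: Var(X) = E(X^2) - (E(X))^2
var_formula <- EX2 - EX^2

# Compare with the exact value 3/80 found by hand
print(c(EX = EX, EX2 = EX2, variance = var_formula, exact = 3/80))
```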

 

Please note that the variance cannot ever be a negative number. That is because, by definition, we are taking the mean of the squared deviations, and squared real numbers cannot be negative. If your variance ever comes out negative, you have made an error somewhere!

Test 2 review UPDATED and with added material (corrected answer sheet)

Test 2 is rescheduled for the first hour or so of class on Monday 26 March.

Test 2 review problems

Corrected answers:

MAT2572Test2ReviewAnswersSpring2018

Below the fold are worked-out solutions to the first few problems, similar to what I did in class. They use the basic counting techniques which are described in these notes: MAT2572CountingMethods-slideshow


Winter Storm Warning (again)

There is a winter storm warning for the region starting tonight through Thursday.

 

I recommend checking the college website to find out the status of classes. I, myself, cannot cancel class.

Also, sign up for CUNY alerts, but they seem to run slowly. Checking the college website or your City Tech email seems to be the best.

Notes for Wednesday 14 March

Math2501ExpectedValueForRVs-slideshow

 

MAT 2572: Basic Histograms for PDFs in R

 

 

 

The basics: for a RV X with possible values a, b, …, n

 

Assign the values of the probabilities as a vector

(I’ll call it my_distribution, which is really not a very good name)

Then those probabilities will be the heights of the bars

The command barplot gives the bar graph

 

> my_distribution <- c(p(a), p(b), …, p(n))

> barplot(my_distribution)

 

But this gives a bar graph with space between the bars. To make it a histogram, we make the space between the bars be 0:

 

> barplot(my_distribution, space=0)

 

barplot also has a color attribute col, which can be set like this:

 

> barplot(my_distribution, space=0, col="blue")

 

Experiment with the col attribute to see which colors R will recognize!

(Color in graphs is not always a good idea, though, because of readability and psychological issues.)

 

To put labels on the graph (for the values of X), if the values run through a consecutive sequence of integers a through n:

> names(my_distribution) <- as.character(a:n)
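Putting these commands together, a complete example might look like the sketch below. The probabilities are an assumed example: an RV with values 2 through 5 and probabilities 1/10, 2/10, 3/10, 4/10 (the pdf for the "larger of two balls" problem 3.3.1). The axis labels and the name bar_midpoints are choices made here, not requirements.

```r
# Example distribution (assumed for illustration): values 2..5
my_distribution <- c(1, 2, 3, 4) / 10

# The probabilities of a distribution should add up to 1
stopifnot(isTRUE(all.equal(sum(my_distribution), 1)))

# Label each bar with the corresponding value of X
names(my_distribution) <- as.character(2:5)

# space = 0 removes the gaps between bars; barplot() returns the bar midpoints
bar_midpoints <- barplot(my_distribution, space = 0, col = "blue",
                         xlab = "x", ylab = "P(X = x)")
```

Set the names before calling barplot, since the labels are read from the vector at the time the graph is drawn.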

 

 

Saving your work as a script:

menu at upper left corner

 

 

Exporting the graph to use it in a document, for example:

Plots > export…

https://support.rstudio.com/hc/en-us/articles/200484448-Editing-and-Executing-Code

Homework for Wednesday 14 March

 

Notes from Monday’s class are in this post.

• Don’t forget to keep working on your Datacamp assignments!

• Do the following from the textbook:

p. 128 #3.3.1, 3.3.2,  3.3.3, 3.3.5, 3.3.7

Note: $p_{X}(k)$ is another notation for $P(X = k)$, in other words the probability distribution function.

p. 138 # 3.4.1, 3.4.3, 3.4.5, 3.4.7, 3.4.11, 3.4.13

Note: problems 11 and 13 give you the cdf, not the pdf, so it should be easy to compute those probabilities!

 

• There will be a Quiz on Wednesday. It will be based on one of the homework problems mentioned above.

 

Don’t forget, if you get stuck on a problem, you can post a question on Piazza. Make sure to give your question a good subject line and tell us the problem itself – we need this information in order to answer your question. And please only put one problem per posted question!

 

Monday 12 March class

Topics:

Cumulative distribution functions

Continuous random variables and their probability densities (MAT2572RVsAndTheirPDsNotesContinuousCase)

Cumulative distribution function for a continuous RV = the antiderivative of the probability density (being careful of domain)

 

Brief notes on cumulative distribution functions:

Definition: For a random variable $X$, the cumulative distribution function (cdf) $F(x)$ is a function with domain the set of all real numbers, such that

$F(x) = P(X\le x)$

Note: Capital $X$ is the name of  the random variable. Lower-case $x$ is the input to the function, that is, it represents some real number.

Facts about cdfs:

• For any type of random variable (discrete or continuous), $F(x)$ is a nondecreasing function.

• $\displaystyle \lim_{x\rightarrow -\infty}F(x) = 0$ and $\displaystyle \lim_{x\rightarrow \infty}F(x) = 1$

• If $X$ is a discrete (finite or infinite) RV, $F(x)$ is a step function with jumps at the possible values of $X$.

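The use of the cdf: For any real numbers $a$ and $b$ with $a\le b$,

$P(a < X \le b) = F(b) - F(a)$

(For a continuous RV, single points have probability 0, so the same formula also gives $P(a \le X \le b)$ and $P(a < X < b)$.)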

Often (especially for continuous RVs) it is much easier to compute probabilities using the cdf rather than the pdf!
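To see this in action, here is a small sketch in R for a continuous RV with density $f(x) = 3x^{2}$ on $0 < x < 1$ (chosen only as an example). Its cdf is $F(x) = x^{3}$ on that interval, 0 to the left of it, and 1 to the right. The function name F_cdf and the endpoints 0.2 and 0.5 are arbitrary choices.

```r
# cdf for the density f(x) = 3x^2 on (0, 1): 0 to the left, x^3 inside, 1 to the right
F_cdf <- function(x) ifelse(x <= 0, 0, ifelse(x >= 1, 1, x^3))

a <- 0.2
b <- 0.5

# Probability that X lies between a and b, using the cdf
p_cdf <- F_cdf(b) - F_cdf(a)

# The same probability by integrating the density directly
p_pdf <- integrate(function(x) 3 * x^2, lower = a, upper = b)$value

print(c(using_cdf = p_cdf, using_pdf = p_pdf))
```

The two results should agree up to the small numerical error in integrate. With the cdf, the probability takes just one subtraction.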

 

Winter Storm Warning, and which cities have the most unpredictable weather?

A winter storm warning has been issued for the region including NYC, beginning tonight and extending to Thursday. It is not clear right now (as I write) how and how badly NYC will be affected. It’s a good idea to monitor the situation – check weather forecasts as the storm approaches tonight.

Also, if you have not already done so, it’s a good idea to sign up for CUNY Alerts. You can do that here.

 

Here is an interesting application of statistical analysis: Which city has the most unpredictable weather? (from FiveThirtyEight). Spoiler alert: It’s not NYC.