Monthly Archives: April 2018

Monday 30 April class: notes and links


Here are notes related to today’s class. Some have been linked previously.

Instructions for using graphing calculators (TI and Casio) to compute special probability distributions

 

Special probability distributions (summary notes) – linked previously:

MAT2572RVsDistributions

 

Distinguishing the special distributions:

Math2501DistinguishSpecialProbabilityDistributions

(This is the version I wrote for my other class: I may or may not have time to rewrite it a bit.)

 

Explanations of the names of the special distributions

 

Sampling distributions and the Central Limit Theorem (slideshow): Math2501-CentralLimitTheorem-slideshow

[I have more extensive notes on sampling distributions, statistics, and the three important theorems which I will post after a bit of a rewrite.]

Note: the standard deviation of the sample means is usually referred to as the standard error of the mean, to distinguish it from the standard deviation of the underlying random variable (the population standard deviation).

A statistic is a number that is computed from a random sample. Better: a statistic is an RV whose value depends on a random sample. So far we have two examples: the sample mean and the sample variance (and, by extension, the sample standard deviation is also a statistic). Generally, a statistic is connected to a population parameter which the statistic is intended to estimate: the sample mean $\bar{X}$ is connected to the population mean $\mu$, and the sample variance $S^{2}$ is connected to the population variance $\sigma^{2}$.
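If you would like to see the standard error of the mean in action, here is a minimal R sketch (not part of any assignment; the population parameters and sample size are made-up values for illustration): it draws many samples, computes each sample mean, and compares the standard deviation of those sample means to $\sigma/\sqrt{n}$.

# Illustration only: standard error of the mean (made-up population)
set.seed(1)
n <- 25                    # sample size
mu <- 50                   # population mean
sigma <- 10                # population standard deviation
sample_means <- replicate(10000, mean(rnorm(n, mean = mu, sd = sigma)))
sd(sample_means)           # simulated standard error of the mean
sigma/sqrt(n)              # theoretical standard error: sigma/sqrt(n) = 2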

[If you are interested in the reason that the sample variance is divided by $n-1$ rather than $n$, here is a good discussion from Stack Exchange. The mathematical reason is that dividing by $n-1$ makes the sample variance an “unbiased estimator” of the population variance. A complete mathematical explanation is here.]
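You can also see the bias for yourself with a small R sketch (my own illustration, using a made-up population with variance 9): it simulates many small samples and compares the average of the two versions of the sample variance to the true population variance.

# Illustration only: dividing by n-1 vs. dividing by n (made-up population, variance 9)
set.seed(2)
n <- 5                                                      # small sample size
samples <- replicate(20000, rnorm(n, mean = 0, sd = 3))     # one sample per column
mean(apply(samples, 2, var))                                # var() divides by n-1: close to 9
mean(apply(samples, 2, function(x) mean((x - mean(x))^2)))  # dividing by n: noticeably below 9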

Note: “Central” in the Central Limit Theorem (CLT) refers to the mean, which is a measure of “center” of a distribution. So this is a theorem about the limits of distributions of sample means.

We will not dwell on these three theorems in this section for very long. It is important to get the idea of what the CLT says and why it is important; try not to get overwhelmed by the formal notation. A good idea is to try to rephrase it in your own words.

An example I worked out to show what the Central Limit Theorem is saying about the distribution of sample means is in this spreadsheet. It is based on finite populations, but the results are still revealing. Some more examples from populations with very wild distributions are here. There is a demonstration program you can play with here, and some discussion and more examples here.
We will be using simulations in R to illustrate the CLT.
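As a preview, here is a minimal sketch of such a simulation (the exponential population and the sample size are just example choices of mine): even though the population is very skewed, the histogram of the sample means comes out looking roughly normal.

# Illustration only: a first CLT simulation (example population and sample size)
set.seed(3)
n <- 30
sample_means <- replicate(5000, mean(rexp(n, rate = 1)))   # exponential (skewed) population
hist(sample_means, breaks = 40,
     main = "Distribution of sample means, n = 30",
     xlab = "sample mean")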

 

Snow day Makeup Assignment

Choose one of the Datacamp courses that you think you might be interested in. A list of suggestions follows below. Each of these courses is estimated to take about 4 hours to complete. You must complete at least half of the course you choose, and take notes as you work through it. (Extra credit will be given if you complete the course you have chosen.)

Then you will write a post on the OpenLab site for this class, describing the course that you worked on and something interesting you learned in it. More detail will be given in a later post.

Please choose your course no later than Wednesday 2 May, and make a private post to me in Piazza telling me which course you have chosen. When you are choosing, it may help to watch the video at the start of a course (if there is one) for any course you are not sure about.

Then get to work on it, so that you don’t end up doing it all at the last minute!

 

Datacamp courses: suggestions for snow day makeup

General programming in R:

Intermediate R

Writing Functions in R

Data Visualization in R

 

Making Reports using R:

Reporting with R Markdown

 

Probability and Statistics:

Foundations of Probability in R (starting with Chapter 2)

Statistical Modeling in R (part 1)

 

Financial and Marketing Applications:

Bond Valuation and Analysis in R

Credit Risk Modeling in R

Equity Valuation in R

Financial Trading in R

Introduction to Portfolio Analysis in R

Introduction to R for Finance

Forecasting Product Demand in R

 

 

 

Test 3 review UPDATED

Test 3 is scheduled for the first hour or so of class on Wednesday 25 April.

 

Review problems are here:

MAT2572Test3ReviewSpring2018

Answers and some hints are here (UPDATE: the answer to #3 is now included):

MAT2572Test3ReviewAnswersSpring2018

Please let us know on Piazza if you find any typographical or other errors in these!

 

This test will include the following topics (my notes on the special distributions are here: MAT2572RVsDistributions):

• Variance and standard deviation of a finite RV

• Expected value (mean), Variance, and standard deviation of a continuous RV

• Binomial distribution

• Hypergeometric distribution

• Poisson distribution

• Estimating binomial probability using Poisson

• Normal distribution

• Estimating binomial probability using the normal distribution (including the continuity correction)

 

 

Don’t forget, if you get stuck on a problem, you can post a question on Piazza. Make sure to give your question a good subject line and tell us the problem itself – we need this information in order to answer your question. And please only put one problem per posted question!

 

Homework for Monday 23 April


See the notes on this page.

Instructions for some calculators are linked on this page.

 

This is practice computing normal probabilities. Make sure that you practice using the calculator you will use on the tests, since you will not be allowed to use R during tests.

The problems are posted in pdf form in the “Resources” page in Piazza.

Section 4.3 p. 245 #4.3.1, 4.3.2, 4.3.5(a, b), 4.3.9, 4.3.11

4.3.9 and 4.3.11 are examples of binomial probabilities that are estimated by a normal probability using the DeMoivre-Laplace Theorem. See the notes linked at the top of this page.

Also, more practice finding critical values of Z: use your calculator please!

Find values of the standard normal z such that:

$P(Z \le z) = 0.01$

$P(Z \ge z) = 0.01$

$P(Z \ge z) = 0.005$
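You can check your answers at home with R using qnorm (the inverse of the normal cdf) – just remember that R is not allowed on the tests:

# Checking critical values of the standard normal in R (not allowed on tests!)
qnorm(0.01)                        # z such that P(Z <= z) = 0.01
qnorm(0.01, lower.tail = FALSE)    # z such that P(Z >= z) = 0.01
qnorm(0.005, lower.tail = FALSE)   # z such that P(Z >= z) = 0.005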

 

Also don’t forget about R programming assignment 1

 

Don’t forget, if you get stuck on a problem, you can post a question on Piazza. Make sure to give your question a good subject line and tell us the problem itself – we need this information in order to answer your question. And please only put one problem per posted question!

 

Notes from Wednesday 18 April class


Estimating binomial using normal: This makes use of the DeMoivre-Laplace Theorem. Essentially, what this theorem says is that under certain conditions (see below) a binomial probability distribution with parameters $n$, $\theta$ is very close to a normal probability density with parameters $\mu = n\theta$, $\sigma = \sqrt{n\theta(1-\theta)}$. In the past this was used to estimate binomial probabilities for computational purposes; that use has become less important now that computers perform such computations. The importance of this theorem is that it will be used in the estimation techniques when we do inferential statistics (coming right up!).

The conditions required are that $n$ should be large and $\theta$ should not be too close to either 0 or 1. A common rule of thumb is to require that $n\theta$ and $n(1-\theta)$ both be at least 5, but 10 is a better choice for the cutoff – see the computation below.

————–

Here is the example that I worked in class estimating binomial by normal. Obviously we do not need the estimate here, since we can easily compute the binomial probability itself; the example is merely to illustrate a slightly tricky part of the method, and also to show that the estimate is excellent when we meet the conditions of the rule of thumb given above, especially if we stay away from the lower bound of 5 for the mean and the “failure mean”.

Suppose we flip a coin 20 times. The coin is unbalanced, so the probability of heads showing on any flip is 40%. What is the probability that we will get heads exactly 6 times?

This is binomial: $P(X = 6) = b(6; 20, 0.4) = \binom{20}{6}(0.4)^{6}(0.6)^{14} \approx 0.1244$

Check that it is OK to approximate it by a normal probability:

Is $\mu = n\theta \ge 5$? $n\theta = 20(0.4) = 8$, yes.

Is $n(1-\theta) \ge 5$? $n(1-\theta) = 20(0.6) = 12$, yes.

Now to estimate it, we have to be careful because we are using a continuous density to estimate a discrete probability distribution! If you think about estimating the area of the bar of the probability distribution histogram by the area under a normal curve, you can see that we need the normal probability that $X$ is between 5.5 and 6.5 (the edges of the bar). We estimate by a normal probability with mean $\mu = 8$ and standard deviation $\sigma = \sqrt{20(0.4)(0.6)} = \sqrt{4.8}$.

Therefore $P(X=6) \approx P(5.5 < Y < 6.5)$, where $Y$ is normal with mean $8$ and standard deviation $\sqrt{4.8}$,

$\approx 0.1199$ by using the normal cdf function.

In R, we compute this by using pnorm(6.5, 8, sqrt(4.8)) - pnorm(5.5, 8, sqrt(4.8))
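If you want to see the whole comparison in one place, here is a short R check (using the values from the example above):

# Exact binomial probability vs. normal estimate (values from the example above)
dbinom(6, size = 20, prob = 0.4)                                             # exact: about 0.1244
pnorm(6.5, mean = 8, sd = sqrt(4.8)) - pnorm(5.5, mean = 8, sd = sqrt(4.8))  # estimate: about 0.1199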

In your calculator, locate the normal cdf functions. Instructions for some calculators are linked on this page.

 

So the normal approximation gives us two decimal places that are correct; the third decimal place is off.

This is good enough for many purposes, but not always. Because the mean 8 is close to the cutoff 5, we are seeing the limits of this kind of estimate. The reason for using the higher cutoff of 10 for $n\theta$ and $n(1-\theta)$ is to get more accuracy (a third decimal place).
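If you want to see the effect of the cutoff for yourself, here is a quick R sketch (the second set of values, $n = 50$ with $\theta = 0.4$, is just an illustrative choice of mine): keeping $\theta = 0.4$ but making $n\theta$ well above 10 gives a noticeably better estimate.

# Illustration only: accuracy of the normal estimate as n*theta grows (theta = 0.4 throughout)
normal_estimate <- function(k, n, theta) {
  mu <- n*theta
  sigma <- sqrt(n*theta*(1 - theta))
  pnorm(k + 0.5, mu, sigma) - pnorm(k - 0.5, mu, sigma)   # continuity correction
}
dbinom(6, 20, 0.4); normal_estimate(6, 20, 0.4)     # n*theta = 8: off in the third decimal place
dbinom(20, 50, 0.4); normal_estimate(20, 50, 0.4)   # n*theta = 20: agrees to about three decimal places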

 

Some more examples, with pictures of the graphs, are worked out here.

Using calculators to compute special distribution probabilities

I will add to this as I find more resources. If you find something useful, please let us know!

 

Using TI-84 for binomial probability: [pdf]

Other TI graphing calculators probably work similarly.

Here is a resource that discusses many useful computations for probability and statistics on TI-84, including finding binomial and Poisson probabilities:

For the Casio fx-9750gii Power Graphic calculator, here is a comprehensive guide to statistics and probability functions.

 

 

If you are using a calculator not listed above and these instructions don’t help, let me know!

Homework for Wednesday 18 April

Add to the problems in the previous assignment:

Section 4.2 p.226 # 4.2.5, 4.2.7

Section 4.2 p. 233 #4.2.10, 4.2.11, 4.2.13, 4.2.17, 4.2.19

Also don’t forget the R assignment: see also these notes

and to work on corrections to Test 2 if you wish to do so. (DO NOT OMIT the part about stating what you did wrong in the original Test!)

 

There will be a Quiz on Wednesday, based on one of the problems from the assignment above.

Don’t forget, if you get stuck on a problem, you can post a question on Piazza. Make sure to give your question a good subject line and tell us the problem itself – we need this information in order to answer your question. And please only put one problem per posted question!

 

 

Math Club this Thursday

This might be interesting!

 

This week in the City Tech Math Club:

 

Title: “Statistical Mechanics and Combinatorics”

Speaker: Dr. Ezra Halleck (NYCCT)

Date/Room: Thursday April 19, 2018, 12:50-2:00pm, Namm 720

 

Abstract: In this picture-rich and proof-light treatment, I will begin with the connections between the 2 subjects but focus on enumerative and bijective aspects. One example is tiling using dimers. Another is a model of ice, again in a plane. There will be several hands-on activities as well as recursive programming examples in MATLAB, Python and R.

 

Pizza and refreshments will be served at 12:45pm. Feel free to stop by anytime and let interested students know about this event. We are still in need of volunteers to give talks in May.

https://openlab.citytech.cuny.edu/mathclub/

R programming assignment 1

(This is in addition to the Datacamp assignments, of course!)

Below you will find an R script which makes a histogram of the probability distribution function for problem 3.2.2a. I have added some comments, to include the description of the problem and to explain some of the coding.

I have also included the graph, which I exported as a jpeg.

 

Your assignment is to write a similar script for the probability distribution functions of problems 3.2.1(a and b) and 3.2.2(b). Each problem should have a separate script.

You may do this by editing my script, but make sure that you change everything that needs to be changed. Also make sure that your variable name(s) are good and descriptive.

You should also explore the “help” in RStudio for the barplot function, and see what various features you can add or change in the graph. If you add a feature, put in a comment to describe what you did.

It is possible to write the scripts in a word processing program and save them as text files with the extension .r (although your word processor may object to that!), but you will need to run them in RStudio anyway, so it is probably best to do the final editing in RStudio. The “R Script” menu item is found by clicking on the green + sign at the upper left of the RStudio window.

 

Save your scripts with names of the following format:

Lastname_Firstname_problemnumber_Graph.r

Where Lastname = your last name

Firstname = your first name

problemnumber = the number of the problem

For example, my script was saved under the name

Shaver_Sybil_3.2.2a_Graph.r

 

Also, export the graphs as either jpegs or pdfs, your choice. The “Export” button is at the top of the Plots tab. Save them under the same names as the scripts, but with the extension .jpeg or .pdf instead of .r.

 

Post the three scripts and the three graphs in Piazza in a private note to me. This is how you will submit your work.

The scripts and graphs are due by 10 PM Monday the 23rd of April.

 

Here is my R script and after it is my graph:

#Problem 3.2.2a: two numbers are selected from the integers 1 through 5, with replacement.
# X represents the larger of the two numbers. This is the pdf for X.
problem3_2_2a_dist <- c(1/25, 3/25, 5/25, 7/25, 9/25)
# The line below adds labels to the bars showing the X value.
# as.character is used because the “names” attribute must be of character type.
# If we wanted to list the numbers, we would have to put them in quotes to make them characters.
names(problem3_2_2a_dist) <- as.character(1:5)
# I could put a comment here to explain the features I have added to this graph.
barplot(problem3_2_2a_dist, space = 0, xlab = "larger number", ylab = "probability", col = "blue")