Monthly Archives: May 2018

snow day make up datacamp Rstudio W.green

William Green

intermediate R

I chose Intermediate R because it is the direct next level after the beginner course. The course dealt with conditional statements and loops. These are key to my major, Electrical Engineering: the same constructs come up in the machine code and programs we write in engineering, so Intermediate R reinforced my understanding and my programming ability. Some topics covered loops controlled by a condition: the program keeps moving on to the next iteration as long as the stopping condition is not met, and once it is met, a break statement exits the loop and the program returns a result. This is the same behavior as the for and while loops we use in engineering. One thing I learned in Intermediate R is how to write and adjust dates and times, which is covered in the utilities section of the course. This is something I had not done before while programming in engineering, and it posed some difficulty.

Some of the topics included conditionals, loops, functions, the apply family, and utilities.
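A minimal sketch of two of these ideas, written for illustration rather than taken from the course: a while loop that exits with break once a condition is met, and basic date handling with the base R utilities. The variable names and the date are assumptions chosen for the example.

```r
# A while loop that keeps iterating until the stopping condition is met
count <- 0
while (TRUE) {
  count <- count + 1
  if (count >= 5) {
    break  # condition met: leave the loop
  }
}
count

# Dates: parse a string, reformat it, and shift it by a number of days
due_date <- as.Date("2018-05-24")
format(due_date, "%d/%m/%Y")   # day/month/year with numeric fields only
one_week_later <- due_date + 7
one_week_later
as.numeric(one_week_later - due_date)  # difference in days
```

Adding a number to a Date object moves it by that many days, which is why `due_date + 7` lands one week later.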


 

Last two R assignments

ALL R assignments are absolutely due by midnight of Thursday 24 May, no exceptions!

————————————————————————

Finding a confidence interval using t-test in R:

Work problem 7.4.8: I copied the problem below.

I have typed up the data already:

7.4.8data

Just click-hold on the link and choose “Open in RStudio”.

You will create a vector containing the data, and then do a t-test on that vector of data using the t.test function. Copy and paste to Piazza as you did with the previous assignment, and then type the confidence interval itself at the top of your post. When you type in the confidence interval, round the numbers to the nearest whole number.

7.4.8. The following table lists the typical cost of repairing the bumper of a moderately priced midsize car damaged by a corner collision at 3 mph. Use these observations to construct a 95% confidence interval for μ, the true average repair cost for all such automobiles with similar damage. The sample standard deviation for these data is s = $369.02.

Make/Model          Repair Cost   Make/Model          Repair Cost
Hyundai Sonata      $1019         Honda Accord        $1461
Nissan Altima       $1090         Volkswagen Jetta    $1525
Mitsubishi Galant   $1109         Toyota Camry        $1670
Saturn AURA         $1235         Chevrolet Malibu    $1685
Subaru Legacy       $1275         Volkswagen Passat   $1783
Pontiac G6          $1361         Nissan Maxima       $1787
Mazda 6             $1437         Ford Fusion         $1889
Volvo S40           $1446         Chrysler Sebring    $2484

Source: www.iihs.org/ratings/bumpersbycategory.aspx?
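The workflow described for this problem might look roughly like the sketch below. It assumes the costs are typed in by hand from the table rather than read from the linked data file, and the vector name repair_costs is my own choice, not something the assignment prescribes.

```r
# Repair costs in dollars, typed in from the table in problem 7.4.8
repair_costs <- c(1019, 1090, 1109, 1235, 1275, 1361, 1437, 1446,
                  1461, 1525, 1670, 1685, 1783, 1787, 1889, 2484)

# Check the data entry against the standard deviation given in the problem
sd(repair_costs)

# One-sample t-test; the default confidence level of t.test is 0.95
result <- t.test(repair_costs)
result

# Pull out just the confidence interval and round to whole dollars
ci <- result$conf.int
round(ci)
```

Comparing `sd(repair_costs)` with the value of s stated in the problem is a quick way to catch a typing mistake before running the test.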


Second assignment: writing up about your Datacamp course that you chose for the snow day make-up.

You will make a post to this OpenLab blog. To do this you must join OpenLab and join this course – you may already have done this. Choose this course once you log in to OpenLab, and then click on “Dashboard” on the right side of the page. This will take you to the Dashboard where you will find a link “Posts” on the left side: use it to “Add New”.

In your post, you should do the following:

First, give your name.

Give the title of your course and the link to it: you can copy the link from this post.

Describe briefly why you chose it.

Give a brief (but not vague!) description of the topics that are covered in that course.

Choose one specific example of a thing you learned that you found interesting or useful. Here you will give details, and also tell us why you chose it to write about.

Hypothesis Testing: notes and problems

[Latexpage]

 

Instructions for using the various graphing calculators are linked in this post.

Here is how hypothesis testing works for the t-test on the mean:

We first state our null hypothesis and alternative hypothesis.

For this type of test, the null hypothesis always has an = sign in it:

$H_{0}: \mu = \mu_{0}$

The alternative can have any one of three forms, depending on what is important to distinguish. Either $H_{a}: \mu \neq \mu_{0}$

or $H_{a}: \mu < \mu_{0}$

or $H_{a}: \mu > \mu_{0}$

Here $\mu_{0}$ is a number (the hypothesized value of the population mean).
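In R, the three forms of the alternative hypothesis correspond to the alternative argument of t.test. Here is a small sketch; the data vector x and the value of mu0 are made up for illustration.

```r
# Hypothetical measurements, made up for this illustration
x <- c(5.8, 6.1, 5.4, 6.7, 5.9, 6.3, 5.6, 6.0)
mu0 <- 5  # hypothesized value of the population mean

two_sided <- t.test(x, mu = mu0, alternative = "two.sided")  # H_a: mu != mu0
less      <- t.test(x, mu = mu0, alternative = "less")       # H_a: mu <  mu0
greater   <- t.test(x, mu = mu0, alternative = "greater")    # H_a: mu >  mu0

# Compare the three p-values side by side
c(two_sided = two_sided$p.value,
  less      = less$p.value,
  greater   = greater$p.value)
```

If you leave out the alternative argument, t.test uses "two.sided" by default.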

When we use the t.test function in R, or the t-test on your calculator, the output includes a p-value.

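A p-value measures the strength of the evidence (the sample data) against the null hypothesis. More precisely, it is the probability, computed assuming the null hypothesis is true, of getting sample data at least as extreme as what we actually observed. Here is another source that explains more about p-values. The smaller p is, the stronger the evidence against the null hypothesis, and the more sure we are that we should reject it.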

In practice, we choose a value of $\alpha$, called the level of significance of the test. It represents the probability of making a Type I error, that is, of rejecting the null hypothesis when it is actually true. (Usually that means we want $\alpha$ to be small, 5% or less.) Then we look at the p-value returned by R or by the calculator. If $p \le \alpha$ we reject the null hypothesis; otherwise, we fail to reject the null hypothesis (we reserve judgement rather than concluding that it is true).
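The decision rule can be written out in a few lines of R. This is only a sketch: the helper function decide and both data vectors are hypothetical, made up to show one case on each side of the rule.

```r
# Hypothetical helper: run a two-sided t-test and apply the rule p <= alpha
decide <- function(x, mu0, alpha = 0.05) {
  p <- t.test(x, mu = mu0)$p.value
  if (p <= alpha) {
    "reject H0"
  } else {
    "do not reject H0"
  }
}

# Made-up samples: one centered well above 5, one centered close to 5
x_far  <- c(5.8, 6.1, 5.4, 6.7, 5.9, 6.3, 5.6, 6.0)
x_near <- c(4.6, 5.3, 5.1, 4.8, 5.4, 4.9, 5.2, 5.0)

decide(x_far, mu0 = 5)
decide(x_near, mu0 = 5)
```

The p-value itself is stored in the p.value component of the object that t.test returns, so you can use it in a comparison instead of reading it off the printed output.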

 

Here are problems to practice deciding what form of the alternative hypothesis to use: (we did some of them in class)

Math2501HypothesisTestingProblems

Here are problems to practice using the t-test and drawing conclusions from it: #4-5 and 8-11 especially

Practice problems

 

 

R assignment

# Take a random sample of size 50 from the binomial distribution that relates to the Zener card experiment:

binom_sample1 <- rbinom(50, 25, 0.2)

# Print the values in binom_sample1

binom_sample1

# Get a 95% confidence interval for that sample data

t.test(binom_sample1, mu = 5)

# Take another sample, call it binom_sample2, and do the same

What does (and does not) 95% confidence mean?

[An old post which I am reposting here.]

 

As a technical note, a 95% confidence interval does not mean that there is a 95% probability that the interval contains the true mean. The interval computed from a given sample either contains the true mean or it does not. Instead, the level of confidence is associated with the method of calculating the interval. The confidence coefficient is simply the proportion of samples of a given size that may be expected to contain the true mean. That is, for a 95% confidence interval, if many samples are collected and the confidence interval computed, in the long run about 95% of these intervals would contain the true mean.

From the Engineering Statistics Handbook [National Institute of Standards and Technology, Information Technology Laboratory]
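The long-run interpretation can be checked with a small simulation. The sketch below reuses the binomial setup from the R assignment (samples of size 50 from a binomial distribution with 25 trials and success probability 0.2, so the true mean is 5). The seed and the number of repetitions are arbitrary choices of mine.

```r
set.seed(2018)          # arbitrary seed, so the run is reproducible
true_mean <- 25 * 0.2   # mean of a binomial(25, 0.2) distribution
n_reps <- 2000          # number of samples to draw

# For each sample, compute the 95% confidence interval and record
# whether it contains the true mean
covers <- replicate(n_reps, {
  binom_sample <- rbinom(50, 25, 0.2)
  ci <- t.test(binom_sample)$conf.int
  ci[1] <= true_mean && true_mean <= ci[2]
})

# Proportion of intervals that contain the true mean
coverage <- mean(covers)
coverage
```

Each individual interval either contains 5 or it does not; it is the proportion across many samples that should come out close to 0.95.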