We will have a quiz (Quiz #2) tomorrow (Wednesday, Feb 26). The quiz will be a simple exercise involving generating a scatterplot and calculating the correlation coefficient (using the spreadsheet command =correl) for a given paired data set.
To prepare for the quiz, review the class outline on those topics and also review the exercises from “HW4-Paired Data” on scatterplots and the correlation coefficient (exercises #6, 9, 10, 13, 14, 19, 22):
- you can use the built-in spreadsheet function =correl to calculate the correlation coefficient for #6, 19, and 22
- #19 and #22 also ask for additional statistics related to linear regression; those won’t be covered on tomorrow’s quiz
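If you want to double-check a spreadsheet answer outside the spreadsheet, here is a minimal Python sketch of the quantity =correl computes. It assumes =correl returns the Pearson correlation coefficient r, as the CORREL function does in common spreadsheets such as Excel and Google Sheets. The helper name `correl` and the sample data are hypothetical, for illustration only (they are not from the homework).

```python
from math import sqrt

def correl(x_data, y_data):
    """Pearson correlation coefficient r (hypothetical helper mirroring =correl)."""
    if len(x_data) != len(y_data) or len(x_data) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(x_data)
    x_bar = sum(x_data) / n
    y_bar = sum(y_data) / n
    # Sums of squared deviations from the means, and the sum of cross-products
    sxx = sum((x - x_bar) ** 2 for x in x_data)
    syy = sum((y - y_bar) ** 2 for y in y_data)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_data, y_data))
    # r = Sxy / sqrt(Sxx * Syy)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired data (not from HW4)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(correl(x, y), 4))  # → 0.7746
```

Because r is symmetric in x and y, the order of the two data ranges does not matter for =correl, unlike =slope and =intercept, which take the y-data first.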
Here are additional notes and hints on “HW4-Paired Data” (due Monday, March 2):
- #1-2, 5 (review of equations of lines, independent/dependent variables)
- recall that if we have y given as a function of x, we call x the independent variable, and y the dependent variable
- in the context of linear regression, where we find a linear function (or “linear model”) y = α + βx that seeks to explain the y-variable in terms of the x-variable, x is also called the explanatory or input variable, and y is called the response or output variable
- #3, 4, 20, 21, 22 (linear regression)
- for #3 and 22, use the built-in spreadsheet functions =slope(y_data, x_data) and =intercept(y_data, x_data) to find the “least squares line” (i.e., the linear regression line y = α + βx, where α is the y-intercept and β is the slope)
- #7, 8, 17, 19 ask about the “coefficient of determination”
- the coefficient of determination is defined as r^2, i.e., the square of the correlation coefficient
- thus, for #7, just square the given correlation coefficient to get the coefficient of determination
- also note that the slope of the regression line has the same sign as the correlation coefficient
- the coefficient of determination gives the proportion (i.e., percentage) of the variation in the dependent (y) variable that is explained by regression against the independent (x) variable; this is relevant for #19:
- first calculate the correlation coefficient r
- square it to get the coefficient of determination; convert this decimal to a % to get the “percentage of the variation in y [that] would be explained by the regression line”
- see https://www.khanacademy.org/math/statistics-probability/describing-relationships-quantitative-data#assessing-the-fit-in-least-squares-regression or https://en.wikipedia.org/wiki/Coefficient_of_determination for (a lot!) more
- extra credit: #15, 17, 23
- for #17, see https://en.wikipedia.org/wiki/Coefficient_of_determination#Definitions for the definitions of the following:
- SST (Total Sum of Squares)
- SSR (Regression Sum of Squares, or “explained sum of squares”), and
- SSE (Error Sum of Squares, or “residual sum of squares”)
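The regression and coefficient-of-determination notes above can be checked with a short Python sketch. It assumes =slope and =intercept return the ordinary least-squares slope β and intercept α (as in Excel and Google Sheets), and it computes SST, SSR, and SSE following the standard definitions on the Wikipedia page cited for #17. The helper name `least_squares` and the data are hypothetical, for illustration only. Note that β and r are both Sxy divided by a positive number, which is why the slope of the regression line always has the same sign as the correlation coefficient.

```python
from math import sqrt

def least_squares(x_data, y_data):
    """Hypothetical helper: fit y = alpha + beta*x and return
    (alpha, beta, r, r_squared, sst, ssr, sse)."""
    if len(x_data) != len(y_data) or len(x_data) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(x_data)
    x_bar = sum(x_data) / n
    y_bar = sum(y_data) / n
    sxx = sum((x - x_bar) ** 2 for x in x_data)
    syy = sum((y - y_bar) ** 2 for y in y_data)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_data, y_data))

    beta = sxy / sxx              # slope, like =slope(y_data, x_data)
    alpha = y_bar - beta * x_bar  # intercept, like =intercept(y_data, x_data)
    r = sxy / sqrt(sxx * syy)     # correlation coefficient, like =correl

    # Predicted values on the least-squares line
    y_hat = [alpha + beta * x for x in x_data]
    sst = sum((y - y_bar) ** 2 for y in y_data)               # total sum of squares
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)              # regression (explained)
    sse = sum((y - yh) ** 2 for y, yh in zip(y_data, y_hat))  # error (residual)
    return alpha, beta, r, r ** 2, sst, ssr, sse

# Hypothetical paired data (not from HW4)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
alpha, beta, r, r2, sst, ssr, sse = least_squares(x, y)
print(f"y = {alpha:.2f} + {beta:.2f}x")  # → y = 2.20 + 0.60x
print(f"r = {r:.4f}, r^2 = {r2:.4f} ({r2:.0%} of the variation in y explained)")
# → r = 0.7746, r^2 = 0.6000 (60% of the variation in y explained)
print(f"SST = {sst:.2f}, SSR + SSE = {ssr + sse:.2f}")  # → SST = 6.00, SSR + SSE = 6.00
```

For a least-squares line with an intercept, SST = SSR + SSE, and r^2 = SSR/SST; in this example 3.6/6 = 0.6, matching the square of r.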