We will have a quiz (Quiz #2) tomorrow (Wednesday, Feb 26). The quiz will be a simple exercise involving generating a scatterplot and calculating the correlation coefficient (using the spreadsheet command =correl) for a given paired data set.

To prepare for the quiz, review the class outline on those topics and also review the exercises from “HW4-Paired Data” on scatterplots and the correlation coefficient (exercises #6, 9, 10, 13, 14, 19, 22):

- you can use the built-in spreadsheet function =correl to calculate the correlation coefficient for #6, 19, and 22
- #19 and #22 ask for additional statistics related to linear regression; those won’t be covered on tomorrow’s quiz
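For reference, =correl reports the Pearson correlation coefficient, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). The short Python sketch below computes it by hand so you can see what the spreadsheet is doing. It is not needed for the quiz, the five data points are made up for illustration (not taken from HW4), and its result should agree with =correl up to rounding.

```python
import math


def correl(xs, ys):
    """Pearson correlation coefficient r of paired data (what =correl reports)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when all x or all y values are equal")
    return sxy / math.sqrt(sxx * syy)


# Made-up paired data for illustration (not from HW4)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(correl(x, y), 4))  # prints 0.7746
```

Swapping the two lists gives the same r, since the formula is symmetric in x and y.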

Here are additional notes and hints on “HW4-Paired Data” (which is due Mon March 2):

- #1-2, 5 (review of equations of lines, independent/dependent variables)
  - recall that if we have *y* given as a function of *x*, we call *x* the independent variable and *y* the dependent variable; in the context of linear regression especially, where we get a linear function (or “linear model”) *y = α + βx* that seeks to explain the *y*-variable in terms of the *x*-variable, *x* is sometimes called the explanatory or input variable, and *y* the response or output variable

- #3, 4, 20, 21, 22 (linear regression)
  - for #3 and 22, use the built-in spreadsheet functions =slope(y_data, x_data) and =intercept(y_data, x_data) to find the “least squares line” (i.e., the linear regression line *y = α + βx*, where *α* is the *y*-intercept and *β* is the slope); note that the *y* data comes first in both functions
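=slope and =intercept return the least-squares estimates β = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and α = ȳ − βx̄. Here is a minimal Python sketch of those two formulas on made-up data (not from HW4). The function names `slope` and `intercept` are hypothetical and only mirror the spreadsheet functions, including their (y_data, x_data) argument order; they are not part of any Python library.

```python
def slope(ys, xs):
    """Least-squares slope beta; argument order matches =slope(y_data, x_data)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def intercept(ys, xs):
    """Least-squares y-intercept alpha = mean(y) - beta * mean(x),
    with the argument order of =intercept(y_data, x_data)."""
    n = len(xs)
    return sum(ys) / n - slope(ys, xs) * (sum(xs) / n)


# Made-up paired data for illustration (not from HW4); note y is passed first
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
beta = slope(y, x)       # 6 / 10 = 0.6
alpha = intercept(y, x)  # 4 - 0.6 * 3 = 2.2
print(f"y = {alpha:.2f} + {beta:.2f}x")  # prints y = 2.20 + 0.60x
```

Plugging an *x* value into *α + βx* then gives the predicted *y*; with this toy data, *x* = 6 gives 2.2 + 0.6 · 6 = 5.8.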

- #7, 8, 17, 19 ask about the “coefficient of determination”
  - the coefficient of determination is defined as r^2, i.e., the square of the correlation coefficient
    - thus, for #7, just square the given correlation coefficient to get the coefficient of determination
    - also note that the slope of the regression line has the same sign as the correlation coefficient

    - the coefficient of determination gives the proportion (often expressed as a percentage) of the variation in the dependent (*y*) variable that is explained by regression against the independent (*x*) variable; this is relevant for #19:
      - first calculate the correlation coefficient *r*
      - square it to get the coefficient of determination; convert this decimal to a % to get the “percentage of the variation in y [that] would be explained by the regression line”

    - see https://www.khanacademy.org/math/statistics-probability/describing-relationships-quantitative-data#assessing-the-fit-in-least-squares-regression or https://en.wikipedia.org/wiki/Coefficient_of_determination for (a lot!) more
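The Python sketch below, on made-up data (not from HW4), computes r, r^2, and the regression slope from one set of deviation sums; `fit_summary` is a hypothetical helper name. It also shows why the slope and the correlation coefficient agree in sign: both share the numerator Σ(x − x̄)(y − ȳ), and both denominators are positive.

```python
import math


def fit_summary(xs, ys):
    """Return (r, r_squared, beta) for paired data xs, ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)  # correlation coefficient (=correl)
    beta = sxy / sxx                # regression slope (=slope)
    # r and beta share the numerator sxy, and (when defined) sxx and
    # sqrt(sxx * syy) are positive, so r and beta always have the same sign.
    return r, r ** 2, beta


# Made-up paired data for illustration (not from HW4)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r, r_squared, beta = fit_summary(x, y)
print(f"r = {r:.4f}, r^2 = {r_squared:.4f}")
print(f"{r_squared:.1%} of the variation in y is explained by the regression line")
# prints:
# r = 0.7746, r^2 = 0.6000
# 60.0% of the variation in y is explained by the regression line
```

When r is given, as in #7, only the squaring step is needed: a hypothetical r = −0.8 gives r^2 = 0.64, i.e., 64% of the variation in *y* explained.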

- extra credit: #15, 17, 23
  - for #17, see https://en.wikipedia.org/wiki/Coefficient_of_determination#Definitions for the definitions of the following:
    - SST (Total Sum of Squares),
    - SSR (Regression Sum of Squares, or “explained sum of squares”), and
    - SSE (Error Sum of Squares, or “residual sum of squares”)
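With the naming in the list above (some textbooks instead use SSR for the *residual* sum of squares, so check which convention an exercise uses), SST = Σ(y − ȳ)², SSR = Σ(ŷ − ȳ)², and SSE = Σ(y − ŷ)², where ŷ = α + βx is the fitted value on the regression line. For a least-squares line, SST = SSR + SSE and r^2 = SSR/SST = 1 − SSE/SST. A small Python sketch checking this on made-up data (not from HW4); `sums_of_squares` is a hypothetical helper name:

```python
def sums_of_squares(xs, ys):
    """Return (SST, SSR, SSE) for the least-squares line fitted to xs, ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    fitted = [alpha + beta * x for x in xs]              # y-hat: points on the line
    sst = sum((y - mean_y) ** 2 for y in ys)             # total variation in y
    ssr = sum((f - mean_y) ** 2 for f in fitted)         # explained by the line
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # residual (error)
    return sst, ssr, sse


# Made-up paired data for illustration (not from HW4)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
sst, ssr, sse = sums_of_squares(x, y)
print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")  # prints 6.00, 3.60, 2.40
print(f"SSR/SST = {ssr / sst:.4f}, 1 - SSE/SST = {1 - sse / sst:.4f}")  # both 0.6000
```

For this toy data r = 6/√60, so r^2 = 36/60 = 0.6, matching SSR/SST.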
