Conceptual


Ex.1

It was mentioned in the chapter that a cubic regression spline with one knot at \(\xi\) can be obtained using a basis of the form \(x, x^2, x^3, (x − \xi)_+^3\), where \((x − \xi)_+^3 = (x − \xi)^3\) if \(x > \xi\) and equals 0 otherwise.

We will now show that a function of the form

\(f(x) = \beta_0 + \beta_1x + \beta_2x^2 + \beta_3x^3 + \beta_4(x − \xi)_+^3\)

is indeed a cubic regression spline, regardless of the values of \(\beta_0, \beta_1, \beta_2,\beta_3, \beta_4\).

(a) Find a cubic polynomial

\(f_1(x) = a_1 + b_1x + c_1x^2 + d_1x^3\)

such that \(f(x) = f_1(x)\) for all \(x \le \xi\). Express \(a_1, b_1, c_1, d_1\) in terms of \(\beta_0, \beta_1, \beta_2,\beta_3, \beta_4\).

(b) Find a cubic polynomial

\(f_2(x) = a_2 + b_2x + c_2x^2 + d_2x^3\)

such that \(f(x) = f_2(x)\) for all \(x > \xi\). Express \(a_2, b_2, c_2, d_2\) in terms of \(\beta_0, \beta_1, \beta_2,\beta_3, \beta_4\). We have now established that \(f(x)\) is a piecewise polynomial.

Answer: (a) and (b) follow from the algebra sketched below.
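Writing out the two regions explicitly (nothing beyond the given basis is assumed):

(a) For \(x \le \xi\) the truncated term \((x - \xi)_+^3\) is zero, so \(f(x) = \beta_0 + \beta_1x + \beta_2x^2 + \beta_3x^3\) and we can take \(a_1 = \beta_0\), \(b_1 = \beta_1\), \(c_1 = \beta_2\), \(d_1 = \beta_3\).

(b) For \(x > \xi\), expanding \(\beta_4(x - \xi)^3 = \beta_4(x^3 - 3\xi x^2 + 3\xi^2 x - \xi^3)\) and collecting powers of \(x\) gives

\(a_2 = \beta_0 - \beta_4\xi^3,\quad b_2 = \beta_1 + 3\beta_4\xi^2,\quad c_2 = \beta_2 - 3\beta_4\xi,\quad d_2 = \beta_3 + \beta_4.\)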

(c) Show that \(f_1(\xi) = f_2(\xi)\). That is, \(f(x)\) is continuous at \(\xi\).

(d) Show that \(f_1'(\xi) = f_2'(\xi)\). That is, \(f'(x)\) is continuous at \(\xi\).

(e) Show that \(f_1''(\xi) = f_2''(\xi)\). That is, \(f''(x)\) is continuous at \(\xi\).

Answer: (c), (d), and (e) follow by evaluating the two pieces and their first two derivatives at \(\xi\), as sketched below.
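Using the coefficients from (a) and (b):

(c) \(f_2(\xi) = (\beta_0 - \beta_4\xi^3) + (\beta_1 + 3\beta_4\xi^2)\xi + (\beta_2 - 3\beta_4\xi)\xi^2 + (\beta_3 + \beta_4)\xi^3 = \beta_0 + \beta_1\xi + \beta_2\xi^2 + \beta_3\xi^3 = f_1(\xi)\), since the \(\beta_4\xi^3\) terms cancel (\(-1 + 3 - 3 + 1 = 0\)).

(d) \(f_1'(x) = \beta_1 + 2\beta_2x + 3\beta_3x^2\) and \(f_2'(x) = b_2 + 2c_2x + 3d_2x^2\); at \(x = \xi\) the \(\beta_4\) contributions are \(3\beta_4\xi^2 - 6\beta_4\xi^2 + 3\beta_4\xi^2 = 0\), so \(f_2'(\xi) = \beta_1 + 2\beta_2\xi + 3\beta_3\xi^2 = f_1'(\xi)\).

(e) \(f_1''(x) = 2\beta_2 + 6\beta_3x\) and \(f_2''(\xi) = 2c_2 + 6d_2\xi = 2\beta_2 - 6\beta_4\xi + 6\beta_3\xi + 6\beta_4\xi = 2\beta_2 + 6\beta_3\xi = f_1''(\xi)\).

So \(f\), \(f'\), and \(f''\) are all continuous at \(\xi\), and \(f\) is indeed a cubic spline with a knot at \(\xi\).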

Ex.2

Suppose that a curve \(\hat{g}\) is computed to smoothly fit a set of \(n\) points using the following formula:

\[
\hat{g} = \arg\min_g \left( \sum_{i=1}^{n} \left(y_i - g(x_i)\right)^2 + \lambda \int \left[ g^{(m)}(x) \right]^2 \, dx \right),
\]

where \(g^{(m)}\) represents the \(m\)th derivative of \(g\) (and \(g^{(0)} = g\)). Provide example sketches of \(\hat{g}\) in each of the following scenarios.

(a) \(\lambda = \infty\), \(m = 0\).

(b) \(\lambda = \infty\), \(m = 1\).

(c) \(\lambda = \infty\), \(m = 2\).

(d) \(\lambda = \infty\), \(m = 3\).

(e) \(\lambda = 0\), \(m = 3\).

Answer: With \(\lambda = \infty\) the penalty term dominates, forcing \(g^{(m)} = 0\) everywhere:

(a) \(m = 0\): \(\hat{g} = 0\).

(b) \(m = 1\): \(\hat{g}\) is a horizontal line (the mean of the \(y_i\)).

(c) \(m = 2\): \(\hat{g}\) is the least squares line.

(d) \(m = 3\): \(\hat{g}\) is the least squares quadratic.

(e) With \(\lambda = 0\) the penalty vanishes and \(\hat{g}\) interpolates the training points exactly. A code sketch of these limiting fits follows.
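A minimal R sketch of what these limits look like; the simulated data, plotting grid, and colours are illustrative assumptions, not taken from the original figure:

library(ggplot2)
library(tibble)

set.seed(1)
x <- seq(0, 10, length.out = 20)
y <- sin(x) + rnorm(20, sd = 0.3)
dat <- tibble(x = x, y = y)

quad_fit <- lm(y ~ poly(x, 2), data = dat)          # (d) lambda = Inf, m = 3: least squares quadratic
interp   <- splinefun(x, y)                         # (e) lambda = 0, m = 3: passes through every point
grid <- tibble(x = seq(0, 10, length.out = 200))
grid$quad   <- predict(quad_fit, newdata = grid)
grid$interp <- interp(grid$x)

ggplot(dat, aes(x, y)) +
        geom_point() +
        geom_hline(yintercept = 0, linetype = "dotted") +         # (a) g-hat = 0
        geom_hline(yintercept = mean(y), color = "orange") +      # (b) horizontal line at mean(y)
        geom_smooth(method = "lm", se = FALSE, color = "blue") +  # (c) least squares line
        geom_line(data = grid, aes(x, quad), color = "red") +     # (d) least squares quadratic
        geom_line(data = grid, aes(x, interp), color = "green")   # (e) interpolating curve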

Ex.3

Suppose we fit a curve with basis functions \(b_1(X) = X, b_2(X) = (X − 1)^2I(X \ge 1)\). (Note that \(I(X \ge 1)\) equals \(1\) for \(X \ge 1\) and \(0\) otherwise.) We fit the linear regression model

\(Y = \beta_0 + \beta_1b_1(X) + \beta_2b_2(X) + \epsilon\),

and obtain coefficient estimates \(\hat\beta_0 = 1, \hat\beta_1 = 1, \hat\beta_2 = −2\). Sketch the estimated curve between \(X = −2\) and \(X = 2\). Note the intercepts, slopes, and other relevant information.

Answer: The curve is linear on \([-2, 1]\) and quadratic on \([1, 2]\): for \(x < 1\) the second basis function is zero, so \(\hat{y} = 1 + x\) (intercept 1, slope 1), while for \(x \ge 1\), \(\hat{y} = 1 + x - 2(x - 1)^2\), so the curve passes through \((1, 2)\) and bends downward to \(\hat{y} = 1\) at \(x = 2\).

# Full curve y = 1 + b1(x) - 2*b2(x) over a wider range (tidyverse assumed loaded via kRfunlib.R)
x <- seq(-5, 5, by = 0.1)
y <- 1 + x - 2 * (x - 1)^2 * if_else(x >= 1, 1, 0)

tbl <- tibble(
        x = x,
        y = y
) %>%
        # restrict to the requested range [-2, 2] for the highlighted segment
        mutate(x_seg = if_else(x <= 2 & x >= -2, x, 0),
               y_seg = 1 + x_seg - 2 * (x_seg - 1)^2 * if_else(x_seg >= 1, 1, 0)
        )

ggplot(tbl) + 
        geom_line(aes(x = x, y = y)) + 
        geom_line(aes(x = x_seg, y = y_seg), color = "green", size = 2) +  # curve on [-2, 2]
        geom_vline(xintercept = 1, color = "red") +                        # knot at x = 1
        geom_vline(xintercept = -2, color = "green") + 
        geom_vline(xintercept = 2, color = "green")

#clean up and load again kr funcs
#
rm(list = ls())
source("kRfunlib.R")

Ex.4

Suppose we fit a curve with basis functions \(b_1(X) = I(0 ≤ X ≤ 2) − (X − 1)I(1 ≤ X ≤ 2)\), \(b_2(X) = (X − 3)I(3 ≤ X ≤ 4) + I(4 < X ≤ 5)\).

We fit the linear regression model

\(Y = β_0 + β_1b_1(X) + β_2b_2(X) + \epsilon\),

and obtain coefficient estimates \(\hat\beta_0 = 1, \hat\beta_1 = 1, \hat\beta_2 = 3\).

Sketch the estimated curve between \(X = −2\) and \(X = 2\). Note the intercepts, slopes, and other relevant information.

Answer: On \([-2, 2]\) only \(b_1\) contributes (\(b_2\) is zero there): the curve is constant at 1 on \([-2, 0)\), constant at 2 on \([0, 1]\), and then linear with slope \(-1\) on \([1, 2]\), falling from 2 at \(x = 1\) to 1 at \(x = 2\).

x <- seq(-2, 2, by = 0.1)
# y = beta0 + beta1 * b1(x) + beta2 * b2(x); b2 is zero on [-2, 2] but is written in full anyway
y <- 1 + 
        1 * (if_else(x >= 0 & x <= 2, 1, 0) - (x - 1) * if_else(x >= 1 & x <= 2, 1, 0)) +
        3 * ((x - 3) * if_else(x >= 3 & x <= 4, 1, 0) + if_else(x > 4 & x <= 5, 1, 0))

tbl <- tibble(
        x = x,
        y = y
)

ggplot(tbl) +
        geom_point(aes(x = x, y = y), color = "green") +
        geom_vline(xintercept = 0, color = "red") + 
        geom_vline(xintercept = 1, color = "red")

#clean up and load again kr funcs
#
rm(list = ls())
source("kRfunlib.R")

Ex.5

Consider two curves, \(\hat{g_1}\) and \(\hat{g_2}\), defined by

\[
\hat{g_1} = \arg\min_g \left( \sum_{i=1}^{n} \left(y_i - g(x_i)\right)^2 + \lambda \int \left[ g^{(3)}(x) \right]^2 \, dx \right),
\]

\[
\hat{g_2} = \arg\min_g \left( \sum_{i=1}^{n} \left(y_i - g(x_i)\right)^2 + \lambda \int \left[ g^{(4)}(x) \right]^2 \, dx \right),
\]

where \(g^{(m)}\) represents the \(m\)th derivative of \(g\).

(a) As \(\lambda \rightarrow \infty\), will \(\hat{g_1}\) or \(\hat{g_2}\) have the smaller training RSS?

Answer: \(\hat{g_2}\). As \(\lambda \rightarrow \infty\) the penalty forces \(g^{(3)} = 0\) for \(\hat{g_1}\) (a quadratic) but only \(g^{(4)} = 0\) for \(\hat{g_2}\) (a cubic), so \(\hat{g_2}\) is more flexible and can follow the training points more closely (a numeric sketch follows part (c)).

(b) As \(\lambda \rightarrow \infty\), will \(\hat{g_1}\) or \(\hat{g_2}\) have the smaller test RSS?

Answer: \(\hat{g_1}\) would be expected to have the smaller test RSS: being less flexible it has lower variance, so unless the true relationship genuinely requires a cubic, the extra flexibility of \(\hat{g_2}\) will tend to overfit.

(c) For \(\lambda = 0\), will \(\hat{g_1}\) or \(\hat{g_2}\) have the smaller training and test RSS?

Answer: With \(\lambda = 0\) the penalty term drops out, so both criteria reduce to minimizing the training RSS alone; \(\hat{g_1}\) and \(\hat{g_2}\) coincide and have identical training and test RSS.
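A quick numeric check of (a): in the \(\lambda \rightarrow \infty\) limit the two criteria reduce to least squares fits of a quadratic (\(\hat{g_1}\)) and a cubic (\(\hat{g_2}\)). A minimal sketch on simulated data (the data here are an assumption purely for illustration):

set.seed(1)
x <- runif(100)
y <- sin(2 * pi * x) + rnorm(100, sd = 0.2)

g1 <- lm(y ~ poly(x, 2))   # lambda = Inf limit of g1-hat: g^(3) = 0, i.e. a quadratic
g2 <- lm(y ~ poly(x, 3))   # lambda = Inf limit of g2-hat: g^(4) = 0, i.e. a cubic

sum(resid(g1)^2)           # training RSS of the quadratic fit
sum(resid(g2)^2)           # training RSS of the cubic fit -- never larger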


Applied


Ex.6

In this exercise, you will further analyze the Wage data set considered throughout this chapter.

(a) Perform polynomial regression to predict wage using age. Use cross-validation to select the optimal degree d for the polynomial.

What degree was chosen, and how does this compare to the results of hypothesis testing using ANOVA?

Make a plot of the resulting polynomial fit to the data.

Answer: As a first step we use cross-validation to select the optimal degree. In a second step we use ANOVA for the same task and, in a third step, we compare the results. As a final step we plot the selected fit.

library(boot)

max_degree <- 5

# LOOCV error (cv.glm default) for a degree-n polynomial regression of wage on age
get_poly_cv_errors <- function(n, ndata){
        glm.fit <- glm(wage ~ poly(age, n), data = ndata)
        cv.error <- cv.glm(ndata, glm.fit)$delta[1]
        tbl <- tibble(
                Degree = n,
                Error =  cv.error,
                Type = as_factor("cv")
        )
        tbl
}

map_df(1:max_degree, get_poly_cv_errors, ndata = Wage)
## # A tibble: 5 x 3
##   Degree Error Type 
##    <int> <dbl> <fct>
## 1      1 1676. cv   
## 2      2 1601. cv   
## 3      3 1596. cv   
## 4      4 1595. cv   
## 5      5 1595. cv
fit.1 = lm(wage ~ age, data=Wage)
fit.2 = lm(wage ~ poly(age, 2), data=Wage)
fit.3 = lm(wage ~ poly(age, 3), data=Wage)
fit.4 = lm(wage ~ poly(age, 4), data=Wage)
fit.5 = lm(wage ~ poly(age, 5), data=Wage)
anova(fit.1, fit.2, fit.3, fit.4, fit.5)
## Analysis of Variance Table
## 
## Model 1: wage ~ age
## Model 2: wage ~ poly(age, 2)
## Model 3: wage ~ poly(age, 3)
## Model 4: wage ~ poly(age, 4)
## Model 5: wage ~ poly(age, 5)
##   Res.Df     RSS Df Sum of Sq        F    Pr(>F)    
## 1   2998 5022216                                    
## 2   2997 4793430  1    228786 143.5931 < 2.2e-16 ***
## 3   2996 4777674  1     15756   9.8888  0.001679 ** 
## 4   2995 4771604  1      6070   3.8098  0.051046 .  
## 5   2994 4770322  1      1283   0.8050  0.369682    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The degree-4 polynomial has the lowest CV error (though degrees 3-5 are nearly indistinguishable), and the ANOVA comparison points the same way: the cubic term is clearly significant, the quartic term is borderline (\(p \approx 0.05\)), and the quintic term is not.

Wage_augm <- augment(fit.4, data = Wage)

ggplot(Wage_augm) +
        geom_point(aes(x = age, y = wage)) +
        geom_line(aes(x = age, y = .fitted), color = "red", size = 2)

#clean up and load again kr funcs
#
rm(list = ls())
source("kRfunlib.R")

(b) Fit a step function to predict wage using age, and perform crossvalidation to choose the optimal number of cuts. Make a plot of the fit obtained.

library(boot)

max_cut <- 10

# LOOCV error for a step function of age with n intervals (cut points computed on the full data)
get_step_cv_errors <- function(n, ndata, var){
        ndata$cut_var <- cut(var, n)
        glm.fit <- glm(wage ~ cut_var, data = ndata)
        cv.error <- cv.glm(ndata, glm.fit)$delta[1]
        tbl <- tibble(
                Cuts = n,
                Error =  cv.error,
                Type = as_factor("cv")
        )
        tbl
}

map_df(2:max_cut, get_step_cv_errors, ndata = Wage, var = Wage$age)
## # A tibble: 9 x 3
##    Cuts Error Type 
##   <int> <dbl> <fct>
## 1     2 1734. cv   
## 2     3 1683. cv   
## 3     4 1636. cv   
## 4     5 1631. cv   
## 5     6 1623. cv   
## 6     7 1612. cv   
## 7     8 1601. cv   
## 8     9 1610. cv   
## 9    10 1605. cv

We observe the minimum CV error at 8 cuts.

Following is the plot of the selected fit:

Wage$cut_var <- cut(Wage$age, 8)

glm.fit <- glm(wage ~ cut_var, data = Wage)

Wage_augm <- augment(glm.fit, data = Wage)

ggplot(Wage_augm) +
        geom_point(aes(x = age, y = wage)) +
        geom_line(aes(x = age, y = .fitted), color = "red", size = 2)

#clean up and load again kr funcs
#
rm(list = ls())
source("kRfunlib.R")

Ex.7

The Wage data set contains a number of other features not explored in this chapter, such as marital status (maritl), job class (jobclass), and others. Explore the relationships between some of these other predictors and wage, and use non-linear fitting techniques in order to fit flexible models to the data.

Create plots of the results obtained, and write a summary of your findings.

Answer: First we explore the relationships …; a sketch of one possible approach follows the clean-up code below.

library(corrr)

# Wage %>% 
#         correlate() %>%
#         focus(wage)
#clean up and load again kr funcs
#
rm(list = ls())
source("kRfunlib.R")