Canonical Analysis

Characterizing optima in response surfaces.

Kris Sankaran
11-30-2021

Readings 11.3, Rmarkdown

  1. Once we are in the vicinity of a potentially optimal point, we need more subtle methods. There are two essential issues: deciding whether a stationary point of the fitted surface is actually an optimum, and choosing which directions to explore next when it is not.
  1. Canonical analysis helps solve both of these problems. First, we’ll define canonical analysis, and then we’ll illustrate its use in finding optimal points and proposing new directions.

Mathematical Setup

  1. A first-order model never has any isolated optima. It’s either perfectly flat, or increases forever in some direction. A second-order model, however, does potentially have optimal values.

Figure 1: First order models have no isolated optima.

  1. Calculus can help us identify these optima. Recall that, at a function’s maximum, its gradient is zero. This makes it natural to consider the gradient of a fitted second order surface, \[\begin{align*} \nabla_{x}\hat{y}\left(x\right) &= \nabla_{x}\left[x^{T}\hat{b} + x^{T}\hat{B}x\right] \\ &= \hat{b} + 2\hat{B}x \end{align*}\]

so the function has a stationary point at \[\begin{align*} x_{\ast} &= -\frac{1}{2}\hat{B}^{-1}\hat{b}. \end{align*}\]

  1. We’ll denote the value of the surface at this stationary point by \[\begin{align*} \hat{y}_{\ast} &= \hat{b}^{T}x_{\ast} + x_{\ast}^{T}\hat{B}x_{\ast} \\ &= -\frac{1}{2}\hat{b}^{T}\hat{B}^{-1}\hat{b} + \frac{1}{4}\hat{b}^{T} \hat{B}^{-1} \hat{B} \hat{B}^{-1} \hat{b} \\ &= -\frac{1}{4}\hat{b}^{T}\hat{B}^{-1}\hat{b} \end{align*}\]
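As a quick numerical check, here is a minimal sketch of these two formulas in R, using made-up coefficients \(\hat{b}\) and \(\hat{B}\) rather than anything estimated from data.
# Hypothetical second-order coefficients, for illustration only.
b_hat <- c(1, -0.5)                          # first-order coefficients b
B_hat <- matrix(c(-2, 0.3, 0.3, -1), 2, 2)   # symmetric second-order matrix B

# Stationary point and the fitted value there, following the formulas above.
x_star <- -0.5 * solve(B_hat, b_hat)
y_star <- -0.25 * sum(b_hat * solve(B_hat, b_hat))
x_star
y_star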

  2. Just like in ordinary calculus, there are three possibilities: the stationary point may be a maximum, a minimum, or a saddlepoint.


Figure 2: Second order models with a local minimum and with a saddlepoint.

  1. In one-variable calculus, we’d just take second derivatives and see whether the function curves up, curves down, or is flat. We’re in higher dimensions, though, so we’ll use the generalization of the second-derivative test, called the canonical representation.

Canonical Representation


Figure 3: Geometric representation of the eigendecomposition.

  1. Since \(\hat{B}\) is symmetric, we can find an eigendecomposition, \[\begin{align*} \hat{B} &= U \Lambda U^{T}, \end{align*}\] where the columns of \(U\) are orthonormal eigenvectors and \(\Lambda\) is a diagonal matrix containing the eigenvalues \(\lambda_{k}\). Define \(w\left(x\right) = U^{T}\left(x - x_{\ast}\right)\). It turns out that our second-order fit can be equivalently expressed as \[\begin{align*} \hat{y}\left(x\right) &= \hat{y}_{\ast} + w^T\left(x\right)\Lambda w\left(x\right) \\ &= \hat{y}_{\ast} + \sum_{k = 1}^{K} \lambda_{k}w^{2}_{k}\left(x\right) \end{align*}\] which is much simpler, because there are no interaction terms between the \(w_{k}\)’s (like there are between the \(x_{k}\)’s).

Proof¹: The textbook skips the proof of this fact, but if you have seen some linear algebra, the approach is interesting. First, plug in the value of \(\hat{y}_{\ast}\) from above, and expand the definition of \(w\left(x\right)\), \[\begin{align*} \hat{y}\left(x\right) &= \hat{y}_{\ast} + w^{T}\left(x\right) \Lambda w\left(x\right) \\ &= -\frac{1}{4}\hat{b}^{T}\hat{B}^{-1}\hat{b} + \left(x - x_{\ast}\right)^{T}U\Lambda U^{T} \left(x - x_{\ast}\right). \end{align*}\]

Now, use the definition of our original eigendecomposition \(\hat{B} = U \Lambda U^{T}\) and expand the quadratic, \[\begin{align*} \hat{y}\left(x\right) &= -\frac{1}{4}\hat{b}^{T}\hat{B}^{-1}\hat{b} + \left(x - x_{\ast}\right)^{T}\hat{B}\left(x - x_{\ast}\right) \\ &= -\frac{1}{4}\hat{b}^{T}\hat{B}^{-1}\hat{b} + x^{T}\hat{B}x - 2x^{T}\hat{B}x_{\ast} + x^{T}_{\ast}\hat{B}x_{\ast}. \end{align*}\]

To simplify, let’s plug in the expression for the stationary point \(x_{\ast} = -\frac{1}{2}\hat{B}^{-1}\hat{b}\) that we found above, \[\begin{align*} \hat{y}\left(x\right) &= -\frac{1}{4}\hat{b}^{T}\hat{B}^{-1}\hat{b} + x^{T}\hat{B}x + x^{T}\hat{B}\hat{B}^{-1}\hat{b} + \frac{1}{4}\hat{b}^{T}\hat{B}^{-1}\hat{B}\hat{B}^{-1}\hat{b} \\ &= \hat{b}^{T}x + x^{T}\hat{B}x \end{align*}\]

which was exactly our original definition of the second-order model.
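If you would rather check the identity numerically than follow the algebra, here is a short sketch that reuses the hypothetical b_hat, B_hat, x_star, and y_star from the earlier example and compares the canonical representation against the direct second-order formula at an arbitrary point.
# Verify the canonical representation at one arbitrary point x.
eig <- eigen(B_hat, symmetric = TRUE)   # B_hat = U Lambda U^T
U <- eig$vectors
lambda <- eig$values

x <- c(0.2, -0.4)
w <- drop(t(U) %*% (x - x_star))        # w(x) = U^T (x - x_star)

y_canonical <- y_star + sum(lambda * w ^ 2)
y_direct <- sum(b_hat * x) + drop(t(x) %*% B_hat %*% x)
all.equal(y_canonical, y_direct)        # TRUE, up to floating point error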

Implications

  1. The representation of the second-order model by \[\begin{align} \label{eq:canonical} \hat{y}\left(x\right) &= \hat{y}_{\ast} + \sum_{k = 1}^{K} \lambda_{k}w^{2}_{k}\left(x\right) \end{align}\] is an important one, because (a) it tells us immediately whether the stationary point \(x_{\ast}\) is a maximum, a minimum, or a saddlepoint, and (b) it suggests directions to explore when \(x_{\ast}\) turns out not to be an optimum.
  1. To see the first point, suppose the \(\lambda_{k}\)’s were all negative. Then, the expression is maximized when the \(w_{k}\left(x\right)\)’s are all zero, since any change in the \(w_{k}\left(x\right)\)’s from there can only ever decrease \(\hat{y}\left(x\right)\). In other words, \(x_{\ast}\) is a maximum.
  1. Similarly, when the \(\lambda_{k}\)’s are all positive, moving the \(w_{k}\left(x\right)\)’s by any amount can only increase \(\hat{y}\left(x\right)\). This means we’re at a minimum!

  2. Finally, when some of the \(\lambda_{k}\)’s are positive and others are negative, then we’re at a saddlepoint. (The short sketch after this list turns this sign rule into code.)

  1. This last discussion also gives us insight into the second point: at a saddlepoint, the eigenvectors whose eigenvalues are positive point in directions along which the fitted response can still be increased, so they suggest where to experiment next.
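To make the sign rule concrete, here is a small helper, purely a sketch and not part of the rsm package, that classifies a stationary point from the eigenvalues of a fitted \(\hat{B}\).
# Classify a stationary point from the signs of the eigenvalues of B_hat.
# This is an illustrative helper, not an rsm function.
classify_stationary <- function(B_hat, tol = 1e-8) {
  lambda <- eigen(B_hat, symmetric = TRUE)$values
  if (all(lambda < -tol)) {
    "maximum"
  } else if (all(lambda > tol)) {
    "minimum"
  } else {
    "saddlepoint (or near-flat ridge, if some eigenvalues are close to zero)"
  }
}

classify_stationary(matrix(c(-2, 0.3, 0.3, -1), 2, 2))  # "maximum"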

Data Example

  1. We’ll continue the chemical process experiment from last time. Here, the experimenter has performed a central composite design experiment, a refinement of the earlier factorial designs. The hope is that now a configuration that maximizes the yield can be precisely isolated.
library(readr)
library(dplyr)
library(ggplot2)

# Read the chemical process data and plot yield across the design points,
# jittering slightly so that overlapping runs remain visible.
chem <- read_csv("https://uwmadison.box.com/shared/static/nbaj1m8j7tuaqmznjlrsgbzyhp9k61i8.csv")
ggplot(chem) +
  geom_point(
    aes(time, temp, col = yield),
    position = position_jitter(w = 0.3, h = 0.3)
  ) +
  coord_fixed() +
  scale_color_viridis_c()

  1. As before, we will code the data. We use SO() in the model formula to fit a full second-order model. It appears that the linear (FO) and pure quadratic (PQ) components are very strong, and that there is limited, if any, interaction (TWI) between the two variables.
library(rsm)

# Center and scale the natural units into coded units, then fit a full
# second-order (SO) response surface model.
chem_coded <- coded.data(chem, time_coded ~ (time - 35) / 5, temp_coded ~ (temp - 155) / 5)
fit <- rsm(yield ~ SO(temp_coded, time_coded), data = chem_coded)
summary(aov(fit))
                            Df Sum Sq Mean Sq F value   Pr(>F)    
FO(temp_coded, time_coded)   2 10.043   5.021  70.814 2.27e-05 ***
TWI(temp_coded, time_coded)  1  0.250   0.250   3.526    0.103    
PQ(temp_coded, time_coded)   2 17.954   8.977 126.594 3.19e-06 ***
Residuals                    7  0.496   0.071                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
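As an aside, calling summary() on the rsm fit prints the individual second-order coefficients along with the stationary point and an eigenanalysis of \(\hat{B}\), so it previews the canonical analysis we walk through explicitly below (output omitted here).
# Full rsm summary: coefficient table, stationary point, and eigenanalysis.
summary(fit)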
  1. Next, we can perform a canonical analysis, to check whether the stationary point is really a maximum. Since both of the eigenvalues are negative, this is in fact the case. The eigenvectors also tell us the directions along which the fitted yield (in this case) falls away from the optimum, fastest along the eigenvector with the most negative eigenvalue.
analysis <- canonical(fit)
analysis
$xs
temp_coded time_coded 
  4.305847  10.389230 

$eigen
eigen() decomposition
$values
[1] -0.9634986 -1.4142867

$vectors
                 [,1]       [,2]
temp_coded -0.9571122 -0.2897174
time_coded -0.2897174  0.9571122
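To connect this output back to the formulas from earlier, we can rebuild \(\hat{b}\) and \(\hat{B}\) from the fitted coefficients and recompute the stationary point and eigenvalues by hand. This sketch assumes the coefficients are ordered as in the SO() formula, i.e. intercept, the two first-order terms, the interaction, then the two pure quadratics; check names(coef(fit)) before relying on these positions.
# Rebuild b_hat and B_hat from the fitted coefficients (assumed order:
# intercept, FO terms, TWI term, PQ terms -- verify with names(coef(fit))).
beta <- coef(fit)
b_hat <- beta[2:3]
B_hat <- matrix(c(beta[5], beta[4] / 2,
                  beta[4] / 2, beta[6]), 2, 2)

-0.5 * solve(B_hat, b_hat)   # should match analysis$xs
eigen(B_hat)$values          # should match analysis$eigen$values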
  1. We can identify the specific location of the optimum by decoding the canonical analysis back into the natural units.
stationary <- code2val(analysis$xs, codings = codings(chem_coded))
stationary
     temp      time 
176.52923  86.94615 
# Points one coded unit along each canonical direction from the stationary
# point, converted back to natural units for plotting.
w1 <- code2val(analysis$xs + analysis$eigen$vectors[, 1], codings = codings(chem_coded))
w2 <- code2val(analysis$xs + analysis$eigen$vectors[, 2], codings = codings(chem_coded))
  1. Finally, let’s plot the response surface, with the canonical directions overlaid. For this, we can use the contour method provided by the rsm package together with base R’s segments.
# Contours of the fitted surface, with segments drawn from the stationary
# point along each canonical direction.
contour(fit, ~ time_coded + temp_coded, image = TRUE, asp = 1)
segments(stationary[2], stationary[1], w1[2], w1[1], col = "red")
segments(stationary[2], stationary[1], w2[2], w2[1], col = "red")

Figure 4: The response surface, with canonical vectors overlaid.


  1. You can skip this proof without worrying about whether it will appear in later lectures or assignments / exams. But I encourage you to read it; it gives a flavor of more advanced topics in statistics.↩︎