Method of Steepest Ascent

Sequential strategies for response surfaces.

Kris Sankaran
11-30-2021

Readings 11.1, 11.2, Rmarkdown

  1. When we last discussed response surfaces (in Chapter 5), we showed how to fit a regression surface to a fixed set of runs. This let us find configurations of factors that optimized some response of interest.

  2. However, the real power of response surfaces emerges when we think sequentially, using the results from one fit to plan a series of follow-up experiments, each bringing us closer to an optimal configuration of factors.

First and Second-Order Models

  1. It makes sense to gradually refine our designs as we approach a potential optimal point. At the start of experimentation, even a few small runs are likely to point us in the right direction. Near the end, we’ll want to tune a good configuration of factors into an optimal one.

Figure 1: In initial runs we will use first-order models. When we approach a potential optimum, we will switch to second-order models.

  1. One approach is to divide experimentation into two phases. We can start with first-order models, and then proceed to second-order when in the vicinity of the optimum.

  2. A first-order model is a linear model without any interactions or nonlinearities. It can be fit using relatively few samples; for example, unreplicated or fractional factorial designs. It is defined as, \[\begin{align*} y_{i} &= \beta_{0} + \sum_{k = 1}^{K}\beta_{k}x_{ik} + \epsilon_{i} \\ &= x_{i}^{T}b + \epsilon_{i} \end{align*}\]

where we defined \[\begin{align*} x_{i} = \begin{pmatrix} 1 \\ x_{i1} \\ \vdots \\ x_{iK} \end{pmatrix}, \end{align*}\] and \[\begin{align*} b = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{K} \end{pmatrix} \end{align*}\]
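As a quick illustration (separate from the worked example later in these notes), a first-order model can be fit with plain lm on simulated data in coded units; the coefficients and data here are made up:

```r
# Illustrative only: simulate 20 runs of a K = 2 first-order model and fit it.
set.seed(1)
X <- matrix(runif(20 * 2, -1, 1), ncol = 2)        # factor settings in coded units
colnames(X) <- c("x1", "x2")
y <- 1 + X %*% c(0.8, 0.3) + rnorm(20, sd = 0.1)   # true b = (1, 0.8, 0.3)
fit_fo <- lm(y ~ x1 + x2, data = data.frame(X, y = y))
coef(fit_fo)  # estimates of (beta_0, beta_1, beta_2)
```

With only three coefficients to estimate, even this small, unreplicated design recovers the slopes accurately.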

  1. A second-order model is a linear model with interactions and quadratic terms. These surfaces are more complex, so they require more involved sampling, like replicated factorial or central composite designs. It is defined as, \[\begin{align*} y_{i} &= \beta_0 + \sum_{k = 1}^{K}\beta_{k}x_{ik} + \sum_{k = 1}^{K}\beta_{kk}x_{ik}^{2} + \sum_{k < k^{\prime}} \beta_{k k^{\prime}} x_{ik}x_{ik^{\prime}} + \epsilon_{i} \\ &= x_{i}^{T}b + x_{i}^{T} B x_i + \epsilon_{i} \end{align*}\]

where we define \(B\) as the symmetric \(K \times K\) matrix with diagonal entries \(\beta_{kk}\) and off-diagonal entries \(\frac{1}{2}\beta_{kl} = \frac{1}{2}\beta_{lk}\). In the quadratic form, \(x_{i}\) refers to the factor coordinates \(\left(x_{i1}, \dots, x_{iK}\right)^{T}\), without the leading 1.
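For example, with \(K = 2\) factors, writing \(\tilde{x}_{i} = \left(x_{i1}, x_{i2}\right)^{T}\) for the factor coordinates, the quadratic form expands as, \[\begin{align*} \tilde{x}_{i}^{T} B \tilde{x}_{i} = \begin{pmatrix} x_{i1} & x_{i2} \end{pmatrix} \begin{pmatrix} \beta_{11} & \frac{1}{2}\beta_{12} \\ \frac{1}{2}\beta_{12} & \beta_{22} \end{pmatrix} \begin{pmatrix} x_{i1} \\ x_{i2} \end{pmatrix} = \beta_{11}x_{i1}^{2} + \beta_{22}x_{i2}^{2} + \beta_{12}x_{i1}x_{i2}, \end{align*}\] recovering exactly the quadratic and interaction terms in the sum above.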

Method of Steepest Ascent

  1. The method of steepest ascent uses the results of a first-order model to propose a new configuration of factors that brings us closer to the optimum. Specifically, we use the following recipe,

     1. Fit a first-order model to a small design (e.g., a \(2^{K}\) factorial with center points) around the current operating conditions.
     2. Move along the direction of steepest ascent, which is proportional to the fitted coefficients \(\left(\hat{\beta}_{1}, \dots, \hat{\beta}_{K}\right)\), taking steps of a fixed size.
     3. Continue stepping as long as the observed response increases.
     4. Once it stops increasing, center a new experiment at the current configuration and repeat, switching to a second-order model when a first-order model no longer fits well.
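To make the steepest-ascent direction concrete, here is a minimal sketch in base R. The coefficient values and the coding scale of 5 original units per coded unit are taken from the worked example later in these notes, but any first-order fit would work the same way:

```r
# Direction of steepest ascent from first-order coefficients.
b <- c(time = 0.775, temp = 0.325)   # fitted slopes in coded units
direction <- b / sqrt(sum(b ^ 2))    # unit vector (radius 1 in coded units)
increment <- direction * 5           # coding scale: 1 coded unit = 5 original units
round(direction, 4)                  # ~ (0.9222, 0.3867)
round(increment, 3)                  # ~ (4.611, 1.934)
```

Multiples of increment give candidate factor settings along the path; this hand computation matches what rsm reports automatically.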

Code Example

  1. Let’s work through Example 11.1. Here, an experimenter is trying to optimize yield as a function of reaction time and temperature. The current time / temperature setting is 35 minutes and 155 degrees. A \(2^{2}\) factorial is run with four center points. Let’s visualize the data. Note that in our plot, we use position = position_jitter(w = 0.5, h = 0.5). This prevents the points sampled at the center point from overlapping one another.
library(readr)
library(dplyr)
library(ggplot2)
chem <- read_csv("https://uwmadison.box.com/shared/static/4x9v5wtgu8w8i2kuzjdhtlhkj1tda701.csv") %>%
  group_split(seq) # split main from follow-up experiments
p <- ggplot(chem[[1]]) +
  geom_point(
    aes(time, temp, col = yield),
    position = position_jitter(w = 0.5, h = 0.5),
    size = 3
  ) +
  coord_fixed() +
  scale_color_viridis_c()
p

Figure 2: \(2^2\) factorial experiment around current operating conditions.

  1. Let’s code the data – this is a bit more involved than in factorial experiments, because the data don’t simply lie on the corners of a cube. We’ll use the coded.data function from the rsm package to handle this.
library(rsm)
chem_coded <- coded.data(chem[[1]], time_coded ~ (time - 35) / 5, temp_coded ~ (temp - 155) / 5)
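As a sanity check on the coding, we can apply the transformation by hand; the levels 30 / 40 and 150 / 160 below are the low and high settings implied by centers of (35, 155) and a scale of 5:

```r
# Hand-check of the coding formula: coded value = (original - center) / scale.
code_time <- function(time) (time - 35) / 5
code_temp <- function(temp) (temp - 155) / 5
code_time(c(30, 35, 40))    # -> -1, 0, +1
code_temp(c(150, 155, 160)) # -> -1, 0, +1
```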
  1. Now we can fit a first-order (FO) model to these data.
fit <- rsm(yield ~ FO(time_coded, temp_coded), data = chem_coded)
summary(fit)

Call:
rsm(formula = yield ~ FO(time_coded, temp_coded), data = chem_coded)

             Estimate Std. Error  t value  Pr(>|t|)    
(Intercept) 40.444444   0.057288 705.9869 5.451e-16 ***
time_coded   0.775000   0.085932   9.0188  0.000104 ***
temp_coded   0.325000   0.085932   3.7821  0.009158 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared:  0.941, Adjusted R-squared:  0.9213 
F-statistic: 47.82 on 2 and 6 DF,  p-value: 0.0002057

Analysis of Variance Table

Response: yield
                           Df  Sum Sq Mean Sq F value    Pr(>F)
FO(time_coded, temp_coded)  2 2.82500 1.41250 47.8213 0.0002057
Residuals                   6 0.17722 0.02954                  
Lack of fit                 2 0.00522 0.00261  0.0607 0.9419341
Pure error                  4 0.17200 0.04300                  

Direction of steepest ascent (at radius 1):
time_coded temp_coded 
 0.9221944  0.3867267 

Corresponding increment in original units:
    time     temp 
4.610972 1.933633 

The output looks almost exactly the same as the output for lm, except it also includes the direction of steepest ascent.

  1. From visual inspection, and based on this model’s \(R^{2}\), the linear fit seems quite good. But for reference, we can compare with a model that adds two-way interactions (TWI) and quadratic components (PQ). Neither of these terms seems to improve the fit, so we’ll stick with the original first-order model. Compare the ANOVA with Table 11.2.
fit_iq <- rsm(yield ~ FO(time_coded, temp_coded) + TWI(time_coded, temp_coded) + PQ(time_coded, temp_coded), data = chem_coded)
summary(aov(fit_iq))
                            Df Sum Sq Mean Sq F value  Pr(>F)   
FO(time_coded, temp_coded)   2 2.8250  1.4125  32.849 0.00329 **
TWI(time_coded, temp_coded)  1 0.0025  0.0025   0.058 0.82132   
PQ(time_coded, temp_coded)   1 0.0027  0.0027   0.063 0.81374   
Residuals                    4 0.1720  0.0430                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  1. The steepest function computes points along the path of steepest ascent, at a sequence of distances (in coded units) from the center of the design.

ascent <- steepest(fit, dist = seq(0, 2, 0.4)) # short path near the design
ascent_long <- steepest(fit, dist = seq(0, 10, 1)) # longer extrapolation
p + 
  geom_point(
    data = ascent %>% select(-`|`),
    aes(time, temp, col = yhat),
    shape = 2 
  )

Figure 3: A short extrapolation along the direction of steepest ascent.

  1. The experimenter follows this path and finds that the yield increases at each point, up to a time of 85 minutes and a temperature of 175 degrees. At this new point, it seems worth running a second factorial experiment.
p + 
  geom_point(
    data = ascent_long %>% select(-`|`),
    aes(time, temp, col = yhat),
    shape = 2 
  ) +
  geom_point(
    aes(time, temp, col = yield),
    position = position_jitter(w = 0.4, h = 0.4),
    size = 3
  ) +
  geom_point(
    data = chem[[2]],
    aes(time, temp, col = yield),
    position = position_jitter(w = 0.4, h = 0.4),
    size = 3
  )

Figure 4: A longer extrapolation, up to the point where the experimenter noticed a decrease in yield, along with the follow-up experiment at that location.

  1. The second element of chem contains the follow-up experiment, run after moving along the path suggested by the method of steepest ascent.
ggplot(chem[[2]]) +
  geom_point(
    aes(time, temp, col = yield),
    position = position_jitter(w = 0.4, h = 0.4),
    size = 3
  ) +
  coord_fixed() +
  scale_color_viridis_c()

  1. At this point, it seems like we might be at a local maximum. We will likely want to perform more intensive sampling – more than just a \(2^2\) design with center points – in order to more precisely localize the optimum. For that, we’ll need to understand second-order methods and canonical analysis.
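As a preview of how that analysis might begin, one could recode around the new center and check for curvature. This sketch assumes the same coding scale of 5 units and reuses the chem data loaded above:

```r
# Sketch: recode around the new center (85, 175) and test for curvature.
# A significant lack of fit for this first-order-plus-interaction model would
# confirm that a second-order design (e.g. a central composite) is needed.
chem2_coded <- coded.data(
  chem[[2]],
  time_coded ~ (time - 85) / 5,
  temp_coded ~ (temp - 175) / 5
)
fit2 <- rsm(yield ~ FO(time_coded, temp_coded) + TWI(time_coded, temp_coded), data = chem2_coded)
summary(fit2)  # inspect the lack-of-fit line of the ANOVA table
```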