Day 18 in math modeling

  1. Homework:

  2. Chapter 4 - Empirical models (continued)

    Begin with Exercise #3, p. 226:


    1. R²: a measure of fit
      • See figure 4.5, p. 158
      • R² is the fraction of the variation in y that the model explains, measured against the constant model y = ȳ (the mean of the data). Recall the constant model from last time's plots (the shotgun blast, the horizontal line).
      • "Some books call R² the coefficient of determination" (p. 159)
      • Hint for Exercise #2, p. 226 (based on "trick" from last time)
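The verbal description of R² above can be written as a formula. The names SSE and SST and the symbols below are conventional notation assumed here; the text's own notation may differ.

```latex
R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}, \qquad
\mathrm{SSE} = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2, \qquad
\mathrm{SST} = \sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2
```

Here \hat{y}_i is the model's prediction for the i-th data point, and SST is exactly the sum of squared residuals of the constant model y = ȳ. A model that does no better than the mean gives R² = 0, and a perfect fit gives R² = 1.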

    2. Finding R² in the non-linear case
      • Same as R² in the linear case!
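A minimal sketch of why the computation does not change: R² needs only the data and the model's predictions, not the form of the model. The data and both candidate models below are made up for illustration, and the function name `r_squared` is an assumption.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SSE/SST, where SST is the squared error of the constant model y = mean(y)."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # squared error of the model
    sst = sum((yi - y_bar) ** 2 for yi in y)               # squared error of the mean
    return 1.0 - sse / sst

# Made-up data, for illustration only.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 2.0, 4.0, 8.0]

# Predictions from a hypothetical non-linear model (y = 2^x)
# and from a hypothetical line (y = 1 + 2x, not a fitted line).
nonlinear_pred = [2.0 ** xi for xi in x]
linear_pred = [1.0 + 2.0 * xi for xi in x]

r2_nonlinear = r_squared(y, nonlinear_pred)
r2_linear = r_squared(y, linear_pred)
print("non-linear model:", r2_nonlinear)
print("linear model:    ", r2_linear)
```

The same function is called in both cases; only the list of predictions differs.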

    3. Example - the X-files (using results from Mathematica)
      • Typical use of regression: predict year two's results from year one's results
      • Discussion of the "failures" of the process
        • Using the first season's model with the second season's data results in a higher R²! Linear regression minimizes the residual sum of squares on the data used for the fit; it makes no promise about R² on any other data set.
        • The season-two model would have done a woeful job of predicting season one:
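The first point (a model scoring a higher R² on data it was not fit to) can be reproduced with made-up numbers. This is a sketch only: the values below are NOT the X-Files data, and the names `fit_line` and `r_squared` are assumptions.

```python
def fit_line(x, y):
    """Ordinary least squares for y ≈ intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def r_squared(y, y_hat):
    """R^2 = 1 - SSE/SST, with SST measured about the mean of y."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst

# Made-up numbers for illustration only (not the X-Files data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
season_one = [2.0, 4.5, 5.5, 8.5, 9.5]   # noisier
season_two = [2.2, 4.0, 6.1, 7.9, 9.9]   # same trend, less scatter

# Fit on season one only, then score the same line against both seasons.
intercept, slope = fit_line(x, season_one)
pred = [intercept + slope * xi for xi in x]

r2_one = r_squared(season_one, pred)
r2_two = r_squared(season_two, pred)
print("R^2 on the data used for the fit:", r2_one)
print("R^2 on the other data set:       ", r2_two)
```

In this made-up case the second data set simply lies closer to the fitted line than the first, so its residual sum of squares is smaller relative to its own total variation.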

    4. Curvilinear models
      • Catalog of functions - standard classes:
        • ladder of powers
        • exponential and log
        • trigonometric (periodic) functions
      • Intrinsically linear models - the parameters enter linearly
      • Linearizable models - the parameters enter in a non-linear way, but the data may be transformed to yield a linear fitting procedure
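Two standard examples of linearizable models, written out as a worked transformation. The parameter names a and b are assumptions chosen for illustration.

```latex
\begin{align*}
y = a e^{bx} &\;\Longrightarrow\; \ln y = \ln a + b\,x
  && \text{(regress } \ln y \text{ on } x\text{)} \\
y = a x^{b}  &\;\Longrightarrow\; \ln y = \ln a + b \ln x
  && \text{(regress } \ln y \text{ on } \ln x\text{)}
\end{align*}
```

In both cases the transformed equation is linear in the unknowns \ln a and b, so ordinary linear regression applies to the transformed data. Both transformations require y > 0, and the second also requires x > 0.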

    5. Example: Cost of advertising. Comparison of various models:

      • linear model
      • log-linear (i.e. exponential linearizable model)
      • log-log model (i.e. power linearizable model)
      • If ever there were a data set crying out to be modeled as a logistic, this is the one! Notice that nice inflection point. The logistic, however, is not linearizable, as we see in the bear data example.
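A minimal sketch of how the first three models in this comparison could be set side by side. The data are made up (NOT the advertising data), and the helper names `fit_line` and `r_squared` are assumptions. R² is computed in the original units of y for every model so that the numbers are comparable.

```python
import math

def fit_line(u, v):
    """Ordinary least squares for v ≈ intercept + slope * u."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    slope = (sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
             / sum((ui - u_bar) ** 2 for ui in u))
    return v_bar - slope * u_bar, slope

def r_squared(y, y_hat):
    """R^2 = 1 - SSE/SST, with SST measured about the mean of y."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst

# Made-up data (an exact power law), for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0 * xi ** 1.5 for xi in x]
log_x = [math.log(xi) for xi in x]
log_y = [math.log(yi) for yi in y]

# Linear model: y = c0 + c1 * x
c0, c1 = fit_line(x, y)
linear_fit = [c0 + c1 * xi for xi in x]

# Log-linear model: ln y = c0 + c1 * x, i.e. y = exp(c0) * exp(c1 * x)
c0, c1 = fit_line(x, log_y)
exponential_fit = [math.exp(c0 + c1 * xi) for xi in x]

# Log-log model: ln y = c0 + c1 * ln x, i.e. y = exp(c0) * x^c1
c0, c1 = fit_line(log_x, log_y)
power_fit = [math.exp(c0) * xi ** c1 for xi in x]

results = {
    "linear": r_squared(y, linear_fit),
    "log-linear": r_squared(y, exponential_fit),
    "log-log": r_squared(y, power_fit),
}
for name, value in results.items():
    print(f"{name:>10}: R^2 = {value:.6f}")
```

One caveat on the design: the two transformed fits minimize squared error on the log scale, not in the original units, so their back-transformed R² is not the value the regression itself optimized.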

    6. Examples:

      • Lynx data:

        • Long term trend: horizontal asymptote (perhaps zero!)
        • Decaying oscillations (some periodicity)
        • Model: exponential decay to steady value (1,934 here), with decaying oscillations of ten-year period.
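One functional form consistent with this description is sketched below. The notes do not state the formula, so the form itself and the symbols A, B, \alpha, \beta, and t_0 are assumptions, not fitted values; only the steady value 1,934 and the ten-year period come from the description above.

```latex
y(t) = 1934 + A e^{-\alpha t}
     + B e^{-\beta t} \cos\!\left( \frac{2\pi}{10} \left(t - t_0\right) \right),
\qquad \alpha > 0,\ \beta > 0
```

As t grows, both exponential factors go to zero, so y(t) approaches the horizontal asymptote 1,934 while the oscillations about it die out.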

      • Bear data - intrinsically non-linearizable model: the logistic

        • Long term trend: horizontal asymptote
        • Concave down? Increasing, with decreasing slope...
        • Positive y-intercept
        • Model: logistic

        • "With nonlinear problems such as the logistic, choosing good [initial parameter] guesses is essential." (p. 183)
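One common heuristic for producing starting values is sketched below; it is not necessarily the text's method. It assumes the parameterization y = K / (1 + A e^(-rt)), which is an assumption here. The idea: guess the asymptote K slightly above the largest observation; with K held fixed at that guess, ln(K/y - 1) = ln A - r t is linear in t, so a straight-line fit yields rough values for A and r. The data are made up (NOT the bear data), and the names `logistic`, `fit_line`, `initial_guesses`, and `headroom` are hypothetical.

```python
import math

def logistic(t, K, A, r):
    """Logistic curve y = K / (1 + A * exp(-r * t)) (assumed parameterization)."""
    return K / (1.0 + A * math.exp(-r * t))

def fit_line(u, v):
    """Ordinary least squares for v ≈ intercept + slope * u."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    slope = (sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
             / sum((ui - u_bar) ** 2 for ui in u))
    return v_bar - slope * u_bar, slope

def initial_guesses(t, y, headroom=1.05):
    """Rough starting values (K0, A0, r0) for a logistic fit.

    K0 is set a little above max(y) so that K0 / y - 1 stays positive;
    the 5% headroom is an arbitrary choice, not a recommendation.
    """
    K0 = headroom * max(y)
    z = [math.log(K0 / yi - 1.0) for yi in y]   # z = ln A - r * t when K0 is right
    intercept, slope = fit_line(t, z)
    return K0, math.exp(intercept), -slope

# Made-up data generated from a known logistic (K = 100, A = 9, r = 0.5).
t = [float(i) for i in range(11)]
y = [logistic(ti, 100.0, 9.0, 0.5) for ti in t]

K0, A0, r0 = initial_guesses(t, y)
print("starting values: K0 =", K0, " A0 =", A0, " r0 =", r0)
```

These values are only a starting point: they would be handed to a nonlinear least-squares routine, which then adjusts all three parameters together.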


    Some Questions:

    1. What is the range for which the model might remain valid? How far can you trust this model (how far can you throw this model)?
    2. When is complexifying the model worthwhile? How do I decide whether R² is okay?

Website maintained by Andy Long. Comments appreciated.