Chapter 4 - Empirical models
Empirical models are the antithesis of theoretical models. If you have
information about the process being modeled, then a carefully constructed
theoretical model should be in close agreement with the "empirical data":
that is, if we just collect data, we expect the results to agree well with
the theoretical predictions of the model. Theoretical models are generally
considered "better" than empirical ones (the latter are somewhat "dirty").
We'll start with data and create a model. Since nature involves some
stochasticity, we expect our data to be noisy - nature always seems to be
trying to conceal the deterministic aspects of relationships between variables.
True to the modeling process, we're going to start simple, and only complexify
as needed: KISS - Keep It Simple Stupid! For that reason, we'll begin today
with the linear model.
Sections:
- Covariance and correlation (a short computational sketch follows this
outline)
- Interested in the dependence of y on x.
- Is there something systematic about the variation of x and
y? Do they vary together?
- Covariance of a variable with itself is variance.
- Correlation standardizes covariance.
- Correlation does not imply causation! (And we're usually more
interested in causation.)
- Fitting a line - using calculus and the least squares criterion (see the
least-squares sketch after this outline)
- Derivation: (multivariate) calculus solution
- Relationship between slope and covariance.
- R^2: a measure of fit (a small R^2 sketch follows this outline)
- See figure 4.5, p. 158
- R^2 represents the amount of variation explained by the
model relative to a constant model (using the mean as a baseline
model)
- "Some books call R^2 the coefficient of
determination" (p. 159)
- Exercise #2, p. 226
- Finding R^2 in the non-linear case
- same as R^2 in the linear case!
- Example - The X-Files (using results from Mathematica; a prediction
sketch follows this outline)
- Typical use of regression: predict year two's results from year
one's results
- Discussion of the "failures" of the process
- R^2 lowered with more data?! (i.e., incorporating season two,
we'd expect to do better....)
- season two would have done a woeful job of predicting year
one....
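For the covariance and correlation item above, here is a minimal Python
sketch (Python is used only for illustration, and the data values are made
up; the formulas are the usual sample versions with an n - 1 denominator):

    def mean(v):
        return sum(v) / len(v)

    def covariance(x, y):
        # average product of deviations from the means (n - 1 denominator)
        xbar, ybar = mean(x), mean(y)
        return sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (len(x) - 1)

    def correlation(x, y):
        # correlation standardizes covariance by the two standard deviations,
        # so it is unitless and lies in [-1, 1]
        return covariance(x, y) / (covariance(x, x) ** 0.5 * covariance(y, y) ** 0.5)

    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [1.2, 1.9, 3.2, 3.8, 5.1]

    print(covariance(x, x))   # covariance of x with itself is the variance of x
    print(covariance(x, y))
    print(correlation(x, y))  # near +1: x and y vary together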
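For the line-fitting item: minimizing the sum of squared errors
SSE(a, b) = sum_i (y_i - (a + b x_i))^2 by setting the partial derivatives
with respect to a and b to zero gives the familiar closed form, with
slope = cov(x, y) / var(x) and an intercept that forces the line through
(xbar, ybar). A minimal sketch, again with made-up data:

    def fit_line(x, y):
        n = len(x)
        xbar = sum(x) / n
        ybar = sum(y) / n
        # slope = cov(x, y) / var(x); the (n - 1) denominators cancel
        num = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
        den = sum((xi - xbar) ** 2 for xi in x)
        b = num / den
        a = ybar - b * xbar   # the fitted line passes through (xbar, ybar)
        return a, b

    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [1.2, 1.9, 3.2, 3.8, 5.1]
    a, b = fit_line(x, y)
    print(f"y = {a:.3f} + {b:.3f} * x")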
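For the R^2 items: R^2 compares the model's squared error to that of the
constant "just use the mean" model, and the same formula is applied in the
non-linear case. A small sketch (the predictions here are hypothetical):

    def r_squared(y, y_hat):
        ybar = sum(y) / len(y)
        sse = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))   # model error
        sst = sum((yi - ybar) ** 2 for yi in y)                   # mean-model error
        return 1.0 - sse / sst

    y     = [1.2, 1.9, 3.2, 3.8, 5.1]
    y_hat = [1.1, 2.1, 3.0, 4.0, 5.0]   # predictions from some fitted model
    print(r_squared(y, y_hat))          # 1 = perfect; 0 = no better than the mean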
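For the X-Files example: the sketch below shows only the mechanics of fitting
a trend to "year one," extrapolating it to "predict" year two, and scoring
the prediction with R^2. The numbers are placeholders, not the actual
X-Files data (the results discussed in class came from Mathematica):

    import numpy as np

    episode = np.arange(1, 11)                      # episodes 1..10 of year one
    year1 = np.array([7.4, 7.9, 7.3, 8.1, 8.0, 8.6, 8.4, 8.9, 9.1, 9.0])
    year2 = np.array([9.2, 8.8, 9.5, 9.1, 9.9, 9.4, 10.1, 9.8, 10.3, 10.0])

    # degree-1 polyfit returns (slope, intercept)
    b, a = np.polyfit(episode, year1, 1)
    predicted = a + b * (episode + len(episode))    # extrapolate to episodes 11..20

    # how well does the year-one trend explain year two?
    sse = np.sum((year2 - predicted) ** 2)
    sst = np.sum((year2 - year2.mean()) ** 2)
    print("R^2 =", 1 - sse / sst)

Note that if the extrapolation is poor, this R^2 can even come out negative,
which is one way the "failures" in the outline can show up.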
Note:
In the end, the result is to be a model. The question then becomes
one of interpretability: can the resulting model explain anything? Or is it
simply a way of predicting values that are close to the actual ones, without
any knowledge being gained? We certainly hope that knowledge increases,
based on the resulting model.
Some Questions:
- What is the range for which the model might remain valid? How far can you
trust this model (how far can you throw this model)?
- When is complexifying worthwhile? How do I decide whether R^2 is
okay?