That's an interesting exercise, which I carried out using "hash tables" -- one of the coolest data manipulation programming tricks I know.
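For anyone who hasn't met hash tables before, here is a minimal sketch of the idea in Python, where the built-in `dict` is a hash table. It is not the code I used: the readings are made up, and the function name `mean_by_decade` is just for illustration. The point is that a key (here, the decade) takes you straight to its bucket of values, so grouping needs only one pass through the data.

```python
from collections import defaultdict

# Hypothetical (year, temperature) readings, made up for illustration.
readings = [(1893, 55.0), (1899, 57.0), (1901, 54.0), (1905, 56.0), (1912, 58.0)]

def mean_by_decade(records):
    """Group readings by decade with a hash table, then average each group."""
    buckets = defaultdict(list)          # decade -> list of temperatures
    for year, temp in records:
        decade = (year // 10) * 10       # e.g. 1893 -> 1890
        buckets[decade].append(temp)
    return {decade: sum(temps) / len(temps) for decade, temps in buckets.items()}

decade_means = mean_by_decade(readings)
print(decade_means)
```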
What if we add a linear term: \[ \alpha + \beta t + \mathrm{amp}\, \sin[2\pi (t - \mathrm{phase})] \]
I also reminded you that it's not just the values of the parameters that matter, but also their significance -- is it possible that they are zero? In that case we can drop that portion of the model from consideration....
To evaluate the effectiveness of a model, it might pay to focus on the mean square residual: that seems a fairer metric for evaluating the difference between models (and one that I'd hoped you'd focus on during your exam -- I emphasized it by actually doing that calculation separately, even though it already figured in the tables of regression results).
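As a rough sketch of that calculation: the mean square residual is the sum of squared residuals divided by the residual degrees of freedom (number of observations minus number of fitted parameters). Dividing by degrees of freedom is what makes the comparison fairer when the models have different numbers of parameters. The function name and the small data set below are hypothetical, just to show the arithmetic.

```python
def mean_square_residual(y, y_hat, n_params):
    """Sum of squared residuals divided by (n - number of fitted parameters)."""
    if len(y) != len(y_hat):
        raise ValueError("y and y_hat must have the same length")
    dof = len(y) - n_params              # residual degrees of freedom
    if dof <= 0:
        raise ValueError("need more observations than parameters")
    sse = sum((obs - fit) ** 2 for obs, fit in zip(y, y_hat))
    return sse / dof

# Made-up observations and fitted values from a two-parameter model.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
fitted = [1.5, 2.0, 2.5, 4.0, 5.5]
msr = mean_square_residual(observed, fitted, n_params=2)
print(msr)
```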
That's a little draconian -- sometimes we had a max, but no min (or vice versa). It would be better to keep what we have. But in the interest of moving along, I just eliminated anything missing one or the other. That's a fairly conservative strategy....
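The filtering step can be sketched as follows. This is an illustration rather than my actual code: I'm assuming the records sit in a dictionary keyed by date, with `None` marking a missing value, and the field names `tmax` and `tmin` and the numbers are made up.

```python
# Hypothetical daily records: date -> readings; None marks a missing value.
daily = {
    "1893-08-01": {"tmax": 88.0, "tmin": 65.0},
    "1893-08-02": {"tmax": 90.0, "tmin": None},   # min missing
    "1893-08-03": {"tmax": None, "tmin": 61.0},   # max missing
    "1893-08-04": {"tmax": 85.0, "tmin": 60.0},
}

def keep_complete(records):
    """Keep only the days that have both a max and a min."""
    return {
        date: r
        for date, r in records.items()
        if r["tmax"] is not None and r["tmin"] is not None
    }

complete = keep_complete(daily)
print(sorted(complete))
```

A gentler strategy would keep the partial days and use whichever value is present, at the cost of more bookkeeping later.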
One of the things you'll notice is that they used a variety of date strings, which might seem really strange -- except that I requested that NOAA give me all the data on Bowling Green from 1700-2020 -- so their server put together several data sets (I don't have the documentation for those, although I've sent an email off to NOAA to see if I can get some). So presumably, the different data sets used different systems.
That was one pain in assembling the data below -- creating a consistent format.
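One way to get a consistent format is to try a short list of candidate formats on each date string and convert whatever parses to ISO form. The sketch below does that with Python's standard `datetime.strptime`. The three formats in `CANDIDATE_FORMATS` are my assumption for illustration; they are not taken from NOAA's documentation, and the real files may need a different list.

```python
from datetime import datetime

# Assumed formats, for illustration only (not NOAA's documented formats).
CANDIDATE_FORMATS = ("%Y-%m-%d", "%Y%m%d", "%m/%d/%Y")

def normalize_date(text):
    """Return the date as YYYY-MM-DD, trying each candidate format in turn."""
    cleaned = text.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue                      # this format didn't match; try the next
    raise ValueError(f"unrecognized date string: {text!r}")

print(normalize_date("18930801"))
print(normalize_date("8/1/1893"))
```

Raising an error on anything unrecognized is deliberate: it's better to find out about a fourth format than to silently drop those rows.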
Note: the first decade (1890-1900) is truncated, since the data started in August of 1893.
What do you think of this more modern version of the Fletcher years?
How well does it jibe with our original hypotheses, and with what we have come to expect based on other people's work?
Reminder: although these are non-linear models, they can be linearized -- so you can use linear regression by splitting your trig term using a trig identity. Or just do the regression using a non-linear regression routine.
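The trig identity in question is the angle-difference formula, which turns the sine term into a sum that is linear in two new coefficients: \[ amp\ \sin[2\pi (t - phase)] = a \sin(2\pi t) + b \cos(2\pi t), \quad a = amp \cos(2\pi\, phase), \quad b = -amp \sin(2\pi\, phase) \] Here is a minimal sketch of the linearized fit using NumPy's least-squares solver. It assumes time is measured in years (so the period is 1) and uses noise-free synthetic data with made-up parameter values, only to check that amp and phase are recovered from a and b.

```python
import numpy as np

def fit_linear_plus_sine(t, y):
    """Fit alpha + beta*t + amp*sin(2*pi*(t - phase)) by ordinary least squares."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    # Columns: intercept, linear term, and the two pieces of the split sine.
    X = np.column_stack([np.ones_like(t), t,
                         np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    alpha, beta, a, b = coef
    amp = np.hypot(a, b)                              # sqrt(a^2 + b^2)
    phase = (np.arctan2(-b, a) / (2 * np.pi)) % 1.0   # as a fraction of a year
    return alpha, beta, amp, phase

# Synthetic monthly data over 20 years; the parameter values are made up.
t = np.arange(240) / 12.0
y = 55.0 + 0.02 * t + 20.0 * np.sin(2 * np.pi * (t - 0.3))
alpha_hat, beta_hat, amp_hat, phase_hat = fit_linear_plus_sine(t, y)
print(alpha_hat, beta_hat, amp_hat, phase_hat)
```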
In terms of climate change, the most interesting parameter is the slope of the linear term: is it significantly different from 0?
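Here is a hedged sketch of how that question can be checked: compute the standard error of the slope from the linearized fit and form the t statistic (the slope divided by its standard error). For large samples, a magnitude above roughly 2 corresponds to significance at about the 5% level. The sketch assumes independent, equal-variance errors, which real daily temperatures may well violate because of autocorrelation. The data are synthetic, with made-up parameter values and a fixed random seed.

```python
import numpy as np

def slope_t_statistic(t, y):
    """Return (slope, standard error, t statistic) for the linear term."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(t), t,
                         np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    dof = y.size - X.shape[1]                   # observations minus parameters
    sigma2 = residuals @ residuals / dof        # mean square residual
    cov = sigma2 * np.linalg.inv(X.T @ X)       # covariance of the coefficients
    se_beta = np.sqrt(cov[1, 1])                # column 1 is the linear term
    return coef[1], se_beta, coef[1] / se_beta

# Synthetic monthly data over 100 years with unit-variance noise (made up).
rng = np.random.default_rng(0)
t = np.arange(1200) / 12.0
y = 55.0 + 0.02 * t + 20.0 * np.sin(2 * np.pi * (t - 0.3)) + rng.normal(0.0, 1.0, t.size)
beta_hat, se_beta, t_stat = slope_t_statistic(t, y)
print(beta_hat, se_beta, t_stat)
```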