We can simulate Wood County weather, once we have a model for it, and then see if the distribution of extreme years matches up well with Fletcher's distribution. If we use a model with a linear trend over time, then we're saying "climate change"; if we use a model with no linear trend, then we're saying "no climate change".
I also described the process of randomization, which is essential for generating weather. We want to generate 1000 "realizations" of Wood County weather, then see whether Fletcher's extreme year distribution is outlandish (or not), given an assumption of no climate change.
Also, be on the look-out for any outliers. Take a look at the data, to see if there are any funny looking data that we should investigate, too....
Variance of mins for days a given number of "lags" (days) apart, up to 10. This graph is really just the "nose" of the graph at right -- we zoom in on the first few lags, to see how the variance dives down for mins from days within a few days of each other.
We think of a "lag" as a delay -- it means we're comparing days lagging by 1, by 2, by 3 days from each other, and so on (up to 10 days apart in this graph).
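Here's a sketch of how you might compute such an empirical variogram yourself. Everything here is made-up stand-in data (a seasonal cycle plus autocorrelated noise), not the actual Wood County record:

```python
import numpy as np

rng = np.random.default_rng(0)
days = np.arange(3650)  # ten fake years of daily minima

# Seasonal cycle plus AR(1) day-to-day noise, so nearby days really do correlate.
eps = rng.normal(0, 1, days.size)
noise = np.empty_like(eps)
noise[0] = eps[0]
for t in range(1, eps.size):
    noise[t] = 0.7 * noise[t - 1] + eps[t]
mins = 10 * np.sin(2 * np.pi * days / 365.25) + noise

def empirical_variogram(x, max_lag):
    """Semivariance 0.5 * mean((x[t+h] - x[t])^2) for lags h = 1..max_lag."""
    return np.array([0.5 * np.mean((x[h:] - x[:-h]) ** 2)
                     for h in range(1, max_lag + 1)])

gamma = empirical_variogram(mins, 10)  # climbs as nearby days decorrelate
```

The values climb with lag because days close together are similar (small squared differences), just as in the graphs above.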
Variance for "lags" up to 365 days apart -- i.e. one year.
You may all have heard that you can't trust a weather prediction more than about a week out, and what you're seeing here is a picture of why: a day seven days out is no more predictable from today than a day a full year out -- i.e., than "what's it usually doing on April 24th?".
What this tells us is that tomorrow's temps are more similar to today's (variance about 1 for minima one day apart) than those 10 days apart (variance 2.7 or so).

The graph at right tells us the obvious: there's a lot of variance between temperatures half a year apart (you may be comparing a winter min to a summer min); and, since temperatures are essentially periodic, temperatures about a year later are again very similar.
So what we notice is that, after about a week, the correlation between two mins a week apart is about the same as the correlation between two mins exactly a year apart. A year apart is really "seasonal correlation" (more "climatic", if you will); the correlation for mins just a few days apart reflects some weather system moving through... which has an effect of no more than about a week.
Why do you suppose that is?
For one thing, let's look at the distribution of means and standard deviations of these temperatures across the year, over 127 years:
Can you make a good story out of these two graphics -- the histograms above, and the summaries of mins and maxes from below?
What do you think of those? (I hope that you think that they look pretty damned normal.)
In the summer months, the deviation is lower; in the winter the deviation is higher.
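As a sketch of how those summaries are built (again with fabricated data standing in for the real record -- here I've baked in warm, quiet summers and cold, noisy winters), grouping by day of year gives one mean and one standard deviation per calendar day:

```python
import numpy as np

rng = np.random.default_rng(1)
n_years = 127                      # the record is about 127 years long
doy = np.tile(np.arange(365), n_years)                    # day of year, 0..364
seasonal_mean = 15 - 12 * np.cos(2 * np.pi * doy / 365)   # warm summers, cold winters
seasonal_sd = 4 + 2 * np.cos(2 * np.pi * doy / 365)       # noisier winters
temps = seasonal_mean + seasonal_sd * rng.normal(0, 1, doy.size)

# One mean and one standard deviation per calendar day, across all years.
means = np.array([temps[doy == d].mean() for d in range(365)])
sds = np.array([temps[doy == d].std(ddof=1) for d in range(365)])
```

Plotting `means` and `sds` against day of year would give curves like the summaries shown: high means and low deviations mid-year, the reverse in winter.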
So if we're going to simulate a temperature (let's say a max), you might draw it from a normal distribution with the mean and standard deviation appropriate to that day of the year.
Empirical PDF -- Probability Density Function -- for August 15th Minima, with a suggested normal-distribution overlay. Maybe the data isn't normal, but that's just to give you the idea.
We might want to create a theoretical distribution from the empirical data -- otherwise we'd never be able to exceed the extremes of the data (to "create new records").
Empirical CDF -- Cumulative Distribution Function -- for August 15th Minima, with a normal CDF overlay (I just used the mean and the standard deviation of the empirical data to estimate the normal). Maybe the data isn't normal, but this is just to give you the idea.
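Here's that comparison as a sketch, with fabricated stand-in values for the August 15th minima; `normal_cdf` is just the closed form via the error function, and the last line shows why a theoretical distribution lets us "create new records":

```python
from math import erf, sqrt
import numpy as np

rng = np.random.default_rng(2)
aug15_minima = rng.normal(17.0, 2.5, 127)   # stand-in for 127 observed values

mu, sigma = aug15_minima.mean(), aug15_minima.std(ddof=1)

def normal_cdf(x, mu, sigma):
    """CDF of the normal distribution, via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# Empirical CDF at each sorted observation, vs. the fitted normal CDF.
xs = np.sort(aug15_minima)
ecdf = np.arange(1, xs.size + 1) / xs.size
fitted = np.array([normal_cdf(x, mu, sigma) for x in xs])
max_gap = np.abs(ecdf - fitted).max()       # a Kolmogorov-Smirnov-style gap

# Draws from the fitted normal can exceed the observed extremes ("new records").
new_draws = rng.normal(mu, sigma, 1000)
```

If `max_gap` is small, the normal overlay hugs the empirical CDF, as in the figure.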
So we're essentially looking at the residuals here -- just with the wrong labels on the x-axis. And these look pretty normal.
(from a paper by Haslett, On the Sample Variogram and the Sample Autocovariance for Non-Stationary Time Series). The only difference is that their x-axis is in years! They're seeing that, year to year, there is some similarity, especially for "adjacent years" -- years not too far apart. The tail, heading off into the upper stratosphere, is actually showing that climate change is occurring, probably linearly (in the record up to that point -- 1997).
We find roughly the same thing for our data, especially after having removed the trend (which you should have calculated for homework) for the two time series:
At left: zooming in on the nose of the graph at right. At right: the "empirical variogram" and the "theoretical variogram" for the residuals of the minimum-temperature model.
The graph at left says that for seven days or so (a week) there really does seem to be a stronger correlation between adjacent days' residuals (there's less variance between them) -- after which, as shown in the graph at right, there doesn't seem to be much going on....
The graph at right says that residuals 180 days apart and 45 days apart are about equally (un)correlated....
The upshot: today's temperature has something to say about the temperature a week from now -- but just barely. Today's temperature has a lot to say about tomorrow's temperature, however....
Notice how nicely the model fits the data (the points) at left. We have modeled what is called the "empirical variogram" (computed from the data) with a "theoretical variogram", which we can then use to generate simulations which will look similar to the original data.
I've used non-linear regression to fit this, but constrained to a class of functions with certain properties you'd expect of a variance (the empirical variogram shown is a temporal decomposition of variance -- we've broken the variance down by pairs of points given "distances in time" apart -- i.e. those 1 day, 2 days, 3 days, etc. apart).
Furthermore, the non-linear regression is weighted: there are more pairs of points 1 day apart than one year apart in the data set -- think about it -- so the variogram value for "1-day-apart pairs" represents a lot more data, and gets correspondingly more weight.
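Here's the flavor of that weighted fit, as a sketch: the empirical variogram values and pair counts are made up, the model is an exponential variogram (one common choice of a "legal" variance function -- not necessarily the one I used), and I've swapped a brute-force grid search in for real non-linear least squares:

```python
import numpy as np

# Made-up empirical variogram values and pair counts by lag (in days).
lags = np.arange(1, 31)
npairs = 46000 - 127 * lags          # shorter lags contribute more pairs
rng = np.random.default_rng(3)
gamma_emp = 3.0 * (1 - np.exp(-lags / 5.0)) + rng.normal(0, 0.05, lags.size)

def model(h, s, r):
    """Exponential variogram: sill s, range parameter r."""
    return s * (1 - np.exp(-h / r))

# Weighted least squares by grid search; the weights are the pair counts,
# so well-populated short lags pull harder on the fit.
best = None
for s in np.linspace(1, 5, 81):
    for r in np.linspace(1, 10, 91):
        sse = np.sum(npairs * (gamma_emp - model(lags, s, r)) ** 2)
        if best is None or sse < best[0]:
            best = (sse, s, r)
_, s_hat, r_hat = best
```

The recovered `s_hat` and `r_hat` land close to the values the fake data was built from, which is all a fit like this can promise.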
We can then do this a thousand times, to make a thousand "realizations" of the data -- simulations that look like the data. And from those we can decide if Fletcher's results look realistic, under an assumption of no climate change.
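One standard way to generate such realizations (a sketch, with assumed sill and range values, using the Cholesky trick for correlated Gaussian draws -- not necessarily the machinery I actually used):

```python
import numpy as np

rng = np.random.default_rng(4)
n_days, n_sims = 365, 1000
s, r = 3.0, 5.0    # assumed sill and range from a fitted exponential variogram

# Covariance implied by that variogram: C(h) = s * exp(-h / r).
h = np.abs(np.subtract.outer(np.arange(n_days), np.arange(n_days)))
cov = s * np.exp(-h / r)

# Multiply white noise by a Cholesky factor of the covariance matrix;
# the tiny jitter on the diagonal just keeps the factorization stable.
L = np.linalg.cholesky(cov + 1e-9 * np.eye(n_days))
realizations = L @ rng.normal(0, 1, (n_days, n_sims))
# Each column is one simulated year of residuals; add back the seasonal
# mean (and trend, if any) to get simulated temperatures.
```

A thousand such columns give a thousand "years" against which to judge Fletcher's extreme-year distribution.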
You could ask me questions on Zoom, of course....
Frequently the variogram is used to estimate at unknown locations, as well, through a process called "kriging" -- that is, we could attempt to estimate data where it is missing in the Bowling Green data. There's a lot you can do with a variogram, which is a way of representing data's correlation with itself, either temporally, as in this case, or more often spatially -- the variogram is a really important tool in geology, especially mineral exploration (e.g. mining), where it is used to tell you where to drill for oil next....:(
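For the curious, here's roughly what the simplest version (simple kriging) looks like in one dimension, say for filling in one missing day of residuals -- all the numbers here are invented for illustration:

```python
import numpy as np

s, r = 3.0, 5.0                       # assumed variogram sill and range
obs_days = np.array([1, 2, 3, 5, 6])  # days with data; day 4 is missing
obs_vals = np.array([1.2, 0.8, 0.5, -0.3, -0.1])   # residuals on those days
target = 4

# Covariance among observations, and between observations and the target,
# both implied by the exponential variogram: C(h) = s * exp(-h / r).
C = s * np.exp(-np.abs(np.subtract.outer(obs_days, obs_days)) / r)
c0 = s * np.exp(-np.abs(obs_days - target) / r)

weights = np.linalg.solve(C, c0)      # simple-kriging weights
estimate = weights @ obs_vals         # estimate of the missing day's residual
```

Notice that nearly all the weight lands on days 3 and 5, the target's immediate neighbors -- the "screening effect" that makes kriging so useful (and, in mineral exploration, so profitable).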
For that reason, I've often felt a love-hate relationship with what I think of as a beautiful mathematical tool.