In this report, we use linear regression to determine whether the city of Mango shows a significant increase or decrease in temperature over time. To begin, we read in the temperature data files and stored them in two separate variables, MangoMax for the maximum temperatures and MangoMin for the minimum temperatures. In the first half of this report, we discuss the maximum temperatures and the results of our linear regression and residuals. The second half focuses on the minimum temperatures. The first graph in this report is a simple scatter plot of the annual maximum temperatures.


Maximum Temperatures

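library(ggplot2)  # plotting package used for every graph in this report

# Scatter plot of annual maximum temperature against year
ggplot(MangoMax, aes(x = ANNEES, y = MOY)) +
  geom_point() +
  labs(x = "Years", y = "Temperature") +
  ggtitle("Mango Maximum Temperatures Scatterplot")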


From the points alone, the graph seems fairly consistent, with maybe a slight downward trend. There is one clear outlier of ~29 degrees in the year 1995. Other than that point, the temperatures fluctuate between 32.5 and 35 degrees Celsius, with a mean of 33.6. To investigate the possible downward trend further, we ran a linear regression and added it to our scatter plot.
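The mean and the outlier described above can also be checked numerically rather than by eye. The following is a minimal sketch, assuming a data frame with the report's column names ANNEES and MOY. The values are simulated stand-ins (the name demo_max, the 1950 to 2004 year range, and the 3-standard-deviation cutoff are all illustrative assumptions, not taken from the Mango data).

```r
# Simulated stand-in for MangoMax (NOT the real measurements)
set.seed(1)
demo_max <- data.frame(
  ANNEES = 1950:2004,                          # hypothetical 55-year range
  MOY    = rnorm(55, mean = 33.6, sd = 0.5)    # annual averages near 33.6
)
demo_max$MOY[demo_max$ANNEES == 1995] <- 29    # plant one low value like the 1995 point

# Mean and standard deviation of the annual averages
moy_mean <- mean(demo_max$MOY)
moy_sd   <- sd(demo_max$MOY)

# Flag years more than 3 standard deviations from the mean (illustrative cutoff)
flagged <- demo_max[abs(demo_max$MOY - moy_mean) > 3 * moy_sd, ]
print(flagged)
```

Running the same three lines on MangoMax itself would show whether 1995 is the only year that stands apart under this rule.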


Add Linear Regression

ggplot(MangoMax, aes(x = ANNEES, y = MOY)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Years", y = "Temperature") +
  ggtitle("Mango Maximum Temperatures with Linear Regression")

library(stargazer)  # formats regression summaries as tables

# Use column names in the formula (not MangoMax$...) so the data argument is actually used
fit <- lm(MOY ~ ANNEES, data = MangoMax)
stargazer(fit, type = "html", header = FALSE, title = "Maximum Temperature Linear Regression Analysis")
Maximum Temperature Linear Regression Analysis

                      Dependent variable: MOY
ANNEES                -0.014** (0.006)
Constant              61.628*** (12.645)
Observations          55
R²                    0.085
Adjusted R²           0.068
Residual Std. Error   0.749 (df = 53)
F Statistic           4.917** (df = 1; 53)
Note: *p<0.1; **p<0.05; ***p<0.01

The r-squared value from the linear regression is only .0849, meaning the fitted line accounts for about 8% of the variation in the annual maximum temperatures. Since this is so low, we are not confident in this model. We graphed the residuals in the following plot to see how far the points fall from the line. At this point, it is clear that linear regression is not the best option.
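The r-squared value can be cross-checked against the F statistic in the same table. The worked check below uses standard least-squares identities; the symbols k (number of predictors) and n (number of observations) are notation introduced here, not the report's.

```latex
R^2 = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}}, \qquad
F = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)}
\;\Longrightarrow\;
R^2 = \frac{kF}{kF + (n - k - 1)}

% maximum-temperature fit: k = 1, n - k - 1 = 53, F = 4.917
R^2 = \frac{4.917}{4.917 + 53} \approx 0.0849
```

The result agrees with the reported value, so the table and the text are consistent with each other.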


Linear Residuals

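# Store fitted values and residuals alongside the data
MangoMax$maxpredicted <- predict(fit)
MangoMax$maxresiduals <- residuals(fit)
View(MangoMax)  # interactive inspection only

# Vertical segments show each residual (observed minus fitted temperature)
ggplot(MangoMax, aes(x = ANNEES, y = MOY)) +
  geom_smooth(method = "lm", se = FALSE, color = "blue") +
  geom_point() +
  geom_segment(aes(xend = ANNEES, yend = maxpredicted)) +
  xlab("Years") +
  ylab("Temperature") +
  ggtitle("Residuals of Mango Maximum Temperature")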


From the summary of the linear regression and further examination of a graph of the residuals, we do not feel comfortable saying that there is evidence that the temperature decreased. The slope estimate of -0.014 carries two stars in the regression table (the p<0.05 level), but the r-squared value is so low, and the 1995 outlier is still in the data, that we do not regard this as convincing evidence that the coefficient is different from 0. Therefore, we do not believe that the maximum temperature demonstrates a significant decrease. The next graph shows what we feel is the best model for the maximum temperatures, a horizontal line through the mean.
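Whether a slope differs from 0 is usually judged from the slope's own t-test or confidence interval, which is a separate quantity from r-squared. The sketch below shows where those numbers live in an lm fit. It runs on simulated data with no built-in trend (the names demo and fit_demo and the year range are assumptions for illustration), so its output says nothing about Mango.

```r
# Simulated stand-in data (NOT the Mango measurements): noise around 33.6, no trend
set.seed(42)
demo <- data.frame(ANNEES = 1950:2004)
demo$MOY <- 33.6 + rnorm(55, sd = 0.75)

fit_demo <- lm(MOY ~ ANNEES, data = demo)

# Estimate, standard error, t value and p-value for the slope
slope_row <- summary(fit_demo)$coefficients["ANNEES", ]
print(slope_row)

# 95% confidence interval for the slope; an interval that contains 0
# means the data are compatible with no trend at the 5% level
slope_ci <- confint(fit_demo, "ANNEES", level = 0.95)
print(slope_ci)
```

Applied to the report's fit object, the same two calls would give the exact p-value behind the stars in the table.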


Horizontal Line

# Horizontal line at the overall mean of the annual maximum temperatures
ggplot(MangoMax, aes(x = ANNEES, y = MOY)) +
  geom_point() +
  geom_hline(yintercept = mean(MangoMax$MOY), color = "blue") +
  ggtitle("Maximum Temperatures with Horizontal Line") +
  xlab("Years") +
  ylab("Temperature")
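A horizontal line through the mean is the same thing as a regression with an intercept and no slope, so it can be compared formally with the straight-line model. This is a minimal sketch on simulated data (demo, flat_fit and linear_fit are hypothetical names), not a re-analysis of the Mango series.

```r
# Simulated stand-in data (NOT the Mango measurements)
set.seed(7)
demo <- data.frame(ANNEES = 1950:2004)
demo$MOY <- 33.6 + rnorm(55, sd = 0.75)

flat_fit   <- lm(MOY ~ 1, data = demo)        # horizontal line: intercept only
linear_fit <- lm(MOY ~ ANNEES, data = demo)   # straight line with a slope

# The intercept of the flat model is exactly the sample mean
print(c(intercept = coef(flat_fit)[["(Intercept)"]], mean = mean(demo$MOY)))

# F-test for the nested pair: does adding the slope reduce the residual
# sum of squares by more than chance would explain?
cmp <- anova(flat_fit, linear_fit)
print(cmp)
```

For a single added slope, the F statistic in this comparison is the same one reported at the bottom of the regression table.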

Minimum Temperatures

# Scatter plot of annual minimum temperature against year
ggplot(MangoMin, aes(x = ANNEES, y = MOY)) +
  geom_point() +
  labs(x = "Years", y = "Temperature") +
  ggtitle("Mango Minimum Temperatures")


Just by looking at the minimum temperature points, it is clear that there is some type of upward slope. To investigate further, we added a linear regression line to the scatter plot.


Add Linear Regression

ggplot(MangoMin, aes(x = ANNEES, y = MOY)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Years", y = "Temperature") +
  ggtitle("Minimum Temp With Linear")

# Straight-line fit of annual minimum temperature on year
linearmin <- lm(MOY ~ ANNEES, data = MangoMin)
stargazer(linearmin, type = "html", header = FALSE, title = "Minimum Linear Regression Analysis", single.row = TRUE, column.sep.width = "1pt")
Minimum Linear Regression Analysis

                      Dependent variable: MOY
ANNEES                0.047*** (0.004)
Constant              -70.805*** (8.252)
Observations          55
R²                    0.706
Adjusted R²           0.700
Residual Std. Error   0.489 (df = 53)
F Statistic           127.043*** (df = 1; 53)
Note: *p<0.1; **p<0.05; ***p<0.01

From the summary of the linear regression, we can see that our r-squared value is about .70. The residuals seem to fall evenly above and below the regression line, but few points lie right on it. While the linear regression is good, we would like to see if a quadratic model would fit better.


Linear Regression Residuals

# Store fitted values and residuals from the linear model
MangoMin$linpredicted <- predict(linearmin)
MangoMin$linresiduals <- residuals(linearmin)
View(MangoMin)  # interactive inspection only

ggplot(MangoMin, aes(x = ANNEES, y = MOY)) +
  stat_smooth(method = "lm", se = FALSE) +
  geom_point() +
  geom_segment(aes(xend = ANNEES, yend = linpredicted)) +
  xlab("Years") +
  ylab("Temperature") +
  ggtitle("Residuals of Linear Regression Mango Minimum Temperature")
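Residuals from a least-squares line with an intercept always average to zero, so an even split above and below the line does not by itself rule out curvature. One rough check is to average the residuals within the early, middle and late parts of the record. The sketch below does this on simulated data that has a mild curve built in (demo_min, the coefficients, and the three-way split are illustrative assumptions).

```r
# Simulated stand-in for MangoMin (NOT the real data): gently curved trend plus noise
set.seed(11)
demo_min <- data.frame(ANNEES = 1950:2004)
yr <- demo_min$ANNEES - 1950
demo_min$MOY <- 20 + 0.09 * yr - 0.0008 * yr^2 + rnorm(55, sd = 0.3)

lin_fit <- lm(MOY ~ ANNEES, data = demo_min)
demo_min$res <- residuals(lin_fit)

# Overall mean of the residuals: zero up to rounding error, by construction
print(mean(demo_min$res))

# Mean residual in each third of the record; a low-high-low (or high-low-high)
# pattern hints that a straight line is missing some curvature
period    <- cut(demo_min$ANNEES, breaks = 3, labels = c("early", "middle", "late"))
by_period <- tapply(demo_min$res, period, mean)
print(by_period)
```

Using MangoMin$linresiduals in place of demo_min$res would apply the same check to the report's linear fit.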

Quadratic Regression

# Keep the squared term inside the data frame so lm() and predict() find it there
MangoMin$ANNEES2 <- MangoMin$ANNEES^2
quadraticmin <- lm(MOY ~ ANNEES + ANNEES2, data = MangoMin)

ggplot(MangoMin, aes(x = ANNEES, y = MOY)) +
  stat_smooth(method = "lm", formula = y ~ x + I(x^2), se = FALSE) +
  geom_point() +
  ylab("Temperatures") +
  xlab("Years") +
  ggtitle("Quadratic Regression Minimum")

stargazer(quadraticmin, type = "html", header = FALSE, title = "Min Temp Quadratic Regression Analysis", single.row = TRUE, column.sep.width = "1pt")
Min Temp Quadratic Regression Analysis

                      Dependent variable: MOY
ANNEES                3.701*** (1.059)
ANNEES2               -0.001*** (0.0003)
Constant              -3,702.762*** (1,052.703)
Observations          55
R²                    0.760
Adjusted R²           0.751
Residual Std. Error   0.445 (df = 52)
F Statistic           82.542*** (df = 2; 52)
Note: *p<0.1; **p<0.05; ***p<0.01

In the quadratic model, the r-squared value improved to .76 (adjusted r-squared .751, up from .700 for the linear model), so we feel better about this model. We believe it is the best model so far. To be sure, we graphed the residuals in the following plot.
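Because r-squared cannot go down when a term is added to a least-squares model, a rise from the linear to the quadratic fit is expected even when the extra term is not needed. Adjusted r-squared and a partial F-test are two common ways to ask whether the improvement is more than that. The sketch below runs both on simulated data (demo_min, lin_fit, quad_fit and the generating coefficients are assumptions for illustration).

```r
# Simulated stand-in for MangoMin (NOT the real data)
set.seed(3)
demo_min <- data.frame(ANNEES = 1950:2004)
yr <- demo_min$ANNEES - 1950
demo_min$MOY <- 20 + 0.09 * yr - 0.0008 * yr^2 + rnorm(55, sd = 0.45)

lin_fit  <- lm(MOY ~ ANNEES, data = demo_min)
quad_fit <- lm(MOY ~ ANNEES + I(ANNEES^2), data = demo_min)

# Plain R-squared never decreases when a term is added;
# adjusted R-squared charges a penalty for the extra coefficient
r2 <- c(linear = summary(lin_fit)$r.squared, quadratic = summary(quad_fit)$r.squared)
adj_r2 <- c(linear = summary(lin_fit)$adj.r.squared, quadratic = summary(quad_fit)$adj.r.squared)
print(rbind(r2, adj_r2))

# Partial F-test: is the drop in residual sum of squares from the
# squared term larger than chance alone would produce?
cmp <- anova(lin_fit, quad_fit)
print(cmp)
```

With the report's objects, anova(linearmin, quadraticmin) would run the same comparison on the Mango minimum temperatures.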


Quadratic Regression Residuals

# Store fitted values and residuals from the quadratic model
MangoMin$quadpredicted <- predict(quadraticmin)
MangoMin$quadresiduals <- residuals(quadraticmin)
View(MangoMin)  # interactive inspection only

ggplot(MangoMin, aes(x = ANNEES, y = MOY)) +
  stat_smooth(method = "lm", formula = y ~ x + I(x^2), se = FALSE) +
  geom_point() +
  geom_segment(aes(xend = ANNEES, yend = quadpredicted)) +
  xlab("Years") +
  ylab("Temperature") +
  ggtitle("Residuals of Quadratic Min Temp")

Conclusion


Based on the data we were given, we believe that there is enough evidence to conclude there is a significant increase in the minimum temperatures, but there is no significant difference in the maximum temperatures. While the maximum temperature seemed at first to have a negative slope, upon further examination and with an r-squared value of about .08 (adjusted .07), we have determined that the best model for the maximum temperature is a horizontal line through the mean. For the minimum temperatures, we tried both a linear regression and a quadratic. The linear gave us an r-squared of .70, but the quadratic gave us an improved r-squared of .76. It is for this reason that we believe that our current best model for the City of Mango’s minimum temperatures is the quadratic.


Additional Information Moving Forward


First, since there was an outlier of ~29 degrees in the maximum temperatures in 1995, we would like to know whether this was a mistake or whether something odd happened that year. The point is about 4 degrees below the mean and seems out of place. Is this human error? Or was 1995 just a very cold year for Mango? Next, we think that it would be very helpful to have the temperature data broken down into averages by month or season. It would be interesting to see if specific months are increasing more than others, or if the winters are getting warmer while the summers are staying the same. It would also be nice to have some geographic data, such as elevation above sea level or distance from the coast. Lastly, we would like to know more about the data collection. Was it all collected by the same person? Were the temperatures collected at the exact same geographic location every time?
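Until the 1995 value can be verified, one way to see how much it matters is to fit the line with and without that year and compare the coefficients. The sketch below illustrates the idea on simulated data with one planted low value (demo_max, fit_all and fit_drop are hypothetical names, and the numbers it prints are not Mango results).

```r
# Simulated stand-in for MangoMax (NOT the real data), with one planted low year
set.seed(5)
demo_max <- data.frame(ANNEES = 1950:2004)
demo_max$MOY <- 33.6 + rnorm(55, sd = 0.5)
demo_max$MOY[demo_max$ANNEES == 1995] <- 29

fit_all  <- lm(MOY ~ ANNEES, data = demo_max)
fit_drop <- lm(MOY ~ ANNEES, data = subset(demo_max, ANNEES != 1995))

# Intercept and slope side by side: a large shift means the
# single year is driving much of the fitted trend
print(rbind(all_years = coef(fit_all), without_1995 = coef(fit_drop)))

# Cook's distance measures each year's influence on the fit
cd <- cooks.distance(fit_all)
most_influential <- demo_max$ANNEES[which.max(cd)]
print(most_influential)
```

If the slope for the real data changed noticeably once 1995 was left out, that would strengthen the case for checking how the value was recorded.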