
Downloading data from Davis is somewhat straightforward. All you have to do is select the period of time you want data for and then provide your email in the WeatherLink data tab. Depending on who is reading this, the data may already be downloaded for you.

While downloading the data is easy, working with it can be a bit tricky. Here I provide one of many ways in which one can crack open a Davis CSV file, as well as some initial analysis/plotting that you can do.

# First step: load the basic R packages you need
library(dplyr)
library(readr)
library(tidyr)
library(ggplot2)

Calculations and Visualizations

Now that we have the data frame set up, the possibilities are endless. For the sake of a brief introduction, I'm just going to show three main types of visualizations that we could do.

Mean Daily Temperatures for a Month

Let's start off by making a simple graph of the daily mean temperatures for December 2022. First we have to do some calculations and create a data frame of mean temperatures for each day of each month.

Combined_md<- Combined_df %>% 
  group_by(month, day) %>% 
  summarize(MeanT = mean(Temp)) 

Now we just plot that data (making sure to only select December) and take a look.

Combined_md %>% 
  filter(month == 12) %>% 
  ggplot() + geom_line(aes(x = day, y = MeanT)) + 
  theme_linedraw() + 
  xlab("Day") + 
  ylab("Temperature (°F)")

Interesting values for sure. The sudden drop-off around Christmas may seem like an error, but it isn’t. That was the anomalously cold period that most of the continental United States experienced at the end of 2022.
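A side note on the daily-mean step above: if any temperature readings are missing, `mean()` returns `NA` for that day unless `na.rm = TRUE` is set, and `summarize()` keeps the result grouped by month unless told otherwise. The sketch below shows one way to handle both. It uses a small made-up data frame (`toy_df`, a hypothetical stand-in for `Combined_df`), so treat it as an illustration rather than the exact calculation used on this page.

```r
library(dplyr)

# Toy stand-in for Combined_df (hypothetical values, not real station data)
toy_df <- tibble(
  month = c(12, 12, 12, 12),
  day   = c(1, 1, 2, 2),
  Temp  = c(30, 34, NA, 40)
)

# na.rm = TRUE skips missing readings; .groups = "drop" returns an
# ungrouped data frame and silences the summarise() grouping message
toy_md <- toy_df %>%
  group_by(month, day) %>%
  summarize(MeanT = mean(Temp, na.rm = TRUE), .groups = "drop")

toy_md
```

Whether to skip missing readings is a judgment call: a day with only a handful of valid readings will still get a mean, which may or may not be what you want.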

We can also add other variables to the daily summary and the plot.

Combined_md<- Combined_df %>% 
  group_by(month, day) %>% 
  summarize(MeanT = mean(Temp), MeanSR = mean(SolarRad),  MeanHum = mean(Hum)) 
## `summarise()` has grouped output by 'month'. You can override using the
## `.groups` argument.
Combined_md %>% 
  filter(month == 11) %>% 
  ggplot() + geom_line(aes(x = day, y = MeanT), color = "red") + 
  geom_line(aes(x = day, y = MeanSR), color = "black")  +
  scale_y_continuous(name = "Temperature (°F)", sec.axis =  sec_axis(~ ., name= "Solar Radiation (W/m2)"))+
  theme_linedraw() + 
  xlab("Day") 

Now we are looking at the mean daily temperature (red) and the mean daily solar radiation (black). We can see periods when they are closely correlated and others when they aren’t. Note that there are a number of ways these things can be plotted. Some prefer to create separate plots and combine them into one. The beauty of coding is that there is seldom only one way to do something, so if you find a better or more intuitive approach, use it!
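As one example of the separate-plots approach, the sketch below reshapes the daily summary into long form and lets `facet_wrap()` draw one panel per variable, each with its own y-axis. The data frame `toy_md` is a made-up stand-in for `Combined_md`, and the names `toy_long` and `panel_plot` are my own, so this is just one possible layout.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Toy stand-in for Combined_md (hypothetical values)
toy_md <- tibble(
  month  = 11,
  day    = 1:5,
  MeanT  = c(48, 52, 45, 39, 41),
  MeanSR = c(110, 95, 60, 130, 120)
)

# Reshape to long form: one row per day and variable
toy_long <- toy_md %>%
  pivot_longer(c(MeanT, MeanSR), names_to = "variable", values_to = "value")

# One panel per variable, stacked vertically, each with its own y scale
panel_plot <- ggplot(toy_long, aes(x = day, y = value)) +
  geom_line() +
  facet_wrap(~ variable, ncol = 1, scales = "free_y") +
  theme_linedraw() +
  xlab("Day") +
  ylab(NULL)

panel_plot
```

Stacking the panels keeps the days lined up, which makes it easier to compare the two series without them sharing a single axis.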

Wind Rose Diagrams

Wind rose diagrams are probably one of the most famous weather charts that exist. There are a few ways to make them in R, but I'm going to focus on the one I know best. This requires that we install and load a new package, openair. It was designed for plotting air quality data, so some of the commands are a bit different.

library(openair)

Before we make the diagram, we have to deal with the rather annoying fact that the WeatherLink CSV files report wind direction as compass points (NW, WNW, SSE) instead of degrees (315, 292.5, 157.5). One way around this is to convert each compass point to the degree value at its center. This loses some precision but still allows the data to be plotted. Another option is to use a different R package, another coding language, or even an online visualizer.

# Filter out NA values, then use case_when() to build a new column (WD) that maps each compass point in the WindDir column to the degree value we give it.
Combined_df_slim<- Combined_df %>% 
  filter(WindDir != "NA") 

Combined_df_slim$WD<- case_when(Combined_df_slim$WindDir == "N" ~ 360, 
                                Combined_df_slim$WindDir == "NNE" ~ 22.5,
                                Combined_df_slim$WindDir == "NE" ~ 45,
                                Combined_df_slim$WindDir == "ENE" ~ 67.5,
                                Combined_df_slim$WindDir == "E" ~ 90,
                                Combined_df_slim$WindDir == "ESE" ~ 112.5,
                                Combined_df_slim$WindDir == "SE" ~ 135, 
                                Combined_df_slim$WindDir == "SSE" ~ 157.5, 
                                Combined_df_slim$WindDir == "S" ~ 180,
                                Combined_df_slim$WindDir == "SSW" ~ 202.5,
                                Combined_df_slim$WindDir == "SW" ~ 225, 
                                Combined_df_slim$WindDir == "WSW" ~ 247.5, 
                                Combined_df_slim$WindDir == "W" ~ 270,
                                Combined_df_slim$WindDir == "WNW" ~ 292.5, 
                                Combined_df_slim$WindDir == "NW" ~ 315, 
                                Combined_df_slim$WindDir == "NNW" ~ 337.5 )
windRose(Combined_df_slim, ws = "WindSpeed", wd = "WD", paddle = FALSE)

The end result shows a preference for northerly winds. There are a few reasons for this. First, we are only using approximations of the wind direction, not the true degree values, although this should not affect the graph too much. Second, we are looking at a fairly short (2 month) data set. Finally, my anemometer is not in a good location: there is a row of fairly large trees about 100 feet south of it. However, when you crack open data from another source, you will hopefully find much better data.
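The compass-to-degree conversion used for the wind rose can also be written as a lookup table, which is shorter if you need it in more than one place. This is only a sketch of an alternative: `compass`, `deg_lookup`, and the sample `wind_dir` vector are names and values I made up for illustration, and north is set to 360 to match the `case_when()` mapping above.

```r
# The 16 compass points in clockwise order, starting at north
compass <- c("N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
             "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW")

# Each point is 22.5 degrees from the next; name the values by compass point
deg_lookup <- setNames(seq(0, 337.5, by = 22.5), compass)
deg_lookup["N"] <- 360  # same convention as the case_when() version

# Hypothetical wind directions, including one missing value
wind_dir <- c("NW", "WNW", "SSE", NA)

# Index the lookup by name; missing directions stay NA
wd <- unname(deg_lookup[wind_dir])
wd
```

Any value that is not one of the 16 compass points comes back as `NA`, so it is worth checking how many `NA` values you end up with before plotting.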

Density Plots

The final type of visualization I want to show is the probability density function, or PDF. You have likely seen these before; the familiar bell curve is one example. They show how the values of a variable are distributed and how likely each range of values is. To try this out, let's look at some data from the roof of the school.

year_df<- read_csv("https://raw.githubusercontent.com/Plummquat/IanPlummer/gh-pages/UC2022.csv", skip = 5, locale=locale(encoding="latin1"))
## Rows: 67879 Columns: 28
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (23): Date & Time, Temp - °F, High Temp - °F, Low Temp - °F, Hum - %, De...
## dbl  (5): Barometer - mb, High Wind Speed - mph, Rain - in, Rain Rate - in/h...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
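# Split "Date & Time" into month, day, year, and time columns
year_df <- year_df %>%
  separate("Date & Time", sep = "/", into = c("month", "day", "year")) %>%
  separate(year, sep = " ", into = c("year", "time"))
year_df$day <- as.integer(year_df$day)
year_df$month <- as.integer(year_df$month)
year_df$year <- as.integer(year_df$year) + 2000  # two-digit year to four-digit year
year_df$time <- gsub("\\:", ".", year_df$time)
year_df$time <- as.integer(year_df$time)  # as.integer() keeps only the part before the dot
year_df <- year_df %>% rename(Temp = "Temp - °F", SolarRad = "Solar Rad - W/m^2", Rain = "Rain - in")
year_df$Temp <- as.integer(year_df$Temp)  # whole degrees only; as.numeric() would keep decimals
year_df$Rain <- as.numeric(year_df$Rain)  # keep fractional inches; as.integer() would truncate them to 0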
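One thing to keep in mind when converting columns like these: `as.integer()` drops everything after the decimal point, while `as.numeric()` keeps it. The short sketch below shows the difference on a few made-up values (`temp_chr` and `rain_dbl` are hypothetical, not taken from the station file).

```r
# Hypothetical raw values, as they might look after reading a CSV
temp_chr <- c("45.3", "31.8")  # text column
rain_dbl <- c(0.01, 0.25)      # numeric column

temp_int <- as.integer(temp_chr)  # fractional part is dropped
temp_num <- as.numeric(temp_chr)  # decimals are kept
rain_int <- as.integer(rain_dbl)  # small rainfall totals collapse to 0

temp_int
temp_num
rain_int
```

For a variable like rainfall, where most readings are a small fraction of an inch, the integer version would lose nearly all of the signal.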

We can make a PDF of essentially any variable, but for simplicity's sake let's start with daily mean temperatures.

DailyMeans<- year_df %>% 
  group_by(month, day) %>% 
  summarize(MeanT = mean(Temp, na.rm = TRUE))
## `summarise()` has grouped output by 'month'. You can override using the
## `.groups` argument.
DailyMeans %>% 
ggplot() + geom_density(aes(x = MeanT), fill = "blue") + theme_classic() + xlab("Mean Daily Temperature ( °F)")

Now we have a PDF of the mean daily temperatures throughout the year at the school. The distribution may be skewed a bit, as there was a period in the summer when the station was not recording. That being said, we know there are strong seasonal cycles in temperature, so it would be useful to view the same distribution month by month.

library(ggridges)
DailyMeans$month_name<-
  case_when(DailyMeans$month == 1 ~"January",
            DailyMeans$month == 2 ~"February",
            DailyMeans$month == 3 ~"March",
            DailyMeans$month == 4 ~"April",
            DailyMeans$month == 5 ~"May",
            DailyMeans$month == 6 ~"June",
            DailyMeans$month == 7 ~"July",
            DailyMeans$month == 8 ~"August",
            DailyMeans$month == 9 ~"September",
            DailyMeans$month == 10 ~"October",
            DailyMeans$month == 11 ~"November",
            DailyMeans$month == 12 ~"December")
DailyMeans$month_name<- factor(DailyMeans$month_name, levels = c("December","November", "October", "September", "August", "July", "June", "May", "April", "March", "February", "January"))
DailyMeans %>% 
  ggplot(aes(x = MeanT, y =month_name)) + geom_density_ridges() + theme_classic() + xlab("Mean Daily Temperature ( °F)") + ylab("")
## Picking joint bandwidth of 3.11

There you go! This is one of my favorite graphs to make; there is just something super satisfying about it. You can try it out with other variables that may or may not have seasonal signals depending on where you live. For example, our data here wouldn’t have a strong seasonal precipitation cycle, but data from a monsoon region certainly would.
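If you reuse the ridge plot for other variables, the month-name step can be shortened. R ships with a built-in vector called `month.name`, so the month numbers can be used to index it directly instead of writing out twelve `case_when()` lines. The sketch below uses a made-up vector of month numbers (`toy_month`) in place of `DailyMeans$month`.

```r
# Toy stand-in for the month column of DailyMeans (hypothetical values)
toy_month <- c(1, 6, 12)

# month.name is built into R: "January", "February", ..., "December".
# Reversing the levels puts December first, so January ends up at the top
# of the ridge plot, as in the version above.
toy_month_name <- factor(month.name[toy_month], levels = rev(month.name))

toy_month_name
levels(toy_month_name)
```

Both versions give the same factor; this one is just less typing.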

Final Remarks

I hope you found this page helpful. Check out my other visualization guide here.