Downloading data from Davis is somewhat straightforward. All you have to do is select the period of time you want data for and then provide your email in the WeatherLink data tab. Depending on who is reading this, the data may already be downloaded for you.
While downloading the data is easy, working with it can be a bit tricky. Here I provide one of many ways in which one can crack open a Davis CSV file, as well as some initial analysis/plotting that you can do.
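#The first step is to load the R packages you need. The tidyverse supplies read_csv, separate, full_join, and ggplot, all of which are used below.
library(tidyverse)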
This step will depend on how you have gotten the data. I download data on a monthly basis and store it on my desktop. For the code below I stored the CSV file online via GitHub. For some of you this may be especially useful, as many of your educators are likely using GitHub or similar. Be sure to read the comments in the code below as they may help you with troubleshooting. Now, let's open a CSV of data from December of 2022.
#I'm using read_csv, which comes from the readr package (part of the tidyverse) and is not the same as read.csv in base R. Note that you have to change the pathname for the file in the quotation marks. If you aren't sure how to get a file pathname for something on your computer, it is different depending on your operating system so Google is your friend.
Dec_df<- read_csv("", skip = 5, locale=locale(encoding="latin1"))
Note that I added two extra arguments to the call above. The “skip” argument is necessary because Davis fills the first 5 lines of its CSV files with a sort of metadata for the station. This is pretty annoying but is easily rectified by telling R to skip the first 5 lines. The other argument is “locale”. This is necessary because read_csv assumes a file is encoded as UTF-8, and the Davis file is not, so we have to tell it that the file uses the Latin-1 encoding. Otherwise characters such as the degree symbol in the column names would be misread.
Now that we have the CSV file opened in R, there is one more issue we need to rectify. The date and time each observation was recorded are stored in a single column. This makes mathematics and graphing more difficult than it needs to be. So before we can get to the fun stuff we should separate the “Date & Time” column into 4 separate columns (month, day, year, and time). We also need to run some extra commands to make sure the columns are in the right format.
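To see what these two arguments do, here is a minimal sketch that writes a tiny made-up file in the same shape (five metadata lines, then a header containing a degree symbol, saved as Latin-1) and reads it back. The file contents and the names demo_lines, demo_path, and demo_df are all hypothetical and are not taken from a real Davis export.

```r
library(readr)

# Hypothetical stand-in for a Davis export: 5 metadata lines, then the real header
demo_lines <- c(
  "Station,Demo station",
  "Latitude,0",
  "Longitude,0",
  "Elevation,0",
  "Units,Imperial",
  "Date & Time,Temp - \u00b0F",
  "12/1/22 12:00 AM,45.3",
  "12/1/22 12:05 AM,45.1"
)

# Save the file as Latin-1, so the degree symbol becomes a single byte that is not valid UTF-8
demo_path <- tempfile(fileext = ".csv")
writeLines(iconv(demo_lines, from = "UTF-8", to = "latin1"), demo_path, useBytes = TRUE)

# skip = 5 jumps over the metadata; the locale tells read_csv how to decode the bytes
demo_df <- read_csv(demo_path, skip = 5,
                    locale = locale(encoding = "latin1"),
                    show_col_types = FALSE)
names(demo_df)
```

If you change skip to a smaller number in this sketch, the metadata lines end up being treated as the header, which is a quick way to convince yourself why the argument matters.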
Dec_df <- Dec_df %>%
  separate("Date & Time", sep = "/", into = c("month", "day", "year")) %>%
  separate(year, sep = " ", into = c("year", "time"))
#The new columns have been saved as characters instead of numbers. We rectify that by using the as.integer command
Dec_df$month <- as.integer(Dec_df$month)
Dec_df$day <- as.integer(Dec_df$day)
Dec_df$year <- as.integer(Dec_df$year) + 2000 #the file stores two-digit years, so add 2000
#For the sake of graphing I swap the colon in the time for a decimal point.
#Note that as.integer then keeps only the hour; use as.numeric instead if you want to keep the minutes.
Dec_df$time <- gsub(":", ".", Dec_df$time)
Dec_df$time <- as.integer(Dec_df$time)
## # A tibble: 8,912 × 36
## month day year time Barometer -…¹ Insid…² Insid…³ Insid…⁴ Insid…⁵ Insid…⁶
## <int> <int> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 12 1 2022 12 1013. 72 36 44 70 7.2
## 2 12 1 2022 12 1013. 72 36 44 70 7.2
## 3 12 1 2022 12 1013. 72 36 43 70 7.2
## 4 12 1 2022 12 1013. 72 36 43 70 7.2
## 5 12 1 2022 12 1013. 72 36 43 70 7.2
## 6 12 1 2022 12 1013 72 36 43 70 7.2
## 7 12 1 2022 12 1013. 72 36 43 70 7.2
## 8 12 1 2022 12 1013. 72 35 43 70 7
## 9 12 1 2022 12 1013. 72 35 43 70 7
## 10 12 1 2022 12 1013. 72 35 43 70 7
## # … with 8,902 more rows, 26 more variables: `Temp - °F` <dbl>,
## # `High Temp - °F` <dbl>, `Low Temp - °F` <dbl>, `Hum - %` <dbl>,
## # `Dew Point - °F` <dbl>, `Wet Bulb - °F` <dbl>, `Wind Speed - mph` <dbl>,
## # `Wind Direction` <chr>, `Wind Run - mi` <dbl>,
## # `High Wind Speed - mph` <dbl>, `High Wind Direction` <chr>,
## # `Wind Chill - °F` <dbl>, `Heat Index - °F` <dbl>, `THW Index - °F` <dbl>,
## # `THSW Index - °F` <dbl>, `Rain - in` <dbl>, `Rain Rate - in/h` <dbl>, …
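Because as.integer drops everything after the decimal point, the time column above holds only the hour. If you would rather keep the minutes as a true fraction of an hour, one option is sketched below. The helper name time_to_decimal is hypothetical, and the sketch assumes times written as 24-hour "HH:MM", which may not match your export.

```r
# Hypothetical helper: convert "HH:MM" strings to decimal hours (assumes 24-hour times)
time_to_decimal <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  hours <- as.numeric(vapply(parts, `[`, character(1), 1))
  minutes <- as.numeric(vapply(parts, `[`, character(1), 2))
  hours + minutes / 60
}

time_to_decimal(c("00:15", "12:30", "23:45"))
```

With this approach half past twelve becomes 12.5 rather than 12.30, so the spacing between points on a time axis stays even.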
Ok, now you may be looking at this and wondering what happens if you want to look at more than just the data from December. This is possible and quite easy to do.
First you need to open up and format another CSV file. For this example I will use data from my station in November.
Nov_df<- read_csv("", skip = 5, locale=locale(encoding="latin1"))
## Rows: 8912 Columns: 33
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Date & Time, Wind Direction, High Wind Direction
## dbl (30): Barometer - mb, Inside Temp - °F, Inside Hum - %, Inside Dew Point...
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
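Nov_df <- Nov_df %>%
  separate("Date & Time", sep = "/", into = c("month", "day", "year")) %>%
  separate(year, sep = " ", into = c("year", "time"))
Nov_df$month <- as.integer(Nov_df$month)
Nov_df$day <- as.integer(Nov_df$day)
#Convert year the same way as for December, otherwise full_join cannot match a character column to a numeric one
Nov_df$year <- as.integer(Nov_df$year) + 2000
Nov_df$time <- gsub(":", ".", Nov_df$time)
Nov_df$time <- as.integer(Nov_df$time)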
## # A tibble: 8,912 × 36
## month day year time Barometer -…¹ Insid…² Insid…³ Insid…⁴ Insid…⁵ Insid…⁶
## <int> <int> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 12 1 2022 12 1013. 72 36 44 70 7.2
## 2 12 1 2022 12 1013. 72 36 44 70 7.2
## 3 12 1 2022 12 1013. 72 36 43 70 7.2
## 4 12 1 2022 12 1013. 72 36 43 70 7.2
## 5 12 1 2022 12 1013. 72 36 43 70 7.2
## 6 12 1 2022 12 1013 72 36 43 70 7.2
## 7 12 1 2022 12 1013. 72 36 43 70 7.2
## 8 12 1 2022 12 1013. 72 35 43 70 7
## 9 12 1 2022 12 1013. 72 35 43 70 7
## 10 12 1 2022 12 1013. 72 35 43 70 7
## # … with 8,902 more rows, 26 more variables: `Temp - °F` <dbl>,
## # `High Temp - °F` <dbl>, `Low Temp - °F` <dbl>, `Hum - %` <dbl>,
## # `Dew Point - °F` <dbl>, `Wet Bulb - °F` <dbl>, `Wind Speed - mph` <dbl>,
## # `Wind Direction` <chr>, `Wind Run - mi` <dbl>,
## # `High Wind Speed - mph` <dbl>, `High Wind Direction` <chr>,
## # `Wind Chill - °F` <dbl>, `Heat Index - °F` <dbl>, `THW Index - °F` <dbl>,
## # `THSW Index - °F` <dbl>, `Rain - in` <dbl>, `Rain Rate - in/h` <dbl>, …
Now all we need to do is run the “full_join” function. When no “by” argument is given, it matches rows on every column name the two datasets have in common and keeps all of the rows from both.
Combined_df<- full_join(Dec_df, Nov_df)
## # A tibble: 9,604 × 36
## month day year time Barometer -…¹ Insid…² Insid…³ Insid…⁴ Insid…⁵ Insid…⁶
## <int> <int> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 12 1 2022 12 1013. 72 36 44 70 7.2
## 2 12 1 2022 12 1013. 72 36 44 70 7.2
## 3 12 1 2022 12 1013. 72 36 43 70 7.2
## 4 12 1 2022 12 1013. 72 36 43 70 7.2
## 5 12 1 2022 12 1013. 72 36 43 70 7.2
## 6 12 1 2022 12 1013 72 36 43 70 7.2
## 7 12 1 2022 12 1013. 72 36 43 70 7.2
## 8 12 1 2022 12 1013. 72 35 43 70 7
## 9 12 1 2022 12 1013. 72 35 43 70 7
## 10 12 1 2022 12 1013. 72 35 43 70 7
## # … with 9,594 more rows, 26 more variables: `Temp - °F` <dbl>,
## # `High Temp - °F` <dbl>, `Low Temp - °F` <dbl>, `Hum - %` <dbl>,
## # `Dew Point - °F` <dbl>, `Wet Bulb - °F` <dbl>, `Wind Speed - mph` <dbl>,
## # `Wind Direction` <chr>, `Wind Run - mi` <dbl>,
## # `High Wind Speed - mph` <dbl>, `High Wind Direction` <chr>,
## # `Wind Chill - °F` <dbl>, `Heat Index - °F` <dbl>, `THW Index - °F` <dbl>,
## # `THSW Index - °F` <dbl>, `Rain - in` <dbl>, `Rain Rate - in/h` <dbl>, …
Perfect! Note that you can do this for any combination of data sets. WeatherLink allows you to download data on different timescales (yearly, 3 months, 6 months), so you can use this same command to string together yearly or multi-month datasets. You can also repeat it as many times as you like and keep adding months as they go by. The only limitation you will run into is the size of the dataset.
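If you end up with many monthly files, joining them one pair at a time by hand gets repetitive. A minimal sketch of one way to combine a whole list in a single step is below. The three toy data frames and their values are made up, and the names oct_demo, nov_demo, dec_demo, monthly, and all_demo are hypothetical.

```r
library(dplyr)

# Toy stand-ins for three monthly data frames (values are made up)
oct_demo <- tibble(month = 10L, day = 1:2, Temp = c(55.1, 53.4))
nov_demo <- tibble(month = 11L, day = 1:2, Temp = c(44.0, 41.7))
dec_demo <- tibble(month = 12L, day = 1:2, Temp = c(30.2, 28.9))

# Join the list one pair at a time, matching on every shared column
monthly <- list(oct_demo, nov_demo, dec_demo)
all_demo <- Reduce(
  function(x, y) full_join(x, y, by = intersect(names(x), names(y))),
  monthly
)

# bind_rows simply stacks the data frames on top of each other
stacked_demo <- bind_rows(monthly)
nrow(all_demo)
```

When the months do not overlap, stacking with bind_rows is usually the simpler choice. full_join is the one to reach for when the files might share some rows that you want matched up.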
The final useful thing to do is rename some of these columns. R tends to have issues with spaces and miscellaneous characters in names. It will reduce issues and make coding quicker if we rename some important columns now. For the sake of this example I will not do all of the columns, but note that for the best results you should. Also note that if you rename columns and later want to add another data set with full_join, you have to either revert these names back to the originals or rename the columns in the new dataset in exactly the same way.
Combined_df<- Combined_df %>% rename(Temp = "Temp - °F", HighTemp = "High Temp - °F", LowTemp = "Low Temp - °F", Hum = "Hum - %", WindSpeed = "Wind Speed - mph", WindDir = "Wind Direction", SolarRad = "Solar Rad - W/m^2")
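If you do want to rename every column, typing each one out gets tedious. One alternative is to clean all of the names with a small function, sketched below on a toy data frame. The helper clean_davis_names is hypothetical, the values are made up, and the names it produces (for example Temp_F) differ from the hand-picked ones above, so pick one approach and stick with it.

```r
library(dplyr)

# Hypothetical helper: collapse every run of non-alphanumeric characters into "_"
clean_davis_names <- function(x) {
  x <- gsub("[^A-Za-z0-9]+", "_", x)
  gsub("^_+|_+$", "", x)  # trim underscores left at either end
}

# Toy data frame with Davis-style headers (values are made up)
demo_df <- tibble(a = 45.3, b = 80, c = 4)
names(demo_df) <- c("Temp - \u00b0F", "Hum - %", "Wind Speed - mph")

# rename_with applies the function to every column name
demo_clean <- demo_df %>% rename_with(clean_davis_names)
names(demo_clean)
```

Because the same function gives the same result every time, running it on each new month before joining keeps the column names consistent across data sets.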
Now that we have the data frame set up, the possibilities are endless. For the sake of a brief introduction I'm just going to show 3 main types of visualizations that we could do.
Let's start off by making a simple graph of the daily mean temperatures for December of 2022. First we have to do some calculations and create a data frame of the mean temperature for each day of each month.
Combined_md<- Combined_df %>%
group_by(month, day) %>%
summarize(MeanT = mean(Temp))
Now we just plot that data (making sure to only select December) and take a look.
Combined_md %>%
filter(month == 12) %>%
ggplot() + geom_line(aes(x = day, y = MeanT)) +
theme_linedraw() +
xlab("Day") +
ylab("Temperature (°F)")
Interesting values for sure. The sudden drop around Christmas may seem like an error but it isn’t. That was the anomalously cold period that most of the continental United States experienced at the end of 2022.
We can also add other variables to this.
Combined_md<- Combined_df %>%
group_by(month, day) %>%
summarize(MeanT = mean(Temp), MeanSR = mean(SolarRad), MeanHum = mean(Hum))
## `summarise()` has grouped output by 'month'. You can override using the
## `.groups` argument.
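The message above is harmless, but it can be silenced with the .groups argument that it mentions. The sketch below shows that on made-up readings and also adds na.rm = TRUE, so that one missing value does not turn a whole day's mean into NA. The names demo_df and demo_md are hypothetical.

```r
library(dplyr)

# Toy readings (made up), including one missing temperature
demo_df <- tibble(
  month = c(12, 12, 12, 12),
  day   = c(1, 1, 2, 2),
  Temp  = c(30, 32, NA, 28)
)

# .groups = "drop" returns an ungrouped result and suppresses the message
demo_md <- demo_df %>%
  group_by(month, day) %>%
  summarize(MeanT = mean(Temp, na.rm = TRUE), .groups = "drop")
demo_md
```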
Combined_md %>%
filter(month == 11) %>%
ggplot() + geom_line(aes(x = day, y = MeanT), color = "red") +
geom_line(aes(x = day, y = MeanSR), color = "black") +
scale_y_continuous(name = "Temperature (°F)", sec.axis = sec_axis(~ ., name= "Solar Radiation (W/m2)"))+
theme_linedraw()
Now we are looking at the mean daily temperature (red) and the mean daily solar radiation (black). We can see periods when they are closely correlated and others when they aren’t. Note that there are a number of ways to plot these things. Some prefer to create separate plots and combine them into one. The beauty of coding is that there is seldom only one way to do something, so if you find a better or more intuitive approach, use it!
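One detail worth knowing: sec_axis(~ .) only relabels the right-hand axis, so in the plot above both lines are drawn on the same numeric scale. When two variables have very different ranges, a common pattern is to divide one of them by a scale factor before plotting and multiply the secondary axis by that same factor. Here is a sketch on made-up data; demo_md and scale_factor are hypothetical names, and this particular choice of scale factor is just one reasonable option.

```r
library(ggplot2)

# Made-up daily means, for illustration only
demo_md <- data.frame(
  day    = 1:5,
  MeanT  = c(40, 42, 38, 35, 41),
  MeanSR = c(80, 120, 60, 40, 100)
)

# Factor that maps the solar radiation range onto the temperature range
scale_factor <- max(demo_md$MeanSR) / max(demo_md$MeanT)

p <- ggplot(demo_md) +
  geom_line(aes(x = day, y = MeanT), color = "red") +
  # Shrink solar radiation so it shares the left axis with temperature
  geom_line(aes(x = day, y = MeanSR / scale_factor), color = "black") +
  scale_y_continuous(
    name = "Temperature (\u00b0F)",
    # Undo the shrinking on the right axis so its labels read in W/m2
    sec.axis = sec_axis(~ . * scale_factor, name = "Solar Radiation (W/m2)")
  ) +
  theme_linedraw() +
  xlab("Day")
p
```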
Wind rose diagrams are probably among the most famous weather charts that exist. There are a few ways to make them in R but I'm going to focus on the one I know best. This requires that we install and load a new package; the windRose function used below comes from openair. This package was designed for plotting air quality data so some of the commands are a bit different.
Before we make the diagram, we have to deal with the rather annoying fact that the WeatherLink CSV files report wind direction as compass points (NW, WNW, SSE) instead of degrees (315, 292.5, 157.5). One of the ways around this is to convert each compass point to the degree value at its center. This loses some accuracy but still allows for the plotting of the data. Another option is to use a different R package, another coding language, or even an online visualizer.
#All we are going to do for this is filter out NA values and then run a case_when function. This will look at the Wind Direction column and make a new column with associated degree values we tell it.
Combined_df_slim <- Combined_df %>%
  filter(!is.na(WindDir)) #drop rows where no wind direction was recorded
Combined_df_slim$WD<- case_when(Combined_df_slim$WindDir == "N" ~ 360,
Combined_df_slim$WindDir == "NNE" ~ 22.5,
Combined_df_slim$WindDir == "NE" ~ 45,
Combined_df_slim$WindDir == "ENE" ~ 67.5,
Combined_df_slim$WindDir == "E" ~ 90,
Combined_df_slim$WindDir == "ESE" ~ 112.5,
Combined_df_slim$WindDir == "SE" ~ 135,
Combined_df_slim$WindDir == "SSE" ~ 157.5,
Combined_df_slim$WindDir == "S" ~ 180,
Combined_df_slim$WindDir == "SSW" ~ 202.5,
Combined_df_slim$WindDir == "SW" ~ 225,
Combined_df_slim$WindDir == "WSW" ~ 247.5,
Combined_df_slim$WindDir == "W" ~ 270,
Combined_df_slim$WindDir == "WNW" ~ 292.5,
Combined_df_slim$WindDir == "NW" ~ 315,
Combined_df_slim$WindDir == "NNW" ~ 337.5 )
windRose(Combined_df_slim, ws = "WindSpeed", wd = "WD", paddle = F)
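The long case_when above works fine, but the same conversion can be written more compactly with a named lookup vector, since the 16 compass points are evenly spaced 22.5 degrees apart. This is a sketch of that alternative; the names compass, degrees, and wind_dir are hypothetical.

```r
# The 16 compass points in clockwise order, starting from north
compass <- c("N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
             "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW")

# Each point is 22.5 degrees from the next; N is set to 360 to match the values above
degrees <- setNames(seq(0, 337.5, by = 22.5), compass)
degrees["N"] <- 360

# Hypothetical wind directions; indexing by name does the conversion
wind_dir <- c("NW", "WNW", "SSE", "N")
unname(degrees[wind_dir])
```

On the real data the last line would become something like Combined_df_slim$WD <- unname(degrees[Combined_df_slim$WindDir]).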
The end result shows a preference for northerly winds. There are a few reasons to treat this with caution. First, we are only using approximations of the wind direction, not the true degree values, although this should not affect the graph too much. Second, we are looking at a fairly short (2 month) data set. Finally, my anemometer is not in a good location: there is a row of fairly large trees about 100 feet south of it. However, when you crack open data from another source you will hopefully find much better data.
The final type of visualization I want to show is the probability density function, or PDF. You have likely seen these before; the best-known example is the bell curve of the normal distribution. A PDF shows how the values of a variable are distributed and how likely each range of values is. To try this out, let's look at some data from the roof of the school.
year_df<- read_csv("", skip = 5, locale=locale(encoding="latin1"))
## Rows: 67879 Columns: 28
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (23): Date & Time, Temp - °F, High Temp - °F, Low Temp - °F, Hum - %, De...
## dbl (5): Barometer - mb, High Wind Speed - mph, Rain - in, Rain Rate - in/h...
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
year_df<- year_df %>%
separate("Date & Time", sep="/", into = c("month", "day", "year")) %>%
separate(year, sep=" ", into = c("year","time"))
year_df$day<- as.integer(year_df$day)
year_df$month<- as.integer(year_df$month)
year_df$time<-gsub("\\:", ".", year_df$time)
year_df$time<- as.integer(year_df$time)
year_df<- year_df %>% rename(Temp = "Temp - °F",SolarRad = "Solar Rad - W/m^2", Rain = "Rain - in")
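#Temp was read in as text, so convert it to a number. as.numeric keeps the decimals (as.integer would truncate them), and any non-numeric entries become NA with a warning.
year_df$Temp <- as.numeric(year_df$Temp)
year_df$Rain <- as.numeric(year_df$Rain)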
We can make a PDF of essentially any variable, but for simplicity's sake let's start with daily mean temperatures.
DailyMeans<- year_df %>%
group_by(month, day) %>%
summarize(MeanT = mean(Temp, na.rm = T))
## `summarise()` has grouped output by 'month'. You can override using the
## `.groups` argument.
DailyMeans %>%
ggplot() + geom_density(aes(x = MeanT), fill = "blue") + theme_classic() + xlab("Mean Daily Temperature ( °F)")
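What makes a curve like this a density is that the area under it adds up to 1. As a rough check, the sketch below runs base R's density function (the same kind of kernel density estimate that geom_density draws) on made-up temperatures and sums up the area. The simulated values and the names demo_temps, dens, and area are hypothetical.

```r
set.seed(1)

# Made-up "daily mean temperatures", for illustration only
demo_temps <- rnorm(365, mean = 55, sd = 15)

# Kernel density estimate evaluated on an evenly spaced grid of x values
dens <- density(demo_temps)

# Approximate the area under the curve: sum of heights times the grid spacing
area <- sum(dens$y) * diff(dens$x[1:2])
area
```

The result should come out very close to 1, which is why the y axis of a density plot shows small decimal values rather than counts.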
Now we have a PDF of the mean daily temperatures throughout the year at the school. The distribution may be skewed a bit as we had a period in the summer when the station was not recording. That being said, we know there are strong seasonal cycles in temperature, so it would be nice to view this month by month. One way is a ridgeline plot, which stacks one density curve per month; the geom_density_ridges function used below comes from the ggridges package.
DailyMeans$month_name<- case_when(DailyMeans$month == 1 ~"January",
DailyMeans$month == 2 ~"February",
DailyMeans$month == 3 ~"March",
DailyMeans$month == 4 ~"April",
DailyMeans$month == 5 ~"May",
DailyMeans$month == 6 ~"June",
DailyMeans$month == 7 ~"July",
DailyMeans$month == 8 ~"August",
DailyMeans$month == 9 ~"September",
DailyMeans$month == 10 ~"October",
DailyMeans$month == 11 ~"November",
DailyMeans$month == 12 ~"December")
DailyMeans$month_name<- factor(DailyMeans$month_name, levels = c("December","November", "October", "September", "August", "July", "June", "May", "April", "March", "February", "January"))
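As a shorter alternative to spelling out all twelve months, base R ships with a built-in vector called month.name that can be indexed by month number. A small sketch, where demo_month stands in for DailyMeans$month and all of the demo_ names are hypothetical:

```r
# Hypothetical month numbers, standing in for DailyMeans$month
demo_month <- c(1, 6, 12)

# month.name is a built-in vector of the twelve English month names
demo_name <- month.name[demo_month]

# Reversed levels put December first, which places it at the bottom of the y axis
demo_factor <- factor(demo_name, levels = rev(month.name))
levels(demo_factor)[1:2]
```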
DailyMeans %>%
ggplot(aes(x = MeanT, y =month_name)) + geom_density_ridges() + theme_classic() + xlab("Mean Daily Temperature ( °F)") + ylab("")
## Picking joint bandwidth of 3.11
There you go! This is one of my favorite graphs to make; there is just something super satisfying about it. You can try it out with other variables that may or may not have seasonal signals depending on where you live. For example, our data here wouldn’t have a strong seasonal precipitation cycle, but data from somewhere in a monsoon region certainly would.
I hope you found this page helpful. Check out my other visualization guide here.