# 6 Time series

In this chapter, we’ll use the tidyverse package and the following data.

``````# Libraries
library(tidyverse)

# Data
# Gapminder data with life expectancy, per capita GDP, and population for
# 183 countries; every five years from 1950-2015
# Detailed information on Gapminder countries
# Gapminder life expectancy data for all years, 1800-2100
life_expectancy <-
left_join(
countries %>% select(iso_a3, region = region_gm4),
by = "iso_a3"
) %>%
filter(year <= 2015) %>%
mutate(region = str_to_title(region))
# Major famines by year and country``````

In the last chapter, we visualized the relationship between per capita GDP and life expectancy. You might have wondered how time fits into that association. In this chapter, we’ll explore life expectancy and GDP over time.

The following ggplot2 cheat sheet sections will be helpful for this chapter:

• Geoms
• `geom_path()`
• Scales

The lubridate package is a helpful tool for working with dates. We’ll use some lubridate functions throughout the chapter. Take a look at the lubridate cheat sheet if you’re not already familiar with the package.

Not all time series are alike. In some situations, you’ll be interested in a long-term trend, but in others you’ll want to highlight short-term changes or even just individual values. In this chapter, we’ll cover various strategies for dealing with these different scenarios.

First, we’ll talk about the mechanics of date scales, which are useful for time series.

## 6.1 Mechanics

### 6.1.1 Date/time scales

Sometimes, your time series data will include detailed date or time information stored as a date, time, or date-time. For example, the `nycflights13::flights` variable `time_hour` is a date-time.

``````nycflights13::flights %>%
  select(time_hour)
#> # A tibble: 336,776 x 1
#>   time_hour
#>   <dttm>
#> 1 2013-01-01 05:00:00
#> 2 2013-01-01 05:00:00
#> 3 2013-01-01 05:00:00
#> 4 2013-01-01 05:00:00
#> 5 2013-01-01 06:00:00
#> 6 2013-01-01 05:00:00
#> # … with 3.368e+05 more rows``````

When you map `time_hour` to an aesthetic, ggplot2 uses `scale_*_datetime()`, the scale function for date-times. There is also `scale_*_date()` for dates and `scale_*_time()` for times. The date- and time-specific scale functions are useful because they create meaningful breaks and labels.
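
As a quick check of which scale ggplot2 picks, the sketch below builds two tiny plots and inspects their x scales with `layer_scales()`. The data frame `toy` and its columns are made up for illustration; the only assumption is that ggplot2 is installed.

```r
library(ggplot2)

# Made-up data: one Date column and one POSIXct (date-time) column
toy <- data.frame(
  day = as.Date("2013-01-01") + 0:2,
  moment = as.POSIXct("2013-01-01 05:00:00", tz = "UTC") + 3600 * (0:2),
  n = c(6, 52, 49)
)

p_date <- ggplot(toy, aes(day, n)) + geom_point()
p_datetime <- ggplot(toy, aes(moment, n)) + geom_point()

# layer_scales() builds the plot and returns the scales it ended up using
class(layer_scales(p_date)$x)[1]
class(layer_scales(p_datetime)$x)[1]
```

The first class name should refer to a date scale and the second to a date-time scale, even though neither plot calls a `scale_x_*()` function explicitly.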

`flights_0101_0102` contains data on the number of flights per hour on January 1st and January 2nd, 2013.

``````flights_0101_0102 <-
  nycflights13::flights %>%
  filter(month == 1, day <= 2) %>%
  group_by(time_hour = lubridate::floor_date(time_hour, "hour")) %>%
  summarize(num_flights = n())

flights_0101_0102
#> # A tibble: 38 x 2
#>   time_hour           num_flights
#>   <dttm>                    <int>
#> 1 2013-01-01 05:00:00           6
#> 2 2013-01-01 06:00:00          52
#> 3 2013-01-01 07:00:00          49
#> 4 2013-01-01 08:00:00          58
#> 5 2013-01-01 09:00:00          56
#> 6 2013-01-01 10:00:00          39
#> # … with 32 more rows``````
``````flights_0101_0102 %>%
  ggplot(aes(time_hour, num_flights)) +
  geom_col()``````

Just like with the other scale functions, you can change the breaks using the `breaks` argument. `scale_*_date()` and `scale_*_datetime()` also include a `date_breaks` argument that allows you to supply the breaks in date-time units, like “1 month”, “6 years”, or “2 hours.”
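
To compare the two arguments side by side, here is a minimal sketch on synthetic data. The data frame `toy_hours` and its counts are invented for illustration and are not real flight data. The first plot spells out break positions with `breaks`; the second describes only the spacing with `date_breaks`.

```r
library(ggplot2)

# Synthetic hourly counts for two days (not real flight data)
start <- as.POSIXct("2013-01-01 00:00:00", tz = "UTC")
toy_hours <- data.frame(
  time_hour = seq(start, by = "hour", length.out = 48),
  num_flights = rep(1:24, times = 2)
)

p <- ggplot(toy_hours, aes(time_hour, num_flights)) + geom_col()

# Option 1: spell out the break positions as a vector of date-times
p_breaks <- p +
  scale_x_datetime(breaks = seq(start, by = "12 hours", length.out = 5))

# Option 2: describe the spacing and let the scale compute the positions
p_date_breaks <- p + scale_x_datetime(date_breaks = "12 hours")
```

`breaks` gives you full control over where each break falls, while `date_breaks` is usually shorter to write when all you care about is regular spacing.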

``````flights_0101_0102 %>%
  ggplot(aes(time_hour, num_flights)) +
  geom_col() +
  scale_x_datetime(date_breaks = "6 hours") +
  theme(axis.text.x = element_text(angle = -45, hjust = 0))``````

Similarly, you can change the labels using the `labels` argument, but `scale_*_date()` and `scale_*_datetime()` also include a `date_labels` argument made for working with dates. `date_labels` takes the same formatting strings as `strftime()` and `format()`, such as `"%a"` for the abbreviated weekday name. You can see a list of all formatting strings at `?strptime`.
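
One way to preview a `date_labels` string before plotting is to pass it to base R’s `format()`, which understands the formatting codes listed at `?strptime`. The sketch below uses a single arbitrary date-time, `x`, chosen only for illustration.

```r
# One arbitrary date-time within the two days covered by flights_0101_0102
x <- as.POSIXct("2013-01-01 17:00:00", tz = "UTC")

format(x, "%Y-%m-%d %H:%M")  # numeric date with a 24-hour clock
format(x, "%a %I %p")        # abbreviated weekday, 12-hour clock, AM/PM
format(x, "%H")              # hour only
```

The weekday and AM/PM text depend on your locale, so the exact labels may differ from machine to machine.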

We’ll use `date_labels` to format `time_hour` so that it doesn’t take up as much space.

``````flights_0101_0102 %>%
  ggplot(aes(time_hour, num_flights)) +
  geom_col() +
  scale_x_datetime(date_breaks = "6 hours", date_labels = "%a %I %p")``````

## 6.3 Short-term fluctuations

In the mechanics section of this chapter, you saw the following plot.

``````flights_0101_0102 %>%
  ggplot(aes(time_hour, num_flights)) +
  geom_col() +
  scale_x_datetime(date_breaks = "6 hours", date_labels = "%a %I %p")``````

You might wonder why we used `geom_col()` to represent a time series. Here’s the same plot using `geom_line()` and `geom_point()`.

``````flights_0101_0102 %>%
  ggplot(aes(time_hour, num_flights)) +
  geom_line() +
  geom_point() +
  scale_x_datetime(date_breaks = "6 hours", date_labels = "%a %I %p")``````

From both plots, you can see that most flights occur in the early morning and around 4pm. Notice, though, that we’re actually treating time like a discrete variable in this situation: we counted the number of flights for each hour, so it’s useful to be able to connect each count with a specific hour. Columns make that connection easier to see than a line does.
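
To see why hourly counts behave like a discrete variable, consider how `lubridate::floor_date()` bins timestamps. The sketch below uses three invented departure times (the vector `times` is hypothetical) and rounds each one down to the start of its hour.

```r
library(lubridate)

# Three made-up departure times
times <- ymd_hms(
  c("2013-01-01 05:15:00", "2013-01-01 05:45:00", "2013-01-01 06:10:00"),
  tz = "UTC"
)

# Round each time down to the start of its hour
hour_bin <- floor_date(times, "hour")

# Count how many times fall into each hourly bin
table(format(hour_bin, "%H:%M"))
```

After binning, every observation in the same hour shares one x value, so each count belongs to a specific hour rather than to a point on a continuous curve.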

Vertical segment plots using `geom_segment()` can also be helpful for some time series data. Say we want to understand what the first week in January looked like.

``````flights_week_1 <-
  nycflights13::flights %>%
  filter(lubridate::week(time_hour) == 1) %>%
  group_by(time_hour = lubridate::floor_date(time_hour, "hour")) %>%
  summarize(num_flights = n())``````
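
The filter above relies on how `lubridate::week()` numbers weeks: it counts complete seven-day periods since January 1st and adds one, so week 1 is always January 1st through 7th regardless of the weekday. A small sketch with three illustrative dates:

```r
library(lubridate)

# Dates on either side of the boundary between week 1 and week 2
dates <- ymd(c("2013-01-01", "2013-01-07", "2013-01-08"))

week(dates)
```

If you want weeks that follow the calendar instead, lubridate also provides `isoweek()`, which uses ISO 8601 week numbering.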

`geom_point()` and `geom_line()` produce the following plot.

``````flights_week_1 %>%
  ggplot(aes(time_hour, num_flights)) +
  geom_line() +
  geom_point() +
  scale_x_datetime(date_breaks = "1 day", date_labels = "%a")``````

You can see that each day is shaped similarly. However, you can’t tell that there are actually no flights for a couple of hours each night.
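
The gaps are hard to see because hours with no flights simply have no row in the data, and `geom_line()` connects whatever rows exist. One way to find such gaps is to compare the observed hours against a complete hourly sequence. The sketch below does this for three invented timestamps (the vector `observed` is hypothetical).

```r
# Made-up hours that have at least one flight
observed <- as.POSIXct(
  c("2013-01-01 22:00:00", "2013-01-01 23:00:00", "2013-01-02 05:00:00"),
  tz = "UTC"
)

# Every hour between the first and last observation
all_hours <- seq(min(observed), max(observed), by = "hour")

# Hours that are missing from the observed data
missing_hours <- all_hours[!all_hours %in% observed]
format(missing_hours, "%H:%M")
```

A line drawn through `observed` would run straight from the 23:00 point to the 05:00 point, hiding the missing hours in between.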

``````flights_week_1 %>%
  ggplot() +
  geom_segment(
    aes(x = time_hour, xend = time_hour, y = num_flights, yend = 0)
  ) +
  scale_x_datetime(date_breaks = "1 day", date_labels = "%a")``````

`geom_segment()` does a better job of showing the gaps between days. Segments also make it easier to perceive each day as a group to compare against the others. Another advantage of `geom_segment()` is that we can use `color` to encode a categorical variable.
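
As a small illustration of deriving a categorical variable from a date-time, `lubridate::am()` returns `TRUE` for times before noon. The two timestamps below are made up for the example.

```r
library(lubridate)

# One made-up morning time and one made-up afternoon time
x <- ymd_hms(c("2013-01-01 05:00:00", "2013-01-01 17:00:00"), tz = "UTC")

# Label each time as morning or afternoon
am_pm <- ifelse(am(x), "AM", "PM")
am_pm
```

Mapping a column like `am_pm` to `color` gives every segment one of two colors.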

``````flights_week_1 %>%
  mutate(am_pm = if_else(lubridate::am(time_hour), "AM", "PM")) %>%
  ggplot() +
  geom_segment(
    aes(
      x = time_hour,
      xend = time_hour,
      y = num_flights,
      yend = 0,
      color = am_pm
    )
  ) +
  scale_x_datetime(date_breaks = "1 day", date_labels = "%a")``````

In this case, there’s no long-term trend we’re interested in. Instead, we want to understand short-term fluctuations, and we care about individual values. In these situations, `geom_col()` and `geom_segment()` are good options.

## 6.4 Individual values

Sometimes, you’ll want to display time on the x-axis like a time series, but you won’t actually care about displaying any kind of trend.

`famines` contains data on major famines across time.

``````famines
#> # A tibble: 77 x 6
#>   name       iso_a3 region start   end deaths_estimate
#>   <chr>      <chr>  <chr>  <dbl> <dbl>           <dbl>
#> 1 Ireland    irl    Europe  1846  1852         1000000
#> 2 India      ind    Asia    1860  1861         2000000
#> 3 Cape Verde cpv    Africa  1863  1867           30000
#> 4 India      ind    Asia    1866  1867          961043
#> 5 Finland    fin    Europe  1868  1868          100000
#> 6 India      ind    Asia    1868  1870         1500000
#> # … with 71 more rows``````

There’s no obvious relationship between time and deaths due to famines.

``````famines %>%
  ggplot(aes(start, deaths_estimate)) +
  geom_point() +
  scale_y_log10()``````

Even though there’s no trend, this data is still interesting if you’re curious about individual famines.

The plot above uses only the `start` year, but we also know when each famine ended. We can let the x-axis represent years generally and draw each famine as a horizontal segment from `start` to `end`, so that the length of the segment encodes the length of the famine.

``````famines %>%
  arrange(desc(deaths_estimate)) %>%
  ggplot(aes(start, deaths_estimate)) +
  geom_segment(
    aes(xend = end, yend = deaths_estimate, color = region),
    size = 2,
    lineend = "round"
  ) +
  ggrepel::geom_text_repel(aes(label = name), size = 2.3) +
  scale_y_log10() +
  labs(x = "year")``````
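
To make the segment encoding concrete, the sketch below computes the horizontal length of three famines, using rows copied from the `famines` printout earlier in this section. The data frame `famines_sample` and the `duration` column are names introduced here for illustration.

```r
# Three rows copied from the famines table shown earlier in this section
famines_sample <- data.frame(
  name = c("Ireland", "India", "Finland"),
  start = c(1846, 1860, 1868),
  end = c(1852, 1861, 1868)
)

# Horizontal length of each segment, in years
famines_sample$duration <- famines_sample$end - famines_sample$start
famines_sample
```

A famine that starts and ends in the same year, like the one in Finland, has a segment of length zero. That may be one reason the plot uses thick segments with `lineend = "round"`, since rounded ends can keep such famines visible as dots.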