R TSplots

From ECLR

Introduction

Let's make this clear at the outset. If you just want a quick time-series plot and you have your data in Excel or any other spreadsheet software, then you should stick to Excel. But if you want to manipulate your graph significantly and want to produce a beautiful graph, then you are in the right place.

There are a variety of ways to deal with time-series datasets in R. As discussed in another section of the ECLR webpage, we could use the standard ts format. However, there exists another powerful time-series format, the xts format (you will need the xts package). Here I shall introduce this format and demonstrate how we can use it to create highly customisable time-series plots. To do the latter we shall use the ggplot2 package, which is the equivalent of a high-powered Porsche. Here we will only scratch the surface of how to use it.

First, let us load the necessary packages:

setwd("C:/Users/Ralf/Dropbox/ECLR/R/TimeSeriesPlots") 
library(ggplot2)
library(xts)
library(reshape2) # this is required for the melt function below

Loading the data

We shall use an example in which we have annual inflation rates for 13 different countries (1970 to 2013). In fact we have four different types of inflation rates: aggregate, core, energy and food inflation. Each of these is saved in a separate csv file. Here we upload the files (you can download the files from here [1]):

 agg 
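The import line above is cut off in this copy of the page, so here is a minimal sketch of what a csv import of this kind could look like. The file name agg_demo.csv and the numbers in it are made up for illustration (they are not the real ECLR files); the sketch writes a tiny csv to a temporary folder first so that it runs on its own. It also shows one way a column called X can arise: read.csv gives that name to a column whose header is empty.

```r
# Hypothetical file name and toy values, NOT the real ECLR data files
csv_path <- file.path(tempdir(), "agg_demo.csv")
writeLines(c(",AUS,CAN",
             "1970,4.37,3.35",
             "1971,4.70,2.70",
             "1972,6.36,4.99"), csv_path)

# read.csv() turns the empty header of the first column into the name "X"
agg_demo <- read.csv(csv_path)

head(agg_demo)  # first rows of the data frame
str(agg_demo)   # variable types: X is read as int, the rest as num
```

With your own files you would replace csv_path with the path to the downloaded csv file.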

Let's have a look at what these data look like:

head(agg)
##      X      AUS       CAN       DEN       FIN       FRA      GER       ITA
## 1 1970 4.372826  3.346040  6.514837  2.740442  5.299646 3.450237  4.968320
## 2 1971 4.704262  2.704918  5.869877  6.475761  5.397514 5.240975  4.791667
## 3 1972 6.355141  4.988029  6.562348  6.662487  6.063004 5.484938  5.749503
## 4 1973 7.531081  7.487647  9.303389 10.754480  7.380597 7.032024 10.798620
## 5 1974 9.521791 10.997170 15.275200 16.936390 13.649320 6.986428 19.159770
## 6 1975 8.445255 10.672190  9.605614 17.811390 11.685930 5.910336 16.950500
##         JAP       NET      SWE      SWI        UK        US
## 1  6.974841  3.668919 7.016366 3.615995  6.366568  5.838255
## 2  6.350421  7.477687 7.395542 6.573221  9.444840  4.292767
## 3  4.844125  7.802798 6.007390 6.660002  7.071091  3.272278
## 4 11.619400  8.022207 6.717996 8.754675  9.196044  6.177760
## 5 23.176230  9.591417 9.911724 9.767415 16.043990 11.054800
## 6 11.778410 10.217480 9.779878 6.696594 24.207290  9.143147

Transforming into xts format

As you can see there is date information in the first column, labelled X. We will use this date information to transform the data into data that R recognises as time-series data. So far the inflation series are merely of the num type and the variable X is of the int type. You can check that with the str(agg) command.

To convert the dataframes into ones that are recognised to contain time series data we use the xts(x = , order.by =) function. In x = we specify which variables should be transformed into time-series and order.by = specifies the date info.

As we do so we will encounter a couple of mistakes which helped me understand some of the workings of R. But you could also just skip a little ahead. Let's try it with x = agg[,-1] (i.e. all columns in agg but the first, which contains the date) and order.by = agg[,1]:

agg <- xts(x = agg[,-1], order.by = agg[,1])

As you can see from the error message R didn't recognise agg[,1] as a date object. We need to transform the info in variable X (agg[,1]) into a date format. For this we can use the as.Date() function. Let's try it:

as.Date(agg[,1])
##  [1] "1975-05-25" "1975-05-26" "1975-05-27" "1975-05-28" "1975-05-29"
##  [6] "1975-05-30" "1975-05-31" "1975-06-01" "1975-06-02" "1975-06-03"
## [11] "1975-06-04" "1975-06-05" "1975-06-06" "1975-06-07" "1975-06-08"
## [16] "1975-06-09" "1975-06-10" "1975-06-11" "1975-06-12" "1975-06-13"
## [21] "1975-06-14" "1975-06-15" "1975-06-16" "1975-06-17" "1975-06-18"
## [26] "1975-06-19" "1975-06-20" "1975-06-21" "1975-06-22" "1975-06-23"
## [31] "1975-06-24" "1975-06-25" "1975-06-26" "1975-06-27" "1975-06-28"
## [36] "1975-06-29" "1975-06-30" "1975-07-01" "1975-07-02" "1975-07-03"
## [41] "1975-07-04" "1975-07-05" "1975-07-06" "1975-07-07"

But you can see that the translation didn't quite work. It translated into dates, but into daily data, all in 1975. This is because R stores every date as a number, counting days from 1 Jan 1970 (day 0). The year number 1970 is therefore read as day 1970, which happens to be the 25th of May 1975. Check this out if you want to understand why, but you could also just accept it.
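If you want to convince yourself of the day-counting logic, here is a small sketch in base R (no packages needed). I pass the origin explicitly, because older versions of R insist on it when converting a number into a date.

```r
# Dates are stored as day counts starting from 1 Jan 1970 (day 0)
day_0    <- as.Date(0,    origin = "1970-01-01")
day_1970 <- as.Date(1970, origin = "1970-01-01")

day_0
day_1970

# And the other way round: the number hidden behind a date
as.numeric(as.Date("1975-05-25"))
```

So a column of years such as 1970, 1971, ... is simply interpreted as 44 consecutive days in the middle of 1975.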

So we need a slightly different approach. For my own use I decided that it would be easiest to just take a particular day in every year and use that as the date (there may be better ways - please let me know!). So, let's use the 1st of Jan every year (you could use another date as well). For this purpose we need to add "-01-01" to every year. We do this with the paste0 function. For this reason we use order.by = as.Date(paste0(agg[,1],"-01-01")) in the xts function:

# xts needs a proper date as index, so we associate the 1 Jan with each year
agg <- xts(x = agg[,-1], order.by = as.Date(paste0(agg[,1],"-01-01")))

You could check with str(agg) that the transformation into date format worked. Lastly, before we continue, it is important to understand how to access the dates of this object. We really only converted the inflation columns into time series (i.e. agg[,-1]). The dates, which we created using the order.by option, are now in the index of the object and you can get them (and use them, see below) using index(agg).
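Here is a minimal sketch of the whole conversion on a tiny made-up dataset, so that you can try it without the inflation files. The data frame raw and its numbers are invented for illustration; only xts(), index() and coredata() are the actual xts functions.

```r
library(xts)

# Toy stand-in for the imported data (values invented for illustration)
raw <- data.frame(X   = 1970:1972,
                  AUS = c(4.4, 4.7, 6.4),
                  CAN = c(3.3, 2.7, 5.0))

# Attach 1 Jan to every year and convert to Date
dates <- as.Date(paste0(raw$X, "-01-01"))

# All columns but the first become the series; the dates become the index
demo_xts <- xts(x = raw[, -1], order.by = dates)

index(demo_xts)     # the dates
coredata(demo_xts)  # the values without the dates
```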

Creating Plots

We will build up a fairly simple plot for starters. Then we will show how you can change all sorts of aspects of that one graph, and then we will show how to use the gridExtra package to create multiple graphs in one window.

The setup we are concerned about is that we have multiple time series and we want to show these on one figure.

The ggplot setup for multiple time series

ggplot is the go-to graphics package for any intermediate R user. Let's assume you have your data as an xts formatted dataframe with several variables (saved in columns). In order to use ggplot to create a time-series graph we first need to transform the data into the following structure:

##    dates values countries
## 1 Date 1    2.4 Mountania
## 2 Date 2    3.2 Mountania
## 3 Date 3    2.9 Mountania
## 4 Date 1    4.5  Lakeland
## 5 Date 2    4.4  Lakeland
## 6 Date 3    5.1  Lakeland

We basically need to stack the dates and the variable values in two columns (here "dates" and "values") and then need another column/variable in which we find the country information (or whatever our unit of measurement is: individuals, companies, regions, etc.).

So how do we get there from our dataframe, say agg? As this is a standard problem, people have written very useful functions to achieve this. Here we create a temporary dataframe, I call it temp so I know that I can delete or overwrite it as soon as it has fulfilled its purpose. Here is what we do:

temp <- data.frame(index(agg), stack(as.data.frame(coredata(agg))))

This line creates the new dataframe. We use the date information (index(agg)) from our original object as the first variable, and then we take the rest of the data from agg (coredata(agg)), create a dataframe from it and then stack it. Look at the resulting dataframe: it has 572 rows, which is exactly right as it is 13 series with 44 observations each. Also confirm for yourself that it has exactly the structure we want it to have.

Now we just change the names of the variables in temp, as the standard names are somewhat awkward.

names(temp)[1] <- "Year"
names(temp)[2] <- "Infl"
names(temp)[3] <- "Country"
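To see the stacking and renaming at work on something small, here is a sketch using the two invented countries from the illustration above. The object names demo_xts and long, as well as the dates and values, are made up; stack() is the base R function that does the actual stacking.

```r
library(xts)

# Two toy series with three annual observations each (invented values)
demo_xts <- xts(cbind(Mountania = c(2.4, 3.2, 2.9),
                      Lakeland  = c(4.5, 4.4, 5.1)),
                order.by = as.Date(c("2011-01-01", "2012-01-01", "2013-01-01")))

# The dates are recycled for every series; stack() puts all values in one
# column and the former column names in a second column
long <- data.frame(index(demo_xts), stack(as.data.frame(coredata(demo_xts))))
names(long) <- c("dates", "values", "countries")

long        # 2 series x 3 observations
```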

Now we are set to create our first plot.

A simple first plot

Let's create a simple plot. Let me introduce you to the wonderful world of ggplot. Let me show you the line and the result, and then I will explain how it works.

ggplot(temp,aes(x =Year, y=Infl, color=Country)) + geom_line()
Initial ggplot figure

We start with ggplot(NAMEofDF, aes(HERE we define what goes on the axis)). The aes option stands for aesthetics; I am not sure that this is the best choice of names, but hey, let's not worry about this. So here we have specified that the dates should go on the x axis (Year), inflation should go on the y axis (Infl) and that the values for different countries should be shown in different colours.

But here is an important lesson for using ggplot. If you just execute ggplot(temp,aes(x =Year, y=Infl, color=Country)) no lines are drawn: older versions of ggplot2 give an error message telling you that there is a missing layer, while more recent versions show an empty plot with just the axes. We have just set up the plot. It is + geom_line() that tells R that you want to use the data to create a line plot (try + geom_point() instead to see what happens).
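The difference between the set-up and the layers can be seen in the following sketch. It uses a small invented dataframe called toy (same column names as temp, made-up values), so you can run it without the inflation data.

```r
library(ggplot2)

# Toy data in the stacked structure (values invented for illustration)
toy <- data.frame(
  Year    = rep(as.Date(c("2011-01-01", "2012-01-01", "2013-01-01")), times = 2),
  Infl    = c(2.4, 3.2, 2.9, 4.5, 4.4, 5.1),
  Country = rep(c("Mountania", "Lakeland"), each = 3)
)

# Only the set-up: data and aesthetics, but no layer yet
base_only <- ggplot(toy, aes(x = Year, y = Infl, color = Country))

# Adding a layer tells ggplot how to draw the data
with_lines  <- base_only + geom_line()
with_points <- base_only + geom_point()

length(base_only$layers)   # number of layers in the bare set-up
length(with_lines$layers)  # number of layers after adding geom_line()
```

Type with_lines or with_points into the command window to see the respective plot.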

Manipulating the plot

We have now created a simple plot and we can adjust it as we like. To facilitate this it is actually good to save the plot as an object:

p1 <- ggplot(temp,aes(x =Year, y=Infl, color=Country)) + geom_line()

Now the plot is saved in p1 and to plot it you merely have to type p1 into the command window (formally p1 is now a list and you could try and understand what elements it has, but at this stage there is really no need to).

We change things in this graph by adding layers to it. This works in a really quite intuitive way. Let's say that we want to ensure that the y-axis shows from -10 to 30. To find out how to do that we use our good old friend Dr. Google ("R ggplot y axis limits") and just scan any of the top links to find out that we can use the ylim function to achieve this. So let's say we want to adjust our figure p1 and impose some y-axis limits. This is how we do it:

p1 <- p1 + ylim(-10,30)

Easy, right?

Let's give it also a title:

p1 
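As the code lines are cut short in this copy of the page, here is a hedged sketch of both steps, y-axis limits and title, on a small invented dataset. The title text is hypothetical (I do not know which title was used originally); ylim() and ggtitle() are the ggplot2 functions I assume were meant.

```r
library(ggplot2)

# Toy data (values invented for illustration)
toy <- data.frame(
  Year    = rep(as.Date(c("2011-01-01", "2012-01-01", "2013-01-01")), times = 2),
  Infl    = c(2.4, 3.2, 2.9, 4.5, 4.4, 5.1),
  Country = rep(c("Mountania", "Lakeland"), each = 3)
)

p_demo <- ggplot(toy, aes(x = Year, y = Infl, color = Country)) + geom_line()

# Each change is added to the saved plot and saved again
p_demo <- p_demo + ylim(-10, 30)                     # y-axis from -10 to 30
p_demo <- p_demo + ggtitle("Inflation (toy data)")   # hypothetical title
```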

Just put p1 into the command window to actually see the figure. As it turns out, a lot of the detail of the graph can be changed via the theme function. Below I changed a whole lot of things about the figure; for instance I remove the x-axis title (axis.title.x=element_blank()) or I change the background colour (panel.background=element_rect(fill = "white")). Try and figure out what the other elements do and fiddle around a bit.

p1 
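The full list of theme settings is not visible in this copy of the page, so the following is only a sketch with the three settings that the text mentions (no x-axis title, white background, no legend), applied to an invented dataset. The other settings of the original figure are not reproduced here.

```r
library(ggplot2)

# Toy data (values invented for illustration)
toy <- data.frame(
  Year    = rep(as.Date(c("2011-01-01", "2012-01-01", "2013-01-01")), times = 2),
  Infl    = c(2.4, 3.2, 2.9, 4.5, 4.4, 5.1),
  Country = rep(c("Mountania", "Lakeland"), each = 3)
)

p_demo <- ggplot(toy, aes(x = Year, y = Infl, color = Country)) + geom_line()

p_demo <- p_demo + theme(
  axis.title.x     = element_blank(),                # no title on the x-axis
  panel.background = element_rect(fill = "white"),   # white plot background
  legend.position  = "none"                          # remove the legend
)
```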

There is of course no point in learning all these options by heart (perhaps if you were dropped from Mars in the Sixties this is what you would think you should do). Dr. Google is always at hand.

The last thing we will do is to add some annotations. One of the things we did via the theme function above was to remove the legend, as the resulting picture is mainly meant to illustrate the general international pattern of inflation. However, we could highlight which countries some of the more extreme series belong to. We use the annotate function to do this. Again, we will just do it here and you can play around with the options.

p1 
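Here is a small sketch of how annotate can be used to label a line directly in the plot. The data, the label and its position are all made up; with your own figure you will have to try a few x and y values until the text sits where you want it.

```r
library(ggplot2)

# Toy data (values invented for illustration)
toy <- data.frame(
  Year    = rep(as.Date(c("2011-01-01", "2012-01-01", "2013-01-01")), times = 2),
  Infl    = c(2.4, 3.2, 2.9, 4.5, 4.4, 5.1),
  Country = rep(c("Mountania", "Lakeland"), each = 3)
)

p_demo <- ggplot(toy, aes(x = Year, y = Infl, color = Country)) +
  geom_line() +
  theme(legend.position = "none")

# x is a date (as the x-axis shows dates), y is in units of the y-axis
p_demo <- p_demo + annotate("text",
                            x = as.Date("2012-06-01"), y = 5.2,
                            label = "Lakeland", size = 3)
```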

Let's confirm what we have created so far:

p1
The manipulated ggplot

Creating a multiple plot

We actually imported 4 different inflation series. Let's say that we want to replicate the same figure for the remaining series and then show them all in the same figure. When I say same figure, what I mean is that we want to create a figure with four panels each of which shows one of the graphs.

Let me show you what I mean first.

Let us first create the remaining three plots and save them in objects p2, p3 and p4. We basically repeat what we did for the agg dataframe now for core, ener and food. The only difference between these is where exactly we place the annotations. In practice you will have to fiddle with the exact placement of these.

temp 

Now we have four objects p1 to p4 which we want to work into one figure. In another section of the ECLR webpage we used the par() function to achieve this, but unfortunately this does not work with ggplots (I didn't know, but a quick Google search revealed this). So here we use another package, the gridExtra package. Make sure it is installed!

library(gridExtra)
grid.arrange(p1, p2, p3, p4, nrow=2, ncol=2)
## Warning: Removed 4 rows containing missing values (geom_path).
## Warning: Removed 3 rows containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_path).
Here is the final result

The use of this function is really straightforward. The first inputs are the figure objects we want to use, and nrow=2, ncol=2 indicate that we want two rows and two columns of figures.
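If you want to try the arrangement without the inflation data, the following sketch builds four small panels from invented data. The helper make_panel is hypothetical (it is not part of any package, just a shortcut to avoid typing the same plot code four times). I use arrangeGrob here, which builds the same 2 x 2 layout as grid.arrange but returns it as an object instead of drawing it straight away.

```r
library(ggplot2)
library(gridExtra)

# Toy data (values invented for illustration)
toy <- data.frame(
  Year    = rep(as.Date(c("2011-01-01", "2012-01-01", "2013-01-01")), times = 2),
  Infl    = c(2.4, 3.2, 2.9, 4.5, 4.4, 5.1),
  Country = rep(c("Mountania", "Lakeland"), each = 3)
)

# Hypothetical helper: the same line plot with a different title each time
make_panel <- function(df, title) {
  ggplot(df, aes(x = Year, y = Infl, color = Country)) +
    geom_line() +
    ggtitle(title) +
    theme(legend.position = "none")
}

panels <- list(make_panel(toy, "Aggregate"), make_panel(toy, "Core"),
               make_panel(toy, "Energy"),    make_panel(toy, "Food"))

# Two rows and two columns of figures
fig <- arrangeGrob(grobs = panels, nrow = 2, ncol = 2)
```

To see the figure, call grid.arrange(grobs = panels, nrow = 2, ncol = 2), or draw the saved object with grid::grid.draw(fig).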

Easy!

If you want to use this figure in some other document you should click on the Export button in the Plots previewer of RStudio. There you can also change the dimensions of the picture.
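If you prefer to save a figure from code rather than via the Export button, one option (not used in the original walkthrough, so treat it as a suggestion) is the ggsave function from ggplot2. The sketch below saves an invented plot as a pdf into a temporary folder; the file name and the dimensions are made up.

```r
library(ggplot2)

# Toy data (values invented for illustration)
toy <- data.frame(
  Year    = rep(as.Date(c("2011-01-01", "2012-01-01", "2013-01-01")), times = 2),
  Infl    = c(2.4, 3.2, 2.9, 4.5, 4.4, 5.1),
  Country = rep(c("Mountania", "Lakeland"), each = 3)
)

p_demo <- ggplot(toy, aes(x = Year, y = Infl, color = Country)) + geom_line()

# Hypothetical file name; width and height are in inches by default
out_file <- file.path(tempdir(), "inflation_demo.pdf")
ggsave(filename = out_file, plot = p_demo, width = 8, height = 5)
```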

Some more resources

A few good graphing resources for ggplot are here:

  • Juanjo Medina has an excellent intro; however, it doesn't cover time-series plots.
  • A cheat sheet for ggplot; only use once you have a basic understanding of how to use it.
  • For some fine-tuning of figures you can find good advice on noamross.net
  • Some great worked examples from sthda.com