R Time Series Objects vs Data Frames: Advantages and Limitations
R
time-series
Author
John Yuill
Published
January 30, 2022
As I learn more and more about R, questions often arise about which packages/methods/tools to use for a given situation. R is a vast - and growing - universe and I'm not interested in learning everything in that universe. I'm interested in learning the shortest paths between where I am now and my objective. As an adherent of the tidyverse, I lean strongly toward solutions in that realm. But, to paraphrase an old saying, 'tidyverse is a playground…not a jail' and if a problem can be handled better by stepping outside the tidyverse, I'm all for that.
One of these areas is in dealing with time series: data sets comprised of repeated measurements over consistent time intervals (hourly, daily, monthly, etc). You can work with time series data using data frames, the fundamental building block of data analysis in R, but there are more specialized tools that offer more flexibility, specific capabilities and ease of use when analyzing time-based data. This can come into play in a wide variety of situations: weekly website visits, monthly sales, daily stock prices, annual GDP, electricity use by minute, that kind of thing.
So what are these time series advantages? How do we leverage them? What limitations of time series objects are good to be aware of? I'm not pretending this is a definitive guide, but I've been looking at this for a while and here are my observations…
(A word on forecasting: this is a MAJOR use case for time series but is not the main focus here and I’ll only touch on that briefly below.)
Time Series Essentials
ts is the basic class for time series objects in R. You can do a lot with ts, but its functionality has been extended by other packages, in particular zoo and more recently xts.
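To make that concrete, here is a minimal sketch (with made-up numbers) of building a ts object from scratch by supplying a start point and a frequency:
## two years of made-up monthly values
sales <- ts(c(10, 12, 15, 14, 18, 21, 25, 24, 20, 17, 13, 11,
              11, 13, 17, 16, 20, 23, 27, 26, 22, 18, 14, 12),
            start = c(2020, 1), frequency = 12)
class(sales)      ## "ts"
frequency(sales)  ## 12 observations per year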
xts is a leading, evolved R package for working with time series. It builds on zoo, an earlier package for handling time series in R. DataCamp has a very nice course on working with xts, if you want to go deeper.
So I’m just going to scratch the surface and hit some highlights with examples here.
Get a Time Series Object
At its most basic, a time series object is a vector (or sometimes a matrix) of observations at regular time intervals, with the time information stored as attributes.
Examples in built-in R data sets include:
annual Nile river flows
class(Nile)
[1] "ts"
str(Nile)
Time-Series [1:100] from 1871 to 1970: 1120 1160 963 1210 1160 1160 813 1230 1370 1140 ...
monthly Air Passengers - yes, I know everybody uses Air Passengers for their time series example. So damn handy. Different examples below, I promise. ;)
class(AirPassengers)
[1] "ts"
str(AirPassengers)
Time-Series [1:144] from 1949 to 1961: 112 118 132 129 121 135 148 148 136 119 ...
Both these examples are time series of the ts class, and we can see right off that these are different data structures from data frames. A key thing to note about time series is that date/time is not in a column the way it would be in a data frame, but is in an index - similar to row.names in a data frame.
If we look at the index for the Nile river data, we can see the time values and we can check the start and end. This info corresponds to the structure info shown above, where start = 1871, end = 1970, and frequency = 1, meaning 1 observation per year, annual data.
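For example, a quick way to inspect that with base R helpers:
## time values of the index
head(time(Nile))
## start, end and frequency of the series
start(Nile)
end(Nile)
frequency(Nile)
The AirPassengers output below comes from an xts version of the data. The conversion step isn't shown here, but it is presumably just a coercion along these lines (Air_xts is the name used in the output below):
library(xts)
Air_xts <- as.xts(AirPassengers)
str(Air_xts)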
An xts object on Jan 1949 / Dec 1960 containing:
Data: double [144, 1]
Index: yearmon [144] (TZ: "UTC")
head(Air_xts)
[,1]
Jan 1949 112
Feb 1949 118
Mar 1949 132
Apr 1949 129
May 1949 121
Jun 1949 135
We can see here that xts has reshaped the data from a matrix with rows by year and columns by month to more 'tidy' data with month-year as the index and observations in one column.
Native xts
Some data comes as xts time series out of the box. For example, the quantmod package fetches stock market data as xts time series automatically:
library(quantmod)
## use quantmod pkg to get some stock prices as time series
price <- getSymbols(Symbols='EA', from="2020-01-01", to=Sys.Date(), auto.assign=FALSE)
class(price)
As noted, a key characteristic of a time series object is that dates live in an index rather than in a date column, as they would in a typical data frame. Looking at the structure of the xts object, we can again see it is different from a data frame.
str(price)
An xts object on 2020-01-02 / 2023-04-06 containing:
Data: double [822, 6]
Columns: EA.Open, EA.High, EA.Low, EA.Close, EA.Volume ... with 1 more column
Index: Date [822] (TZ: "UTC")
xts Attributes:
$ src : chr "yahoo"
$ updated: POSIXct[1:1], format: "2023-04-08 16:50:35"
Convert xts to data frame
If you want to work with the time series as a data frame, it is fairly straightforward to convert an xts object:
library(dplyr)  ## for %>%, select(), glimpse()
price_df <- as.data.frame(price)
## add Date field based on index (row names) of xts object
price_df$Date <- index(price)
## set data frame row names to numbers instead of dates
rownames(price_df) <- seq_len(nrow(price_df))
## reorder columns to put Date first
price_df <- price_df %>% select(Date, 1:(ncol(price_df) - 1))
## check out structure using glimpse, as is the fashion of the times
glimpse(price_df)
A data frame is basically a straight-up table, whereas the xts object has additional structural features.
Convert data frame to xts
## convert data frame to xts object by specifying the date field to use for the xts index
price_xts <- xts(price_df, order.by=as.Date(price_df$Date))
str(price_xts)
An xts object on 2020-01-02 / 2023-04-06 containing:
Data: character [822, 7]
Columns: Date, EA.Open, EA.High, EA.Low, EA.Close ... with 2 more columns
Index: Date [822] (TZ: "UTC")
Notice, however, that in the process of converting an xts object to a data frame and back to xts, the xts Attributes information has been lost.
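Notice also that all seven columns came back as character: because the Date column was included in the data part of the xts call, the whole underlying matrix was coerced to character. A sketch of a cleaner conversion, if you want to keep the prices numeric, is to leave the Date column out of the data and use it only for the index:
## use the Date column only for the index, not as data
price_xts2 <- xts(price_df[, -1], order.by = as.Date(price_df$Date))
str(price_xts2)  ## Data: double [822, 6]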
Saving/Exporting time series data
Due to the structure of an xts object, the best way to save/export it for future use in R and preserve all its attributes is to save it as an RDS file, using saveRDS.
However, this won't be helpful if you need to share the data with someone who is not using R. You can save as a CSV file using write.zoo (be sure to specify sep=",") and this will maintain the table structure of the data but will drop the attributes. It will automatically move the index into an Index column, so if someone opens it in Excel/Google Sheets, they will see the dates/times.
Saving as RDS or CSV:
## save as RDS to preserve attributes
saveRDS(price, file="price.rds")
price_rds <- readRDS(file='price.rds')
str(price_rds)
An xts object on 2020-01-02 / 2023-04-06 containing:
Data: double [822, 6]
Columns: EA.Open, EA.High, EA.Low, EA.Close, EA.Volume ... with 1 more column
Index: Date [822] (TZ: "UTC")
xts Attributes:
$ src : chr "yahoo"
$ updated: POSIXct[1:1], format: "2023-04-08 16:50:35"
library(readr)  ## for read_csv()
## save as CSV - be sure to include sep=","
write.zoo(price, file='price.csv', sep=",")
price_zoo <- read_csv('price.csv')
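If you need to pull that CSV back in as an xts object later, one way (a sketch using zoo's reader; the arguments assume a daily, comma-separated file with a header, which is what write.zoo produces here) is:
## read the Index column back in as the zoo index, then coerce to xts
price_back <- as.xts(read.zoo('price.csv', header = TRUE, sep = ",", format = "%Y-%m-%d"))
str(price_back)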
Time Series Strengths
The structure of a time series leads to a variety of advantages for time-based analysis, compared to data frames. A few of the main ones, at least from my perspective:
Period/Frequency Manipulation: can easily change from granular periods, such as daily, to aggregated periods.
Period calculations: counting number of periods in the data (months, quarters, years).
Selection/subsetting based on date ranges.
Visualization: a number of visualization options are designed to work with time series.
Decomposition: breaking out time series into trend, seasonal, random components for analysis.
Forecasting: time series objects are designed for applying various forecasting methods like Holt-Winters and ARIMA. This is well beyond the scope of this post, but we’ll show a quick ARIMA example.
No doubt everything you can do with time series can be done with data frames, but using a time series object can really expedite things.
Time Series Manipulation/Calculation
Period/Frequency Manipulation
Change the period granularity to less granular:
easily change daily data to weekly, monthly, quarterly, yearly
## get periodicity (frequency) for data set
periodicity(price)
Daily periodicity from 2020-01-02 to 2023-04-06
## aggregate by period
head(to.weekly(price)[,1:5])
Notice that this isn't a straight roll-up but an actual summary: for each aggregated period, the High is the max of the daily Highs, the Low is the min of the daily Lows, and the Volume is the sum for the period, all as you would expect.
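The same family of functions covers other granularities. A quick sketch (output omitted):
## aggregate daily data to monthly, quarterly, yearly
head(to.monthly(price)[, 1:5])
head(to.quarterly(price)[, 1:5])
head(to.yearly(price)[, 1:5])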
You can also pull out the values at the END of each period, including setting the number of periods to skip over each iteration:
get the index of the last day of the period length specified in 'on', for every k periods.
apply the index to the dataset to extract those rows.
## every 2 weeks (on="weeks", k=2)
end_wk <- endpoints(price, on="weeks", k=2)
head(price[end_wk,])
See 'Average for period' further down for how to get an average over periods like these: the average for each 6-month period, for example.
Period Counts
## get the number of weeks, months, years in the dataset (including partial)
price_nw <- nweeks(price)
price_nm <- nmonths(price)
price_ny <- nyears(price)
The price data covers:
822 days
171 weeks
40 months
4 years (or portions thereof)
First/last dates:
## get earliest date
st_date <- start(price)
## get last date
end_date <- end(price)
Start: 2020-01-02
End: 2023-04-06
Selecting/Subsetting
Time series objects make it easy to slice the data by date ranges. This is an area where time series really shine compared to trying to do the same thing with a data frame.
xts is super-efficient at interpreting date ranges based on minimal info.
‘/’ is a key symbol for separating dates - it is your friend.
date ranges are inclusive of the dates used.
Note that in the following examples based on stock market data, some dates are missing due to gaps in the data - days when markets were closed.
quickly get entire YEAR
## subset on a YEAR (showing head and tail to confirm data is 2021 only)
head(price["2021"])
tail(price["2021"])
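A few more sketches of the date-range shorthand in action (output omitted):
## a single month
price["2021-06"]
## an explicit range (inclusive)
price["2021-06-01/2021-09-30"]
## everything up to and including a date
price["/2020-03-31"]
## everything from a date onward
price["2022-01-15/"]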
Time series objects lend themselves well to time-based calculations.
Simple arithmetic between two dates is not as straightforward as might be expected, but still easily doable:
## subtraction of a given metric between two dates
as.numeric(price$EA.Close["2022-01-21"]) - as.numeric(price$EA.Close["2022-01-18"])
[1] 5.1
## subtraction of one metric from another on the same date
price$EA.Close["2022-01-18"] - price$EA.Open["2022-01-18"]
EA.Close
2022-01-18 -4.53
lag.xts is versatile for lag calculations - calculating differences over time:
## calculates across all columns with one command - default is 1 period but can be set with k
head(price - lag.xts(price))
EA.Open EA.High EA.Low EA.Close EA.Volume EA.Adjusted
2020-01-02 NA NA NA NA NA NA
2020-01-03 -2.36 -0.60 -1.64 -0.14 -60700 -0.138
2020-01-06 1.37 1.56 1.51 1.58 1093900 1.559
2020-01-07 2.05 -0.06 1.10 -0.39 -1241800 -0.385
2020-01-08 -0.82 0.75 0.05 1.10 959200 1.085
2020-01-09 1.82 0.34 0.49 -0.13 -833000 -0.128
## set k for a longer lag - this example starts at a date where the 7-period lag values exist, so no NAs
head(price["2020-01-13/"] - lag.xts(price, k=7))
diff is for calculating differences, based on a combination of lag and difference order:
head(diff(price, lag=1, differences=1))
EA.Open EA.High EA.Low EA.Close EA.Volume EA.Adjusted
2020-01-02 NA NA NA NA NA NA
2020-01-03 -2.36 -0.60 -1.64 -0.14 -60700 -0.138
2020-01-06 1.37 1.56 1.51 1.58 1093900 1.559
2020-01-07 2.05 -0.06 1.10 -0.39 -1241800 -0.385
2020-01-08 -0.82 0.75 0.05 1.10 959200 1.085
2020-01-09 1.82 0.34 0.49 -0.13 -833000 -0.128
head(diff(price, lag=1, differences=2))
EA.Open EA.High EA.Low EA.Close EA.Volume EA.Adjusted
2020-01-02 NA NA NA NA NA NA
2020-01-03 NA NA NA NA NA NA
2020-01-06 3.73 2.16 3.15 1.72 1154600 1.70
2020-01-07 0.68 -1.62 -0.41 -1.97 -2335700 -1.94
2020-01-08 -2.87 0.81 -1.05 1.49 2201000 1.47
2020-01-09 2.64 -0.41 0.44 -1.23 -1792200 -1.21
first example: diff with lag=1, differences=1 gives same result as lag.xts with k=1 (or default)
second example: diff with differences=2 gives the ‘second order difference’: difference between the differences.
EA.Open:
3.73 = 1.37-(-2.36)
0.68 = 2.05-1.37
-2.87 = -0.82-2.05
…
Useful for some forecasting methods, among other applications.
Returns for calculating % change period over period:
functions in quantmod package designed for financial asset prices, but can be applied to other xts data.
various periodicities: daily, weekly, monthly, quarterly, yearly, or ALL at once (allReturns())
applied to Air Passenger xts to get % change, even though not financial returns:
head(monthlyReturn(Air_xts))
monthly.returns
Jan 1949 0.0000
Feb 1949 0.0536
Mar 1949 0.1186
Apr 1949 -0.0227
May 1949 -0.0620
Jun 1949 0.1157
Average for period:
Using endpoints indexes like the one obtained in the example at the end of the Period/Frequency Manipulation section above, we can calculate averages for the periods.
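The calculation itself isn't shown here, but a sketch using xts's period.apply (computing fresh 6-month endpoints for the purpose) would look like this:
## endpoints for every 6 months
end_6m <- endpoints(price, on = "months", k = 6)
## average closing price within each 6-month period
period.apply(price$EA.Close, INDEX = end_6m, FUN = mean)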
You can also calculate a rolling (moving) average quickly with the rollmean function from zoo:
library(dygraphs)
## get subset of data for demo
price_c <- price[,'EA.Close']
price_c <- price_c['/2020-02-28']
## calc rolling mean and add to original data
## - k=3 means a 3-period window
## - align='right' puts the calculated number at the last date in the rolling period
price_c$EA_Close_rm <- rollmean(price_c, k=3, align='right')
## quick dygraph - more on this below
dygraph(price_c, width='100%')
Visualization
Time series objects offer some visualization options beyond what you get with data frames. Below are a couple of options.
Plot.ts
You can do a quick, simple plot with plot.ts(). Note that in this case the x-axis is the numerical index of the data point, and doesn’t show the date.
plot.ts(price$EA.Close)
Dygraphs
The dygraphs package offers flexibility and interactivity for time series.
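For example, a minimal sketch of an interactive chart of the closing prices, with a range selector added underneath for zooming (the chart title is made up):
## interactive time series chart with zoomable range selector
dygraph(price$EA.Close, main = "EA Daily Closing Price") %>%
  dyRangeSelector()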
Decomposition
we provide 6 full years of data and most of that is used to calculate the decomposition (a sketch of how a decomposition like this might be produced follows below).
x-axis is year number.
TREND: trending up to about half-way through year 2, then down until about the same point in year 3, then back up, looking like a peak in mid year 4. Not willing to stretch out beyond that. ;)
SEASONAL: a pattern has been detected where there tends to be a dip at the beginning of the year, a rise to a peak toward the end of the first quarter, a sharp drop, a smaller peak mid-year, a peak in Q3 or early Q4, then a drop with a smaller bump at the end of the year.
RANDOM: as is to be expected with stock prices in general, lots of randomness involved!
Looks like there may be money to be made riding the seasonal wave! Please: do not buy or sell stocks based on this information. ;)
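The decomposition call itself isn't shown here; a minimal sketch of how a plot like the one described might be produced, assuming the closing prices are rolled up to weekly and treated as a ts with frequency 52 (both assumptions - decompose() just needs at least two full seasonal cycles of data):
## weekly closing prices as a plain ts (frequency 52 is an approximation)
price_wk_ts <- ts(as.numeric(Cl(to.weekly(price))), frequency = 52)
## classical decomposition into trend, seasonal and random components
price_decomp <- decompose(price_wk_ts)
plot(price_decomp)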
Forecasting
A primary use case for time series objects is forecasting. This is a whole other, involved topic way beyond the scope of this post.
Here is a quick example to show how easy forecasting can be in R. Note that we need to bring in the forecast package for this. (There is also the amazing tidyverts ecosystem (https://tidyverts.org/) for working with time series that I have recently discovered - again, a whole other topic for another time.)
Get an ARIMA Model
Some basic terms, over-simplified for our purposes here:
ARIMA stands for AutoRegressive Integrated Moving Average.
One of the most widely-used time series forecasting methods, although certainly not the only one.
The 3 essential parameters for ARIMA are p, d, q: p = order of the autoregressive (lag) terms, d = degree of differencing, q = order of the moving average (lagged error) terms.
library(forecast)
## get closing prices for forecasting
price_cl <- price[,4]
## get a model for the time series - using auto.arima for simplicity
fitA <- auto.arima(price_cl, seasonal=FALSE) ## can add trace=TRUE to see comparison of different models
## show model
fitA
The model we get back is ARIMA(0,1,1) which means p=0, d=1, q=1. We can generate a model by setting these parameters manually, but auto.arima automatically checks a variety of models and selects the best. When comparing models, lowest AIC and BIC are preferred.
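For reference, a sketch of fitting that same specification by hand with the forecast package's Arima function (the order shown matches the auto.arima result above; we'll keep working with fitA below):
## fit ARIMA(0,1,1) manually
fitM <- Arima(price_cl, order = c(0, 1, 1))
fitM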
We can check the accuracy of the model. The most useful item here for interpretation and comparison is MAPE (mean absolute percentage error). In this case it comes in at around 1.3%:
## check accuracy - based on historical data
accuracy(fitA)
ME RMSE MAE MPE MAPE MASE ACF1
Training set 0.0218 2.26 1.67 0.00169 1.33 0.999 -0.0405
fitAa <- accuracy(fitA)
100 - fitAa[,5]
[1] 98.7
So in this case a MAPE of 1.325 can be seen as accuracy of 98.675%.
We can also plot the residuals of the model for visual inspection.
## check residuals
tsdisplay(residuals(fitA), main='Residuals of Simple ARIMA Forecasting Model for Stock Price')
As usual with residuals, we are looking for a mean around 0, roughly evenly distributed. For ARIMA we also get ACF and PACF plots, where we want the bars to be short and at least within the blue dotted lines. So it looks like we are good to go here.
Create A Forecast
We just need a little more code to create and plot a forecast. We can set the forecast period to whatever we want, based on the periodicity of the data - in this case days, and we are looking out 30 days.
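The code itself isn't shown here; a minimal sketch of generating and plotting that 30-day forecast from the fitted model:
## forecast 30 periods (days) ahead and plot with confidence bands
fcast <- forecast(fitA, h = 30)
plot(fcast, main = "30-day ARIMA Forecast: EA Closing Price")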
That was easy! And we can use this approach to quickly iterate over various models, if we are not convinced that auto.arima is the best. Of course you can use data frames to create forecasts of various sorts but the xts object makes it super-easy to apply common time series methods.
This also reveals a shortcoming of time-series forecasting:
dependence on pattern recognition and pattern repetition, which can lead to a conservative forecast, especially with noisy data.
as a result, the forecast is: ‘steady as she goes, with possibility of moving either quite a bit higher or quite a bit lower’.
So not that useful. To be fair, stock market prices are not really predictable, so this is a perfectly reasonable outcome that grounds us in reality.
Conclusion
Time series objects are obviously a powerful way to work with time-based data and a go-to when your data is based on time. Particular strengths include:
Ease of manipulation such as aggregation by date periods, selecting date ranges, period calculations.
Some great visualization options for exploring the data.
Forecasting, which is really the bread and butter of time series objects.
There are some cases where you may prefer to stick with data frames:
Multi-dimensional data: time series work best when each row represents a distinct time. If you are dealing with multi-dimensional data where dates are broken down by customer, region, etc., especially in tidy format, you may want to stick with a data frame.
Visualization preferences: if you are more comfortable with using ggplot2 (or other visualization tools geared toward data frames) a data frame may be preferable. Or if the document you are producing has ggplot2 charts, you may want to maintain standard presentation.
Forecasting needs: if you are doing time series forecasting you will want to use a time series object. If you're not doing forecasting, there is less of a need. A limitation is that time series forecasting is based only on historical patterns in the data and doesn't include things like correlation with other factors.
Ultimately, the right tool for the job depends on a variety of situational factors, and having a collection of tools at your disposal helps you avoid the 'when all you have is a hammer…' pitfall. If your data is based on time, time series objects should be in the running.
So that’s quite a lot for one blog post - hopefully helps you make the most of your ‘time’!
Resources
Additional resources that may be helpful with time-series and xts in particular: