Base R ships with a great deal of useful functionality for time series, particularly in the stats package. This is complemented by many packages on CRAN, which are briefly summarized below. There is also considerable overlap between the tools for time series and those in the Econometrics and Finance task views. The packages in this view can be structured into the following topics. If you think a package is missing from the list, please let us know.

Infrastructure. Base R contains substantial infrastructure for representing and analyzing time series data. The fundamental class is "ts", which can represent regularly spaced time series (using numeric time stamps). Hence, it is particularly well suited for annual, monthly, quarterly data, and so on. Moving averages are computed by ma from forecast, and rollmean from zoo. The latter also provides the general-purpose function rollapply, along with other specific rolling statistics functions. roll provides parallel functions for computing rolling statistics.

Graphics. Time series plots are obtained with plot() applied to ts objects. (Partial) autocorrelation functions are implemented in acf() and pacf(). Alternative versions are provided by Acf() and Pacf() in forecast, along with a combination display using tsdisplay(). SDD provides more general serial dependence diagrams, while dCovTS computes and plots the distance covariance and distance correlation functions of time series. Seasonal displays are obtained using monthplot() in stats and seasonplot in forecast. wats implements wrap-around time series graphics. ggseas provides ggplot2 graphics for seasonally adjusted series and rolling statistics. dygraphs provides an interface to the Dygraphs interactive time series charting library. ZRA plots forecast objects from the forecast package using dygraphs. Basic fan plots of forecast distributions are provided by forecast and vars; more flexible fan plots of any sequential distributions are implemented in fanplot.

Times and dates. Class "ts" can only deal with numeric time stamps, but many more classes are available for storing time/date information and computing with it. For an overview see R Help Desk: Date and Time Classes in R by Gabor Grothendieck and Thomas Petzoldt in R News 4(1), 29-32. Classes "yearmon" and "yearqtr" from zoo allow more convenient computation with monthly and quarterly observations, respectively. Class "Date" from the base package is the basic class for dealing with dates in daily data; dates are stored internally as the number of days since 1970-01-01. The chron package provides classes for dates(), hours(), and date/time (intra-day) in chron(). There is no support for time zones or daylight saving time; internally, "chron" objects are (fractional) days since 1970-01-01. Classes "POSIXct" and "POSIXlt" implement the POSIX standard for date/time (intra-day) information and also support time zones and daylight saving time. However, the time zone computations require some care and might be system-dependent.
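To make these classes concrete, here is a minimal base-R sketch (nothing beyond base R is assumed):

# A quarterly "ts" series with numeric time stamps
x <- ts(rnorm(8), start = c(2020, 1), frequency = 4)
time(x)

# "Date": stored internally as days since 1970-01-01
d <- as.Date("2020-03-15")
as.numeric(d)                       # 18336 days since the epoch

# "POSIXct": seconds since 1970-01-01 00:00:00 GMT, with time zone support
p <- as.POSIXct("2020-03-15 12:00:00", tz = "GMT")
format(p, tz = "America/New_York")  # same instant, shown in another time zone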
Internally, "POSIXct" objects are the number of seconds since 1970-01-01 00:00:00 GMT. The lubridate package provides functions that facilitate certain POSIX-based computations. Class "timeDate" is provided in the timeDate package (previously: fCalendar). It is aimed at financial time/date information and deals with time zones and daylight saving time via a concept of "financial centers". Internally, it stores all information in "POSIXct" and does all computations in GMT only. Calendar functionality, e.g. including information about weekends and holidays for various stock exchanges, is also included. The tis package provides the "ti" class for time/date information. The "mondate" class from the mondate package facilitates computing with dates in terms of months. The tempdisagg package includes methods for temporal disaggregation and interpolation of a low-frequency time series to a higher-frequency series. TimeProjection extracts useful time components of a date object, such as day of week, weekend, holiday, day of month, etc., and puts them in a data frame.

Time series classes. As mentioned above, "ts" is the basic class for regularly spaced time series using numeric time stamps. The zoo package provides infrastructure for regularly and irregularly spaced time series using arbitrary classes for the time stamps (i.e. allowing all classes from the previous section). It is designed to be as consistent as possible with "ts", and coercion from and to "zoo" is available for all other classes mentioned in this section. The xts package is based on zoo and provides uniform handling of R's different time-based data classes. Several packages implement irregular time series based on "POSIXct" time stamps, intended especially for financial applications. These include "its" from its, "irts" from tseries, and "fts" from fts. The "timeSeries" class in timeSeries (previously: fSeries) implements time series with "timeDate" time stamps, while the "tis" class in tis implements time series with "ti" time stamps. The tframe package contains infrastructure for setting time frames in different formats.

Forecasting and univariate modeling. The forecast package provides a class and methods for univariate time series forecasts, and provides many functions implementing different forecasting models, including all those in the stats package.

Exponential smoothing. HoltWinters() in stats provides some basic models with partial optimization; ets() from the forecast package provides a larger set of models and facilities with full optimization. The MAPA package combines exponential smoothing models at different levels of temporal aggregation to improve forecast accuracy. smooth implements some generalizations of exponential smoothing. The theta method is implemented in the thetaf function of the forecast package; an alternative and extended implementation is provided in forectheta.
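As a quick illustration of the exponential smoothing functions just named, a minimal sketch (assuming the forecast package is installed; AirPassengers ships with base R):

HoltWinters(AirPassengers)   # basic Holt-Winters from stats

library(forecast)
fit <- ets(AirPassengers)    # automatically selected exponential smoothing state space model
plot(forecast(fit, h = 24))  # forecast two years ahead, with prediction intervals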
Autoregressive models. ar() in stats (with model selection), and FitAR for subset AR models.

ARIMA models. arima() in stats is the basic function for ARIMA, SARIMA, ARIMAX, and subset ARIMA models. It is enhanced in the forecast package via the function Arima(), along with auto.arima() for automatic order selection. arma() in the tseries package provides different algorithms for ARMA and subset ARMA models. FitARMA implements a fast MLE algorithm for ARMA models. The gsarima package contains functionality for generalized SARIMA time series simulation. The mar1s package handles multiplicative AR(1) with seasonal processes. TSTutorial provides an interactive tutorial for Box-Jenkins modeling. Improved prediction intervals for ARIMA and structural time series models are provided by tsPI.

Periodic ARMA models. pear and partsm for periodic autoregressive time series models, and perARMA for periodic ARMA modeling and other procedures for periodic time series analysis.

ARFIMA models. Some facilities for fractionally differenced ARFIMA models are provided in the fracdiff package. The arfima package has more advanced and general facilities for ARFIMA and ARIMA models, including dynamic regression (transfer function) models. armaFit() from the fArma package is an interface for ARIMA and ARFIMA models. Fractional Gaussian noise and simple models for hyperbolic decay time series are handled in the FGN package.

Transfer function models are provided by the arimax function in the TSA package, and the arfima function in the arfima package. Outlier detection following the Chen-Liu approach is provided by tsoutliers.

Structural models are implemented in StructTS() in stats, and in stsm and stsm.class. KFKSDS provides a naive implementation of the Kalman filter and smoothers for univariate state space models. Bayesian structural time series models are implemented in bsts.

Non-Gaussian time series can be handled with GLARMA state space models via glarma, and using generalized autoregressive score models in the GAS package. Conditional auto-regression models using Monte Carlo likelihood methods are implemented in mclcar.

GARCH models. garch() from tseries fits basic GARCH models. Many variations on GARCH models are provided by rugarch. Other univariate GARCH packages include fGarch, which implements ARIMA models with a wide class of GARCH innovations. There are many more GARCH packages described in the Finance task view. Stochastic volatility models are handled by stochvol in a Bayesian framework.

Count time series models are handled in the tscount and acp packages. ZIM provides zero-inflated models for count time series. tsintermittent implements various models for analyzing and forecasting intermittent demand time series. Censored time series can be modelled using cents and carx.

Portmanteau tests are provided via Box.test() in the stats package. Additional tests are given by portes and WeightedPortTest.

Change point detection is provided in strucchange (using linear regression models), in trend (using nonparametric tests), and in wbsts (using wild binary segmentation).
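To illustrate change point detection (using the changepoint package described next), a minimal sketch on simulated data with a single shift in mean:

library(changepoint)
set.seed(1)
x <- c(rnorm(100, mean = 0), rnorm(100, mean = 3))  # mean shifts at t = 100
fit <- cpt.mean(x, method = "PELT")                 # detect changes in mean
cpts(fit)                                           # estimated change point location(s)
plot(fit)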
The changepoint package provides many popular change point methods, and ecp does nonparametric change point detection for univariate and multivariate series. Online change point detection for univariate and multivariate time series is provided by onlineCPD. InspectChangepoint uses sparse projection to estimate change points in high-dimensional time series.

Time series imputation is provided by the imputeTS package. Some more limited facilities are available using na.interp() from the forecast package.

Forecasts can be combined using ForecastCombinations, which supports the most frequently used methods to combine forecasts. forecastHybrid provides functions for ensemble forecasts, combining approaches from the forecast package. opera has facilities for online predictions based on combinations of forecasts provided by the user.

Forecast evaluation is provided in the accuracy() function from forecast. Distributional forecast evaluation using scoring rules is available in scoringRules.

Miscellaneous. ltsa contains methods for linear time series analysis, timsac for time series analysis and control, and tsbugs for time series BUGS models.

Frequency analysis. Spectral density estimation is provided by spectrum() in the stats package, including the periodogram, smoothed periodogram and AR estimates. Bayesian spectral inference is given by bspec. quantspec includes methods to compute and plot Laplace periodograms for univariate time series. The Lomb-Scargle periodogram for irregularly sampled time series is computed by lomb. spectral uses Fourier and Hilbert transforms for spectral filtering. psd produces adaptive, sine-multitaper spectral density estimates. kza provides Kolmogorov-Zurbenko adaptive filters, including break detection, spectral analysis, wavelets and KZ Fourier transforms. multitaper also provides some multitaper spectral analysis tools.

Wavelet methods. The wavelets package includes computing wavelet filters, wavelet transforms and multiresolution analyses. Wavelet methods for time series analysis based on Percival and Walden (2000) are given in wmtsa. WaveletComp provides some tools for wavelet-based analysis of univariate and bivariate time series, including cross-wavelets, phase-difference and significance tests. biwavelet can be used to plot and compute the wavelet spectra, cross-wavelet spectra, and wavelet coherence of non-stationary time series. It also includes functions to cluster time series based on the (dis)similarities in their spectra. Tests of white noise using wavelets are provided by hwwntest. Further wavelet methods can be found in the packages brainwaver, rwt, waveslim, wavethresh and mvcwt.

Harmonic regression using Fourier terms is implemented in HarmonicRegression. The forecast package also provides some simple harmonic regression facilities via the fourier function.

Decomposition and filtering. Filters and smoothing. filter() in stats provides autoregressive and moving average linear filtering of multiple univariate time series.
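A minimal sketch of stats::filter(), which needs no extra packages:

# Centred moving average of order 5 (convolution filtering)
ma5 <- filter(AirPassengers, rep(1/5, 5), sides = 2)
# AR(1)-style recursive filtering: y[t] = 0.8 * y[t-1] + x[t]
y <- filter(rnorm(100), 0.8, method = "recursive")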
robfilter provides several robust time series filters, while mFilter includes miscellaneous time series filters useful for smoothing and extracting trend and cyclical components. smooth() from the stats package computes Tukey's running median smoothers, 3RS3R, 3RSS, 3R, etc. sleekts computes the 4253H twice smoothing method.

Decomposition. Seasonal decomposition is discussed below. Autoregressive-based decomposition is provided by ArDec. rmaf uses a refined moving average filter for decomposition. Singular spectrum analysis is implemented in Rssa and spectral.methods. Empirical mode decomposition (EMD) and Hilbert spectral analysis are provided by EMD. Additional tools, including ensemble EMD, are available in hht. An alternative implementation of ensemble EMD and its complete variant are available in Rlibeemd.

Seasonal decomposition. The stats package provides classical decomposition in decompose(), and STL decomposition in stl(). Enhanced STL decomposition is available in stlplus. stR provides seasonal-trend decomposition based on regression. x12 provides a wrapper for the X12 binaries, which have to be installed first. x12GUI provides a graphical user interface for x12. The X-13-ARIMA-SEATS binaries are provided in the x13binary package, with seasonal providing an R interface.

Analysis of seasonality. The bfast package provides methods for detecting and characterizing abrupt changes within the trend and seasonal components obtained from a decomposition. npst provides a generalization of Hewitt's seasonality test. season: seasonal analysis of health data, including regression models, time-stratified case-crossover, plotting functions and residual checks. seas: seasonal analysis and graphics, especially for climatology. deseasonalize: optimal deseasonalization for geophysical time series using AR fitting.

Stationarity, unit roots, and cointegration. Stationarity and unit roots. tseries provides various stationarity and unit root tests, including augmented Dickey-Fuller, Phillips-Perron and KPSS. Alternative implementations of the ADF and KPSS tests are in the urca package, which also includes further methods such as the Elliott-Rothenberg-Stock, Schmidt-Phillips and Zivot-Andrews tests. The fUnitRoots package also provides the MacKinnon test, while uroot provides seasonal unit root tests. CADFtest provides implementations of both the standard ADF test and a covariate-augmented ADF (CADF) test.

Local stationarity. locits provides a test of local stationarity and computes the localized autocovariance. Time series costationarity determination is given by costat. LSTS has functions for locally stationary time series analysis. Locally stationary wavelet models for nonstationary time series are implemented in wavethresh (including estimation, plotting, and simulation functionality for time-varying spectra).

Cointegration. The Engle-Granger two-step method with the Phillips-Ouliaris cointegration test is implemented in tseries and urca. The latter additionally contains functionality for the Johansen trace and lambda-max tests.
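As an illustration of the Johansen tests named above, a minimal sketch using urca and its bundled Danish money demand data (the column names follow the package's own documentation example, so treat them as an assumption to verify):

library(urca)
data(denmark)
sjd <- denmark[, c("LRM", "LRY", "IBO", "IDE")]   # log money, log income, bond and deposit rates
summary(ca.jo(sjd, type = "trace", ecdet = "const", K = 2))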
tsDyn provides Johansen's test and AIC/BIC simultaneous rank-lag selection. CommonTrend provides tools to extract and plot common trends from a cointegration system. Parameter estimation and inference in a cointegrating regression are implemented in cointReg.

Nonlinear time series analysis. Nonlinear autoregression. Various forms of nonlinear autoregression are available in tsDyn, including additive AR, neural nets, SETAR and LSTAR models, threshold VAR and VECM. Neural network autoregression is also provided in GMDH. bentcableAR implements bent-cable autoregression. BAYSTAR provides Bayesian analysis of threshold autoregressive models. tseriesChaos provides an R implementation of the algorithms from the TISEAN project. Markov switching autoregressive models are provided in MSwM, while dependent mixtures of latent Markov models are given in depmix and depmixS4, for categorical and continuous time series.

Tests. Various tests for nonlinearity are provided in fNonlinear. tseriesEntropy tests for nonlinear serial dependence based on entropy metrics. Additional functions for nonlinear time series are available in nlts and nonlinearTseries. Fractal time series modeling and analysis is provided by fractal. fractalrock generates fractal time series with non-normal returns distributions.

Dynamic regression models. Dynamic linear models. A convenient interface for fitting dynamic regression models via OLS is available in dynlm; an enhanced approach that also works with other regression functions and more time series classes is implemented in dyn. More advanced dynamic system equations can be fitted using dse. Gaussian linear state space models can be fitted using dlm (via maximum likelihood, Kalman filtering/smoothing and Bayesian methods), or using bsts, which uses MCMC. Functions for distributed lag nonlinear modeling are provided in dlnm. Time-varying parameter models can be fitted using the tpr package. orderedLasso fits a sparse linear model with an order constraint on the coefficients, to handle lagged regressors where the coefficients decay as the lag increases. Dynamic modeling of various kinds is available in dynr, including discrete and continuous time, linear and nonlinear models, and different types of latent variables.

Multivariate time series models. Vector autoregressive (VAR) models are provided via ar() in the basic stats package, including order selection via the AIC. These models are restricted to be stationary. MTS is an all-purpose toolkit for analyzing multivariate time series, including VAR, VARMA, seasonal VARMA, VAR models with exogenous variables, multivariate regression with time series errors, and much more. Possibly non-stationary VAR models are fitted in the mAr package, which also allows VAR models in principal component space. sparsevar allows estimation of sparse VAR and VECM models, ecm provides functions for building VECM models, while BigVAR estimates VAR and VARX models with structured lasso penalties. Automated VAR models and networks are available in autovarCore.
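A minimal VAR sketch using the vars package (introduced next) and its bundled Canada macroeconomic data:

library(vars)
data(Canada)
VARselect(Canada, lag.max = 5, type = "const")  # order selection by AIC, BIC, etc.
fit <- VAR(Canada, p = 2, type = "const")
predict(fit, n.ahead = 8)                       # eight-quarter-ahead forecasts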
More elaborate models are provided in the vars package, in tsDyn, and by estVARXls() in dse; a Bayesian approach is available in MSBVAR. Another implementation, with bootstrapped prediction intervals, is given in VAR.etp. mlVAR provides multi-level vector autoregression. VARsignR provides routines for identifying structural shocks in VAR models using sign restrictions. VARIMA models and state space models are provided in the dse package. EvalEst facilitates Monte Carlo experiments to evaluate the associated estimation methods. Vector error correction models are available via urca, vars and tsDyn, including versions with structural constraints and thresholding.

Time series component analysis. Time series factor analysis is provided in tsfa. ForeCA implements forecastable component analysis by searching for the best linear transformations that make a multivariate time series as forecastable as possible. PCA4TS finds a linear transformation of a multivariate time series giving lower-dimensional subseries that are uncorrelated with each other.

Multivariate state space models are implemented in the FKF (Fast Kalman Filter) package. This provides relatively flexible state space models via the fkf() function: state space parameters are allowed to be time-varying, and intercepts are included in both equations. An alternative implementation is provided by the KFAS package, which provides a fast multivariate Kalman filter, smoother, simulation smoother and forecasting. Yet another implementation is given in the dlm package, which also contains tools for converting other multivariate models into state space form. dlmodeler provides a unified interface for dlm, KFAS and FKF. MARSS fits constrained and unconstrained multivariate autoregressive state-space models using an EM algorithm. All of these packages assume that the observation and state error terms are uncorrelated. Partially observed Markov processes are a generalization of the usual linear multivariate state space models, allowing non-Gaussian and nonlinear models; these are implemented in the pomp package. Multivariate stochastic volatility models (using latent factors) are provided by factorstochvol.

Analysis of large groups of time series. Time series clustering is implemented in TSclust, dtwclust, BNPTSclust and pdc. TSdist provides distance measures for time series data. jmotif implements tools based on symbolic discretization of time series for finding motifs in time series, and facilitates interpretable time series classification. Methods for plotting and forecasting collections of hierarchical and grouped time series are provided by hts. thief uses hierarchical methods to reconcile forecasts of temporally aggregated time series. An alternative method for reconciling forecasts of hierarchical time series is provided by gtop.

Continuous time models. Continuous-time autoregressive modeling is in cts. Sim.DiffProc simulates and models stochastic differential equations. Simulation and inference for stochastic differential equations are provided by sde and yuima.
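The packages above provide full simulation and inference machinery for stochastic differential equations; for intuition, here is a hand-rolled Euler-Maruyama sketch of an Ornstein-Uhlenbeck process in base R:

# dX = theta * (mu - X) dt + sigma dW
set.seed(1)
n <- 1000; dt <- 0.01
theta <- 2; mu <- 0; sigma <- 0.5
x <- numeric(n)
for (i in 2:n)
  x[i] <- x[i - 1] + theta * (mu - x[i - 1]) * dt + sigma * sqrt(dt) * rnorm(1)
plot(seq(0, by = dt, length.out = n), x, type = "l", xlab = "t", ylab = "X")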
Bootstrapping. The boot package provides the tsboot() function for time series bootstrapping, including the block bootstrap with several variants. tsbootstrap() from tseries provides fast stationary and block bootstrapping. The maximum entropy bootstrap for time series is available in meboot. timesboot computes the bootstrap CI for the sample ACF and periodogram. BootPR computes bias-corrected forecasting and bootstrap prediction intervals for autoregressive time series.

Time series data. Data from Makridakis, Wheelwright and Hyndman (1998) Forecasting: Methods and Applications are provided in the fma package. Data from Hyndman, Koehler, Ord and Snyder (2008) Forecasting with Exponential Smoothing are in the expsmooth package. Data from Hyndman and Athanasopoulos (2013) Forecasting: Principles and Practice are in the fpp package. Data from the M-competition and M3-competition are provided in the Mcomp package. M4comp gives the data from the M4 competition, while Tcomp provides data from the 2010 IJF Tourism Forecasting Competition. pdfetch provides facilities for downloading economic and financial time series from public sources. Data from the Quandl online portal to financial, economic and social datasets can be queried interactively using the Quandl package. Data from the Datamarket online portal can be fetched using the rdatamarket package. Data from Cryer and Chan (2010) are in the TSA package. Data from Shumway and Stoffer (2011) are in the astsa package. Data from Tsay (2005) Analysis of Financial Time Series are in the FinTS package, along with some functions and script files required to work some of the examples. TSdbi provides a common interface to time series databases. fame provides an interface to FAME time series databases. AER and Ecdat both contain many data sets (including time series data) from many econometrics textbooks.

Miscellaneous. dtw: dynamic time warping algorithms for computing and plotting pairwise alignments between time series. ensembleBMA: Bayesian model averaging to create probabilistic forecasts from ensemble forecasts and weather observations. earlywarnings: an early warning signals toolbox for detecting critical transitions in time series. events: turns machine-extracted event data into regular aggregated multivariate time series. feedback: analysis of fragmented time directionality to investigate feedback in time series. LPStimeSeries aims to find "learned pattern similarity" for time series. MAR1 provides tools for preparing ecological community time series data for multivariate AR modeling. nets: routines for the estimation of sparse long-run partial correlation networks for time series data. paleoTS: modeling evolution in paleontological time series. pastecs: regulation, decomposition and analysis of space-time series. ptw: parametric time warping. RGENERATE provides tools to generate vector time series. RMAWGEN is a set of S3 and S4 functions for the spatial multi-site stochastic generation of daily time series of temperature and precipitation using VAR models.
The package can be used in statistical climatology and hydrology. RSEIS: seismic time series analysis tools. rts: raster time series analysis (e.g. time series of satellite images). sae2: time series models for small area estimation. spTimer: Bayesian spatio-temporal modeling. surveillance: temporal and spatio-temporal modeling and monitoring of epidemic phenomena. TED: turbulence time series event detection and classification. Tides: functions to calculate characteristics of quasi-periodic time series, e.g. observed estuarine water levels. tiger: temporally resolved groups of typical differences (errors) between two time series are determined and visualized. TSMining: mining univariate and multivariate motifs in time series data. tsModel: time series modeling for air pollution and health.

A Complete Tutorial on Time Series Modeling in R

Introduction. 'Time' is the most important factor that ensures success in a business. It's difficult to keep up with the pace of time. But technology has developed some powerful methods with which we can see things ahead of time. Don't worry, I'm not talking about a time machine; let's be realistic here. I'm talking about the methods of prediction and forecasting. One such method, which deals with time-based data, is time series modeling. As the name suggests, it involves working on time-based data (years, days, hours, minutes) to derive hidden insights for informed decision making. Time series models are very useful when you have serially correlated data. Most business houses work on time series data to analyze sales numbers for the next year, website traffic, competitive position and much more. However, it is also one of the areas that many analysts do not understand. So, if you aren't sure about the complete process of time series modeling, this guide will introduce you to the various levels of time series modeling and its related techniques. The following topics are covered in this tutorial.

Table of Contents: 1. Basics of time series modeling. 2. Exploration of time series data in R. 3. Introduction to ARMA time series models. 4. Framework and application of ARIMA time series modeling. Let's get started.

1. Basics of Time Series Modeling. Let's start from the basics. This includes stationary series, random walks, the Rho coefficient, and the Dickey-Fuller test of stationarity. If these terms are already scaring you, don't worry: they will become clear shortly, and I bet you will start enjoying the subject as I explain it.

Stationary series. There are three basic criteria for a series to be classified as a stationary series. 1. The mean of the series should not be a function of time, but rather a constant. The image below has the left-hand graph satisfying this condition, whereas the graph in red has a time-dependent mean. 2. The variance of the series should not be a function of time. This property is known as homoscedasticity. The following graph depicts what is and what is not a stationary series.
(Notice the varying spread of the distribution in the right-hand graph.) 3. The covariance of the i-th term and the (i+m)-th term should not be a function of time. In the following graph, you will notice that the spread becomes closer as time increases; hence, the covariance is not constant with time for the red series.

Why do I care about the stationarity of a time series? The reason I took up this section first is that unless your time series is stationary, you cannot build a time series model. In cases where the stationarity criterion is violated, the first requirement is to stationarize the time series and then try stochastic models to predict it. There are multiple ways of bringing about this stationarity; some of them are detrending, differencing, etc.

Random walk. This is the most basic concept of time series. You might know the concept well, but I found many people in the industry who interpret a random walk as a stationary process. In this section, with the help of some mathematics, I will make this concept crystal clear for ever. Let's take an example. Imagine a girl moving randomly on a giant chess board. Here, the next position of the girl depends only on her last position. Now imagine you are sitting in another room and cannot see the girl. You want to predict the girl's position over time. How accurate will you be? Of course, you will become more and more inaccurate as the girl's position changes. At t = 0 you know exactly where the girl is. The next time, she can only move to 8 adjacent squares, so your probability dips to 1/8 instead of 1, and it keeps going down. Now let's formulate this series:

X(t) = X(t-1) + Er(t)

where Er(t) is the error at time point t. This is the randomness the girl brings in at every point in time. Now, if we substitute recursively for all the Xs, we finally end up with the following equation:

X(t) = X(0) + Sum(Er(1) + Er(2) + ... + Er(t))

Now, let's try to validate our stationary-series assumptions on this random walk formulation.

1. Is the mean constant? E[X(t)] = E[X(0)] + Sum of E[Er(i)]. We know that the expectation of any error term is zero, since it is random. Hence, E[X(t)] = E[X(0)] = constant.

2. Is the variance constant? Since the error terms are independent, Var[X(t)] = Var[X(0)] + t * Var(Error), which grows with time. Hence, we infer that the random walk is not a stationary process, as it has a time-varying variance. Moreover, if we check the covariance, we see that it too depends on time.

Let's spice things up a bit. We already know that a random walk is a non-stationary process. Let's introduce a new coefficient into the equation to see if we can make the formulation stationary. The introduced coefficient is Rho:

X(t) = Rho * X(t-1) + Er(t)

Now, we will vary the value of Rho to see if we can make the series stationary. Here we will interpret the spread visually rather than running any formal test for stationarity.
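The plots discussed next were shown as images in the original article; a minimal base-R sketch that reproduces series for several values of Rho:

set.seed(1)
n <- 200
er <- rnorm(n)           # the same shocks for every Rho, for comparability
for (rho in c(0, 0.5, 0.9, 1)) {
  x <- numeric(n)
  for (t in 2:n) x[t] <- rho * x[t - 1] + er[t]
  plot(x, type = "l", main = paste("Rho =", rho))
}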
Let's start with a perfectly stationary series, Rho = 0: pure noise. Increasing the value of Rho to 0.5, you can notice that the cycles become broader, but essentially there does not seem to be any serious violation of the stationarity assumptions. Let's now take a more extreme case, Rho = 0.9. We still see that X returns from extreme values towards zero after some intervals; this series too is not violating stationarity significantly. Now let's take a look at the random walk, Rho = 1. This obviously is a violation of the stationarity conditions. What makes Rho = 1 a special case which fails the stationarity test? We can find the mathematical reason. Taking expectations on each side of the equation X(t) = Rho * X(t-1) + Er(t) gives E[X(t)] = Rho * E[X(t-1)]. This equation is very insightful: the next X (at time point t) is being pulled towards Rho times the last value of X. For instance, if X(t-1) = 1, then E[X(t)] = 0.5 (for Rho = 0.5). Now, if X moves in either direction away from zero, it is pulled back towards zero in the next step. The only component which can drive it even further out is the error term, and the error term is equally likely to go in either direction. What happens when Rho becomes 1? No force can pull X back down in the next step.

Dickey-Fuller test of stationarity. What you just learnt in the last section is formally known as the Dickey-Fuller test. Here is a small tweak made to the equation above to convert it into a Dickey-Fuller test:

X(t) - X(t-1) = (Rho - 1) * X(t-1) + Er(t)

We have to test whether Rho - 1 is significantly different from zero or not. If the null hypothesis is rejected, we get a stationary time series. Stationarity testing, and converting a series into a stationary series, are the most critical processes in time series modeling. You need to memorize each and every detail of this concept to move on to the next step of time series modeling. Let's now consider an example to show you what a time series looks like.

2. Exploration of Time Series Data in R. Here we'll learn to handle time series data in R. Our scope will be restricted to exploring a time-series-type data set, without yet building time series models. I have used an inbuilt data set of R called AirPassengers, which consists of the monthly totals of international airline passengers from 1949 to 1960.

Loading the data set. Following is the code which will help you load the data set and spill out a few top-level metrics (a reconstructed sketch appears at the end of this section).

Important inferences: 1. The year-on-year trend clearly shows that the passenger numbers have been increasing without fail. 2. The variance and the mean value in July and August are much higher than in the rest of the months. 3. Even though the mean value of each month is quite different, the variance is small; hence, we have a strong seasonal effect with a cycle of 12 months or less. Exploring the data becomes most important in a time series model: without this exploration, you will not know whether a series is stationary or not. In this case we already know many details about the kind of model we are looking for. Let's now take up a few time series models and their characteristics. We will also take this problem forward and make a few predictions.
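The article's code block was lost from this copy; the following sketch reconstructs the exploration it describes, using only standard base-R functions:

data(AirPassengers)
class(AirPassengers)       # "ts"
start(AirPassengers); end(AirPassengers)
frequency(AirPassengers)   # 12: monthly data
summary(AirPassengers)
plot(AirPassengers)                                      # raw series: trend plus seasonality
abline(reg = lm(AirPassengers ~ time(AirPassengers)))    # year-on-year trend line
cycle(AirPassengers)                                     # month index of each observation
boxplot(AirPassengers ~ cycle(AirPassengers))            # seasonal effect across months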
3. Introduction to ARMA Time Series Modeling. ARMA models are commonly used in time series modeling. In an ARMA model, AR stands for auto-regression and MA stands for moving average. If these words sound intimidating to you, worry not: I'll simplify these concepts in the next few minutes. We will now develop a knack for these terms and understand the characteristics associated with these models. But before we start, you should remember that AR and MA models are not applicable to non-stationary series. If you have a non-stationary series, you first need to stationarize it (by taking a difference or a transformation) and then choose from the available time series models. First, I'll explain each of these two models (AR and MA) individually. Next, we will look at the characteristics of these models.

Auto-regressive time series model. Let's understand AR models using the case below. The current GDP of a country, say x(t), is dependent on last year's GDP, i.e. x(t-1). The hypothesis is that the total cost of production of products and services in a country in a fiscal year (known as GDP) depends on the set-up of manufacturing plants and services in the previous year, plus the newly set up industries, plants and services in the current year. But the primary component of the GDP is the former one. Hence, we can formally write the equation of GDP as:

x(t) = alpha * x(t-1) + error(t)

This equation is known as the AR(1) formulation. The numeral one (1) denotes that the next instance is solely dependent on the previous instance. Alpha is a coefficient which we seek so as to minimize the error function. Notice that x(t-1) is in turn linked to x(t-2) in the same fashion; hence, any shock to x(t) will gradually fade off in the future. For instance, let's say x(t) is the number of juice bottles sold in a city on a particular day. During winter, very few vendors purchased juice bottles. Suddenly, on a particular day, the temperature rose and the demand for juice bottles soared to 1,000. However, after a few days, the climate became cold again. But, knowing that people got used to drinking juice during the hot days, 50% of the people were still drinking juice during the cold days. In the following days, the proportion went down to 25% (50% of 50%), and then gradually to a small number after a significant number of days. The following graph explains the inertia property of an AR series.

Moving average time series model. Let's take another case to understand the moving average model. A manufacturer produces a certain type of bag, which was readily available in the market. Being a competitive market, the sales of the bag stood at zero for many days. So, one day he experimented with the design and produced a different type of bag, not available anywhere else in the market. Thus, he was able to sell the entire stock of 1,000 bags (let's call this x(t)). The demand got so high that the bag ran out of stock, and some 100-odd customers couldn't purchase it. Let's call this gap the error at that time point. With time, the bag lost its woo factor, but a few customers were still left who had gone empty-handed the previous day. Following is a simple formulation to depict the scenario:

x(t) = beta * error(t-1) + error(t)

If we plot this series, the shock appears briefly and then vanishes. Did you notice the difference between the MA and AR models? In an MA model, the noise/shock quickly vanishes with time; the AR model has a much longer-lasting effect of the shock.

Difference between AR and MA models. The primary difference between an AR and an MA model is based on the correlation between time series objects at different time points. The correlation between x(t) and x(t-n), for n greater than the order of the MA process, is always zero. This follows directly from the fact that the covariance between x(t) and x(t-n) is zero for MA models (something we saw in the example taken in the previous section). However, in the AR model, the correlation of x(t) and x(t-n) gradually declines as n becomes larger. This difference gets exploited irrespective of whether you have an AR model or an MA model. The correlation plot can give us the order of the MA model.
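The contrast in shock persistence between the two models can be seen numerically; a minimal sketch (base R only):

# Response of AR(1) vs MA(1) to a single unit shock at time 1
n <- 10
shock <- c(1, rep(0, n - 1))
filter(shock, 0.5, method = "recursive")   # AR(1): 1, 0.5, 0.25, ... decays gradually
filter(shock, c(1, 0.5), sides = 1)        # MA(1): NA, 0.5, 0, 0, ... gone after one lag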
Exploiting ACF and PACF plots. Once we have got a stationary time series, we must answer two primary questions. Q1: Is it an AR or an MA process? Q2: What order of AR or MA process do we need to use? The trick for solving these questions is available in the previous section. Didn't you notice? The first question can be answered using the total correlation chart (also known as the auto-correlation function, ACF). The ACF is a plot of the total correlation at different lags. For instance, in the GDP problem, the GDP at time point t is x(t), and we are interested in the correlation of x(t) with x(t-1), x(t-2), and so on. Now let's reflect on what we have learnt above. In a moving average series of lag n, we will not get any correlation between x(t) and x(t-n-1). Hence, the total correlation chart cuts off at the n-th lag, so it becomes simple to find the order of an MA series. For an AR series, this correlation will gradually go down without any cut-off. So what do we do if it is an AR series? Here is the second trick: if we find the partial correlation at each lag, it will cut off after the degree of the AR series. For instance, for an AR(1) series, if we exclude the effect of the 1st lag (x(t-1)), our 2nd lag (x(t-2)) is independent of x(t). Hence, the partial autocorrelation function (PACF) will drop sharply after the 1st lag. Examples will clarify any doubts you have about this concept (the plots appeared as images in the original article; a sketch to generate similar ones follows this section). In such plots, the blue lines mark values significantly different from zero. A graph with a cut-off on the PACF curve after the 2nd lag indicates mostly an AR(2) process, while a graph with a cut-off on the ACF curve after the 2nd lag indicates mostly an MA(2) process. Till now, we have covered how to identify the type of stationary series using ACF and PACF plots. Next, I'll introduce you to a comprehensive framework for building a time series model. In addition, we'll also discuss the practical applications of time series modeling.
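A minimal sketch that generates ACF/PACF plots comparable to the ones described above, from simulated series:

set.seed(1)
ar2 <- arima.sim(model = list(ar = c(0.6, 0.3)), n = 500)
ma2 <- arima.sim(model = list(ma = c(0.6, 0.3)), n = 500)
acf(ar2); pacf(ar2)   # ACF tails off; PACF cuts off after lag 2 -> AR(2)
acf(ma2); pacf(ma2)   # ACF cuts off after lag 2; PACF tails off -> MA(2)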
4. Framework and Application of ARIMA Time Series Modeling. A quick revision: till here we've learnt the basics of time series modeling, time series in R, and ARMA modeling. Now is the time to join these pieces and make an interesting story.

Overview of the framework. The framework specifies a step-by-step approach to how to do a time series analysis. The first three steps have already been discussed above; nevertheless, they are briefly summarized below.

Step 1: Visualize the time series. It is essential to analyze the trends prior to building any kind of time series model. The details we are interested in pertain to any kind of trend, seasonality or random behaviour in the series. We covered this in the second part of this tutorial.

Step 2: Stationarize the series. Once we know the patterns, trends, cycles and seasonality, we can check whether the series is stationary or not. Dickey-Fuller is one of the popular tests to check this; we covered it in the first part of this tutorial. But it doesn't end here: what if the series is found to be non-stationary? There are three commonly used techniques to make a time series stationary. 1. Detrending: here, we simply remove the trend component from the time series. For instance, if the equation of my time series is x(t) = (mean + trend * t) + error, we simply remove the part in the parentheses and build a model for the rest. 2. Differencing: this is the commonly used technique to remove non-stationarity. Here we model the differences of the terms rather than the terms themselves, for instance x(t) - x(t-1) = ARMA(p, q). This differencing is called the integration part in AR(I)MA. We now have three parameters: p (the AR order), d (the degree of differencing) and q (the MA order). 3. Seasonality: seasonality can easily be incorporated in the ARIMA model directly. More on this is discussed in the applications part below.

Step 3: Find optimal parameters. The parameters p, d, q can be found using ACF and PACF plots. An addition to this approach: if both the ACF and PACF decrease gradually, it indicates that we need to make the time series stationary and introduce a value for d.

Step 4: Build the ARIMA model. With the parameters in hand, we can now try to build an ARIMA model. The values found in the previous step might be approximate estimates, and we need to explore more (p, d, q) combinations. The one with the lowest BIC and AIC should be our choice. We can also try some models with a seasonal component, in case we notice any seasonality in the ACF/PACF plots.

Step 5: Make predictions. Once we have the final ARIMA model, we are ready to make predictions on future time points. We can also visualize the trends to cross-validate whether the model works fine.

Applications of the time series model. Now, we'll use the same example that we have used above and, using time series, make future predictions. Where did we start? We began with the plot of the number of passengers by year. Try to make observations on that plot before moving further. Here are my observations: 1. There is a trend component which grows the passenger numbers year by year. 2. There looks to be a seasonal component with a cycle of less than 12 months. 3. The variance in the data keeps increasing with time. We know that we need to address two issues before we test for a stationary series. One, we need to remove the unequal variances; we do this using the log of the series. Two, we need to address the trend component; we do this by taking the difference of the series. Now, let's test the resulting series with the augmented Dickey-Fuller test. We see that the series is stationary enough to do any kind of time series modeling. The next step is to find the right parameters to be used in the ARIMA model. We already know that the d component is 1, as we need one difference to make the series stationary. We do this using the correlation plots. Looking at the ACF plot for the log series, the decay of the ACF chart is very slow, which means that the series is not stationary. We have already discussed above that we now intend to regress on the difference of logs rather than on the log directly. Let's see how the ACF and PACF curves come out after regressing on the difference: clearly, the ACF plot cuts off after the first lag. Hence we understood that the value of p should be 0, as it is the ACF that shows a cut-off, while the value of q should be 1 or 2. After a few iterations, we found that (0,1,1) as (p, d, q) comes out to be the combination with the least AIC and BIC. Let's fit an ARIMA model and predict the next 10 years. We will also fit a seasonal component in the ARIMA formulation, and then visualize the prediction along with the training data. You can use the following code to do the same:
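The article's code block was stripped during extraction; based on the discussion here and in the comments below (which reference this exact call), it can be reconstructed as:

# ADF test on the differenced log series (tseries package)
library(tseries)
adf.test(diff(log(AirPassengers)), alternative = "stationary", k = 0)

# Seasonal ARIMA(0,1,1)(0,1,1)[12] on the log series, then a 10-year forecast
fit <- arima(log(AirPassengers), c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
pred <- predict(fit, n.ahead = 10 * 12)
ts.plot(AirPassengers, 2.718^pred$pred, log = "y", lty = c(1, 3))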
End notes. With this, we come to the end of this tutorial on time series modeling. I hope it helps you improve your knowledge of working with time-based data. To reap maximum benefit from this tutorial, I'd suggest you practice these R codes side by side and check your progress. Did you find the article useful? Share with us if you have done a similar kind of analysis before. Do let us know your thoughts about this article in the box below. If you like what you just read and want to continue your analytics learning, subscribe to our emails, follow us on twitter or like our facebook page.

Share this:

Hi Tavish. First of all, congratulations on your work around here. It's been very useful. Thank you! I have a doubt and I hope that you can help me. I performed a Dickey-Fuller test on both series, AirPassengers and diff(log(AirPassengers)). Here are the results:

Augmented Dickey-Fuller Test. data: diff(log(AirPassengers)). Dickey-Fuller = -9.6003, Lag order = 0, p-value = 0.01. alternative hypothesis: stationary

Augmented Dickey-Fuller Test. data: diff(log(AirPassengers)). Dickey-Fuller = -9.6003, Lag order = 0, p-value = 0.01. alternative hypothesis: stationary

In both tests I got a small p-value, which allows me to reject the non-stationarity hypothesis. Am I right? If so, the first series is already stationary. This means that if I had performed a stationarity test on the original series I could have moved on to the next step. Thanks in advance.

Now with the right results:

Augmented Dickey-Fuller Test. data: AirPassengers. Dickey-Fuller = -4.6392, Lag order = 0, p-value = 0.01. alternative hypothesis: stationary

Augmented Dickey-Fuller Test. data: diff(log(AirPassengers)). Dickey-Fuller = -9.6003, Lag order = 0, p-value = 0.01. alternative hypothesis: stationary

Yes, adf.test(AirPassengers) indicates that the series is stationary, but this is a bit misleading. Reason: this test first de-trends the series (i.e. removes the trend component), then checks for stationarity; hence it flags the series as stationary. There is another test in the package fUnitRoots. Please try this code:

install.packages("fUnitRoots")  # if you have already installed this package, you can omit this line
library(fUnitRoots)
adfTest(AirPassengers)
adfTest(log(AirPassengers))
adfTest(diff(AirPassengers))

Hope this helps.

Thanks Ram, I had the same question as Hugo and your explanation helped. I just wanted to point out, for the benefit of anyone else looking at this, that R is case sensitive: do not forget to capitalize the T in adfTest, or your function call will not work.

Fortunately the auto.arima function allows us to model time series quite nicely, though it is quite useful to know the basics. Here is some code I wrote on the same data.

Hi, after you run pred <- predict(APmodel, n.ahead = 10*12), take a look at 'pred'. It is a list of 2 (pred and se; I assume these are predictions and standard errors). I would suggest using a name other than pred in the predict function to avoid confusion. I used the following: APforecast <- predict(APmodel, n.ahead = 10*12). So APforecast is a list of pred and se, and we need to plot the pred values, i.e. APforecast$pred. Also, we ran the arima on the log of AirPassengers, so the forecast we have got is actually the log of the true forecast. Hence we need to find the antilog of what we have got, i.e. log(forecast) = APforecast$pred, so forecast = e^(APforecast$pred), with e = 2.718. If you find that confusing, I would suggest reading up on natural logarithms and their inverse. The log = "y" argument is to plot on a logarithmic scale; this is not needed, so try the function with and without it and observe the results. The lty bit I have not figured out yet; drop it and try the ts.plot, it works fine.

Hey Amy, ts.plot() will plot several time series on the same plot. The first two entries are the two time series he's plotting.
The last two entries are nice visual parameters (we'll come back to those). Clearly, this plots the AirPassengers time series as a dark, continuous line. The second entry is also a time series, but it is a little more confusing: 2.718^pred$pred. First, you have to know what pred$pred is. The function predict() here is a generic function that works differently for different classes plugged into it (it says so if you type ?predict). The class we're working with is an Arima class. If you type ?predict.Arima you will find a good description of what the function is all about. predict.Arima() spits out something with a 'pred' part (for prediction) and an 'se' part (for standard error). We want the 'pred' part, hence pred$pred. So, pred$pred is a time series, and therefore 2.718^pred$pred is one too. You have to remember that 2.718 is approximately the constant e, and then this makes sense: he's just undoing the log that he placed on the data when he created 'fit'. As for the last two parameters, log = "y" sets the y-axis to a log scale, and finally lty = c(1,3) sets the line type to 1 (solid) for the original time series and 3 (dotted) for the predicted time series.

Hey Tavish, really enjoyed the content. Just a small doubt: can you please elaborate on the covariance condition for stationarity? I understand the covariance term, but here in time series it is not coming to my mind. Can you please help me understand the third condition of a stationary series, i.e. "The covariance of the i-th term and the (i+m)-th term should not be a function of time"? Please help me understand from a data perspective, e.g. if I have sales data for each date, how would you explain covariance with a real-life example using daily sales data?

Parth Gera says: Hi Tavish, thanks a lot. This article was immensely helpful. I just had one small issue: after the last step, if I want to extract the predicted values from the curve, how do we do that?

You get the predicted values from the variable pred. pred is a list with two items, pred and se (prediction and standard error). To see the predictions, use this command: print(pred$pred).

Parth Gera says: Hi Ram, thanks for your help. Yeah, print(pred$pred) would give us the log of the predicted values; print(2.718^pred$pred) would give us the actual predicted values. Thanks.

Yes, if you use log when creating the model, you will use the antilog, or exponent, to get the predicted values. If you create a model without the log function, you will not use the exponent to get the predicted values.

How do I extract the data for the predicted and actual values from R?

Hello, the data you used in your tutorial, AirPassengers, is already a time series object. My question is, HOW can I make/prepare my own time series object? I currently have a historical currency exchange data set, with the first column being the date, and the other 20 columns, titled by country, holding the exchange rates. After I convert my date column into a date object, when I use the same commands used in your tutorial, the results are funny. For example, start(dataDate) will give me a result of: 1 1 1, and frequency(dataDate) will return: 1 1. Can you please explain HOW to prepare our data so we can use these functions? Thank you.

If you type ?ts then you should be on your way. You only need a (single) time series, a frequency, and a start date. The examples at the bottom of the documentation should be very helpful.
I'm guessing you'd write something like ts(yourtimeseriesdata, frequency = 365, start = c(1980, 153)), for instance, if your data started on the 153rd day of 1980.

Using R for Time Series Analysis

Time Series Analysis. This booklet tells you how to use the R statistical software to carry out some simple analyses that are common in analysing time series data. This booklet assumes that the reader has some basic knowledge of time series analysis, and the principal focus of the booklet is not to explain time series analysis, but rather to explain how to carry out these analyses using R. If you are new to time series analysis, and want to learn more about any of the concepts presented here, I would highly recommend the Open University book "Time series" (product code M249/02), available from the Open University Shop. In this booklet, I will be using time series data sets that have been kindly made available by Rob Hyndman in his Time Series Data Library at robjhyndman/TSDL/. If you like this booklet, you may also like to check out my booklet on using R for biomedical statistics, a-little-book-of-r-for-biomedical-statistics.readthedocs.org/, and my booklet on using R for multivariate analysis, little-book-of-r-for-multivariate-analysis.readthedocs.org/.

Reading Time Series Data. The first thing that you will want to do to analyse your time series data will be to read it into R, and to plot the time series. You can read data into R using the scan() function, which assumes that your data for successive time points is in a simple text file with one column. For example, the file robjhyndman/tsdldata/misc/kings.dat contains data on the age of death of successive kings of England, starting with William the Conqueror (original source: Hipel and McLeod, 1994). The first three lines of the file contain some comments on the data, and we want to ignore these when we read the data into R. We can do this by using the 'skip' parameter of the scan() function, which specifies how many lines at the top of the file to ignore. To read the file into R, ignoring the first three lines, we type the first command sketched at the end of this section. In this case the age of death of 42 successive kings of England has been read into the variable 'kings'. Once you have read the time series data into R, the next step is to store the data in a time series object in R, so that you can use R's many functions for analysing time series data. To store the data in a time series object, we use the ts() function in R; for example, to store the data in the variable 'kings' as a time series object in R, we type the second command sketched below. Sometimes the time series data set that you have may have been collected at regular intervals that were less than one year, for example, monthly or quarterly. In this case, you can specify the number of times that data was collected per year by using the 'frequency' parameter in the ts() function. For monthly time series data, you set frequency = 12, while for quarterly time series data, you set frequency = 4. You can also specify the first year that the data was collected, and the first interval in that year, by using the 'start' parameter in the ts() function. For example, if the first data point corresponds to the second quarter of 1986, you would set start = c(1986, 2). An example is a data set of the number of births per month in New York city, from January 1946 to December 1959 (originally collected by Newton). This data is available in the file robjhyndman/tsdldata/data/nybirths.dat. We can read the data into R, and store it as a time series object, by typing the commands sketched below. Similarly, the file robjhyndman/tsdldata/data/fancy.dat contains monthly sales for a souvenir shop at a beach resort town in Queensland, Australia, for January 1987 to December 1993 (original data from Wheelwright and Hyndman, 1998). We can read that data into R in the same way.
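The booklet's code was stripped from this copy; the commands it describes can be reconstructed as follows (the robjhyndman.com TSDL URLs are as given in the text and may no longer resolve):

kings <- scan("http://robjhyndman.com/tsdldata/misc/kings.dat", skip = 3)
kingstimeseries <- ts(kings)
births <- scan("http://robjhyndman.com/tsdldata/data/nybirths.dat")
birthstimeseries <- ts(births, frequency = 12, start = c(1946, 1))
souvenir <- scan("http://robjhyndman.com/tsdldata/data/fancy.dat")
souvenirtimeseries <- ts(souvenir, frequency = 12, start = c(1987, 1))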
This data is available in the file robjhyndman/tsdldata/data/nybirths.dat. We can read the data into R, and store it as a time series object, by typing: Similarly, the file robjhyndman/tsdldata/data/fancy.dat contains monthly sales for a souvenir shop at a beach resort town in Queensland, Australia, for January 1987-December 1993 (original data from Wheelwright and Hyndman, 1998). We can read the data into R by typing:

Plotting Time Series

Once you have read a time series into R, the next step is usually to make a plot of the time series data, which you can do with the plot.ts() function in R. For example, to plot the time series of the age of death of 42 successive kings of England, we type: We can see from the time plot that this time series could probably be described using an additive model, since the random fluctuations in the data are roughly constant in size over time. Likewise, to plot the time series of the number of births per month in New York city, we type: We can see from this time series that there seems to be seasonal variation in the number of births per month: there is a peak every summer, and a trough every winter. Again, it seems that this time series could probably be described using an additive model, as the seasonal fluctuations are roughly constant in size over time and do not seem to depend on the level of the time series, and the random fluctuations also seem to be roughly constant in size over time. Similarly, to plot the time series of the monthly sales for the souvenir shop at a beach resort town in Queensland, Australia, we type: In this case, it appears that an additive model is not appropriate for describing this time series, since the size of the seasonal fluctuations and random fluctuations seem to increase with the level of the time series. Thus, we may need to transform the time series in order to get a transformed time series that can be described using an additive model. For example, we can transform the time series by calculating the natural log of the original data: Here we can see that the size of the seasonal fluctuations and random fluctuations in the log-transformed time series seem to be roughly constant over time, and do not depend on the level of the time series. Thus, the log-transformed time series can probably be described using an additive model.

Decomposing Time Series

Decomposing a time series means separating it into its constituent components, which are usually a trend component and an irregular component, and, if it is a seasonal time series, a seasonal component.

Decomposing Non-Seasonal Data

A non-seasonal time series consists of a trend component and an irregular component. Decomposing the time series involves trying to separate the time series into these components, that is, estimating the trend component and the irregular component. To estimate the trend component of a non-seasonal time series that can be described using an additive model, it is common to use a smoothing method, such as calculating the simple moving average of the time series. The SMA() function in the "TTR" R package can be used to smooth time series data using a simple moving average. To use this function, we first need to install the "TTR" R package (for instructions on how to install an R package, see How to install an R package). Once you have installed the "TTR" R package, you can load it by typing: You can then use the SMA() function to smooth time series data.
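Sketches of the commands described above (file URLs abbreviated in the text are assumed in full here):

    births <- scan("http://robjhyndman.com/tsdldata/data/nybirths.dat")
    birthstimeseries <- ts(births, frequency = 12, start = c(1946, 1))
    souvenir <- scan("http://robjhyndman.com/tsdldata/data/fancy.dat")
    souvenirtimeseries <- ts(souvenir, frequency = 12, start = c(1987, 1))
    plot.ts(kingstimeseries)   # roughly constant random fluctuations: additive model
    plot.ts(birthstimeseries)  # clear seasonal pattern
    plot.ts(souvenirtimeseries)
    logsouvenirtimeseries <- log(souvenirtimeseries)
    plot.ts(logsouvenirtimeseries)  # fluctuations now roughly constant in size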
To use the SMA() function, you need to specify the order (span) of the simple moving average, using the parameter "n". For example, to calculate a simple moving average of order 5, we set n=5 in the SMA() function. For example, as discussed above, the time series of the age of death of 42 successive kings of England appears to be non-seasonal, and can probably be described using an additive model, since the random fluctuations in the data are roughly constant in size over time. Thus, we can try to estimate the trend component of this time series by smoothing using a simple moving average. To smooth the time series using a simple moving average of order 3, and plot the smoothed time series data, we type: There still appears to be quite a lot of random fluctuation in the time series smoothed using a simple moving average of order 3. Thus, to estimate the trend component more accurately, we might want to try smoothing the data with a simple moving average of a higher order. This takes a little bit of trial and error, to find the right amount of smoothing. For example, we can try using a simple moving average of order 8: The data smoothed with a simple moving average of order 8 gives a clearer picture of the trend component, and we can see that the age of death of the English kings seems to have decreased from about 55 years old to about 38 years old during the reign of the first 20 kings, and then increased after that to about 73 years old by the end of the reign of the 40th king in the time series.

Decomposing Seasonal Data

A seasonal time series consists of a trend component, a seasonal component and an irregular component. Decomposing the time series means separating the time series into these three components: that is, estimating these three components. To estimate the trend component and seasonal component of a seasonal time series that can be described using an additive model, we can use the decompose() function in R. This function estimates the trend, seasonal, and irregular components of a time series that can be described using an additive model. The function decompose() returns a list object as its result, where the estimates of the seasonal component, trend component and irregular component are stored in named elements of that list object, called "seasonal", "trend", and "random" respectively. For example, as discussed above, the time series of the number of births per month in New York city is seasonal with a peak every summer and trough every winter, and can probably be described using an additive model since the seasonal and random fluctuations seem to be roughly constant in size over time. To estimate the trend, seasonal and irregular components of this time series, we type: The estimated values of the seasonal, trend and irregular components are now stored in the variables birthstimeseriescomponents$seasonal, birthstimeseriescomponents$trend and birthstimeseriescomponents$random. For example, we can print out the estimated values of the seasonal component by typing: The estimated seasonal factors are given for the months January-December, and are the same for each year. The largest seasonal factor is for July (about 1.46), and the lowest is for February (about -2.08), indicating that there seems to be a peak in births in July and a trough in births in February each year.
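A sketch of the smoothing and decomposition commands just described:

    library(TTR)  # install.packages("TTR") first if necessary
    kingstimeseriesSMA3 <- SMA(kingstimeseries, n = 3)  # moving average of order 3
    plot.ts(kingstimeseriesSMA3)
    kingstimeseriesSMA8 <- SMA(kingstimeseries, n = 8)  # higher order, smoother trend
    plot.ts(kingstimeseriesSMA8)
    birthstimeseriescomponents <- decompose(birthstimeseries)
    birthstimeseriescomponents$seasonal  # print the estimated monthly seasonal factors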
We can plot the estimated trend, seasonal, and irregular components of the time series by using the plot() function, for example: The plot above shows the original time series (top), the estimated trend component (second from top), the estimated seasonal component (third from top), and the estimated irregular component (bottom). We see that the estimated trend component shows a small decrease from about 24 in 1947 to about 22 in 1948, followed by a steady increase from then on to about 27 in 1959.

Seasonally Adjusting

If you have a seasonal time series that can be described using an additive model, you can seasonally adjust the time series by estimating the seasonal component, and subtracting the estimated seasonal component from the original time series. We can do this using the estimate of the seasonal component calculated by the decompose() function. For example, to seasonally adjust the time series of the number of births per month in New York city, we can estimate the seasonal component using decompose(), and then subtract the seasonal component from the original time series: We can then plot the seasonally adjusted time series using the plot() function, by typing: You can see that the seasonal variation has been removed from the seasonally adjusted time series. The seasonally adjusted time series now just contains the trend component and an irregular component.

Forecasts using Exponential Smoothing

Exponential smoothing can be used to make short-term forecasts for time series data.

Simple Exponential Smoothing

If you have a time series that can be described using an additive model with constant level and no seasonality, you can use simple exponential smoothing to make short-term forecasts. The simple exponential smoothing method provides a way of estimating the level at the current time point. Smoothing is controlled by the parameter alpha for the estimate of the level at the current time point. The value of alpha lies between 0 and 1. Values of alpha that are close to 0 mean that little weight is placed on the most recent observations when making forecasts of future values. For example, the file robjhyndman/tsdldata/hurst/precip1.dat contains total annual rainfall in inches for London, from 1813-1912 (original data from Hipel and McLeod, 1994). We can read the data into R and plot it by typing: You can see from the plot that there is a roughly constant level (the mean stays constant at about 25 inches). The random fluctuations in the time series seem to be roughly constant in size over time, so it is probably appropriate to describe the data using an additive model. Thus, we can make forecasts using simple exponential smoothing. To make forecasts using simple exponential smoothing in R, we can fit a simple exponential smoothing predictive model using the HoltWinters() function in R. To use HoltWinters() for simple exponential smoothing, we need to set the parameters beta=FALSE and gamma=FALSE in the HoltWinters() function (the beta and gamma parameters are used for Holt's exponential smoothing, or Holt-Winters exponential smoothing, as described below). The HoltWinters() function returns a list variable, which contains several named elements. For example, to use simple exponential smoothing to make forecasts for the time series of annual rainfall in London, we type: The output of HoltWinters() tells us that the estimated value of the alpha parameter is about 0.024.
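A sketch of the commands in this passage; the URL and the 'skip' value for the rainfall file are assumptions (set 'skip' to the number of header lines in the file):

    plot(birthstimeseriescomponents)  # original, trend, seasonal and random panels
    birthstimeseriesseasonallyadjusted <- birthstimeseries - birthstimeseriescomponents$seasonal
    plot(birthstimeseriesseasonallyadjusted)
    rain <- scan("http://robjhyndman.com/tsdldata/hurst/precip1.dat", skip = 1)  # assumed URL and skip
    rainseries <- ts(rain, start = c(1813))
    plot.ts(rainseries)
    rainseriesforecasts <- HoltWinters(rainseries, beta = FALSE, gamma = FALSE)
    rainseriesforecasts  # prints the estimated alpha (about 0.024)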
This is very close to zero, telling us that the forecasts are based on both recent and less recent observations (although somewhat more weight is placed on recent observations). By default, HoltWinters() just makes forecasts for the same time period covered by our original time series. In this case, our original time series included rainfall for London from 1813-1912, so the forecasts are also for 1813-1912. In the example above, we have stored the output of the HoltWinters() function in the list variable "rainseriesforecasts". The forecasts made by HoltWinters() are stored in a named element of this list variable called "fitted", so we can get their values by typing: We can plot the original time series against the forecasts by typing: The plot shows the original time series in black, and the forecasts as a red line. The time series of forecasts is much smoother than the time series of the original data here. As a measure of the accuracy of the forecasts, we can calculate the sum of squared errors for the in-sample forecast errors, that is, the forecast errors for the time period covered by our original time series. The sum of squared errors is stored in a named element of the list variable "rainseriesforecasts" called "SSE", so we can get its value by typing: That is, here the sum of squared errors is 1828.855. It is common in simple exponential smoothing to use the first value in the time series as the initial value for the level. For example, in the time series for rainfall in London, the first value is 23.56 (inches) for rainfall in 1813. You can specify the initial value for the level in the HoltWinters() function by using the "l.start" parameter. For example, to make forecasts with the initial value of the level set to 23.56, we type: As explained above, by default HoltWinters() just makes forecasts for the time period covered by the original data, which is 1813-1912 for the rainfall time series. We can make forecasts for further time points by using the forecast.HoltWinters() function in the R "forecast" package. To use the forecast.HoltWinters() function, we first need to install the "forecast" R package (for instructions on how to install an R package, see How to install an R package). Once you have installed the "forecast" R package, you can load it by typing: When using the forecast.HoltWinters() function, as its first argument (input), you pass it the predictive model that you have already fitted using the HoltWinters() function. For example, in the case of the rainfall time series, we stored the predictive model made using HoltWinters() in the variable "rainseriesforecasts". You specify how many further time points you want to make forecasts for by using the "h" parameter in forecast.HoltWinters(). For example, to make a forecast of rainfall for the years 1913-1920 (8 more years) using forecast.HoltWinters(), we type: The forecast.HoltWinters() function gives you the forecast for a year, an 80% prediction interval for the forecast, and a 95% prediction interval for the forecast. For example, the forecasted rainfall for 1920 is about 24.68 inches, with a 95% prediction interval of (16.24, 33.11). To plot the predictions made by forecast.HoltWinters(), we can use the plot.forecast() function: Here the forecasts for 1913-1920 are plotted as a blue line, the 80% prediction interval as an orange shaded area, and the 95% prediction interval as a yellow shaded area.
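A sketch of these steps; note that in current versions of the forecast package the generic forecast() plays the role that forecast.HoltWinters() plays in the text:

    rainseriesforecasts$fitted  # the in-sample forecasts
    plot(rainseriesforecasts)   # observed series in black, fitted values in red
    rainseriesforecasts$SSE     # sum of squared in-sample errors (1828.855)
    HoltWinters(rainseries, beta = FALSE, gamma = FALSE, l.start = 23.56)
    library(forecast)  # install.packages("forecast") first if necessary
    rainseriesforecasts2 <- forecast(rainseriesforecasts, h = 8)  # forecast.HoltWinters() in older versions
    rainseriesforecasts2        # point forecasts with 80% and 95% intervals
    plot(rainseriesforecasts2)  # plot.forecast() in older versions of the package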
The 'forecast errors' are calculated as the observed values minus the predicted values, for each time point. We can only calculate the forecast errors for the time period covered by our original time series, which is 1813-1912 for the rainfall data. As mentioned above, one measure of the accuracy of the predictive model is the sum of squared errors (SSE) for the in-sample forecast errors. The in-sample forecast errors are stored in the named element "residuals" of the list variable returned by forecast.HoltWinters(). If the predictive model cannot be improved upon, there should be no correlations between forecast errors for successive predictions. In other words, if there are correlations between forecast errors for successive predictions, it is likely that the simple exponential smoothing forecasts could be improved upon by another forecasting technique. To figure out whether this is the case, we can obtain a correlogram of the in-sample forecast errors for lags 1-20. We can calculate a correlogram of the forecast errors using the acf() function in R. To specify the maximum lag that we want to look at, we use the "lag.max" parameter in acf(). For example, to calculate a correlogram of the in-sample forecast errors for the London rainfall data for lags 1-20, we type: You can see from the sample correlogram that the autocorrelation at lag 3 is just touching the significance bounds. To test whether there is significant evidence for non-zero correlations at lags 1-20, we can carry out a Ljung-Box test. This can be done in R using the Box.test() function. The maximum lag that we want to look at is specified using the "lag" parameter in the Box.test() function. For example, to test whether there are non-zero autocorrelations at lags 1-20, for the in-sample forecast errors for the London rainfall data, we type: Here the Ljung-Box test statistic is 17.4, and the p-value is 0.6, so there is little evidence of non-zero autocorrelations in the in-sample forecast errors at lags 1-20. To be sure that the predictive model cannot be improved upon, it is also a good idea to check whether the forecast errors are normally distributed with mean zero and constant variance. To check whether the forecast errors have constant variance, we can make a time plot of the in-sample forecast errors: The plot shows that the in-sample forecast errors seem to have roughly constant variance over time, although the size of the fluctuations at the start of the time series (1820-1830) may be slightly less than that at later dates (e.g. 1840-1850). To check whether the forecast errors are normally distributed with mean zero, we can plot a histogram of the forecast errors, with an overlaid normal curve that has mean zero and the same standard deviation as the distribution of forecast errors. To do this, we can define an R function plotForecastErrors() (a sketch of such a function is given after this paragraph). You will have to copy the function into R in order to use it. You can then use plotForecastErrors() to plot a histogram (with overlaid normal curve) of the forecast errors for the rainfall predictions: The plot shows that the distribution of forecast errors is roughly centred on zero, and is more or less normally distributed, although it seems to be slightly skewed to the right compared to a normal curve. However, the right skew is relatively small, and so it is plausible that the forecast errors are normally distributed with mean zero.
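Sketches of the residual checks, followed by one possible plotForecastErrors() function. The booklet's own function body is not reproduced in the text above, so the definition below is a minimal reconstruction of the behaviour described (a density histogram of the errors with a normal curve of mean zero and matching standard deviation overlaid):

    acf(rainseriesforecasts2$residuals, lag.max = 20, na.action = na.pass)
    Box.test(rainseriesforecasts2$residuals, lag = 20, type = "Ljung-Box")
    plot.ts(rainseriesforecasts2$residuals)  # time plot of the in-sample forecast errors

    # Minimal sketch of a plotForecastErrors() function as described above:
    plotForecastErrors <- function(forecasterrors) {
      forecasterrors <- na.omit(forecasterrors)
      mysd <- sd(forecasterrors)
      # histogram of the forecast errors, scaled to a density:
      hist(forecasterrors, col = "red", freq = FALSE)
      # overlay a normal curve with mean zero and the same standard deviation:
      curve(dnorm(x, mean = 0, sd = mysd), col = "blue", lwd = 2, add = TRUE)
    }
    plotForecastErrors(rainseriesforecasts2$residuals)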
The Ljung-Box test showed that there is little evidence of non-zero autocorrelations in the in-sample forecast errors, and the distribution of forecast errors seems to be normally distributed with mean zero. This suggests that the simple exponential smoothing method provides an adequate predictive model for London rainfall, which probably cannot be improved upon. Furthermore, the assumptions that the 80% and 95% prediction intervals were based upon (that there are no autocorrelations in the forecast errors, and the forecast errors are normally distributed with mean zero and constant variance) are probably valid.

Holt's Exponential Smoothing

If you have a time series that can be described using an additive model with an increasing or decreasing trend and no seasonality, you can use Holt's exponential smoothing to make short-term forecasts. Holt's exponential smoothing estimates the level and slope at the current time point. Smoothing is controlled by two parameters: alpha, for the estimate of the level at the current time point, and beta, for the estimate of the slope b of the trend component at the current time point. As with simple exponential smoothing, the parameters alpha and beta have values between 0 and 1, and values that are close to 0 mean that little weight is placed on the most recent observations when making forecasts of future values. An example of a time series that can probably be described using an additive model with a trend and no seasonality is the time series of the annual diameter of women's skirts at the hem, from 1866 to 1911. The data is available in the file robjhyndman/tsdldata/roberts/skirts.dat (original data from Hipel and McLeod, 1994). We can read in and plot the data in R by typing: We can see from the plot that there was an increase in hem diameter from about 600 in 1866 to about 1050 in 1880, and that afterwards the hem diameter decreased to about 520 in 1911. To make forecasts, we can fit a predictive model using the HoltWinters() function in R. To use HoltWinters() for Holt's exponential smoothing, we need to set the parameter gamma=FALSE (the gamma parameter is used for Holt-Winters exponential smoothing, as described below). For example, to use Holt's exponential smoothing to fit a predictive model for skirt hem diameter, we type: The estimated value of alpha is 0.84, and of beta is 1.00. These are both high, telling us that both the estimate of the current value of the level, and of the slope b of the trend component, are based mostly upon very recent observations in the time series. This makes good intuitive sense, since the level and the slope of the time series both change quite a lot over time. The value of the sum of squared errors for the in-sample forecast errors is 16954. We can plot the original time series as a black line, with the forecasted values as a red line on top of that, by typing: We can see from the picture that the in-sample forecasts agree pretty well with the observed values, although they tend to lag behind the observed values a little bit. If you wish, you can specify the initial values of the level and the slope b of the trend component by using the "l.start" and "b.start" arguments of the HoltWinters() function. It is common to set the initial value of the level to the first value in the time series (608 for the skirts data), and the initial value of the slope to the second value minus the first value (9 for the skirts data).
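A sketch of reading and fitting the skirts data (the full URL and the 'skip' value are assumptions):

    skirts <- scan("http://robjhyndman.com/tsdldata/roberts/skirts.dat", skip = 5)  # assumed URL and skip
    skirtsseries <- ts(skirts, start = c(1866))
    plot.ts(skirtsseries)
    skirtsseriesforecasts <- HoltWinters(skirtsseries, gamma = FALSE)
    skirtsseriesforecasts      # estimated alpha about 0.84, beta about 1.00
    skirtsseriesforecasts$SSE  # about 16954
    plot(skirtsseriesforecasts)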
For example, to fit a predictive model to the skirt hem data using Holt's exponential smoothing, with initial values of 608 for the level and 9 for the slope b of the trend component, we type: As for simple exponential smoothing, we can make forecasts for future times not covered by the original time series by using the forecast.HoltWinters() function in the "forecast" package. For example, our time series data for skirt hems was for 1866 to 1911, so we can make predictions for 1912 to 1930 (19 more data points), and plot them, by typing: The forecasts are shown as a blue line, with the 80% prediction intervals as an orange shaded area, and the 95% prediction intervals as a yellow shaded area. As for simple exponential smoothing, we can check whether the predictive model could be improved upon by checking whether the in-sample forecast errors show non-zero autocorrelations at lags 1-20. For example, for the skirt hem data, we can make a correlogram, and carry out the Ljung-Box test, by typing: Here the correlogram shows that the sample autocorrelation for the in-sample forecast errors at lag 5 exceeds the significance bounds. However, we would expect one in 20 of the autocorrelations for the first twenty lags to exceed the 95% significance bounds by chance alone. Indeed, when we carry out the Ljung-Box test, the p-value is 0.47, indicating that there is little evidence of non-zero autocorrelations in the in-sample forecast errors at lags 1-20. As for simple exponential smoothing, we should also check that the forecast errors have constant variance over time, and are normally distributed with mean zero. We can do this by making a time plot of the forecast errors, and a histogram of the distribution of forecast errors with an overlaid normal curve: The time plot of forecast errors shows that the forecast errors have roughly constant variance over time. The histogram of forecast errors shows that it is plausible that the forecast errors are normally distributed with mean zero and constant variance. Thus, the Ljung-Box test shows that there is little evidence of autocorrelations in the forecast errors, while the time plot and histogram of forecast errors show that it is plausible that the forecast errors are normally distributed with mean zero and constant variance. Therefore, we can conclude that Holt's exponential smoothing provides an adequate predictive model for skirt hem diameters, which probably cannot be improved upon. In addition, it means that the assumptions that the 80% and 95% prediction intervals were based upon are probably valid.

Holt-Winters Exponential Smoothing

If you have a time series that can be described using an additive model with an increasing or decreasing trend and seasonality, you can use Holt-Winters exponential smoothing to make short-term forecasts. Holt-Winters exponential smoothing estimates the level, slope and seasonal component at the current time point. Smoothing is controlled by three parameters: alpha, beta, and gamma, for the estimates of the level, the slope b of the trend component, and the seasonal component, respectively, at the current time point. The parameters alpha, beta and gamma all have values between 0 and 1, and values that are close to 0 mean that relatively little weight is placed on the most recent observations when making forecasts of future values.
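A sketch of the fitting, forecasting and checking steps just described (forecast() is used in place of the older forecast.HoltWinters()):

    HoltWinters(skirtsseries, gamma = FALSE, l.start = 608, b.start = 9)
    skirtsseriesforecasts2 <- forecast(skirtsseriesforecasts, h = 19)
    plot(skirtsseriesforecasts2)
    acf(skirtsseriesforecasts2$residuals, lag.max = 20, na.action = na.pass)
    Box.test(skirtsseriesforecasts2$residuals, lag = 20, type = "Ljung-Box")
    plot.ts(skirtsseriesforecasts2$residuals)
    plotForecastErrors(skirtsseriesforecasts2$residuals)  # function defined earlier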
An example of a time series that can probably be described using an additive model with a trend and seasonality is the time series of the log of monthly sales for the souvenir shop at a beach resort town in Queensland, Australia (discussed above). To make forecasts, we can fit a predictive model using the HoltWinters() function. For example, to fit a predictive model for the log of the monthly sales in the souvenir shop, we type: The estimated values of alpha, beta and gamma are 0.41, 0.00, and 0.96, respectively. The value of alpha (0.41) is relatively low, indicating that the estimate of the level at the current time point is based upon both recent observations and some observations in the more distant past. The value of beta is 0.00, indicating that the estimate of the slope b of the trend component is not updated over the time series, and instead is set equal to its initial value. This makes good intuitive sense, as the level changes quite a bit over the time series, but the slope b of the trend component remains roughly the same. In contrast, the value of gamma (0.96) is high, indicating that the estimate of the seasonal component at the current time point is just based upon very recent observations. As for simple exponential smoothing and Holt's exponential smoothing, we can plot the original time series as a black line, with the forecasted values as a red line on top of that: We see from the plot that the Holt-Winters exponential method is very successful in predicting the seasonal peaks, which occur roughly in November every year. To make forecasts for future times not included in the original time series, we use the forecast.HoltWinters() function in the "forecast" package. For example, the original data for the souvenir sales is from January 1987 to December 1993. If we wanted to make forecasts for January 1994 to December 1998 (48 more months), and plot the forecasts, we would type: The forecasts are shown as a blue line, and the orange and yellow shaded areas show the 80% and 95% prediction intervals, respectively. We can investigate whether the predictive model can be improved upon by checking whether the in-sample forecast errors show non-zero autocorrelations at lags 1-20, by making a correlogram and carrying out the Ljung-Box test: The correlogram shows that the autocorrelations for the in-sample forecast errors do not exceed the significance bounds for lags 1-20. Furthermore, the p-value for the Ljung-Box test is 0.6, indicating that there is little evidence of non-zero autocorrelations at lags 1-20. We can check whether the forecast errors have constant variance over time, and are normally distributed with mean zero, by making a time plot of the forecast errors and a histogram (with overlaid normal curve): From the time plot, it appears plausible that the forecast errors have constant variance over time. From the histogram of forecast errors, it seems plausible that the forecast errors are normally distributed with mean zero. Thus, there is little evidence of autocorrelation at lags 1-20 for the forecast errors, and the forecast errors appear to be normally distributed with mean zero and constant variance over time. This suggests that Holt-Winters exponential smoothing provides an adequate predictive model of the log of sales at the souvenir shop, which probably cannot be improved upon. Furthermore, the assumptions upon which the prediction intervals were based are probably valid.
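A sketch of the Holt-Winters fit and checks for the souvenir data:

    souvenirtimeseriesforecasts <- HoltWinters(logsouvenirtimeseries)
    souvenirtimeseriesforecasts  # alpha about 0.41, beta about 0.00, gamma about 0.96
    plot(souvenirtimeseriesforecasts)
    souvenirtimeseriesforecasts2 <- forecast(souvenirtimeseriesforecasts, h = 48)
    plot(souvenirtimeseriesforecasts2)
    acf(souvenirtimeseriesforecasts2$residuals, lag.max = 20, na.action = na.pass)
    Box.test(souvenirtimeseriesforecasts2$residuals, lag = 20, type = "Ljung-Box")
    plot.ts(souvenirtimeseriesforecasts2$residuals)
    plotForecastErrors(souvenirtimeseriesforecasts2$residuals)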
ARIMA Models

Exponential smoothing methods are useful for making forecasts, and make no assumptions about the correlations between successive values of the time series. However, if you want to make prediction intervals for forecasts made using exponential smoothing methods, the prediction intervals require that the forecast errors are uncorrelated and are normally distributed with mean zero and constant variance. While exponential smoothing methods do not make any assumptions about correlations between successive values of the time series, in some cases you can make a better predictive model by taking correlations in the data into account. Autoregressive Integrated Moving Average (ARIMA) models include an explicit statistical model for the irregular component of a time series, one that allows for non-zero autocorrelations in the irregular component.

Differencing a Time Series

ARIMA models are defined for stationary time series. Therefore, if you start off with a non-stationary time series, you will first need to 'difference' the time series until you obtain a stationary time series. If you have to difference the time series d times to obtain a stationary series, then you have an ARIMA(p,d,q) model, where d is the order of differencing used. You can difference a time series using the diff() function in R. For example, the time series of the annual diameter of women's skirts at the hem, from 1866 to 1911, is not stationary in mean, as the level changes a lot over time. We can difference the time series (which we stored in "skirtsseries", see above) once, and plot the differenced series, by typing: The resulting time series of first differences (above) does not appear to be stationary in mean. Therefore, we can difference the time series twice, to see if that gives us a stationary time series: (Formal tests for stationarity, called "unit root tests", are available in the fUnitRoots package on CRAN, but will not be discussed here.) The time series of second differences (above) does appear to be stationary in mean and variance, as the level of the series stays roughly constant over time, and the variance of the series appears roughly constant over time. Thus, it appears that we need to difference the time series of the diameter of skirts twice in order to achieve a stationary series. If you need to difference your original time series data d times in order to obtain a stationary time series, this means that you can use an ARIMA(p,d,q) model for your time series, where d is the order of differencing used. For example, for the time series of the diameter of women's skirts, we had to difference the time series twice, and so the order of differencing (d) is 2. This means that you can use an ARIMA(p,2,q) model for your time series. The next step is to figure out the values of p and q for the ARIMA model. Another example is the time series of the age of death of the successive kings of England (see above). From the time plot (above), we can see that the time series is not stationary in mean. To calculate the time series of first differences, and plot it, we type: The time series of first differences appears to be stationary in mean and variance, and so an ARIMA(p,1,q) model is probably appropriate for the time series of the age of death of the kings of England.
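A sketch of the differencing commands just described:

    skirtsseriesdiff1 <- diff(skirtsseries, differences = 1)
    plot.ts(skirtsseriesdiff1)  # still not stationary in mean
    skirtsseriesdiff2 <- diff(skirtsseries, differences = 2)
    plot.ts(skirtsseriesdiff2)  # looks stationary in mean and variance
    kingtimeseriesdiff1 <- diff(kingstimeseries, differences = 1)
    plot.ts(kingtimeseriesdiff1)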
By taking the time series of first differences, we have removed the trend component of the time series of the ages at death of the kings, and are left with an irregular component. We can now examine whether there are correlations between successive terms of this irregular component; if so, this could help us to make a predictive model for the ages at death of the kings.

Selecting a Candidate ARIMA Model

If your time series is stationary, or if you have transformed it to a stationary time series by differencing d times, the next step is to select the appropriate ARIMA model, which means finding the most appropriate values of p and q for an ARIMA(p,d,q) model. To do this, you usually need to examine the correlogram and partial correlogram of the stationary time series. To plot a correlogram and partial correlogram, we can use the acf() and pacf() functions in R, respectively. To get the actual values of the autocorrelations and partial autocorrelations, we set plot=FALSE in the acf() and pacf() functions.

Example of the Ages at Death of the Kings of England

For example, to plot the correlogram for lags 1-20 of the once-differenced time series of the ages at death of the kings of England, and to get the values of the autocorrelations, we type: We see from the correlogram that the autocorrelation at lag 1 (-0.360) exceeds the significance bounds, but all other autocorrelations between lags 1-20 do not exceed the significance bounds. To plot the partial correlogram for lags 1-20 for the once-differenced time series of the ages at death of the English kings, and get the values of the partial autocorrelations, we use the pacf() function, by typing: The partial correlogram shows that the partial autocorrelations at lags 1, 2 and 3 exceed the significance bounds, are negative, and are slowly decreasing in magnitude with increasing lag (lag 1: -0.360, lag 2: -0.335, lag 3: -0.321). The partial autocorrelations tail off to zero after lag 3. Since the correlogram is zero after lag 1, and the partial correlogram tails off to zero after lag 3, this means that the following ARMA (autoregressive moving average) models are possible for the time series of first differences: (1) an ARMA(3,0) model, that is, an autoregressive model of order p=3, since the partial autocorrelogram is zero after lag 3, and the autocorrelogram tails off to zero (although perhaps too abruptly for this model to be appropriate); (2) an ARMA(0,1) model, that is, a moving average model of order q=1, since the autocorrelogram is zero after lag 1 and the partial autocorrelogram tails off to zero; (3) an ARMA(p,q) model, that is, a mixed model with p and q greater than 0, since the autocorrelogram and partial correlogram tail off to zero (although the correlogram probably tails off to zero too abruptly for this model to be appropriate). We use the principle of parsimony to decide which model is best: that is, we assume that the model with the fewest parameters is best. The ARMA(3,0) model has 3 parameters, the ARMA(0,1) model has 1 parameter, and the ARMA(p,q) model has at least 2 parameters. Therefore, the ARMA(0,1) model is taken as the best model. An ARMA(0,1) model is a moving average model of order 1, or MA(1) model.
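A sketch of the correlogram and partial correlogram commands for the differenced kings series:

    acf(kingtimeseriesdiff1, lag.max = 20)                # plot the correlogram
    acf(kingtimeseriesdiff1, lag.max = 20, plot = FALSE)  # print the autocorrelation values
    pacf(kingtimeseriesdiff1, lag.max = 20)               # plot the partial correlogram
    pacf(kingtimeseriesdiff1, lag.max = 20, plot = FALSE)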
This model can be written as: Xt - mu = Zt - (theta * Zt-1), where Xt is the stationary time series we are studying (the first-differenced series of ages at death of English kings), mu is the mean of the time series Xt, Zt is white noise with mean zero and constant variance, and theta is a parameter that can be estimated. An MA (moving average) model is usually used to model a time series that shows short-term dependencies between successive observations. Intuitively, it makes good sense that an MA model can be used to describe the irregular component in the time series of ages at death of English kings, as we might expect the age at death of a particular English king to have some effect on the ages at death of the next king or two, but not much effect on the ages at death of kings that reign much longer after that. Shortcut: the auto.arima() function can be used to find the appropriate ARIMA model, e.g. type library(forecast), then auto.arima(kings). The output says an appropriate model is ARIMA(0,1,1). Since an ARMA(0,1) model (with p=0, q=1) is taken to be the best candidate model for the time series of first differences of the ages at death of English kings, the original time series of the ages of death can be modelled using an ARIMA(0,1,1) model (with p=0, d=1, q=1, where d is the order of differencing required).

Example of the Volcanic Dust Veil in the Northern Hemisphere

Let's take another example of selecting an appropriate ARIMA model. The file robjhyndman/tsdldata/annual/dvi.dat contains data on the volcanic dust veil index in the northern hemisphere, from 1500-1969 (original data from Hipel and McLeod, 1994). This is a measure of the impact of volcanic eruptions' release of dust and aerosols into the environment. We can read it into R and make a time plot by typing: From the time plot, it appears that the random fluctuations in the time series are roughly constant in size over time, so an additive model is probably appropriate for describing this time series. Furthermore, the time series appears to be stationary in mean and variance, as its level and variance appear to be roughly constant over time. Therefore, we do not need to difference this series in order to fit an ARIMA model, but can fit an ARIMA model to the original series (the order of differencing required, d, is zero here). We can now plot a correlogram and partial correlogram for lags 1-20 to investigate what ARIMA model to use: We see from the correlogram that the autocorrelations for lags 1, 2 and 3 exceed the significance bounds, and that the autocorrelations tail off to zero after lag 3. The autocorrelations for lags 1, 2, 3 are positive, and decrease in magnitude with increasing lag (lag 1: 0.666, lag 2: 0.374, lag 3: 0.162). The autocorrelations for lags 19 and 20 exceed the significance bounds too, but it is likely that this is due to chance, since they only just exceed the significance bounds (especially for lag 19), the autocorrelations for lags 4-18 do not exceed the significance bounds, and we would expect 1 in 20 lags to exceed the 95% significance bounds by chance alone. From the partial autocorrelogram, we see that the partial autocorrelation at lag 1 is positive and exceeds the significance bounds (0.666), while the partial autocorrelation at lag 2 is negative and also exceeds the significance bounds (-0.126). The partial autocorrelations tail off to zero after lag 2.
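Sketches of the shortcut and of reading and examining the volcanic dust data (the full URL and the 'skip' value are assumptions):

    library(forecast)
    auto.arima(kings)  # suggests ARIMA(0,1,1) for the kings data
    volcanodust <- scan("http://robjhyndman.com/tsdldata/annual/dvi.dat", skip = 1)  # assumed URL and skip
    volcanodustseries <- ts(volcanodust, start = c(1500))
    plot.ts(volcanodustseries)
    acf(volcanodustseries, lag.max = 20)
    pacf(volcanodustseries, lag.max = 20)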
Since the correlogram tails off to zero after lag 3, and the partial correlogram is zero after lag 2, the following ARMA models are possible for the time series: (1) an ARMA(2,0) model, since the partial autocorrelogram is zero after lag 2, and the correlogram tails off to zero after lag 3; (2) an ARMA(0,3) model, since the autocorrelogram is zero after lag 3, and the partial correlogram tails off to zero (although perhaps too abruptly for this model to be appropriate); (3) an ARMA(p,q) mixed model, since the correlogram and partial correlogram tail off to zero (although the partial correlogram perhaps tails off too abruptly for this model to be appropriate). Shortcut: again, we can use auto.arima() to find an appropriate model, by typing auto.arima(volcanodust), which gives us ARIMA(1,0,2), which has 3 parameters. However, different criteria can be used to select a model (see the auto.arima() help page). If we use the "bic" criterion, which penalises the number of parameters, we get ARIMA(2,0,0), which is ARMA(2,0): auto.arima(volcanodust, ic="bic"). The ARMA(2,0) model has 2 parameters, the ARMA(0,3) model has 3 parameters, and the ARMA(p,q) model has at least 2 parameters. Therefore, using the principle of parsimony, the ARMA(2,0) model and ARMA(p,q) model are equally good candidate models. An ARMA(2,0) model is an autoregressive model of order 2, or AR(2) model. This model can be written as: Xt - mu = (Beta1 * (Xt-1 - mu)) + (Beta2 * (Xt-2 - mu)) + Zt, where Xt is the stationary time series we are studying (the time series of the volcanic dust veil index), mu is the mean of the time series Xt, Beta1 and Beta2 are parameters to be estimated, and Zt is white noise with mean zero and constant variance. An AR (autoregressive) model is usually used to model a time series which shows longer-term dependencies between successive observations. Intuitively, it makes sense that an AR model could be used to describe the time series of the volcanic dust veil index, as we would expect volcanic dust and aerosol levels in one year to affect those in much later years, since the dust and aerosols are unlikely to disappear quickly. If an ARMA(2,0) model (with p=2, q=0) is used to model the time series of the volcanic dust veil index, it would mean that an ARIMA(2,0,0) model can be used (with p=2, d=0, q=0, where d is the order of differencing required). Similarly, if an ARMA(p,q) mixed model is used, where p and q are both greater than zero, then an ARIMA(p,0,q) model can be used.

Forecasting Using an ARIMA Model

Once you have selected the best candidate ARIMA(p,d,q) model for your time series data, you can estimate the parameters of that ARIMA model, and use that as a predictive model for making forecasts for future values of your time series. You can estimate the parameters of an ARIMA(p,d,q) model using the arima() function in R.

Example of the Ages at Death of the Kings of England

For example, we discussed above that an ARIMA(0,1,1) model seems a plausible model for the ages at death of the kings of England. You can specify the values of p, d and q in the ARIMA model by using the "order" argument of the arima() function in R. To fit an ARIMA(p,d,q) model to this time series (which we stored in the variable "kingstimeseries", see above), we type: As mentioned above, if we are fitting an ARIMA(0,1,1) model to our time series, it means we are fitting an ARMA(0,1) model to the time series of first differences.
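Sketches of the model-selection shortcut and of the fitting command described at the end of this passage:

    auto.arima(volcanodust)              # ARIMA(1,0,2) under the default criterion
    auto.arima(volcanodust, ic = "bic")  # BIC penalises parameters more heavily: ARIMA(2,0,0)
    kingstimeseriesarima <- arima(kingstimeseries, order = c(0, 1, 1))  # fit ARIMA(0,1,1)
    kingstimeseriesarima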
An ARMA(0,1) model can be written as: Xt - mu = Zt - (theta * Zt-1), where theta is a parameter to be estimated. From the output of the arima() R function (above), the estimated value of theta (given as 'ma1' in the R output) is -0.7218 in the case of the ARIMA(0,1,1) model fitted to the time series of ages at death of kings. Specifying the confidence level for prediction intervals: you can specify the confidence level for prediction intervals in forecast.Arima() by using the "level" argument. For example, to get a 99.5% prediction interval, we would type forecast.Arima(kingstimeseriesarima, h=5, level=c(99.5)). We can then use the ARIMA model to make forecasts for future values of the time series, using the forecast.Arima() function in the "forecast" R package. For example, to forecast the ages at death of the next five English kings, we type: The original time series for the English kings includes the ages at death of 42 English kings. The forecast.Arima() function gives us a forecast of the age of death of the next five English kings (kings 43-47), as well as 80% and 95% prediction intervals for those predictions. The age of death of the 42nd English king was 56 years (the last observed value in our time series), and the ARIMA model gives the forecasted age at death of the next five kings as 67.8 years. We can plot the observed ages of death for the first 42 kings, as well as the ages that would be predicted for these 42 kings and for the next 5 kings using our ARIMA(0,1,1) model, by typing: As in the case of exponential smoothing models, it is a good idea to investigate whether the forecast errors of an ARIMA model are normally distributed with mean zero and constant variance, and whether there are correlations between successive forecast errors. For example, we can make a correlogram of the forecast errors for our ARIMA(0,1,1) model for the ages at death of kings, and perform the Ljung-Box test for lags 1-20, by typing: Since the correlogram shows that none of the sample autocorrelations for lags 1-20 exceed the significance bounds, and the p-value for the Ljung-Box test is 0.9, we can conclude that there is very little evidence for non-zero autocorrelations in the forecast errors at lags 1-20. To investigate whether the forecast errors are normally distributed with mean zero and constant variance, we can make a time plot and histogram (with overlaid normal curve) of the forecast errors: The time plot of the in-sample forecast errors shows that the variance of the forecast errors seems to be roughly constant over time (though perhaps there is slightly higher variance for the second half of the time series). The histogram of the forecast errors shows that they are roughly normally distributed and the mean seems to be close to zero. Therefore, it is plausible that the forecast errors are normally distributed with mean zero and constant variance. Since successive forecast errors do not seem to be correlated, and the forecast errors seem to be normally distributed with mean zero and constant variance, the ARIMA(0,1,1) model does seem to provide an adequate predictive model for the ages at death of English kings.

Example of the Volcanic Dust Veil in the Northern Hemisphere

We discussed above that an appropriate ARIMA model for the time series of the volcanic dust veil index may be an ARIMA(2,0,0) model (a sketch of the commands for both this example and the kings example is given below).
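A sketch of the forecasting and checking steps for the kings model, followed by the corresponding commands for the volcanic dust example discussed next. Note that forecast() is the current name for what the text calls forecast.Arima(), and the numerical comments repeat values reported in the text:

    kingstimeseriesforecasts <- forecast(kingstimeseriesarima, h = 5)
    kingstimeseriesforecasts  # forecasts for kings 43-47 with 80% and 95% intervals
    plot(kingstimeseriesforecasts)
    acf(kingstimeseriesforecasts$residuals, lag.max = 20)
    Box.test(kingstimeseriesforecasts$residuals, lag = 20, type = "Ljung-Box")
    plot.ts(kingstimeseriesforecasts$residuals)
    plotForecastErrors(kingstimeseriesforecasts$residuals)  # function defined earlier

    # Corresponding commands for the volcanic dust veil example below:
    volcanodustseriesarima <- arima(volcanodustseries, order = c(2, 0, 0))
    volcanodustseriesarima  # ar1 estimated as about 0.7533, ar2 as about -0.1268
    volcanodustseriesforecasts <- forecast(volcanodustseriesarima, h = 31)
    plot(volcanodustseriesforecasts)  # note the impossible negative forecasts
    acf(volcanodustseriesforecasts$residuals, lag.max = 20)
    Box.test(volcanodustseriesforecasts$residuals, lag = 20, type = "Ljung-Box")
    plot.ts(volcanodustseriesforecasts$residuals)
    mean(volcanodustseriesforecasts$residuals)  # about -0.22
    plotForecastErrors(volcanodustseriesforecasts$residuals)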
To fit an ARIMA(2,0,0) model to this time series, we can type: As mentioned above, an ARIMA(2,0,0) model can be written as: Xt - mu = (Beta1 * (Xt-1 - mu)) + (Beta2 * (Xt-2 - mu)) + Zt, where Beta1 and Beta2 are parameters to be estimated. The output of the arima() function tells us that Beta1 and Beta2 are estimated as 0.7533 and -0.1268 here (given as ar1 and ar2 in the output of arima()). Now that we have fitted the ARIMA(2,0,0) model, we can use the forecast.Arima() function to predict future values of the volcanic dust veil index. The original data includes the years 1500-1969. To make predictions for the years 1970-2000 (31 more years), we type: We can plot the original time series, and the forecasted values, by typing: One worrying thing is that the model has predicted negative values for the volcanic dust veil index, but this variable can only have positive values! The reason is that the arima() and forecast.Arima() functions don't know that the variable can only take positive values. Clearly, this is not a very desirable feature of our current predictive model. Again, we should investigate whether the forecast errors seem to be correlated, and whether they are normally distributed with mean zero and constant variance. To check for correlations between successive forecast errors, we can make a correlogram and use the Ljung-Box test: The correlogram shows that the sample autocorrelation at lag 20 exceeds the significance bounds. However, this is probably due to chance, since we would expect one out of 20 sample autocorrelations to exceed the 95% significance bounds. Furthermore, the p-value for the Ljung-Box test is 0.2, indicating that there is little evidence for non-zero autocorrelations in the forecast errors for lags 1-20. To check whether the forecast errors are normally distributed with mean zero and constant variance, we make a time plot of the forecast errors, and a histogram: The time plot of forecast errors shows that the forecast errors seem to have roughly constant variance over time. However, the time series of forecast errors seems to have a negative mean, rather than a zero mean. We can confirm this by calculating the mean forecast error, which turns out to be about -0.22. The histogram of forecast errors (above) shows that although the mean value of the forecast errors is negative, the distribution of forecast errors is skewed to the right compared to a normal curve. Therefore, it seems that we cannot comfortably conclude that the forecast errors are normally distributed with mean zero and constant variance. Thus, it is likely that our ARIMA(2,0,0) model for the time series of the volcanic dust veil index is not the best model that we could make, and could almost certainly be improved upon.

Links and Further Reading

Here are some links for further reading. For a more in-depth introduction to R, a good online tutorial is available on the "Kickstarting R" website, cran.r-project.org/doc/contrib/Lemon-kickstart. There is another nice (slightly more in-depth) tutorial to R available on the "Introduction to R" website, cran.r-project.org/doc/manuals/R-intro.html. You can find a list of R packages for analysing time series data on the CRAN Time Series Task View webpage. To learn about time series analysis, I would highly recommend the book "Time series" (product code M249/02) by the Open University, available from the Open University Shop.
There are two books available in the "Use R" series on using R for time series analyses: the first is Introductory Time Series with R by Cowpertwait and Metcalfe, and the second is Analysis of Integrated and Cointegrated Time Series with R by Pfaff.

Acknowledgements

I am grateful to Professor Rob Hyndman for kindly allowing me to use the time series data sets from his Time Series Data Library (TSDL) in the examples in this booklet. Many of the examples in this booklet are inspired by examples in the excellent Open University book, "Time series" (product code M249/02), available from the Open University Shop. Thank you to Ravi Aranke for bringing auto.arima() to my attention, Maurice Omane-Adjepong for bringing unit root tests to my attention, and Christian Seubert for noticing a small bug in plotForecastErrors(). Thank you for other comments to Antoine Binard and Bill Johnston.

Contact

I will be grateful if you will send me (Avril Coghlan) corrections or suggestions for improvements to my email address alc@sanger.ac.uk.