Applying Attention on Lagged page views for Time-series Forecasting

Concepts used:-

  1. Attention mechanism
  2. Sliding window for multiple days forecasting
  3. New feature — Attention on Compressed Lag page views
  4. Deep learning
  5. Keras Model Subclassing

ML Problem Formulation:-

Given a time series of length n, predict the next 64 days of web page views.
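Multi-day forecasting like this is usually trained with a sliding window: each training sample pairs a window of past days with the 64 days that follow it. A minimal sketch (the function name and stride are illustrative assumptions, not the author's code):

```python
import numpy as np

def make_windows(series, window=100, horizon=64, stride=1):
    """Slide a window over the series; each sample is (past window, future horizon)."""
    X, y = [], []
    for start in range(0, len(series) - window - horizon + 1, stride):
        X.append(series[start:start + window])          # 100 past days
        y.append(series[start + window:start + window + horizon])  # next 64 days
    return np.array(X), np.array(y)

series = np.arange(800, dtype=float)   # ~800 days, as in the dataset
X, y = make_windows(series)
print(X.shape, y.shape)  # (637, 100) (637, 64)
```

With stride 1, an 800-day series yields 800 − 100 − 64 + 1 = 637 training windows per page.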

Core Idea:-

Use attention on lagged page views to capture long-term seasonality and predict the next x days.
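The mechanics of attending over lag features can be sketched in a few lines. This is a plain dot-product attention in NumPy, standing in for the Keras layer; the dimensions and variable names are illustrative assumptions:

```python
import numpy as np

def attention_over_lags(query, lag_values):
    """query: (d,) decoder state; lag_values: (n_lags, d) encoded lagged views.
    Returns a context vector and the softmax attention weights over the lags."""
    scores = lag_values @ query / np.sqrt(query.shape[0])  # (n_lags,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                               # softmax over lags
    context = weights @ lag_values                         # weighted sum, (d,)
    return context, weights

rng = np.random.default_rng(0)
query = rng.normal(size=8)
lags = rng.normal(size=(4, 8))   # e.g. 90/180/270/365-day lag encodings
context, weights = attention_over_lags(query, lags)
print(weights.round(3))          # a distribution over the 4 lags
```

Because the weights sum to 1, the model can softly select whichever lag (e.g. the 365-day one for yearly seasonality) best explains the day being predicted.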

Dataset Overview

● The dataset is taken from the Kaggle web traffic prediction competition. We have daily data from 2015-07-01 to 2017-09-10, i.e. around 800 days' worth of data from which to predict 64 days, and there are ~140k such series, each split by language, access agent, etc.

[Figures: page views vs. days per language; mean, median, and std of page views]

Feature Generation:-

We take a minimalist approach to feature engineering here, because an LSTM is potent enough to dig up and learn features on its own. Model feature list:

Model PipeLine:-

Batch size of 64, 100 days of data per window; 5 features for the encoder and 906 features from the concatenation of the 90-, 180-, 270-, and 365-day lags.

Modeling Seq to Seq:-

  1. Our encoder will be an LSTM with return_sequences=True, so that every per-step hidden state is available for decoding
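What return_sequences=True buys us can be shown with a bare-bones recurrent loop in NumPy (a simple RNN cell stands in for the LSTM; shapes and names are assumptions): instead of keeping only the final state, the encoder emits one hidden state per day, which is exactly what the decoder's attention looks back over.

```python
import numpy as np

def encode(x, W, U, b):
    """x: (steps, in_dim). Returns ALL hidden states, shape (steps, hid_dim),
    mirroring Keras's return_sequences=True (vs. only the last state)."""
    h = np.zeros(U.shape[0])
    states = []
    for x_t in x:
        h = np.tanh(W @ x_t + U @ h + b)   # simple RNN cell in place of LSTM
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(1)
steps, in_dim, hid = 100, 5, 16            # 100 days, 5 encoder features
states = encode(rng.normal(size=(steps, in_dim)),
                rng.normal(size=(hid, in_dim)) * 0.1,
                rng.normal(size=(hid, hid)) * 0.1,
                np.zeros(hid))
print(states.shape)  # (100, 16): one state per day for the decoder to attend over
```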

Loss Function:-

Mean Absolute Error (MAE) loss
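MAE is just the mean of the absolute prediction errors; a minimal sketch (the array values are made up for illustration):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error: average of |y_true - y_pred|."""
    return np.mean(np.abs(y_true - y_pred))

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])
print(mae(y_true, y_pred))  # (0.5 + 0 + 1.5 + 1) / 4 = 0.75
```

Compared with squared-error losses, MAE penalizes the occasional huge traffic spike less harshly, which suits noisy page-view counts.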


Attention+LSTM vs LSTM:-

● LSTMs suffer from the long-range dependency problem and hence fail to capture the seasonality and trend of this long time series

[Figure: attention weights vs. days]


1. Here are the loss-vs-epoch training curves for the two architectures

[Figures: LSTM model epochs; attention model epochs]


While the LSTM model got stuck at a local minimum of 0.500 MAE loss, our attention-on-lagged-page-views approach easily surpasses that, reaching 0.25 MAE.


Future Work:-

  1. Train on all the time series
  2. Plot loss curves using TensorBoard
  3. Use BERT-style self-attention


