Enriching shapelets with positional information for timeseries classification

How a neat, simple trick can boost both predictive performance and interpretability.

Gilles Vandewiele
Towards Data Science

Timeseries classification

Many real-world processes produce data over time, giving rise to temporal data, or timeseries. Unlike rows in tabular data, neighboring observations in a timeseries (i.e. observations that are close in time) are highly correlated, which requires special care when analyzing them. One common task on timeseries is classification. Example use cases include detecting the surface a robot walks on from accelerometer data, classifying the type of electrical device based on its electricity usage, or classifying a leaf’s type based on its contour.

Detecting the surface on which a robot (Sony AIBO) is walking, using the X-axis of its accelerometer data. Image taken from http://www.timeseriesclassification.com/description.php?Dataset=SonyAIBORobotSurface1

Shapelets

Shapelets [1] are small subseries, or parts of a timeseries, that are informative or discriminative for a certain class. They can be used to transform timeseries into features by calculating the distance from each timeseries you want to classify to each shapelet [2].
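To make the distance-based transform concrete, here is a minimal sketch in Python with numpy. It assumes plain (non-normalized) Euclidean distance and shapelets given as fixed arrays; the function names `shapelet_distance` and `shapelet_transform` are hypothetical, and real implementations often z-normalize subseries first.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def shapelet_distance(series, shapelet):
    """Smallest Euclidean distance between `shapelet` and any equally long window of `series`."""
    # All windows of the shapelet's length, shape (len(series) - L + 1, L)
    windows = sliding_window_view(series, len(shapelet))
    return float(np.min(np.linalg.norm(windows - shapelet, axis=1)))


def shapelet_transform(X, shapelets):
    """Turn N timeseries into an N x K matrix of distances to K shapelets."""
    return np.array([
        [shapelet_distance(np.asarray(ts, dtype=float), np.asarray(s, dtype=float)) for s in shapelets]
        for ts in X
    ])


X = [[0, 0, 1, 2, 1, 0, 0], [0, 0, 0, 0, 0, 0, 0]]
shapelets = [[1, 2, 1]]
F = shapelet_transform(X, shapelets)
print(F)  # first series contains the shapelet exactly (0.0); the flat one is at sqrt(6)
```

Each column of `F` can then be fed to any standard classifier, which is exactly the 2-dimensional feature space shown in the ItalyPowerDemand figure when K = 2.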

Extracting two shapelets from the ItalyPowerDemand dataset to transform the timeseries into a 2-dimensional feature space. Each axis of the feature space represents the distance to one of the two shapelets. As can be seen, these two shapelets alone already yield a nice linear separation.

Positional information

Guillemé et al. [4] recently published “Localized Random Shapelets”, in which they present an approach to extract shapelets that yield a feature matrix with both distance-based and location-based features, resulting in better predictive performance.

Their technique mines K shapelets to transform N timeseries into an N x 2K feature matrix. For each timeseries T_i and each shapelet S_j, this matrix contains both the distance d(T_i, S_j) and the location l(T_i, S_j).
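Written out, one way to define these two features is shown below. This assumes every timeseries T_i = (t_{i,1}, …, t_{i,M}) has the same length M, that shapelet S_j has length L_j, and that plain Euclidean distance is used; the column ordering of the feature vector is also an assumption, and the paper may normalize distances differently.

```latex
\begin{aligned}
\delta_{ij}(p) &= \big\lVert (t_{i,p}, \dots, t_{i,p+L_j-1}) - S_j \big\rVert_2,
  \qquad p = 1, \dots, M - L_j + 1, \\
d(T_i, S_j) &= \min_{p} \, \delta_{ij}(p), \qquad
l(T_i, S_j) = \operatorname*{arg\,min}_{p} \, \delta_{ij}(p), \\
\mathbf{x}_i &= \big(d(T_i, S_1), \dots, d(T_i, S_K),\; l(T_i, S_1), \dots, l(T_i, S_K)\big) \in \mathbb{R}^{2K}.
\end{aligned}
```

Stacking the rows x_1, …, x_N gives the N x 2K feature matrix.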
The features for each timeseries are calculated by sliding the shapelet over the timeseries and calculating the Euclidean distance between the shapelet and the timeseries at each location. The minimal distance and its corresponding location are returned. The distances and locations shown in the figure are purely conceptual and far from exact.
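The sliding procedure above can be sketched in a few lines of numpy. This is an illustration under the same assumptions as before (plain Euclidean distance, 0-based positions, ties resolved to the earliest position), not the reference implementation of Localized Random Shapelets; `distance_and_location` and `localized_shapelet_transform` are hypothetical names.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def distance_and_location(series, shapelet):
    """Minimal Euclidean distance and the (0-based) start index where it occurs."""
    dists = np.linalg.norm(sliding_window_view(series, len(shapelet)) - shapelet, axis=1)
    best = int(np.argmin(dists))  # np.argmin returns the first index on ties
    return float(dists[best]), best


def localized_shapelet_transform(X, shapelets):
    """N timeseries, K shapelets -> N x 2K matrix: K distance columns, then K location columns."""
    N, K = len(X), len(shapelets)
    F = np.empty((N, 2 * K))
    for i, ts in enumerate(X):
        ts = np.asarray(ts, dtype=float)
        for j, s in enumerate(shapelets):
            F[i, j], F[i, K + j] = distance_and_location(ts, np.asarray(s, dtype=float))
    return F


X = [[0, 1, 2, 1, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1, 2, 1]]
F = localized_shapelet_transform(X, [[1, 2, 1]])
print(F)  # both distances are 0; the locations are 1 and 5
```

Both series contain the same pattern, so their distance column is identical; only the location column tells them apart.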

GENDIS: efficiently extracting accurate shapelets

We extended our technique, GENDIS (GENetic DIscovery of Shapelets) [5], to extract positional information in addition to the distances to the shapelets. Because GENDIS is a genetic algorithm, it can optimize any objective, as that objective does not need to be differentiable. As such, this extension required minimal effort, yet preliminary experiments show a significant boost in predictive performance. With this update, GENDIS extracts shapelets that are very interpretable and that achieve state-of-the-art predictive performance when fed to a machine learning classifier. Moreover, unlike techniques that achieve similar performance, GENDIS does not need to perform a brute-force search, and neither the number of shapelets nor their lengths need to be tuned.
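The location feature is an arg-min, so it changes in discrete jumps as the shapelet's values change and offers no useful gradient. A gradient-free search sidesteps that. The toy below is a hypothetical sketch, not GENDIS: it evolves a single fixed-length shapelet with truncation selection, one-point crossover and Gaussian mutation, and scores each candidate with a deliberately non-differentiable fitness (the best single-threshold accuracy on its distance or location feature). GENDIS, as described above, does not require fixing the number of shapelets or their lengths, and its operators and fitness may differ. The synthetic data generator is also an assumption.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(42)


def features(ts, s):
    """(minimal distance, location of that minimum) of shapelet s in series ts."""
    d = np.linalg.norm(sliding_window_view(ts, len(s)) - s, axis=1)
    return d.min(), d.argmin()


def stump_accuracy(f, y):
    """Accuracy of the best single threshold on feature f (a step function, no gradient)."""
    best = 0.0
    for t in np.unique(f):
        hit = np.mean((f <= t) == y)
        best = max(best, hit, 1.0 - hit)  # allow either side of the threshold to be class 1
    return best


def fitness(s, X, y):
    """Non-differentiable objective: best stump accuracy on the distance or the location."""
    F = np.array([features(ts, s) for ts in X])
    return max(stump_accuracy(F[:, 0], y), stump_accuracy(F[:, 1], y))


def evolve(X, y, length, pop_size=20, generations=15, sigma=0.3):
    # Initial population: random subseries of the training data
    pop = []
    for _ in range(pop_size):
        ts = X[rng.integers(len(X))]
        start = rng.integers(len(ts) - length + 1)
        pop.append(ts[start:start + length].copy())
    history = []
    for _ in range(generations):
        scores = np.array([fitness(s, X, y) for s in pop])
        order = np.argsort(-scores, kind="stable")
        history.append(float(scores[order[0]]))
        parents = [pop[k] for k in order[: pop_size // 2]]  # truncation selection, best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.choice(len(parents), size=2, replace=False)
            cut = rng.integers(1, length)  # one-point crossover
            child = np.concatenate([parents[a][:cut], parents[b][cut:]])
            children.append(child + rng.normal(0.0, sigma, length))  # Gaussian mutation
        pop = parents + children
    scores = np.array([fitness(s, X, y) for s in pop])
    history.append(float(scores.max()))
    return pop[int(np.argmax(scores))], history


def make_series(pos, n=40, noise=0.1):
    ts = rng.normal(0.0, noise, n)
    ts[pos:pos + 5] += [1.0, 2.0, 3.0, 2.0, 1.0]  # same pattern, class-specific position
    return ts


X = [make_series(5) for _ in range(10)] + [make_series(30) for _ in range(10)]
y = np.array([0] * 10 + [1] * 10)
best, history = evolve(X, y, length=5)
print(history)  # best fitness per generation; never decreases because parents survive
```

Swapping in a different fitness, for example cross-validated accuracy of a real classifier, only changes `fitness`; the search loop never needs a derivative.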

Let’s demonstrate the new capabilities by creating an artificial dataset where positional information is the key discriminative property (as the pattern will be the same for both classes):

The generated timeseries. The pattern is the same for the two classes. The only way to distinguish between the two classes is to use the location where the pattern is found.
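A noise-free miniature of such a dataset shows why distance alone cannot work. The sketch below is a hypothetical reconstruction (the pattern, series length and positions are assumptions, not the generator used for the figure): both classes contain the exact same pattern, so the minimal distance is 0 for both, and only the location differs.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

PATTERN = np.array([1.0, 2.0, 3.0, 2.0, 1.0])  # hypothetical pattern shared by both classes


def make_series(position, length=40):
    """Flat series with PATTERN inserted at `position`."""
    ts = np.zeros(length)
    ts[position:position + len(PATTERN)] = PATTERN
    return ts


def distance_and_location(ts, shapelet):
    """Minimal Euclidean distance and the (0-based) start index where it occurs."""
    dists = np.linalg.norm(sliding_window_view(ts, len(shapelet)) - shapelet, axis=1)
    best = int(np.argmin(dists))
    return float(dists[best]), best


class0 = make_series(5)   # class 0: pattern early in the series
class1 = make_series(30)  # class 1: pattern late in the series
print(distance_and_location(class0, PATTERN))  # (0.0, 5)
print(distance_and_location(class1, PATTERN))  # (0.0, 30)
```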

We extract a single shapelet with both the old and the new version of GENDIS and measure the accuracy that a logistic regression classifier with default hyperparameters achieves on the resulting features. The old version only reached an accuracy of 81%, while the new version achieves 100%. We also quickly compared the new version to the old one on three smaller datasets, and each time the positional information significantly increased the accuracy.

Explainability of shapelets

One of the most interesting properties of shapelets is that they are very explainable. The extracted shapelets can easily be presented to a domain expert to show exactly which parts of the timeseries were used to make a decision. Moreover, optimizing both the positional and the distance information makes the extracted shapelets more interpretable as well. Without positional information, GENDIS and related techniques such as Learning Timeseries Shapelets (LS) [3] often managed to “hack” positional information into the shapelets themselves. Guillemé et al. [4] demonstrate this in their paper:

In the upper image, we see a long timeseries (light blue) of earthquake readings, together with the shapelets (in color) learned by LS, which does not incorporate positional information. The extracted shapelets achieve excellent predictive performance, but bear no clear resemblance to the original timeseries, which makes them hard to interpret (each is drawn at the location where its distance to the timeseries is minimal). In contrast, LRS, which does encode positional information, also achieves excellent predictive performance while extracting shapelets that closely resemble the original timeseries: the colored subseries closely match the light blue timeseries.

So this wraps up this short post! By extending our shapelet extraction framework to extract positional information in addition to the distances to each shapelet, we achieved significant gains in predictive performance with limited effort!

Do you have more questions about shapelets, or a use case involving timeseries classification? Get in touch with us!

Sources

[1] Ye, L., & Keogh, E. (2009, June). Time series shapelets: a new primitive for data mining. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining.
[2] Lines, J., Davis, L. M., Hills, J., & Bagnall, A. (2012, August). A shapelet transform for time series classification. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining.
[3] Grabocka, J., Schilling, N., Wistuba, M., & Schmidt-Thieme, L. (2014, August). Learning time-series shapelets. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining.
[4] Guillemé, M., Malinowski, S., Tavenard, R., & Renard, X. (2019, September). Localized Random Shapelets. In International Workshop on Advanced Analysis and Learning on Temporal Data. Springer, Cham.
[5] Vandewiele, G., Ongenae, F., & De Turck, F. (2019). GENDIS: GENetic DIscovery of Shapelets. arXiv preprint arXiv:1910.12948.
[6] www.timeseriesclassification.com
