Enriching shapelets with positional information for timeseries classification
How a neat, simple trick can boost both predictive performance and interpretability.
Timeseries classification
Many real-world processes produce data over time, giving rise to temporal data, or timeseries. Unlike in tabular data, neighboring observations (i.e. observations that are close in time) are highly correlated, so analyzing timeseries requires special care. One common task is timeseries classification. Example use cases include surface detection based on accelerometer data, identifying the type of an electrical device from its electricity usage, and classifying a leaf’s type based on contour information.
Shapelets
Shapelets are small subseries, or parts of a timeseries, that are informative or discriminative for a certain class. They can be used to transform timeseries into features: for each timeseries you want to classify, you calculate its distance to each shapelet, typically the smallest distance between the shapelet and any window of the same length in the timeseries.
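To make the transform concrete, here is a minimal sketch in NumPy. It assumes a plain Euclidean distance without any normalisation (some implementations normalise each window first), and the names `shapelet_distance` and `shapelet_transform` are illustrative rather than taken from any particular library.

```python
import numpy as np

def shapelet_distance(series, shapelet):
    """Smallest Euclidean distance between `shapelet` and any window of `series`."""
    series = np.asarray(series, dtype=float)
    shapelet = np.asarray(shapelet, dtype=float)
    # All contiguous windows of the shapelet's length: shape (n_windows, len(shapelet)).
    windows = np.lib.stride_tricks.sliding_window_view(series, len(shapelet))
    return float(np.sqrt(((windows - shapelet) ** 2).sum(axis=1)).min())

def shapelet_transform(dataset, shapelets):
    """Feature matrix with one row per timeseries and one column per shapelet."""
    return np.array([[shapelet_distance(ts, s) for s in shapelets] for ts in dataset])

# Toy data (made up for illustration): one series contains the bump, one does not.
series_with_bump = [0, 0, 1, 3, 1, 0, 0, 0]
flat_series = [0, 0, 0, 0, 0, 0, 0, 0]
bump = [1, 3, 1]

features = shapelet_transform([series_with_bump, flat_series], [bump])
print(features)
```

The series that contains the bump gets a distance of zero, while the flat series gets a larger distance, which is exactly the kind of feature a classifier can pick up on.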
Positional information
Guillemé et al. recently published “Localized Random Shapelets” [4], in which they present an approach to extract shapelets that yield a feature matrix with both distance-based and location-based features, resulting in better predictive performance.
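As a rough sketch of what such a feature matrix could look like, the snippet below returns, for each timeseries, both the distance to the best-matching window and the start index of that window. Using the raw start index as the location feature is an assumption made for this illustration, and `distance_and_location` is a hypothetical helper, not a function from the paper or from GENDIS.

```python
import numpy as np

def distance_and_location(series, shapelet):
    """Return (min distance, start index of the best-matching window)."""
    series = np.asarray(series, dtype=float)
    shapelet = np.asarray(shapelet, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(series, len(shapelet))
    distances = np.sqrt(((windows - shapelet) ** 2).sum(axis=1))
    best = int(distances.argmin())  # where the shapelet matches best
    return float(distances[best]), float(best)

# Toy data (made up): the same bump, once early and once late in the series.
early = [1, 3, 1, 0, 0, 0, 0, 0]
late = [0, 0, 0, 0, 0, 1, 3, 1]
bump = [1, 3, 1]

# One row per timeseries, columns: [distance, location].
feature_matrix = np.array([distance_and_location(ts, bump) for ts in (early, late)])
print(feature_matrix)
```

Both series have the same distance to the shapelet, so a distance-only feature cannot tell them apart; the location column can.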
GENDIS: efficiently extracting accurate shapelets
We extended our technique, GENDIS (GENetic DIscovery of Shapelets) [5], to extract positional information in addition to the distances to the shapelets. As GENDIS is a genetic algorithm, it can optimize any objective: its objective function does not need to be differentiable. Adding positional information therefore required minimal effort, yet preliminary experiments show a significant boost in predictive performance. With this update, GENDIS is able to extract shapelets that are very interpretable and that achieve state-of-the-art predictive performance when fed to a machine learning classifier. Moreover, as opposed to techniques achieving similar performance, GENDIS does not need to perform a brute-force search, and the number of shapelets and their lengths do not need to be tuned.
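To illustrate why differentiability is not needed, here is a minimal sketch of an evolutionary loop that improves candidate shapelets against a non-differentiable objective (the accuracy of a single threshold on the distance feature). This is not GENDIS's actual implementation: the toy data, the fixed shapelet length, the mutation-only variation, the population size and the helper names are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def min_distance(series, shapelet):
    windows = np.lib.stride_tricks.sliding_window_view(series, len(shapelet))
    return np.sqrt(((windows - shapelet) ** 2).sum(axis=1)).min()

def fitness(shapelet, X, y):
    """Best accuracy of any single threshold on the distance feature (not differentiable)."""
    d = np.array([min_distance(ts, shapelet) for ts in X])
    best = 0.0
    for t in d:
        acc = ((d > t).astype(int) == y).mean()
        best = max(best, acc, 1.0 - acc)  # either side of the threshold may be class 1
    return float(best)

def random_subseries(X, length):
    row = rng.integers(0, X.shape[0])
    start = rng.integers(0, X.shape[1] - length + 1)
    return X[row, start:start + length].copy()

# Toy data (assumption): class 1 contains a bump somewhere, class 0 is pure noise.
n, length, shapelet_len = 40, 30, 8
y = np.arange(n) % 2
X = rng.normal(0, 0.3, size=(n, length))
for i in np.flatnonzero(y == 1):
    start = rng.integers(0, length - shapelet_len)
    X[i, start:start + shapelet_len] += 2 * np.hanning(shapelet_len)

population = [random_subseries(X, shapelet_len) for _ in range(10)]
history = []
for generation in range(15):
    # Variation: every parent produces one mutated child.
    children = [p + rng.normal(0, 0.2, size=p.shape) for p in population]
    candidates = population + children
    scores = [fitness(c, X, y) for c in candidates]
    # Selection: keep the best half, so the best candidate always survives.
    order = np.argsort(scores)[::-1][:len(population)]
    population = [candidates[i] for i in order]
    history.append(scores[order[0]])

print(history)
```

Because selection only compares fitness values, the objective can be swapped for anything that scores a candidate, for example one that also takes the matching position into account.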
Let’s demonstrate the new capabilities on an artificial dataset in which positional information is the key discriminative property: the pattern itself is the same for both classes.
We extract a single shapelet with each version and measure the accuracy we can achieve with it using a logistic regression classifier with default hyper-parameters. The older version only achieves an accuracy of 81%, while the new version achieves 100%. We also quickly compared the new version to the old version on three smaller datasets, and each time the positional information significantly increased the accuracy.
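The effect can be reproduced in spirit with a small, self-contained sketch. Everything here is an assumption made for illustration: the dataset is a made-up stand-in for the one described above, the planted pattern is used directly as the shapelet instead of one discovered by GENDIS, and a small hand-written logistic regression (gradient descent on standardised features) replaces an off-the-shelf classifier. The accuracies it prints are training accuracies and should not be expected to match the numbers reported above.

```python
import numpy as np

rng = np.random.default_rng(42)
LENGTH, PATTERN_LEN = 100, 20
# The same bump is planted in both classes; only its position differs.
pattern = 3 * np.sin(np.linspace(0, np.pi, PATTERN_LEN))

def make_dataset(n_per_class=50):
    X, y = [], []
    # Class 0: pattern starts in [0, 20); class 1: pattern starts in [60, 80).
    for label, (lo, hi) in enumerate([(0, 20), (60, 80)]):
        for _ in range(n_per_class):
            ts = rng.normal(0, 0.1, LENGTH)
            start = rng.integers(lo, hi)
            ts[start:start + PATTERN_LEN] += pattern
            X.append(ts)
            y.append(label)
    return np.array(X), np.array(y)

def features(X, shapelet):
    """Columns: [min distance to shapelet, start index of the best match]."""
    windows = np.lib.stride_tricks.sliding_window_view(X, len(shapelet), axis=1)
    distances = np.sqrt(((windows - shapelet) ** 2).sum(axis=2))
    return np.column_stack([distances.min(axis=1), distances.argmin(axis=1)])

def train_accuracy(F, y, steps=2000, lr=0.5):
    """Fit logistic regression by gradient descent and return training accuracy."""
    Z = (F - F.mean(axis=0)) / F.std(axis=0)   # standardise each feature
    Z = np.column_stack([Z, np.ones(len(Z))])  # add an intercept column
    w = np.zeros(Z.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-Z @ w))
        w -= lr * Z.T @ (p - y) / len(y)
    return float((((Z @ w) > 0).astype(int) == y).mean())

X, y = make_dataset()
F = features(X, pattern)
acc_distance = train_accuracy(F[:, :1], y)  # distance feature only
acc_both = train_accuracy(F, y)             # distance + position
print(f"distance only: {acc_distance:.2f}, distance + position: {acc_both:.2f}")
```

Since the pattern matches equally well in both classes, the distance feature carries almost no class information in this setup, whereas the position of the best match separates the classes cleanly.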
Explainability of shapelets
One of the most interesting properties of shapelets is that they are very explainable. The extracted shapelets can easily be presented to a domain expert to show exactly which parts of the timeseries were used to make a decision. Moreover, by optimizing both the positional and the distance information, the extracted shapelets become more interpretable as well. Without explicit positional information, GENDIS and related techniques such as Learning Timeseries Shapelets [3] were often able to “hack” the positional information into the shapelets themselves. Guillemé et al. demonstrate this in their paper [4].
So this wraps up this short post! By extending our shapelet extraction framework to extract positional information in addition to the distances to each of the shapelets, we were able to achieve significant gains in predictive performance with limited effort!
Do you have more questions on shapelets, or a use case involving timeseries classification? Get in touch with us!
Sources
[1] Ye, L., & Keogh, E. (2009, June). Time series shapelets: a new primitive for data mining. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining.
[2] Lines, J., Davis, L. M., Hills, J., & Bagnall, A. (2012, August). A shapelet transform for time series classification. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining.
[3] Grabocka, J., Schilling, N., Wistuba, M., & Schmidt-Thieme, L. (2014, August). Learning time-series shapelets. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining.
[4] Guillemé, M., Malinowski, S., Tavenard, R., & Renard, X. (2019, September). Localized Random Shapelets. In International Workshop on Advanced Analysis and Learning on Temporal Data. Springer, Cham.
[5] Vandewiele, G., Ongenae, F., & De Turck, F. (2019). GENDIS: GENetic DIscovery of Shapelets. arXiv preprint arXiv:1910.12948.
[6] www.timeseriesclassification.com