Health & Environmental Research Online (HERO)
Citation
HERO ID
7094612
Reference Type
Journal Article
Title
SAZED: parameter-free domain-agnostic season length estimation in time series data
Author(s)
Toller, M; Santos, T; Kern, R
Year
2019
Is Peer Reviewed?
Yes
Journal
Data Mining and Knowledge Discovery
ISSN:
1384-5810
Publisher
SPRINGER
Location
DORDRECHT
Page Numbers
1775-1798
DOI
10.1007/s10618-019-00645-z
Web of Science Id
WOS:000492844200008
Abstract
Season length estimation is the task of identifying the number of observations in the dominant repeating pattern of seasonal time series data. As such, it is a common pre-processing task crucial for various downstream applications. Inferring season length from a real-world time series is often challenging due to phenomena such as slightly varying period lengths and noise. These issues may, in turn, lead practitioners to dedicate considerable effort to the preprocessing of time series data, since existing approaches either require dedicated parameter tuning or perform in a heavily domain-dependent way. Hence, to address these challenges, we propose SAZED: spectral and average autocorrelation zero distance density. SAZED is a versatile ensemble of multiple, specialized time series season length estimation approaches. The combination of various base methods selected with respect to domain-agnostic criteria and a novel seasonality isolation technique allows broad applicability to real-world time series of varied properties. Further, SAZED is theoretically grounded and parameter-free, with a computational complexity of O(n log n), which makes it applicable in practice. In our experiments, SAZED was statistically significantly better than every other method on at least one dataset. The datasets we used for the evaluation consist of time series data from various real-world domains, sterile synthetic test cases, and synthetic data that were designed to be seasonal and yet have no finite statistical moments of any order.
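To make the task in the abstract concrete, the following is a minimal sketch of a plain autocorrelation baseline for season length estimation. It is not SAZED and does not reproduce any of its components. The function names `autocorrelation` and `estimate_season_length`, the `max_lag` default of half the series length, and the rule "pick the lag with the highest autocorrelation after the first non-positive value" are all assumptions chosen for illustration. This naive version runs in O(n²), unlike the O(n log n) reported for SAZED.

```python
import math


def autocorrelation(series, lag):
    """Biased sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        # A constant series carries no seasonal information.
        return 0.0
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag))
    return cov / var


def estimate_season_length(series, max_lag=None):
    """Return an estimated season length in observations, or None.

    Illustrative heuristic (an assumption, not the SAZED method):
    skip the initial lags where the autocorrelation is still positive,
    then return the lag with the largest autocorrelation.
    """
    n = len(series)
    if n < 4:
        return None
    if max_lag is None:
        max_lag = n // 2
    acf = [autocorrelation(series, lag) for lag in range(max_lag + 1)]

    # Advance past the trivial peak around lag 0.
    lag = 1
    while lag <= max_lag and acf[lag] > 0:
        lag += 1
    if lag > max_lag:
        return None

    best = max(range(lag, max_lag + 1), key=lambda k: acf[k])
    return best if acf[best] > 0 else None


# Usage: a noise-free sine wave that repeats every 12 observations.
sine = [math.sin(2 * math.pi * t / 12) for t in range(120)]
estimated = estimate_season_length(sine)
```

A single heuristic like this tends to be sensitive to noise and to slightly varying period lengths, which are the difficulties the abstract cites as motivation for an ensemble approach.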