Health & Environmental Research Online (HERO)


7121777 
Journal Article 
Low-Latency Analytics on Colossal Data Streams with SummaryStore 
Agrawal, N; Vulimiri, A
2017 
ASSOC COMPUTING MACHINERY 
NEW YORK 
647-664 
SummaryStore is an approximate time-series store, designed for analytics, capable of storing large volumes of time-series data (~1 petabyte) on a single node; it preserves high degrees of query accuracy and enables near real-time querying at unprecedented cost savings. SummaryStore contributes time-decayed summaries, a novel abstraction for summarizing data streams, along with an ingest algorithm to continually merge the summaries for efficient range queries; in conjunction, it returns reliable error estimates alongside the approximate answers, supporting a range of machine learning and analytical workloads. We successfully evaluated SummaryStore using real-world applications for forecasting, outlier detection, and Internet traffic monitoring; it can summarize aggressively with low median errors, 0.1 to 10%, for different workloads. Under range-query microbenchmarks, it stored 1 PB of synthetic stream data (1024 1 TB streams) on a single node, using roughly 10 TB (100x compaction), with 95th-percentile error below 5% and median cold-cache query latency of 1.3 s (worst-case latency under 70 s). 
26th ACM Symposium on Operating Systems Principles (SOSP) 
Shanghai, PEOPLES R CHINA