US EPA
United States Environmental Protection Agency
Health & Environmental Research Online (HERO)
HERO ID
7121777
Reference Type
Journal Article
Title
Low-Latency Analytics on Colossal Data Streams with SummaryStore
Author(s)
Agrawal, N; Vulimiri, A
Year
2017
Publisher
ASSOC COMPUTING MACHINERY
Location
NEW YORK
Page Numbers
647-664
DOI
10.1145/3132747.3132758
Web of Science Id
WOS:000522460300039
Abstract
SummaryStore is an approximate time-series store designed for analytics. It can store large volumes of time-series data (roughly 1 petabyte) on a single node while preserving high query accuracy, and it enables near-real-time querying at unprecedented cost savings. SummaryStore contributes time-decayed summaries, a novel abstraction for summarizing data streams, along with an ingest algorithm that continually merges the summaries for efficient range queries; in conjunction, it returns reliable error estimates alongside the approximate answers, supporting a range of machine learning and analytical workloads. We successfully evaluated SummaryStore using real-world applications for forecasting, outlier detection, and Internet traffic monitoring; it can summarize aggressively with low median errors (0.1% to 10%) across different workloads. Under range-query microbenchmarks, it stored 1 PB of synthetic stream data (1024 streams of 1 TB each) on a single node using roughly 10 TB (100x compaction), with 95th-percentile error below 5% and a median cold-cache query latency of 1.3 s (worst-case latency under 70 s).
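The abstract names time-decayed summaries and an ingest algorithm that merges them, but gives no details. As a rough illustration of the general idea, the following minimal sketch keeps recent data in small windows and older data in progressively larger ones. It is not SummaryStore's method: it swaps in an exponential-histogram-style merge rule (at most `k` windows per power-of-two size), stores only a per-window sum, and assumes non-negative values that arrive in timestamp order. The names `Window`, `DecayedStore`, `append`, `query_sum`, and `k` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Hypothetical summary of a contiguous run of stream elements."""
    start: float  # timestamp of the oldest element covered
    end: float    # timestamp of the newest element covered
    size: int     # number of raw elements merged in (always a power of two)
    total: float  # sum of the covered values


class DecayedStore:
    """Hypothetical time-decayed store: older data lives in coarser windows.

    Assumed merge rule (exponential-histogram style, not SummaryStore's
    published algorithm): keep at most `k` windows of each power-of-two
    size; when a size class overflows, merge its two oldest windows into
    one window of twice the size.
    """

    def __init__(self, k: int = 2) -> None:
        self.k = k
        self.windows: list[Window] = []  # ordered oldest -> newest

    def append(self, ts: float, value: float) -> None:
        """Ingest one element; timestamps must arrive in non-decreasing order."""
        self.windows.append(Window(ts, ts, 1, value))
        size = 1
        while True:
            same = [i for i, w in enumerate(self.windows) if w.size == size]
            if len(same) <= self.k:
                break
            # Sizes are non-increasing from oldest to newest, so the two
            # oldest windows of this size sit next to each other.
            i = same[0]
            a, b = self.windows[i], self.windows[i + 1]
            self.windows[i:i + 2] = [Window(a.start, b.end, 2 * size, a.total + b.total)]
            size *= 2  # the merge may overflow the next size class

    def query_sum(self, t1: float, t2: float) -> tuple[float, float, float]:
        """Approximate sum over t1 <= ts <= t2 as (estimate, lower, upper).

        Fully covered windows count exactly. A partially covered window is
        prorated by the fraction of its time span inside [t1, t2] (assumes
        values are spread uniformly in time); the bounds assume non-negative
        values, so such a window adds 0 to `lower` and its total to `upper`.
        """
        if t1 > t2:
            return 0.0, 0.0, 0.0  # empty query range
        est = lower = upper = 0.0
        for w in self.windows:
            if w.end < t1 or w.start > t2:
                continue  # no overlap with the query range
            if t1 <= w.start and w.end <= t2:
                est += w.total
                lower += w.total
                upper += w.total
                continue
            # Partial overlap implies w.end > w.start, so the division is safe.
            overlap = min(w.end, t2) - max(w.start, t1)
            est += w.total * overlap / (w.end - w.start)
            upper += w.total
        return est, lower, upper


if __name__ == "__main__":
    store = DecayedStore(k=2)
    for t in range(1000):
        store.append(float(t), 1.0)  # one unit-valued event per tick
    print(len(store.windows), store.query_sum(100.0, 899.0))  # true sum: 800
```

With `k = 2`, 1,000 ingested elements occupy at most 20 windows (two per power-of-two size up to 512), so storage grows logarithmically with stream length while the newest data stays at full resolution. A larger `k` keeps more, finer windows and tightens the bounds; other decay schedules are possible, and the abstract does not say which one SummaryStore uses. The `(lower, upper)` pair here is a deterministic worst-case interval, a simpler stand-in for the statistical error estimates the abstract describes.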
Conference Name
26th ACM Symposium on Operating Systems Principles (SOSP)
Conference Location
Shanghai, PEOPLES R CHINA