English

Exploiting Duality in Summarization with Deterministic Guarantees






Summarization is an important task in data mining. A major challenge over the past years has been the efficient construction of fixed-space synopses that provide a deterministic quality guarantee, often expressed in terms of a maximum-error metric. Histograms and several hierarchical techniques have been proposed for this problem. However, their time and/or space complexities remain impractically high and depend not only on the data set size n, but also on the space budget B. These handicaps stem from a requirement to tabulate all allocations of synopsis space to different regions of the data. In this paper we develop an alternative methodology that dispels these deficiencies, thanks to a fruitful application of the solution to the dual problem: given a maximum allowed error, determine the minimum-space synopsis that achieves it. These complexity advantages offer both a spaceefficiency and a scalability that previous approaches lacked. We verify the benefits of our approach in practice by experimentation.
Find OpenCourseWare Online Exams!
Attribution: The Open Education Consortium
http://www.ocwconsortium.org/courses/view/d2ed3372365591006ebeec605befd7b8/
Course Home http://videolectures.net/kdd07_karras_edis/