A probabilistic model for data cube compression and query approximation

From National Research Council Canada

Author	Missaoui, Rokia; Goutte, Cyril; Choupo, Anicet Kouomou; Boujenoui, Ameur
Affiliation	National Research Council of Canada. NRC Institute for Information Technology
Format	Text, Article
Conference	The ACM Tenth International Workshop on Data Warehousing and OLAP, November 9, 2007, Lisbon, Portugal
Abstract	Databases and data warehouses contain an overwhelming volume of information that users must wade through in order to extract valuable and actionable knowledge to support the decision-making process. This contribution addresses the problem of automatically analyzing large multidimensional tables to get a concise representation of data, identify patterns and provide approximate answers to queries. Since data cubes are nothing but multi-way tables, we propose to analyze the potential of a probabilistic modeling technique, called non-negative multi-way array factorization, for approximating aggregate and multidimensional values. Using this technique, we compute the set of components (clusters) that best fit the initial data set and whose superposition approximates the original data. The generated components can then be exploited for approximately answering OLAP queries such as roll-up, slice and dice operations. The proposed modeling technique is then compared against the log-linear modeling technique, which has already been used in the literature for compression and outlier detection in data cubes. Finally, three data sets are used to discuss the potential benefits of non-negative multi-way array factorization.
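The abstract does not spell out the fitting procedure, so the following is only a minimal pure-Python sketch of the general idea, under stated assumptions: a three-dimensional cube of non-negative counts, a CP/PARAFAC-style model X[i][j][k] ≈ sum_r A[i][r]·B[j][r]·C[k][r], and KL-divergence multiplicative updates (a common choice for probabilistic non-negative factorization, not necessarily the authors' exact algorithm). Each component r plays the role of one cluster; normalizing the factor columns gives the probabilistic reading P(r)·P(i|r)·P(j|r)·P(k|r). The function names (ntf_kl, rollup_k, slice_k, and the helpers) and the sample cube are hypothetical, chosen for illustration. The sketch also shows how roll-up and slice queries can be answered from the factors alone, without reading the original cube.

```python
import math
import random


def reconstruct(A, B, C):
    """Model cube: Xhat[i][j][k] = sum_r A[i][r] * B[j][r] * C[k][r]."""
    rank = len(A[0])
    return [[[sum(a[r] * b[r] * c[r] for r in range(rank)) for c in C]
             for b in B] for a in A]


def _div(num, den):
    # Guard against 0/0 when a cell or a whole component has collapsed to zero.
    return num / den if den > 0 else 0.0


def col_sums(F):
    """Per-component sums of a factor matrix (rows = members, cols = components)."""
    return [sum(row[r] for row in F) for r in range(len(F[0]))]


def ratio(X, A, B, C):
    """Cell-wise X / Xhat for the current factors."""
    Xhat = reconstruct(A, B, C)
    return [[[_div(x, h) for x, h in zip(xr, hr)] for xr, hr in zip(xs, hs)]
            for xs, hs in zip(X, Xhat)]


def ntf_kl(X, rank, n_iter=200, seed=0):
    """Hypothetical sketch: non-negative CP/PARAFAC factorization of a 3-way
    cube X (nested lists), fitted with KL-divergence multiplicative updates."""
    I, J, K = len(X), len(X[0]), len(X[0][0])
    rng = random.Random(seed)
    A = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(I)]
    B = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(J)]
    C = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(K)]
    R = range(rank)
    for _ in range(n_iter):
        # Update one factor at a time, holding the other two fixed;
        # multiplying by non-negative terms keeps every entry non-negative.
        Q = ratio(X, A, B, C)
        sB, sC = col_sums(B), col_sums(C)
        A = [[A[i][r] * _div(sum(Q[i][j][k] * B[j][r] * C[k][r]
                                 for j in range(J) for k in range(K)),
                             sB[r] * sC[r]) for r in R] for i in range(I)]
        Q = ratio(X, A, B, C)
        sA, sC = col_sums(A), col_sums(C)
        B = [[B[j][r] * _div(sum(Q[i][j][k] * A[i][r] * C[k][r]
                                 for i in range(I) for k in range(K)),
                             sA[r] * sC[r]) for r in R] for j in range(J)]
        Q = ratio(X, A, B, C)
        sA, sB = col_sums(A), col_sums(B)
        C = [[C[k][r] * _div(sum(Q[i][j][k] * A[i][r] * B[j][r]
                                 for i in range(I) for j in range(J)),
                             sA[r] * sB[r]) for r in R] for k in range(K)]
    return A, B, C


def rollup_k(A, B, C):
    """Approximate roll-up that aggregates out the third dimension:
    sum_k Xhat[i][j][k] = sum_r A[i][r] * B[j][r] * (sum_k C[k][r])."""
    sC = col_sums(C)
    return [[sum(a[r] * b[r] * sC[r] for r in range(len(sC))) for b in B]
            for a in A]


def slice_k(A, B, C, k0):
    """Approximate slice with the third dimension fixed at member k0."""
    c = C[k0]
    return [[sum(a[r] * b[r] * c[r] for r in range(len(c))) for b in B]
            for a in A]


def kl_divergence(X, Xhat):
    """Generalized KL divergence D(X || Xhat) summed over all cells."""
    d = 0.0
    for xs, hs in zip(X, Xhat):
        for xr, hr in zip(xs, hs):
            for x, h in zip(xr, hr):
                d += (x * math.log(x / h) if x > 0 else 0.0) - x + h
    return d


if __name__ == "__main__":
    # Hypothetical 3 x 3 x 2 cube of counts (e.g. product x region x quarter).
    X = [[[5, 1], [2, 3], [4, 4]],
         [[1, 6], [3, 2], [2, 5]],
         [[3, 3], [1, 1], [6, 2]]]
    A, B, C = ntf_kl(X, rank=2, n_iter=300)
    print("rolled up over k:", rollup_k(A, B, C))
    print("slice at k=0:", slice_k(A, B, C, 0))
    print("KL fit:", kl_divergence(X, reconstruct(A, B, C)))
```

Because the roll-up only needs the per-component column sums of C, its cost is O(I·J·R) regardless of how many members the aggregated dimension has. A side effect of the KL updates is that, after the final update of C, the approximation reproduces the original totals for each member of the third dimension (up to rounding), so grand totals are preserved even where individual cells are only approximated.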
Publication date	2007
In	The ACM Tenth International Workshop on Data Warehousing and OLAP [Proceedings].
Language	English
NRC number	NRCC 49870
NPARC number	5763914
Record identifier	aede0cb1-b744-4692-a39b-af0315612a67
Record created	2009-03-29
Record modified	2020-08-12

Date modified: 2025-05-11