TopCells: Supporting Keyword Search in Text Cube
• 0 Vote(s) - 0 Average
• 1
• 2
• 3
• 4
• 5
 seminar class Active In SP Posts: 5,361 Joined: Feb 2011 16-02-2011, 10:33 AM TopCells: Supporting Keyword Search in Text Cube TopCells: Keyword Search  Keyword Search – Simple but popular (as we can see in Zhai’s course) – A lot of studies in IR community  Recently, Keyword Search in DB (DB+IR) – Finding the close connection between tuples Motivation Search a light weighted powerful laptop from customer reviews Text Cube  Aggregating Multi-dimensional Text Data TopCells: Keyword Query  Simple Keyword Query – A set of keywords  Extended Keyword Query – With dimension constraints TopCells: Ranking Cells  Given a keyword query q = {t1, …, tm} How to rank the cells?  Relevance of a cell C to the given query – Average the relevance of documents in a cell – s(d, q) could be ANY IR scoring formula, like Okapi Algorithm 1 (one-scan of inverted index)  Naïve one-scan inverted-index-based algorithm – Compute s(d, q) for all documents d’s – For each d, update rel(C, q) for all cells C’s containing d (There are 2dim cells containing d) – Output top-k cells C’s with highest rel(C, q)’s – Deficiency – Given a query q, equivalent to scanning all cells once – When the dimensionality is high, the number of cells is huge – Solution (search space ordering) – Explore as small number of cells as possible Algorithm 2 (search-space ordering)  Monotonicity of relevance score – Used to estimate upper bounds – Search-Space-Ordering algorithm – Starting from single documents – Bottom-to-up computation: aggregating cells of small sizes into big ones – Aggregating cells with higher scores first – Partially-aggregated cell – Output C, when C is fully-aggregated and rel(C, q) is no less than the upper bound of rel(C’, q) for all partially-aggregated cells C’ download full report https://netfiles.uiuc.edu/bding3/www/pap...090508.ppt