QuickSel (Quick Selectivity Learning) is a machine-learning framework for query-driven selectivity estimation in database query optimizers. Introduced by Yongjoo Park et al. at the SIGMOD 2020 conference, it replaces traditional database histograms with a mixture model that can be refined quickly and continuously from the observed selectivities of incoming queries. The framework breaks down as follows.

The Problem It Solves

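Stale Statistics: Traditional databases rely on data-driven histograms or random samples that must be refreshed by periodically re-scanning the table. Because those scans are expensive, the statistics are often out of date by the time the optimizer uses them.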

The Complexity Trap: Existing query-driven alternatives (such as ISOMER or STHoles) build non-overlapping bucket histograms. As more queries are observed, the number of buckets grows exponentially, forcing systems to discard valuable historical data to keep overhead manageable.

Key Architecture & Mechanics

Uniform Mixture Models: Instead of disjoint histogram buckets, QuickSel models the underlying data as a weighted set of overlapping subpopulations. It uses a uniform mixture model rather than a Gaussian mixture model because each uniform component is supported on a hyperrectangle, so the overlap between a range query and a component can be computed with basic arithmetic (a min, a max, and a product per dimension) rather than with slow numerical approximations.
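To make the "basic arithmetic" point concrete, here is a minimal sketch of how a selectivity estimate could be computed under a uniform mixture. The names `Box`, `overlap_volume`, and `estimate_selectivity` are hypothetical and chosen for illustration; they are not taken from the QuickSel code base, and the component boxes and weights are made up.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Box:
    """Axis-aligned hyperrectangle: per-dimension lower and upper bounds."""
    lo: Sequence[float]
    hi: Sequence[float]

    def volume(self) -> float:
        v = 1.0
        for l, h in zip(self.lo, self.hi):
            v *= max(0.0, h - l)
        return v


def overlap_volume(a: Box, b: Box) -> float:
    """Volume of the intersection of two boxes (0.0 if they are disjoint)."""
    v = 1.0
    for al, ah, bl, bh in zip(a.lo, a.hi, b.lo, b.hi):
        side = min(ah, bh) - max(al, bl)
        if side <= 0.0:
            return 0.0
        v *= side
    return v


def estimate_selectivity(query: Box, components: Sequence[Box],
                         weights: Sequence[float]) -> float:
    """Sum over components of weight * (fraction of the component inside the query)."""
    return sum(w * overlap_volume(query, g) / g.volume()
               for g, w in zip(components, weights))


# Two overlapping 2-D components with equal (illustrative) weights.
components = [Box((0.0, 0.0), (2.0, 2.0)), Box((1.0, 1.0), (3.0, 3.0))]
weights = [0.5, 0.5]
query = Box((1.0, 1.0), (2.0, 2.0))
print(estimate_selectivity(query, components, weights))  # prints 0.25
```

Each component contributes the fraction of its own volume that falls inside the query range, scaled by its weight. No integration or sampling is needed, which is the property the uniform choice is meant to buy.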

Quadratic Programming: Instead of using time-consuming iterative scaling or maximum entropy algorithms, QuickSel reduces the optimization objective to a quadratic programming problem. Because that problem can be solved analytically, the model can be trained and updated in near real time.
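The following is a simplified sketch of what an analytic solve can look like, not the paper's exact formulation. It assumes one-dimensional components, a quadratic smoothness term equal to the squared L2 norm of the mixture density, and a large penalty (the hypothetical `penalty` parameter) that pulls the model toward the observed selectivities. Setting the gradient to zero gives a linear system, solved here with plain Gaussian elimination. All function names are illustrative.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # 1-D range (lo, hi); boxes work the same way per dimension


def overlap(a: Interval, b: Interval) -> float:
    """Length of the intersection of two intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def solve_linear(M: List[List[float]], b: List[float]) -> List[float]:
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(M, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / aug[r][r]
    return x


def fit_weights(components: Sequence[Interval], queries: Sequence[Interval],
                selectivities: Sequence[float], penalty: float = 1e6) -> List[float]:
    """Minimize w^T Q w + penalty * ||A w - s||^2 in closed form."""
    m, n = len(components), len(queries)
    # A[i][z]: fraction of component z that lies inside query i.
    A = [[overlap(q, g) / (g[1] - g[0]) for g in components] for q in queries]
    # Q[y][z]: integral of the product of the two uniform densities.
    Q = [[overlap(gy, gz) / ((gy[1] - gy[0]) * (gz[1] - gz[0]))
          for gz in components] for gy in components]
    # Zero gradient gives (Q + penalty * A^T A) w = penalty * A^T s.
    lhs = [[Q[y][z] + penalty * sum(A[i][y] * A[i][z] for i in range(n))
            for z in range(m)] for y in range(m)]
    rhs = [penalty * sum(A[i][y] * selectivities[i] for i in range(n))
           for y in range(m)]
    return solve_linear(lhs, rhs)


# Domain [0, 3]: the whole-domain query has selectivity 1.0,
# and the query [0, 1] was observed to have selectivity 0.4.
w = fit_weights([(0.0, 2.0), (1.0, 3.0)], [(0.0, 3.0), (0.0, 1.0)], [1.0, 0.4])
print(w)  # weights close to 0.8 and 0.2
```

The point of the sketch is that training is a single linear solve with no iterative fitting loop. It does not enforce non-negative weights, and the penalty value is an arbitrary choice for the example.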

Continuous Updates: Because of this design, QuickSel updates incrementally. It refines its model of the data distribution within milliseconds of observing the true selectivity of an executed query.

Performance & Impact

Speed: QuickSel can update its internal model for 300 queries in roughly 1.9 milliseconds. For the same target accuracy, its training (refinement) is 34.0× to 179.4× faster than state-of-the-art query-driven histograms.

Accuracy: Given the same storage budget, QuickSel yields 26.8% to 91.8% higher accuracy than periodically-refreshed histograms and static sampling methods.

QuickSel: Quick Selectivity Learning with Mixture Models – arXiv
