Research - Clément Rouvroy

2025. Index Benefit Estimation for Column-oriented data bases. ENS-PSL Research Internship Defense. PDF

Columnar databases are starting to leverage indexes to speed up bottleneck queries. In row-oriented systems, a commonly used tool for deciding whether an index would be useful is a “What-If” API. It returns an estimate of the benefit of an index without constructing it, saving time and resources at the cost of possible prediction errors. Moreover, to obtain a fully auto-tuning database, recent research leverages machine learning to build an automatic database administrator that tunes the database configuration using workload knowledge and “What-If” calls. As column-oriented databases are only starting to leverage indexes, no such tool has yet been designed specifically for these systems. In this work, we propose three contributions: heuristics for segment-aware access path cost estimation in column-oriented databases, a hypothetical index benefit estimation designed for column-oriented databases (limited to hash indexes), and the use of quantile regression to trade off precision against risk.

2025. Enumerating with constant delay and linear preprocessing acyclic CQs with self-joins. ENS-PSL Research Project Defense. PDF

This work studies the enumeration of answers to conjunctive queries. Research on this topic typically focuses on conjunctive queries without self-joins. We want to work towards a dichotomy that tells us whether the answers to a conjunctive query with self-joins can be enumerated with optimal time guarantees: linear time before the first answer and constant delay between answers. In this work, we have found a sufficient condition and a necessary condition. However, some examples remain that we cannot classify, which may lead to new work on the subject.

2024. Enumerating with constant delay and linear preprocessing acyclic CQs with self-joins. BOREAL Seminar on Knowledge Graph and Database Theory. PDF on LIRMM page

Some conjunctive queries (CQs) are in DelayCLin: we can enumerate all their answers with constant delay between answers after linear preprocessing. There is a dichotomy that classifies which CQs without self-joins are in DelayCLin, but it does not hold for CQs with self-joins. As CQs often have self-joins, our goal in this internship is to find a sufficient condition and a necessary condition for acyclic CQs with self-joins to be in DelayCLin.

2025. Report: Hypothetical Index Benefit Estimation for Column-oriented storage using Quantiles. Internship Report, supervised by Jiachen SHI and Gao CONG. PDF

Columnar databases are starting to leverage indexes to speed up bottleneck queries. In row-oriented systems, a commonly used tool for deciding whether an index would be useful is a “What-If” API. It returns an estimate of the benefit of an index without constructing it, saving time and resources at the cost of possible prediction errors. Moreover, to obtain a fully auto-tuning database, recent research leverages machine learning to build an automatic database administrator that tunes the database configuration using workload knowledge and “What-If” calls. As column-oriented databases are only starting to leverage indexes, no such tool has yet been designed specifically for these systems. In this work, we propose three contributions: heuristics for segment-aware access path cost estimation in column-oriented databases, a hypothetical index benefit estimation designed for column-oriented databases (limited to hash indexes), and the use of quantile regression to trade off precision against risk.

2025. Report: On the enumeration of answers to acyclic conjunctive queries with self-joins. Internship Report, supervised by Luc Segoufin, Nofar Carmeli and David Carral. PDF

This work studies the enumeration of answers to conjunctive queries. Research on this topic typically focuses on conjunctive queries without self-joins. We want to work towards a dichotomy that tells us whether the answers to a conjunctive query with self-joins can be enumerated with optimal time guarantees: linear time before the first answer and constant delay between answers. In this work, we have found a sufficient condition and a necessary condition. However, some examples remain that we cannot classify, which may lead to new work on the subject.
