Information
You can contact me at the e-mail address given in my reports.
Presentations
Presentations I have given at seminars or as defenses during my studies.
2025. Index Benefit Estimation for Column-oriented Databases
ENS-PSL Research Internship Defense. PDF
Columnar databases are starting to leverage indexes to improve the efficiency of bottleneck queries. To decide whether an index may be useful in row-oriented systems, a commonly used tool is a “What-If” API. It returns an estimate of the benefit of an index without constructing it, saving time and resources at the cost of possible prediction errors. Moreover, to build a fully auto-tuning database, recent research leverages Machine Learning to create an automatic database administrator that tunes the database configuration using workload knowledge and “What-If” calls. As column-oriented databases are just starting to leverage indexes, there is as yet no such tool specifically designed for these systems. In this work, we propose three contributions: heuristics for segment-aware access path cost estimation in column-oriented databases, a hypothetical index benefit estimation designed for column-oriented databases (limited to hash indexes), and the use of quantile regression to trade off precision against risk.
2025. Enumerating with constant delay and linear preprocessing acyclic CQs with self-joins
ENS-PSL Research Project Defense. PDF
This work studies the enumeration of answers to conjunctive queries. Research on this topic typically focuses on conjunctive queries without self-joins. We want to work towards a dichotomy that tells us whether the answers to a conjunctive query with self-joins can be enumerated with optimal time guarantees: linear time before the first answer and constant delay between answers. In this work, we have found a sufficient condition and a necessary condition. However, some examples remain that we cannot classify, which could lead to further work on the subject.
2024. Enumerating with constant delay and linear preprocessing acyclic CQs with self-joins
BOREAL Seminar on Knowledge Graph and Database Theory. PDF on LIRMM page
Some conjunctive queries (CQs) are in DelayCLin: we can enumerate all their answers with constant delay between answers after linear preprocessing. There is a dichotomy that classifies which CQs without self-joins are in DelayCLin, but this dichotomy does not hold for CQs with self-joins. As CQs often have self-joins, our goal in this internship is to find a sufficient condition and a necessary condition for acyclic CQs with self-joins.
Research Internships
Work from internships at NTU Singapore, ENS-PSL (NormaleSup), and INRIA Montpellier.
2025. Report: Hypothetical Index Benefit Estimation for Column-oriented storage using Quantiles
Internship Report, supervised by Jiachen SHI and Gao CONG. PDF
Columnar databases are starting to leverage indexes to improve the efficiency of bottleneck queries. To decide whether an index may be useful in row-oriented systems, a commonly used tool is a “What-If” API. It returns an estimate of the benefit of an index without constructing it, saving time and resources at the cost of possible prediction errors. Moreover, to build a fully auto-tuning database, recent research leverages Machine Learning to create an automatic database administrator that tunes the database configuration using workload knowledge and “What-If” calls. As column-oriented databases are just starting to leverage indexes, there is as yet no such tool specifically designed for these systems. In this work, we propose three contributions: heuristics for segment-aware access path cost estimation in column-oriented databases, a hypothetical index benefit estimation designed for column-oriented databases (limited to hash indexes), and the use of quantile regression to trade off precision against risk.
2025. Report: On the enumeration of answers to acyclic conjunctive queries with self-joins
Internship Report, supervised by Luc Segoufin, Nofar Carmeli and David Carral. PDF
This work studies the enumeration of answers to conjunctive queries. Research on this topic typically focuses on conjunctive queries without self-joins. We want to work towards a dichotomy that tells us whether the answers to a conjunctive query with self-joins can be enumerated with optimal time guarantees: linear time before the first answer and constant delay between answers. In this work, we have found a sufficient condition and a necessary condition. However, some examples remain that we cannot classify, which could lead to further work on the subject.