Preprints

Bienvenu F, Duchamps J-J, Fuchs M. and Yu T-S. (2024+)
The B₂ index of galled trees
submitted to Annals of Applied Probability
arXiv:2407.19454
In recent years, there has been an effort to extend the classical notion of phylogenetic balance, originally defined in the context of trees, to networks. One of the most natural ways to do this is with the so-called B₂ index. In this paper, we study the B₂ index for a prominent class of phylogenetic networks: galled trees. We show that the B₂ index of a uniform leaf-labeled galled tree converges in distribution as the network becomes large. We characterize the corresponding limiting distribution, and show that its expected value is 2.707911858984... This is the first time that a balance index has been studied to this level of detail for a random phylogenetic network.
One distinctive feature of this work is that we use two different and independent approaches, each with its advantages: analytic combinatorics and local limits. The analytic combinatorics approach is more direct, as it relies on standard tools, but it involves slightly more complex calculations. Because it has not previously been used to study such questions, the local limit approach requires developing an extensive framework beforehand; however, this framework is interesting in itself and can be used to tackle other similar problems.

Published journal articles

Bienvenu F. (2025)
Mathematically tractable models of random phylogenetic networks:
an overview of some recent developments
Philosophical Transactions of the Royal Society B, 380(1919):20230301
doi: 10.1098/rstb.2023.0301
Models of random phylogenetic networks have been used since the inception of the field, but the introduction and rigorous study of mathematically tractable models is a much more recent topic that has gained momentum in the last 5 years. This manuscript discusses some recent developments in the field through a selection of examples. The emphasis is on the techniques rather than on the results themselves, and on probabilistic tools rather than on combinatorial ones.
Couvert É, Bienvenu F, Duchamps J-J, Erard A, Miró Pina V, Schertzer E and Lambert A. (2024)
Opening the species box: what parsimonious microscopic models of speciation have to say about macroevolution
Journal of Evolutionary Biology, 37(12) pp. 1433–1457
doi: 10.1093/jeb/voae134
In the last two decades, lineage-based models of diversification, where species are viewed as particles that can divide (speciate) or die (become extinct) at rates depending on some evolving trait, have been very popular tools to study macroevolutionary processes. Here, we argue that this approach cannot be used to break down the inner workings of species diversification and that "opening the species box" is necessary to understand the causes of macroevolution, but that too detailed speciation models also fail to make robust macroevolutionary predictions. We set up a general framework for parsimonious models of speciation that rely on a minimal number of mechanistic principles: (a) reproductive isolation is caused by excessive dissimilarity between genotypes; (b) dissimilarity results from a balance between differentiation processes and homogenizing processes; and (c) dissimilarity can feed back on these processes by decelerating homogenization. We classify such models according to the main homogenizing process: (a) clonal evolution models (ecological drift), (b) models of genetic isolation (gene flow), and (c) models of isolation by distance (spatial drift). We review these models and their specific predictions on macroscopic variables such as species abundances, speciation rates, interfertility relationships, or phylogenetic tree structure. We propose new avenues of research by displaying conceptual questions remaining to be solved and new models to address them: the failure of speciation at secondary contact, the feedback of dissimilarity on homogenization, and the emergence in space of breeding barriers.
Bienvenu F. and Steel M. (2024)
0–1 laws for pattern occurrences in phylogenetic trees and networks
Bulletin of Mathematical Biology, 86:94
doi: 10.1007/s11538-024-01316-x
In a recent paper, the question of determining the fraction of binary trees that contain a fixed pattern known as the snowflake was posed. We show that this fraction goes to 1, providing two very different proofs: a purely combinatorial one that is quantitative and specific to this problem; and a proof using branching process techniques that is less explicit, but also much more general, as it applies to any fixed pattern and can be extended to other trees and networks. In particular, it follows immediately from our second proof that the fraction of d-ary trees (resp. level-k networks) that contain a fixed d-ary tree (resp. level-k network) tends to 1 as the number of leaves grows.
Bienvenu F. and Duchamps J-J. (2024)
A branching process with coalescence to model random phylogenetic networks
Electronic Journal of Probability, 29(31) pp. 1–48
doi: 10.1214/24-EJP1088
We introduce a biologically natural, mathematically tractable model of random phylogenetic networks to describe evolution in the presence of hybridization. One of the features of this model is that the hybridization rate of the lineages correlates negatively with their phylogenetic distance. We give formulas or characterizations for quantities of biological interest that make them straightforward to compute in practice. We show that the appropriately rescaled network, seen as a metric space, converges to the Brownian continuum random tree, and that the uniformly rooted network has a local weak limit, which we describe explicitly.
Guez J, Achaz G, Bienvenu F, Cury J, Toupance B, Heyer E, Jay F. and Austerlitz F. (2023)
Cultural transmission of reproductive success impacts genomic diversity,
coalescent tree topologies and demographic inferences
Genetics, 223(4) iyad007
doi: 10.1093/genetics/iyad007
Cultural Transmission of Reproductive Success (CTRS) has been observed in many human populations as well as other animals. It consists of a positive correlation of non-genetic origin between the progeny size of parents and children. This correlation can result from various factors, such as the social influence of parents on their children, the increase of children's survival through allocare from uncles and aunts, or the transmission of resources. Here, we study the evolution of genomic diversity through time under CTRS. We show that CTRS has a double impact on population genetics: (1) effective population size decreases when CTRS starts, mimicking a population contraction, and increases back to its original value when CTRS stops; (2) coalescent tree topologies are distorted under CTRS, with higher imbalance and a higher number of polytomies. Under long-lasting CTRS, effective population size stabilises but the distortion of tree topology remains, which yields U-shaped Site Frequency Spectra (SFS) under constant population size. We show that this impact of CTRS yields a bias in SFS-based demographic inference. Considering that CTRS was detected in numerous human and animal populations worldwide, one should be cautious that inferring population past histories from genomic data can be biased by this cultural process.
Bienvenu F, Lambert A. and Steel M. (2022)
Combinatorial and stochastic properties of ranked tree-child networks
Random Structures & Algorithms, 60(4) pp. 653–689
doi: 10.1002/rsa.21048
Tree-child networks are a recently-described class of directed acyclic graphs that have risen to prominence in phylogenetics (the study of evolutionary trees and networks). Although these networks have a number of attractive mathematical properties, many combinatorial questions concerning them remain intractable. In this paper, we show that endowing these networks with a biologically relevant ranking structure yields mathematically tractable objects, which we term ranked tree-child networks (RTCNs). We explain how to derive exact and explicit combinatorial results concerning the enumeration and generation of these networks. We also explore probabilistic questions concerning the properties of RTCNs when they are sampled uniformly at random. These questions include the lengths of random walks between the root and leaves (both from the root to the leaves and from a leaf to the root); the distribution of the number of cherries in the network; and sampling RTCNs conditional on displaying a given tree. We also formulate a conjecture regarding the scaling limit of the process that counts the number of lineages in the ancestry of a leaf. The main idea in this paper, namely using ranking as a way to achieve combinatorial tractability, may also extend to other classes of networks.
Coste CFD*, Bienvenu F*, Ronget V, Ramirez-Loza J-P, Cubaynes S. and Pavard S. (2021)
The kinship matrix: inferring the kinship structure of a population from its demography
Ecology Letters, 24(12) pp. 2750–2762
doi: 10.1111/ele.13854
The familial structure of a population and the relatedness of its individuals are determined by its demography. There is, however, no general method to infer kinship directly from the life-cycle of a structured population. Yet this question is central to fields such as ecology, evolution and conservation, especially in contexts where there is a strong interdependence between familial structure and population dynamics. Here, we give a general formula to compute, from any matrix population model, the expected number of arbitrary kin (sisters, nieces, cousins, etc.) of a focal individual ego, structured by the class of ego and of its kin. Central to our approach are classic but little-used tools known as genealogical matrices, which we combine in a new way. Our method can be used to obtain both individual-based and population-wide metrics of kinship, as we illustrate. It also makes it possible to analyze the sensitivity of the kinship structure to the traits implemented in the model.
Bienvenu F, Cardona G. and Scornavacca C. (2021)
Revisiting Shao and Sokal's B₂ index of phylogenetic balance
Journal of Mathematical Biology, 83:52
doi: 10.1007/s00285-021-01662-7
Measures of phylogenetic balance, such as the Colless and Sackin indices, play an important role in phylogenetics. Unfortunately, these indices are specifically designed for phylogenetic trees, and do not extend naturally to phylogenetic networks (which are increasingly used to describe reticulate evolution). This led us to consider a lesser-known balance index, whose definition is based on a probabilistic interpretation that is equally applicable to trees and to networks. This index, known as the B₂ index, was first proposed by Shao and Sokal in 1990. Surprisingly, it does not seem to have been studied mathematically since. Likewise, it is used only sporadically in the biological literature, where it tends to be viewed as arcane and not very useful in practice – even though the evidence for this is scarce. In this paper, we study mathematical properties of B₂ such as its distribution under the most common models of random trees and its range over various classes of phylogenetic networks. We also assess its relevance in biological applications, and find it to be comparable to that of the Colless and Sackin indices. Altogether, our results call for a reevaluation of the status of this somewhat forgotten measure of phylogenetic balance.
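To make the probabilistic interpretation mentioned above concrete, here is a minimal sketch that computes B₂ on small trees and networks. It assumes the usual definition, which the abstract does not spell out: a walker starts at the root and moves to a uniformly chosen child until it reaches a leaf, and B₂ is the base-2 Shannon entropy of the leaf where it stops. The function name `b2_index` and the dict-of-children encoding are hypothetical choices made for this example, not the authors' code.

```python
from collections import deque
from math import log2


def b2_index(children, root):
    """B2 of a rooted DAG given as {node: [children]} (assumed definition:
    entropy of the leaf reached by a uniform random walk from the root)."""
    # Count parents so that each node is processed after all of its parents.
    indegree = {node: 0 for node in children}
    for kids in children.values():
        for kid in kids:
            indegree[kid] = indegree.get(kid, 0) + 1

    prob = {node: 0.0 for node in indegree}
    prob[root] = 1.0
    queue = deque([root])
    b2 = 0.0
    while queue:
        node = queue.popleft()
        kids = children.get(node, [])
        if not kids:  # leaf: add its entropy term
            b2 -= prob[node] * log2(prob[node])
            continue
        share = prob[node] / len(kids)
        for kid in kids:
            prob[kid] += share
            indegree[kid] -= 1
            if indegree[kid] == 0:
                queue.append(kid)
    return b2


# Caterpillar tree with 3 leaves: leaf probabilities 1/2, 1/4, 1/4.
caterpillar = {"r": ["a", "x"], "a": ["y", "z"]}
# Balanced tree with 4 leaves: every leaf has probability 1/4.
balanced = {"r": ["a", "b"], "a": ["w", "x"], "b": ["y", "z"]}
# Small network: node h has two parents, so leaf z is reachable by two paths.
network = {"r": ["a", "b"], "a": ["x", "h"], "b": ["h", "y"], "h": ["z"]}

print(b2_index(caterpillar, "r"), b2_index(balanced, "r"), b2_index(network, "r"))
```

Because the walk is defined on any rooted directed acyclic graph, the same function handles trees and networks without modification, which is the property the abstract emphasizes.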
Bienvenu F, Duchamps J-J. and Foutel-Rodier F. (2021)
The Moran forest
Random Structures & Algorithms, 59(2) pp. 155–188
doi: 10.1002/rsa.20997
Starting from any graph on {1, …, n}, consider the Markov chain where at each time-step a uniformly chosen vertex is disconnected from all of its neighbors and reconnected to another uniformly chosen vertex. This Markov chain has a stationary distribution whose support is the set of non-empty forests on {1, …, n}. The random forest corresponding to this stationary distribution has interesting connections with the uniform rooted labeled tree and the uniform attachment tree. We fully characterize its degree distribution, the distribution of its number of trees, and the limit distribution of the size of a tree sampled uniformly. We also show that the size of the largest tree is asymptotically α log n, where α = (1 − log(e − 1))⁻¹ ≈ 2.18, and that the degree of the most connected vertex is asymptotically log n / log log n.
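The Markov chain described above is simple to simulate. The sketch below is an illustration under stated assumptions rather than the authors' implementation: vertices are labelled 0 to n − 1 instead of 1 to n, the chain starts from the empty graph, and "another uniformly chosen vertex" is read as uniform among the other n − 1 vertices. The function names are hypothetical.

```python
import random


def moran_forest_step(adj, rng):
    """One step of the chain on an adjacency list of sets (requires n >= 2)."""
    n = len(adj)
    v = rng.randrange(n)
    # Disconnect v from all of its neighbors.
    for w in adj[v]:
        adj[w].discard(v)
    adj[v].clear()
    # Reconnect v to a vertex chosen uniformly among the others (assumption).
    u = rng.randrange(n - 1)
    if u >= v:
        u += 1
    adj[v].add(u)
    adj[u].add(v)


def simulate(n, steps, seed=0):
    """Run the chain from the empty graph on n vertices."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for _ in range(steps):
        moran_forest_step(adj, rng)
    return adj


def count_trees(adj):
    """Number of connected components, found by depth-first search."""
    seen = set()
    trees = 0
    for start in range(len(adj)):
        if start in seen:
            continue
        trees += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
    return trees


forest = simulate(n=50, steps=1000)
edges = sum(len(neighbors) for neighbors in forest) // 2
print(edges, count_trees(forest))
```

A graph on n vertices is a forest exactly when its number of edges equals n minus its number of connected components, which gives a quick sanity check on the output. Each step ends by adding an edge, so the forest is never empty, in line with the support described in the abstract.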
Bienvenu F. (2019)
Positive association of the oriented percolation cluster in randomly oriented graphs
Combinatorics, Probability and Computing, 28(6) pp. 811–815
doi: 10.1017/S0963548319000191
Consider any fixed graph whose edges have been randomly and independently oriented, and write {S ⇝ i} to indicate that there is an oriented path going from a vertex s ∈ S to vertex i. Narayanan (2016) proved that for any set S and any two vertices i and j, {S ⇝ i} and {S ⇝ j} are positively correlated. His proof relies on the Ahlswede-Daykin inequality, a rather advanced tool of probabilistic combinatorics. In this short note, I give an elementary proof of the following, stronger result: writing V for the vertex set of the graph, for any source set S, the events {S ⇝ i}, i ∈ V, are positively associated – meaning that the expectation of the product of increasing functionals of the family {S ⇝ i} for i ∈ V is greater than the product of their expectations.
Bienvenu F. (2019)
The equivocal mean age of parents in a cohort
The American Naturalist, 194(2) pp. 276–284
doi: 10.1086/704110
The mean age at which parents give birth is an important notion in demography, ecology, and evolution, where it is used as a measure of generation time. A standard way to quantify it is to compute the mean age of the parents of all offspring produced by a cohort, and the resulting measure is thought to represent the mean age at which a typical parent produces offspring. In this note, I explain why this interpretation is problematic. I also introduce a new measure of the mean age at reproduction and show that it can be very different from the mean age of parents of offspring of a cohort. In particular, the mean age of parents of offspring of a cohort systematically overestimates the mean age at reproduction and can even be greater than the expected life span of parents.
Bienvenu F, Débarre F. and Lambert A. (2019)
The split-and-drift random graph, a null model for speciation
Stochastic Processes and their Applications, 129(6) pp. 2010–2048
doi: 10.1016/j.spa.2018.06.009
We introduce a new random graph model motivated by biological questions relating to speciation. This random graph is defined as the stationary distribution of a Markov chain on the space of graphs on {1, …, n}. The dynamics of this Markov chain is governed by two types of events: vertex duplication, where at constant rate a pair of vertices is sampled uniformly and one of these vertices loses its incident edges and is rewired to the other vertex and its neighbors; and edge removal, where each edge disappears at constant rate. Besides the number of vertices n, the model has a single parameter r_n. Using a coalescent approach, we obtain explicit formulas for the first moments of several graph invariants such as the number of edges or the number of complete subgraphs of order k. These are then used to identify five non-trivial regimes depending on the asymptotics of the parameter r_n. We derive an explicit expression for the degree distribution, and show that under appropriate rescaling it converges to classical distributions when the number of vertices goes to infinity. Finally, we give asymptotic bounds for the number of connected components, and show that in the sparse regime the number of edges is Poissonian.
Bienvenu F, Akçay E, Legendre S. and McCandlish D. M. (2017)
The genealogical decomposition of a matrix population model
with applications to the aggregation of stages
Theoretical Population Biology, 115 pp. 69–80
doi: 10.1016/j.tpb.2017.04.002
Matrix projection models are a central tool in many areas of population biology. In most applications, one starts from the projection matrix to quantify the asymptotic growth rate of the population (the dominant eigenvalue), the stable stage distribution, and the reproductive values (the dominant right and left eigenvectors, respectively). Any primitive projection matrix also has an associated ergodic Markov chain that contains information about the genealogy of the population. In this paper, we show that these facts can be used to specify any matrix population model as a triple consisting of the ergodic Markov matrix, the dominant eigenvalue and one of the corresponding eigenvectors. This decomposition of the projection matrix separates properties associated with lineages from those associated with individuals. It also clarifies the relationships between many quantities commonly used to describe such models, including the relationship between eigenvalue sensitivities and elasticities. We illustrate the utility of such a decomposition by introducing a new method for aggregating classes in a matrix population model to produce a simpler model with a smaller number of classes. Unlike the standard method, our method has the advantage of preserving reproductive values and elasticities. It also has conceptually satisfying properties such as commuting with changes of units.
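The decomposition into a Markov matrix, an eigenvalue and an eigenvector can be sketched on a toy example. The construction used here is an assumption, not quoted from the paper: the genealogical Markov matrix is taken to be P[i][j] = a[i][j] · w[j] / (λ · w[i]), where w is the stable stage distribution, so that row i gives the distribution of the class one step back in the lineage of an individual in class i. The function names and the 2 × 2 matrix are hypothetical.

```python
def genealogical_matrix(proj, lam, w):
    """Markov matrix P[i][j] = a[i][j] * w[j] / (lam * w[i]) (assumed form)."""
    n = len(proj)
    return [[proj[i][j] * w[j] / (lam * w[i]) for j in range(n)] for i in range(n)]


def recompose(markov, lam, w):
    """Recover the projection matrix from the triple (P, lam, w)."""
    n = len(markov)
    return [[lam * w[i] * markov[i][j] / w[j] for j in range(n)] for i in range(n)]


# Toy projection matrix with dominant eigenvalue 2 and right eigenvector (4, 1).
A = [[1.0, 4.0],
     [0.5, 0.0]]
lam, w = 2.0, [4.0, 1.0]

P = genealogical_matrix(A, lam, w)
print(P)                     # each row sums to 1
print(recompose(P, lam, w))  # gives back A
```

Rows of P sum to 1 because A w = λ w, and inverting the formula recovers A, which is why the triple is enough to specify the model.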
Bienvenu F. and Legendre S. (2015)
A new approach to the generation time in matrix population models
The American Naturalist, 185(6) pp. 834–843
doi: 10.1086/681104
The generation time is commonly defined as the mean age of mothers at birth. In matrix population models, a general formula is available to compute this quantity. However, it is complex and hard to interpret. Here, we present a new approach where the generation time is envisioned as a return time in an appropriate Markov chain. This yields surprisingly simple results, such as the fact that the generation time is the inverse of the sum of the elasticities of the growth rate to changes in the fertilities. This result sheds new light on the interpretation of elasticities (which, as we show, correspond to the frequency of events in the ancestral lineage of the population), and we use it to generalize a result known as Lebreton's formula. Finally, we also show that the generation time can be seen as a random variable, and we give a general expression for its distribution.
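The result that the generation time is the inverse of the summed fertility elasticities can be checked numerically on a small model. The sketch below makes several assumptions that are not in the abstract: the projection matrix is split as A = S + F (survival plus fertility), elasticities are computed with the standard formula e[i][j] = (a[i][j] / λ) · v[i] · w[j] / (v · w), with v and w the dominant left and right eigenvectors, and eigenvectors are obtained by plain power iteration, which requires a primitive matrix. All names are hypothetical; this is not the MatPopMod API.

```python
def dominant_eigen(matrix, iterations=500):
    """Power iteration: dominant eigenvalue and right eigenvector (summing to 1)."""
    n = len(matrix)
    vec = [1.0 / n] * n
    lam = 0.0
    for _ in range(iterations):
        new = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        lam = sum(new)  # valid estimate because vec sums to 1
        vec = [x / lam for x in new]
    return lam, vec


def generation_time(survival, fertility):
    """Inverse of the sum of the elasticities of lambda to the fertilities."""
    n = len(survival)
    proj = [[survival[i][j] + fertility[i][j] for j in range(n)] for i in range(n)]
    lam, w = dominant_eigen(proj)
    transposed = [[proj[j][i] for j in range(n)] for i in range(n)]
    _, v = dominant_eigen(transposed)  # left eigenvector of proj
    vw = sum(v[i] * w[i] for i in range(n))
    fertility_elasticity = sum(
        fertility[i][j] * v[i] * w[j] for i in range(n) for j in range(n)
    ) / (lam * vw)
    return 1.0 / fertility_elasticity


# Two age classes: fertilities 1 and 4, survival 0.5 from class 1 to class 2.
S = [[0.0, 0.0],
     [0.5, 0.0]]
F = [[1.0, 4.0],
     [0.0, 0.0]]

T = generation_time(S, F)
print(T)  # approximately 1.5
```

For this age-structured example the growth rate is λ = 2, and the classical mean age of mothers in the stable population, Σ x · f_x · l_x · λ^(−x) = 1 · 1 · 1 / 2 + 2 · 4 · 0.5 / 4, also equals 1.5, so the two computations agree.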

Scientific software

Bienvenu, F. and Doulcier, G. (2021)
MatPopMod, a Python library for matrix population models
Zenodo. doi: 10.5281/zenodo.5557426

last update: 04/03/2024