Ancestral Problems in Population Genetics

January 23, 2008, IHP, Amphi Darboux

Moderator: Thierry Huillet (Cergy-Pontoise)

E. Baake (Bielefeld), R. A. Blythe (Edinburgh), R. C. Griffiths (Oxford), A. Lambert (Paris), M. Moehle (Dusseldorf), M. Serva (L'Aquila), D. Simon (Paris).

Ellen Baake (Bielefeld): Ancestral processes with selection: Branching and Moran models.

We consider two versions of stochastic population models with mutation and selection. The first approach relies on a multitype branching process; here, individuals reproduce and change type (i.e., mutate) independently of each other, without restriction on population size. We analyse the equilibrium behaviour of this model, both in the forward and in the backward direction of time; the backward point of view emerges if the ancestry of individuals chosen randomly from the present population is traced back into the past.
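For the branching model, a standard result for supercritical multitype branching processes (assuming a positive, primitive mean matrix) is the following. The type distribution of the population converges to the normalised left Perron-Frobenius eigenvector $u$ of the mean matrix. The type distribution along the ancestral line of a randomly chosen individual is $\alpha_i = u_i v_i$, where $v$ is the right eigenvector normalised so that $\sum_i u_i v_i = 1$. The sketch below computes both by power iteration for a hypothetical discrete-time mean matrix `mean`, where `mean[i][j]` is the expected number of type-$j$ offspring of a type-$i$ parent. The numbers are illustrative only.

```python
def perron_vectors(mean, iters=500):
    """Left (u) and right (v) Perron-Frobenius eigenvectors of a positive
    mean matrix by power iteration, normalised so that sum(u) = 1 and
    sum(u[i] * v[i]) = 1."""
    n = len(mean)
    u = [1.0 / n] * n
    v = [1.0] * n
    for _ in range(iters):
        # left iteration u <- u M: type composition one generation later
        u = [sum(u[i] * mean[i][j] for i in range(n)) for j in range(n)]
        total = sum(u)
        u = [x / total for x in u]
        # right iteration v <- M v: relative reproductive values of the types
        v = [sum(mean[i][j] * v[j] for j in range(n)) for i in range(n)]
        top = max(v)
        v = [x / top for x in v]
    scale = sum(ui * vi for ui, vi in zip(u, v))
    return u, [x / scale for x in v]


def ancestral_distribution(mean):
    """Population type distribution u and ancestral type distribution
    alpha_i = u_i * v_i (sums to 1 by the normalisation of v)."""
    u, v = perron_vectors(mean)
    return u, [ui * vi for ui, vi in zip(u, v)]
```

With type 0 fitter than type 1, for instance `ancestral_distribution([[1.2, 0.1], [0.1, 0.9]])`, the ancestral distribution puts more weight on type 0 than the population distribution does.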

The second approach is the Moran model with selection. Here, the population has constant size $N$. Individuals reproduce at rates that depend on their types; each offspring inherits its parent's type and replaces a randomly chosen individual, which keeps the population size constant. Independently of the reproduction process, individuals can change type. As in the branching model, we consider the ancestral lines of single individuals chosen from the equilibrium population. We use analytical results of Fearnhead (2002) to determine explicitly the properties of the ancestral distribution of types, its dependence on the parameters, and its relationship with the stationary distribution in forward time. Furthermore, we establish a connection with the diffusion approach of Taylor (2007).
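For two types, the forward-time stationary distribution can be written down exactly, because the number of individuals of the fitter type is a birth-death chain. The following sketch makes these hypothetical choices. Type 0 reproduces at rate $1+s$ and type 1 at rate 1. Each offspring replaces an individual chosen uniformly from all $N$. Every individual switches type at rate `u`. The sketch does not reproduce Fearnhead's ancestral computation.

```python
def moran_stationary(N, s, u):
    """Stationary law of the number k of type-0 individuals in a two-type
    Moran model with selection (type 0 at rate 1 + s, type 1 at rate 1)
    and symmetric mutation at rate u per individual."""
    def up(k):    # rate of k -> k + 1
        return (1 + s) * k * (N - k) / N + u * (N - k)

    def down(k):  # rate of k -> k - 1
        return k * (N - k) / N + u * k

    weights = [1.0]
    for k in range(N):
        # detailed balance of a birth-death chain: pi(k+1) down(k+1) = pi(k) up(k)
        weights.append(weights[-1] * up(k) / down(k + 1))
    total = sum(weights)
    return [w / total for w in weights]
```

With `s = 0` the distribution is symmetric about $N/2$; any `s > 0` shifts it towards the fitter type.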

Richard Blythe (Edinburgh): Dialects, demes and duality.

We have recently proposed a model for language change, in particular new-dialect formation, which turns out to map onto genetic drift in the presence of a considerable degree of population subdivision. Key to the evaluation of current theories for new-dialect formation is the way in which the time to fixation from a known initial condition increases with the number of subpopulations (demes) as this number becomes large. Using duality and other symmetry relations between the forward- and backward-time formulations of the population dynamics, we obtain some exact results and efficient numerical methods for addressing precisely this problem. An application of these results to New Zealand English will be discussed.
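The growth of the fixation time with the number of demes can also be explored by brute-force simulation. The sketch below assumes a standard Wright-Fisher island model, not the dialect model of the talk. There are `demes` subpopulations of `deme_size` individuals each. Every subpopulation resamples binomially from a mix of its own variant frequency (weight `1 - migration`) and the global frequency (weight `migration`). All names are hypothetical.

```python
import random


def binomial(rng, n, p):
    """Binomial(n, p) draw as a sum of n Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))


def mean_fixation_time(demes, deme_size, migration, p0=0.5, reps=50,
                       seed=1, max_gens=100_000):
    """Monte Carlo mean time (in generations) until one variant is fixed in
    the whole island-model population, starting from frequency p0 in every deme."""
    rng = random.Random(seed)
    total_size = demes * deme_size
    total_gens = 0
    for _ in range(reps):
        counts = [round(p0 * deme_size)] * demes
        gens = 0
        while 0 < sum(counts) < total_size and gens < max_gens:
            pbar = sum(counts) / total_size  # global variant frequency
            counts = [binomial(rng, deme_size,
                               (1 - migration) * c / deme_size + migration * pbar)
                      for c in counts]
            gens += 1
        total_gens += gens
    return total_gens / reps
```

Calling `mean_fixation_time` for increasing `demes` gives a Monte Carlo estimate. Its cost grows quickly with the number of demes, which is what makes exact and efficient numerical methods valuable.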

Bob Griffiths (Oxford): Ancestral inference from gene trees.

A unique gene tree describing the mutation history of a sample of DNA sequences can be constructed as a perfect phylogeny under the assumption of non-recurrent point mutations. An empirical distribution of the stochastic history of the gene tree, conditional on its topology, can be obtained by an advanced simulation technique: importance sampling on coalescent histories. The distributions of the time to the most recent common ancestor and of the ages of mutations in the gene tree, conditional on its topology, can then be derived from this empirical distribution. This talk will present examples of ancestral inference from gene trees and microsatellite data, and sketch the importance sampling technique.
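Whether a sample can be explained by a single gene tree without recurrent mutation can be checked pair of sites by pair of sites. For binary sites with unknown ancestral states, the classical four-gamete test says that a perfect phylogeny exists exactly when no pair of sites displays all four combinations 00, 01, 10, 11. This sketch assumes sequences encoded as equal-length strings of `0` and `1`. It only checks compatibility; it neither builds the tree nor performs the importance sampling.

```python
from itertools import combinations


def admits_perfect_phylogeny(seqs):
    """Four-gamete test: True iff no pair of binary sites shows all four
    gametes 00, 01, 10, 11 among the sequences."""
    n_sites = len(seqs[0])
    for i, j in combinations(range(n_sites), 2):
        gametes = {(s[i], s[j]) for s in seqs}
        if len(gametes) == 4:
            return False
    return True
```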

Amaury Lambert (Paris): Mutation patterns for the coalescent point process.

We consider a branching population whose individuals have i.i.d. lifespans with a general distribution, during which they give birth, independently and at a constant rate, to copies of themselves. In previous work, it was shown that for any fixed time $t$, the individuals alive at time $t$ can be ranked in such a way that the coalescence times between consecutive individuals are i.i.d. with a specified distribution. The ranked sequence of these coalescence times is called a coalescent point process, and it encodes all the information about the genealogical structure of the population at time $t$.
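In a coalescent point process, the coalescence time of two individuals $i<j$ is the largest of the node depths lying between them in the ranking. The sketch below indexes the depths from 0, so `h[k]` is the coalescence time of individuals `k` and `k + 1`. It draws the depths from an exponential law purely for illustration. In the model above, their distribution is determined by the lifespan distribution and the birth rate.

```python
import random


def sample_node_depths(n, rate, rng):
    """Draw the n - 1 node depths of n ranked individuals.
    Exponential depths are an illustrative assumption only."""
    return [rng.expovariate(rate) for _ in range(n - 1)]


def coalescence_time(h, i, j):
    """Coalescence time of individuals i < j (0-based ranks): the largest
    node depth between them, i.e. max(h[i], ..., h[j - 1])."""
    if not 0 <= i < j <= len(h):
        raise ValueError("need 0 <= i < j <= len(h)")
    return max(h[i:j])
```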

When individuals carry DNA sequences, there are two quantities of interest for a sample of $n$ DNA sequences, each belonging to a distinct individual of the population: the number $S_n$ of polymorphic sites (sites at which at least two sequences differ) and the number $K_n$ of haplotypes (distinct sequences). It is standard to assume that mutations arrive at a constant rate $\mu$ (on germ lines) and never hit the same site of the DNA sequence twice. Then, for the Wright-Fisher model with large population size, it is well known that both $S_n$ and $K_n$ grow like $\mu \log n$ as the sample size $n$ grows.
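For the Wright-Fisher (Kingman coalescent) benchmark, the classical expectations are $E[S_n] = \theta \sum_{i=1}^{n-1} 1/i$ (Watterson) and $E[K_n] = \sum_{i=0}^{n-1} \theta/(\theta+i)$ (Ewens). Here $\theta$ is the scaled mutation rate: mutations occur at rate $\theta/2$ per lineage when pairs of lineages coalesce at rate 1. Both sums behave like $\theta \log n$, which is the logarithmic growth referred to above. A short sketch:

```python
import math


def expected_segregating_sites(n, theta):
    """E[S_n] = theta * sum_{i=1}^{n-1} 1/i under infinite sites on the Kingman coalescent."""
    return theta * sum(1.0 / i for i in range(1, n))


def expected_haplotypes(n, theta):
    """E[K_n] = sum_{i=0}^{n-1} theta / (theta + i) (Ewens sampling formula)."""
    return sum(theta / (theta + i) for i in range(n))


def growth_ratio(n, theta):
    """E[S_n] / (theta log n); tends to 1, slowly, as n grows."""
    return expected_segregating_sites(n, theta) / (theta * math.log(n))
```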

Here, we study the mutation pattern associated with general coalescent point processes. We show that $S_n$ and $K_n$ grow linearly with $n$, with a speed that is linear in $\mu$ for $S_n$ but not for $K_n$. In addition, we study the frequency spectrum of the sample, that is, the numbers of polymorphic sites and of haplotypes carried by $k$ individuals in the sample. These numbers are shown to grow linearly with the sample size, with deterministic slopes whose dependence on $k$ is made explicit.
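Under the infinite-sites assumption, the expected number of polymorphic sites given the genealogy is $\mu$ times the total branch length below the most recent common ancestor. Take $n$ consecutive individuals of a coalescent point process with node depths $H_1, \dots, H_{n-1}$. Their total branch length is $\max_i H_i + \sum_i H_i$, since each individual after the first contributes a new branch of length $H_i$ and the first one's lineage runs up to the common ancestor. The sum has $n-1$ i.i.d. terms, which makes linear growth of $S_n$ in both $n$ and $\mu$ plausible. The sketch assumes consecutive individuals and 0-based depths `h[k]`. It says nothing about $K_n$.

```python
def cpp_tree_length(h):
    """Total branch length below the MRCA of the len(h) + 1 consecutive
    individuals of a coalescent point process with node depths h."""
    return max(h) + sum(h)


def expected_segregating_sites_cpp(h, mu):
    """Conditional mean of S_n given the depths: under infinite sites, every
    mutation below the MRCA creates a new polymorphic site."""
    return mu * cpp_tree_length(h)
```

For example, depths `[2.0, 1.0, 3.0]` give a tree of total length 9.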

Martin Moehle (Dusseldorf): Asymptotic results for some functionals of the Bolthausen-Sznitman coalescent.

The asymptotic behaviour of the Bolthausen-Sznitman n-coalescent is examined as the sample size n tends to infinity. Weak convergence results are provided for functionals such as the number of collision events that take place until there is just a single block, the total branch length of the tree, the time back to the most recent common ancestor and the length of an external branch chosen at random.

The proofs are mainly analytic and are based on a singularity analysis of generating functions. For the number of collision events we provide an alternative probabilistic proof based on a coupling with a certain random walk. This coupling is also useful for deriving the asymptotics of absorption times of certain Markov chains, in particular of the number X(n) of collision events in beta(a,1)-coalescents with parameter $0<a<2$.
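The functionals above can be explored numerically by simulating the block-counting process. In the Bolthausen-Sznitman coalescent with $b$ blocks, each particular group of $k$ blocks merges at rate $(k-2)!\,(b-k)!/(b-1)!$. The total event rate is therefore $b-1$, and a given event is a $k$-merger with probability $b/((b-1)k(k-1))$. The sketch below records one run's number of collision events, total branch length and time to the most recent common ancestor. The function name is our own.

```python
import random


def simulate_bs(n, rng):
    """One run of the Bolthausen-Sznitman n-coalescent.
    Returns (number of collisions, total branch length, time to MRCA)."""
    b, collisions, length, time = n, 0, 0.0, 0.0
    while b > 1:
        hold = rng.expovariate(b - 1)   # total merger rate with b blocks is b - 1
        time += hold
        length += b * hold              # b branches are alive during the holding time
        ks = range(2, b + 1)
        # P(k-merger | event) = b / ((b - 1) k (k - 1)), which sums to 1 over k
        weights = [b / ((b - 1) * k * (k - 1)) for k in ks]
        k = rng.choices(ks, weights=weights)[0]
        b -= k - 1                      # k blocks become one
        collisions += 1
    return collisions, length, time
```

Averaging many runs for increasing `n` gives a rough numerical check on the limit theorems.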

Maurizio Serva (L'Aquila): Family trees: languages and genetics.

We consider a large population which evolves according to neutral haploid reproduction. Its genealogical tree is very complex, and genealogical distances are distributed according to a probability density which remains random in the limit of large population size. We describe both the static and the dynamical aspects of the problem.
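One direct way to see the randomness of genealogical distances is to trace a whole haploid Wright-Fisher population of size $N$ backwards. In each generation, every ancestral lineage picks its parent uniformly at random. We record when each pair of present individuals first shares an ancestor. The sketch below returns the mean pairwise coalescence time and the time to the most recent common ancestor for one realisation. Comparing the output for several seeds shows how the average over pairs varies from one population to another. It assumes non-overlapping generations and $N \ge 2$.

```python
import random


def pairwise_genealogy(N, rng):
    """Trace a haploid Wright-Fisher population of size N >= 2 backwards.
    Returns (mean pairwise coalescence time, time to MRCA) in generations."""
    blocks = [1] * N            # sizes of groups of present individuals sharing an ancestor
    pairs_total = N * (N - 1) // 2
    gen, weighted = 0, 0
    while len(blocks) > 1:
        gen += 1
        merged = {}
        for size in blocks:
            parent = rng.randrange(N)   # each ancestral lineage picks a parent uniformly
            merged.setdefault(parent, []).append(size)
        blocks = []
        for sizes in merged.values():
            tot = sum(sizes)
            # pairs split across the groups that merge now coalesce in this generation
            weighted += gen * (tot * tot - sum(s * s for s in sizes)) // 2
            blocks.append(tot)
    return weighted / pairs_total, gen
```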

The evolution of languages closely resembles the evolution of haploid organisms or of mtDNA. This similarity allows for the construction of language trees. The key point is the definition of a distance between pairs of languages. Here we use a renormalized Levenshtein distance between words with the same meaning, averaged over all the words contained in a list. Assuming a constant rate of mutation, these genetic distances should, on average, be proportional to genealogical distances. Nevertheless, the relation between genetic and genealogical distances must be investigated. We test our method by constructing the tree of the Indo-European group.
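The distance between two word lists can be computed as follows. The sketch assumes that the renormalization divides each Levenshtein (edit) distance by the length of the longer of the two words. It also assumes the lists are aligned, so that entries at the same position have the same meaning, and that no word is empty. Any orthographic normalization is left out.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def language_distance(words_a, words_b):
    """Average over aligned word lists of the edit distance divided by the
    length of the longer word."""
    if len(words_a) != len(words_b):
        raise ValueError("word lists must be aligned")
    total = sum(levenshtein(wa, wb) / max(len(wa), len(wb))
                for wa, wb in zip(words_a, words_b))
    return total / len(words_a)
```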

Finally, we consider the case of two populations in which reproduction is diploid for nuclear DNA and haploid for mtDNA. Moreover, in every generation some individuals migrate from one population to the other. In a finite but random time, the mtDNA of one population is completely replaced by that of the other while, over the same time, the nuclear DNA is not completely replaced. These results may have some relevance for the Out of Africa/Multiregional debate in paleoanthropology.
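The replacement phenomenon can be explored with a deliberately simplified, hypothetical simulation. Migration is one-way and the population has constant size $N$. Each generation, `migrants` residents are replaced by immigrants, and sexes are not distinguished. mtDNA is copied from the mother. The nuclear genome is summarised by each individual's fraction of immigrant ancestry, taken as the average of its parents' fractions. This is not the model of the talk. It only illustrates how the two inheritance modes can be tracked side by side.

```python
import random


def replacement_time(N, migrants, rng, max_gens=1_000_000):
    """Hypothetical one-way migration model. Returns (generation at which all
    residents carry immigrant mtDNA, mean nuclear share of immigrant ancestry
    at that moment), or None if this does not happen within max_gens."""
    mt = [0] * N        # 1 = mtDNA of immigrant origin
    nuc = [0.0] * N     # fraction of the nuclear genome of immigrant origin
    for gen in range(1, max_gens + 1):
        new_mt, new_nuc = [], []
        for _ in range(N):
            mother, father = rng.randrange(N), rng.randrange(N)
            new_mt.append(mt[mother])                       # maternal mtDNA
            new_nuc.append((nuc[mother] + nuc[father]) / 2)  # biparental nuclear DNA
        for idx in rng.sample(range(N), migrants):          # immigrants replace residents
            new_mt[idx], new_nuc[idx] = 1, 1.0
        mt, nuc = new_mt, new_nuc
        if all(mt):
            return gen, sum(nuc) / N
    return None
```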

Damien Simon (Paris): Effect of selection on population dynamics: travelling waves and genealogies.

I will present models of evolution based on branching random walks and show how some problems can be reduced to the study of travelling waves. In particular, I will show how knowledge of the survival probability of an individual in some simple models is sufficient to describe and simulate regimes conditioned on the final size. This will help us to understand some aspects of the genealogical structure of a population in the presence of selection. In turn, I will show how coalescence times and genealogies in population models may help us to characterize selection processes.
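A standard example of a branching random walk whose dynamics is governed by a travelling wave is the following model with selection. Each individual has two offspring displaced by independent standard Gaussian steps, and only the $N$ rightmost offspring survive. The sketch below estimates the speed of the front. The model and parameter names are our own choice, not necessarily the models of the talk. As $N$ grows, the speed should approach, slowly and from below, the value $\sqrt{2\ln 2} \approx 1.18$ of the unconstrained branching random walk.

```python
import random


def front_velocity(N, generations, rng, burn_in=None):
    """Speed of a branching random walk with selection: every individual has
    two offspring displaced by independent N(0, 1) steps and only the N
    rightmost offspring are kept. Returns the mean displacement per
    generation of the average position after burn_in generations."""
    if burn_in is None:
        burn_in = generations // 2
    if not 0 < burn_in < generations:
        raise ValueError("need 0 < burn_in < generations")
    pop = [0.0] * N
    start = 0.0
    for g in range(1, generations + 1):
        offspring = [x + rng.gauss(0.0, 1.0) for x in pop for _ in range(2)]
        offspring.sort(reverse=True)
        pop = offspring[:N]              # selection: keep the N rightmost
        if g == burn_in:
            start = sum(pop) / N
    return (sum(pop) / N - start) / (generations - burn_in)
```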