A genomewide map of Neandertal ancestry in modern humans
,2, Nick Patterson2, Swapan Mallick1,2, Svante Pääbo3, David Reich1
1Harvard Medical School, Boston, USA, 2Broad Institute of Harvard and MIT, Cambridge, USA, 3Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
Analysis of the genomes of archaic hominins, such as Neandertals and Denisovans, has revealed that these groups contributed to the genetic variation of modern human populations. Yet we know little about how these ancient mixtures have shaped the genetic structure of human populations, and even less about their impact on human evolution. To answer these questions systematically, we need a map of archaic ancestry, i.e., a map that labels whether each region of an individual's genome is descended from these archaic groups.
Building such a map is technically challenging because of the antiquity of these gene flow events. We have identified signatures, based on patterns of variation at single SNPs as well as on haplotypes, that are informative about ancient gene flow. We propose a principled method, based on the statistical framework of Conditional Random Fields (CRFs), that integrates these patterns to yield highly accurate predictions.
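As a rough illustration of how per-SNP evidence and haplotype continuity can be combined in a chain-structured model, the sketch below finds the most likely sequence of ancestry labels with the Viterbi algorithm. The emission scores, the `switch_penalty` parameter, and the function name are hypothetical and are not taken from the method described here; a real CRF would learn its feature weights from data and would typically report marginal probabilities rather than a single best path.

```python
from typing import List, Sequence


def viterbi_ancestry(emission: Sequence[Sequence[float]],
                     switch_penalty: float) -> List[int]:
    """Return the highest-scoring label path for a two-state chain.

    emission[i][s] is a log-scale score for SNP i carrying label s
    (0 = modern human, 1 = Neandertal). These scores are assumed inputs.
    switch_penalty is subtracted whenever adjacent SNPs get different
    labels, which favours long, haplotype-like runs of one ancestry.
    """
    n = len(emission)
    if n == 0:
        return []

    score = [list(emission[0])]  # best score of any path ending in each state
    back = []                    # back[i-1][s] = best predecessor of state s at SNP i

    for i in range(1, n):
        row, ptr = [], []
        for s in (0, 1):
            # Score of arriving in state s from each possible previous state.
            cands = [score[-1][p] - (switch_penalty if p != s else 0.0)
                     for p in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            row.append(cands[best] + emission[i][s])
            ptr.append(best)
        score.append(row)
        back.append(ptr)

    # Trace back from the best final state.
    state = 0 if score[-1][0] >= score[-1][1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path
```

With a positive `switch_penalty`, a single SNP whose own score weakly favours the other label is absorbed into the surrounding segment; with a penalty of zero, each SNP is labelled independently.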
We applied our method to polymorphism data from European and East Asian individuals in the 1000 Genomes Project, in conjunction with the draft sequence of the Neandertal genome, to obtain the first genomewide map of Neandertal ancestry. Analysis of this map reveals several findings:
1. We identify around 35,000 Neandertal-derived alleles in Europeans and 21,000 in East Asians.
2. The map allows us to identify Neandertal alleles that have been the target of selection since introgression. We identified over 100 regions in which the frequency of Neandertal ancestry is extremely unlikely under a model of neutral evolution. The highest-frequency region on chromosome 4 has a frequency of Neandertal ancestry of about 85% in Europe and overlaps CLOCK, a key gene in circadian function in mammals. The high-frequency Neandertal-derived variant is specific to Europeans; it is not very common in East Asians. This gene has been found in other selection scans in Eurasian populations, but has never before been linked to Neandertal gene flow.
3. Several of the Neandertal-derived alleles identified in finding 1 are among the >6,000 SNPs associated with common diseases listed in the NHGRI catalog. These Neandertal-derived variants are found to be risk variants associated with obesity and protective variants against breast cancer.
4. We also investigate the possibility of using this map to reconstruct the genome of the introgressing Neandertal. Using the ancestries in Europe and East Asia, we can reconstruct about 600 Mb, a figure we expect to increase with larger samples and additional populations.
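The amount of sequence that can be reconstructed in finding 4 can be thought of as the length of the union of Neandertal-labelled segments across individuals. The sketch below computes that union for a single chromosome. It assumes half-open `(start, end)` coordinates in base pairs; the function names are hypothetical and this is not the procedure used to obtain the 600 Mb figure.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in base pairs, one chromosome


def merge_intervals(segments: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent segments into disjoint intervals."""
    merged: List[Interval] = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            # Segment overlaps or touches the previous one: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def covered_length(segments: Iterable[Interval]) -> int:
    """Total number of base pairs covered by at least one segment."""
    return sum(end - start for start, end in merge_intervals(segments))
```

Pooling segments from more individuals or populations can only extend the union, which is consistent with the expectation that the reconstructed length grows with sample size.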