Characterizing Gene Expression Variation Across Seven Diverse Human Populations

Alicia Martin1, Helio Costa1, Jeffrey Kidd2, Brenna Henn1, Muh-Ching Yee1, Stephen Montgomery1, Howard Cann3, Michael Snyder1, Carlos Bustamante1
1Stanford University School of Medicine, Department of Genetics, Stanford, CA, USA, 2University of Michigan School of Medicine, Department of Human Genetics, Ann Arbor, MI, USA, 3Foundation Jean Dausset, Centre d'Etude du Polymorphisme Humain, Paris, France

Genetic variation has been studied across diverse human populations, but our understanding of its impact on phenotypic variation remains limited unless these studies are extended to determine how that variation affects gene expression. Genome-wide mRNA sequencing (RNAseq) studies in individual populations have yielded insights into natural variation in mRNA levels, isoform diversity, and novel transcripts, and have connected gene expression differences to complex phenotypes. However, a complete understanding of human transcriptome variation requires examining many populations from a wide range of biogeographic ancestries. To elucidate how gene expression changed over the course of historical human migrations, we have integrated the genome and transcriptome sequences of 45 lymphoblastoid cell lines from seven populations in the Human Genome Diversity Project that span the full range of human migration history. These populations are the San Bushmen of southern Africa, Mbuti Pygmies of central Africa, Mozabites of north Africa, Pathans of central Asia, Cambodians of east Asia, Yakut of Siberia, and Mayans of Mexico. This design allows a comparative study that uses the single-nucleotide resolution of RNAseq to assess rare transcripts, novel gene structures, alternative splicing, allele-specific expression, and differential expression within and among populations. We have quantified reads for known exons, transcripts, and whole genes, and have employed a novel statistical approach for identifying genes that are systematically differentially expressed among populations. Preliminary results suggest that, on average, over 7,000 genes are expressed in each population. Further, we have identified several transcripts that are differentially expressed by population. In the future, we aim to analyze population-specific splice variants, which we hypothesize will be consistent with drift and/or selection.
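The per-population quantification and population-level differential expression test described above can be illustrated with a minimal sketch. It is not the authors' novel statistical approach, which the abstract does not specify; it is a generic stand-in under stated assumptions. Raw read counts are normalized to counts per million (CPM), a gene is called "expressed" in a population when its mean CPM across that population's cell lines reaches a hypothetical cutoff of 1 CPM, and each gene is tested for differences among populations with a one-way ANOVA F statistic whose p-value comes from shuffling population labels (a permutation test). The function names, the cutoff, and any sample or population labels are hypothetical.

```python
import math
import random
from statistics import mean


def cpm(counts):
    """Counts per million: scale one sample's gene read counts by library size."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]


def genes_expressed_per_population(counts_by_sample, pop_of, threshold=1.0):
    """Count genes whose mean CPM across a population's samples is >= threshold.

    counts_by_sample: {sample_id: [read count per gene]}, same gene order in all.
    pop_of: {sample_id: population label}.
    threshold: hypothetical expression cutoff in CPM (not taken from the study).
    """
    cpms = {s: cpm(c) for s, c in counts_by_sample.items()}
    n_genes = len(next(iter(cpms.values())))
    result = {}
    for pop in sorted(set(pop_of.values())):
        samples = [s for s in cpms if pop_of[s] == pop]
        result[pop] = sum(
            1 for g in range(n_genes)
            if mean(cpms[s][g] for s in samples) >= threshold
        )
    return result


def f_statistic(values, labels):
    """One-way ANOVA F statistic for expression `values` grouped by `labels`."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    grand = mean(values)
    means = {lab: mean(g) for lab, g in groups.items()}
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (means[lab] - grand) ** 2 for lab, g in groups.items())
    ss_within = sum((v - means[lab]) ** 2 for lab, g in groups.items() for v in g)
    if ss_within == 0:
        # No variation inside any group: F is infinite if group means differ, else 0.
        return math.inf if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_pvalue(values, labels, n_perm=2000, seed=0):
    """Permutation p-value for the F statistic: shuffle population labels
    n_perm times and count how often the shuffled F is at least as large."""
    rng = random.Random(seed)
    observed = f_statistic(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if f_statistic(values, shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed labelling itself, so p is never 0.
    return (hits + 1) / (n_perm + 1)
```

Because the p-value comes from label permutation, it does not assume normally distributed expression, which matters with only a handful of cell lines per population. Applied genome-wide, one would typically also correct for testing thousands of genes, for example with a false discovery rate procedure.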
Height is one trait for which we expect population differences in expression: the Mbuti Pygmies are, on average, shorter in stature than other populations. An estimated 80% of the variation in height among individuals is attributable to genetic factors, so we expect to find differential expression of candidate height genes in Pygmies compared with other populations. We also expect that, under a neutral model, the number of population-specific transcripts will be proportional to divergence time as measured by the time to the most recent common ancestor (TMRCA), and that deviations from this model will be suggestive of selection. Our dataset will allow a detailed investigation of the landscape of transcriptome variation across diverse human populations.
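The neutral-model prediction above can be made concrete with a simple count model. The sketch below assumes, as a simplification not stated in the abstract, that the number of population-specific transcripts in population i is Poisson with mean b·T_i, where T_i is that population's TMRCA-based divergence time and b is a shared rate. Under that model the maximum-likelihood estimate of b is the total count divided by the total divergence time, and populations whose observed counts fall far into either Poisson tail relative to b·T_i are candidates for departures from neutrality. The Poisson assumption and the function names are hypothetical, and any TMRCA values passed in must come from elsewhere; real transcript counts are likely overdispersed, so a negative binomial model may be more appropriate.

```python
import math


def fit_neutral_rate(counts, tmrca):
    """MLE of the shared rate b when count_i ~ Poisson(b * T_i).

    Setting d/db of sum_i [y_i*log(b*T_i) - b*T_i] to zero gives
    b = sum(y_i) / sum(T_i).
    """
    return sum(counts.values()) / sum(tmrca[pop] for pop in counts)


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), lam > 0; terms computed in log space."""
    if k < 0:
        return 0.0
    log_lam = math.log(lam)
    return sum(math.exp(i * log_lam - lam - math.lgamma(i + 1)) for i in range(k + 1))


def neutral_deviations(counts, tmrca):
    """For each population, return (expected count under the neutral model,
    P(X >= observed), P(X <= observed)).

    counts: {population: number of population-specific transcripts}
    tmrca:  {population: divergence time (TMRCA); must be > 0}
    """
    b = fit_neutral_rate(counts, tmrca)
    report = {}
    for pop, observed in counts.items():
        expected = b * tmrca[pop]
        # Upper tail: excess of population-specific transcripts.
        p_excess = max(0.0, 1.0 - poisson_cdf(observed - 1, expected))
        # Lower tail: deficit relative to the neutral expectation.
        p_deficit = min(1.0, poisson_cdf(observed, expected))
        report[pop] = (expected, p_excess, p_deficit)
    return report
```

For instance, with toy divergence times of 10, 20, and 30 units and toy counts of 100, 200, and 600, the fitted rate is 15 per unit, the third population's expected count is 450, and its excess receives a very small upper-tail probability.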