11 Jun 2024

Understanding the Human Pangenome

The human reference genome has formed the backbone of human genomics since its initial draft release more than 20 years ago [1]. This achievement revolutionized genome sequencing and helped scientists find thousands of disease-linked mutations. However, this version of the genome was a snapshot of human genetic makeup, predominantly representing individuals of European descent.

This lack of representation limits the kinds of genetic variation that can be detected, potentially leaving some patients undiagnosed and unable to benefit from personalized healthcare. The reference genome was also incomplete: it was missing 8% of its total content, including half of the Y chromosome.

Researchers have been working to address these limitations and published the first complete human genome in 2022 [2], followed by a human pangenome draft in 2023 [3]. By including genomes from a much wider range of populations, ethnicities, and geographical regions, the human pangenome draft represents a significant leap forward in genomic research. This comprehensive dataset offers a more accurate reflection of human genetic diversity, providing key insights into population-specific genetic variations and disease susceptibilities.

This newfound inclusivity promises to enhance the accuracy of genetic diagnoses, broaden the scope of personalized healthcare initiatives, and ultimately contribute to more equitable healthcare practices worldwide.

A different kind of genome reference

The production of the first human genome relied heavily on data from a small group of 20 individuals in North America, with most of the sequence generated from blood donors from Buffalo, New York. Remarkably, approximately 70% of this sequence stemmed from a single individual. A reference built this way cannot represent the full spectrum of human genetic diversity.

The consequences of this bias fall hardest on underrepresented populations, such as those of Asian and African descent. Studies of populations of Chinese and African ancestry have found large amounts of genomic sequence, including protein-coding genes, that are absent from the linear human reference genome.

For instance, a study of Han Chinese genomes uncovered 29.5 megabases (Mb) of novel genomic sequence and identified at least 188 novel protein-coding genes absent from the linear reference genome [4]. Similarly, two investigations involving African populations revealed that approximately 300 Mb was absent from the reference genome, underscoring how much genetic diversity the initial sequencing project overlooked [5,6].

In response to these shortfalls, researchers set out to construct a more inclusive and representative human pangenome. The draft integrates genomic data from a diverse cohort of 47 individuals, spanning African, American, Asian, and European ancestries [3]. Efforts are also underway to expand this dataset to 350 individuals by 2024.

Unlike the linear sequence of the conventional reference genome, the human pangenome is structured as a graph: a web of sequences derived from many distinct individuals. Different paths through this web represent regions of the genome where the sequence diverges between individuals (Figure 1). Some regions are highly variable and show numerous alternative routes, while others are conserved and merge into a single path.

Figure 1. The new pangenome reference is a collection of different genomes against which an individual genome sequence can be compared. Where the graph shows a single unified path, the sequence is conserved between individuals; detouring paths signify single nucleotide variants, indels, or large structural variants such as inversions and duplications. Source: National Human Genome Research Institute [7].
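To make the web-like structure more concrete, here is a minimal sketch of a pangenome as a graph of sequence segments, with each person's genome stored as a path through it. The node sequences, sample names, and function names are all invented for illustration; real pangenome graphs are far larger and are built and queried with dedicated tools.

```python
from collections import defaultdict

# Toy sequence graph: node id -> DNA segment (made-up data, not real sequence).
NODES = {
    1: "ACGT",
    2: "A",       # one allele at a variable site
    3: "G",       # alternative allele at the same site
    4: "TTGCA",
    5: "CCCGGG",  # segment carried by only some haplotypes (an insertion)
    6: "TAG",
}

# Each haplotype is a path: an ordered walk through node ids (hypothetical samples).
PATHS = {
    "sample_A": [1, 2, 4, 6],
    "sample_B": [1, 3, 4, 5, 6],
    "sample_C": [1, 2, 4, 5, 6],
}


def spell(path):
    """Reconstruct the linear sequence that a path spells out."""
    return "".join(NODES[node] for node in path)


def node_usage(paths):
    """Map each node to the set of haplotypes that traverse it."""
    usage = defaultdict(set)
    for name, path in paths.items():
        for node in path:
            usage[node].add(name)
    return usage


def conserved_nodes(paths):
    """Nodes visited by every haplotype: the 'single unified path' regions."""
    usage = node_usage(paths)
    return sorted(n for n, names in usage.items() if len(names) == len(paths))


def variable_nodes(paths):
    """Nodes visited by only some haplotypes: the 'detours' in the graph."""
    usage = node_usage(paths)
    return sorted(n for n, names in usage.items() if len(names) < len(paths))


if __name__ == "__main__":
    for name, path in PATHS.items():
        print(name, spell(path))
    print("conserved:", conserved_nodes(PATHS))
    print("variable:", variable_nodes(PATHS))
```

In this toy graph, nodes 2 and 3 behave like a single nucleotide variant and node 5 like an insertion, while the remaining nodes are shared by all three samples.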

Enabled by long-read sequencing

This remarkable work has only been made possible because of the latest advances in sequencing technology, particularly the advent of long-read sequencing (LRS) by Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT). This cutting-edge technology has revolutionized the field by empowering researchers to sequence entire diploid human genomes from scratch within a matter of weeks, a feat once deemed inconceivable.

LRS has several advantages that have facilitated the construction of the human pangenome:

  • Bridging gaps in complex regions: With read lengths that typically range from 15 to 20 kilobases (kb) for PacBio HiFi, and that can be considerably longer on ONT platforms, LRS excels at traversing repetitive and complex regions of the genome.
  • Detection and analysis of structural variations: With its high read accuracy (PacBio HiFi reads typically reach 99.9%), LRS facilitates the precise detection and analysis of all types of variants, from single nucleotide polymorphisms (SNPs) to structural variants (SVs).
  • Assembly improvement: Longer reads make for enhanced contiguity and completeness, ensuring a much easier reconstruction of the human genome. This capability makes it possible to sequence and construct the genomes of potentially hundreds of individuals in a much shorter timeframe than before!
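The figures in the list above can be turned into a quick back-of-the-envelope calculation. The sketch below converts a per-base accuracy into a Phred-scaled quality score, estimates the expected number of errors in a read, and checks whether a read is long enough to cross a repeat. The function names and the 500 bp anchor requirement are illustrative assumptions, not values from any sequencing platform.

```python
import math


def phred_quality(accuracy):
    """Convert per-base accuracy (between 0 and 1) to a Phred-scaled score."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in the range [0, 1)")
    return -10.0 * math.log10(1.0 - accuracy)


def expected_errors(read_length_bp, accuracy):
    """Expected number of erroneous bases in a read of the given length."""
    return read_length_bp * (1.0 - accuracy)


def spans_repeat(read_length_bp, repeat_length_bp, anchor_bp=500):
    """True if a read can cover the whole repeat plus unique sequence on both sides.

    anchor_bp is an assumed amount of unique flanking sequence needed to place
    the read unambiguously; 500 bp is an arbitrary illustrative choice.
    """
    return read_length_bp >= repeat_length_bp + 2 * anchor_bp


if __name__ == "__main__":
    print(round(phred_quality(0.999), 1))           # quality score for 99.9% accuracy
    print(round(expected_errors(15_000, 0.999), 1))  # errors expected in a 15 kb read
    print(spans_repeat(15_000, 10_000))              # long read vs. a 10 kb repeat
    print(spans_repeat(150, 10_000))                 # short read vs. the same repeat
```

Under these assumptions, 99.9% accuracy corresponds to roughly Q30, or about 15 expected errors in a 15 kb read, and a 15 kb read can cross a 10 kb repeat that a 150 bp short read cannot.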

How will the human pangenome improve healthcare?

Each individual’s genome differs slightly from everyone else’s, by about 0.4% on average [8]. Since your complete genome contains approximately 6 billion nucleotides, this means that, on average, your genome and your neighbor’s differ by about 27 Mb (Figure 2). Understanding these differences can help to diagnose disease, predict medical outcomes, and tailor treatments accordingly.

Figure 2. On average, two people’s genome sequences vary by only approximately 0.4%, or 27 million base-pairs, when compared directly. Source: National Human Genome Research Institute [8].
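The arithmetic behind these numbers is easy to check. The short sketch below uses the rounded values quoted above (6 billion nucleotides, 0.4%, and 27 Mb); the constant and function names are ours. Because all three values are approximations, they only agree loosely: 0.4% of 6 billion is about 24 million bases, and 27 million bases is about 0.45% of 6 billion.

```python
GENOME_SIZE_BP = 6_000_000_000  # approximate genome size quoted in the text
DIFF_FRACTION = 0.004           # 0.4% average difference quoted in the text
QUOTED_DIFF_BP = 27_000_000     # 27 Mb figure quoted in the text


def differing_bases(genome_size_bp, fraction):
    """Number of bases expected to differ for a given fractional difference."""
    return genome_size_bp * fraction


def implied_fraction(differing_bp, genome_size_bp):
    """Fractional difference implied by a given count of differing bases."""
    return differing_bp / genome_size_bp


if __name__ == "__main__":
    mb = differing_bases(GENOME_SIZE_BP, DIFF_FRACTION) / 1e6
    pct = implied_fraction(QUOTED_DIFF_BP, GENOME_SIZE_BP) * 100
    print(f"0.4% of 6 billion bases is about {mb:.0f} Mb")
    print(f"27 Mb out of 6 billion bases is about {pct:.2f}%")
```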

By using the pangenome reference, researchers can delve deeper into these genetic distinctions, enabling more comprehensive evolutionary studies, profiling of rare variants, and population-specific research [9,10]. This enhanced understanding of genetic diversity promises to revolutionize healthcare by facilitating more personalized and effective interventions tailored to individual genetic profiles.

As of 2023, the pangenome reference has revealed 119 Mb of variable sequences and 1,115 gene duplications relative to the existing linear reference [3]. Of the 119 Mb, roughly 90 Mb derive from SVs, which are rearrangements of large segments of DNA that can have profound consequences in evolution and human disease [11,12]. These new bases will help researchers to study regions of the genome for which there was previously no reference, allowing for the identification of population-specific variants and SVs linked to disease.

Another study has already demonstrated the utility of pangenome references to identify rare structural variants associated with disease [13]. Using data from various sources, including standard reference genomes, public assemblies, and HiFi genome sequencing data from a rare disease program, researchers created a comprehensive pangenome. They found over 200,000 unique genetic variations and identified 30 potential disease-causing SVs, including a novel diagnostic SV in KMT2E, a gene known to serve an important role in neurodevelopment [14].

This 14.5 kb deletion, affecting exons 9 to 13 of KMT2E, is predicted to result in a premature stop and loss of function. Mutations in this gene commonly lead to O’Donnell-Luria-Rodan (ODLURO) syndrome, characterized by global developmental delay, autism, epilepsy, hypotonia, macrocephaly, and mild dysmorphic facial features [15]. Consistent with these characteristics, the patient displayed a neurodevelopmental phenotype including hypotonia, macrocephaly, and developmental delay. This approach shows promise for diagnosing rare genetic diseases and may lead to better diagnosis and treatment in the future.

As LRS technologies continue to evolve and become more accessible, their potential impact on healthcare and genetic research is expected to grow significantly. These advancements hold promise for improving the detection and understanding of rare and population-specific genetic variants, facilitating more accurate diagnosis and personalized treatment strategies.

Advancing human genetics

The human pangenome represents a continuation of decades-long efforts from scientists to understand the biological code that underlies human life and diversity. This comprehensive genomic resource marks a pivotal moment in genomic research, offering unparalleled insights into the complexities of human genetics and paving the way for transformative advancements in personalized medicine. However, while the pangenome represents a significant milestone, there are still notable communities absent from the current draft, including Native Americans. This highlights the complex ethical and social issues that still need to be addressed to ensure health equity for everyone.

Moving forward, bridging these gaps in genomic representation will be instrumental in advancing the field of personalized medicine, ensuring that healthcare solutions cater to the diverse needs of every individual, regardless of their background or heritage.

At Eremid®, we are keen to facilitate the translation of genomic insights into clinical applications. Our PacBio Revio system, offering access to cutting-edge HiFi sequencing technology, enables comprehensive genomic analysis, allowing for the detection of rare variants and structural variations across diverse populations. By leveraging the power of LRS, we can deliver top-tier clinical genomics services and bioinformatics for a wide range of human health applications.

References

  1. Lander, E. S., Linton, L. M., Birren, B., Nusbaum, C., Zody, M. C., et al. (2001). Initial sequencing and analysis of the human genome. Nature, 409(6822), 860–921. https://doi.org/10.1038/35057062
  2. Nurk, S., Koren, S., Rhie, A., Rautiainen, M., Bzikadze, A. V., et al. (2022). The complete sequence of a human genome. Science, 376(6588), 44–53. https://doi.org/10.1126/science.abj6987
  3. Liao, W.-W., Asri, M., Ebler, J., Doerr, D., Haukness, M., et al. (2023). A draft human pangenome reference. Nature, 617(7960), 312–324. https://doi.org/10.1038/s41586-023-05896-x
  4. Duan, Z., Qiao, Y., Lu, J., Lu, H., Zhang, W., et al. (2019). HUPAN: A pan-genome analysis pipeline for human genomes. Genome Biology, 20. https://doi.org/10.1186/s13059-019-1751-y
  5. Sherman, R. M., Forman, J., Antonescu, V., Puiu, D., Daya, M., et al. (2019). Assembly of a pan-genome from deep sequencing of 910 humans of African descent. Nature Genetics, 51(1), 30–35. https://doi.org/10.1038/s41588-018-0273-y
  6. Choudhury, A., Aron, S., Botigué, L. R., Sengupta, D., Botha, G., et al. (2020). High-depth African genomes inform human migration and health. Nature, 586(7831), 741–748. https://doi.org/10.1038/s41586-020-2859-7
  7. Scientists release a new human “pangenome” reference. (2023). Retrieved 9 April 2024, from https://www.genome.gov/news/news-release/scientists-release-a-new-human-pangenome-reference
  8. Human Genomic Variation. (2023). Retrieved 5 April 2024, from https://www.genome.gov/about-genomics/educational-resources/fact-sheets/human-genomic-variation
  9. Pollen, A. A., Kilik, U., Lowe, C. B., & Camp, J. G. (2023). Human-specific genetics: New tools to explore the molecular and cellular basis of human evolution. Nature Reviews Genetics, 24(10), 687–711. https://doi.org/10.1038/s41576-022-00568-4
  10. Wu, Z., Li, T., Jiang, Z., Zheng, J., Gu, Y., Liu, Y., et al. (2024). Human pangenome analysis of sequences missing from the reference genome reveals their widespread evolutionary, phenotypic, and functional roles. Nucleic Acids Research, 52(5), 2212–2230. https://doi.org/10.1093/nar/gkae086
  11. Sudmant, P. H., Rausch, T., Gardner, E. J., Handsaker, R. E., Abyzov, A., et al. (2015). An integrated map of structural variation in 2,504 human genomes. Nature, 526(7571), 75–81. https://doi.org/10.1038/nature15394
  12. Perry, G. H., Yang, F., Marques-Bonet, T., Murphy, C., Fitzgerald, T., et al. (2008). Copy number variation and evolution in humans and chimpanzees. Genome Research, 18(11), 1698–1710. https://doi.org/10.1101/gr.082016.108
  13. Groza, C., Schwendinger-Schreck, C., Cheung, W. A., Farrow, E. G., Thiffault, I., et al. (2024). Pangenome graphs improve the analysis of structural variants in rare genetic diseases. Nature Communications, 15(1), 657. https://doi.org/10.1038/s41467-024-44980-2
  14. O’Donnell-Luria, A. H., Pais, L. S., Faundes, V., Wood, J. C., Sveden, A., et al. (2019). Heterozygous Variants in KMT2E Cause a Spectrum of Neurodevelopmental Disorders and Epilepsy. American Journal of Human Genetics, 104(6), 1210–1222. https://doi.org/10.1016/j.ajhg.2019.03.021
  15. Benvenuto, M., Cesarini, S., Severi, G., Ambrosini, E., Russo, A., et al. (2024). Phenotypic Description of A Patient with ODLURO Syndrome and Functional Characterization of the Pathogenetic Role of A Synonymous Variant c.186G>A in KMT2E Gene. Genes, 15(4), 430. https://doi.org/10.3390/genes15040430
