8 Jun 2023

The Importance of Plant Reference Genomes


The importance of new reference genomes for plant sciences

Plant genomics: The key to food security, human health, and a sustainable environment

The nearly half-million species of green plants on the planet today are fundamental drivers of global ecosystems. They make enormous contributions to agriculture, medicine, energy, and the environment and are critical to human health and wellbeing [1].

Plants are naturally diverse in both form, from floating duckweed to towering trees, and survival strategy, from cacti to coffee and chili plants. As a result, plant species make up a genetically highly varied group of organisms. By understanding this genetic diversity, plant science aims to increase crop yields, improve disease and stress tolerance, and raise the quality of food and pharmaceuticals.

In this blog, we discuss the challenges and benefits of new reference genomes to support the drive for better health, agriculture, and environment.

What is a plant reference genome?

One way researchers make sense of, and take advantage of, natural genetic diversity in plants is to produce reference genomes. A reference genome provides a representative genetic sequence for a given species or subspecies. Like a jigsaw puzzle, a reference genome is assembled from smaller pieces of sequence. Once millions of sequencing reads have been assembled into contiguous sequences, the reference genome can be used to identify the location of, and potential variation in, specific DNA sequences, enabling further analysis and manipulation. Researchers can use this information to map and interpret sequencing data from their own species and compare it with previous genetic studies in related species.
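To make the jigsaw analogy concrete, here is a minimal Python sketch of greedy overlap merging on three toy reads. It is an illustration only: the function names (`overlap`, `greedy_assemble`), the `min_len` parameter, and the example reads are our own assumptions, and real assemblers use far more sophisticated graph-based methods that also cope with sequencing errors, repeats, and both DNA strands.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


# Hypothetical toy reads, all taken from the sequence ATGGCGTACGTTAGC
toy_reads = ["ATGGCGTA", "GCGTACGT", "TACGTTAGC"]
print(greedy_assemble(toy_reads))  # ['ATGGCGTACGTTAGC']
```

With error-free reads and no repeats, the three pieces collapse into a single contiguous sequence. Real plant data is harder on both counts, which is why assembly remains a specialist task.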

Genomic information has become a ubiquitous and essential aspect of biological research. However, despite the abundance of plant species, only around 800 have had a genome sequenced, often to a much lower specification than required, and usually targeting plants of high economic importance [2]. Because of this, many researchers working on under-valued or novel species don’t have access to a high-quality reference genome or any genome information at all. This can limit research possibilities and make techniques like gene editing, gene expression analysis, and methylation mapping much more difficult, if not impossible.

Advances in sequencing technology for plant genomes

Plant genomes are notoriously difficult to assemble. They are large and complex, often containing multiple sets of chromosomes and high levels of repeat DNA. In fact, the genome of one of the most well-known model species, Arabidopsis thaliana, took 10 years to complete and wasn’t released until 2000 [3].

Since then, sequencing technology has developed at an incredible pace – becoming faster, more accurate, and less expensive. However, it still requires considerable expertise to get the best from it.

When Sanger sequencing was first developed back in 1977, it was only possible to read hundreds of nucleotides in a process that required several hours [4]. However, the sequencing landscape has been revolutionized with the advent of high-throughput technologies, such as Illumina’s sequencing-by-synthesis short-read approach, which can sequence billions of DNA fragments simultaneously. As a result of these advancements, the entire human genome, made up of 3 billion nucleotides, can be sequenced in a matter of hours.

Higher throughput was not enough on its own; new technologies also needed to produce longer reads. While short-read sequencing is highly accurate and cost-effective, short reads are hard to assemble into contiguous sequences and a full-length genome, particularly across highly complex or repetitive regions. As technology advanced, longer reads became possible, but these still came with high error rates. These limitations made it difficult to assemble high-quality polyploid genomes.
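The effect of read length on repetitive regions can be illustrated with a toy example. The Python sketch below counts how many error-free reads of a given length occur at more than one position in a made-up 45 bp sequence containing two copies of a 12 bp repeat. The sequence, the function name `ambiguous_fraction`, and the read lengths are hypothetical choices for illustration, not real data.

```python
from collections import Counter

# Hypothetical toy sequence: unique flanks around two copies of a 12 bp repeat
REPEAT = "ACGTTGCAAGCT"
GENOME = "TTGACCA" + REPEAT + "CGGATAC" + REPEAT + "GTCAAGT"


def ambiguous_fraction(genome: str, read_len: int) -> float:
    """Fraction of error-free reads of `read_len` found at more than one position."""
    reads = [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]
    counts = Counter(reads)
    return sum(1 for read in reads if counts[read] > 1) / len(reads)


# Reads shorter than the repeat can fall entirely inside it and match both copies
print(ambiguous_fraction(GENOME, 6))
# Reads that span the repeat and its flanks are unique in this toy sequence
print(ambiguous_fraction(GENOME, 30))  # 0.0
```

A read that fits entirely inside a repeat cannot be placed with confidence, whereas a read long enough to reach the unique sequence on either side can.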

That’s where HiFi sequencing comes in.

HiFi sequencing, developed by PacBio, made continuous reads of 10,000-25,000 bp possible with an accuracy of 99.9%, on par with short reads and Sanger sequencing [5]. Long-read sequencing leads to fewer gaps in genome assemblies, enabling more accurate assessment of sequence duplications, structural variation, and repetitive sequences.
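For a sense of what 99.9% accuracy means in practice, the short Python sketch below converts it to the Phred quality scale (Q = -10 × log10 of the error probability) and estimates the number of errors expected in a single long read. It assumes, as a simplification, that 99.9% is a per-base accuracy and that errors are independent; the function names are our own.

```python
import math


def phred_quality(accuracy: float) -> float:
    """Phred-scaled quality score: Q = -10 * log10(error probability)."""
    return -10 * math.log10(1 - accuracy)


def expected_errors(read_length: int, accuracy: float) -> float:
    """Expected number of erroneous bases in a read (assumes independent errors)."""
    return read_length * (1 - accuracy)


print(round(phred_quality(0.999)))            # 30, i.e. Q30
print(round(expected_errors(10_000, 0.999)))  # 10
print(round(expected_errors(25_000, 0.999)))  # 25
```

Under these assumptions, a read of 10,000-25,000 bp would carry roughly 10 to 25 errors, few enough for the read to be placed reliably and for overlapping reads to correct one another.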

With the use of technologies like PacBio’s new Revio™ system, chromosomes can be accurately resolved from telomere to telomere.

These developments allow much faster and highly accurate sequencing of otherwise challenging plant genomes that may have highly repetitive sequences and complex polyploidy. Together with new computational tools, advances in long-read sequencing have made sequencing and assembly of virtually any species possible. New plant genomes are rapidly making their way into genetic repositories: 74% of all land plant genome assemblies were produced between 2019 and 2021 [6].

Applications for your reference genome

With a high-quality reference genome to hand, an abundance of tools and services can be used to identify important genes, better understand the complicated interactions between them, and develop new plant varieties through molecular breeding and gene editing:

  • Genome-wide association study (GWAS) or quantitative trait locus (QTL) mapping: Identify associations between phenotypic traits and their underlying genes. These studies enable the identification of genetic pathways controlling important phenotypic traits and the development of new marker-assisted selection (MAS) programs for selecting agriculturally important traits in crop breeding. This has led to the production of better varieties of fruits, vegetables, and industrial crops, and an overall improvement in human health.
  • RNA-sequencing: Transcriptome analysis to determine global changes to gene expression and how this may change in relation to environmental conditions.
  • CRISPR: Directly edit specific targeted genomic regions to understand gene function and metabolic pathways.
  • Bisulfite sequencing: Use your reference genome to construct a methylome, a genome-wide map of DNA methylation! Use this in conjunction with RNA-seq to understand whether methylation contributes to gene regulation in your plant.

A necessary tool for our future

Plant sciences play a vital role in addressing the critical global challenges of climate change and food security. Reference genomes are an essential part of this and, in the current climate, more crucial than ever.

With the valuable insights that reference genomes provide into plant biology, researchers can develop crops that are better suited to local conditions, produce higher yields, and are more resistant to pests and diseases. As sequencing becomes faster, cheaper, and more reliable, it’s important that we utilize this technology to ensure a sustainable future for generations to come.

Eremid is committed to using the latest sequencing technologies to deliver the highest quality results for a range of needs, from de novo sequencing to resequencing and gene annotation. By choosing Eremid, researchers can trust that they are receiving the highest quality data and analysis from world-class scientists, supporting the development of tools and products for better health, nutrition, and environmental outcomes.

Discuss a project with us today!


  1. Corlett, R.T., 2016. Plant diversity in a changing world: Status, trends, and conservation needs. Plant Diversity 38, 10–16. https://doi.org/10.1016/j.pld.2016.01.001
  2. Marks, R.A., Hotaling, S., Frandsen, P.B., VanBuren, R., 2021. Representation and participation across 20 years of plant genome sequencing. Nat Plants 7, 1571–1578. https://doi.org/10.1038/s41477-021-01031-8
  3. The Arabidopsis Genome Initiative, 2000. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796–815. https://doi.org/10.1038/35048692
  4. Heather, J.M., Chain, B., 2016. The sequence of sequencers: The history of sequencing DNA. Genomics 107, 1–8. https://doi.org/10.1016/j.ygeno.2015.11.003
  5. Wenger, A.M., Peluso, P., Rowell, W.J., Chang, P.-C., Hall, R.J., et al., 2019. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat Biotechnol 37, 1155–1162. https://doi.org/10.1038/s41587-019-0217-9
  6. Marks, R.A., Hotaling, S., Frandsen, P.B., VanBuren, R., 2021. Representation and participation across 20 years of plant genome sequencing. Nat Plants 7, 1571–1578. https://doi.org/10.1038/s41477-021-01031-8



