The Arabidopsis thaliana genome reannotation project, TAIR12, represents a significant milestone in the plant biology research landscape. This ambitious, community-driven initiative, led by the TAIR (The Arabidopsis Information Resource) team at Phoenix Bioinformatics, advances the accuracy and depth of genome data for this vital model organism. A. thaliana was the first plant to have its genome completely sequenced and has served as a reference for other plants for over 25 years. The last update to the genome was almost 10 years ago. Since then, sequencing, assembly, and genome annotation technologies have improved dramatically, providing the opportunity to generate a high-fidelity, gapless genome sequence assembly and an extremely well-supported structural gene annotation on top of it.
This project is unique in that it relied on a voluntary community effort, with the infrastructure and organization provided by TAIR and Phoenix. Over 100 scientists from all over the world, with expertise in many different areas, shared their time and skill for the good of the larger research community. The Schneeberger laboratory at the Max Planck Institute in Cologne assembled the new chromosome backbone, the foundation of the reannotation, from 13 high-quality long-read genome sequences of the reference ecotype Col-0. The Pikaard laboratory at Indiana University provided the sequences of the nucleolar organizer regions (NORs) at the tips of chromosomes 2 and 4 so that we could create a truly complete genome of Col-0. The US National Center for Biotechnology Information (NCBI) ran its Eukaryotic Genome Annotation Pipeline on this assembly and provided the resulting files to the TAIR team for review. The TAIR team coordinated the expert review of over 10% of the predicted protein-coding genes, focusing on those that had changed since the Araport11 annotation. Other collaborative groups took on the full annotation or reannotation of the transposable elements, lncRNAs, ribosomal RNAs, repeat elements, centromeres, telomeres, and NORs. The TAIR team then handled the final integration and quality control of all newly annotated and reviewed genes and other sequence elements.
The expert review process required meticulous attention to detail in examining the supporting data. It involved primary and secondary checks on updated genes, comparisons with external protein databases such as PeptideAtlas, and the resolution of gene overlaps and sequence issues.
Leading the project at Phoenix Bioinformatics is Tanya Berardini, supported by a dedicated team including Alyssa Proisa, Leonore Reiser, Shabari Subramaniam and Xingguo Chen. Since 2013, Phoenix Bioinformatics’ work, including the TAIR team’s effort in this reannotation, has been funded by the user community through subscriptions.
The TAIR12 reannotation will continue to shape plant genomics by providing the most comprehensive and reliable Arabidopsis reference genome that has ever been available. The reannotation serves the Arabidopsis research community and strengthens the foundation for comparative genomics and functional studies in other plant species.
Stay tuned for future updates as the team moves forward with completing this comprehensive annotation project, making the results available first through GenBank and then through other resources that serve the scientific community.