Over the past two years, TAIR has taken the lead in organizing a community effort to reannotate the Arabidopsis thaliana genome. The result is TAIR12, soon to be released on the European Nucleotide Archive (ENA). The project was only made possible through the work of nearly 100 volunteers based in labs all over the world who generously donated their time and expertise. Scientists like Emmanuel Boutet, a plant biocurator for the Swiss Institute of Bioinformatics, chose to contribute to this work for the benefit of science. A. thaliana is key to understanding plant genetics, with implications for fields from plant biology to pharmacology to crop research. TAIR, as the most comprehensive and trusted resource on A. thaliana, is essential to that work. TAIR12 incorporates the latest discoveries and evidence using the most advanced techniques and technologies. For researchers who use TAIR in their daily work, that means greater accuracy and reliability, superior experiment design, more precise hypotheses, and ultimately better, more efficient research.

In this series, we’re pleased to introduce you to some of the volunteers who helped make this possible. Please join us in expressing gratitude for their efforts!

Meet Emmanuel Boutet

Emmanuel Boutet focuses on plant biology and physiology, with a specific emphasis on protein sequence analysis and functional characterization. As a plant biocurator with the Swiss-Prot Group at the Swiss Institute of Bioinformatics, Boutet uses TAIR every day to curate proteins, validate uncertain models, and map publications.

Why did you choose to volunteer your time to support the TAIR12 reannotation project?

UniProtKB/Swiss-Prot aims to provide a comprehensive, up-to-date sequence and function description for experimentally characterized proteins, especially for key model species such as A. thaliana. It is essential for our knowledgebase to have high-quality genomic annotations to deliver the best possible proteome for this organism. TAIR is a highly reliable Model Organism Database (MOD) that offers excellent tools for sequence analysis and literature exploration, and it contributes to Gene Ontology (GO) curation, which we also integrate into UniProtKB/Swiss-Prot. Thus, it was vital for us to be involved in their reannotation effort.

Why is reannotation important to the plant biology research community? 

Reannotation of the genome assembly and of gene models is crucial because it applies recent technologies to complex genome annotation challenges, such as repetitive regions of the genome, short gene models, non-coding RNAs, and transposable elements.

How will access to the updated information impact research in the field?

The plant community will benefit from this upgrade by having access to up-to-date knowledge, both for working with A. thaliana and for extrapolating data to other species. Once the updated coding sequences from TAIR12 are made publicly available, their derived protein sequences will be imported into UniProtKB to provide updated models in our database. This will significantly benefit the plant research community, as the A. thaliana proteome at UniProtKB/Swiss-Prot is leveraged to improve the curation of other plant proteomes and to train bioinformatics tools.

What work did you perform as part of the project?

My contribution to the TAIR12 reannotation project involved correcting and validating protein-coding gene models using the Apollo interface, informed by various sequence evidence, including mRNAs, ESTs and RNA-Seq. Extensive earlier efforts by me and others at UniProtKB/Swiss-Prot had already curated more than 16,000 gene models using previous Arabidopsis thaliana assemblies. Comparing these curated models to new TAIR12 models allowed us to confirm the new models in some cases, and in other cases to correct the TAIR12 models or identify models needing updates in UniProtKB/Swiss-Prot.

What was the most challenging aspect of the work? The most rewarding?

The challenge of this task was to validate numerous models within a tight time frame while ensuring the highest possible quality, understanding that this work would serve as a reference for other researchers. The most rewarding aspect was observing that most of the models validated for TAIR12 matched the models previously curated in UniProtKB/Swiss-Prot.

What are your hopes for TAIR12 and its impact in the field?

I look forward to a substantially improved catalog of protein isoforms compared to TAIR / ARAPORT11, resulting from the careful use of experimental evidence in the TAIR12 annotation process, along with an overall enhancement in genetic models and improved coverage of centromeres and telomeres. The significance of this substantial effort to maintain the accuracy of a leading MOD should be widely acknowledged and replicated by other plant MODs.

There’s more to come! Watch this space for researcher profiles, publication information, and other news and updates on TAIR12.