Meet the volunteer researchers behind TAIR12: Yasin Kaya, Max Planck Institute for Plant Breeding Research

Over the past two years, TAIR has taken the lead in organizing a community effort to reannotate the Arabidopsis thaliana genome. The result is TAIR12, soon to be released on the European Nucleotide Archive (ENA). The project was only made possible through the work of nearly 100 volunteers based in labs all over the world who generously donated their time and expertise. Scientists like Yasin Kaya, a PhD candidate at the Max Planck Institute for Plant Breeding Research in Cologne, Germany, chose to contribute to this work for the benefit of science. A. thaliana is key to understanding plant genetics, with implications for fields ranging from plant biology to pharmacology to crop research. TAIR, as the most comprehensive and trusted resource on A. thaliana, is essential to that work. TAIR12 incorporates the latest discoveries and evidence using the most advanced techniques and technologies. For researchers who use TAIR in their daily work, that means greater accuracy and reliability, better experimental design, more precise hypotheses, and ultimately better, more efficient research.

In this series, we’re pleased to introduce you to some of the volunteers who helped make this possible. Please join us in expressing gratitude for their efforts!

I hope this massive community-driven effort inspires and provides a blueprint for similar manual annotation initiatives in other model organism communities. It truly highlights the continued and critical importance of expert human curation, even in our highly automated genomic era. – Yasin Kaya

Meet Yasin Kaya

As a PhD candidate at the Max Planck Institute for Plant Breeding Research, Yasin Kaya focuses on the evolutionary genomics of structural variation. In particular, he is working on pan-genome assemblies for Arabidopsis thaliana and Arabis alpina from tropical alpine ecosystems. Kaya uses TAIR frequently, and not just as a sequence database: he finds the ability to search for a gene and immediately see a curated summary of its function, links to the primary literature, mutant phenotypes, stock center information, and expression data invaluable. “This curated connection to decades of research is what fills in the gaps and allows you to truly understand what a gene does, which is something a simple genome browser just can’t provide.”

Why did you choose to volunteer your time to support the TAIR12 re-annotation project?

My primary motivation was a strong sense of gratitude toward the Arabidopsis community. As a researcher who has relied heavily on TAIR for many years, I’ve directly benefited from the incredible resources it provides. The TAIR12 re-annotation is a significant milestone for the entire field, and I saw this as a perfect opportunity to contribute back. I wanted to be part of the collective effort to provide a more accurate, up-to-date, and reliable annotation for all the researchers who, like me, depend on this data for their work. It felt important to help maintain and improve the foundational tools I use every day.

Why is reannotation important to the plant biology research community?

Reannotation is fundamentally about reliability and precision. The gene models in a reference annotation are the foundation for a vast amount of downstream analysis—from RNA-seq quantification and variant effect prediction to proteomics and comparative genomics. Automated annotation is a great start, but it’s not perfect.

How will access to the updated information impact research in the field?

This manual reannotation increases the sensitivity and accuracy of those gene models. Access to this updated information will have a direct, positive impact on research. For example, a corrected gene model might reveal a previously unknown isoform crucial for a specific stress response, clarify the true start site of a gene, or fix an error that was causing misleading results in a differential expression study. Ultimately, a meticulously curated annotation like TAIR12 ensures that the entire community is working from the most accurate “ground truth” possible, which is essential for robust and reproducible science.

What work did you perform as part of the project?

As part of the TAIR12 project, my work involved the manual curation and validation of Arabidopsis thaliana gene models. This process is like genomic detective work: we integrate various data types, such as new RNA-seq evidence, protein alignments, and data from other Arabidopsis accessions, to confirm or correct existing gene structures. My focus was on the accuracy of these models: validating predicted isoforms, resolving cases where automated pipelines might have incorrectly merged or split gene models, and making sure the annotations reflect the most up-to-date biological evidence.

What was the most challenging aspect of the work? The most rewarding?

The most challenging aspect was definitely resolving ambiguous cases. It’s often not a simple “yes” or “no.” You have to weigh multiple, sometimes conflicting, pieces of evidence—like a faint RNA-seq signal versus a strong protein homology match. Deciding whether an isoform was a true biological signal or just transcriptional noise, or determining the correct boundaries between two closely adjacent genes, required a lot of careful judgment.

The most rewarding part was the “eureka” moment when you finally solved one of those difficult gene models. Knowing that you’ve corrected a persistent error or validated a complex new isoform — and that this correction will now be part of the official reference for everyone in the field — is incredibly satisfying. It feels like polishing a small but important part of the community’s shared knowledge.

What are your hopes for TAIR12 and its impact in the field?

My hope is that TAIR12 serves as both a critical resource and an inspiration. Internally, for the Arabidopsis thaliana community, it will accelerate research by providing a much more complete and accurate set of gene models. This is crucial for everything from functional studies to my own field of comparative and pan-genomics; more reliable gene sets make comparisons between species or accessions far more robust.

Externally, I hope this massive community-driven effort inspires and provides a blueprint for similar manual annotation initiatives in other model organism communities. It truly highlights the continued and critical importance of expert human curation, even in our highly automated genomic era.

There’s more to come! Watch this space for researcher profiles, publication information, and other news and updates on TAIR12.