Transcript annotation in FANTOM3: mouse gene catalog based on physical cDNAs

Norihiro Maeda, Takeya Kasukawa, Rieko Oyama, Julian Gough, Martin Frith, Pär G. Engström, Boris Lenhard, Rajith N. Aturaliya, Serge Batalov, Kirk Beisel, Carol J. Bult, Colin F. Fletcher, Alistair R.R. Forrest, Masaaki Furuno, David Hill, Masayoshi Itoh, Mutsumi Kanamori-Katayama, Shintaro Katayama, Masaru Katoh, Tsugumi Kawashima, John Quackenbush, Timothy Ravasi, Brian Z. Ring, Kazuhiro Shibata, Koji Sugiura, Yoichi Takenaka, Rohan D. Teasdale, Christine A. Wells, Yunxia Zhu, Chikatoshi Kai, Jun Kawai, David A. Hume, Piero Carninci, Yoshihide Hayashizaki

Research output: Contribution to journal (Article)

119 Citations (Scopus)

Abstract

The international FANTOM consortium aims to produce a comprehensive picture of the mammalian transcriptome, based upon an extensive cDNA collection and functional annotation of full-length enriched cDNAs. The previous dataset, FANTOM2, comprised 60,770 full-length enriched cDNAs. Functional annotation revealed that this cDNA dataset contained only about half of the estimated number of mouse protein-coding genes, indicating that a number of cDNAs still remained to be collected and identified. To pursue the complete gene catalog that covers all predicted mouse genes, cloning and sequencing of full-length enriched cDNAs have continued since FANTOM2. In FANTOM3, 42,031 newly isolated cDNAs were subjected to functional annotation, and the annotation of 4,347 FANTOM2 cDNAs was updated. To accomplish accurate functional annotation, we improved our automated annotation pipeline by introducing new coding sequence prediction programs and developed a Web-based annotation interface for simplifying the annotation procedures to reduce manual annotation errors. Automated coding sequence and function prediction was followed by manual curation and review by expert curators. A total of 102,801 full-length enriched mouse cDNAs were annotated. Out of 102,801 transcripts, 56,722 were functionally annotated as protein coding (including partial or truncated transcripts), providing to our knowledge the greatest current coverage of the mouse proteome by full-length cDNAs. The total number of distinct non-protein-coding transcripts increased to 34,030. The FANTOM3 annotation system, consisting of automated computational prediction, manual curation, and final expert curation, facilitated the comprehensive characterization of the mouse transcriptome, and could be applied to the transcriptomes of other species.
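The counts in the abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures stated above: the FANTOM2 set plus the newly isolated FANTOM3 cDNAs sums exactly to the 102,801 total, which suggests (an inference, not a statement from the abstract) that the 4,347 re-annotated FANTOM2 cDNAs are counted within the 60,770 rather than added to it. The percentages are rough shares of the total; the non-coding figure counts *distinct* transcripts, so treat its share as approximate, and note that the two categories are not claimed to partition the total.

```python
# Counts as reported in the abstract; variable names are illustrative only.
FANTOM2_CDNAS = 60_770        # full-length enriched cDNAs in FANTOM2
FANTOM3_NEW_CDNAS = 42_031    # newly isolated cDNAs annotated in FANTOM3
TOTAL_ANNOTATED = 102_801     # total full-length enriched cDNAs annotated
PROTEIN_CODING = 56_722       # protein coding, incl. partial/truncated transcripts
NON_CODING_DISTINCT = 34_030  # distinct non-protein-coding transcripts

# FANTOM2 plus the new FANTOM3 clones accounts for the total exactly.
assert FANTOM2_CDNAS + FANTOM3_NEW_CDNAS == TOTAL_ANNOTATED


def share(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(f"protein coding: {share(PROTEIN_CODING, TOTAL_ANNOTATED)}%")
# prints "protein coding: 55.2%"
print(f"non-protein-coding (distinct): {share(NON_CODING_DISTINCT, TOTAL_ANNOTATED)}%")
# prints "non-protein-coding (distinct): 33.1%"
```

In other words, a little over half of the annotated full-length cDNAs were classed as protein coding.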

Original language: English
Journal: PLoS Genetics
Volume: 2
Issue number: 4
DOI: 10.1371/journal.pgen.0020062
State: Published, Apr 2006


All Science Journal Classification (ASJC) codes

  • Genetics
  • Molecular Biology
  • Ecology, Evolution, Behavior and Systematics
  • Cancer Research
  • Genetics (clinical)

Cite this

Maeda, N., Kasukawa, T., Oyama, R., Gough, J., Frith, M., Engström, P. G., ... Hayashizaki, Y. (2006). Transcript annotation in FANTOM3: mouse gene catalog based on physical cDNAs. PLoS Genetics, 2(4). https://doi.org/10.1371/journal.pgen.0020062

Transcript annotation in FANTOM3: mouse gene catalog based on physical cDNAs. / Maeda, Norihiro; Kasukawa, Takeya; Oyama, Rieko; Gough, Julian; Frith, Martin; Engström, Pär G.; Lenhard, Boris; Aturaliya, Rajith N.; Batalov, Serge; Beisel, Kirk; Bult, Carol J.; Fletcher, Colin F.; Forrest, Alistair R.R.; Furuno, Masaaki; Hill, David; Itoh, Masayoshi; Kanamori-Katayama, Mutsumi; Katayama, Shintaro; Katoh, Masaru; Kawashima, Tsugumi; Quackenbush, John; Ravasi, Timothy; Ring, Brian Z.; Shibata, Kazuhiro; Sugiura, Koji; Takenaka, Yoichi; Teasdale, Rohan D.; Wells, Christine A.; Zhu, Yunxia; Kai, Chikatoshi; Kawai, Jun; Hume, David A.; Carninci, Piero; Hayashizaki, Yoshihide.

In: PLoS Genetics, Vol. 2, No. 4, 04.2006.


Maeda, N, Kasukawa, T, Oyama, R, Gough, J, Frith, M, Engström, PG, Lenhard, B, Aturaliya, RN, Batalov, S, Beisel, K, Bult, CJ, Fletcher, CF, Forrest, ARR, Furuno, M, Hill, D, Itoh, M, Kanamori-Katayama, M, Katayama, S, Katoh, M, Kawashima, T, Quackenbush, J, Ravasi, T, Ring, BZ, Shibata, K, Sugiura, K, Takenaka, Y, Teasdale, RD, Wells, CA, Zhu, Y, Kai, C, Kawai, J, Hume, DA, Carninci, P & Hayashizaki, Y 2006, 'Transcript annotation in FANTOM3: mouse gene catalog based on physical cDNAs.', PLoS Genetics, vol. 2, no. 4. https://doi.org/10.1371/journal.pgen.0020062
Maeda, Norihiro ; Kasukawa, Takeya ; Oyama, Rieko ; Gough, Julian ; Frith, Martin ; Engström, Pär G. ; Lenhard, Boris ; Aturaliya, Rajith N. ; Batalov, Serge ; Beisel, Kirk ; Bult, Carol J. ; Fletcher, Colin F. ; Forrest, Alistair R.R. ; Furuno, Masaaki ; Hill, David ; Itoh, Masayoshi ; Kanamori-Katayama, Mutsumi ; Katayama, Shintaro ; Katoh, Masaru ; Kawashima, Tsugumi ; Quackenbush, John ; Ravasi, Timothy ; Ring, Brian Z. ; Shibata, Kazuhiro ; Sugiura, Koji ; Takenaka, Yoichi ; Teasdale, Rohan D. ; Wells, Christine A. ; Zhu, Yunxia ; Kai, Chikatoshi ; Kawai, Jun ; Hume, David A. ; Carninci, Piero ; Hayashizaki, Yoshihide. / Transcript annotation in FANTOM3: mouse gene catalog based on physical cDNAs. In: PLoS Genetics. 2006 ; Vol. 2, No. 4.
@article{0495757cbdcc4370972f1ef39629ab2d,
title = "Transcript annotation in FANTOM3: mouse gene catalog based on physical cDNAs.",
abstract = "The international FANTOM consortium aims to produce a comprehensive picture of the mammalian transcriptome, based upon an extensive cDNA collection and functional annotation of full-length enriched cDNAs. The previous dataset, FANTOM2, comprised 60,770 full-length enriched cDNAs. Functional annotation revealed that this cDNA dataset contained only about half of the estimated number of mouse protein-coding genes, indicating that a number of cDNAs still remained to be collected and identified. To pursue the complete gene catalog that covers all predicted mouse genes, cloning and sequencing of full-length enriched cDNAs has been continued since FANTOM2. In FANTOM3, 42,031 newly isolated cDNAs were subjected to functional annotation, and the annotation of 4,347 FANTOM2 cDNAs was updated. To accomplish accurate functional annotation, we improved our automated annotation pipeline by introducing new coding sequence prediction programs and developed a Web-based annotation interface for simplifying the annotation procedures to reduce manual annotation errors. Automated coding sequence and function prediction was followed with manual curation and review by expert curators. A total of 102,801 full-length enriched mouse cDNAs were annotated. Out of 102,801 transcripts, 56,722 were functionally annotated as protein coding (including partial or truncated transcripts), providing to our knowledge the greatest current coverage of the mouse proteome by full-length cDNAs. The total number of distinct non-protein-coding transcripts increased to 34,030. The FANTOM3 annotation system, consisting of automated computational prediction, manual curation, and final expert curation, facilitated the comprehensive characterization of the mouse transcriptome, and could be applied to the transcriptomes of other species.",
author = "Norihiro Maeda and Takeya Kasukawa and Rieko Oyama and Julian Gough and Martin Frith and Engstr{\"o}m, {P{\"a}r G.} and Boris Lenhard and Aturaliya, {Rajith N.} and Serge Batalov and Kirk Beisel and Bult, {Carol J.} and Fletcher, {Colin F.} and Forrest, {Alistair R.R.} and Masaaki Furuno and David Hill and Masayoshi Itoh and Mutsumi Kanamori-Katayama and Shintaro Katayama and Masaru Katoh and Tsugumi Kawashima and John Quackenbush and Timothy Ravasi and Ring, {Brian Z.} and Kazuhiro Shibata and Koji Sugiura and Yoichi Takenaka and Teasdale, {Rohan D.} and Wells, {Christine A.} and Yunxia Zhu and Chikatoshi Kai and Jun Kawai and Hume, {David A.} and Piero Carninci and Yoshihide Hayashizaki",
year = "2006",
month = "4",
doi = "10.1371/journal.pgen.0020062",
language = "English",
volume = "2",
journal = "PLoS Genetics",
issn = "1553-7390",
publisher = "Public Library of Science",
number = "4",

}

TY - JOUR

T1 - Transcript annotation in FANTOM3

T2 - mouse gene catalog based on physical cDNAs.

AU - Maeda, Norihiro

AU - Kasukawa, Takeya

AU - Oyama, Rieko

AU - Gough, Julian

AU - Frith, Martin

AU - Engström, Pär G.

AU - Lenhard, Boris

AU - Aturaliya, Rajith N.

AU - Batalov, Serge

AU - Beisel, Kirk

AU - Bult, Carol J.

AU - Fletcher, Colin F.

AU - Forrest, Alistair R.R.

AU - Furuno, Masaaki

AU - Hill, David

AU - Itoh, Masayoshi

AU - Kanamori-Katayama, Mutsumi

AU - Katayama, Shintaro

AU - Katoh, Masaru

AU - Kawashima, Tsugumi

AU - Quackenbush, John

AU - Ravasi, Timothy

AU - Ring, Brian Z.

AU - Shibata, Kazuhiro

AU - Sugiura, Koji

AU - Takenaka, Yoichi

AU - Teasdale, Rohan D.

AU - Wells, Christine A.

AU - Zhu, Yunxia

AU - Kai, Chikatoshi

AU - Kawai, Jun

AU - Hume, David A.

AU - Carninci, Piero

AU - Hayashizaki, Yoshihide

PY - 2006/4

Y1 - 2006/4

N2 - The international FANTOM consortium aims to produce a comprehensive picture of the mammalian transcriptome, based upon an extensive cDNA collection and functional annotation of full-length enriched cDNAs. The previous dataset, FANTOM2, comprised 60,770 full-length enriched cDNAs. Functional annotation revealed that this cDNA dataset contained only about half of the estimated number of mouse protein-coding genes, indicating that a number of cDNAs still remained to be collected and identified. To pursue the complete gene catalog that covers all predicted mouse genes, cloning and sequencing of full-length enriched cDNAs has been continued since FANTOM2. In FANTOM3, 42,031 newly isolated cDNAs were subjected to functional annotation, and the annotation of 4,347 FANTOM2 cDNAs was updated. To accomplish accurate functional annotation, we improved our automated annotation pipeline by introducing new coding sequence prediction programs and developed a Web-based annotation interface for simplifying the annotation procedures to reduce manual annotation errors. Automated coding sequence and function prediction was followed with manual curation and review by expert curators. A total of 102,801 full-length enriched mouse cDNAs were annotated. Out of 102,801 transcripts, 56,722 were functionally annotated as protein coding (including partial or truncated transcripts), providing to our knowledge the greatest current coverage of the mouse proteome by full-length cDNAs. The total number of distinct non-protein-coding transcripts increased to 34,030. The FANTOM3 annotation system, consisting of automated computational prediction, manual curation, and final expert curation, facilitated the comprehensive characterization of the mouse transcriptome, and could be applied to the transcriptomes of other species.

AB - The international FANTOM consortium aims to produce a comprehensive picture of the mammalian transcriptome, based upon an extensive cDNA collection and functional annotation of full-length enriched cDNAs. The previous dataset, FANTOM2, comprised 60,770 full-length enriched cDNAs. Functional annotation revealed that this cDNA dataset contained only about half of the estimated number of mouse protein-coding genes, indicating that a number of cDNAs still remained to be collected and identified. To pursue the complete gene catalog that covers all predicted mouse genes, cloning and sequencing of full-length enriched cDNAs has been continued since FANTOM2. In FANTOM3, 42,031 newly isolated cDNAs were subjected to functional annotation, and the annotation of 4,347 FANTOM2 cDNAs was updated. To accomplish accurate functional annotation, we improved our automated annotation pipeline by introducing new coding sequence prediction programs and developed a Web-based annotation interface for simplifying the annotation procedures to reduce manual annotation errors. Automated coding sequence and function prediction was followed with manual curation and review by expert curators. A total of 102,801 full-length enriched mouse cDNAs were annotated. Out of 102,801 transcripts, 56,722 were functionally annotated as protein coding (including partial or truncated transcripts), providing to our knowledge the greatest current coverage of the mouse proteome by full-length cDNAs. The total number of distinct non-protein-coding transcripts increased to 34,030. The FANTOM3 annotation system, consisting of automated computational prediction, manual curation, and final expert curation, facilitated the comprehensive characterization of the mouse transcriptome, and could be applied to the transcriptomes of other species.

UR - http://www.scopus.com/inward/record.url?scp=33646760380&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33646760380&partnerID=8YFLogxK

U2 - 10.1371/journal.pgen.0020062

DO - 10.1371/journal.pgen.0020062

M3 - Article

VL - 2

JO - PLoS Genetics

JF - PLoS Genetics

SN - 1553-7390

IS - 4

ER -