EvidentialGene genes for Theobroma cacao chocolate tree
- The genome sequence of the most widely cultivated cacao type and its use to identify candidate genes regulating pod color.
  Genome Biology 2013, 14:R53, doi:10.1186/gb-2013-14-6-r53
- Theobroma cacao Genome map
- BLAST Plant Genes
Name                     Last modified      Size
Parent Directory         21-May-2017 19:09  -
blastplants/             08-Apr-2019 16:24  -
docs/                    08-Apr-2019 15:54  -
genes/                   08-Apr-2019 15:34  -
genome/                  08-Apr-2019 15:41  -
gnodes_cacao22measure/   24-Oct-2022 23:04  -
intron/                  16-Sep-2011 16:35  -
orthomcl/                21-Sep-2012 17:36  -
prot/                    19-Oct-2011 22:54  -
relseq/                  21-Mar-2012 15:42  -
rnas/                    09-Mar-2013 16:52  -
scaf4front/              31-Dec-2011 13:30  -
Theobroma cacao (chocolate bean tree) genes and genome,
2012 March data release.
See also http://www.phytozome.net/cacao.php, NCBI Bioproject PRJNA51633
Publication Title:
The genome sequence of the most widely cultivated cacao type and its use to identify
candidate genes regulating pod color.
http://genomebiology.com/2013/14/6/r53 doi:10.1186/gb-2013-14-6-r53
Authors:
Juan C Motamayor1*^, Keithanne Mockaitis2^, Jeremy Schmutz1,3^, Niina
Haiminen4^, Donald Livingstone III1,5, Omar Cornejo6, Seth Findley1,
Ping Zheng7, Filippo Utro4, Stefan Royaert5, Christopher Saski8, Jerry
Jenkins1,3, Ram Podicheti9, Meixia Zhao10, Brian Scheffler11, Joseph C
Stack1, Alex Feltus8, Guiliana Mustiga1, Freddy Amores12, Wilbert
Phillips13, Jean Philippe Marelli14, Gregory D May15, Howard Shapiro1,
Jianxin Ma10, Carlos D. Bustamante6, Raymond J. Schnell1,5, Dorrie
Main7, Don Gilbert2, Laxmi Parida4 and David N. Kuhn5
Accession numbers
Whole Genome Shotgun project is at DDBJ/EMBL/GenBank under accession number [ALXC00000000].
Annotated genome assembly is at http://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA51633
...........
Update 2014 Feb
Cacao genes from the Mars/USDA-sponsored project rank at the top among plant gene sets for completeness.
They were built with mixed methods that draw more on mRNA assembly than on genome-based gene models.
Gene set completeness for plant orthologs
ranked by completeness (Bitscores, aaSize, nGroup, Tiny)
            Common families   All families
Geneset       cBits   dSize   aBits  nGroup  Tiny
----------------------------------------------------------
cacao1ma        671      15     544   15161   111 (0.7%)
cotton          653       3     519   15026   153 (1%)
orange1cn       648       0     499   14249   198 (1.3%)
poplar          639      -2     512   15130   244 (1.6%)
castorbean      631      -7     493   14605   460 (3.1%)
capsella        603       0     435   13397   171 (1.2%)
eucalypt        624      -5     468   13877   312 (2.2%)
soybean         618     -17     477   14559   402 (2.7%)
arabido.th      600      -1     428   13345   135 (1.0%)
arabido.ly      604      -1     430   13304   253 (1.9%)
brassica        594       2     432   13714   283 (2%)
grape           611     -20     447   13203   726 (5.4%)
amborella       548      -6     355   11766   489 (4.1%)
banana1g        542     -19     369   12537   577 (4.6%)
----------------------------------------------------------
Common families n=7540, All families n=15928
Bits = bitscore from blastp, for families common to all gene sets (cBits) and
for all families found in 3+ plants (aBits)
dSize = protein size difference from family median
Tiny = count of tiny protein size outliers (more than 3 sd below family median)
Notes: cacao1ma, orange1cn and banana1g are each the better of 2 independent
gene sets for those species. Cotton is a close relative of cacao, and its gene
set was built using the cacao1ma gene set (among others). Bitscores are
influenced by phylogeny as well as quality; scores by alignment (somewhat less
phylogeny-dependent) show the same ordering. Protein size is closely and
positively correlated with bitscore. Ranking quality by protein size and
orthology families (nGroup) gives a similar result, but arabido.th and brassica
move up to the middle (6th, 7th).
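
The dSize and Tiny columns defined above can be illustrated with a small Python sketch.
This is not the script used to build the tables. The input layout, the function name
size_stats, and the toy family data are hypothetical. It also assumes that dSize is the
mean of per-family differences from the family median, that the sd is taken over all
family members, and that the Tiny percentage is Tiny divided by nGroup (the table
percentages look close to that ratio, but the page does not state the denominator).

```python
from statistics import mean, median, pstdev


def size_stats(families, geneset, min_members=3, n_sd=3.0):
    """Summarize protein sizes of one gene set against family medians.

    families: dict of family id -> dict of gene set name -> protein length (aa).
    This input layout is hypothetical, chosen only for illustration.
    """
    diffs = []  # per-family size difference from the family median
    tiny = 0    # families where this gene set's protein is a tiny outlier
    for sizes in families.values():
        # Skip families that lack this gene set or have too few members.
        if geneset not in sizes or len(sizes) < min_members:
            continue
        values = list(sizes.values())
        med = median(values)
        sd = pstdev(values)  # assumption: population sd over all members
        diff = sizes[geneset] - med
        diffs.append(diff)
        # "Tiny" here means more than n_sd standard deviations below the median.
        if sd > 0 and diff < -n_sd * sd:
            tiny += 1
    n = len(diffs)
    return {
        "dSize": mean(diffs) if diffs else 0.0,
        "nGroup": n,
        "Tiny": tiny,
        "TinyPct": 100.0 * tiny / n if n else 0.0,  # assumed denominator: nGroup
    }


# Toy data: 12 made-up plants plus one "demo" gene set, two families.
others = [f"plant{i}" for i in range(12)]
toy = {
    "FAM1": {**{name: 500 for name in others}, "demo": 100},  # demo is far too short
    "FAM2": {**{name: 300 for name in others}, "demo": 310},  # demo is near the median
}
print(size_stats(toy, "demo"))
```

Using the family median rather than the mean as the reference keeps one very short
protein from shifting the value it is compared against.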
Gene set completeness for plant orthologs
comparing 2 independent gene sets for 3 species
            Common families   All families
Geneset       cBits   dSize   aBits  nGroup  Tiny
----------------------------------------------------------
cacao1ma        653      15     547   15161   112 (0.7%)
cacao1cr        641      11     530   14897   235 (1.5%)
orange1cn       629       0     502   14249   199 (1.3%)
orange1jg       610     -21     480   14039   658 (4.6%)
banana1g        522     -19     371   12537   577 (4.6%)
banana1e        521     -21     349   11733   880 (7.5%)
----------------------------------------------------------
Common families n=8461, All families n=15838
Plant comparison gene sets
amborella = amborella genome-gene predictions
BioProject PRJNA212863, http://www.amborella.org/, doi:10.1126/science.1241089
banana1g = Banana genome-gene predictions
BioProject PRJNA81189, http://www.musagenomics.org/, doi:10.1038/nature11241
banana1e = Banana mRNA-seq only assembly with Evigene
http://arthropods.eugenes.org/EvidentialGene/plants/banana/
cacao1cr = Cacao Cirad genome-gene predictions
http://cocoagendb.cirad.fr/ doi:10.1038/ng.736
cacao1ma = Cacao Mars mRNA-assembly + genome-genes with Evigene
BioProject PRJNA51633, http://arthropods.eugenes.org/EvidentialGene/plants/cacao/ doi:10.1186/gb-2013-14-6-r53
orange1cn = Sweet orange, Cn genome-genes gene set
BioProject PRJNA86123, http://citrus.hzau.edu.cn/orange, doi:10.1038/ng.2472
orange1jg = Sweet orange, JGI genome-genes gene set
http://www.phytozome.net/citrus.php
arath = arabido.th, arabidopsis TAIR10,
poptr = poplar, Populus poptr_Ptrichocarpa_156 JGI phytozome
ricco = castorbean, Ricinus v0.1 from castorbean.jcvi.org
soybn = soybean, soybn_Gmax_109 JGI phytozome
vitvi = grape, vitvi_Vvinifera_145 JGI phytozome
soltu = potato, Solanum v3.4 from potatogenomics.plantbiology.msu.edu/
sorbi = sorghum, sorbi_Sbicolor_79 JGI phytozome
cotton = gossypium phytozome/v9.0/Graimondii/
capsella = phytozome/v9.0/Crubella/
eucalypt = phytozome/v9.0/Egrandis/
brassica = phytozome/v9.0/Brapa/
arabido.ly = phytozome/v9.0/Alyrata/
................................................................................