Theobroma cacao public gene set pub3i
08 March 2012, D. Gilbert, gilbertd at indiana edu
------------------------------------------------------------

Gene sets compared:
  Mars.v11 = this Theobroma cacao public gene set pub3i
  Cirad.v1 = Nature Genetics 2011, doi:10.1038/ng.736,
             http://www.nature.com/ng/journal/v43/n2/full/ng.736.html

Table E1. Cacao gene sets summary counts

Statistic                Mars.v11   Cirad.v1
---------                --------   --------
Locus count                 29283      29484
Same locus+CDS              13519      13709
Same locus/different         8599      10646
Unique locus                 7337       4928
Alternate transcripts       14920          0
Poor models                 17244      17342
Coding bases                35 Mb      34 Mb
Exon bases                  54 Mb      48 Mb
ave protein size              319        286
ave transcript size        2.3 Kb     1.5 Kb
----------------------------------------
Locus = good gene loci, excluding those identified as transposons,
fragments, or unsupported by gene evidence.
Alternate transcripts of Mars gene set are all from EST/RNA transcript
assemblies.
Poor models are not counted for coding and transcript sizes.
Same/unique loci for two gene sets are described in tables E4, E5.

Table E2. Cacao gene evidence recovered in gene sets

Evidence      Nevd    Mars   Cirad
---------    ------   ----   -----
Proteins       36Mb    76%     73%
RNA exons      67Mb    57%     48%
Introns      161333    91%     82%
RNA genes     48404    67%     32%
-----------------------------------
Proteins and RNA exons are bases of evidence aligned to genome, and
percent of gene models that match those.
Introns are number of unique introns from multiple EST/RNA reads, and
percent of gene models matching both splice ends.
RNA genes are unique transcript assemblies, and percent gene models
that align >= 66%.

Table E3. Homology average for gene set proteins

Tree gene set    TAIR10   Plant8
-----------------------------------
Cacao11_mars        632      549
Cacao1_cirad        620      522
Poplar              609
Castor bean         591
Grape               563
-----------------------------------
TAIR10 = average blastp bitscore to Arabidopsis, TAIR10, using 10253
TAIR genes that are common best matches to all 5 gene sets.
Plant8 = average blastp bitscore to the best matching plant protein
among 8 plant proteome sets.

Note that the differences between the two cacao gene sets are in the
same range as the differences among the tree species gene sets, so
phylogeny and gene construction quality differences are confounded.
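
The TAIR10 column of Table E3 averages bitscores only over reference genes
that are best matches in every gene set, so that all sets are scored on the
same genes. A minimal sketch of that averaging follows. It assumes the best
blastp hit per reference gene has already been parsed into dictionaries; the
function name, data layout, set names, and toy scores are illustrative
assumptions, not the procedure or data used for this gene set.

```python
from statistics import mean


def common_gene_average(best_hits):
    """Average bitscore per gene set over reference genes shared by all sets.

    best_hits: {gene_set_name: {reference_gene_id: best_bitscore}}
    (hypothetical layout; one best hit per reference gene per gene set).
    Returns (averages_by_gene_set, number_of_common_reference_genes).
    """
    if not best_hits:
        raise ValueError("no gene sets given")
    # Keep only reference genes that have a best match in every gene set.
    common = set.intersection(*(set(hits) for hits in best_hits.values()))
    if not common:
        raise ValueError("no reference gene is shared by all gene sets")
    averages = {
        name: mean(hits[gene] for gene in common)
        for name, hits in best_hits.items()
    }
    return averages, len(common)


# Toy input: two hypothetical gene sets scored against three reference genes.
hits = {
    "set_a": {"AT1G01010": 600.0, "AT1G01020": 400.0, "AT1G01030": 250.0},
    "set_b": {"AT1G01010": 580.0, "AT1G01020": 400.0},
}
averages, n_common = common_gene_average(hits)
print(n_common, averages)
```

Restricting to the shared genes means a gene set is not penalized or rewarded
for reference genes that another set lacks entirely; AT1G01030 above is
ignored because only one of the toy sets has a hit to it.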