cacao11genes_pub3f
18 oct 2011, d. gilbert, gilbertd at indiana edu

Gene class counts
  22797  Class:Strong      : >= 66% expression/homology evidence
   6832  Class:Medium      : >= 33% expression/homology evidence
   3971  Class:Weak        : worth considering but evidence is weak
   8463  Class:Transposon  : >= 33% transposon and no/weak expression
   3796  Class:Poor        : mixed bag of partial models
    879  Class:None        : drop, no evidence
  14925  Class:AltStrong   : alternate transcripts, all from EST/Rna assemblies
     18  Class:AltMedium
------------------------------------------------------------

Cacao gene sets match to 8 plant species
                 All homologs       Plant genes common to all 3 (n=16780)
Gene set         ngene   av.bits    ngene   av.bitscore
cacao11_pub3f    32199   483        16071   651
cacao09_mars     31331   479        16018   642
cacao1_cirad     39072   438        16108   640

Tree gene sets match to common Arabidopsis genes (TAIR10 n=10542)
                 Nall    avbits     Nbest   av.bitscore
cacao            27716   578        10323   628
poplar           26621   524        10296   606
castorb          14673   527        10202   591
grape            16259   494        10135   563
------------------------------------------------------------

Gene Evidence Summary for cacao11evigene, 2011sept

Evid.    Nevd    Statistic        pub3good  pub3all  ba3b     cirad1   mars09
-------  ------  ---------------  --------  -------  -------  -------  -------
EST      49Mb    BaseOverlap      0.657     0.693    0.666    0.576    0.620
Pro      36Mb    BaseOverlap      0.764     0.802    0.799    0.734    0.756
RNA      67Mb    BaseOverlap      0.573     0.614    0.585    0.477    0.526
Intron   161333  SplicesHit       0.907     0.917    0.900    0.817    0.829
T'poson  56Mb    BaseOverlap      0.012     0.189    0.188    0.290    0.178
Specif   93Mb    BaseOverlap      0.682     0.600    0.503    0.391    0.515
Progene  25481   Perfect          17576     17587    11621    10367    10437
Progene  25481   Equal66%         29183     29263    16492    15452    15581
Progene  25481   Some             36036     38206    23883    23081    22426
Progene  25481   Sensitiv.        0.662     0.665    0.650    0.610    0.616
Progene  25481   Specific.        0.601     0.474    0.366    0.313    0.438
RNAgene  48404   Perfect          23417     25482    17087    12169    12038
RNAgene  48404   Equal66%         32363     34555    19225    15741    15742
RNAgene  48404   Some             37101     40524    24179    21387    21285
RNAgene  48404   Sensitiv.        0.423     0.484    0.448    0.383    0.378
RNAgene  48404   Specific.        0.667     0.560    0.427    0.319    0.442
Homolog  21238   homolog.Nmatch   25430     32881    30929    37537    30641
Homolog  21238   homolog.Nfound   20209     20758    19224    21187    20374
Homolog  21238   homolog.%found   0.952     0.977    0.905    0.998    0.959
Homolog  21238   homolog.bits/aa  0.575     0.521    0.516    0.503    0.529
Homolog  --      paralog.Nmatch   24861     33213    31462    36739    27252
Homolog  --      paralog.bits/aa  0.534     0.550    0.550    0.565    0.516
Genome   --      Coding Mb        37Mb      48Mb     45Mb     44Mb     43Mb
Genome   --      Exon Mbase       60Mb      78Mb     71Mb     79Mb     65Mb
Genome   --      Gene count       33600     46738    42227    49308    35601
Genome   --      Alt-tr count     14943     14943    --       --       --
------------------------------------------------------------------------
# Predictor names:
#   pub3good=genes/cacao11pub3good.aaname.gff,
#   pub3all=genes/cacao11pub3all.aaname.gff,
#   ba3b=genes/bestgenes_of11.ba3b.gff,
#   cirad1=genes/cirad1cacao_genetr_mars11.gmap8an.ac3.gff,
#   cons9=genes/cacao9_consensus1_mars11.gmap8an.an3.gff,
#   pub3good : good=Strong/Med/Weak only; 11.oct.18

Gene Models Summary for cacao11evigene, 2011sept
------------------------------------------------------------
Count of genes from genes/cacao11pub3good.gff
33600 Genes (version: pub3good)

Evidence support:
  33136 have evidence (homology, EST or RNAseq)
  29707 have Protein homology
    20998 have orthologs (>=33%)
    19371 have in-paralogs (>=33%)
  30240 have Expression (EST or RNAseq)
    18374 have EST (>=33%)
    22977 have RNAseq (>=33%)
  16959 have >= 95% evidence coverage
  26122 have >= 66% evidence coverage
  31805 have >= 33% evidence coverage

Quality of models:
  29673 are full protein genes (complete and protein coding)
    290.0, 5.4Kb, 12Mb  protein size (median, maximum, sum)
    1.5Kb, 18Kb, 60Mb   transcript size (median, maximum, sum)
    62%                 coding/transcript ratio
  14943 are alternate transcripts to 7638 genes
    2.3Kb, 17Kb, 39Mb   alt-transcript size (median, maximum, sum)
  43 are partial protein genes (for missing start, internal stops)
  3945 may be noncoding or aberrant models (for pCDS <= 0.33 or lenCDS < 120)
  232 have transposon match >=33%
------------------------------------------------------------
Count of genes from genes/cacao11pub3all.gff
46738 Genes (version: pub3all)

Evidence support:
  45287 have evidence (homology, EST or RNAseq)
  38639 have Protein homology
    24450 have orthologs (>=33%)
    26933 have in-paralogs (>=33%)
  37796 have Expression (EST or RNAseq)
    20900 have EST (>=33%)
    26761 have RNAseq (>=33%)
  20049 have >= 95% evidence coverage
  32469 have >= 66% evidence coverage
  42656 have >= 33% evidence coverage

Quality of models:
  38634 are full protein genes (complete and protein coding)
    248.0, 5.4Kb, 16Mb  protein size (median, maximum, sum)
    1.4Kb, 18Kb, 78Mb   transcript size (median, maximum, sum)
    62%                 coding/transcript ratio
  14943 are alternate transcripts to 7638 genes
    2.3Kb, 17Kb, 38Mb   alt-transcript size (median, maximum, sum)
  297 are partial protein genes (for missing start, internal stops)
  7868 may be noncoding or aberrant models (for pCDS <= 0.33 or lenCDS < 120)
  7914 have transposon match >=33%
------------------------------------------------------------
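
Consistency check (illustrative sketch, not part of the original report). The gene class counts can be cross-checked against the gene set totals given in this summary. The report states that pub3good holds the Strong/Med/Weak classes only; that pub3all additionally holds the Transposon, Poor and None classes is an assumption here, suggested by the totals matching. The coding/transcript ratio is recomputed from the rounded "Coding Mb" and "Exon Mbase" values, so it is approximate. The function names are hypothetical.

```python
# Counts copied from the "Gene class counts" block of this report.
CLASS_COUNTS = {
    "Strong": 22797, "Medium": 6832, "Weak": 3971,
    "Transposon": 8463, "Poor": 3796, "None": 879,
    "AltStrong": 14925, "AltMedium": 18,
}

def gene_set_totals(counts):
    """Sum class counts into gene set totals.

    pub3good = Strong + Medium + Weak (stated in the report).
    pub3all  = pub3good + Transposon + Poor + None (assumption).
    """
    good = sum(counts[c] for c in ("Strong", "Medium", "Weak"))
    everything = good + sum(counts[c] for c in ("Transposon", "Poor", "None"))
    alt = counts["AltStrong"] + counts["AltMedium"]
    return {"pub3good": good, "pub3all": everything, "alt_transcripts": alt}

def coding_ratio_percent(coding_mb, exon_mb):
    """Coding/transcript ratio as a whole percent, from rounded Mb values."""
    return round(100 * coding_mb / exon_mb)

totals = gene_set_totals(CLASS_COUNTS)
print(totals)
# {'pub3good': 33600, 'pub3all': 46738, 'alt_transcripts': 14943}
print(coding_ratio_percent(37, 60), coding_ratio_percent(48, 78))
# 62 62
```

These agree with the "Gene count" and "Alt-tr count" rows of the evidence table and with the 62% coding/transcript ratio reported for both gene sets.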