Cacao genes from the Mars/USDA-sponsored project rank at the top in plant gene-set completeness. They were built with mixed methods that draw more on mRNA assembly than on genome-gene models. Banana genes assembled only from mRNA are about as complete as the banana genome-gene and amborella genome-gene sets. The banana expressed mRNA-seq set has about 10% fewer orthology groups than the banana genome set, a rate I find generally across various species.

Gene set completeness for plant orthologs, ranked by completeness (Bitscores, aaSize, nGroup, Tiny)

              Common families    All families
Geneset       cBits   dSize      aBits   nGroup   Tiny
--------------------------------------------------------
cacao1ma        671      15        544    15161   111 (0.7%)
cotton          653       3        519    15026   153 (1%)
orange1cn       648       0        499    14249   198 (1.3%)
poplar          639      -2        512    15130   244 (1.6%)
castorbean      631      -7        493    14605   460 (3.1%)
capsella        603       0        435    13397   171 (1.2%)
eucalypt        624      -5        468    13877   312 (2.2%)
soybean         618     -17        477    14559   402 (2.7%)
arabido.th      600      -1        428    13345   135 (1.0%)
arabido.ly      604      -1        430    13304   253 (1.9%)
brassica        594       2        432    13714   283 (2%)
grape           611     -20        447    13203   726 (5.4%)
amborella       548      -6        355    11766   489 (4.1%)
banana1g        542     -19        369    12537   577 (4.6%)
--------------------------------------------------------
Common families n=7540, All families n=15928

Bits   = bitscore from blastp: cBits for the families common to all gene sets, aBits for all families with 3+ plants
dSize  = protein size (aa) difference from the family median
nGroup = number of orthology groups (families) with a member from the gene set
Tiny   = count of tiny protein-size outliers (more than 3 SD below the family median), with percent of nGroup

Notes: cacao1ma, orange1cn and banana1g are the better of 2 independent gene sets for those species. Cotton is a close relative of cacao, and its gene set was built using the cacao1ma gene set (among others). Bitscores are influenced by phylogeny as well as by quality; scores by alignment (somewhat less phylogeny-dependent) show the same ordering. Protein size is closely and positively correlated with bitscore.
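The dSize and Tiny columns compare each protein with the other members of its orthology family. Below is a minimal Python sketch of that bookkeeping, assuming a hypothetical `families` mapping (family ID to gene-set protein lengths); it is not the Evigene code. The use of the sample SD over all member lengths, the 3+ member filter, and reporting dSize as a mean over families are assumptions of this sketch, and the toy data are invented.

```python
from statistics import mean, median, stdev

# Hypothetical input layout: orthology family ID -> {gene set name -> protein length (aa)}.
Families = dict[str, dict[str, int]]


def size_summary(families: Families, geneset: str, nsd: float = 3.0) -> dict:
    """Compare one gene set's protein lengths with each family's median.

    Assumptions for this sketch: SD is the sample SD of all member lengths
    in the family, and dSize is reported as the mean over families.
    """
    dsizes = []
    tiny = 0
    for members in families.values():
        # Skip families without this gene set, or with fewer than 3 members.
        if geneset not in members or len(members) < 3:
            continue
        lengths = list(members.values())
        med = median(lengths)
        sd = stdev(lengths)
        size = members[geneset]
        dsizes.append(size - med)
        # Tiny outlier: more than `nsd` SDs below the family median.
        if size < med - nsd * sd:
            tiny += 1
    ngroup = len(dsizes)  # families that contain this gene set and pass the filter
    return {
        "nGroup": ngroup,
        "dSize": mean(dsizes) if dsizes else 0.0,
        "Tiny": tiny,
        "TinyPct": 100.0 * tiny / ngroup if ngroup else 0.0,
    }


# Toy data (invented): 19 other gene sets with lengths of 400, 410 or 420 aa.
others = {f"plant{i}": 400 + (i % 3) * 10 for i in range(19)}
families = {
    "fam1": {**others, "query": 40},   # truncated protein in the query set
    "fam2": {**others, "query": 405},  # normal-length protein
    "fam3": dict(others),              # query set has no member here
}
print(size_summary(families, "query"))
# → {'nGroup': 2, 'dSize': -187.5, 'Tiny': 1, 'TinyPct': 50.0}
```

Centering on the median keeps one truncated protein from pulling the reference size down, although that same protein still inflates the SD, so the 3-SD test only becomes sensitive in families with many members.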
Ranking quality by protein size and orthology families (nGroup) gives a similar result, but arabido.th and brassica move up to the middle (6th and 7th).

Gene set completeness for plant orthologs, comparing 2 independent gene sets for 3 species

              Common families    All families
Geneset       cBits   dSize      aBits   nGroup   Tiny
--------------------------------------------------------
cacao1ma        653      15        547    15161   112 (0.7%)
cacao1cr        641      11        530    14897   235 (1.5%)
orange1cn       629       0        502    14249   199 (1.3%)
orange1jg       610     -21        480    14039   658 (4.6%)
banana1g        522     -19        371    12537   577 (4.6%)
banana1e        521     -21        349    11733   880 (7.5%)
--------------------------------------------------------
Common families n=8461, All families n=15838

Plant comparison gene sets
amborella  = Amborella genome-gene predictions, BioProject PRJNA212863, http://www.amborella.org/, doi:10.1126/science.1241089
banana1g   = Banana genome-gene predictions, BioProject PRJNA81189, http://www.musagenomics.org/, doi:10.1038/nature11241
banana1e   = Banana mRNA-seq-only assembly with Evigene, http://arthropods.eugenes.org/EvidentialGene/plants/banana/
cacao1cr   = Cacao Cirad genome-gene predictions, http://cocoagendb.cirad.fr/, doi:10.1038/ng.736
cacao1ma   = Cacao Mars mRNA-assembly + genome-genes with Evigene, BioProject PRJNA51633, http://arthropods.eugenes.org/EvidentialGene/plants/cacao/, doi:10.1186/gb-2013-14-6-r53
orange1cn  = Sweet orange, Cn genome-gene set, BioProject PRJNA86123, http://citrus.hzau.edu.cn/orange, doi:10.1038/ng.2472
orange1jg  = Sweet orange, JGI genome-gene set, http://www.phytozome.net/citrus.php
arath      = arabido.th, Arabidopsis TAIR10
poptr      = poplar, Populus poptr_Ptrichocarpa_156, JGI phytozome
ricco      = castorbean, Ricinus v0.1 from castorbean.jcvi.org
soybn      = soybean, soybn_Gmax_109, JGI phytozome
vitvi      = grape, vitvi_Vvinifera_145, JGI phytozome
soltu      = potato, Solanum v3.4 from potatogenomics.plantbiology.msu.edu/
sorbi      = sorghum, sorbi_Sbicolor_79, JGI phytozome
cotton     = Gossypium, phytozome/v9.0/Graimondii/
capsella   = phytozome/v9.0/Crubella/
eucalypt   = eucalyptus, phytozome/v9.0/Egrandis/
brassica   = phytozome/v9.0/Brapa/
arabido.ly = phytozome/v9.0/Alyrata/
................................................................................
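The paired rows of the two-gene-sets-per-species table can be compared directly as percent shortfalls of one set against its partner. This short Python sketch does that arithmetic on the nGroup and aBits values copied from that table; the `PAIRS` layout and the `pct_fewer` helper are hypothetical names for illustration. On these particular numbers, banana1e comes out about 6% below banana1g in orthology groups, somewhat under the general ~10% rate mentioned earlier.

```python
# nGroup and aBits values copied from the paired-gene-set table.
# Hypothetical layout: species -> ((reference set, nGroup, aBits), (alternate set, nGroup, aBits)).
PAIRS = {
    "cacao": (("cacao1ma", 15161, 547), ("cacao1cr", 14897, 530)),
    "orange": (("orange1cn", 14249, 502), ("orange1jg", 14039, 480)),
    "banana": (("banana1g", 12537, 371), ("banana1e", 11733, 349)),
}


def pct_fewer(ref: float, other: float) -> float:
    """Percent by which `other` falls short of `ref` (negative if it exceeds it)."""
    return 100.0 * (ref - other) / ref


for species, ((ref_name, ref_n, ref_bits), (alt_name, alt_n, alt_bits)) in PAIRS.items():
    print(
        f"{species}: {alt_name} has {pct_fewer(ref_n, alt_n):.1f}% fewer groups "
        f"and {pct_fewer(ref_bits, alt_bits):.1f}% lower aBits than {ref_name}"
    )
# cacao: cacao1cr has 1.7% fewer groups and 3.1% lower aBits than cacao1ma
# orange: orange1jg has 1.5% fewer groups and 4.4% lower aBits than orange1cn
# banana: banana1e has 6.4% fewer groups and 5.9% lower aBits than banana1g
```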