EvidentialGene set 3 for Bemisia tabaci whitefly, 2016/17

The EvidentialGene assembly of Bemisia tabaci whitefly genes is updated from one I did in 2012 (EVm2bt below). This update, EVm3bt, adds newer RNA-seq assemblies, further improving those genes. Though the earlier Evigene version is fairly accurate, this update is more accurate and complete, by objective orthology measures, than the recently available Bemisia tabaci genome gene models. This Evigene assembly is "reference free": it is assembled directly from RNA-seq, without using chromosomes for modeling genes.

Evigene methods are doing very well in comparison to current popular gene reconstruction methods: Trinity-only RNA assembly, PacBio RNA assembly, MAKER genome gene modeling, NCBI EGAP/RefSeq modeling, and Ensembl gene modeling. This holds for animals and plants including corn/maize, pine trees, mosquitos, honey bee, beetles, water fleas, ticks, fishes, and mice. Not only are ortholog genes more accurately reconstructed with these methods, but a recent comparison with corn (Zea mays) gene sets also shows that paralogs (identified on whole-genome duplicate chromosomes) and alternate transcripts are more accurately assembled. Pine tree gene sets compare the results of Evigene, MAKER-genome, Trinity-only Illumina, and PacBio gene reconstruction.
See this recent work at http://arthropods.eugenes.org/EvidentialGene/
- Don Gilbert

------

Bemisia tabaci whitefly gene sets

  EVm3bt = Evigene gene assembly, 2016/2017 update
  EVm2bt = Evigene gene assembly, 2012
  Gen6bt = Whitefly genome project MEAM1 genes modeled with MAKER, 2016, whiteflygenomics.org
  Ncb6bt = NCBI RefSeq gene models, 2016
  Tsawfa = TSA.GARQ gene assembly 2015, Trinity of Illumina
  Tsawfb = TSA.GBII gene assembly 2015, Trinity of Illumina

Protein homology summary

Reference: Pea aphid (NCBI RefSeq 2016), nref=15199
  Source  nFound  %Aln   Iden   Algn   Qlen   AlgnH
  EVm3bt  15104   90     239.5  431.5  679.7  434.2
  EVm2bt  14817   85.6   227    413    633    423.9
  Ncb6bt  14822   85.5   226.4  414.7  692.8  425.2
  Gen6bt  14390   80.2   210.7  386    669.2  407.7
  Tsawfb  13669   71     173.4  328.7  544.5  365.5
  Tsawfa  13340   68     162.3  309    514    352
=================================================
Reference: Fruit fly (2015), nref=10296
  Source  nFound  %Aln   Iden   Algn   Qlen   AlgnH
  EVm3bt  10242   81.8   258.7  464.2  606.6  466.7
  Ncb6bt  10152   78.7   245.9  448.4  617    454.8
  Gen6bt   9964   74.9   232.1  423.8  601.8  437.9
  Tsawfb   9406   67.4   192.3  363.2  515.1  397.6
  Tsawfa   9148   65.1   182    344.5  504.4  387.7
=================================================
Reference: Fruit fly BUSCO-conserved set (2015), nref=3018
  Source  nFound  %Aln   Iden   Algn   Qlen   AlgnH
  EVm3bt   3016   85.4   312.6  526.3  612.9  526.7
  Ncb6bt   3008   83     300.8  511.9  607.6  513.6
  Gen6bt   2966   79.2   287.2  487.8  610.7  496.4
  Tsawfb   2917   74.4   250.7  435.4  524    450.5
  Tsawfa   2741   68.6   228.5  399.8  527.6  440.2
=================================================

Statistics

  nFound = number of reference proteins with significant alignment (labeled nHit in some tables of the original)
  %Aln   = average percent alignment to the reference protein
  Iden   = average protein identity in the alignment, amino acids
  Algn   = average protein alignment, amino acids
  Qlen   = average protein length of the query source (B. tabaci)
  AlgnH  = average protein alignment for those found, amino acids
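
The Algn and AlgnH columns appear to be related by a simple rescaling. This is an inference from the tabled numbers, not a documented Evigene formula: if Algn averages alignment length over all nref reference proteins (counting unfound ones as zero) and AlgnH averages over only the nFound hits, then Algn should be close to AlgnH * nFound / nref. The Python sketch below checks that assumption against the pea aphid rows. The function names and the 0.5 amino acid tolerance are illustrative choices; the tolerance allows for rounding in the reported values.

```python
# Pea aphid reference table: (source, nFound, Algn, AlgnH), copied from the summary.
N_REF = 15199
ROWS = [
    ("EVm3bt", 15104, 431.5, 434.2),
    ("EVm2bt", 14817, 413.0, 423.9),
    ("Ncb6bt", 14822, 414.7, 425.2),
    ("Gen6bt", 14390, 386.0, 407.7),
    ("Tsawfb", 13669, 328.7, 365.5),
    ("Tsawfa", 13340, 309.0, 352.0),
]


def algn_from_found(algn_h, n_found, n_ref):
    """Rescale the found-only average (AlgnH) to an average over all references.

    Assumption: reference proteins with no hit contribute zero aligned residues.
    """
    return algn_h * n_found / n_ref


def check_rows(rows, n_ref, tol=0.5):
    """Return (source, reported Algn, reconstructed Algn, within tolerance) per row."""
    results = []
    for source, n_found, algn, algn_h in rows:
        estimate = algn_from_found(algn_h, n_found, n_ref)
        results.append((source, algn, round(estimate, 1), abs(estimate - algn) <= tol))
    return results


if __name__ == "__main__":
    for source, reported, estimate, ok in check_rows(ROWS, N_REF):
        print(f"{source}  reported={reported}  reconstructed={estimate}  ok={ok}")
```

Read this way, the gap between Algn and AlgnH for a gene set reflects how many reference proteins it misses, while AlgnH alone reflects how completely the found genes are reconstructed.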