Bemisia tabaci whitefly (cotton/crop plant pest): Evigene compared to other recent gene sets, June 2017 update

Whitefly gene sets measured against reference proteins

Reference   Pea aphid              Fruit fly              Conserved F.Fly
Geneset     Found%  AlnF%  AlnT%   Found%  AlnF%  AlnT%   Found%  AlnF%  AlnT%
BtEvig      81.2    90.0   88.0    74.1    81.8   74.9    98.7    85.4   80.6
BtNCBI      79.7    85.5   82.3    73.4    78.7   71.6    98.5    83.0   78.1
BtMakr      77.4    80.2   73.8    72.1    74.9   66.0    97.1    79.2   72.9
BtTrin      73.5    71.0   59.2    68.0    67.4   53.2    95.5    74.4   64.1
            ---------------------  ---------------------  ---------------------

3b. Intron recovery for Bemisia tabaci gene sets (ni = 134153 RNA-seq introns mapped to chromosomes)

Geneset   GeneTr   valExon   Found%
BtEvig    63368    91946     68.5
BtNCBI    20506    93157     69.4
BtMakr    13825    77455     57.7
BtTrin    20534    67879     50.5
-----------------------

Bemisia tabaci gene sets compared
  BtEvig  = Evigene gene assembly, 2016 update (vers 3), available at
            http://arthropods.eugenes.org/EvidentialGene/arthropods/whitefly/whitefly3evigene/
  BtEVm2  = Evigene gene assembly, 2012, from one RNA sample, at /EvidentialGene/arthropods/whitefly/whitefly2evigene/
  BtNCBI  = NCBI RefSeq gene models, 2016, from
            ncbi:/genomes/refseq/invertebrate/Bemisia_tabaci/representative/GCF_001854935.1_ASM185493v1/
  BtMakr  = genes modeled with MAKER, 2016, from the Whitefly genome project, whiteflygenomics.org
  BtTrin  = TSA.GBII gene assembly, 2015, from GenBank, Trinity assembly of Illumina reads
            (also called BtTrinB; better than BtTrinA)
  BtTrinA = TSA.GARQ gene assembly, 2015, from GenBank, Trinity assembly of Illumina reads
  (BtTrin and BtTrinA are from the ncbi:/genbank/tsa/tsa.GARQ,tsa.GBII entries)
  Source codes used in the protein homology details: EVm3bt = BtEvig, EVm2bt = BtEVm2,
  Ncb6bt = BtNCBI, Gen6bt = BtMakr, Tsawfb = BtTrin, Tsawfa = BtTrinA

Reference genes
  Pea aphid, Acyr. pisum, NCBI RefSeq 2016, total primary isoforms n=18601
  Fruit fly, NCBI RefSeq 2015, total primary isoforms n=13828
  Conserved Fruit fly, NCBI RefSeq 1-copy genes identified by BUSCO, total n=3055

Method: BLASTp -query ref_proteins -db allgenesets_proteins -evalue 1e-5 ..

Statistics
  Found%          = percent of all reference proteins (total n above) with significant
                    alignment to a gene-set protein
  AlnF% (AlignF%) = average percent alignment to found reference proteins
  AlnT% (AlignT%) = percent alignment to all reference proteins
  Introns Found%  = percent of evidence introns aligned to gene-set exons; intron evidence
                    is Illumina RNA-seq mapped to the chromosome assemblies

=============================================================
Protein homology details

Reference Pea aphid (NCBI RefSeq 2016) nref=15199
Source   Found%  AlnF%  nFound  Iden   Algn   Qlen   AlgnH
EVm3bt   81.2    90     15104   239.5  431.5  679.7  434.2
EVm2bt   79.7    85.6   14817   227    413    633    423.9
Ncb6bt   79.7    85.5   14822   226.4  414.7  692.8  425.2
Gen6bt   77.4    80.2   14390   210.7  386    669.2  407.7
Tsawfb   73.5    71     13669   173.4  328.7  544.5  365.5
Tsawfa   71.7    68     13340   162.3  309    514    352
============================================================
Reference Fruit fly (NCBI RefSeq 2015) nref=10296
Source   Found%  AlnF%  nFound  Iden   Algn   Qlen   AlgnH
EVm3bt   74.1    81.8   10242   258.7  464.2  606.6  466.7
Ncb6bt   73.4    78.7   10152   245.9  448.4  617    454.8
Gen6bt   72.1    74.9   9964    232.1  423.8  601.8  437.9
Tsawfb   68.0    67.4   9406    192.3  363.2  515.1  397.6
Tsawfa   66.2    65.1   9148    182    344.5  504.4  387.7
--------------------------------------------
Reference Fruit fly BUSCO-conserved set (2015) nref=3018
Source   Found%  AlnF%  nFound  Iden   Algn   Qlen   AlgnH
EVm3bt   98.7    85.4   3016    312.6  526.3  612.9  526.7
Ncb6bt   98.5    83     3008    300.8  511.9  607.6  513.6
Gen6bt   97.1    79.2   2966    287.2  487.8  610.7  496.4
Tsawfb   95.5    74.4   2917    250.7  435.4  524    450.5
Tsawfa   89.7    68.6   2741    228.5  399.8  527.6  440.2
===========================================================
Reference Tribolium beetle (NCBI RefSeq 2015) nref=10627
Source   nFound  AlnF%  Iden   Algn   Qlen   AlgnH
EVm3bt   10573   91.5   284.8  491.6  640.1  494.1
Ncb6bt   10484   88.1   272    476.9  652.6  483.4
Gen6bt   10312   83.6   255    447.9  633.8  461.6
Tsawfb   9778    74.5   208.9  381.5  535.4  414.6
Tsawfa   9522    72     197.8  362.3  527.2  404.3
=====================================================

Statistics
  nFound        = number of reference proteins with significant
                  alignment
  AlnF% (pAlgn) = average percent alignment to found reference proteins
  Iden          = average identical amino acids per alignment
  Algn          = average protein alignment length over all nref reference proteins
                  (a missing hit counts as 0), amino acids
  Qlen          = average protein length of the source B.t. gene set, amino acids
  AlgnH         = average protein alignment length over found reference proteins, amino acids
  nref          = total reference proteins (primary isoforms) with an alignment to any
                  target gene set (fewer than the total reference proteins)

Reference genes
  Pea aphid, Acyr. pisum, NCBI RefSeq 2016, total primary isoforms n=18601
  Fruit fly, NCBI RefSeq 2015, total primary isoforms n=13828
  Fruit fly, NCBI RefSeq BUSCO-conserved 1-copy genes, total primary isoforms n=3055
  Tribolium beetle, NCBI RefSeq 2015, total primary isoforms n=12420

Methods
  BLASTp -query ref_proteins.fa -db allgenesets_proteins.fa -evalue 1e-3 ..
  1. Tabulate the best-aligned protein of each whitefly gene set to each reference protein.
  2. Count the significantly aligned reference proteins per gene set.
  3. Average the total alignment, the identical-amino alignment, and the percent
     alignment relative to reference protein size.
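The tabulation steps above can be sketched in Python. This is a minimal, hypothetical sketch, not the script that produced these tables. It assumes BLASTp tabular output requested with -outfmt "6 qseqid sseqid length nident qlen bitscore", that "best aligned" means highest bitscore, and that each subject ID in allgenesets_proteins.fa carries a gene-set prefix such as "EVm3bt:" (a made-up naming convention). Found% divides by all reference primary isoforms (for example 15104 / 18601 = 81.2% for EVm3bt vs pea aphid), and Algn averages over nref with missing hits counted as zero, which matches the pea aphid rows (434.2 x 15104 / 15199 = 431.5). Averaging Iden over found proteins and capping percent alignment at 100 are assumptions.

```python
"""Hypothetical sketch: per-gene-set protein homology statistics from BLASTp.

Assumed (not taken from the original pipeline):
  blastp -query ref_proteins.fa -db allgenesets_proteins.fa -evalue 1e-3
         -outfmt "6 qseqid sseqid length nident qlen bitscore"
  Subject IDs are prefixed with a gene-set code, e.g. "EVm3bt:gene1".
"""
from collections import defaultdict


def best_hits(blast_lines):
    """Best (highest bitscore) hit of each gene set for each reference protein."""
    best = {}  # (geneset, ref_id) -> (bitscore, align_len, n_identical, ref_len)
    for line in blast_lines:
        if not line.strip():
            continue
        ref_id, subject_id, length, nident, qlen, bits = line.rstrip("\n").split("\t")
        geneset = subject_id.split(":", 1)[0]  # hypothetical "SET:id" convention
        hit = (float(bits), int(length), int(nident), int(qlen))
        key = (geneset, ref_id)
        if key not in best or hit[0] > best[key][0]:
            best[key] = hit
    return best


def tabulate(blast_lines, n_ref_total):
    """Per gene set: nFound, Found%, AlnF%, Iden, Algn, AlgnH.

    n_ref_total: total reference primary isoforms (e.g. 18601 for pea aphid).
    nref: reference proteins aligned by any gene set; Algn averages over these
    (a gene set's missing hit counts as 0), AlgnH over found proteins only.
    """
    best = best_hits(blast_lines)
    nref = len({ref_id for _, ref_id in best})
    per_set = defaultdict(list)
    for (geneset, _ref_id), (_bits, aln, ident, ref_len) in best.items():
        per_set[geneset].append((aln, ident, ref_len))

    table = {}
    for geneset, hits in sorted(per_set.items()):
        n_found = len(hits)
        total_aln = sum(aln for aln, _, _ in hits)
        table[geneset] = {
            "nFound": n_found,
            "Found%": 100.0 * n_found / n_ref_total,
            # percent of each reference protein covered, capped at 100 (assumption)
            "AlnF%": sum(min(100.0, 100.0 * a / r) for a, _, r in hits) / n_found,
            "Iden": sum(i for _, i, _ in hits) / n_found,  # over found (assumption)
            "Algn": total_aln / nref,
            "AlgnH": total_aln / n_found,
        }
    return table


if __name__ == "__main__":
    demo = [
        "ref1\tEVm3bt:g1\t90\t50\t100\t150.0",
        "ref1\tEVm3bt:g2\t60\t30\t100\t80.0",  # weaker hit to ref1, ignored
        "ref2\tEVm3bt:g3\t40\t20\t50\t70.0",
        "ref1\tTsawfb:t1\t50\t20\t100\t60.0",
    ]
    for geneset, row in tabulate(demo, n_ref_total=4).items():
        print(geneset, {k: round(v, 1) for k, v in row.items()})
```

In the demo, Tsawfb aligns only ref1, so its Algn (50 / 2 = 25.0) is half its AlgnH (50.0); with thousands of found proteins the gap shrinks to the small Algn/AlgnH differences seen in the tables above.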