Genes constructed from publicly available RNA-Seq data from NCBI SRA, 2016.01
collected by Don Gilbert, gilbertd At indiana.edu, EvidentialGene at euGenes.org
=================

evigene trasm + gene set builds of Anopheles species genes,
XSEDE:comet.sdsc.edu: /home/ux455375/scratchn/chrs/aabugs/bugs/anoph/

TEST case Ano. funestus, anofun2srr trasm, evg2anofun:
  a. same single RNA-seq set, common source for 2015 AfunF1.3 genoasm, 47 M pairs, 6 GB seq
     "SRX265161","'from individual 'Anopheles funestus FUMOZ'","Anopheles funestus","Illumina HiSeq 2000",
     "BI","SRP021067","RNA sequencing of 15 genomes of Anopheles","SRS408962","","6158.98","1","46819779","9457595358","Solexa-130247","RNA-Seq","TRANSCRIPTOMIC","cDNA"

  b. four rna-seq assemblers (as per daphnia.pulex, tribolium cast.)
     velvet/oases : v1.2.10 2013
     idba-tran    : v.1.1.1 2013
     soap-trans   : v.1.03 2013
     trinity      : trinityrnaseq_r20140717 (for anofun, v2.1.1 for anoalb)

Also evg1anoalb for Ano. albimanus AalbS1, 1/2 RNAseq SRA sets = anoalb1, anoalb2
Also evg1anofun for Ano. funestus AfunF1.3, newer RNA-seq ERA sets
  (but not as useful?, heteroz mixed pop source) = anofun1pola, anofun1norr

Anopheles funestus
  2015 AfunF1.3 genoasm, gene set (maker?), trasm
    https://www.vectorbase.org/organisms/anopheles-funestus
  AnoFun older 2010 trasm,
    http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0014202
  stats: 13,344 prot genes, 225.2 Mb chrasm, 1,392 nscaf; 672 N50, GCA_000349085.1

Anopheles albimanus
  * 2015 genoasm, gene set (maker?), trasm
    https://www.vectorbase.org/organisms/anopheles-albimanus
  AnoAlb trasm 2012
    http://www.biomedcentral.com/1471-2164/13/207 == SRX143412 reads + EST
  stats: 11,911 prot genes; 170.5 Mb chrasm, 204 nscaf, 18kb N50, GCA_000349125.1

RNA-seq and MAKER genes reported in doi: 10.1126/science.1258522, 2015
  Highly evolvable malaria vectors: the genomes of 16 Anopheles mosquitoes

#=========================================================

Disagreement of EvgM vs Genogmodels:
 -- ~25k prot loci from EvgM vs 13k prot loci from vecbase/maker gmodels,
    for both anofun and anoalb
 -- major subset of the difference is 1-exon loci: >10k for evg vs <2k for gmodels
 -- inspection of new evg loci suggests many are insect-prot homologs, not TE prots
 -- need aaeval of all to determine if that inspection is accurate, and whether these
    are paralogs of found gmodel loci vs unfound ortho genes
 -- EvgM of anofun includes a subset of (a) fragment genes, (b) missed genes,
    likely due to the smallish RNA-seq set (45 M read pairs).
    Eg. large genes like titin, dscam, .. appear as many fragment evgm loci.
 -- on the other hand, many EvgM genes point to errors of gmodels:
    gene joins and alt-exon joins, splits, etc.
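
The 1-exon comparison above could be checked with a small script along these lines. This is only a sketch, not the method used for these notes: it assumes both gene sets are available as GFF3 with "exon" features carrying a Parent= attribute, and it counts per transcript ID, so collapsing alternate transcripts to loci would still need doing. The function names are made up for illustration.

```python
from collections import Counter


def exon_counts(gff_lines):
    """Count exon features per parent transcript ID in GFF3 text lines."""
    counts = Counter()
    for line in gff_lines:
        # Skip blank lines and GFF comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        # Column 9 holds key=value attributes separated by ';'.
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        # An exon may list several parents, separated by ','.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                counts[parent] += 1
    return counts


def single_exon_ids(gff_lines):
    """Return sorted transcript IDs that have exactly one exon."""
    return sorted(t for t, n in exon_counts(gff_lines).items() if n == 1)
```

Running single_exon_ids() on each gene set and comparing the list lengths would give a rough per-transcript version of the >10k vs <2k figures.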
#=========================================================
rna assembler methods

Anopheles funestus, evg2anofunz4b
#------------------------------
assembler (longest 999 genes):
  Count         Unique       Method
    292,29.2%     203,20.3%u idba
    110,11.0%      41, 4.1%u soap
     95, 9.5%      48, 4.8%u trin
    699,70.0%     613,61.4%u velv

kmer (longest 999 genes):
     67, 6.7%      16, 1.6%u k05
    163,16.3%      79, 7.9%u k25
    401,40.1%     295,29.5%u k35
    274,27.4%     159,15.9%u k45
    233,23.3%     121,12.1%u k55
    173,17.3%      92, 9.2%u k65
     80, 8.0%      32, 3.2%u k75
     67, 6.7%      29, 2.9%u k85
     51, 5.1%      17, 1.7%u k95

assembler (longest 9999 genes):
  Count         Unique       Method
   4092,40.9%    2450,24.5%u idba
   2059,20.6%     682, 6.8%u soap
   1754,17.5%     561, 5.6%u trin
   6122,61.2%    4505,45.1%u velv

kmer (longest 9999 genes):
   1495,15.0%     263, 2.6%u k05
   2785,27.9%    1156,11.6%u k25
   4047,40.5%    2077,20.8%u k35
   3053,30.5%    1112,11.1%u k45
   2831,28.3%     983, 9.8%u k55
   2173,21.7%     680, 6.8%u k65
   1520,15.2%     399, 4.0%u k75
   1117,11.2%     378, 3.8%u k85
    719, 7.2%     213, 2.1%u k95

#------------------------------
Accurate ortholog genes
assembler (BUSCOdmel 2648 genes):
  Count         Unique       Method
   1269,47.9%     700,26.4%u idba
    686,25.9%     178, 6.7%u soap
    515,19.4%      90, 3.4%u trin
   1655,62.5%    1054,39.8%u velv

kmer (BUSCOdmel 2648 genes):
    494,18.7%     107, 4.0%u k05
    822,31.0%     261, 9.9%u k25
   1089,41.1%     465,17.6%u k35
    925,34.9%     293,11.1%u k45
    883,33.3%     245, 9.3%u k55
    731,27.6%     165, 6.2%u k65
    540,20.4%     103, 3.9%u k75
    411,15.5%     119, 4.5%u k85
    240, 9.1%      68, 2.6%u k95

#------------------------------
Anopheles albimanus, evg4anoalb
#-----------------------------
assembler (longest 9999 genes):
  Count         Unique       Method
   4622,46.2%    1464,14.6%u idba
   2900,29.0%     352, 3.5%u soap
   2408,24.1%     305, 3.1%u trin
   7636,76.4%    4492,44.9%u velv

kmer (longest 9999 genes):
   2219,22.2%      80, 0.8%u k05
   3903,39.0%     811, 8.1%u k25
   5130,51.3%    1255,12.6%u k35
   4897,49.0%     771, 7.7%u k45
   4553,45.5%     492, 4.9%u k55
   4764,47.6%     779, 7.8%u k65
   4341,43.4%     544, 5.4%u k75
   3920,39.2%     375, 3.8%u k85
   3460,34.6%     168, 1.7%u k95

assembler (longest 999 genes):
  Count         Unique       Method
    477,47.7%     168,16.8%u idba
    290,29.0%      49, 4.9%u soap
    136,13.6%      20, 2.0%u trin
    741,74.2%     441,44.1%u velv

kmer (longest 999 genes):
    160,16.0%       2, 0.2%u k05
    301,30.1%      55, 5.5%u k25
    519,52.0%     149,14.9%u k35
    511,51.2%      90, 9.0%u k45
    496,49.6%      78, 7.8%u k55
    458,45.8%      67, 6.7%u k65
    393,39.3%      29, 2.9%u k75
    359,35.9%      27, 2.7%u k85
    314,31.4%      10, 1.0%u k95

Accurate ortholog genes:
assembler (BUSCOdmel 2561 genes):
  Count         Unique       Method
   1082,42.2%     309,12.1%u idba
    692,27.0%      75, 2.9%u soap
    569,22.2%      50, 2.0%u trin
   2089,81.6%    1285,50.2%u velv

kmer (BUSCOdmel 2561 genes):
    458,17.9%      28, 1.1%u k05
    957,37.4%     174, 6.8%u k25
   1177,46.0%     251, 9.8%u k35
   1169,45.6%     200, 7.8%u k45
   1085,42.4%     133, 5.2%u k55
   1203,47.0%     266,10.4%u k65
   1070,41.8%     192, 7.5%u k75
    950,37.1%     136, 5.3%u k85
    787,30.7%      70, 2.7%u k95

assembler (drosmel 6725 genes):
  Count         Unique       Method
   2894,43.0%     868,12.9%u idba
   1829,27.2%     207, 3.1%u soap
   1568,23.3%     205, 3.0%u trin
   5336,79.3%    3271,48.6%u velv

kmer (drosmel 6725 genes):
   1266,18.8%      64, 1.0%u k05
   2515,37.4%     532, 7.9%u k25
   3164,47.0%     772,11.5%u k35
   3018,44.9%     516, 7.7%u k45
   2808,41.8%     357, 5.3%u k55
   3037,45.2%     644, 9.6%u k65
   2717,40.4%     458, 6.8%u k75
   2470,36.7%     362, 5.4%u k85
   2095,31.2%     185, 2.8%u k95

#..............................
#=========================================================
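
A note on reading the Count/Unique columns, with a sketch of how such a tally could be made. The tables do not define the columns, so this is an assumption: Count is taken as the number of genes for which a method (assembler or kmer) gave a best model, with ties credited to every tied method, and Unique as the number of genes where only that method did. That reading is at least consistent with the Count percentages summing to more than 100%. The input layout (gene ID mapped to a set of method names) and the function names are hypothetical, not EvidentialGene output.

```python
from collections import Counter


def tally_methods(best_methods_by_gene):
    """Tally (method, count, count%, unique, unique%) over all genes.

    best_methods_by_gene: dict of gene ID -> set of method names that
    produced a best model for that gene (assumed input layout).
    """
    total = len(best_methods_by_gene)
    if total == 0:
        return []
    count, unique = Counter(), Counter()
    for methods in best_methods_by_gene.values():
        for m in methods:
            count[m] += 1  # ties are credited to every tied method
        if len(methods) == 1:
            unique[next(iter(methods))] += 1  # only this method found it
    return [
        (m, count[m], 100.0 * count[m] / total,
         unique[m], 100.0 * unique[m] / total)
        for m in sorted(count)
    ]


def format_rows(rows):
    """Format tally rows in the 'Count  Unique  Method' layout of these notes."""
    return [
        f"{c:5d},{cp:4.1f}%  {u:5d},{up:4.1f}%u {m}"
        for m, c, cp, u, up in rows
    ]
```

Under this reading, Count minus Unique for a method is the number of genes it shares with at least one other method, e.g. 699 - 613 = 86 shared genes for velv in the anofun longest-999 set.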