
Aedes and Anopheles mosquito EvidentialGene sets

EvidentialGene is a genome informatics project and pipeline for gene set construction that achieves measurably higher accuracy and completeness than other gene informatics methods used for animals and plants. See http://eugenes.org/EvidentialGene/ or https://sourceforge.net/projects/evidentialgene/

Gene orthology accuracy and completeness are summarized here for EvidentialGene gene sets of 2 species of the malaria vector mosquito Anopheles and of the yellow fever/Zika virus vector mosquito Aedes aegypti. Both are measured by protein homology to reference species genes, and are compared with recently published gene sets built with currently popular gene prediction and assembly methods (MAKER, Trinity and related methods).

For each species, the EvidentialGene method took the published RNA-seq, assembled it with 4 gene assemblers, then reduced the assemblies to a concise and accurate gene set of loci and alternates. In these 3 tests, Evigene produced the more accurate gene sets, with minimal time and effort. The RNA data sets used were smaller than recommended for complete gene set reconstruction, so additional effort and data will improve these genes.

The software pairing of MAKER and Trinity is now a common recipe for genome biologists. I suspect those scientists don't realize that greater accuracy is possible and easier to obtain.

Aedes aegypti yellow fever vector mosquito

Aedes_aegypti x REFERENCE Highly conserved (BUSCO drosmel,  nr=3055)
       Evigene  PubTrinVb3 Vecbase3  
found  99.5%     98.6%      98.3%   
align  91.3%     86.5%      85.1%   
best   42.3%      5.2%       3.0%     equal 52% 
                  
Aedes_aegypti x REFERENCE Drosophila mel. model (nr=11146)
       Evigene  PubTrinVb3 Vecbase3  
found  99.0%     97.5%      97.1%   
align  86.4%     82.4%      81.1%   
best   44.0%      9.2%       6.1%    equal 47%                 

Aedes_aegypti x REFERENCE  Anopheles gambiae/AGAP
       Evigene  PubTrinVb3 Vecbase3  
found  99.1%     97.3%      96.6%   
align  94.3%     89.7%      87.2%   
best   44.3%     10.4%       8.5%     equal 45%

PubTrinVb3 = https://doi.org/10.1186/s12864-015-2239-0 ; Matthews et al. BMC Genomics (2016) 17:32, The neurotranscriptome of the Aedes_aegypti mosquito
Vecbase3 = Aedes-aegypti-Liverpool_PEPTIDES_AaegL3.3 gene set of vectorbase.org
The Aedes PubTrinVb3 reference used the Trinity de novo RNA assembler, the Cufflinks genome-based RNA assembler, and the PASA EST-gene construction pipeline. The RNA-seq of that paper is the source for EVm gene construction. Evigene version evg12aedes (2016.04.08) improves a subset of ortholog genes with the evg2aedes assembly, which uses data of SRP047470 and SRP046160 (doi:10.1126/science.aaa2850).

Anopheles species malaria vector mosquito

    Highly conserved REFERENCE (BUSCO drosmel,  nr=3041)
         Anopheles_funestus          Anopheles_albimanus
       Evigene  MAKER  Trinity    Evigene  MAKER  Trinity
found  99.8%    98.9%   98.7%     98.6%    98.7%  97.4%
align  89.0%    85.1%   83.7%     87.2%    84.9%  82.4%
best   33.4%     6.9%    3.1%     39.7%    11.3%   4.6%
 equal      60%                         49%

    Drosophila mel. model REFERENCE (nr=11043)
         Anopheles_funestus          Anopheles_albimanus
       Evigene  MAKER  Trinity    Evigene  MAKER   Trinity
found  98.8%    97.8%   97.2%     96.5%    97.8%   95.6% 
align  83.9%    80.5%   79.3%     81.3%    80.6%   78.5%
best   38.6%    10.8%    4.0%     40.3%    17.4%    5.7%
  equal     50%                         42%

    Anopheles gambiae REFERENCE (tr total=14870, locus total=12994)
         Anopheles_funestus          Anopheles_albimanus
       Evigene  MAKER  Trinity    Evigene  MAKER  Trinity
found  98.9%    98.6%  97.7%      96.4%    98.2%  96.0%
align  96.9%    93.1%  90.2%      91.0%    91.2%  85.0%
best   39.9%    12.4%   3.7%      41.5%    19.5%   6.4% 
 equal      48%                        39% 

Anopheles ref = source of the MAKER gene sets. That work used Trinity, but its assembly is not public, so I redid the Trinity assembly.
https://doi.org/10.1126/science.1258522 ; Highly evolvable malaria vectors: the genomes of 16 Anopheles mosquitoes

Statistics:

found = % reference proteins with significant alignment to test gene sets
align = % alignment of target protein sets to reference proteins
best = % pairwise count of best alignment of two target gene sets to reference
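
As a rough illustration of how statistics like these could be computed, here is a minimal Python sketch. It is not the Evigene code. The input layout (a map from reference protein id to the fraction of that protein aligned by a gene set), the function names, and the toy values are all hypothetical. Two further assumptions: `align` is averaged over found reference proteins only, and references where the two gene sets align equally are counted as ties.

```python
from typing import Dict, List

# Hypothetical input layout: reference protein id -> fraction (0..1) of that
# reference protein aligned by the best-matching gene of one target gene set.
# A missing id or 0.0 means no significant alignment.
Coverage = Dict[str, float]


def found_pct(cov: Coverage, ref_ids: List[str]) -> float:
    """Percent of reference proteins with a significant alignment."""
    n_found = sum(1 for r in ref_ids if cov.get(r, 0.0) > 0)
    return 100.0 * n_found / len(ref_ids)


def align_pct(cov: Coverage, ref_ids: List[str]) -> float:
    """Mean percent alignment over found reference proteins (an assumption)."""
    vals = [cov[r] for r in ref_ids if cov.get(r, 0.0) > 0]
    return 100.0 * sum(vals) / len(vals) if vals else 0.0


def pairwise_best(a: Coverage, b: Coverage, ref_ids: List[str]) -> Dict[str, float]:
    """Percent of reference proteins best aligned by set a, by set b, or tied."""
    wins_a = wins_b = ties = 0
    for r in ref_ids:
        ca, cb = a.get(r, 0.0), b.get(r, 0.0)
        if ca > cb:
            wins_a += 1
        elif cb > ca:
            wins_b += 1
        else:
            ties += 1
    n = len(ref_ids)
    return {
        "best_a": 100.0 * wins_a / n,
        "best_b": 100.0 * wins_b / n,
        "equal": 100.0 * ties / n,
    }


# Toy data, invented for illustration only.
REF = ["r1", "r2", "r3", "r4"]
set_a = {"r1": 1.0, "r2": 0.8, "r3": 0.5}
set_b = {"r1": 1.0, "r2": 0.6, "r4": 0.9}

result = pairwise_best(set_a, set_b, REF)
# result == {'best_a': 50.0, 'best_b': 25.0, 'equal': 25.0}
```

With the toy data, each gene set finds 3 of the 4 reference proteins, set_a aligns better on two of them, set_b on one, and one is a tie.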

Evigene ref: Gilbert, Donald (2013) Gene-omes built from mRNA seq not genome DNA. 7th annual arthropod genomics symposium. Notre Dame.
http://eugenes.org/EvidentialGene/about/EvigeneRNA2013poster.pdf and http://globalhealth.nd.edu/7th-annual-arthropod-genomics-symposium/


Developed at the Genome Informatics Lab of Indiana University Biology Department