Aedes and Anopheles mosquito EvidentialGene sets

EvidentialGene is a genome informatics project and pipeline for gene set construction that has measurably higher accuracy and completeness than other gene informatics methods used for animals and plants.

Gene orthology accuracy and completeness, measured by protein homology to reference species genes, are summarized here for gene sets of 2 species of the malaria vector mosquito Anopheles and of the yellow fever/Zika virus vector mosquito Aedes aegypti. EvidentialGene is compared with recently published gene sets built with now-popular gene prediction and assembly methods (MAKER, Trinity and related methods).

For each species, the EvidentialGene method used the published RNA-seq data, assembled it with 4 gene assemblers, then reduced the result to a concise and accurate locus/alternate gene set. In these 3 tests, Evigene produced the more accurate gene sets, with minimal time and effort. The RNA data sets used were smaller than recommended for complete gene set reconstruction, and additional effort and data will improve these genes.

The software pipeline pair of MAKER and Trinity is now a common recipe for genome biologists. I suspect those scientists don't realize that greater accuracy is possible and easier to obtain.

Aedes aegypti yellow fever vector mosquito

Aedes_aegypti x REFERENCE Highly conserved (BUSCO drosmel,  nr=3055)
       Evigene  PubTrinVb3 Vecbase3  
found  99.5%     98.6%      98.3%   
align  91.3%     86.5%      85.1%   
best   42.3%      5.2%       3.0%     equal 52% 
Aedes_aegypti x REFERENCE Drosophila mel. model (nr=11146)
       Evigene  PubTrinVb3 Vecbase3  
found  99.0%     97.5%      97.1%   
align  86.4%     82.4%      81.1%   
best   44.0%      9.2%       6.1%    equal 47%                 

Aedes_aegypti x REFERENCE  Anopheles gambiae/AGAP
       Evigene  PubTrinVb3 Vecbase3  
found  99.1%     97.3%      96.6%   
align  94.3%     89.7%      87.2%   
best   44.3%     10.4%       8.5%     equal 45%

PubTrinVb3 = Matthews et al. BMC Genomics (2016) 17:32, The neurotranscriptome of the Aedes aegypti mosquito
Vecbase3 = Aedes-aegypti-Liverpool_PEPTIDES_AaegL3.3 gene set of
The Aedes PubTrinVb3 reference used the Trinity de novo RNA assembler, the Cufflinks genome-based RNA assembler, and the PASA EST-gene construction pipeline. The RNA-seq of this paper is the source for EVm gene construction. Evigene version evg12aedes (2016.04.08) improves a subset of ortholog genes with the evg2aedes assembly, using data of SRP047470 and SRP046160 (doi:10.1126/science.aaa2850).

Anopheles species malaria vector mosquito

    Highly conserved REFERENCE (BUSCO drosmel,  nr=3041)
         Anopheles_funestus          Anopheles_albimanus
       Evigene  MAKER  Trinity    Evigene  MAKER  Trinity
found  99.8%    98.9%   98.7%     98.6%    98.7%  97.4%
align  89.0%    85.1%   83.7%     87.2%    84.9%  82.4%
best   33.4%     6.9%    3.1%     39.7%    11.3%   4.6%
 equal      60%                         49%

    Drosophila mel. model REFERENCE (nr=11043)
         Anopheles_funestus          Anopheles_albimanus
       Evigene  MAKER  Trinity    Evigene  MAKER   Trinity
found  98.8%    97.8%   97.2%     96.5%    97.8%   95.6% 
align  83.9%    80.5%   79.3%     81.3%    80.6%   78.5%
best   38.6%    10.8%    4.0%     40.3%    17.4%    5.7%
  equal     50%                         42%

    Anopheles gambiae REFERENCE (tr total=14870, locus total=12994)
         Anopheles_funestus          Anopheles_albimanus
       Evigene  MAKER  Trinity    Evigene  MAKER  Trinity
found  98.9%    98.6%  97.7%      96.4%    98.2%  96.0%
align  96.9%    93.1%  90.2%      91.0%    91.2%  85.0%
best   39.9%    12.4%   3.7%      41.5%    19.5%   6.4% 
 equal      48%                        39% 

Anopheles ref = MAKER gene source: Highly evolvable malaria vectors: the genomes of 16 Anopheles mosquitoes. That study used Trinity, but its assembly is not public, so I redid the Trinity assembly.


found = % reference proteins with significant alignment to test gene sets
align = % alignment of target protein sets to reference proteins
best = % pairwise count of best alignment of two target gene sets to reference
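
The three measures defined above can be sketched in code. This is a minimal illustration, not the scripts used to produce the tables. It assumes a hypothetical input shape: for each gene set, a mapping from reference protein ID to the aligned length (in residues) of its best significant hit, with the significance filtering (for example, a BLAST e-value cutoff) already applied upstream. The function names, the choice to compute "align" over found references only, and the "equal" tie category are all assumptions made for this example.

```python
from typing import Dict, Tuple

# Hypothetical input shape (an assumption for this sketch):
# reference protein id -> aligned residues of the best significant hit.
Hits = Dict[str, int]


def found_pct(ref_lengths: Dict[str, int], hits: Hits) -> float:
    """Percent of reference proteins that have a hit in the test gene set."""
    if not ref_lengths:
        return 0.0
    n_found = sum(1 for ref in ref_lengths if hits.get(ref, 0) > 0)
    return 100.0 * n_found / len(ref_lengths)


def align_pct(ref_lengths: Dict[str, int], hits: Hits) -> float:
    """Percent of reference residues aligned, over found references only.

    Restricting to found references is an assumption of this sketch.
    """
    found = [ref for ref in ref_lengths if hits.get(ref, 0) > 0]
    if not found:
        return 0.0
    # Cap at the reference length so over-long hits cannot exceed 100%.
    aligned = sum(min(hits[ref], ref_lengths[ref]) for ref in found)
    total = sum(ref_lengths[ref] for ref in found)
    return 100.0 * aligned / total


def best_pairwise(
    ref_lengths: Dict[str, int], hits_a: Hits, hits_b: Hits
) -> Tuple[float, float, float]:
    """Pairwise comparison of two gene sets over all reference proteins.

    Returns (% where A aligns better, % where B aligns better, % equal).
    """
    if not ref_lengths:
        return (0.0, 0.0, 0.0)
    a_best = b_best = equal = 0
    for ref in ref_lengths:
        a, b = hits_a.get(ref, 0), hits_b.get(ref, 0)
        if a > b:
            a_best += 1
        elif b > a:
            b_best += 1
        else:
            equal += 1
    n = len(ref_lengths)
    return (100.0 * a_best / n, 100.0 * b_best / n, 100.0 * equal / n)


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the tables above.
    ref = {"r1": 100, "r2": 200, "r3": 300, "r4": 400}
    set_a = {"r1": 100, "r2": 150, "r3": 300}
    set_b = {"r1": 90, "r2": 150, "r4": 200}
    print("found A:", found_pct(ref, set_a))
    print("align A:", round(align_pct(ref, set_a), 1))
    print("best A, best B, equal:", best_pairwise(ref, set_a, set_b))
```

Comparing aligned length is only one possible way to decide which gene set is "best" for a reference protein. A bit score or identity-weighted score could be substituted without changing the structure of the comparison.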

Evigene ref: Gilbert, Donald (2013) Gene-omes built from mRNA seq not genome DNA. 7th Annual Arthropod Genomics Symposium, Notre Dame.

Developed at the Genome Informatics Lab of Indiana University Biology Department