EvidentialGene set for Aedes_aegypti
Version evg12aedes includes all of evg1aedes gene assembly (data of SRP037535)
plus a subset of improved ortholog genes from evg2aedes assembly (data of SRP047470,SRP046160)

Orthology assessment is summarized in aaeval_genesets_summary.txt,
with tables per gene locus in aaeval/ comparing this gene set with two
other recent Aedes aegypti gene sets.

Aedes_aegypti x REFERENCE Highly conserved BUSCO/drosmel1g_busco (n=3055)
       Evigene  PubTrinVb3 Vecbase3  
found  99.5%     98.6%      98.3%   
align  91.3%     86.5%      85.1%   
best   42.3%      5.2%       3.0%     equal 52%

Aedes_aegypti x REFERENCE Drosophila mel. model/drosmel1g (n=11146)
       Evigene  PubTrinVb3 Vecbase3  
found  99.0%     97.5%      97.1%   
align  86.4%     82.4%      81.1%   
best   44.0%      9.2%       6.1%    equal 47%                 

Aedes_aegypti x REFERENCE Anopheles gambiae/AGAP (n=14014)
       Evigene  PubTrinVb3 Vecbase3  
found  99.1%     97.3%      96.6%   
align  94.3%     89.7%      87.2%   
best   44.3%     10.4%       8.5%     equal 45%                 
Counting best + equal, the Evigene gene set has roughly 90% or more best genes
versus roughly 55% for the others
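
The summary line above can be checked against the tables with a little arithmetic.
This is only a sketch, and it assumes that the "equal" percentage is shared by all
three gene sets and is added to each set's "best" percentage; that reading is not
stated in the tables. The names TABLES and best_or_equal are made up for this example.

```python
# Percentages copied from the three summary tables above.
# Each entry: reference -> (per-set "best" %, shared "equal" %).
# ASSUMPTION: "equal" applies to all three gene sets alike.
TABLES = {
    "drosmel1g_busco": ({"Evigene": 42.3, "PubTrinVb3": 5.2, "Vecbase3": 3.0}, 52.0),
    "drosmel1g": ({"Evigene": 44.0, "PubTrinVb3": 9.2, "Vecbase3": 6.1}, 47.0),
    "AGAP": ({"Evigene": 44.3, "PubTrinVb3": 10.4, "Vecbase3": 8.5}, 45.0),
}


def best_or_equal(tables):
    """Return {reference: {gene_set: best% + equal%}}, rounded to 0.1."""
    return {
        ref: {name: round(best + equal, 1) for name, best in best_by_set.items()}
        for ref, (best_by_set, equal) in tables.items()
    }


if __name__ == "__main__":
    for ref, totals in best_or_equal(TABLES).items():
        print(ref, totals)
```

Under that assumption the Evigene totals fall between about 89% and 94%, and the
other two sets between about 53% and 57%.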

Aedes PubTrinVb3 ref = RNA-seq data source for EVm gene construction
  doi:10.1186/s12864-015-2239-0; Matthews et al. BMC Genomics (2016) 17:32, The neurotranscriptome of Aedes_aegypti 
Aedes_aegypti RNA-seq SRA accessions used for Evigene: 
 evg1aedes = SRP037535 (male+fem, 10 of 68 SRX read sets) from PubTrinVb3
 evg2aedes = SRP047470 (male+fem, 4 of 6 SRX) and SRP046160 (embryo) from
 doi:10.1126/science.aaa2850, Hall A.B. et al. 2015. A male-determining factor in the mosquito Aedes aegypti. 
evg2aedes gene set is less complete than evg1, using less effort/data,
but contains ~3000 better gene loci (some replace evg1 loci, some are not found in evg1)

The evg12aedes gene set has a large number of loci, putatively due to
expressed transposon genes. It is segregated into two portions:
a. good:   loci with insect gene homology, and/or multiple introns when mapped on chromosome assembly.
b. noho1x: loci without obvious insect species gene homology, and lacking introns.
The noho1x set likely includes many expressed transposons, gene fragments, 
and other uninteresting things. It likely includes some interesting,
species-specific, recent genes also.
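
The good/noho1x split described above can be written as a simple rule. This is a
sketch of the stated criteria only, not the procedure actually used to build the
set: the function name, its arguments, and the "unassigned" label are hypothetical.
It assumes "multiple introns" means two or more and "lacking introns" means zero;
the text does not say where a locus with no homology and exactly one intron goes,
so that case is left unassigned here.

```python
def classify_locus(has_insect_homology: bool, intron_count: int) -> str:
    """Assign a locus to 'good' or 'noho1x' following the criteria in the text.

    ASSUMPTIONS (not from the source): 'multiple introns' is taken as >= 2,
    'lacking introns' as 0, and anything the text does not cover is 'unassigned'.
    """
    if intron_count < 0:
        raise ValueError("intron_count must be non-negative")
    # a. good: insect gene homology, and/or multiple introns on the assembly
    if has_insect_homology or intron_count >= 2:
        return "good"
    # b. noho1x: no obvious insect homology, and no introns
    if intron_count == 0:
        return "noho1x"
    # No homology and exactly one intron: not covered by the description.
    return "unassigned"


if __name__ == "__main__":
    print(classify_locus(True, 0))    # homology alone is enough for 'good'
    print(classify_locus(False, 3))   # multiple introns alone is enough
    print(classify_locus(False, 0))   # neither: 'noho1x'
```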

Transposon screening has been done, using RepBase and repeatmasker methods,
but the results are ambiguous, as some conserved insect single copy orthologs
(e.g. of Drosophila) are also marked as primarily transposons.
The transposon data set for Aedes_aegypti is large, and consists of possibly old
computational predictions that may contain false positives.

I checked 3 RNA-seq projects of Aedes_aegypti, from two populations
(London + other); all have a high count of loci with putative expressed
transposons matching RepBase.
I leave it to consumers, and those with expertise, to better decide which 
genes of this gene set reconstruction are of interest.

- Don Gilbert, 8 April 2016

