EvidentialGenes for honey bee Apis mellifera

EvidentialGene gene set evg3hbee for Apis mellifera is more complete than two other recent honey bee gene sets, measured by orthology completeness.
See EvidentialGene/arthropods/Arthropod_Orthology_Completeness/, Figures 4a, 5a, and 6a.

      Name                    Last modified       Size

[DIR] Parent Directory        21-May-2017 19:09     -
[DIR] othergenes/             27-Dec-2014 23:38     -
[TXT]                         10-Jun-2014 15:25    2k
[IMG] honeybee_3genesets.png  10-Jun-2014 15:12  139k
[   ] honeybee_3genesets.pdf  10-Jun-2014 15:11    6k
[DIR] hbee_rnaseq/            30-Jul-2014 13:32     -
[DIR] evg3hbee/               30-Jun-2015 22:42     -

mRNA Transcript assemblies for Honey Bee
organism=Apis mellifera
evg3hbee, version 3 (2014 June)
Don Gilbert, gilbertd near indiana edu

EvidentialGene: Gene-omes from mRNA-seq assembly overtake genome gene-predictions.
An existing dogma in genome projects, that quality of a gene set is dependent on the 
quality of the genome assembly, is no longer accurate.  mRNA-seq assembly now does as well 
as or better than genome-based gene modelling.

This is a 'reference-free' gene set assembled from mRNA-seq, without reference to
a genome assembly and without training on or mapping from other species' genes. As such
it has different values than genome-based gene sets. An important one is that no external
artifacts or errors contribute to these genes: any protein orthology measured has not been
influenced by gene modelling that uses other species (with their artifacts), nor by genome
assembly errors.

EvidentialGene Honey Bee gene construction set
  includes  protein, cds and mRNA fasta sequence files,
  annotation table with homology Name (from blastp),
  view.gff location table, mapped to apis amel45 genome assembly.

Transcript assemblies and input mRNA seq
  The mRNA-seq used is all from public data sets at NCBI SRA for Apis mellifera;
  see hobee_study.list and sra_result.cvs for the SRA accessions.
  A brief summary of the 6+ million de novo transcript assemblies made from this RNA
  is also there, with the primary statistic for selection and effort in the 'aastat' tables
  of average protein sizes found in each transcript assembly run.  I find this the best way
  to proceed, to learn early on whether one has enough mRNA assemblies for a full
  animal/plant gene set: size and count proteins, not transcripts.  Measure the size of the
  1000 longest proteins, which has a biological maximum and is strongly correlated with
  orthology.
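
  The "1000 longest proteins" measure described above can be sketched in a few lines.
  This is only an illustration, not the EvidentialGene code that produces the 'aastat'
  tables: the function names and the returned fields are hypothetical, and the input is
  assumed to be an ordinary protein FASTA file such as the protein file in this gene set.

```python
from statistics import mean


def read_fasta_lengths(lines):
    """Yield (record_id, protein_length) for each record in FASTA-formatted lines."""
    seq_id, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seq_id is not None:
                yield seq_id, length
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            length = 0
        elif seq_id is not None:
            # Do not count a trailing '*' stop-codon symbol as a residue.
            length += len(line.rstrip("*"))
    if seq_id is not None:
        yield seq_id, length


def longest_protein_stats(lines, top_n=1000):
    """Summarize protein sizes: the mean of the top_n longest, and of all proteins."""
    lengths = sorted((n for _, n in read_fasta_lengths(lines)), reverse=True)
    top = lengths[:top_n]
    return {
        "n_proteins": len(lengths),
        "n_top": len(top),
        "mean_top": mean(top) if top else 0.0,
        "mean_all": mean(lengths) if lengths else 0.0,
    }


if __name__ == "__main__":
    # Tiny in-memory example; a real run would pass an open protein FASTA file instead.
    demo = [">a", "MKV", "LLT*", ">b some description", "MA", ">c", "MKKKKKKK"]
    print(longest_protein_stats(demo, top_n=2))
```

  Comparing "mean_top" across assembly runs gives a rough, early indication of whether
  adding more assemblies is still yielding longer proteins.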

Developed at the Genome Informatics Lab of Indiana University Biology Department