EvidentialGenes for honey bee Apis mellifera

EvidentialGene gene set evg3hbee for Apis mellifera is more complete than two other recent honey bee gene sets, as measured by orthology completeness. See EvidentialGene/arthropods/Arthropod_Orthology_Completeness/, Figures 4a, 5a, and 6a.
mRNA Transcript assemblies for Honey Bee

organism=Apis mellifera
http://arthropods.eugenes.org/EvidentialGene/arthropods/honeybee/
evg3hbee, version 3 (2014 June)
Don Gilbert, gilbertd near indiana edu

EvidentialGene: Gene-omes from mRNA-seq assembly overtake genome gene-predictions.

An existing dogma in genome projects, that the quality of a gene set depends on the quality of the genome assembly, is no longer accurate. mRNA-seq assembly now does as well as or better than genome-based gene modelling.

This is a 'reference-free' gene set assembled from mRNA-seq, without reference to a genome assembly and without training on or mapping from other species' genes. As such it has different values than genome-based gene sets. An important one is that no external artifacts or errors contribute to these genes: any protein orthology measured has not been influenced by gene modelling using other species (with their artifacts), nor by genome assembly errors.

EvidentialGene Honey Bee gene construction set
honeybee/evg3hbee/publicset/
Includes protein, CDS and mRNA FASTA sequence files, an annotation table with homology Name (from blastp), and a view.gff location table mapped to the Apis amel45 genome assembly.

Transcript assemblies and input mRNA-seq
honeybee/hbee_rnaseq/
The mRNA-seq used is all from public data sets found at NCBI SRA for Apis mellifera; see hobee_study.list and sra_result.cvs for SRA accessions.

evg3hbee_traastat.info, evg3hbee_trset.info
A brief summary of the 6+ million de novo transcript assemblies made from these RNA is also there, with the primary statistic for selection and effort in the 'aastat' tables of average protein sizes found in each tr-assembly run. I find this the best way to proceed, to learn early on whether one has enough mRNA assemblies for a full animal/plant gene set. Size and count proteins, not transcripts. Measure the 1000 longest proteins, a statistic that has a biological maximum and is strongly correlated with orthology.
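The protein-size statistic described above can be illustrated with a short Python sketch. This is not the EvidentialGene 'aastat' code; it is a minimal approximation that assumes the input is a plain protein FASTA file, and the function names `protein_lengths` and `aa_summary` are hypothetical names chosen for this example.

```python
from statistics import mean


def protein_lengths(fasta_lines):
    """Return {sequence_id: residue count} from FASTA-formatted lines."""
    lengths = {}
    current = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The ID is the first word of the header; fall back to a
            # generated ID if the header is empty.
            parts = line[1:].split()
            current = parts[0] if parts else f"seq{len(lengths) + 1}"
            lengths[current] = 0
        elif current is not None:
            # Stop-codon markers ('*') are not counted as residues.
            lengths[current] += len(line.replace("*", ""))
    return lengths


def aa_summary(lengths, top_n=1000):
    """Count proteins and average their sizes, overall and for the top_n longest."""
    sizes = sorted(lengths.values(), reverse=True)
    if not sizes:
        return {"count": 0, "mean_all": 0.0, "mean_top": 0.0}
    return {
        "count": len(sizes),
        "mean_all": mean(sizes),
        "mean_top": mean(sizes[:top_n]),
    }


if __name__ == "__main__":
    # Tiny in-memory example; a real run would pass an open file handle
    # for one assembly's protein FASTA instead of this list.
    demo = [">a example", "MKT", "LLV*", ">b", "MK", ">c", "MKTAYIAK"]
    print(aa_summary(protein_lengths(demo), top_n=2))
```

Running this once per assembly and comparing `mean_top` across runs gives a rough view of whether additional assemblies are still lengthening the longest proteins or whether the statistic has levelled off.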