Index of /genes2/pea_aphid2/docs
Name Last modified Description
aphid-bestpicker.html 19-Apr-2011 23:59
aphid-bestpicker.png 18-Apr-2011 19:53
aphid-genes-brief.txt 24-Jan-2011 12:53
aphid-genes-details.txt 20-Apr-2011 16:27
aphid-genes-summary.txt 05-Oct-2010 13:02
aphid2-genes2_201104.sum1.txt 13-Apr-2011 14:25
aphid2-genes2_201104v7.news 19-Apr-2011 23:19
aphid2-reconcile-best.notes 09-Jun-2011 18:18
aphid2_bestgene-compute.txt 06-Jul-2011 14:56
aphid2asm.repeatmasker_tbl.txt 04-Oct-2010 11:34
aphid2evigene.news 09-Jun-2011 11:26
Draft pea aphid gene predictions for assembly v2.0
by Don Gilbert, gilbertd at indiana edu
http://arthropods.eugenes.org/genes2/ see pea_aphid2
Annotation summary
Transcripts from Acyrthosiphon pisum (210k EST and 56,522k
Illumina single reads) are mapped to the A. pisum genome with
GMAP and GSNAP. RNA-seq assemblies are constructed from the aligned
short reads with Cufflinks, and then assembled with the EST reads
using PASA.
Arthropod proteins from Daphnia pulex, Drosophila melanogaster,
Apis, Pediculus and Tribolium are aligned with BLASTX to the
repeat-masked genome, then refined with Exonerate into protein
gene models.
Gene models are predicted with the evidence-directed AUGUSTUS
predictor. AUGUSTUS is trained for pea aphid gene parameters
using full cDNA genes derived from the PASA EST assembly.
Several prediction sets, built from different evidence sets and
parameters, are combined by selecting the highest-evidence model
at each locus. The resulting consensus gene set (mix3 in the
tables below) has the best match to EST and protein evidence.
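
The per-locus selection described above can be sketched as follows.
This is a minimal illustration, not the pipeline's actual code: the
Model record, its evidence_score field, and the rule that overlapping
models on one scaffold form a locus are all assumptions, and strand
is ignored for brevity.

```python
from dataclasses import dataclass


@dataclass
class Model:
    """Hypothetical gene model record; field names are illustrative."""
    id: str
    scaffold: str
    start: int
    end: int
    evidence_score: float  # assumed combined EST/protein support


def pick_best_per_locus(models):
    """Group overlapping models into loci and keep the top-scoring one."""
    best = []
    locus = []                      # models in the locus being built
    locus_scaffold, locus_end = None, -1
    for m in sorted(models, key=lambda x: (x.scaffold, x.start, x.end)):
        # A new scaffold, or a start past the current locus end,
        # closes the current locus.
        if locus and (m.scaffold != locus_scaffold or m.start > locus_end):
            best.append(max(locus, key=lambda x: x.evidence_score))
            locus = []
        if not locus:
            locus_scaffold, locus_end = m.scaffold, m.end
        else:
            locus_end = max(locus_end, m.end)
        locus.append(m)
    if locus:
        best.append(max(locus, key=lambda x: x.evidence_score))
    return best


if __name__ == "__main__":
    candidates = [
        Model("epi4_g1", "scaf1", 100, 500, 0.7),
        Model("epir3_g1", "scaf1", 300, 800, 0.9),
        Model("epi4_g2", "scaf1", 1000, 1500, 0.4),
    ]
    for model in pick_best_per_locus(candidates):
        print(model.id, model.evidence_score)
```

A real combiner would likely score EST and protein support separately
and treat the two strands as separate loci.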
Gene Models
===============
32967 mRNA genes with evidence (protein homology or EST)
  21326 have protein homology (e-value <= 1e-5)
     8810 have only protein evidence
  24064 have EST or RNA-seq evidence
    11548 have only EST/RNA-seq evidence
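
The counts above imply how many genes carry both kinds of evidence.
A small arithmetic check, using only the numbers listed (variable
names are illustrative):

```python
# Counts copied from the Gene Models summary.
total_with_evidence = 32967
protein = 21326
protein_only = 8810
est_rna = 24064
est_rna_only = 11548

# Genes with both protein and EST/RNA-seq evidence, derived two ways.
both_from_protein = protein - protein_only
both_from_est_rna = est_rna - est_rna_only

# Sum of the three disjoint categories, and the gap to the stated total.
categorized = protein_only + est_rna_only + both_from_protein
uncategorized = total_with_evidence - categorized

print("both (from protein counts):", both_from_protein)
print("both (from EST/RNA counts):", both_from_est_rna)
print("sum of categories:", categorized)
print("difference from stated total:", uncategorized)
```

Both derivations give 12516 genes with protein and EST/RNA-seq
support. The three categories sum to 32874, which is 93 fewer than
the stated 32967; these notes do not say how those 93 genes are
classified.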
-------------------------------------------------------------
Acyrthosiphon pisum asm 2.0 gene quality scores
Evid.      Nevd    a1Gno   epi4    epir3   mix3
==== Exon Sensitivity & Specificity ====
EST        34 Mb   0.516   0.663   0.717   0.813
Protein    25 Mb   0.536   0.757   0.785   0.786
RNAseq     17 Mb   0.575   0.839   0.792   0.875
T'poson    40 Mb   0.024   0.078   0.068   0.068
Specif     56 Mb   0.549   0.591   0.603   0.621
==== Gene model Accuracy ====
EST-genes n=10371
Perfect            1911    1968    1939    2143
Sensitv.           0.588   0.717   0.693   0.775
Specifc.           0.602   0.414   0.454   0.402
Protein-genes n=12860
Perfect            3487    4028    4071    4548
Sensitv.           0.414   0.500   0.510   0.512
Specifc.           0.665   0.484   0.479   0.578
==== Genome Coverage ====
Statistic          a1Gno   epi4    epir3   mix3
Coding bases       29.2    43.4    47.3    40
Exon bases         34.1    70.4    68.3    70.7
Gene count         37782   35961   39905   32967
------------------------------------------------------
Gene sets:
a1Gno    = aphid assembly 1 Gnomon models, mapped to assembly 2
epir1..3 = evidence-directed predictions, aphid training
epi4..5  = EST assembly + protein evidence predictions
mix3     = mix picking the best-evidence models from several gene sets
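
The notes do not define how the exon sensitivity and specificity
scores are computed. A minimal sketch under the common base-level
definitions (an assumption here): sensitivity is the fraction of
evidence bases covered by predicted exons, and specificity is the
fraction of predicted exon bases supported by evidence. Intervals
are half-open (start, end) pairs on a single sequence.

```python
def merge(intervals):
    """Merge overlapping or adjacent intervals into a sorted list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def total_length(intervals):
    """Number of bases covered by the union of the intervals."""
    return sum(end - start for start, end in merge(intervals))


def overlap_length(a, b):
    """Number of bases covered by both interval sets."""
    a, b = merge(a), merge(b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            shared += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared


def sensitivity_specificity(predicted_exons, evidence):
    """Base-level scores; both inputs must cover at least one base."""
    shared = overlap_length(predicted_exons, evidence)
    return (shared / total_length(evidence),
            shared / total_length(predicted_exons))


if __name__ == "__main__":
    predicted = [(0, 100), (200, 300)]   # made-up exon coordinates
    est_hits = [(50, 150), (200, 250)]   # made-up evidence coordinates
    sens, spec = sensitivity_specificity(predicted, est_hits)
    print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

Under this reading, a row such as EST in the table would be the
sensitivity of each gene set against the EST-covered bases, and the
Specif row would be the specificity against all evidence combined.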