Index of /EvidentialGene/arthropods/nasoniawasp/genes
Name Last modified Size
lola/ 08-Jun-2014 13:35 -
nasvit1asm.fa.count 18-Jun-2011 14:01 248k
nasvit1asm.fa.gz 12-Jun-2011 23:15 87.1M
nvit2_evigenes_pub11u.aa.gz 26-Jan-2012 22:34 8.8M
nvit2_evigenes_pub11u.attr.simple.txt 08-Feb-2012 15:18 17.1M
nvit2_evigenes_pub11u.attr.txt 08-Feb-2012 15:18 17.0M
nvit2_evigenes_pub11u.cds.gz 29-Jan-2012 17:59 13.6M
nvit2_evigenes_pub11u.gff.gz 08-Feb-2012 15:03 19.8M
nvit2_evigenes_pub11u.good.aa.gz 29-Jan-2012 18:09 7.0M
nvit2_evigenes_pub11u.good.cds.gz 29-Jan-2012 18:09 10.9M
nvit2_evigenes_pub11u.good.gff.gz 08-Feb-2012 15:03 15.9M
nvit2_evigenes_pub11u.good.ids 08-Feb-2012 15:10 533k
nvit2_evigenes_pub11u.good.tr.gz 29-Jan-2012 18:09 18.3M
nvit2_evigenes_pub11u.readme.txt 30-Jan-2012 16:42 5k
nvit2_evigenes_pub11u.tr.gz 29-Jan-2012 18:00 26.2M
oldv/ 08-Feb-2012 15:06 -
Evigene genes for Nasonia vitripennis, January 2012
Info at http://arthropods.eugenes.org/EvidentialGene/
Gene class counts
-------------------------
16231 Class:Strong : >= 66% expression/homology evidence
5581 Class:Medium : >= 33% expression/homology evidence
2640 Class:Weak : worth considering but evidence is weak
6605 Class:Transposon : >= 33% transposon and no/weak expression
5257 Class:Poor : mixed bag of partial models
7839 Class:Alternate : alternate transcripts, all from EST/RNA assemblies
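A quick tally of the class counts above. The numbers are copied from the table, and the helper name tally is illustrative only:

```python
# Gene class counts, copied from the table above.
CLASS_COUNTS = {
    "Strong": 16231,
    "Medium": 5581,
    "Weak": 2640,
    "Transposon": 6605,
    "Poor": 5257,
    "Alternate": 7839,
}

def tally(classes):
    """Sum the counts of the named classes."""
    return sum(CLASS_COUNTS[name] for name in classes)

strong_medium_weak = tally(["Strong", "Medium", "Weak"])
all_classes = tally(CLASS_COUNTS)  # iterating a dict yields its keys

print(strong_medium_weak)  # 24452
print(all_classes)         # 44153
```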
550 transcripts are curated from transcript assemblies that do not match the genome sequence
(due to genome gaps and frameshifts). These are annotated as "Protein:curated" in the feature and sequence files.
For all other genes, translating the genome sequence at the annotated CDS locations reproduces the proteins.
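A minimal sketch of that check: splice the CDS segments out of the genome and translate them with the standard genetic code (NCBI table 1). The function names, the 1-based inclusive (start, end) segment format (GFF convention), the single-strand assumption and the assumption that the first segment starts in frame (CDS phase is ignored) are all illustrative, not part of the Evigene release. It does not apply to the "Protein:curated" transcripts.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order,
# first base varying slowest, matching the amino-acid string.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def splice_cds(genome_seq, segments, strand="+"):
    """Join 1-based inclusive CDS segments in genomic order, then orient by strand.
    Assumes all segments lie on the same strand and the first one starts in frame."""
    parts = [genome_seq[start - 1:end] for start, end in sorted(segments)]
    cds = "".join(parts).upper()
    return revcomp(cds) if strand == "-" else cds

def translate(cds):
    """Translate complete codons; codons not in the table (e.g. with N) become X."""
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X")
                   for i in range(0, len(cds) - 2, 3))

# Toy sequence (made up): CDS parts at 3-5 and 6-11 give ATG GCT TAA.
print(translate(splice_cds("CCATGGCTTAACC", [(3, 5), (6, 11)])))  # MA*
```

Segments are sorted by start before joining, so minus-strand exons can be passed in any order and the whole spliced sequence is reverse-complemented once.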
833 transcripts are annotated "expertchoice", meaning a person chose them manually as the best model. These include
genes split over scaffolds (39 genes in 104 parts, with "splitgene" annotation), odorant genes,
and other cases.
Gene data files:
nvit2_evigenes_pub11u.attr.txt 29-Jan-2012 17M gene annotation table (tabbed)
nvit2_evigenes_pub11u.attr.simple.txt 29-Jan-2012 17M annotation table (tabbed; simpler, with more columns, e.g. for MS Excel)
nvit2_evigenes_pub11u.aa.gz 26-Jan-2012 8M protein fasta
nvit2_evigenes_pub11u.cds.gz 29-Jan-2012 13M coding dna
nvit2_evigenes_pub11u.tr.gz 29-Jan-2012 26M transcript dna
nvit2_evigenes_pub11u.gff.gz 29-Jan-2012 19M gene location/annotation format
nvit2_evigenes_pub11u.good.ids 29-Jan-2012 1M IDs of 24389 loci for Class:Strong|Medium|Weak (Alt included)
nvit2_evigenes_pub11u.good.gff.gz 29-Jan-2012 16M subset of good loci
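nvit2_evigenes_pub11u.good.aa.gz 29-Jan-2012 7M protein fasta, subset of good loci
nvit2_evigenes_pub11u.good.cds.gz 29-Jan-2012 10M coding dna, subset of good loci
nvit2_evigenes_pub11u.good.tr.gz 29-Jan-2012 18M transcript dna, subset of good loci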
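The good subsets are already provided, but the same kind of selection can be made for any ID list. A minimal sketch, assuming one ID per line in the .ids file, FASTA IDs as the first token after ">", and that a listed ID may be either a transcript ID (ending in t1, t2, ...) or a gene ID. All function names are hypothetical.

```python
import gzip
import re

def read_ids(lines):
    """Collect the first token of each non-blank line."""
    return {line.split()[0] for line in lines if line.strip()}

def gene_of(transcript_id):
    """Strip a trailing tN isoform suffix: Nasvi2EG000002t1 -> Nasvi2EG000002."""
    return re.sub(r"t\d+$", "", transcript_id)

def filter_fasta(lines, keep):
    """Yield the lines of FASTA records whose transcript or gene ID is in keep."""
    keeping = False
    for line in lines:
        if line.startswith(">"):
            header = line[1:].split()
            tid = header[0] if header else ""
            keeping = tid in keep or gene_of(tid) in keep
        if keeping:
            yield line

def select_from_release(ids_path, fasta_gz_path):
    """Apply filter_fasta to a gzipped FASTA file such as nvit2_evigenes_pub11u.aa.gz."""
    with open(ids_path) as f:
        keep = read_ids(f)
    with gzip.open(fasta_gz_path, "rt") as src:
        yield from filter_fasta(src, keep)
```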
Guide to Nasvi2 Evigene annotation table columns and GFF mRNA attributes:
transcriptID (ID in gff mRNA)
geneID (gene= in GFF mRNA; it is also the Parent= of the mRNA)
isoform : alternate transcript number if > 1, matches ID suffix (t2,t3...)
quality : evidence quality values for Expression, Homology, Intron, Protein
aaSize : protein length in amino acids
cdsSize : CDS percent of transcript, as CDS length / transcript length
Name : homology-derived gene name, from U:UniProt or T:TAIR arath, with percent alignment (62%T, 74%U)
Dbxref : cross reference gene IDs to TAIR, UniProt
ortholog : protein orthology percent identity, and protein IDs
paralog : protein paralogy percent identity, and gene ID
genegroup : Hymenoptera-Arthropod ARP9 orthology gene group (OrthoMCL);
e.g. ARP9_G1057,1/14/10 means 1 gene for this species grouped with 14 genes of 10 species
uniprot : UniRef50 matching protein
nasviOGS1 : Nasonia OGS1 gene equivalence: NV10001-RA/74.89 is 74% CDS equiv., 89% exon equiv.
nasviRefSeq2 : Nasonia NCBI Refseq 2 equivalence : NcbiRef2rna2392/C90.79 is 90% CDS equiv.
intron : evidence intron splice sites matched, as percent and count (10/10 means all 10 splice sites of 5 introns matched)
express : expressed span as percent of transcript
location : genome location
oid : original model ID
score : evidence score sum
scorevec : evidence score vector
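Several of these columns pack more than one number into one value. A minimal parsing sketch for the genegroup, OGS1/RefSeq equivalence (equiv1/equiv2 in the sample records) and intron formats described above. The function and field names are hypothetical, and reading the second number of a RefSeq value such as C90.79 as exon equivalence, by analogy with the OGS1 values, is an assumption.

```python
import re
from typing import NamedTuple, Optional, Tuple

class GeneGroup(NamedTuple):
    group_id: str
    n_this_species: int   # genes of this species in the group
    n_group_genes: int    # genes the group is formed with
    n_group_species: int  # species those genes come from

def parse_genegroup(value: str) -> GeneGroup:
    """'ARP9_G1057,1/14/10' -> GeneGroup('ARP9_G1057', 1, 14, 10)."""
    group_id, counts = value.split(",", 1)
    this_species, genes, species = (int(x) for x in counts.split("/"))
    return GeneGroup(group_id, this_species, genes, species)

def parse_equiv(value: str) -> Optional[Tuple[str, int, int]]:
    """'NV10001-RA/74.89' -> ('NV10001-RA', 74, 89); 'na' -> None.
    The leading 'C' of RefSeq values (e.g. 'C90.79') is dropped; treating the
    second number as exon equivalence for those is an assumption."""
    if value == "na":
        return None
    ref_id, _, nums = value.rpartition("/")
    m = re.fullmatch(r"C?(\d+)\.(\d+)", nums)
    if m is None:
        raise ValueError(f"unexpected equivalence value: {value!r}")
    return ref_id, int(m.group(1)), int(m.group(2))

def parse_intron(value: str) -> Tuple[int, int, int]:
    """'94%,32/34' -> (percent, matched splice sites, total splice sites)."""
    pct, counts = value.split(",", 1)
    matched, total = (int(x) for x in counts.split("/"))
    return int(pct.rstrip("%")), matched, total

print(parse_equiv("NcbiRef2rna2392/C90.79"))  # ('NcbiRef2rna2392', 90, 79)
print(parse_intron("94%,32/34"))              # (94, 32, 34)
```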
Quality notes:
Values are generally Strong/Medium/Weak/None
Class: gene quality class from the sum of evidence parts; Transposon and Poor are special classes
Express: Strong/Medium/Weak for percent of transcript with expression
Homology: Ortholog if the best match is in another species, Paralog if it is in this species
Protein: complete or partial
Intron: Strong/Medium/Weak depending on % and total of splice sites matched
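The quality value is a comma-separated list of Part:Value pairs. A minimal sketch that splits it into a dictionary and separates the Homology kind from its level; the function names are hypothetical, and the level vocabulary Strong/Medium/Weak is taken from the notes above.

```python
import re

def parse_quality(value):
    """'Class:Strong,Express:Strong,...' -> {'Class': 'Strong', 'Express': 'Strong', ...}."""
    return dict(item.split(":", 1) for item in value.split(","))

def homology_parts(value):
    """'OrthologStrong' -> ('Ortholog', 'Strong'); any other value -> (None, value)."""
    m = re.fullmatch(r"(Ortholog|Paralog)(Strong|Medium|Weak)", value)
    return (m.group(1), m.group(2)) if m else (None, value)

q = parse_quality("Class:Strong,Express:Strong,Homology:OrthologMedium,"
                  "Intron:Medium,Protein:complete")
print(q["Class"], homology_parts(q["Homology"]))  # Strong ('Ortholog', 'Medium')
```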
The GFF format is three-level (gene / mRNA / exon,CDS), with alternate transcripts flagged as isoform=N
and by ID=...t1,t2,t3 suffixes. All primary models have the ID=t1 suffix, but the primary is not
necessarily the "best" form (longest protein).
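Because t1 is not guaranteed to be the longest protein, picking the longest isoform per gene takes a pass over the mRNA lines. A minimal sketch, assuming tab-separated GFF3 with ';'-separated key=value attributes and an aaSize attribute on each mRNA line (as the sample fields suggest). Attribute values are not %XX-decoded, and all function names are hypothetical.

```python
import gzip

def parse_attributes(col9):
    """Parse GFF column 9 ('key=value;key=value') into a dict (no %XX decoding)."""
    attrs = {}
    for field in col9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value
    return attrs

def longest_isoforms(gff_lines):
    """Map each gene to the mRNA ID with the largest aaSize (first seen wins ties)."""
    best = {}
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "mRNA":
            continue
        attrs = parse_attributes(cols[8])
        gene = attrs.get("gene") or attrs.get("Parent")
        size = int(attrs.get("aaSize", "0"))
        if gene not in best or size > best[gene][1]:
            best[gene] = (attrs["ID"], size)
    return {gene: tid for gene, (tid, _size) in best.items()}

def longest_from_file(gff_gz_path):
    """Run longest_isoforms on a gzipped GFF such as nvit2_evigenes_pub11u.gff.gz."""
    with gzip.open(gff_gz_path, "rt") as f:
        return longest_isoforms(f)
```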
Sample annotation fields in genes.attr.txt. The same values appear on the mRNA lines of the gene GFF.
--------------------------------------------
ID=Nasvi2EG000002t1
gene=Nasvi2EG000002
isoform=1
quality=Class:Strong,Express:Strong,Homology:OrthologStrong,Intron:Strong,Protein:complete
aaSize=835
cdsSize=72%,2508/3462
Name=Glutamate receptor 1 (62%A)
oname=AGAP006027-PA (68%U)
Dbxref=NcbiRef2rna2392,tr:XM_001600948.2,pr:XP_001600998.2,GeneID:100116528,UniRef50_Q7PNT0/68%
ortholog=D7EHZ7_TRICA,74%
paralog=Nasvi2EG010019t1,29%
genegroup=ARP9_G1477,1/14/10
equiv1=NV10001-RA/74.89
equiv2=NcbiRef2rna2392/C90.79
intron=94%,32/34
express=100%,eq:98
oid=evg11e:r8nvit1v2Svelbig0Loc16t1
score=23832
scorevec=1287,74,1454,90,98,0,3462,32,94,0,657,0,954,2508
ID=Nasvi2EG000002t2
gene=Nasvi2EG000002
isoform=2
ID=Nasvi2EG000001t1
gene=Nasvi2EG000001
quality=Class:Strong,Express:Strong,Homology:OrthologMedium,Intron:Medium,Protein:complete
aaSize=469
cdsSize=52%,1410/2689
Name=Sugar transporter ERD6 4, putative (50%U)
oname=Unknown
groupname=facilitated trehalose transporter Tret1
Dbxref=NcbiRef2rna2393,tr:XM_003423824.1,pr:XP_003423872.1,GeneID:100680237,UniRef50_E2A6A7/50%
ortholog=E9IJ89_SOLIN,57%
paralog=Nasvi2EG001977t1,22%
genegroup=ARP9_G8181,1/8/6
equiv1=na
equiv2=NcbiRef2rna2393/C100.98
intron=67%,8/12
express=81%,eq:77
oid=nvit1v2Svelbig0Loc9t1
score=12867
scorevec=650,52,205,81,77,0,2353,8,66,0,94,0,582,1826
...
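The sample records above list one key=value field per line. A minimal sketch that reads that form into a dictionary and checks that the cdsSize percent agrees with CDS length / transcript length (to within one point, since whether the release rounds or truncates is not stated). Function names are hypothetical.

```python
def parse_record(text):
    """Parse one 'key=value' field per line (as in the samples above) into a dict."""
    fields = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields

def cds_size_consistent(value):
    """Check that e.g. '72%,2508/3462' has a percent within 1 point of cds/transcript."""
    pct, lengths = value.split(",", 1)
    cds_len, tr_len = (int(x) for x in lengths.split("/"))
    return abs(100 * cds_len / tr_len - int(pct.rstrip("%"))) < 1

sample = """ID=Nasvi2EG000002t1
gene=Nasvi2EG000002
aaSize=835
cdsSize=72%,2508/3462"""
rec = parse_record(sample)
print(rec["gene"], cds_size_consistent(rec["cdsSize"]))  # Nasvi2EG000002 True
```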