
Index of /EvidentialGene/arthropods/nasoniawasp/genes

      Name                                   Last modified       Size

[DIR] Parent Directory                       25-Mar-2019 14:24      -
[DIR] lola/                                  08-Jun-2014 13:35      -
[TXT] nasvit1asm.fa.count                    18-Jun-2011 14:01   248k
[   ] nasvit1asm.fa.gz                       12-Jun-2011 23:15  87.1M
[   ] nvit2_evigenes_pub11u.aa.gz            26-Jan-2012 22:34   8.8M
[TXT] nvit2_evigenes_pub11u.attr.simple.txt  08-Feb-2012 15:18  17.1M
[TXT] nvit2_evigenes_pub11u.attr.txt         08-Feb-2012 15:18  17.0M
[   ] nvit2_evigenes_pub11u.cds.gz           29-Jan-2012 17:59  13.6M
[   ] nvit2_evigenes_pub11u.gff.gz           08-Feb-2012 15:03  19.8M
[   ] nvit2_evigenes_pub11u.good.aa.gz       29-Jan-2012 18:09   7.0M
[   ] nvit2_evigenes_pub11u.good.cds.gz      29-Jan-2012 18:09  10.9M
[   ] nvit2_evigenes_pub11u.good.gff.gz      08-Feb-2012 15:03  15.9M
[TXT] nvit2_evigenes_pub11u.good.ids         08-Feb-2012 15:10   533k
[   ] nvit2_evigenes_pub11u.good.tr.gz       29-Jan-2012 18:09  18.3M
[TXT] nvit2_evigenes_pub11u.readme.txt       30-Jan-2012 16:42     5k
[   ] nvit2_evigenes_pub11u.tr.gz            29-Jan-2012 18:00  26.2M
[DIR] oldv/                                  08-Feb-2012 15:06      -


Evigene genes for Nasonia vitripennis, January 2012
Info at http://arthropods.eugenes.org/EvidentialGene/

Gene class counts
-------------------------
16231 Class:Strong      : >= 66% expression/homology evidence
 5581 Class:Medium      : >= 33% expression/homology evidence
 2640 Class:Weak        : worth considering but evidence is weak
 6605 Class:Transposon  : >= 33% transposon and no/weak expression
 5257 Class:Poor        : mixed bag of partial models
 7839 Class:Alternate   : alternate transcripts, all from EST/RNA assemblies

550 transcripts are curated from transcript assemblies that do not match genome sequence
  (due to genome gaps and frameshifts). These are annotated as "Protein:curated" in feature and sequence files.
  Aside from these, gene CDS locations + genome sequence will translate to the proteins. 
833 transcripts are annotated "expertchoice", chosen by a person as the best model. These include 
  genes split over scaffolds (39 genes in 104 parts, with "splitgene" annotation), odorant genes,
  and other cases.
  
Gene data files:
 nvit2_evigenes_pub11u.attr.txt        29-Jan-2012  17M gene annotation table (tabbed)
 nvit2_evigenes_pub11u.attr.simple.txt 29-Jan-2012  17M annotation table (tabbed; simpler, with more columns, e.g. for MS Excel)
 nvit2_evigenes_pub11u.aa.gz           26-Jan-2012   8M protein fasta
 nvit2_evigenes_pub11u.cds.gz          29-Jan-2012  13M coding dna
 nvit2_evigenes_pub11u.tr.gz           29-Jan-2012  26M transcript dna 
 nvit2_evigenes_pub11u.gff.gz          29-Jan-2012  19M gene location/annotation format 
 
 nvit2_evigenes_pub11u.good.ids        29-Jan-2012   1M  IDs of 24389 loci for Class:Strong|Medium|Weak (Alt included)
 nvit2_evigenes_pub11u.good.gff.gz     29-Jan-2012  16M  subset of good loci  
 nvit2_evigenes_pub11u.good.aa.gz      29-Jan-2012   7M  
 nvit2_evigenes_pub11u.good.cds.gz     29-Jan-2012  10M  
 nvit2_evigenes_pub11u.good.tr.gz      29-Jan-2012  18M  
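
 The .good.* files are already subset by the .good.ids list. To subset a sequence
 file by some other ID list, a minimal Python sketch follows. It assumes (not stated
 in this readme) that the ID file has one ID as the first word of each line, and that
 each fasta header starts with the same ID. The toy sequences are invented.

```python
import gzip
import io


def open_text(path):
    """Open a plain or gzip-compressed file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def read_ids(handle):
    """Collect the first whitespace-separated token of each non-empty line."""
    ids = set()
    for line in handle:
        fields = line.split()
        if fields and not fields[0].startswith("#"):
            ids.add(fields[0])
    return ids


def filter_fasta(handle, keep_ids):
    """Yield (header, sequence) for records whose first header word is in keep_ids."""
    header, chunks = None, []

    def wanted(hdr):
        words = hdr.split()
        return bool(words) and words[0] in keep_ids

    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None and wanted(header):
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None and wanted(header):
        yield header, "".join(chunks)


# Toy in-memory data; IDs come from the sample records, sequences are invented.
toy_ids = io.StringIO("Nasvi2EG000002t1\nNasvi2EG000001t1\n")
toy_fasta = io.StringIO(
    ">Nasvi2EG000002t1 aaSize=835\nMKT\nLLV\n"
    ">Nasvi2EG999999t1\nMAA\n"
    ">Nasvi2EG000001t1\nMSS\n"
)
kept = list(filter_fasta(toy_fasta, read_ids(toy_ids)))
for header, seq in kept:
    print(header.split()[0], len(seq))
```

 With the real files, open_text("nvit2_evigenes_pub11u.aa.gz") would replace the
 in-memory handles. If the ID list holds gene IDs rather than transcript IDs, the
 tN suffix would have to be stripped before matching.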


Guide to Nasvi2 Evigene annotation table columns and GFF mRNA  attributes:
  transcriptID    (ID in gff mRNA)
  geneID          (gene in gff mRNA, is Parent= to mRNA)
  isoform   : alternate transcript number if > 1, matches ID suffix (t2,t3...)
  quality   : evidence quality values for Expression Homology Intron Protein         
  aaSize    : protein aa length
  cdsSize   : percent of transcript, cds length / transcript length 
  Name      : homology-derived gene name, U:UniProt or T:TAIR arath, with percent align (62%T, 74%U)
  Dbxref    : cross reference gene IDs to TAIR, UniProt
  ortholog  : protein orthology percent identity, and protein IDs
  paralog   : protein paralogy percent identity, and gene ID
  genegroup : Hymenoptera-Arthropod ARP9 orthology gene group (Orthomcl), 
              ARP9_G1057,1/14/10 means 1 gene for this species grouped with 14 genes of 10 species
  uniprot   : UniRef50 matching protein
  nasviOGS1 : Nasonia OGS1 gene equivalence (equiv1 in the sample records): NV10001-RA/74.89 is 74% CDS equiv., 89% exon equiv.
  nasviRefSeq2 : Nasonia NCBI RefSeq 2 equivalence (equiv2 in the sample records): NcbiRef2rna2392/C90.79 is 90% CDS equiv.
  intron    : evidence intron splices matched (10/10 splice sites for 5 matched introns)
  express   : expressed span as percent of transcript
  location  : genome location
  oid       : original model ID
  score     : evidence score sum
  scorevec  : evidence score vector
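
Several of the columns above pack more than one number into a value. As a rough
sketch (the function names and dictionary keys are made up here, not part of
Evigene), they could be unpacked in Python like this. The meaning of the letter
before the RefSeq percentages (the "C" in C90.79) is not explained in this guide,
so it is kept as an opaque flag, and the second RefSeq number is left unlabeled.

```python
import re


def parse_pct_ratio(value):
    """Parse 'P%,A/B' (the cdsSize and intron layout) into (P, A, B) integers."""
    m = re.fullmatch(r"(\d+)%,(\d+)/(\d+)", value.strip())
    if m is None:
        raise ValueError(f"unexpected value: {value!r}")
    return tuple(int(g) for g in m.groups())


def parse_genegroup(value):
    """Parse 'ARP9_G1057,1/14/10' into its group ID and three counts."""
    m = re.fullmatch(r"([^,]+),(\d+)/(\d+)/(\d+)", value.strip())
    if m is None:
        raise ValueError(f"unexpected value: {value!r}")
    return {
        "group": m.group(1),
        "genes_this_species": int(m.group(2)),
        "genes_in_group": int(m.group(3)),
        "species_in_group": int(m.group(4)),
    }


def parse_equiv(value):
    """Parse 'NV10001-RA/74.89' or 'NcbiRef2rna2392/C90.79'; 'na' gives None."""
    value = value.strip()
    if value == "na":
        return None
    m = re.fullmatch(r"(.+)/([A-Za-z]?)(\d+)\.(\d+)", value)
    if m is None:
        raise ValueError(f"unexpected value: {value!r}")
    return {
        "id": m.group(1),
        "flag": m.group(2),               # meaning not documented in this guide
        "cds_pct": int(m.group(3)),       # percent CDS equivalence
        "second_pct": int(m.group(4)),    # exon equivalence for OGS1 per the guide
    }


pct, cds_len, tr_len = parse_pct_ratio("72%,2508/3462")
# The leading percent is the rounded ratio of the two lengths.
print(pct, round(100 * cds_len / tr_len))
# Observation on the two sample records only, not a documented rule:
# aaSize equals cds length / 3 - 1, as if the stop codon is counted in the CDS.
print(cds_len // 3 - 1)
print(parse_genegroup("ARP9_G1477,1/14/10"))
print(parse_equiv("NcbiRef2rna2392/C90.79"))
```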

Quality notes:
  Values are generally Strong/Medium/Weak/None
  Class:   gene quality class as sum of evidence parts; Transposon, Poor special classes
  Express:  Strong/Medium/Weak for percent of transcript with expression
  Homology:  Ortholog if best match is other species, Paralog for this species
  Protein:  complete or partial
  Intron: Strong/Medium/Weak depending on % and total of splice sites matched
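
  The quality value is a comma-separated list of Part:Value pairs, so it can be
  split into a small lookup table. The Python sketch below is one way to do that.
  The helper names are hypothetical, and the assumption that Homology values are
  always a kind (Ortholog/Paralog) followed by a level is inferred from the two
  sample records, not documented. "Good" follows the Class:Strong|Medium|Weak
  definition used for the .good.* files.

```python
import re


def parse_quality(value):
    """Split 'Class:Strong,Express:Strong,...' into a {part: value} dict."""
    parts = {}
    for item in value.split(","):
        key, sep, val = item.partition(":")
        if sep:
            parts[key.strip()] = val.strip()
    return parts


def split_homology(val):
    """Split e.g. 'OrthologStrong' into ('Ortholog', 'Strong').

    Returns (None, None) for values that do not fit this assumed layout.
    """
    m = re.fullmatch(r"(Ortholog|Paralog)?(Strong|Medium|Weak|None)?", val)
    if m is None:
        return (None, None)
    return m.group(1), m.group(2)


def is_good(parts):
    """True for the classes that make up the 'good' subset."""
    return parts.get("Class") in {"Strong", "Medium", "Weak"}


q = parse_quality(
    "Class:Strong,Express:Strong,Homology:OrthologMedium,Intron:Medium,Protein:complete"
)
print(q["Class"], split_homology(q["Homology"]), is_good(q))
```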

  GFF format is 3 level (gene/mRNA/exon,CDS) with alternate transcripts flagged as isoform=N,
  and ID=...t1,t2,t3 to indicate alternates.  All primary models carry the t1 ID suffix, but the
  primary model may not be the "best" form (longest protein).
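
  To list the alternate transcripts of each gene from the GFF, the mRNA rows can be
  grouped on their gene attribute. This is a sketch only: the scaffold name, source
  column and coordinates in the toy rows are invented, and the semicolon-separated
  key=value layout of column 9 is the usual GFF3 convention rather than something
  confirmed here. Values are not URL-decoded because the annotations contain
  literal percent signs.

```python
from collections import defaultdict


def parse_attributes(col9):
    """Split a GFF column-9 string into a dict, cutting each field at the first '='."""
    attrs = {}
    for field in col9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def isoforms_by_gene(gff_lines):
    """Map gene ID -> list of (isoform number, transcript ID) from mRNA rows."""
    genes = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        attrs = parse_attributes(cols[8])
        gene = attrs.get("gene") or attrs.get("Parent")
        # Rows without an isoform attribute are treated as isoform 1 (assumption).
        genes[gene].append((int(attrs.get("isoform", 1)), attrs.get("ID")))
    return {gene: sorted(items) for gene, items in genes.items()}


# Hypothetical rows: only the ID, gene and isoform values come from the samples.
toy_gff = [
    "##gff-version 3",
    "scaffoldX\tevigene\tmRNA\t100\t900\t.\t+\t.\tID=Nasvi2EG000002t2;gene=Nasvi2EG000002;isoform=2",
    "scaffoldX\tevigene\tmRNA\t100\t950\t.\t+\t.\tID=Nasvi2EG000002t1;gene=Nasvi2EG000002;isoform=1;cdsSize=72%,2508/3462",
    "scaffoldX\tevigene\texon\t100\t200\t.\t+\t.\tParent=Nasvi2EG000002t1",
]
grouped = isoforms_by_gene(toy_gff)
print(grouped)
```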

Sample Annotation fields in genes.attr.txt. Same values are in mRNA lines of gene GFF.
--------------------------------------------

ID=Nasvi2EG000002t1
	gene=Nasvi2EG000002
	isoform=1
	quality=Class:Strong,Express:Strong,Homology:OrthologStrong,Intron:Strong,Protein:complete
	aaSize=835
	cdsSize=72%,2508/3462
	Name=Glutamate receptor 1 (62%A)
	oname=AGAP006027-PA (68%U)
	Dbxref=NcbiRef2rna2392,tr:XM_001600948.2,pr:XP_001600998.2,GeneID:100116528,UniRef50_Q7PNT0/68%
	ortholog=D7EHZ7_TRICA,74%
	paralog=Nasvi2EG010019t1,29%
	genegroup=ARP9_G1477,1/14/10
	equiv1=NV10001-RA/74.89
	equiv2=NcbiRef2rna2392/C90.79
	intron=94%,32/34
	express=100%,eq:98
	oid=evg11e:r8nvit1v2Svelbig0Loc16t1
	score=23832
	scorevec=1287,74,1454,90,98,0,3462,32,94,0,657,0,954,2508

ID=Nasvi2EG000002t2
	gene=Nasvi2EG000002
	isoform=2

ID=Nasvi2EG000001t1
  gene=Nasvi2EG000001
  quality=Class:Strong,Express:Strong,Homology:OrthologMedium,Intron:Medium,Protein:complete
  aaSize=469
  cdsSize=52%,1410/2689
  Name=Sugar transporter ERD6 4, putative (50%U)
  oname=Unknown
  groupname=facilitated trehalose transporter Tret1
  Dbxref=NcbiRef2rna2393,tr:XM_003423824.1,pr:XP_003423872.1,GeneID:100680237,UniRef50_E2A6A7/50%
  ortholog=E9IJ89_SOLIN,57%
  paralog=Nasvi2EG001977t1,22%
  genegroup=ARP9_G8181,1/8/6
  equiv1=na
  equiv2=NcbiRef2rna2393/C100.98
  intron=67%,8/12
  express=81%,eq:77
  oid=nvit1v2Svelbig0Loc9t1
  score=12867
  scorevec=650,52,205,81,77,0,2353,8,66,0,94,0,582,1826

	...
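
Text laid out like the sample records above (an ID= line followed by indented
key=value lines) can be read into one dictionary per transcript. This sketch
handles only that display layout; the actual column layout of the tabbed
.attr.txt files is not shown here, so this is not a reader for those files.
The function names are hypothetical.

```python
def parse_records(text):
    """Turn 'ID=...' blocks of key=value lines into a list of dicts."""
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or "=" not in line:
            continue  # skips blank lines and the '...' elision marker
        key, value = line.split("=", 1)
        if key == "ID":
            current = {"ID": value}
            records.append(current)
        elif current is not None:
            current[key] = value
    return records


def score_vector(record):
    """Return scorevec as a list of integers, or an empty list if absent."""
    raw = record.get("scorevec", "")
    return [int(x) for x in raw.split(",")] if raw else []


# Excerpt of the sample records shown above.
sample = """ID=Nasvi2EG000002t1
\tgene=Nasvi2EG000002
\tisoform=1
\taaSize=835
\tscore=23832
\tscorevec=1287,74,1454,90,98,0,3462,32,94,0,657,0,954,2508

ID=Nasvi2EG000002t2
\tgene=Nasvi2EG000002
\tisoform=2
\t...
"""
records = parse_records(sample)
for rec in records:
    print(rec["ID"], rec["gene"], rec.get("aaSize"), len(score_vector(rec)))
```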

Developed at the Genome Informatics Lab of Indiana University Biology Department