
Index of /EvidentialGene/plants/pine/pine_evigene201308/publicset

Name                                       Last modified       Size
Parent Directory                           21-Feb-2021 23:12   -
evgTv1.aa_pub.fa.gz                        27-Aug-2013 13:12   175M
evgTv1.ann.txt.gz                          27-Aug-2013 13:09   29.3M
evgTv1.cds_pub.fa.gz                       27-Aug-2013 13:13   298M
evgTv1.mainalt.tab.gz                      27-Aug-2013 13:07   4.8M
evgTv1.mrna_pub.fa.gz                      27-Aug-2013 13:10   352M
evgTv1.pub.aa.qual                         25-Mar-2014 13:51   37.3M
evgTv1.pub_mainr.aa.qual                   26-Mar-2014 13:17   6.2M
evgTv1.realt.log                           27-Aug-2013 13:15   1k
evgTv1.realt_pubids.gz                     27-Aug-2013 13:15   20.2M
pine_evgTv1pub130827_iu14rna_ncbisra.txt   20-Feb-2021 21:23   5k
pine_evgTv1pub130827_main.aa.gz            26-Mar-2014 13:30   26.8M
pine_evgTv1pub130827_readme.txt            21-Feb-2021 23:10   2k
pinetv1reclass2.info                       27-Aug-2013 15:11   11k


This Evigene reconstruction of loblolly pine (Pinus taeda) genes, dated August 2013, is
built from Illumina RNA-seq assembled with 4 assemblers at multiple k-mer sizes. It is tagged
"Pta.Tra13Evg" and is associated with the Pinus taeda genome project NCBI:PRJNA174450.

Source RNA-seq is at NCBI SRA, with SRA entries in pine_evgTv1pub130827_iu14rna_ncbisra.txt.

The gene sequences in evgTv1.mrna_pub.fa, with protein evgTv1.aa_pub.fa and CDS evgTv1.cds_pub.fa,
contain a superset of the final gene transcripts: 458487 in the final kept set and 413485 dropped as redundant (alternates).
The table of gene/transcript IDs and classes is in evgTv1.realt_pubids, with "drop" indicating discards.
The annotation table evgTv1.ann.txt includes protein name, size and homology database IDs.
The subset of primary (longest) kept proteins, one per gene locus, is in pine_evgTv1pub130827_main.aa.
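
As an illustration of how the "drop" marker could be used to separate kept from discarded
transcripts, here is a minimal Python sketch. The column layout of evgTv1.realt_pubids is not
described here, so the sketch assumes tab-separated rows with the transcript ID in the first
field and the class label in the last field. That layout, the function name
split_kept_dropped and the demo IDs are all hypothetical. Check the real file before relying on it.

```python
import io


def split_kept_dropped(lines):
    """Partition (transcript_id, class) rows into kept and dropped lists.

    ASSUMPTION: each row is tab-separated, with the transcript ID first and
    the class label last. The real evgTv1.realt_pubids layout may differ.
    """
    kept, dropped = [], []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        tid, cls = fields[0], fields[-1]
        # Class labels such as "dropalthi" begin with "drop" for discards.
        if cls.startswith("drop"):
            dropped.append((tid, cls))
        else:
            kept.append((tid, cls))
    return kept, dropped


# Hypothetical demo rows; these IDs are invented for illustration only.
demo = io.StringIO("hypoT1\tmain\nhypoT2\talthi\nhypoT3\tdropalthi1\n")
kept, dropped = split_kept_dropped(demo)
print(len(kept), "kept;", len(dropped), "dropped")
```

For the gzipped file in this directory, the same function could be fed a handle from
gzip.open(path, "rt") instead of the in-memory demo.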

Further statistics are in pinetv1reclass2.info including this summary of classes kept and dropped:
publicset3/evgTv1.realt_pubids
         Keep                Drop
  105310 althi         33299 dropalthi
  171593 althi1       165542 dropalthi1
   53546 altmid        34990 dropaltmid
    2849 altmida2       1299 dropaltmida2
   15835 altmidfrag    60080 dropaltmidfrag
     392 altmidfraga2   4552 dropaltmidfraga2
   77124 main              0
    5874 maina2            0
   62242 noclass       76801 dropnoclass
     581 noclassa2        62 dropnoclassa2
  ---------------------------------------
  495346 total        376625 droptotal 
  177982 aaref         44855 aaref
    49%, 71228/146480   of loci have TAIR/CDD aaref

where locus primary transcripts are "main", "maina2", "noclass" and "noclassa2" (noclass = no alternates,
main = has alternates, a2 = protein duplicate, i.e. paralog). "aaref" transcripts have measured protein homology.
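
The arithmetic behind the summary table can be re-derived with a short Python sketch. The
counts are copied from the table above. The variable names are illustrative and not part of
Evigene.

```python
# Counts copied from the class summary table (Keep and Drop columns).
KEEP = {
    "althi": 105310, "althi1": 171593, "altmid": 53546, "altmida2": 2849,
    "altmidfrag": 15835, "altmidfraga2": 392, "main": 77124, "maina2": 5874,
    "noclass": 62242, "noclassa2": 581,
}
DROP = {
    "dropalthi": 33299, "dropalthi1": 165542, "dropaltmid": 34990,
    "dropaltmida2": 1299, "dropaltmidfrag": 60080, "dropaltmidfraga2": 4552,
    "dropnoclass": 76801, "dropnoclassa2": 62,
}

# Classes that the text above names as locus primary transcripts.
PRIMARY = ("main", "maina2", "noclass", "noclassa2")

keep_total = sum(KEEP.values())
drop_total = sum(DROP.values())
primary_total = sum(KEEP[c] for c in PRIMARY)

# Fraction of loci with a TAIR/CDD aaref, using the figures from the table.
aaref_fraction = 71228 / 146480

print(keep_total, drop_total, primary_total, f"{aaref_fraction:.1%}")
```

The column sums reproduce the table totals, 495346 kept and 376625 dropped. The four primary
classes sum to 145821, slightly below the 146480 loci used in the aaref line. The table does
not say where that difference comes from.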


Developed at the Genome Informatics Lab of Indiana University Biology Department