Index of /EvidentialGene/plants/pine/pine_evigene201308/publicset
Name Last modified Size
Parent Directory 21-Feb-2021 23:12 -
evgTv1.aa_pub.fa.gz 27-Aug-2013 13:12 175M
evgTv1.ann.txt.gz 27-Aug-2013 13:09 29.3M
evgTv1.cds_pub.fa.gz 27-Aug-2013 13:13 298M
evgTv1.mainalt.tab.gz 27-Aug-2013 13:07 4.8M
evgTv1.mrna_pub.fa.gz 27-Aug-2013 13:10 352M
evgTv1.pub.aa.qual 25-Mar-2014 13:51 37.3M
evgTv1.pub_mainr.aa.qual 26-Mar-2014 13:17 6.2M
evgTv1.realt.log 27-Aug-2013 13:15 1k
evgTv1.realt_pubids.gz 27-Aug-2013 13:15 20.2M
pine_evgTv1pub130827_iu14rna_ncbisra.txt 20-Feb-2021 21:23 5k
pine_evgTv1pub130827_main.aa.gz 26-Mar-2014 13:30 26.8M
pine_evgTv1pub130827_readme.txt 21-Feb-2021 23:10 2k
pinetv1reclass2.info 27-Aug-2013 15:11 11k
This Evigene reconstruction of loblolly pine (Pinus taeda) genes, dated August 2013, is
built from multi-kmer assemblies of Illumina RNA-seq by 4 assemblers. It is tagged "Pta.Tra13Evg"
and is associated with genome project NCBI:PRJNA174450 of Pinus taeda (loblolly).
Source RNA-seq is at NCBI SRA, with SRA entries in pine_evgTv1pub130827_iu14rna_ncbisra.txt.
The gene sequences in evgTv1.mrna_pub.fa, with protein evgTv1.aa_pub.fa and CDS evgTv1.cds_pub.fa,
contain a super-set of final gene transcripts: 458487 in the final kept set, and 413485 dropped as redundant (alternates).
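A downloaded sequence file can be checked against these counts by counting FASTA header lines. The following is a minimal sketch, assuming the files are ordinary gzipped FASTA with one ">" header line per sequence; the function name count_fasta_records is illustrative and not part of Evigene.

```python
import gzip
import sys


def count_fasta_records(path):
    """Count sequences in a gzipped FASTA file by counting '>' header lines."""
    n = 0
    # "rt" opens the gzip stream in text mode so lines are str, not bytes.
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                n += 1
    return n


if __name__ == "__main__":
    # Usage (file name taken from the listing above):
    #   python count_fasta.py evgTv1.mrna_pub.fa.gz
    for path in sys.argv[1:]:
        print(path, count_fasta_records(path))
```

If the file holds the full super-set, the result would be expected to be close to the sum of kept and dropped transcripts, but that is an assumption to verify rather than a documented guarantee.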
The table of gene/transcript IDs and classes is in evgTv1.realt_pubids, with "drop" indicating discards.
Annotation table evgTv1.ann.txt includes protein name, size and homology database IDs.
The subset of primary (longest), kept proteins per gene locus is in pine_evgTv1pub130827_main.aa.
Further statistics are in pinetv1reclass2.info including this summary of classes kept and dropped:
publicset3/evgTv1.realt_pubids
  Keep                    Drop
105310 althi             33299 dropalthi
171593 althi1           165542 dropalthi1
 53546 altmid            34990 dropaltmid
  2849 altmida2           1299 dropaltmida2
 15835 altmidfrag        60080 dropaltmidfrag
   392 altmidfraga2       4552 dropaltmidfraga2
 77124 main                  0
  5874 maina2                0
 62242 noclass           76801 dropnoclass
   581 noclassa2            62 dropnoclassa2
-----------------------------------------------
495346 total            376625 droptotal
177982 aaref             44855 aaref
49%, 71228/146480 of loci have TAIR/CDD aaref
where locus primary transcripts are "main", "maina2", "noclass" and "noclassa2" (noclass = no alternates,
main = has alternates, a2 = protein duplicate, i.e. paralog). "aaref" entries have measured protein homology.
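
The arithmetic in the class summary can be re-checked with a few lines of Python. This is only a sketch: the counts are copied by hand from the table above, and the variable names are illustrative. The primary-class sum computed here need not equal the 146480 loci quoted in the aaref line; the readme does not say how that locus count was derived.

```python
# Counts copied from the Keep and Drop columns of the class summary above.
KEEP = {
    "althi": 105310, "althi1": 171593, "altmid": 53546, "altmida2": 2849,
    "altmidfrag": 15835, "altmidfraga2": 392, "main": 77124, "maina2": 5874,
    "noclass": 62242, "noclassa2": 581,
}
DROP = {
    "dropalthi": 33299, "dropalthi1": 165542, "dropaltmid": 34990,
    "dropaltmida2": 1299, "dropaltmidfrag": 60080, "dropaltmidfraga2": 4552,
    "dropnoclass": 76801, "dropnoclassa2": 62,
}
# Classes the readme names as locus primary transcripts.
PRIMARY_CLASSES = ("main", "maina2", "noclass", "noclassa2")

keep_total = sum(KEEP.values())
drop_total = sum(DROP.values())
primary_total = sum(KEEP[c] for c in PRIMARY_CLASSES)
# Fraction of loci with a TAIR/CDD aaref hit, using the figures quoted above.
aaref_fraction = 71228 / 146480

print("kept transcripts:   ", keep_total)
print("dropped transcripts:", drop_total)
print("primary transcripts:", primary_total)
print("loci with aaref:     {:.1%}".format(aaref_fraction))
```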