Index of /EvidentialGene/plants/arabidopsis/evigene_tr2aacds_test
Name                                      Last modified      Size
Parent Directory                          30-Oct-2021 16:34  -
arath_TAIR10_20101214up.cdna.gz           15-Apr-2013 14:03  24.2M
arath_TAIR10_20101214up.genesum.txt       30-Oct-2021 15:17  1k
arath_TAIR10_20101214up.trclass.sum.txt   30-Oct-2021 15:17  1k
evigene_tr2aacds_test2021.tar.gz          30-Oct-2021 15:46  323M
evigene_tr2aacds_test2021.tar.list.txt    30-Oct-2021 20:53  6k
run_tr2aacds.sh                           30-Oct-2021 15:19  2k
Test case for EvidentialGene_trassembly_pipe (evigene/scripts/prot/tr2aacds.pl)
http://arthropods.eugenes.org/EvidentialGene/about/EvidentialGene_trassembly_pipe.html
TEST RUN: run the small data set first, outside of a computer cluster batch system, to see if it works:
env trset=arath_TAIR10_20101214up.cdna datad=`pwd` ./run_tr2aacds.sh >& log.tr2ac1
NOTE: You need to edit the application paths in run_tr2aacds.sh for your system.
This requires these bio-apps in your Unix PATH: blastn, fastanrdb, cd-hit-est, evigene.
The test runs in about 5 min on 2 cores of an ordinary desktop or laptop computer.
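Before launching the test run, it can help to confirm that the required programs are reachable. The following is a minimal sketch, not part of run_tr2aacds.sh: the helper name check_apps is hypothetical, and only the three executables named above are checked, since evigene is referenced here as a script tree (evigene/scripts/prot/tr2aacds.pl) rather than a single program.

```shell
#!/bin/sh
# check_apps: hypothetical helper, not part of EvidentialGene.
# Prints "missing: NAME" for each program not found in PATH and
# returns non-zero if any program is missing.
check_apps() {
    missing=0
    for app in "$@"; do
        if ! command -v "$app" >/dev/null 2>&1; then
            echo "missing: $app"
            missing=1
        fi
    done
    return "$missing"
}

# Check the executables named in the note above.
check_apps blastn fastanrdb cd-hit-est \
    || echo "edit the app paths in run_tr2aacds.sh or extend PATH"
```

If anything is reported missing, fix the paths before starting the run rather than after reading a failed log.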
TEST DATA: arabidopsis TAIR10 transcripts (headers regularized, though probably not needed)
curl -sR -o arath_TAIR10_20101214up.cdna \
ftp://ftp.arabidopsis.org/home/tair/Sequences/blast_datasets/TAIR10_blastsets/TAIR10_cdna_20101214_updated
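A quick sanity check after fetching the data is to count the FASTA records. This is a minimal sketch assuming standard FASTA, where each record begins with a '>' header line; count_fasta is a hypothetical helper name. The gzipped file in the listing above (arath_TAIR10_20101214up.cdna.gz) is presumably the same test data and could be unpacked instead of downloading.

```shell
#!/bin/sh
# count_fasta: hypothetical helper, not part of EvidentialGene.
# Prints the number of FASTA records (lines beginning with '>') in a file.
count_fasta() {
    grep -c '^>' "$1"
}

# Possible alternative to the curl download (assumes the .gz copy from
# this directory has been saved locally):
# gunzip -c arath_TAIR10_20101214up.cdna.gz > arath_TAIR10_20101214up.cdna
# count_fasta arath_TAIR10_20101214up.cdna
```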
OUTPUTS:
evigene_tr2aacds_test2021.tar.gz contains the full output of this tr2aacds.pl run.
okayset/ contains validated transcript sequences, as mRNA, CDS and protein (name.okay.mrna,cds,aa),
with a table of public IDs, original IDs and coding statistics in name.pubids, and annotations in name.ann.txt.
Note that the name.okay sequences contain all alternate transcripts per gene locus. Preferably all of them
should be analyzed further, but the main/longest protein per locus can be extracted using evigene IDs or the
name.pubids table.
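Selecting the main transcript per locus from the name.pubids table could be sketched as below. The column layout is an ASSUMPTION, not confirmed by this page: a tab-delimited table with the public mRNA ID in column 1 and the alternate number in column 4, where 1 marks the main transcript, plus '#' header lines. Check the header line of your own name.pubids before relying on it. The helper name main_ids is hypothetical.

```shell
#!/bin/sh
# main_ids: hypothetical helper, not part of EvidentialGene.
# Prints column 1 (public ID) of rows whose column 4 (ASSUMED to be the
# alternate-transcript number) equals 1. Lines starting with '#' are
# treated as header/comment lines and skipped.
main_ids() {
    awk -F '\t' '!/^#/ && $4 == 1 { print $1 }' "$1"
}

# Example use (file names follow the name.* pattern described above):
# main_ids name.pubids > name.main.ids
```

The resulting ID list can then be used with any FASTA subsetting tool to pull the matching records from name.okay.aa.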
A short summary of gene/transcript counts is in name.genesum.txt. Rejected sequences in name.cull.mrna,cds,aa
are excess transcripts. A summary of tr2aacds classifications is in name.trclass.sum.txt.
Please see also the EvidentialGene SRA2Genes pipeline, a full gene reconstruction pipeline that adds
external evidence (i.e. protein homologies, conserved genes, genome assembly, etc.),
non-coding gene reconstruction, and publication-quality submission data sets:
http://arthropods.eugenes.org/EvidentialGene/other/sra2genes_testdrive/
Publications describing EvidentialGene include:
Gilbert, DG. (2019). Longest protein, longest transcript or most expression, for
accurate gene reconstruction of transcriptomes? bioRxiv 829184; https://doi.org/10.1101/829184
Gilbert, DG. (2018). Genes of the pig, Sus scrofa, reconstructed with EvidentialGene.
PeerJ; https://doi.org/10.7717/peerj.6374
#-----------------------------------------