Index of /EvidentialGene/other/evigene_old/sra2genes_testdrive
Name                                      Last modified      Size
Parent Directory                          03-Dec-2021 14:36  -
sra2genes_test.readme.txt                 16-May-2019 14:53  4k
sra2genes_start7test.tar.gz               07-May-2019 23:33  81.5M
sra2genes_start7test.list                 08-May-2019 16:52  1k
sra2genes_finish7test.tar.gz              08-May-2019 16:49  301M
sra2genes_finish7test.list                08-May-2019 16:52  4k
sra2genes4v_testdrive/                    04-Dec-2021 15:09  -
run_evgsra2genes.sh                       13-May-2019 15:02  3k
evigene_apps_linux_x86_64_19may14.tar.gz  13-May-2019 15:03  548M
evigene_apps_linux_x86_64_19may14.list    13-May-2019 15:10  1k
evigene19may14.tar                        13-May-2019 15:40  8.0M
SRA2Genes Test Drive, 2019-May
from transcript assembly input (step 7) to public gene set (step 10)
This is an update in progress: SRA2Genes, a replacement for the 'tr2aacds' component
of EvidentialGene. SRA2Genes does more than tr2aacds, which it includes
as one step of a full gene set reconstruction pipeline. tr2aacds reduces a
large over-assembly of transcripts using only self-referential
coding-gene metrics. That is very useful but also fairly limited and rough,
because it uses only the gene evidence within that transcript assembly.
The more complete gene reconstruction pipeline of SRA2Genes brings in external
gene evidence, notably the wealth of conserved gene information.
Genome biologists should consider using SRA2Genes in place of tr2aacds.
Test gene transcript set, Arabidopsis thaliana, Araport gene set of 2016
Araport11_genes.201606.mrna/cdna,aa,cds
from http://eugenes.org/EvidentialGene/plants/arabidopsis/evigene2017_arabidopsis/gene_models/
source https://www.araport.org/data/araport11/
Setup steps:
1. Fetch software and data from sra2genes_testdrive/
http://eugenes.org/EvidentialGene/other/sra2genes_testdrive/
sra2genes_start7test.tar.gz contains the starting data sets: trsets/, refset/ and genome/
evigene19may14.tar has updates to evgpipe_sra2genes.pl and its components
2. Install evigene19may14.tar, which updates sra2genes and cures a few bugs that affect the usage below:
cd your-path-to-scripts; tar -xf evigene19may14.tar
The pre-configured path for this is $HOME/bio/apps/evigene
2b. Optionally, if you use Linux (x86_64), install the prebuilt component applications:
fetch evigene_apps_linux_x86_64_19may14.tar.gz and extract it with gtar -zxf evigene_apps_linux_x86_64_19may14.tar.gz
These fill in $HOME/bio/apps/ with ncbi/bin, exonerate/bin, cdhit/bin, etc.
3. Unpack the starting data set:
cd your-test-drive-path
gtar -zxf sra2genes_start7test.tar.gz
cd sra2genes_start7test
4. Edit run_evgsra2genes.sh to set PATH for the $HOME/bio/apps used by sra2genes.
Install the needed bioapps (ncbi/bin, exonerate/bin, cdhit/bin).
See run_evgsra2genes.sh for bioapp sources.
Linux x86_64 binaries are now provided (step 2b).
5. Drive through steps 7 to 10 (public, annotated gene set).
evgpipe_sra2genes.pl generates a unix shell script for each step that uses some cpu/memory.
You can run these from the command line, or send them to a cluster batch system, as needed
(set ncpu= and maxmem= for what your system has; maxmem is in megabytes of memory).
Rerun run_evgsra2genes.sh the same way after each "run_sNNN.sh" step, to update the following steps.
6. Compare your final result to the contents of sra2genes_finish7test.tar.gz.
They should be the same; look at the summary of genes in arath16test.genesum.txt
7. Re-use with your own data sets: replace the contents of sra2genes_start7test
in trsets/input.cdna, refset/refgenes.aa and genome/chrassembly.fa (the genome is not required)
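For setup step 4, a pre-flight check along the following lines may help before starting the test drive. This is a minimal sketch, assuming the bioapps sit under $HOME/bio/apps as in step 2b. The functions evg_add_apps_to_path and evg_missing_tools are hypothetical helpers, not part of EvidentialGene, and the sketch does not replace the PATH settings inside run_evgsra2genes.sh.

```shell
#!/bin/bash
# Hypothetical helpers (not part of EvidentialGene).

# Prepend each existing bioapp bin directory to PATH.
# Assumption: the ncbi/bin, exonerate/bin, cdhit/bin layout named in the setup steps.
evg_add_apps_to_path() {
  local apps="${1:-$HOME/bio/apps}"
  local d
  for d in ncbi/bin exonerate/bin cdhit/bin; do
    if [ -d "$apps/$d" ]; then
      PATH="$apps/$d:$PATH"
    fi
  done
  export PATH
}

# Print the names of tools not found on PATH, space separated.
# Returns 0 when every tool is found, 1 otherwise.
evg_missing_tools() {
  local missing=""
  local t
  for t in "$@"; do
    command -v "$t" >/dev/null 2>&1 || missing="$missing $t"
  done
  echo "${missing# }"
  [ -z "$missing" ]
}
```

Possible usage, where the tool names are the usual binaries of those packages and are an assumption here (check run_evgsra2genes.sh for what it really calls):
evg_add_apps_to_path; evg_missing_tools blastp makeblastdb exonerate cd-hit-est || echo "install the tools listed above first"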
Test drive steps:
env name=arath16test species=Arabidopsis_thaliana runsteps=start7 ncpu=2 maxmem=8000 datad=`pwd` ./run_evgsra2genes.sh
./run_s7_tr2aacds.arath16test.sh >& log.tr2aa
env name=arath16test species=Arabidopsis_thaliana runsteps=start7 ncpu=2 maxmem=8000 datad=`pwd` ./run_evgsra2genes.sh
./run_s8_evgblastp.arath16test.sh >& log.blp
env name=arath16test species=Arabidopsis_thaliana runsteps=start7 ncpu=2 maxmem=8000 datad=`pwd` ./run_evgsra2genes.sh
./run_s9a_evgtrimvec.arath16test.sh >& log.vecs
env name=arath16test species=Arabidopsis_thaliana runsteps=start7 ncpu=2 maxmem=8000 datad=`pwd` ./run_evgsra2genes.sh
./run_s9b_gmapgenes.arath16test.sh >& log.gmap
env name=arath16test species=Arabidopsis_thaliana runsteps=start7 ncpu=2 maxmem=8000 datad=`pwd` ./run_evgsra2genes.sh
./run_s10_evgpubset.arath16test.sh >& log.pub
env name=arath16test species=Arabidopsis_thaliana runsteps=start7 ncpu=2 maxmem=8000 datad=`pwd` ./run_evgsra2genes.sh
./run_s11_evgpub2submit.arath16test.sh >& log.sub
** run_s11_evgpub2submit fails (missing config) **
touch Fixme.evgpub2submit
./run_s10b_evgclean.arath16test.sh >& clean.log
cd ../
mv sra2genes_start7test sra2genes_finish7test
gtar -X sra2genes_finish7test.xclude -cvf sra2genes_finish7test.tar sra2genes_finish7test
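
The alternating pattern above (rerun the driver, then run the step script it generated) can be sketched as a loop. This is a sketch under assumptions, not part of the distributed scripts: run_testdrive is a hypothetical function, it assumes each call to run_evgsra2genes.sh leaves the next run_sNNN script executable in the current directory, and it assumes a failed step returns a non-zero exit status. It stops after s10_evgpubset, since s11 is noted above as failing, and it leaves the s10b cleanup to be run by hand. Log names here are log.<step>, which differs from the log names used above.

```shell
#!/bin/bash
# Hypothetical wrapper: alternate the driver with each generated step script.
# Arguments: name species [ncpu] [maxmem in megabytes]
run_testdrive() {
  local name="$1"
  local species="$2"
  local ncpu="${3:-2}"
  local maxmem="${4:-8000}"
  local step script
  for step in s7_tr2aacds s8_evgblastp s9a_evgtrimvec s9b_gmapgenes s10_evgpubset; do
    # Rerun the driver so it can write the script for the next step.
    env name="$name" species="$species" runsteps=start7 \
        ncpu="$ncpu" maxmem="$maxmem" datad="$(pwd)" ./run_evgsra2genes.sh || return 1
    script="./run_${step}.${name}.sh"
    if [ ! -x "$script" ]; then
      echo "missing step script: $script" >&2
      return 1
    fi
    # Run the step and keep its output in a per-step log.
    if ! "$script" > "log.${step}" 2>&1; then
      echo "step failed: $step (see log.${step})" >&2
      return 1
    fi
  done
}
```

Possible usage, from inside sra2genes_start7test:
run_testdrive arath16test Arabidopsis_thaliana 2 8000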