
Index of /EvidentialGene/plants/arabidopsis/evigene_tr2aacds_test

      Name                                        Last modified       Size

[DIR] Parent Directory                            30-Oct-2021 16:34      -
[   ] arath_TAIR10_20101214up.cdna.gz             15-Apr-2013 14:03  24.2M
[TXT] arath_TAIR10_20101214up.genesum.txt         30-Oct-2021 15:17     1k
[TXT] arath_TAIR10_20101214up.trclass.sum.txt     30-Oct-2021 15:17     1k
[   ] evigene_tr2aacds_test2021.tar.gz            30-Oct-2021 15:46   323M
[TXT] evigene_tr2aacds_test2021.tar.list.txt      30-Oct-2021 20:53     6k
[   ] run_tr2aacds.sh                             30-Oct-2021 15:19     2k


Test case for EvidentialGene_trassembly_pipe (evigene/scripts/prot/tr2aacds.pl)
  http://arthropods.eugenes.org/EvidentialGene/about/EvidentialGene_trassembly_pipe.html

TEST RUN: run the small data set outside of a computer cluster batch system, to see if it works:
  env trset=arath_TAIR10_20101214up.cdna datad=`pwd` ./run_tr2aacds.sh >& log.tr2ac1

NOTE: You need to edit the app paths in run_tr2aacds.sh for your system.
This requires these bio-apps in your Unix PATH: blastn, fastanrdb, cd-hit-est, evigene.
The test runs in about 5 minutes on 2 cores of an ordinary desktop or laptop computer.
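
Before starting the test run, it can help to confirm that the required programs are
found on PATH. The following is a minimal sketch for a POSIX shell; missing_apps is a
hypothetical helper name, not part of EvidentialGene. It checks only the three
executables named above, and leaves out evigene on the assumption that it is a script
directory rather than a single executable.

```shell
# Print the name of each listed program that is not found on PATH.
# missing_apps is a hypothetical helper, not part of EvidentialGene.
missing_apps() {
  for app in "$@"; do
    command -v "$app" >/dev/null 2>&1 || echo "$app"
  done
}

# Programs named in the note above.
missing=$(missing_apps blastn fastanrdb cd-hit-est)
if [ -n "$missing" ]; then
  echo "Not found on PATH:" $missing
else
  echo "All required programs found on PATH"
fi
```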

TEST DATA: Arabidopsis TAIR10 transcripts (headers regularized, though that is probably not needed)
  curl -sR -o arath_TAIR10_20101214up.cdna \
    ftp://ftp.arabidopsis.org/home/tair/Sequences/blast_datasets/TAIR10_blastsets/TAIR10_cdna_20101214_updated
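
After the download, a quick look at the number of FASTA records can catch a failed or
truncated transfer. This is only a sketch; count_fasta is a hypothetical helper name,
and no expected count is given here.

```shell
# Count FASTA records (header lines starting with '>') in a file.
# count_fasta is a hypothetical helper, not part of EvidentialGene.
count_fasta() {
  awk '/^>/ { n++ } END { print n + 0 }' "$1"
}

# File name used by the curl command above.
trset=arath_TAIR10_20101214up.cdna
if [ -f "$trset" ]; then
  echo "$trset: $(count_fasta "$trset") sequences"
fi
```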

OUTPUTS:
  evigene_tr2aacds_test2021.tar.gz contains full output of this tr2aacds.pl run.
  okayset/ contains the validated transcript sequences as mRNA, CDS and protein
    (name.okay.mrna, name.okay.cds, name.okay.aa), with a table of public IDs, original IDs
    and coding statistics in name.pubids, and annotations in name.ann.txt.
  Note that the name.okay sequences contain all alternate transcripts per gene locus. By preference
  all of them should be analyzed further, but the main/longest protein per locus can be extracted
  using the evigene IDs or the name.pubids table.
  A short summary of gene/transcript counts is in name.genesum.txt. Rejected sequences, in
  name.cull.mrna, name.cull.cds and name.cull.aa, are excess transcripts. A summary of the
  tr2aacds classifications is in name.trclass.sum.txt.
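
Extracting the main protein per locus with the name.pubids table could look like the
sketch below. It rests on assumptions that are not stated in this README and should be
checked against the header line of your own name.pubids file: that the table is
tab-separated, that column 1 is the public mRNA ID, that column 4 is the
alternate-transcript number with 1 marking the main transcript, and that header lines
start with '#'. The helper names main_ids and filter_fasta, and the output file names,
are hypothetical.

```shell
# List public IDs of the main transcript per locus from a pubids table.
# ASSUMPTION: tab-separated; column 1 = public mRNA ID,
# column 4 = alternate-transcript number (1 = main); '#' lines are headers.
main_ids() {
  awk -F '\t' '!/^#/ && $4 == 1 { print $1 }' "$1"
}

# Keep only FASTA records whose ID (first word after '>') is in the ID list.
# Usage: filter_fasta ids.txt seqs.fasta
filter_fasta() {
  awk 'FILENAME == ARGV[1] { keep[$1] = 1; next }
       /^>/ { id = substr($1, 2); show = (id in keep) }
       show' "$1" "$2"
}

# "name" is the placeholder used in this README for the data set name.
pubids=okayset/name.pubids
if [ -f "$pubids" ]; then
  main_ids "$pubids" > name.main.ids
  filter_fasta name.main.ids okayset/name.okay.aa > name.main.aa
fi
```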

Please see also the EvidentialGene SRA2Genes pipeline, a full gene reconstruction pipeline
that adds external evidence (e.g. protein homologies, conserved genes, genome assembly),
non-coding gene reconstruction, and publication-quality submission data sets:
  http://arthropods.eugenes.org/EvidentialGene/other/sra2genes_testdrive/

Publications describing EvidentialGene include:
  Gilbert, DG. (2019). Longest protein, longest transcript or most expression, for
   accurate gene reconstruction of transcriptomes?  bioRxiv 829184; https://doi.org/10.1101/829184
  Gilbert, DG. (2018). Genes of the pig, Sus scrofa, reconstructed with EvidentialGene.
   PeerJ; https://doi.org/10.7717/peerj.6374
#-----------------------------------------



Developed at the Genome Informatics Lab of Indiana University Biology Department