
Index of /EvidentialGene/daphnia/daphnia_similoides

      Name                    Last modified       Size

[DIR] Parent Directory        12-Jun-2023 21:42   -
[DIR] evg1dapsim/             30-May-2017 14:31   -


evg1dapsim
----------
Gene assembly for the Daphnia_similoides water flea from RNA-Seq, built with EvidentialGene methods.
This assembly is "reference-free": no chromosome sequences and no genes of other species
were used to assemble the daphnia genes.

RNA source is from NCBI SRA, listed in rnasra_daphsim16huau.csv

Four gene assembly runs were done, each multi-kmer, with velvet/oases and idba_tran;
they are summarized in rna_assemblies.aastat.txt

The 3 million transcripts from those runs were input to tr2aacds.pl of evigene, which
reduced them to 157,459 non-redundant transcripts, comprising 46,000 putative coding gene loci.
The tr2aacds result is the class table evg1dapsim.trclass.gz and the intermediate
classified gene set in okayset/, summarized in evg1dapsim.tr2aacds.info.
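
As a rough way to inspect a class table such as evg1dapsim.trclass.gz, the sketch below
tallies the values in one column of a gzipped, tab-delimited file. It assumes the class
label (for example main or noclass) sits in the third column; that layout is an assumption
and not something stated in this README, so check it against the file and adjust class_col.
The function name count_classes is hypothetical, not part of evigene.

```python
import gzip
from collections import Counter


def count_classes(path, class_col=2):
    """Tally the values found in one column of a gzipped, tab-delimited table.

    class_col is a zero-based column index; the default of 2 (third column)
    is an assumption about the trclass layout, not a documented fact.
    """
    counts = Counter()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Skip blank lines and comment lines.
            if not line.strip() or line.startswith("#"):
                continue
            fields = line.rstrip("\n").split("\t")
            # Ignore rows too short to hold the requested column.
            if len(fields) > class_col:
                counts[fields[class_col]] += 1
    return counts
```

Calling count_classes("evg1dapsim.trclass.gz").most_common() would then list each label
with its transcript count, largest first.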

These are then the inputs for two further steps, a reference protein blast and production of
the public annotated sequence set, using the evigene scripts run_evgaablast.sh and run_evgmrna2tsa.sh

evgmrna2tsa uses the reference protein scores and other coding metrics to reclassify
and remove some redundant or fragment transcripts, resulting in 
31,000 putative loci with alternates (main class), plus 95,000 alternates,
and 7,000 loci without alternates (noclass).  Removed were 23,000 
redundant/fragment/nohomology transcripts.
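
The counts above can be cross-checked with a little arithmetic. The sketch below assumes
that the four categories (main loci, alternates, noclass loci, removed) together partition
the 157,459 non-redundant transcripts from tr2aacds; that reading is an assumption, and the
category counts are the rounded figures quoted here, so only approximate agreement is expected.

```python
def tally_counts():
    """Sum the rounded transcript counts quoted in this README."""
    nonredundant_input = 157_459  # tr2aacds output, exact figure from the text
    main_loci = 31_000            # loci with alternates (main class), rounded
    alternates = 95_000           # alternate transcripts, rounded
    noclass_loci = 7_000          # loci without alternates (noclass), rounded
    removed = 23_000              # redundant/fragment/nohomology, rounded

    kept = main_loci + alternates + noclass_loci
    return {
        "loci": main_loci + noclass_loci,
        "kept_transcripts": kept,
        "accounted": kept + removed,
        # Gap between the exact input and the rounded category sum.
        "unaccounted": nonredundant_input - (kept + removed),
    }


totals = tally_counts()
for name, value in totals.items():
    print(f"{name}: {value:,}")
```

Under that assumption the rounded categories account for 156,000 of the 157,459 transcripts,
a gap of 1,459 that is consistent with rounding to the nearest thousand. The retained set is
133,000 transcripts at 38,000 loci.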

Run information is summarized in runevg.info, with scripts and logs in folder evigene_methods/
The final public sequence set, with names from reference protein blast, is in publicset/
Intermediate transcript assemblies are not provided here.

Developed at the Genome Informatics Lab of Indiana University Biology Department