
Index of /EvidentialGene/daphnia/daphnia_magna/Genes/gal_mag_plx

      Name                                 Last modified       Size

      daph3cdsaxt.align.tab                22-Sep-2016 14:27   3.0M
      daph3spp.aastat                      22-Sep-2016 14:29   1k
      daph3sppkaks.readme.txt              22-Sep-2016 14:40   3k
      daphmag2daphgal.cds.allkaks.tab      22-Sep-2016 13:07   1.1M
      daphmag2daphplx10g.cds.allkaks.tab   22-Sep-2016 13:07   1.3M


Test of coding sequence conservation among 3 Daphnia species (magna, galeata and pulex)

Gene sets:  
  D. magna, evigene evg7 hybrid rna+dna, Gilbert D and colleagues, doi:10.1038/sdata.2016.30
  D. galeata, evigene rna assembled, Cordellier M and colleagues, doi:10.1093/gbe/evw221
  D. pulex, evigene dna predicted, 2010 beta3, Gilbert D and colleagues
Primary (longest) transcript coding sequence is used from each of ~30,000 loci/species.

Methods for the coding conservation statistics, kept simple:
  1. blastn of primary-transcript coding sequences, D. magna x (D. pulex, D. galeata)
      pt=daphmag2daphplx10gb.cds
      $nbin/blastn  \
        -evalue 1e-9 -task dc-megablast -template_type coding -template_length 18 \
        -db daphplx/daphnia_genes2010_beta3.cds -query daphmag/dmagset7fin.cds \
        -out $pt.dcblast1n 

  2. extract aligned sequences to an axt alignment file with evigene/scripts/prot/blastcds2axt.pl
     cat $pt.dcblast1n | env skipalt=1 $evigene/scripts/prot/blastcds2axt.pl > $pt.dcblast1n.axt

  3. run thru KaKs_Calculator2.0 to tabulate gene scores
     bio/KaKs_Calculator2.0/KaKs_Calculator -m MYN  -i $pt.dcblast1n.axt -o $pt.kaks >& $pt.klog 

  4. pick out genes with significant (p <= 0.05) Ka/Ks
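
Step 4 can be sketched as a short awk filter. This is a minimal sketch and not the script used to build the tables here: the helper name sig_kaks is hypothetical, and it assumes the trimmed, tab-separated layout seen in the sigkaks.tab sample rows of this readme, with the P-value in column 6.

```shell
# Hypothetical helper (not part of evigene): keep rows of a trimmed
# KaKs table whose P-value is at or below a cutoff (default 0.05).
# Assumption: tab-separated rows with the P-value in column 6.
sig_kaks() {
  awk -F'\t' -v pmax="${1:-0.05}" '
    /^#/ { next }                                  # skip comment and summary lines
    $6 !~ /^[0-9.]+([eE][-+]?[0-9]+)?$/ { next }   # skip header rows and NA values
    ($6 + 0) <= (pmax + 0) { print }               # keep significant rows unchanged
  '
}
```

Possible usage, under the same assumptions:
  sig_kaks 0.05 < daphmag2daphgal.cds.allkaks.tab > daphmag2daphgal.cds.sigkaks.tab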

Result files:
  daph3cdsaxt.align.tab = pairwise alignment header from axt, with gene IDs, sizes, %align and spans
    use this for gene IDs.
  daphmag2daphgal.cds.allkaks.tab = output table for Daphmag x Daphgal of  KaKs_Calc, all gene aligns,
     trimmed to useful columns, with P-value column of significance. Lacks Dgal gene IDs.
  daphmag2daphplx10g.cds.allkaks.tab = output table ditto for Daphmag x Daphplx

Don Gilbert, 2016.Sep.22
----

Each species has about 30,000 primary transcripts. About 20,000 of these cds-align with blastn this way,
and about 18,000 of those have significant Ka/Ks with 1 or 2 relatives, averaging 0.07 (0.10 Ka, 1.40 Ks)
over aligned coding spans of about 1200 bases.  Of the 18,000 with significant Ka/Ks, 1% indicate positive
selection (Ka/Ks > 1) and 99% indicate coding conservation (Ka/Ks << 1).  Keep in mind that the
dc-megablast blastn of cds above is not as sensitive as a blastp alignment of proteins, so some
alignments are missing.
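
The split between positive selection and conservation can be tallied from a sigkaks.tab file with a sketch like the following. The helper name kaks_classes is hypothetical, and the code assumes tab-separated rows with Ka/Ks in column 5, as in the sample rows of this readme. Rows with Ka/Ks of exactly 1 are counted with the conserved class here, which is a simplification.

```shell
# Hypothetical helper: tally rows of a trimmed KaKs table by Ka/Ks class.
# Assumption: tab-separated rows with Ka/Ks in column 5.
kaks_classes() {
  awk -F'\t' '
    /^#/ || NF < 5 { next }                        # skip summary lines and short rows
    $5 !~ /^[0-9.]+([eE][-+]?[0-9]+)?$/ { next }   # skip header rows and NA values
    { n++; if ($5 + 0 > 1) pos++ }                 # Ka/Ks > 1 counts as positive selection
    END {
      if (n) printf "n=%d positive=%d (%.1f%%) conserved=%d (%.1f%%)\n",
                    n, pos, 100 * pos / n, n - pos, 100 * (n - pos) / n
    }
  '
}
```

Possible usage:
  kaks_classes < daphmag2daphgal.cds.sigkaks.tab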

total cds-align of blastn, regardless of kaks 
-- top aligns to Dmagna only (mina=0.95) --
        Naln    Lenaln  %aln    NDmag   Len     %aln per NDmag
Dgal    8949    1404.8  82.2    20460   614.4   35.9
Dplx1g  11276   1315.2  80.8    20460   724.8   44.5

-- all aligns per source (mina=0.01) --
        Naln    Lenaln  %aln    NDmag   Len     %aln per NDmag
Dgal    17615   1215.5  71      20460   1046.5  61.1
Dplx1g  19079   1214    72.6    20460   1132    67.7
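
The last two columns of both tables are consistent with the per-alignment averages rescaled over all 20460 D. magna genes, that is Len = Naln x Lenaln / NDmag, and likewise for %aln. This reading is inferred from the numbers and is not stated by the table itself. A minimal sketch to recompute the columns, with a hypothetical helper name:

```shell
# Hypothetical check: rescale per-alignment averages to per-NDmag averages.
# Input columns (whitespace separated): name Naln Lenaln %aln NDmag
per_ndmag() {
  awk 'NF >= 5 && ($5 + 0) > 0 {
    scale = $2 / $5                      # fraction of D. magna genes with an alignment
    printf "%s\t%.1f\t%.1f\n", $1, $3 * scale, $4 * scale
  }'
}
```

Possible usage, which should give values close to the Len and "%aln per NDmag" entries of the first Dgal row:
  printf 'Dgal 8949 1404.8 82.2 20460\n' | per_ndmag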
------

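Summary of significant Ka/Ks tables.  allkaks.tab and sigkaks.tab share the format shown here.
Columns, as they appear from the KaKs_Calculator output order: D. magna gene ID, target species,
Ka, Ks, Ka/Ks, P-value, aligned length, S-sites, N-sites.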

==> daphmag2daphgal.cds.sigkaks.tab <==
Dapma7bEVm030749t1	Dgal	0.0173377	1.52132	0.0113965	7.13679e-86	1080	262.371	817.629
Dapma7bEVm030751t1	Dgal	0.136382	0.660938	0.206346	4.2627e-10	375	113.781	261.219
#total nt=16320, ave(ka,ks,kaks,ka/ks,tlen): 0.1058 1.4127 1.7454 0.0749 1271

==> daphmag2daphplx10g.cds.sigkaks.tab <==
Dapma7bEVm030749t1	Dplx1g	0.0203442	1.25545	0.0162048	5.00237e-73	1050	252.436	797.564
Dapma7bEVm030751t1	Dplx1g	0.111516	0.835212	0.133518	9.83288e-10	369	104.371	264.629
#total nt=17660, ave(ka,ks,kaks,ka/ks,tlen): 0.1046 1.3797 6.0293 0.0758 1266
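
In the "#total" summary lines, the fourth value matches the ratio of the two averages (0.1058 / 1.4127 = 0.0749 for Dgal), while the third looks like the mean of the per-gene Ka/Ks ratios, which a few genes with high ratios can inflate. That reading is an assumption drawn from the numbers. A minimal sketch that computes both kinds of average from a table in this format, with a hypothetical helper name:

```shell
# Hypothetical helper: summarize a trimmed KaKs table.
# Assumption: tab-separated rows with Ka, Ks, Ka/Ks in columns 3-5
# and aligned length in column 7.
kaks_summary() {
  awk -F'\t' '
    /^#/ || NF < 7 { next }                        # skip summary lines and short rows
    $5 !~ /^[0-9.]+([eE][-+]?[0-9]+)?$/ { next }   # skip header rows and NA values
    { n++; ka += $3; ks += $4; ratio += $5; tlen += $7 }
    END {
      if (n && ks > 0)
        # mean Ka, mean Ks, mean of per-gene Ka/Ks, ratio of means, mean length
        printf "n=%d ave(ka,ks,kaks,ka/ks,tlen): %.4f %.4f %.4f %.4f %.0f\n",
               n, ka / n, ks / n, ratio / n, ka / ks, tlen / n
    }
  '
}
```

Possible usage:
  kaks_summary < daphmag2daphgal.cds.sigkaks.tab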

------


Developed at the Genome Informatics Lab of Indiana University Biology Department