Index of /EvidentialGene/daphnia/daphnia_magna/Genes/gal_mag_plx
Name                                  Last modified      Size
daph3cdsaxt.align.tab                 22-Sep-2016 14:27  3.0M
daph3spp.aastat                       22-Sep-2016 14:29  1k
daph3sppkaks.readme.txt               22-Sep-2016 14:40  3k
daphmag2daphgal.cds.allkaks.tab       22-Sep-2016 13:07  1.1M
daphmag2daphplx10g.cds.allkaks.tab    22-Sep-2016 13:07  1.3M
Test of coding sequence conservation among 3 Daphnia species (magna, galeata and pulex)
Gene sets:
D. magna, evigene evg7 hybrid rna+dna, Gilbert D and colls, doi:10.1038/sdata.2016.30
D. galeata, evigene rna assembled, Cordellier M and colls, doi:10.1093/gbe/evw221
D. pulex, evigene dna predicted, 2010 beta3, Gilbert D and colls,
The primary (longest) transcript coding sequence is used from each of the ~30,000 loci per species.
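Choosing the primary transcript comes down to keeping the longest CDS per locus. Here is a minimal Python sketch. It assumes, hypothetically, that a transcript ID is its locus ID plus a trailing `t<N>` number, as in Dapma7bEVm030749t1. `locus_of` and `primary_transcripts` are illustrative names, not evigene functions.

```python
import re

# Assumption: transcript IDs end in "t<N>" (e.g. Dapma7bEVm030749t1), so
# stripping that suffix gives the locus ID. Illustrative only.
_TRANSCRIPT_SUFFIX = re.compile(r"t\d+$")


def locus_of(transcript_id: str) -> str:
    """Return the locus ID by removing a trailing 't<N>' transcript number."""
    return _TRANSCRIPT_SUFFIX.sub("", transcript_id)


def primary_transcripts(cds: dict[str, str]) -> dict[str, str]:
    """Map each locus to the ID of its longest CDS (first one seen wins ties)."""
    best: dict[str, str] = {}
    for tid, seq in cds.items():
        locus = locus_of(tid)
        if locus not in best or len(seq) > len(cds[best[locus]]):
            best[locus] = tid
    return best


if __name__ == "__main__":
    # Toy sequences; the t2 alternate is made up for illustration.
    demo = {
        "Dapma7bEVm030749t1": "ATG" * 360,
        "Dapma7bEVm030749t2": "ATG" * 300,
        "Dapma7bEVm030751t1": "ATG" * 125,
    }
    print(primary_transcripts(demo))
```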
Methods for the coding-conservation statistics, done simply:
1. blastn of primary-transcript coding sequences, D. magna x (D. pulex, D. galeata):
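# $nbin: directory holding the NCBI BLAST+ programs; pt: prefix for output files.
# -db needs a nucleotide BLAST database, e.g. built with: makeblastdb -in <cds fasta> -dbtype nucl
pt=daphmag2daphplx10gb.cds
$nbin/blastn \
-evalue 1e-9 -task dc-megablast -template_type coding -template_length 18 \
-db daphplx/daphnia_genes2010_beta3.cds -query daphmag/dmagset7fin.cds \
-out $pt.dcblast1n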
2. Extract the aligned sequences to an axt alignment file with evigene/scripts/prot/blastcds2axt.pl:
cat $pt.dcblast1n | env skipalt=1 $evigene/scripts/prot/blastcds2axt.pl > $pt.dcblast1n.axt
3. Run the axt file through KaKs_Calculator2.0 (MYN method) to tabulate per-gene scores:
bio/KaKs_Calculator2.0/KaKs_Calculator -m MYN -i $pt.dcblast1n.axt -o $pt.kaks >& $pt.klog
4. Pick out genes with significant (p <= 0.05) Ka/Ks.
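Step 4 amounts to a P-value filter over the KaKs table rows. Here is a minimal Python sketch. The column order is an assumption inferred from the sample rows shown later: gene ID, relative, Ka, Ks, Ka/Ks, P-value, length, S-sites, N-sites. In those rows Ka/Ks equals Ka divided by Ks, and S-sites plus N-sites equals the length. `significant_rows` is a hypothetical helper.

```python
from typing import Iterable, Iterator

# Assumed column order of the trimmed KaKs rows (inferred, not documented):
# gene_id, relative, Ka, Ks, Ka/Ks, P-value, length, S-sites, N-sites
P_COL = 5


def significant_rows(lines: Iterable[str], max_p: float = 0.05) -> Iterator[list[str]]:
    """Yield split rows whose P-value is <= max_p, skipping '#' lines and non-numbers."""
    for line in lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        fields = line.split()
        try:
            p_value = float(fields[P_COL])
        except (IndexError, ValueError):  # short row, or a non-numeric value such as "NA"
            continue
        if p_value <= max_p:
            yield fields


if __name__ == "__main__":
    rows = [
        "Dapma7bEVm030749t1 Dgal 0.0173377 1.52132 0.0113965 7.13679e-86 1080 262.371 817.629",
        "madeUpGene1t1 Dgal 0.2 0.3 0.666667 0.4 300 70 230",  # hypothetical, not significant
        "#total nt=16320, ave(ka,ks,kaks,ka/ks,tlen): 0.1058 1.4127 1.7454 0.0749 1271",
    ]
    for fields in significant_rows(rows):
        print(fields[0], "Ka/Ks =", fields[4])
```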
Result files:
daph3cdsaxt.align.tab = pairwise alignment headers from the axt files, with gene IDs, sizes, %align and spans;
use this table for gene IDs.
daphmag2daphgal.cds.allkaks.tab = KaKs_Calculator output table for Daphmag x Daphgal, all gene alignments,
trimmed to the useful columns, including the P-value (significance) column. Lacks Dgal gene IDs.
daphmag2daphplx10g.cds.allkaks.tab = the same output table for Daphmag x Daphplx
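The axt file fed to KaKs_Calculator uses a simplified AXT layout, as we understand its input format. Each record is a name line, the two aligned sequences, then a blank line. Here is a minimal Python reader for that assumed layout. It comes with a percent-identity helper, which computes one simple per-alignment statistic. That statistic is not necessarily what %align in daph3cdsaxt.align.tab measures. All names are hypothetical.

```python
from typing import Iterator, NamedTuple


class AxtRecord(NamedTuple):
    name: str
    seq1: str  # first aligned sequence (gapped)
    seq2: str  # second aligned sequence (gapped)


def read_axt(text: str) -> Iterator[AxtRecord]:
    """Parse 'name / seq1 / seq2' records separated by one or more blank lines."""
    block: list[str] = []
    for raw in text.splitlines() + [""]:  # trailing "" flushes the last record
        line = raw.strip()
        if line:
            block.append(line)
        elif block:
            if len(block) != 3:
                raise ValueError(f"expected 3 lines per AXT record, got {len(block)}")
            yield AxtRecord(*block)
            block = []


def percent_identity(rec: AxtRecord) -> float:
    """Percent of gap-free alignment columns where the two bases agree."""
    pairs = [(a, b) for a, b in zip(rec.seq1.upper(), rec.seq2.upper())
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    demo = "madeUpGene1t1\nATGAAACCC\nATGAAGCCC\n\nmadeUpGene2t1\nATG---\nATGCCC\n"
    for rec in read_axt(demo):
        print(rec.name, round(percent_identity(rec), 1))
```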
Don Gilbert, 2016.Sep.22
----
Each species has about 30,000 primary transcripts. About 20,000 of them align as CDS with blastn this way,
and about 18,000 of those have a significant Ka/Ks with one or both relatives, averaging 0.07 (mean Ka 0.10,
mean Ks 1.40) over aligned coding spans of about 1200 bases. Of the 18,000 significant Ka/Ks values, 1% indicate
positive selection (Ka/Ks > 1) and 99% indicate coding conservation (Ka/Ks << 1). Keep in mind that the
dc-megablast blastn of CDS above is less sensitive than a blastp alignment of proteins, so some alignments are missing.
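The split into positive selection and coding conservation is a threshold test on each significant Ka/Ks. Here is a minimal Python sketch of it with hypothetical function names. It buckets each ratio as > 1 or < 1 and reports the share per bucket. The example input mixes the two sample Dgal ratios with made-up values, so its shares are not the 1% / 99% reported above.

```python
from collections import Counter
from typing import Iterable


def classify_kaks(ratio: float) -> str:
    """Bucket one significant Ka/Ks ratio using the thresholds in the text."""
    if ratio > 1.0:
        return "positive selection"
    if ratio < 1.0:
        return "coding conservation"
    return "neutral"  # exactly 1.0


def selection_fractions(ratios: Iterable[float]) -> dict[str, float]:
    """Return the share of ratios falling in each bucket (empty input gives {})."""
    counts = Counter(classify_kaks(r) for r in ratios)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


if __name__ == "__main__":
    # Ka/Ks of the two sample Dgal rows plus two made-up values.
    print(selection_fractions([0.0113965, 0.206346, 1.8, 0.05]))
```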
Total CDS alignments from blastn, regardless of Ka/Ks:
-- top aligns to Dmagna only (mina=0.95) --
         Naln   Lenaln  %aln  NDmag  Len per NDmag  %aln per NDmag
Dgal     8949   1404.8  82.2  20460  614.4          35.9
Dplx1g   11276  1315.2  80.8  20460  724.8          44.5
-- all aligns per source (mina=0.01) --
         Naln   Lenaln  %aln  NDmag  Len per NDmag  %aln per NDmag
Dgal     17615  1215.5  71    20460  1046.5         61.1
Dplx1g   19079  1214    72.6  20460  1132           67.7
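The two "per NDmag" columns appear to spread the per-alignment means over all D. magna genes, that is, value * Naln / NDmag. This relation is inferred from the numbers and not stated in the source. For example, 8949 * 1404.8 / 20460 ≈ 614.5 against the listed 614.4. Here is a small Python check with a hypothetical `per_ndmag` helper. It reproduces every row of both tables to within 0.1, the rounding of the printed values.

```python
# Inferred relation (not stated in the source): the "per NDmag" columns equal
# Lenaln * Naln / NDmag and %aln * Naln / NDmag.


def per_ndmag(n_aln: int, len_aln: float, pct_aln: float, n_dmag: int) -> tuple[float, float]:
    """Return (Len per NDmag, %aln per NDmag) from the per-alignment means."""
    share = n_aln / n_dmag  # fraction of D. magna genes with an alignment
    return len_aln * share, pct_aln * share


TABLE = {  # (Naln, Lenaln, %aln) copied from the tables above; NDmag = 20460
    "Dgal   mina=0.95": (8949, 1404.8, 82.2),
    "Dplx1g mina=0.95": (11276, 1315.2, 80.8),
    "Dgal   mina=0.01": (17615, 1215.5, 71.0),
    "Dplx1g mina=0.01": (19079, 1214.0, 72.6),
}

if __name__ == "__main__":
    for label, (n, length, pct) in TABLE.items():
        len_per, pct_per = per_ndmag(n, length, pct, 20460)
        print(f"{label}: Len per NDmag {len_per:.1f}, %aln per NDmag {pct_per:.1f}")
```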
------
Summary of the significant-Ka/Ks tables; allkaks.tab and sigkaks.tab have the same format as these.
==> daphmag2daphgal.cds.sigkaks.tab <==
Dapma7bEVm030749t1 Dgal 0.0173377 1.52132 0.0113965 7.13679e-86 1080 262.371 817.629
Dapma7bEVm030751t1 Dgal 0.136382 0.660938 0.206346 4.2627e-10 375 113.781 261.219
#total nt=16320, ave(ka,ks,kaks,ka/ks,tlen): 0.1058 1.4127 1.7454 0.0749 1271
==> daphmag2daphplx10g.cds.sigkaks.tab <==
Dapma7bEVm030749t1 Dplx1g 0.0203442 1.25545 0.0162048 5.00237e-73 1050 252.436 797.564
Dapma7bEVm030751t1 Dplx1g 0.111516 0.835212 0.133518 9.83288e-10 369 104.371 264.629
#total nt=17660, ave(ka,ks,kaks,ka/ks,tlen): 0.1046 1.3797 6.0293 0.0758 1266
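Each summary line reports two averages of Ka/Ks. The ka/ks value equals mean Ka divided by mean Ks: 0.1058 / 1.4127 ≈ 0.0749 and 0.1046 / 1.3797 ≈ 0.0758. This matches the 0.07 quoted above. The much larger kaks value (1.7454, 6.0293) is presumably the mean of per-gene ratios, which a few genes with Ks near zero can dominate. That reading is an inference from the numbers, not stated in the source. Here is a small Python sketch of the difference on made-up data. `kaks_averages` is a hypothetical name.

```python
from statistics import fmean


def kaks_averages(ka: list[float], ks: list[float]) -> tuple[float, float]:
    """Return (mean of per-gene Ka/Ks, mean Ka divided by mean Ks)."""
    mean_of_ratios = fmean([a / s for a, s in zip(ka, ks)])
    ratio_of_means = fmean(ka) / fmean(ks)
    return mean_of_ratios, ratio_of_means


if __name__ == "__main__":
    # The reported "ka/ks" matches ave(ka) / ave(ks) from the summary lines:
    print(round(0.1058 / 1.4127, 4), round(0.1046 / 1.3797, 4))  # 0.0749 0.0758
    # Made-up genes: one very small Ks inflates only the mean of ratios.
    print(kaks_averages([0.10, 0.10, 0.05], [1.40, 1.50, 0.01]))
```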
------