Evigene Methods for Orthology Assessment
for daph10_omcl gene clustering of Daphnia, Insects and Fishes (+human)
Method Steps:
1. Collect gene protein sequences, primary isoform only, for 10 species, in daph10omcl.aa.
Each header has a species tag:ID prefix. Extract gene names from the aa headers for use below.
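A minimal sketch of the header parsing in step 1, not the Evigene code itself. It assumes headers look like `>tag:ID free text`, with the text after the first space taken as the gene name. That layout, and the names `GeneHeader`, `parse_header`, and `read_gene_names`, are illustrative assumptions.

```python
from typing import NamedTuple


class GeneHeader(NamedTuple):
    species: str   # species tag before the first ':'
    gene_id: str   # gene/protein ID after the ':'
    name: str      # remaining header text, assumed to hold the gene name


def parse_header(line: str) -> GeneHeader:
    """Split an assumed '>tag:ID free text' header into its three parts."""
    first, _, rest = line.lstrip(">").strip().partition(" ")
    species, sep, gene_id = first.partition(":")
    if not sep:
        raise ValueError(f"no species tag in header: {line!r}")
    return GeneHeader(species, gene_id, rest.strip())


def read_gene_names(fasta_lines):
    """Yield one GeneHeader per FASTA record, ignoring sequence lines."""
    for line in fasta_lines:
        if line.startswith(">"):
            yield parse_header(line)
```

Usage: `with open("daph10omcl.aa") as fh: names = list(read_gene_names(fh))` would collect one record per protein.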
2. blastp -db daph10omcl.aa -query daph10omcl.aa -evalue 1e-5 -outfmt 7 -out daph10omcl.aa.blastp
3. Tabulate daph10omcl.aa.blastp and run OrthoMCL (Evigene repackage), which uses MCL Markov clustering,
with the run_evgomcl.sh script.
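As an illustration of what tabulating the blastp output can involve, here is a hedged sketch that reads `-outfmt 7` rows (tab-separated, with `#` comment lines) and keeps the highest-bitscore row per query/subject pair. It assumes the default 12-column layout, with evalue in column 11 and bitscore in column 12. The function name `best_hits` is hypothetical, and this is not the tabulation that run_evgomcl.sh performs.

```python
def best_hits(blast_lines):
    """Return {(query, subject): (evalue, bitscore)}, keeping the top bitscore per pair."""
    best = {}
    for line in blast_lines:
        line = line.rstrip("\n")
        # -outfmt 7 interleaves '#' comment lines with the tabular hit rows.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        query, subject = cols[0], cols[1]
        evalue, bitscore = float(cols[10]), float(cols[11])
        key = (query, subject)
        if key not in best or bitscore > best[key][1]:
            best[key] = (evalue, bitscore)
    return best
```

Collapsing to one row per pair is only one simple choice; a real OrthoMCL input table may instead combine several HSPs per pair.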
4. Outputs: daph10omcla.bpo and daph10omcla.gg are intermediate inputs to orthomcl_evg.pl;
all_orthomcl.out and the tmp/ files are the cluster outputs.
5. Tabulate orthomcl.out into summary tables and annotated gene groups, using the input gene.names (from aa headers),
with the run_evgomcltabn.sh script.
6. summary and gene group outputs:
intermediate tables are daph10omcla_omclgn.tab (OMCL group per gene),
daph10omcla_omclgns2.tab (pairwise genes per omcl group),
daph10omcla_omclgn2sum.tab (shorter summary of daph10omcla_omclgns2.tab).
daph10omcla-orthomcl-count.tab : table of species presence/gene count per group
daph10omcla-orthomcl-gclass.tab, gcommon.tab : brief summaries per species of group types (orlog, parlog, ...)
daph10omcla-orthomcl-1to1n.tab : table of 1:1 ortholog species presence (from count.tab)
daph10omcla-orsinpar-count.tab and or1inpar-count.tab : species matrices of ortholog counts shared by species pairs,
where orsinpar = all orthologs and or1inpar = closest ortholog (orlog1) only. The matrix diagonal is the paralog count
per species.
daph10omcla_dpx17evg.orpar.tab, daph10omcla_dapmaevg14.orpar.tab : per-species tables of ortholog and paralog
pairings, drawn from daph10omcla_omclgns2.tab, which covers all species
daph10omcla_genes.ugp.txt, daph10omcla_genes.ugp.xml : annotated gene groups, with consensus group name,
counts, and per-species genes listed
daph10omcla_genes.ugp_brief.txt : derived from ugp.txt by stripping the long list of species genes (one line per group)
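The count, 1:1, and species-pair tables in step 6 can be pictured with the sketch below. It is an illustration under stated assumptions, not the Evigene tabulation. It assumes the input is a list of (group ID, `tag:ID` gene) rows, and it counts, for species a and b, the genes of a that sit in a group also containing b. The diagonal counts genes with a same-species co-member, that is, paralogs. All function names are hypothetical, and the real tables may count differently.

```python
from collections import Counter, defaultdict


def group_counts(gene_groups):
    """Map each group ID to a Counter of genes per species tag."""
    counts = defaultdict(Counter)
    for group, gene in gene_groups:
        species = gene.split(":", 1)[0]  # species tag precedes the first ':'
        counts[group][species] += 1
    return counts


def one_to_one_groups(counts, min_species=2):
    """Groups where each species present contributes exactly one gene."""
    return [
        g for g, c in counts.items()
        if len(c) >= min_species and all(n == 1 for n in c.values())
    ]


def species_pair_matrix(counts):
    """matrix[(a, b)]: genes of a grouped with b; matrix[(a, a)]: genes of a with paralogs."""
    matrix = Counter()
    for c in counts.values():
        for a, n_a in c.items():
            for b in c:
                if a != b:
                    matrix[(a, b)] += n_a
                elif n_a > 1:
                    # Same-species co-members in one group are treated as paralogs.
                    matrix[(a, a)] += n_a
    return matrix
```

With this definition the matrix need not be symmetric, since a species with two genes in a group contributes two to its row against each partner species.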