# kfish2rae5g_sum.txt
2013-Nov-12++
Killifish, Fundulus heteroclitus genome project
http://arthropods.eugenes.org/EvidentialGene/killifish/project/
Gene families, Fish orthology: Search at killifish/project/ (FISH11G)
TABLE G2a. Fish species average orthology gene groups
             ----Common Groups----   -----All Groups------
Species      cBits  aaSize  orMiss   tBits  orGroup   Tiny
---------    ---------------------   ---------------------
killifish      803      50      18     585    17272   1.1%
maylandia      824      45      76     596    16469   1.1%
tilapia        822       6     223     568    14905   1.9%
platyfish      783     -12     118     549    15305   4.7%
zebrafish      711      -9     366     478    15190   4.8%
sticklebk      763     -42     342     509    14343   7.8%
catfish        725      21     729     470    14276   3.4%
medaka         743     -45     654     478    13541   9.7%
tetraodon      732     -50     658     473    13423   7.9%
spotgar         --    -110    2588     329    10882  20.2%
human           --      27      --     395    12606   1.7%
----------------------------------------------------------
source: kfish2/prot/fish11c/fish11gor3, Dec11 .. fish11gor3 update
cBits = bitscore average for 8656 common fish gene groups
tBits = bitscore average for all ortholog groups
aaSize = average protein size difference from group median
orMiss = ortholog groups missing in this species that are common to the other 8 of 9 fish (spotted gar excluded)
orGroup = number of ortholog gene groups in species
Tiny = percent of species genes that are size outliers, more than 2 s.d. below the group median size
----------------------------------------------------
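The aaSize and Tiny definitions above can be sketched in code. This is only an illustration, not the project's script: the function name `size_stats`, the input layout (one protein length per species per group), and the use of the population standard deviation of each group's sizes are assumptions made here, and the two example groups are invented.

```python
from statistics import mean, median, pstdev

def size_stats(groups, species):
    """Return (aaSize, Tiny%) for one species.

    groups: dict of group id -> dict of species -> protein length in aa
            (hypothetical layout, one representative protein per species).
    aaSize: average difference of the species protein from the group median.
    Tiny:   percent of the species proteins more than 2 s.d. below the median.
    """
    diffs = []
    tiny = 0
    for sizes in groups.values():
        if species not in sizes or len(sizes) < 2:
            continue  # group has no gene for this species, or nothing to compare
        values = list(sizes.values())
        med = median(values)
        sd = pstdev(values)  # assumption: population s.d. over the whole group
        diff = sizes[species] - med
        diffs.append(diff)
        if sd > 0 and diff < -2 * sd:
            tiny += 1
    if not diffs:
        return (0.0, 0.0)
    return (mean(diffs), 100.0 * tiny / len(diffs))

# Invented example: "kf" is a short outlier in G1 and typical in G2.
example = {
    "G1": {"a": 300, "b": 300, "c": 300, "d": 300, "e": 300, "kf": 100},
    "G2": {"a": 200, "b": 210, "c": 190, "d": 200, "e": 205, "kf": 200},
}
aa_size, tiny_pct = size_stats(example, "kf")
print(f"aaSize={aa_size:.0f} Tiny={tiny_pct:.1f}%")
```

A species with many fragment gene models would show a strongly negative aaSize and a high Tiny value, as spotgar does in Table G2a.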
TABLE G2b. Fish gene orthology categories (using OrthoMCL)
              ------------- GENES -------------    ------ GROUPS -----   Ortho to Kfish
              nGene  Orlog Inpara  Uniq1   UDup    OrGrp OrMis1 UniGrp   Shared   Best
              ---------------------------------    -------------------   -------------
killifish     34931  21133   3672   7694   2432    17272     10    682      ---    ---
maylandia     23194  21021    879   1171    122    16469     46     52    15468*  5159
tilapia       21437  19461    975    810    189    14904    135     78    14019    954
platyfish     20366  19483    214    641     29    15307     54     14    14609* 13352
zebrafish     26190  18465   4089   2202   1439    15187    188    226    13286*  1363
stickleback   20787  17954   1254   1396    181    14344    180     44    13317*   636
catfish       43671  17279   2964  15470   7938    14246    407   1561    12799*  1208
medaka        19686  16881    912   1535    361    13542    366    106    12670*  1071
tetraodon     19602  16814    904   1796    176    13425    360     67    12475*   696
spotgar       15734  11841   2655   1009    230    10880   1514     56     9492    116
human         39357  12699   7758  12497   6402    12608    420   2221    11265*  1182
--------------------------------------------------------------------------------------
source: kfish2/prot/fish11c/fish11gor3, Dec11 .. gor3 upd Dec27
nGene = count of input genes, excludes alternate isoforms/locus.
Orlog = Orthologous genes (one-to-one matches among species)
Inpara = Inparalogs (recent ortholog duplicates) of orthologous genes
Uniq1,UDup = single-copy and duplicated species-unique genes
OrGrp,UniGrp = orthologous and species-unique groups
OrMis1 = groups missing in species that all other species have
Ortho to Kfish:
Shared = count of ortho groups shared with killifish,
Best = count of groups where this species has the closest homolog to killifish,
Shared* = killifish is the maximum shared of 10 choices for this species;
tilapia shares more with maylandia, and spotgar with zebrafish.
------------------------------------------------------------------
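A quick arithmetic check on Table G2b: if the four gene categories (Orlog, Inpara, Uniq1, UDup) are meant to partition nGene, their sum should match it. That partition is an assumption made here; the legend does not state it. The sketch below uses the table values as printed, and the function name `category_residuals` is hypothetical.

```python
# Table G2b gene columns: species -> (nGene, Orlog, Inpara, Uniq1, UDup)
G2B = {
    "killifish":   (34931, 21133, 3672, 7694, 2432),
    "maylandia":   (23194, 21021, 879, 1171, 122),
    "tilapia":     (21437, 19461, 975, 810, 189),
    "platyfish":   (20366, 19483, 214, 641, 29),
    "zebrafish":   (26190, 18465, 4089, 2202, 1439),
    "stickleback": (20787, 17954, 1254, 1396, 181),
    "catfish":     (43671, 17279, 2964, 15470, 7938),
    "medaka":      (19686, 16881, 912, 1535, 361),
    "tetraodon":   (19602, 16814, 904, 1796, 176),
    "spotgar":     (15734, 11841, 2655, 1009, 230),
    "human":       (39357, 12699, 7758, 12497, 6402),
}

def category_residuals(table):
    """Return species -> nGene minus the sum of the four gene categories."""
    return {
        species: n_gene - (orlog + inpara + uniq1 + udup)
        for species, (n_gene, orlog, inpara, uniq1, udup) in table.items()
    }

for species, diff in category_residuals(G2B).items():
    print(f"{species:12s} {diff:+d}")
```

Killifish sums exactly; the other species differ from nGene by small amounts that the source does not explain.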
TABLE G2c. Fish Taxonomy with Human gene alignment stats
Human genes Fish Taxonomy
-------------------------------------------------------------------------------
Nhuman Ident% Align% Neopterygii
+ Teleostei
+ + + Euteleostei
+ + + + + + Pseudocrenilabrinae
14822 66 71 :+ + + + + + + + Maylandia zebra # african cichlid Zebra Mbuna, NCBI : KF2
14181 65 70 :+ + + + + + + + Oreochromis niloticus # Tilapia, Ensembl : KF1,2
+ + + + + Smegmamorpha
14893 64 70 :+ + + + + + + Gasterosteus aculeatus # Stickleback, Ensembl? : KF1,2
+ + + + + + Atherinomorpha
14478 64 67 :+ + + + + + + + Oryzias latipes # Medaka, Ensembl? : KF1,2
+ + + + + + + Cyprinodontiformes
15033 64 71 :+ + + + + + + + + + Xiphophorus maculatus # Platyfish, Consortium : KF2
17072 65 76 :+ + + + + + + + + + Fundulus heteroclitus # Killifish, Evigene : KF1,2
+ + + Otocephala
16127 65 70 :+ + + + + + Ictalurus punctatus # Catfish, Evigene : KF2
16871 64 73 :+ + + + + + Danio rerio # Zebrafish, Consortium/Ensembl : KF1,2
+ Semionotiformes
13081 70 54 :+ + Lepisosteus oculatus # Spotted gar, Draft Ensembl : KF2
-------------------------------------------------------------------------------
for N =26859 human genes, nc=16555 common to 7+ fish for align score
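The Table G2c counts can be ranked with a few lines of code. This sketch copies the table values as printed; reading Nhuman as the number of human genes with an alignment to each species' gene set, and expressing it as a share of N=26859, are assumptions made here. The name `rank_by_nhuman` is hypothetical.

```python
N_HUMAN = 26859  # total human genes, from the table footnote

# Table G2c: species -> (Nhuman, Ident%, Align%)
G2C = {
    "Maylandia zebra":        (14822, 66, 71),
    "Oreochromis niloticus":  (14181, 65, 70),
    "Gasterosteus aculeatus": (14893, 64, 70),
    "Oryzias latipes":        (14478, 64, 67),
    "Xiphophorus maculatus":  (15033, 64, 71),
    "Fundulus heteroclitus":  (17072, 65, 76),
    "Ictalurus punctatus":    (16127, 65, 70),
    "Danio rerio":            (16871, 64, 73),
    "Lepisosteus oculatus":   (13081, 70, 54),
}

def rank_by_nhuman(table):
    """Return (species, stats) pairs sorted by Nhuman, highest first."""
    return sorted(table.items(), key=lambda item: item[1][0], reverse=True)

for species, (nhuman, ident, align) in rank_by_nhuman(G2C):
    share = 100.0 * nhuman / N_HUMAN  # assumed interpretation, see lead-in
    print(f"{species:24s} {nhuman:6d} {share:5.1f}%  ident={ident}% align={align}%")
```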
Notes:
-------------
Killifish, Maylandia and Tilapia form a good comparison of gene methods
and results: they are the top-scored gene sets, recently built by 3 groups
with "good" gene construction pipelines. Some artifacts of methods may be
found. Tilapia has Ensembl genewise+exonerate models from a mix of
RNA-seq and UniProt proteins. Maylandia zebra (Mayzebr) is an NCBI Gnomon
annotation, also from a mix of RNA-seq and related proteins. Kfish2 is an
Evigene annotation, strong on mRNA-assembled genes but also using related
species proteins. The other two do use RNA-seq assembly, but not as
extensively or carefully, and rely on mapping to the genome assembly.
Ensembl genewise protein mapping has the potential to add artifacts of
homolog models. NCBI Gnomon now uses RNA-seq more carefully than in the
past, and better than Ensembl, I think.

Killifish and Platyfish form another useful comparison, platyfish being the closest
relative, and also a recent genome product built with current data and software.
Differences that can be highlighted:
1. "The quality of a gene set is dependent on the quality of the genome assembly"
(from the Ensembl platyfish gene build document). This can also be derived from the
methods of the platyfish genome paper (e.g. the methods included discarding mRNA
assemblies that did not map well to the genome assembly).
In contrast, killifish genes v2 are not dependent on the quality of the genome assembly,
merging both mRNA-assembly and genome-mapped methods to pick the best set from both.
2. The human gene orthology stats indicate killifish surpasses platyfish in
completeness of genes (Table G2c: Nhuman 17072 vs 15033, Align% 76 vs 71).

Killifish, Maylandia and Catfish form a third special comparison to other
fish genes. You will find in the Orthology search that these three share
more gene families that are missed in the other fish than any other 3-fish
comparison, by about 100 families. This is, I think, an effect of (a) mRNA
assembly independent of genome gene models, used for Killifish and Catfish,
and (b) for Maylandia, NCBI having improved its use of mRNA evidence enough
to be roughly equivalent in discovering genes that may be poorly modelled on
the genome assembly.
Find families shared by just these 3 fish:
http://arthropods.eugenes.org/lucegene_arthropod/search?q=fish11xml-all:geneid+AND+Killifish:[1+TO+999]+AND+Catfish:[1+TO+999]+AND+Maylandia:[1+TO+999]+AND+Medaka:0+AND+Stickleback:0+AND+Tetraodon:0+AND+Tilapia:0+AND+Zebrafish:0+AND+Platyfish:0
One of these is http://arthropods.eugenes.org/genepage/fish11xml/FISH11G_G18567
FISH11G_G18567 : new D-tyrosyl-tRNA(Tyr) deacylase, one of 3 same-named families,
in killifish, catfish and maylandia only; maylandia: XP_004554729.1/LOC101478506, 1035 aa
FISH11G_G1773 : D-tyrosyl-tRNA deacylase member 2, in all but killifish, with various numbers of genes (Tetraodon has 10);
maylandia: XP_004554728.1/LOC101478225, 168 aa
G1773 and G18567 are related in that the shorter G1773 protein aligns to the longer G18567;
both have the same CDD:202294 domain. In maylandia, these are tandem genes.
In killifish, the missing shorter one would be where a genome gap exists (the mRNA assembly
may or may not have a partial version).
FISH11G_G5675 : D-tyrosyl-tRNA deacylase member 1, one gene in all 11 species
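
The three-fish family search URL given in the notes can be generalized to other species combinations. The sketch below rebuilds that URL by plain string joining, copying the field names and the `[1+TO+999]` range from the URL as written; the function name `family_search_url` is hypothetical, and whether the server accepts other combinations or field names is an assumption.

```python
BASE = "http://arthropods.eugenes.org/lucegene_arthropod/search"

def family_search_url(present, absent):
    """Build a search URL for families present in some species and absent in others.

    present: species fields that must have 1 to 999 genes in the family.
    absent:  species fields that must have 0 genes in the family.
    """
    terms = ["fish11xml-all:geneid"]
    terms += [f"{species}:[1+TO+999]" for species in present]
    terms += [f"{species}:0" for species in absent]
    # '+' stands for a space in the query string, as in the URL from the notes
    return BASE + "?q=" + "+AND+".join(terms)

url = family_search_url(
    present=["Killifish", "Catfish", "Maylandia"],
    absent=["Medaka", "Stickleback", "Tetraodon", "Tilapia", "Zebrafish", "Platyfish"],
)
print(url)
```

Swapping species between the two lists gives the query for any other 3-fish comparison.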