
Search Fish Orthology Gene Groups : Killifish genome project

Search Fish gene families
FISH12: (2014.03) .. FISH11: (2013.12) .. FISH8: (2012.08)

Searches can be limited to genes in these taxa or species:
Killifish, Maylandia, Platyfish, Medaka, Stickleback, Catfish,
Tetraodon, Tilapia, Zebrafish, Spottedgar, Human

      Name                       Last modified       Size

[DIR] Parent Directory           10-Jan-2014 14:45     -
[DIR] fish11_blastp/             05-Aug-2015 12:33     -
[DIR] fish11_omcl/               04-Jan-2014 23:16     -
[TXT] fish11_orthology_sum.txt   08-Feb-2014 23:03    8k
[DIR] fish11_proteins/           04-Jan-2014 16:11     -
[DIR] fish12_omcl/               23-Jan-2015 14:33     -
[DIR] fish12_proteins/           23-Mar-2014 20:47     -
[TXT] orthology_search.html      05-Jan-2014 21:53    9k
[TXT] orthology_search12.html    21-Mar-2014 16:31    9k
[DIR] summary/                   05-Jan-2014 21:00     -

# kfish2rae5g_sum.txt
2013-Nov-12++

Killifish, Fundulus heteroclitus genome project
http://arthropods.eugenes.org/EvidentialGene/killifish/project/
Gene families, Fish orthology: search at killifish/project/ (FISH11G)
    
TABLE G2a.  Fish species average orthology gene groups 
           ----Common Groups----   -----All Groups------
Species    cBits  aaSize  orMiss   tBits  orGroup   Tiny
---------  ---------------------   ---------------------
killifish    803      50      18     585    17272   1.1%
maylandia    824      45      76     596    16469   1.1%
tilapia      822       6     223     568    14905   1.9%
platyfish    783     -12     118     549    15305   4.7%
zebrafish    711      -9     366     478    15190   4.8%
sticklebk    763     -42     342     509    14343   7.8%
catfish      725      21     729     470    14276   3.4%
medaka       743     -45     654     478    13541   9.7%
tetraodon    732     -50     658     473    13423   7.9%
spotgar       --    -110    2588     329    10882  20.2%
human         --      27      --     395    12606   1.7%
----------------------------------------------------
  source: kfish2/prot/fish11c/fish11gor3, Dec11 .. fish11gor3 update
  cBits = bitscore average for 8656 common fish gene groups
  tBits = bitscore average for all ortholog groups
  aaSize = average protein size difference from group median
  orMiss = ortholog groups missing in the species that are common to the other 8 of 9 fish (gar excluded)
  orGroup = number of ortholog gene groups in species
  Tiny  = percent of species genes that are size outliers, more than 2 sd below the group median size
----------------------------------------------------
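
The aaSize and Tiny columns can be illustrated with a minimal Python sketch.
It assumes one protein length per species per group, reads "below 2sd" as
more than 2 standard deviations (population sd of the group sizes) under the
group median, and uses hypothetical names (size_stats, toy_groups). It is an
illustration of the definitions above, not the project's own script.

```python
from statistics import median, pstdev

def size_stats(groups, species):
    """Return (aaSize, Tiny percent) for one species, or None if no data.

    groups: dict of group id -> dict of species -> protein length (aa).
    Assumption: one representative protein per species per group.
    """
    diffs = []
    tiny = 0
    for sizes in groups.values():
        if species not in sizes or len(sizes) < 2:
            continue  # species absent, or nothing to compare against
        lengths = list(sizes.values())
        diff = sizes[species] - median(lengths)
        diffs.append(diff)
        # Assumed reading of "below 2sd": more than 2 sd under the median.
        if diff < -2 * pstdev(lengths):
            tiny += 1
    if not diffs:
        return None
    return sum(diffs) / len(diffs), 100.0 * tiny / len(diffs)

# Toy data for illustration only, not project values.
toy_groups = {
    "G1": {"a": 100, "b": 100, "c": 100, "d": 100, "e": 100, "f": 100,
           "g": 100, "h": 100, "i": 100, "j": 100, "k": 10},
    "G2": {"a": 200, "b": 210, "k": 220},
}
aa_size, tiny_pct = size_stats(toy_groups, "k")
```

In the toy data, species "k" has one very short protein (G1) and one slightly
long protein (G2), so its average size difference is negative and one of its
two groups counts as a Tiny outlier.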

TABLE G2b.  Fish gene orthology categories (using OrthoMCL)
            ----------- GENES -----------     ------ GROUPS -----  Ortho to Kfish
            nGene Orlog Inpara Uniq1 UDup     OrGrp OrMis1 UniGrp  Shared  Best 
            -----------------------------     -------------------  -------------  
killifish   34931 21133   3672  7694 2432     17272    10    682    ---     --- 
maylandia   23194 21021    879  1171  122     16469    46     52   15468*  5159 
tilapia     21437 19461    975   810  189     14904   135     78   14019    954 
platyfish   20366 19483    214   641   29     15307    54     14   14609* 13352 
zebrafish   26190 18465   4089  2202 1439     15187   188    226   13286*  1363 
stickleback 20787 17954   1254  1396  181     14344   180     44   13317*   636 
catfish     43671 17279   2964 15470 7938     14246   407   1561   12799*  1208 
medaka      19686 16881    912  1535  361     13542   366    106   12670*  1071 
tetraodon   19602 16814    904  1796  176     13425   360     67   12475*   696 
spotgar     15734 11841   2655  1009  230     10880  1514     56    9492    116 
human       39357 12699   7758 12497 6402     12608   420   2221   11265*  1182 
-------------------------------------------------------------------------------
  source: kfish2/prot/fish11c/fish11gor3, Dec11 .. gor3 upd Dec27
  nGene = count of input genes, excludes alternate isoforms/locus.
  Orlog = Orthologous genes (one-to-one matches among species)
  Inpara = Inparalogs (recent ortholog duplicates) of orthologous genes
  Uniq1,UDup  = single-copy and duplicated species-unique genes
  OrGrp,UniGrp = orthologous and species-unique groups
  OrMis1  = groups missing in species that all other species have
  Ortho to Kfish: Shared = count of ortho groups shared with killifish,
     Best    = count of groups with closest homolog,
     Shared* = killifish is the maximum shared of 10 choices; tilapia shares more with maylandia,
             and spotgar with zebrafish.
------------------------------------------------------------------
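
As a reading aid for the GENES columns, the Python sketch below checks how
closely Orlog + Inpara + Uniq1 + UDup adds up to nGene, using numbers copied
from Table G2b. That the four categories partition nGene is an assumption
drawn from the column legends, not something the table states; the names
(G2B_GENES, category_gap) are hypothetical.

```python
# Rows copied from Table G2b: (nGene, Orlog, Inpara, Uniq1, UDup)
G2B_GENES = {
    "killifish":   (34931, 21133, 3672,  7694, 2432),
    "maylandia":   (23194, 21021,  879,  1171,  122),
    "tilapia":     (21437, 19461,  975,   810,  189),
    "platyfish":   (20366, 19483,  214,   641,   29),
    "zebrafish":   (26190, 18465, 4089,  2202, 1439),
    "stickleback": (20787, 17954, 1254,  1396,  181),
    "catfish":     (43671, 17279, 2964, 15470, 7938),
    "medaka":      (19686, 16881,  912,  1535,  361),
    "tetraodon":   (19602, 16814,  904,  1796,  176),
    "spotgar":     (15734, 11841, 2655,  1009,  230),
    "human":       (39357, 12699, 7758, 12497, 6402),
}

def category_gap(row):
    """nGene minus the sum of the four gene categories."""
    n_gene, *categories = row
    return n_gene - sum(categories)

gaps = {species: category_gap(row) for species, row in G2B_GENES.items()}
for species, gap in gaps.items():
    print(f"{species:<12} {gap:>5}")
```

For killifish the four categories sum exactly to nGene; the other rows
differ by 1 to 88 genes, so the partition holds only approximately.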

TABLE G2c.  Fish Taxonomy with Human gene alignment stats 
    Human genes       Fish Taxonomy
-------------------------------------------------------------------------------
Nhuman Ident% Align%  Neopterygii
                      + Teleostei
                      + + + Euteleostei
                      + + + + + + Pseudocrenilabrinae
14822   66     71    :+ + + + + + + + Maylandia zebra # african cichlid Zebra Mbuna, NCBI : KF2
14181   65     70    :+ + + + + + + + Oreochromis niloticus # Tilapia, Ensembl : KF1,2
                      + + + + + Smegmamorpha
14893   64     70    :+ + + + + + + Gasterosteus aculeatus # Stickleback, Ensembl? : KF1,2
                      + + + + + + Atherinomorpha
14478   64     67    :+ + + + + + + + Oryzias latipes # Medaka, Ensembl? : KF1,2
                      + + + + + + + Cyprinodontiformes
15033   64     71    :+ + + + + + + + + + Xiphophorus maculatus # Platyfish, Consortium : KF2
17072   65     76    :+ + + + + + + + + + Fundulus heteroclitus # Killifish, Evigene : KF1,2
                      + + + Otocephala
16127   65     70    :+ + + + + + Ictalurus punctatus # Catfish, Evigene : KF2
16871   64     73    :+ + + + + + Danio rerio # Zebrafish, Consortium/Ensembl : KF1,2
                      + Semionotiformes
13081   70     54    :+ + Lepisosteus oculatus # Spotted gar, Draft Ensembl : KF2
-------------------------------------------------------------------------------
  for N =26859 human genes, nc=16555 common to 7+ fish for align score

  
Notes: 
-------------
Killifish, Maylandia and Tilapia form a good comparison of gene methods
and results: they are the top-scored gene sets, recently built by 3
groups with "good" gene construction pipelines. Some artifacts of
methods may be found.  Tilapia has Ensembl genewise+exonerate models
from a mix of RNA-seq and UniProt proteins.  Maylandia (Mayzebr) is an
NCBI Gnomon annotation, also from a mix of RNA-seq and related proteins.
Kfish2 is an Evigene annotation, strong on mRNA-gene evidence but also
using related species proteins.  The other two do use RNA-seq assembly,
but not as extensively or carefully, and rely on mapping to the genome
assembly. Ensembl genewise protein mapping has the potential to add
artifacts of homolog models. NCBI Gnomon now uses RNA-seq more carefully
than in the past, and better than Ensembl, I think.

Killifish and Platyfish form another useful comparison, platyfish being the closest
relative, and also a recent genome product built with current data and software.
Differences that can be highlighted:
1.  "The quality of a gene set is dependent on the quality of the genome assembly"
  (from the Ensembl platyfish gene build document). This can also be derived from the
  methods of the platyfish genome paper, e.g. the methods included discarding mRNA
  assemblies that did not map well to the genome assembly.
  In contrast, killifish genes v2 are not dependent on the quality of the genome assembly,
  merging both mRNA-assembly and genome-mapped methods to pick the best set from both.
2.  The human gene orthology stats indicate killifish surpasses platyfish in
  completeness of genes (Table G2c: Nhuman 17072 vs 15033, Align% 76 vs 71).

Killifish, Maylandia and Catfish form a third special comparison to other
fish genes.  In the Orthology search, these three share more gene families
that are missed in the other fish than any other 3-fish combination does,
by about 100 families.  This is, I think, an effect of (a) mRNA assembly
independent of genome genes, used for Killifish and Catfish, and (b) for
Maylandia, NCBI having improved its use of mRNA evidence enough to be
roughly equivalent in discovering genes that may be poorly modelled on the
genome assembly.
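
The 3-fish comparison above can be sketched in Python as a count, per trio of
fish, of the gene families found in exactly those three species and none of
the other fish. The input layout (family id -> species -> gene count), the
function name and the toy families are assumptions for illustration; the
real counts come from the Orthology search.

```python
from collections import Counter

FISH = ["Killifish", "Maylandia", "Platyfish", "Medaka", "Stickleback",
        "Catfish", "Tetraodon", "Tilapia", "Zebrafish"]

def exclusive_trio_counts(families, fish=FISH):
    """Count families present in exactly three of the given fish.

    families: dict of family id -> dict of species -> gene count.
    Species outside `fish` (e.g. Human) are ignored, so a family may still
    have members there.
    """
    counts = Counter()
    for members in families.values():
        present = frozenset(sp for sp in fish if members.get(sp, 0) > 0)
        if len(present) == 3:
            counts[present] += 1
    return counts

# Toy families for illustration only, not project data.
toy_families = {
    "famA": {"Killifish": 1, "Catfish": 2, "Maylandia": 1},
    "famB": {"Killifish": 1, "Catfish": 1, "Maylandia": 1, "Human": 3},
    "famC": {"Medaka": 1, "Tilapia": 1, "Zebrafish": 1},
    "famD": {"Killifish": 1, "Medaka": 1},
}
trio_counts = exclusive_trio_counts(toy_families)
# Trios ranked by number of exclusive families, largest first.
ranked = trio_counts.most_common()
```

Ranking the trios this way is what allows one trio to be compared against
every other 3-fish combination.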


Find families shared by just these 3 fish:
http://arthropods.eugenes.org/lucegene_arthropod/search?q=fish11xml-all:geneid+AND+Killifish:[1+TO+999]+AND+Catfish:[1+TO+999]+AND+Maylandia:[1+TO+999]+AND+Medaka:0+AND+Stickleback:0+AND+Tetraodon:0+AND+Tilapia:0+AND+Zebrafish:0+AND+Platyfish:0
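
The search URL above follows a regular pattern: a library term, then a
[1 TO 999] range for each species that must have genes, then a 0 count for
each species that must have none, all joined with AND. A small Python sketch
can assemble the same string for other species combinations. The function
name and parameters are hypothetical, and the query syntax is inferred only
from this one URL.

```python
SEARCH_BASE = "http://arthropods.eugenes.org/lucegene_arthropod/search"

def family_query_url(present, absent, library="fish11xml-all"):
    """Build a search URL for families found in `present` species only.

    present: species that must have 1 to 999 genes in the family.
    absent:  species that must have 0 genes in the family.
    The term layout is copied from the example URL, not from documentation.
    """
    terms = [f"{library}:geneid"]
    terms += [f"{species}:[1+TO+999]" for species in present]
    terms += [f"{species}:0" for species in absent]
    return SEARCH_BASE + "?q=" + "+AND+".join(terms)

url = family_query_url(
    present=["Killifish", "Catfish", "Maylandia"],
    absent=["Medaka", "Stickleback", "Tetraodon", "Tilapia",
            "Zebrafish", "Platyfish"],
)
print(url)
```

With these arguments the function reproduces the URL given above; swapping
species between the two lists gives the query for another combination.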

One of these is http://arthropods.eugenes.org/genepage/fish11xml/FISH11G_G18567
FISH11G_G18567  : new D-tyrosyl-tRNA(Tyr) deacylase, one of 3 same-named families,
  in killifish, catfish and maylandia only; maylandia: XP_004554729.1/LOC101478506, 1035 aa

FISH11G_G1773   : D-tyrosyl-tRNA deacylase member 2, in all but killifish, with varying numbers of genes (Tetraodon has 10);
  maylandia: XP_004554728.1/LOC101478225, 168 aa
G1773 and G18567 are related: the shorter G1773 protein aligns to the longer G18567,
  and both have the same CDD:202294 domain. In maylandia, these are tandem genes.
  In killifish, the missing shorter gene would lie where a genome gap exists (the mRNA
  assembly may or may not have a partial version).

FISH11G_G5675   : D-tyrosyl-tRNA deacylase member 1, one gene in all 11 species
  


Developed at the Genome Informatics Lab of Indiana University Biology Department