
Index of /EvidentialGene/vertebrates

      Name                        Last modified       Size

[DIR] Parent Directory            06-Jan-2017 16:28   -
[DIR] catfish/                    10-Jun-2013 15:52   -
[TXT] fish_geneset_qual2016.txt   22-Feb-2016 15:29   4k
[DIR] killifish/                  14-Jul-2015 14:16   -


Effect of gene set construction method on gene set quality for vertebrate fish,
comparing four methods: NCBI Eukaryote annotation (nc), EvidentialGene (evg),
MAKER2 (mk), and EvidenceModeller (em), within and across species.
Gene sets of plants and arthropods give similar method quality rankings.
- Don Gilbert, 2016 Feb.

      Highly conserved ortholog genes (BUSCO vertebrate set)  
      Expectation: ~99% found, ~90% align, <1% tiny or big extremes
Geneset       nGroup nFound %Found %Align %Tiny %Big  
-----------  -------------------------------------------------------
kfish.evg     4097    4045   98.7   91.5  0.3   0.5    * high accuracy
kfish.nc      4097    4031   98.4   89.5  1.3   0.7    
notfur.em     4097    3996   97.5   87.5  2.9   1.1    
notfur.mk     4097    3726   90.9   76.6  5.2   2.2    - low accuracy
pike.nc       4097    4060   99.1   93.2  0.6   0.6    * high accuracy
pike.mk       4097    3114   76.0   56.5  7.8   0.2    - low accuracy
amolly.nc     4097    4050   98.9   92.2  0.8   0.8    * high accuracy
guppy.nc      4097    4048   98.8   91.5  0.8   1.0    * high accuracy
---------------------------------------------------------------------
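
The %Found column in the BUSCO table is consistent with nFound / nGroup * 100,
rounded to one decimal. The report does not state this formula, so treat it as
an assumption inferred from the numbers. A minimal Python sketch that recomputes
the column from the rows above (the names busco_rows and percent_found are
illustrative, not part of any EvidentialGene tool):

```python
# Rows copied from the BUSCO table: (geneset, nGroup, nFound, reported %Found).
busco_rows = [
    ("kfish.evg", 4097, 4045, 98.7),
    ("kfish.nc",  4097, 4031, 98.4),
    ("notfur.em", 4097, 3996, 97.5),
    ("notfur.mk", 4097, 3726, 90.9),
    ("pike.nc",   4097, 4060, 99.1),
    ("pike.mk",   4097, 3114, 76.0),
    ("amolly.nc", 4097, 4050, 98.9),
    ("guppy.nc",  4097, 4048, 98.8),
]


def percent_found(n_found, n_group):
    """Assumed definition: share of ortholog groups found, as a percentage."""
    return round(100.0 * n_found / n_group, 1)


for name, n_group, n_found, reported in busco_rows:
    computed = percent_found(n_found, n_group)
    # Tolerance of half a reporting unit (0.05) absorbs float representation.
    status = "ok" if abs(computed - reported) < 0.05 else "differs"
    print(f"{name:<10} computed {computed:5.1f}  reported {reported:5.1f}  {status}")
```

The same check can be run on the second table by swapping in its nGroup
(17904) and nFound values.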
      Ortholog groups common to 3+ reference fish  
      Expectation: ~90% found, ~85% align, ~1-3% tiny or big extremes
Geneset       nGroup nFound %Found %Align %Tiny %Big  
-----------  -------------------------------------------------------
kfish.evg     17904  16345   91.3    86.6  0.8   2.5   * high accuracy
kfish.nc      17904  15701   87.7    82.9  1.8   1.9    
notfur.em     17904  15277   85.3    78.8  2.9   2.7   
notfur.mk     17904  13706   76.6    67.4  5.8   6.9   - low accuracy
pike.nc       17904  15726   87.8    82.5  1.0   2.4    
pike.mk       17904  11676   65.2    51.9  16.6  1.8   - low accuracy
amolly.nc     17904  15888   88.7    85.6  0.9   2.4   * high accuracy 
guppy.nc      17904  15797   88.2    84.6  1.2   2.6   * high accuracy 
zebrafish     17904  15050   84.1    77.6  2.4   1.3   
---------------------------------------------------------------------
Gene set method suffix: nc = NCBI Eukaryote annotation,
      evg = EvidentialGene, mk = MAKER2, em = EvidenceModeller
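
Because each gene set label is species.method, the within-species comparison
can be read off mechanically. The sketch below is only an illustration: it
splits the labels, groups the %Found values from the "3+ reference fish" table
by species, and orders the methods for species that have more than one gene
set. Ranking by %Found alone is a simplification chosen here; the report also
weighs %Align, %Tiny and %Big. All function and variable names are hypothetical.

```python
from collections import defaultdict

# Method suffixes as given in the legend above.
METHODS = {
    "nc": "NCBI Eukaryote annotation",
    "evg": "EvidentialGene",
    "mk": "MAKER2",
    "em": "EvidenceModeller",
}

# %Found values copied from the "3+ reference fish" table.
# zebrafish is omitted because its label carries no method suffix.
found_pct = {
    "kfish.evg": 91.3, "kfish.nc": 87.7,
    "notfur.em": 85.3, "notfur.mk": 76.6,
    "pike.nc": 87.8, "pike.mk": 65.2,
    "amolly.nc": 88.7, "guppy.nc": 88.2,
}


def split_geneset(label):
    """Split 'species.method' into (species, method suffix)."""
    species, suffix = label.rsplit(".", 1)
    return species, suffix


def within_species_ranking(pct_by_label):
    """For species with 2+ gene sets, list method suffixes by descending %Found."""
    by_species = defaultdict(list)
    for label, pct in pct_by_label.items():
        species, suffix = split_geneset(label)
        by_species[species].append((pct, suffix))
    return {
        species: [suffix for _, suffix in sorted(entries, reverse=True)]
        for species, entries in by_species.items()
        if len(entries) > 1
    }


ranking = within_species_ranking(found_pct)
for species, order in ranking.items():
    print(species + ":", " > ".join(METHODS[s] for s in order))
```

Species with a single gene set (amolly, guppy) drop out of the ranking, since
there is nothing to compare them against within the species.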

Teleostei fish taxonomy tree
 Euteleosteomorpha
 + Neoteleostei
 + + + + + + Haplochromini
 + + + + + + + Maylandia zebra        # mayzebr = African cichlid, zebra mbuna
 + + + + + Cyprinodontiformes
 + + + + + + + Nothobranchius furzeri   # notfur = African turquoise killifish
 + + + + + + + Poeciliidae
 + + + + + + + + + + Xiphophorus maculatus # platyfish
 + + + + + + + + + + Poecilia formosa      # amolly = Amazon molly
 + + + + + + + + + + Poecilia reticulata   # guppy
 + + + + + + + Fundulidae
 + + + + + + + + Fundulus heteroclitus # kfish = Atlantic killifish
 + Protacanthopterygii
 + + Esox lucius              # northern pike
 Otomorpha
 + + Cypriniphysae
 + + + Danio rerio            # zebrafish
---------------------------------------------------------------------

Fish comparison gene sets:
  kfish.evg   Fundulus heteroclitus, Evigene 2014, eugenes.org/EvidentialGene/killifish/Genes/
  kfish.nc    Fundulus heteroclitus, NCBI 2015, ftp.ncbi.nih.gov/genomes/all/GCF_000826765.1_Fundulus_heteroclitus-3.0.2
  notfur.em   Nothobranchius furzeri, EvidenceModeller, doi:10.1016/j.cell.2015.10.071
  notfur.mk   Nothobranchius furzeri, MAKER2, doi:10.1016/j.cell.2015.11.008
  pike.nc     Esox lucius, NCBI,  ftp.ncbi.nih.gov/genomes/all/GCF_000721915.2_ASM72191v2
  pike.mk     Esox lucius, MAKER2, doi:10.1371/journal.pone.0102089
  amolly.nc   Poecilia formosa, NCBI, ftp.ncbi.nih.gov/genomes/all/GCF_000485575.1_Poecilia_formosa-5.1.2
  guppy.nc    Poecilia reticulata, NCBI,  ftp.ncbi.nih.gov/genomes/all/GCF_000633615.1_Guppy_female_1.0_MT

Orthology reference set of 10 fish + human: 
  23042 Astyanax_mexicanus, 46251 catfish, 23194 Maylandia_zebra, 19686 medaka, 20366 platyfish, 
  18341 spotted gar, 20787 stickleback, 19602 tetraodon, 21437 tilapia, 26247 zebrafish, 39357 human

Refs:
 pike.mk:   Rondeau EB, Minkley DR, Leong JS, Messmer AM, Jantzen JR, et al. (2014) doi:10.1371/journal.pone.0102089
 notfur.mk: Valenzano et al. (2015) Cell 163, 1539-1554, doi:10.1016/j.cell.2015.11.008
 notfur.em: Reichwald et al. (2015) Cell 163, 1527-1538, doi:10.1016/j.cell.2015.10.071
 

Developed at the Genome Informatics Lab of Indiana University Biology Department