
EvidentialGene: zebrafish gene improvements

Summary of conserved vertebrate genes in 3 zebrafish gene sets
Gene set    Align   Complete  Fragment  Missing
Evigene17   443.1   2572      5         9
NCBI16      433.8   2554      13        19
Ensembl17   426.8   2510      47        29
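
As a rough illustration, the counts in the summary table can be converted to percentages of the conserved gene set. This is a minimal sketch, not part of the Evigene pipeline. It assumes the table rows read as Complete, Fragment and Missing counts of 2572/5/9 (Evigene17), 2554/13/19 (NCBI16) and 2510/47/29 (Ensembl17), each row summing to 2586. The function name busco_percentages is hypothetical.

```python
# Counts as read from the summary table (an assumed reading of the columns):
# gene set -> (complete, fragment, missing)
busco_counts = {
    "Evigene17": (2572, 5, 9),
    "NCBI16": (2554, 13, 19),
    "Ensembl17": (2510, 47, 29),
}


def busco_percentages(counts):
    """Return {gene_set: (pct_complete, pct_fragment, pct_missing)}, rounded to 0.1."""
    result = {}
    for name, (complete, fragment, missing) in counts.items():
        # The total is taken as the sum of the three categories in that row.
        total = complete + fragment + missing
        result[name] = tuple(
            round(100.0 * n / total, 1) for n in (complete, fragment, missing)
        )
    return result


percentages = busco_percentages(busco_counts)
for name, (c, f, m) in percentages.items():
    print(f"{name:10s} complete={c:5.1f}% fragment={f:4.1f}% missing={m:4.1f}%")
```

Under this reading, about 99.5% of the conserved genes are complete in Evigene17, against about 98.8% in NCBI16 and 97.1% in Ensembl17.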

Examples of zebrafish genes improved in the Evigene reconstruction, compared with the NCBI EGAP RefSeq gene set of 2016 and the Ensembl gene set of 2017 (the set ZFIN uses). The first set shows conserved single-copy vertebrate gene families, measured with BUSCO OrthoDB v9. The set that follows shows fish genes that are not single-copy and are conserved in two or more related species.

For several of the missing conserved genes, a gene model exists but has either an abnormally short protein or no protein (i.e. a non-coding transcript model). In some cases, including others not shown, the complete Evigene ortholog gene is poorly mapped, or unmapped, to the zebrafish GRCz10 chromosome assembly. In other cases, the Evigene model is fully located on the chromosome assembly.


Example gene model map views

The gene set tracks are ordered with the mRNA coding gene models of (1) DrEvigene17, (2) DrNCBi, and (3) DrZFIN/Ensembl at top, followed by alternate transcripts and non-coding models.

Conserved Genes missed by Ensembl 2017

busco_ensmiss_chr8_Danrer5mEVm009964t2 = Coiled-coil domain-containing protein 112,zfish:XP_017212878.1,EOG090B07I8,Complete,418.aa

busco_ensmiss_chr13_Danrer5mEVm004445t3 = Molybdenum cofactor sulfurase,zfish:NP_001014388.2,EOG090B024W,Complete,586.aa

busco_ensmiss_chr14_Danrer5mEVm009494t2 = BEN domain-containing protein 4,zfish:XP_695315.5,EOG090B06KW,Complete,291.aa

busco_ensmiss_chr19_Danrer5mEVm001891t1 = Claspin-like protein,cavefish:XP_016297446.1,EOG090B01JZ,Complete,1011.aa

busco_ensmiss_chr24_Danrer5mEVm007819t1 = MCM domain-containing protein 2,zfish:NP_001311369.1,EOG090B034L,Complete,512.aa

busco_ensmiss_na834_Danrer5mEVm001318t3 = RAB6A-GEF complex partner protein 1,cavefish:XP_016325612.1,EOG090B00KE,Complete,1228.aa


Conserved Genes missed by NCBI RefSeq 2016

busco_ncbmiss_chr6_Danrer5mEVm001890 = Methyl-CpG-binding domain protein 5,cavefish:XP_016349756.1,EOG090B00SR,Complete,1081.aa

busco_ncbmiss_chr7_Danrer5mEVm001611 = Ubiquitin carboxyl-terminal hydrolase 47-like protein,cavefish:XP_016339416.1,EOG090B00QN,Complete,1159.aa

busco_ncbmiss_chr10_Danrer5mEVm013365 = NHL repeat-containing protein 3,catfish:XP_017343766.1,EOG090B080R,Complete,268.aa

busco_ncbmiss_chr15_Danrer5mEVm001846t1 = Fibronectin type-III domain-containing protein,cavefish:XP_016318615.1,EOG090B00X6,Complete,1178.aa

busco_ncbmiss_chr19_Danrer5mEVm014755t1 = Tumor necrosis factor receptor superfamily,cavefish:XP_016316730.1,EOG090B06GQ,Fragmented,200.aa

busco_ncbmiss_chr19_Danrer5mEVm069575t1 = Lysine-specific histone demethylase 1B,catfish:XP_017341166.1,EOG090B0202,Complete,693.aa


Conserved Fish Genes, Evigene models missed by others

ref2fish_ncmiss_Danrer5mEVm000084t1 = Zinc finger homeobox 3-like protein,cavefish:XP_016331104.1,3843.aa

ref2fish_ncmiss_Danrer5mEVm000414t1 = Protein KIAA0100-like,cavefish:XP_016321029.1,2225.aa

ref2fish_ncmiss_Danrer5mEVm000733t1 = Trinucleotide repeat-containing gene 6A,carp:XP_018937116.1,1852.aa

ref2fish_partmap_Danrer5mEVm000062t2 = Protocadherin Fat 4-like protein,cavefish:XP_016349626.1,4297.aa

ref2fish_partmap_Danrer5mEVm000080t1 = Vacuolar protein sorting-associated protein,cavefish:XP_016320295.1,3940.aa

ref2fish_partmap_Danrer5mEVm000248t1 = Protein unc-79,carp:XP_018977020.1,2647.aa

ref2fish_partmap_Danrer5mEVm000354t5 = DnaJ subfamily C member 13,catfish:XP_017308998.1,1945.aa

ref2fish_partmap_Danrer5mEVm000517t1 = Rootletin,cavefish:XP_016344398.1,2055.aa

ref2fish_partmap_Danrer5mEVm000702t1 = Zinc finger C3H1 domain-containing protein,cavefish:XP_016340693.1,1883.aa
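
The example labels above follow a regular pattern that can be split into fields for tabulation. The sketch below is one possible parser, not a tool from the Evigene distribution. The field meanings are inferred from the labels and are assumptions: a description, an ortholog given as species:accession, then for the busco_ entries an OrthoDB group ID and a BUSCO status, and finally the protein length in amino acids. The names ExampleGene and parse_example are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class ExampleGene(NamedTuple):
    label: str                    # full map-view label, e.g. busco_ensmiss_chr8_...
    gene_id: str                  # Evigene transcript ID found inside the label
    description: str
    ortholog_species: str
    ortholog_accession: str
    busco_id: Optional[str]       # None for ref2fish_ entries
    busco_status: Optional[str]   # None for ref2fish_ entries
    protein_length: int           # amino acids


# Evigene IDs in this page look like Danrer5mEVm009964t2; the tN suffix is optional.
GENE_ID_RE = re.compile(r"Danrer5mEVm\d+(?:t\d+)?")


def parse_example(line: str) -> ExampleGene:
    """Parse one 'label = description,ortholog,...,NNN.aa' line (assumed layout)."""
    label, _, value = line.partition("=")
    label, value = label.strip(), value.strip()
    id_match = GENE_ID_RE.search(label)
    if not value or id_match is None:
        raise ValueError(f"unrecognized example line: {line!r}")

    # busco_ entries carry two extra fields (group ID, status) before the length.
    n_tail = 4 if label.startswith("busco_") else 2
    # Split from the right so a comma inside the description cannot shift fields.
    parts = value.rsplit(",", n_tail)
    if len(parts) != n_tail + 1:
        raise ValueError(f"unexpected field count in: {line!r}")

    species, _, accession = parts[1].partition(":")
    busco_id, status = (parts[2], parts[3]) if n_tail == 4 else (None, None)
    length = int(parts[-1].rsplit(".", 1)[0])  # "418.aa" -> 418

    return ExampleGene(label, id_match.group(0), parts[0], species, accession,
                       busco_id, status, length)
```

For example, parse_example applied to the Rootletin line returns a record with gene_id Danrer5mEVm000517t1, ortholog species cavefish, and protein_length 2055.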

Developed at the Genome Informatics Lab of Indiana University Biology Department