
Gnodes/Genome Depth Estimator
EvidentialGene package for genome coverage depth estimation for animal & plant genomes

Gnodes is a genome depth estimator, and also a genome size estimator, for animal and plant genomes. It calculates genome size from the DNA coverage depth of assemblies, using unique, conserved gene spans as its standard depth. In tests with plants and animals, its results match independent flow cytometry measures of genome size quite well. Tests on a range of model and non-model animal and plant genome assemblies give reliable and accurate results, in contrast to unreliable K-mer histogram methods.
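
The depth-based size calculation described above can be illustrated with a minimal sketch. This is not the Gnodes implementation: the function name, its arguments, and the choice of the median as the summary of unique-gene depth are assumptions made for illustration only. The idea shown is that genome size is roughly the total sequenced DNA bases divided by the read depth observed over unique, conserved gene spans.

```python
from statistics import median


def estimate_genome_size(total_read_bases, unique_gene_depths):
    """Estimate genome size in bases from DNA read coverage.

    total_read_bases:   total bases in the genomic DNA read set (hypothetical input)
    unique_gene_depths: read depth measured on each unique, conserved gene span
                        (hypothetical input)

    Assumption: the median depth over unique gene spans stands in for the
    single-copy "standard depth"; the real tool may summarize it differently.
    """
    if not unique_gene_depths:
        raise ValueError("need at least one unique gene depth value")
    standard_depth = median(unique_gene_depths)
    if standard_depth <= 0:
        raise ValueError("standard depth must be positive")
    return total_read_bases / standard_depth


# 6 Gb of reads at a standard depth of 30x suggests a genome of about 200 Mb.
size = estimate_genome_size(6_000_000_000, [28, 30, 31, 30, 29])
print(f"{size / 1e6:.0f} Mb")
```

Using a robust summary such as the median keeps a few duplicated or poorly mapped genes from skewing the standard depth, which is one plausible reason to prefer it in a sketch like this.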

A draft Gnodes publication with supplemental data and extensive results is now available (May 2022), as Abstract text and Document PDF.

See also Gnodes DNA Depth Deficit Analyses for a synopsis of missing matter in assemblies and gene copy numbers. Chromosome plots of DNA-Depth with major components are useful to resolve deficits. Also see How to Use Gnodes.


Boxplots (median, range) of Estimators, for equivalence to Flow Cytometry (FC) measured genome sizes. Gnodes is very accurate, whereas K-mer histogram methods (GenoScope, covest, findGSE) are rather inaccurate, with a wide range of estimates. Assembly sizes are typically below FC measured sizes.

Estimations relative to FC value, for measured animal and plant genomes, with median, range and values from three estimators: Assembly, Gnodes, and GenoScope. Flow cytometry sizes in megabases are given, ranging from 160 Mb (plant) to 3400 Mb (human).

Genome reconstruction is a Goldilocks problem: answers are often too hot, or too cold; the just-right solution takes effort to discriminate among these outcomes. Gnodes provides a measuring stick for too hot and too cold genome assemblies. When used to compare several assemblies of one organism, it spots over- and under-assembled portions, relative to its unique gene DNA depth measure. It can be used to estimate genome size from only gene coding sequences mapped with genomic DNA, and these tests show it is reliable for that. Gnodes is now a component of the EvidentialGene package: evigene/scripts/genoasm/gnodes_pipe.pl
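
The "measuring stick" idea can be sketched as a comparison of each assembly region's read depth against the unique-gene standard depth. This is an illustrative assumption, not the logic of gnodes_pipe.pl: the function name, the labels, and the 25% tolerance are hypothetical. The reasoning is that a region with depth well above the standard has more copies in the DNA than in the assembly (under-assembled), while depth well below the standard suggests the assembly holds more copies than the DNA supports (over-assembled).

```python
def classify_region(region_depth, standard_depth, tolerance=0.25):
    """Label an assembly region by its depth relative to unique-gene depth.

    region_depth:   mean read depth over the region (hypothetical input)
    standard_depth: read depth over unique, conserved gene spans
    tolerance:      allowed relative deviation before flagging; the default
                    of 0.25 is an arbitrary illustrative choice
    """
    if standard_depth <= 0:
        raise ValueError("standard depth must be positive")
    ratio = region_depth / standard_depth
    if ratio > 1 + tolerance:
        return "under-assembled"  # DNA holds more copies than the assembly
    if ratio < 1 - tolerance:
        return "over-assembled"   # assembly holds more copies than the DNA
    return "balanced"


# Compare a few regions against a standard depth of 30x.
for name, depth in [("regionA", 60), ("regionB", 15), ("regionC", 31)]:
    print(name, classify_region(depth, 30))
```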

Gnodes resolves a few discrepancies, such as Daphnia water flea genome assemblies that are only half the size measured by flow cytometry, and the well-known 40 megabase discrepancy in Arabidopsis (Bennett et al. 2003).

Extensive duplication of gene coding sequence is a likely reason that assemblies of Daphnia genomes have faltered at half-size. Half of Daphnia genomic DNA aligns to gene coding sequence, much more than the 10-20% in measured insects and vertebrates, or 25% in measured plants.


Developed at the Genome Informatics Lab of Indiana University Biology Department