Gnodes/Genome Depth Estimator
EvidentialGene package for genome coverage depth estimation for animal & plant genomes
Gnodes is a genome depth estimator, and also a genome size estimator,
for animal and plant genomes. It calculates genome size from the DNA
coverage of assemblies, using unique, conserved gene spans as its
standard depth. In tests with plants and animals, its results match
independent flow cytometry measures of genome size quite well.
Tests on a range of model and non-model animal and plant genome assemblies
give reliable and accurate results, in contrast to unreliable K-mer histogram methods.
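The size calculation described above can be sketched in a few lines. This is a minimal illustration, not the Gnodes implementation: it assumes the estimate is the total sequenced DNA bases divided by the median read depth over unique, conserved gene spans. The function name and all numbers are hypothetical.

```python
from statistics import median


def estimate_genome_size(total_read_bases, unique_gene_depths):
    """Estimate genome size in bases.

    Assumption (not taken from the Gnodes source code): size is the total
    sequenced DNA bases divided by the median read depth observed over
    unique, conserved gene spans, which serves as the standard depth.
    """
    if not unique_gene_depths:
        raise ValueError("need at least one unique-gene depth value")
    standard_depth = median(unique_gene_depths)
    if standard_depth <= 0:
        raise ValueError("standard depth must be positive")
    return total_read_bases / standard_depth


# Hypothetical inputs: 8 Gb of reads, depths measured on five unique gene spans.
depths = [48.0, 50.0, 52.0, 49.0, 51.0]
size = estimate_genome_size(8_000_000_000, depths)
print(f"{size / 1e6:.0f} Mb")  # prints "160 Mb"
```

The median is used here so that a few genes with unusual depth (for example, undetected duplicates) do not shift the standard depth; that choice is also an assumption of this sketch.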
A Gnodes draft publication with supplemental data is now available
(May 2022), with extensive results (Abstract text and Document PDF).
See also Gnodes DNA Depth Deficit Analyses
for a synopsis of missing matter in assemblies and gene copy numbers.
Chromosome plots of DNA-Depth with major components are
useful to resolve deficits.
Also see How to Use Gnodes.
Boxplots (median, range) of estimators, compared for agreement with
genome sizes measured by flow cytometry (FC). Gnodes is very accurate, whereas
K-mer histogram methods (GenoScope, covest, findGSE) are rather
inaccurate, with a wide range of estimates. Assembly sizes are
typically below FC measured sizes.
Estimations relative to FC value, for measured animal and plant
genomes, with median, range and values from three estimators: Assembly,
Gnodes, and GenoScope. Flow cytometry sizes in megabases are
given, ranging from 160 Mb (plant) to 3400 Mb (human).
Genome reconstruction is a Goldilocks problem: answers are
often too hot, or too cold; the just-right solution takes effort to
discriminate among these outcomes. Gnodes provides a measuring stick
for too hot and too cold genome assemblies. When used to
compare several assemblies of one organism, it spots over- and
under-assembled portions, relative to its unique gene DNA depth
measure. It can be used to estimate genome size from only gene coding
sequences mapped with genomic DNA, and these tests show it is reliable
for that.
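The comparison against unique gene depth can be illustrated with a small sketch. It is not the Gnodes method itself: it assumes that a span whose read depth is well above the unique-gene depth is under-assembled (the genome holds more copies than the assembly), and a span well below it is over-assembled. The thresholds and names are hypothetical.

```python
def classify_span(span_depth, unique_gene_depth, low=0.75, high=1.5):
    """Classify one assembly span by its depth relative to unique-gene depth.

    Returns (ratio, label). The low/high cutoffs are illustrative
    assumptions, not values used by Gnodes.
    """
    if unique_gene_depth <= 0:
        raise ValueError("unique-gene depth must be positive")
    ratio = span_depth / unique_gene_depth
    if ratio > high:
        # Reads pile up: the span is likely collapsed in the assembly.
        return ratio, "under-assembled"
    if ratio < low:
        # Too few reads: the span is likely duplicated in the assembly.
        return ratio, "over-assembled"
    return ratio, "consistent"


# Hypothetical spans from one assembly, with a unique-gene depth of 50x.
for name, depth in [("span_a", 100.0), ("span_b", 25.0), ("span_c", 50.0)]:
    ratio, label = classify_span(depth, 50.0)
    print(name, ratio, label)
```

Running the same classification over several assemblies of one organism would show which assembly leaves the fewest spans outside the consistent range.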
Gnodes is now a component of the EvidentialGene package:
evigene/scripts/genoasm/gnodes_pipe.pl
Gnodes resolves a few discrepancies, such as Daphnia water flea genome
assemblies that are only half the flow cytometry measured size, and the
well-known 40 megabase discrepancy in Arabidopsis (Bennett et al. 2003).
Extensive gene coding sequence duplication is a likely reason that assemblies
of Daphnia genomes have faltered at half-size.
Half of Daphnia genomic DNA aligns to gene coding sequence,
much more than the 10-20% in measured insects and vertebrates,
or 25% in measured plants.