Estimate of polymorphic (spurious) scaffolds in new genome assemblies.
2008 Sept 23, D. Gilbert
1. Select the subset of scaffolds under 10 kb, as these are often the ones with partial coverage
that could not be assembled because of redundancy with existing larger scaffolds. For Daphnia,
where scaffold read coverage stats are available, almost all of these have < 4x coverage (many 1x) against the 8x overall coverage.
2. Megablast the small scaffolds against the full genome assembly.
3. Combine the HSPs for each small scaffold and select those small scaffolds with > 80% overall identity to a larger scaffold
(other criteria could be used).
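
The selection in steps 1 and 3 could be sketched roughly as follows. This is a minimal
illustration, not the script used for these estimates. It assumes megablast was run with
the standard 12-column tabular output (qseqid, sseqid, pident, length, mismatch, gapopen,
qstart, qend, sstart, send, evalue, bitscore) and that scaffold lengths are available in a
dictionary. The names putative_spurious, merged_length and scaffold_len are hypothetical,
and "overall identity" is taken here to mean length-weighted HSP identity times the fraction
of the small scaffold covered by HSPs to one larger scaffold, which is one possible reading
of the criterion.

```python
from collections import defaultdict

MAX_SMALL_LEN = 10_000  # step 1: only scaffolds under 10 kb are candidates
MIN_IDENTITY = 80.0     # step 3: overall identity cutoff, in percent


def merged_length(intervals):
    """Bases covered by 1-based inclusive intervals, counting overlaps once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def putative_spurious(blast_lines, scaffold_len):
    """Return {small_scaffold: (larger_scaffold, overall_identity_percent)}.

    Assumption: overall identity = length-weighted mean HSP identity
    multiplied by the fraction of the small scaffold covered by HSPs
    to a single larger scaffold.
    """
    hsps = defaultdict(list)
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        query, subject = f[0], f[1]
        pident, qstart, qend = float(f[2]), int(f[6]), int(f[7])
        if query == subject:
            continue  # skip self hits
        if scaffold_len[query] >= MAX_SMALL_LEN:
            continue  # query is not a small scaffold
        if scaffold_len[subject] <= scaffold_len[query]:
            continue  # only compare against larger scaffolds
        hsps[(query, subject)].append((min(qstart, qend), max(qstart, qend), pident))

    best = {}
    for (query, subject), hits in hsps.items():
        covered = merged_length([(s, e) for s, e, _ in hits])
        aligned = sum(e - s + 1 for s, e, _ in hits)
        mean_pident = sum((e - s + 1) * p for s, e, p in hits) / aligned
        overall = mean_pident * covered / scaffold_len[query]
        if overall > best.get(query, (None, 0.0))[1]:
            best[query] = (subject, overall)

    # keep only small scaffolds passing the identity cutoff
    return {q: v for q, v in best.items() if v[1] > MIN_IDENTITY}
```

The function takes any iterable of tabular lines (for example an open file) plus the length
dictionary. Changing MIN_IDENTITY or MAX_SMALL_LEN is the simplest way to try other criteria.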
Genomes tested: Acyrthosiphon pisum, Nasonia vitripennis, Daphnia pulex
Scaffold counts of putative spurious (polymorphic) scaffolds
Gene counts on putative spurious (polymorphic) scaffolds
These are likely underestimates: the criteria select only scaffolds < 10 kb (the most likely
partial-coverage heterozygous assemblies) and require > 80% identity over the total scaffold,
although high identity on only part of a scaffold may also indicate a heterozygous contig.