
Index of /arthropods/data/aphid/pea_aphid1/dupgenes

      Name                                      Last modified       Size  Description

[DIR] Parent Directory                          16-Aug-2011 12:33     -
[TXT] acyr1_gnomon.genes.mblastdupsc-pi80.genes 23-Sep-2008 19:26  200k
[TXT] acyrgeno-smallsc-mblastdupsc-pi80.ids     22-Sep-2008 00:25   89k
[TXT] acyrgeno-smallsc.mblast.stats             22-Sep-2008 15:03  951k
[TXT] arp-acyr1_gnomon-pi80dupgene.tab          23-Sep-2008 20:39   11k


Estimate of polymorphic (spurious) scaffolds in new genome assemblies.
Re http://insects.eugenes.org/arthropods/data/aphid/dupgenes/ 
2008 Sept 23, D. Gilbert

Method:
  1. Select the subset of scaffolds under 10 kb. These often have only partial read coverage
     and could not be assembled further because they are redundant with existing larger
     scaffolds. For Daphnia, where scaffold read coverage stats are available, almost all of
     these are < 4x (many 1x) versus the 8x overall coverage.
  2. Megablast each small scaffold against the full genome assembly.
  3. Combine the HSPs per scaffold and select those small scaffolds with > 80% overall
     identity to a larger scaffold (other criteria could be used).
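
The steps above can be sketched roughly as follows. This is an illustration only, not the
script used for these results. It assumes megablast was run with 12-column tabular output
(query, subject, percent identity, ..., qstart, qend, ...), that scaffold lengths are already
known, and that "overall identity" means identical bases summed over non-overlapping HSPs
divided by the small scaffold length. The function names and that definition are assumptions.

```python
from collections import defaultdict

MAX_SMALL_LEN = 10_000   # step 1: size cutoff for "small" scaffolds (bp)
MIN_IDENTITY = 80.0      # step 3: overall percent identity cutoff


def overall_identity(hsps, query_len):
    """Percent of the query scaffold matched by identical bases.

    hsps: list of (qstart, qend, pident), 1-based inclusive query coordinates.
    Overlapping HSPs are clipped so that no query base is counted twice.
    This definition of overall identity is an assumption.
    """
    identical = 0.0
    covered_to = 0  # rightmost query position already counted
    for qstart, qend, pident in sorted(hsps):
        start = max(qstart, covered_to + 1)
        if start > qend:
            continue  # HSP lies entirely inside a region already counted
        identical += (qend - start + 1) * pident / 100.0
        covered_to = qend
    return 100.0 * identical / query_len


def find_dup_scaffolds(blast_lines, scaffold_len):
    """Return {small_scaffold: (overall_identity, larger_scaffold)} above the cutoff."""
    hsps = defaultdict(list)  # (query, subject) -> list of HSPs
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        query, subject = f[0], f[1]
        if scaffold_len[query] >= MAX_SMALL_LEN:
            continue  # step 1: keep only small query scaffolds
        if scaffold_len[subject] <= scaffold_len[query]:
            continue  # only matches to a larger scaffold (also drops self hits)
        qstart, qend = sorted((int(f[6]), int(f[7])))
        hsps[(query, subject)].append((qstart, qend, float(f[2])))

    best = {}
    for (query, subject), hits in hsps.items():
        ident = overall_identity(hits, scaffold_len[query])  # step 3
        if ident > MIN_IDENTITY and ident > best.get(query, (0.0, None))[0]:
            best[query] = (ident, subject)
    return best
```

The keys of the returned dictionary would correspond to an ID list such as the
*-smallsc-mblastdupsc-pi80.ids files, under the assumptions stated above.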

Genomes tested: Acyrthosiphon pisum, Nasonia vitripennis, Daphnia pulex

Scaffold counts of putative spurious (polymorphic) scaffolds
  6197 acyr1-geno-smallsc-mblastdupsc-pi80.ids
   821 nasonia1-geno-smallsc-mblastdupsc-pi80.ids
   233 dpulex1-geno-smallsc-mblastdupsc-pi80.ids

Gene counts on putative spurious (polymorphic) scaffolds
  1866 acyr1_gnomon.genes.mblastdupsc-pi80.genes  
   208 nasonia1_gnomon.genes.mblastdupsc-pi80.genes
    89 dpulex1_gnomon.genes.mblastdupsc-pi80.genes

These are likely underestimates because the criteria are conservative: only scaffolds <10 kb
were selected (as the most likely partial-coverage heterozygous assemblies), and >80% identity
was required over the total scaffold (whereas high identity on only part of a scaffold may
also indicate a heterozygous contig).
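
A small worked example of why both cutoffs can miss candidates. The scaffold sizes and
identities below are made up for illustration, and overall identity is assumed to be
identical bases divided by total scaffold length.

```python
MAX_SMALL_LEN = 10_000   # size cutoff (bp)
MIN_IDENTITY = 80.0      # overall percent identity cutoff


def total_identity(scaffold_len, aligned_bp, pident):
    """Identity over the whole scaffold, given one aligned region (assumed definition)."""
    return aligned_bp * pident / scaffold_len


def passes_criteria(scaffold_len, aligned_bp, pident):
    """True if a scaffold would be counted as putative spurious (polymorphic)."""
    return (scaffold_len < MAX_SMALL_LEN
            and total_identity(scaffold_len, aligned_bp, pident) > MIN_IDENTITY)


# Hypothetical scaffolds: (length, aligned bases, percent identity of aligned part)
full_match_small = (5_000, 5_000, 95.0)     # counted: 95% over the whole scaffold
half_match_small = (8_000, 4_000, 99.0)     # missed: only 49.5% over the whole scaffold
full_match_large = (12_000, 12_000, 95.0)   # missed: above the 10 kb size cutoff

for name, s in [("full_match_small", full_match_small),
                ("half_match_small", half_match_small),
                ("full_match_large", full_match_large)]:
    print(name, passes_criteria(*s))
```

The second and third scaffolds could still be heterozygous duplicates, but neither is counted.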


Developed at the Genome Informatics Lab of Indiana University Biology Department