Daphnia magna genome assembly 2.4 (date 2010.04.22)

Genomic 454 reads mapped to the assembly, paired (_1) and unpaired (_u), from
read files pairs.*.sff and GBUN2NT*.sff listed in dmagna20100422assembly.summary.
Reads were converted with SffToCA of the Celera assembler (frg output, then
frg2fq.pl to fastq) and mapped to the genome with bowtie2.

  Subset label               Read files
  a. dmag24gflxs13r2         pairs.s1[0123]r[12].sff
  b. dmag24gflxs9r2          pairs.s[789]r[12].sff
  c. dmag24gtitGBUN2NT02     GBUN2NT0[12].sff
  d. dmag24gtits14l          pairs.s14l*.sff

==> bowgnloc/dmagna20100422assembly-dmag24gflxs13r2_1.fstat <==
881680 + 0 in total (QC-passed reads + QC-failed reads)
864272 + 0 mapped (98.03%:-nan%)
84004 + 0 properly paired (9.53%:-nan%)
861642 + 0 with itself and mate mapped
408314 + 0 with mate mapped to a different chr

==> bowgnloc/dmagna20100422assembly-dmag24gflxs13r2_u.fstat <==
1623266 + 0 in total (QC-passed reads + QC-failed reads)
1592735 + 0 mapped (98.12%:-nan%)

==> bowgnloc/dmagna20100422assembly-dmag24gflxs9r2_1.fstat <==
578384 + 0 in total (QC-passed reads + QC-failed reads)
566905 + 0 mapped (98.02%:-nan%)
54798 + 0 properly paired (9.47%:-nan%)
565086 + 0 with itself and mate mapped
268946 + 0 with mate mapped to a different chr

==> bowgnloc/dmagna20100422assembly-dmag24gflxs9r2_u.fstat <==
1141777 + 0 in total (QC-passed reads + QC-failed reads)
1120693 + 0 mapped (98.15%:-nan%)

==> bowgnloc/dmagna20100422assembly-dmag24gtitGBUN2NT02_1.fstat <==
738358 + 0 in total (QC-passed reads + QC-failed reads)
730933 + 0 mapped (98.99%:-nan%)
249144 + 0 properly paired (33.74%:-nan%)
729458 + 0 with itself and mate mapped
293658 + 0 with mate mapped to a different chr

==> bowgnloc/dmagna20100422assembly-dmag24gtitGBUN2NT02_u.fstat <==
711927 + 0 in total (QC-passed reads + QC-failed reads)
703539 + 0 mapped (98.82%:-nan%)

==> bowgnloc/dmagna20100422assembly-dmag24gtits14l4r8_1.fstat <==
1279698 + 0 in total (QC-passed reads + QC-failed reads)
891073 + 0 mapped (69.63%:-nan%)
99038 + 0 properly paired (7.74%:-nan%)
883424 + 0 with itself and mate mapped
512354 + 0 with mate mapped to a different chr

==> bowgnloc/dmagna20100422assembly-dmag24gtits14l4r8_u.fstat <==
1502088 + 0 in total (QC-passed reads + QC-failed reads)
1271784 + 0 mapped (84.67%:-nan%)

================================

Note that subset d (dmag24gtits14l) has a much lower genome mapping rate than
the others (69.63% paired, 84.67% unpaired, versus 98-99%); the reason is
unclear. This is an earlier 454 Titanium (longer-read) set than the GBUN set.
It may have a higher read error rate (400-1000bp reads), and it has the longest
pair insert size, 17kb, versus 2kb inserts with 200-300bp reads for the pairs
flx sets, or 8kb inserts with 400-1000bp reads for the GBUN titanium set.

Note that although the reads are mate-paired on input, errors in the machine
output limit detection of the Roche 454 mate linker sequences by the SffToCA
software to under 50% of reads (Roche's Nimblegen likely does better).
E.g., for

  input sff         GBUN2NT01.sff
  output fragments  dmag24gtitGBUN2NT01.frg
  linker            TCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACG (Titanium)

  INPUT
  numReadsInSFF          568937
  -------
  LINKER
  not examined           5217
  none detected          156046
  inconsistent           28678
  partial                196240
  good                   182756
  -------
  OUTCOME
  fragment               352286
  mate pair              182756
  deleted inconsistent   28678
  deleted duplicate      5209
  deleted too short      8
  deleted N not allowed  0
  -------
                         568937

=================================

sff conversion: http://wgs-assembler.sourceforge.net/wiki/index.php/SffToCA

needed options:
  -linker titanium|flx ; both are needed, for the different libraries
  -linker requires -insertsize per library; see dmag24sff.pairins, made from
     assembly.summary
  -trim chop is recommended; the default (-trim hard) was used in the 1st pass

  caopts="-linker flx -linker titanium -trim chop"
  $cabin/sffToCA $caopts -libraryname dmag24g -output dmag24g$pt -insertsize $pins $pt.sff

then convert frg to fastq parts:

  frg2fq.pl *.frg

The per-library insert sizes (dmag24sff.pairins) are pulled from the assembly
summary with this script, which also runs sffToCA per library when cabin is set:

  env lib=dmag24g cabin=$cabin perl -ne \
  'BEGIN{
     $libna=$ENV{lib}||"dmag24g"; $outna=$ENV{out}||$libna;
     $caopts=$ENV{caopts}||"-linker titanium -linker flx -trim chop";
     $cabin=$ENV{cabin}||"";  # prefixed directly to sffToCA below, so it needs a trailing /
   }
   # collect the pairDistance* values listed under each libraryName
   if(m/^libraryName\s+(\S+)/) { $ln=$1; }
   elsif($ln and ($t,$v)=m/^pairDistance(\w+)\s(\d+)/) { push @{$lnv{$ln}},$v; }
   # run sffToCA for one library file with its two insert size values
   sub runsff2ca {
     my($pt,$libfile,$insa,$insd)=@_;
     my $cmd="${cabin}sffToCA $caopts -libraryname $libna -output $outna$pt -insertsize $insa $insd $libfile";
     warn("# $cmd\n");
     if(-f $libfile) { $ok= system($cmd); } else { warn "#err in params\n"; $ok=0; }
     return $ok;
   }
   # print one pt/pins line per library; convert only when cabin is set
   END{
     for $ln (sort keys %lnv) {
       $pt=$ln; $pt=~s/\.sff//; $pt=~s/pairs\.//;
       @ins= @{$lnv{$ln}};
       print "pt=$pt; pins=\"@ins\"; \n";
       runsff2ca($pt,$ln,@ins) if($cabin);
     }
   }' \
  dmagna20100422assembly.summary \
  > dmag24sff.pairins
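
To compare the subsets side by side, the per-file .fstat listings above could be collapsed into one row per file. The following is only a sketch, not part of the original pipeline: fstat_summary is a hypothetical helper name, and it assumes each .fstat file holds one counter per line in the "N + 0 mapped (...)" layout shown in the listings.

```shell
#!/bin/sh
# Hypothetical helper (not from the original notes): print one tab-separated
# row per .fstat file: name, total reads, mapped reads, % mapped, % properly paired.
# Assumes the flagstat line layout shown in the listings above.
# Usage: fstat_summary bowgnloc/*.fstat
fstat_summary() {
  for f in "$@"; do
    awk -v name="$(basename "$f" .fstat)" '
      $4 == "in" && $5 == "total"   { total = $1 }
      $4 == "mapped" && $5 ~ /^\(/  { mapped = $1 }
      $4 == "properly"              { proper = $1; paired = 1 }
      END {
        if (total == 0) { print name "\tNA"; exit }
        row = sprintf("%s\t%d\t%d\t%.2f", name, total, mapped, 100 * mapped / total)
        # Unpaired (_u) files have no "properly paired" line, so report NA.
        row = row "\t" (paired ? sprintf("%.2f", 100 * proper / total) : "NA")
        print row
      }' "$f"
  done
}
```

Run over bowgnloc/*.fstat, this would put the low mapping rate of the dmag24gtits14l4r8 files next to the other subsets in a single table. The percentages are recomputed from the counts rather than copied from the parenthesized values.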
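
The "under 50%" linker detection noted above can be checked by expressing each LINKER class of the SffToCA log as a share of numReadsInSFF. This is a rough sketch under stated assumptions: linker_rates is a hypothetical helper name, and the log is assumed to be saved with one "label count" pair per line, with LINKER, OUTCOME and dashed lines as section markers, as in the GBUN2NT01 example.

```shell
#!/bin/sh
# Hypothetical helper (not from the original notes): for each LINKER class in
# an SffToCA log, print label, read count, and percent of numReadsInSFF.
# Assumes one "label count" pair per line, as in the GBUN2NT01 example above.
# Usage: linker_rates sfftoca.log   (the log file name is illustrative)
linker_rates() {
  awk '
    $1 == "numReadsInSFF"          { total = $2; next }
    $1 == "LINKER"                 { in_linker = 1; next }
    $1 ~ /^-+$/ || $1 == "OUTCOME" { in_linker = 0; next }
    in_linker && NF >= 2 && $NF ~ /^[0-9]+$/ && total > 0 {
      label = $0
      sub(/^[ \t]+/, "", label)               # drop leading indentation
      sub(/[ \t]+[0-9]+[ \t]*$/, "", label)   # drop the trailing count
      printf "%s\t%d\t%.2f\n", label, $NF, 100 * $NF / total
    }' "$@"
}
```

For the GBUN2NT01.sff counts, the "good" class is 182756 of 568937 reads, about 32%, which matches the mate pair count in the OUTCOME section.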