The choice of gene sets analysed for arthropod gene structure is based on expert judgement of the best available gene sets at the time of analysis. Rather than provide analyses of all species in a related group, a best representative is chosen. For Drosophila, the model D. melanogaster has far more EST and experimental evidence than other species. For mosquitoes, the choice is less obvious.

Genome annotations are more like computers than they are like vintage wines: the newer models are generally better, and they do not improve much with age. Use of Culex as the representative mosquito is based on its newer, more extensive gene annotation pipeline and its large EST data set, coupled with a robust EST assembly and integration into the gene set. This choice is also supported by the higher average identity of Culex genes to other arthropod gene sets.

Newer gene models use improved software, and more software programs, and using more predictors has been shown to give better results than using fewer (gene finding errors average out). There is also a hard-to-quantify conservatism effect in older genome annotations. Older genome annotations are updated from time to time, but experience says there is a degree of resistance to large changes suggested by new annotation pipelines. Rather, new automatic annotations are merged with older ones in a conservative way.

- Don Gilbert, 2009 Sept.


Mosquito gene sets used by euGenes/Arthropods (2008 August)

Culex quinquefasciatus
  ESTs: 205275
  Genome sequencing status: 04/19/2007, draft assembly available
  gene set: Dec 2008, cpipiens.Cpip1.2
  Gene finding quality report: http://www.broadinstitute.org/annotation/genome/culex_pipiens.4/GeneFinding.html

Anopheles gambiae
  ESTs: 153269
  Genome sequencing status: 03/22/2002, draft assembly available
  gene set: Dec 2008, agambiae.AgamP3.4

Aedes aegypti
  ESTs: 301596
  Genome sequencing status: 02/11/2005, draft assembly available
  gene set: Jan 2008, aaegypti.AaegL1.1.gff3

Culex gene finding has these improvements over the other species:
  1. newer and more gene predictor methods (Augustus, Snap, PHAT, GeneID, FGeneSH)
  2. newer EST assembly methods (PASA, nap)
  3. guidance from prior species gene models

Gene set quality can also be measured from average protein identity to 11 other arthropod gene sets.

             ngroup  ave-PI  se-PI
  Aedes       10664   42.76   0.21
  Anopheles    9106   35.41   0.23
  Culex       11038   44.27   0.20

  Data source: http://insects.eugenes.org/arthropods/orthologs/
  Total gene groups: 12582

The highest identity is found for Culex, with Aedes a close second. This can reflect taxonomy, but also the computational gene prediction quality when measured on full gene sets of these related species.


Culex annotation notes: http://cpipiens.vectorbase.org/Genome/GeneView/?gene=CPIJ012099;db=core

Genes were annotated by merging VectorBase, JCVI and BROAD prediction sets. The VectorBase automatic analysis pipeline uses either a GeneWise/Exonerate model from a database protein or a set of aligned cDNAs/ESTs, followed by an ORF prediction. GeneWise/Exonerate models are further combined with available aligned cDNAs/ESTs to annotate UTRs (for more information see V.Curwen et al., Genome Res. 2004 14:942-50). JCVI annotations were modeled based on a combination of Augustus (M.Stanke), GlimmerHMM (M.Pertea), Twinscan (M.Brent), Snap (I.Korf), and PHAT (S.Cawley), coupled with protein and EST genome alignments (PASA (M.Campbell), Genewise (E.Birney) and nap (X.Huang)). BROAD annotations were generated based on a combination of Augustus, GeneID (R.Guigo), FGeneSH (A.Salamov), GeneWise and SNAP, and completed by species-specific protein alignments (BLAST).

Anopheles annotation notes: http://agambiae.vectorbase.org/Genome/GeneView/?gene=AGAP009325;db=core

Genes were annotated by merging Ensembl and TIGR prediction sets. The Ensembl automatic analysis pipeline uses either a GeneWise/Exonerate model from a database protein or a set of aligned cDNAs/ESTs, followed by an ORF prediction. GeneWise/Exonerate models are further combined with available aligned cDNAs/ESTs to annotate UTRs (for more information see V.Curwen et al., Genome Res. 2004 14:942-50). TIGR annotations were modeled based on a combination of Augustus (M. Stanke), GlimmerHMM (M. Pertea), Genie (D. Kulp), and Twinscan (M. Brent) gene predictions, coupled with protein and EST genome alignments.

Aedes annotation notes: http://aaegypti.vectorbase.org/Genome/GeneView/?gene=AAEL000078

Genes were annotated by merging Ensembl and TIGR prediction sets. The Ensembl automatic analysis pipeline uses either a GeneWise/Exonerate model from a database protein or a set of aligned cDNAs/ESTs, followed by an ORF prediction. GeneWise/Exonerate models are further combined with available aligned cDNAs/ESTs to annotate UTRs (for more information see V.Curwen et al., Genome Res. 2004 14:942-50). TIGR annotations were modeled based on a combination of Augustus (M. Stanke), GlimmerHMM (M. Pertea), Genie (D. Kulp), and Twinscan (M. Brent) gene predictions, coupled with protein and EST genome alignments.
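

Sketch of the protein identity summary: the ngroup, ave-PI and se-PI comparison of the three mosquito gene sets could be produced roughly as in the Python sketch below. This is not the euGenes code. It assumes that se-PI is the standard error of the mean identity, and that the input is a list of (species, percent identity) pairs, one per gene group; the function name identity_summary and the toy values are hypothetical and are not real data.

```python
import math
from collections import defaultdict


def identity_summary(records):
    """Summarise protein identity per species.

    records: iterable of (species, percent_identity) pairs, one pair per
    gene group (hypothetical input layout, assumed for this sketch).
    Returns {species: (ngroup, mean_identity, standard_error)}.
    """
    by_species = defaultdict(list)
    for species, pid in records:
        by_species[species].append(float(pid))

    summary = {}
    for species, values in by_species.items():
        n = len(values)
        mean = sum(values) / n
        if n > 1:
            # Sample variance (n - 1 denominator), then the standard
            # error of the mean: sqrt(variance / n).
            variance = sum((v - mean) ** 2 for v in values) / (n - 1)
            se = math.sqrt(variance / n)
        else:
            se = float("nan")  # undefined for a single observation
        summary[species] = (n, mean, se)
    return summary


# Toy values for illustration only, not taken from any gene set.
toy = [
    ("Culex", 40.0), ("Culex", 50.0),
    ("Aedes", 30.0), ("Aedes", 50.0), ("Aedes", 40.0),
]
summary = identity_summary(toy)

print("species\tngroup\tave-PI\tse-PI")
for species, (n, mean, se) in sorted(summary.items()):
    print(f"{species}\t{n}\t{mean:.2f}\t{se:.2f}")

# The species with the highest mean identity in the toy data.
best = max(summary, key=lambda s: summary[s][1])
```

With real data, a difference in mean identity that is large relative to the standard errors is what would count as one gene set scoring higher than another; as noted above, such a difference can still reflect taxonomy as well as prediction quality.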