EvidentialGene for Bemisia tabaci whitefly, 2016/17

The EvidentialGene assembly of Bemisia tabaci whitefly genes is updated from one I did in 2012. This update is more accurate and complete, by objective orthology measures, than recently available genome gene models for Bemisia tabaci. This Evigene assembly is "reference free": it is assembled directly from RNA-seq, without using chromosomes for modeling genes.

Bemisia tabaci whitefly (cotton/crop plant pest)
Gene sets compared for reference proteins & expression

Reference:   Pea aphid        Fruit fly        RNA-Introns
Geneset      Found%  AlnT%    Found%  AlnT%    Found%
BtEvigene    81.2    88.0     74.1    74.9     68.5
BtNCBI       79.7    82.3     73.4    71.6     69.4
BtMaker      77.4    73.8     72.1    66.0     57.7
BtTrinity    73.5    59.2     68.0    53.2     50.5
----------------------------------------------
http://arthropods.eugenes.org/EvidentialGene/arthropods/whitefly/whitefly3evigene/

Evigene methods are doing very well in comparison to current popular gene reconstruction methods: Trinity-only RNA assembly, PacBio RNA assembly, MAKER genome gene modeling, NCBI EGAP/RefSeq modeling, and Ensembl gene modeling, for animals and plants including Arabidopsis, maize, pine trees, mosquitos, honey bee, beetles, water fleas, ticks, fishes, and mice. Reconstruction from RNA only provides independent gene evidence, free of errors and biases from chromosome assemblies and other species' gene sets. Not only are the easy, well-known ortholog genes reconstructed well, but the harder gene problems of alternate transcripts, paralogs, and complex structured genes are usually more complete from Evigene methods. See this recent work at http://arthropods.eugenes.org/EvidentialGene/

For the genome-sleuths among you, here is a puzzle: there are scores of whitefly RNA-expressed genes with near-perfect nucleotide identity to some plant genomes, including the cotton plant, yet about 30 have good protein alignment to pea aphid and other insect genes. Most are fully located on both whitefly genome and plant genome assemblies.
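
The screen behind this puzzle can be sketched in a few lines of Python. This is only an illustration, not the Evigene pipeline: the record layout, the field names, and all four thresholds are assumptions made up for the sketch. The first record uses the alignment figures reported for Bemtab3dEVm002101t1; the other two records are invented to show what the filter rejects.

```python
from dataclasses import dataclass


@dataclass
class GeneAlignment:
    """Alignment summary for one transcript (hypothetical layout)."""
    gene_id: str
    whitefly_genome_cov: float  # % of transcript aligned to whitefly chromosomes
    plant_genome_cov: float     # % of transcript aligned to a plant genome
    plant_identity: float       # % nucleotide identity over the plant-aligned part
    insect_protein_cov: float   # % of protein aligned to an insect homolog


def is_puzzle_candidate(g: GeneAlignment,
                        min_whitefly_cov: float = 90.0,
                        min_plant_cov: float = 50.0,
                        min_plant_identity: float = 98.0,
                        min_insect_cov: float = 50.0) -> bool:
    """True when a gene sits on the whitefly genome, matches a plant genome
    at near-perfect identity, and still has a good insect protein homolog.
    All thresholds are illustrative assumptions."""
    return (g.whitefly_genome_cov >= min_whitefly_cov
            and g.plant_genome_cov >= min_plant_cov
            and g.plant_identity >= min_plant_identity
            and g.insect_protein_cov >= min_insect_cov)


records = [
    # Figures as reported for this gene: 99% on whitefly chromosomes,
    # 66% aligned at 99% identity to cotton, protein 92% to pea aphid.
    GeneAlignment("Bemtab3dEVm002101t1", 99.0, 66.0, 99.0, 92.0),
    # Invented record: ordinary insect gene with no plant match.
    GeneAlignment("hypothetical_gene_A", 99.0, 0.0, 0.0, 85.0),
    # Invented record: plant-like sequence with no insect homolog.
    GeneAlignment("hypothetical_gene_B", 20.0, 95.0, 99.0, 0.0),
]

candidates = [g.gene_id for g in records if is_puzzle_candidate(g)]
print(candidates)
```

Requiring an insect protein homolog in addition to the plant nucleotide match is what separates these puzzle genes from simple plant contamination of the RNA samples.
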
For example, this gene is found in both whitefly and cotton genomes with high identity, and has a pea aphid homolog: Bemtab3dEVm002101t1, 931 aa. Its transcript aligns 99% to whitefly chromosomes, 66% of it aligns at 99% identity to Gossypium hirsutum chromosomes, and its protein aligns 92% to pea aphid ncbi:XP_008188404.1, a zinc finger protein.

Who should consider EvidentialGene for gene reconstruction?

* genomicists who want accurate, complete and objectively reconstructed genes, including those of you who may not believe my claims, but will look at objective results on this.
* model and well-supported genome projects, where curators can use these to improve precision of high-value gene information.
* new species genomes, for use as a primary gene set with alternate transcripts, and/or to assess gene predictions and chromosome assemblies for accuracy.
* gene/genome improvement projects, to add alternate transcripts and to recover undiscovered and fragmented gene models.
* transcriptome and expression projects, for more accurate genes.

One of my goals with this work is to reconstruct many high-value (model, otherwise) animal and plant gene sets in coming years. I welcome collaborations, especially from groups with genomics + informatics expertise. This methodology is highly automatable (think BIG DATA), but still wants improvements.

Species genes built with Evigene by independent authors include a range of plants and animals, and several of these papers provide independent reviews of Evigene versus other methods.

-- Don Gilbert, 2017 June
gilbertd at indiana.edu