EvidentialGene gene construction for Zea mays corn plant

The EvidentialGene gene set for maize is more accurate and complete than other published maize gene sets, as measured by orthology. A quality comparison of Evigene genes ranks them above the corn gene sets of Gramene (Ensembl/MAKER models) and PacBio gene assemblies (2016), NCBI gene models, and JGI gene assemblies, both for primary ortholog loci and for alternate transcripts and duplicate genes. See evigene_maize_info/corngenes_qualsum/. These Evigene sets are in-progress drafts as of 2016-10 and may be improved.
-- Don Gilbert, 18 Oct 2016, gilbertd at indiana.edu
Maize corn plant genes reconstructed with EvidentialGene

EvidentialGene is a genome informatics project and pipeline for gene set construction with measurably high accuracy and completeness, compared with other gene informatics methods used for animals and plants. See eugenes.org/EvidentialGene/ or sourceforge.net/projects/evidentialgene/

Gene orthology accuracy and completeness, measured by protein homology to reference species genes, are summarized here for the EvidentialGene gene sets of Zea mays (corn plant), in comparison with other maize gene sets: genome Zea_mays.AGPv4 genes by Gramene/Ensembl, B73_RefGen_v3 genes by NCBI, and genes assembled with Rnnotator by JGI. EvidentialGene gene assembly uses the same published RNA-seq, assembles it with 4 gene assemblers, then reduces the pooled transcripts to a concise and accurate locus/alternate gene set. In these tests, Evigene produced more accurate gene sets, with minimal time and effort.
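The "reduce to a concise gene set" step can be pictured with a deliberately simplified sketch. It is not Evigene's actual algorithm, which is more involved (among other things it groups alternate transcripts into loci, as described above); it only illustrates the basic idea of pooling protein translations from several assemblers and discarding exact duplicates and fragments contained in a longer sequence. The input dict and the transcript ids (`oases_1`, `trinity_7`, etc.) are hypothetical.

```python
from __future__ import annotations


def reduce_redundant(pooled: dict[str, str]) -> dict[str, str]:
    """Toy redundancy reduction, NOT EvidentialGene's algorithm.

    pooled maps a hypothetical transcript id -> its protein translation.
    Returns the entries that are not identical to, or contained in,
    a sequence that was already kept.
    """
    # Longest first (ties broken by id) so a containing sequence is
    # always examined before any sequence it contains.
    ordered = sorted(pooled.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    kept: dict[str, str] = {}
    for tid, prot in ordered:
        # Exact duplicates and contained fragments are substrings of a kept sequence.
        if any(prot in longer for longer in kept.values()):
            continue
        kept[tid] = prot
    return kept


pooled = {
    "oases_1": "MKTAYIAKQR",    # full-length model
    "trinity_7": "MKTAYIAKQR",  # exact duplicate from another assembler
    "soap_3": "TAYIAK",         # fragment contained in oases_1
    "idba_2": "MSSLLQ",         # unrelated sequence, kept
}
print(sorted(reduce_redundant(pooled)))  # prints ['idba_2', 'oases_1']
```

The pairwise substring scan is quadratic and only suits toy inputs; gene sets with hundreds of thousands of transcripts need indexed or clustering approaches.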
Zea_mays gene sets compared

Zea_mays x REFERENCE Arabidopsis thal. model (Araport 2015 version, ngene=28902)
           Evigene5  Gramene4  NCBIRef3  JGI14denovo
Found      80.9%     80.4%     80.6%     79.2%
Align      91.7%     89.3%     89.0%     84.4%
AlignAA    428       412       405       388

Zea_mays x REFERENCE Sorghum (Sbicolor_313 v3.1 of JGI Phytozome, ngene=31054)
           Evigene5  Gramene4  NCBIRef3  JGI14denovo
Found      83.4%     82.0%     80.0%     77.3%
Align      93.8%     91.7%     89.7%     82.4%
AlignAA    436       419       409       381

Component assemblers used for Evigene x Sorghum REFERENCE
           Velv/Oases  idba_tran  SOAPtrans  Trinity
Found      80.0%       78.6%      78.0%      77.8%
Align      88.6%       86.0%      86.0%      83.7%
AlignAA    413         398        400        388

Maize gene sets compared:
a. Evigene5 (evg5corn): genes de novo assembled and classified with Evigene methods, using four gene assemblers and 3 Illumina RNA-seq sets (JGI-2014 PRJNA168080, CSHL-2016 PRJEB10406, UCBerkeley-2016 PRJNA306885). The ohnolog/paralog loci are resolved with locations on the chromosome assembly of (c). Gene loci=50963, mRNA=231177 (a0. evg4corn loci=42597)
b. Gramene4: gene set Zea_mays.AGPv4.32 from Gramene/Ensembl, 2016, MAKER-modelled on the chromosome assembly. Gene loci=39310, mRNA=149669
c. NCBIRef3: genes/genome release B73_RefGen_v3, 2013, from NCBI reference genomes. Gene loci=39873, mRNA=58277
d. JGI14_denovo_maize: genes assembled with Rnnotator by JGI (doi:10.1038/srep04519, 2014), RNA-seq from maize seedling, 250 M Illumina pairs. Gene loci=133756, mRNA=187045
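The tables report Found, Align and AlignAA without spelling out how they are computed. The sketch below is one plausible reading under stated assumptions, not the documented Evigene scoring: Found is the percent of reference genes whose best target hit covers at least 25% of the reference protein (the cutoff borrowed from the ohnolog "Found" definition in this document), Align is the mean percent coverage of reference proteins among found genes, and AlignAA is the mean number of aligned amino acids among found genes. The `BestHit` record and the toy hits are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class BestHit:
    """Hypothetical best alignment of a target protein to one reference protein."""
    aligned_aa: int  # reference residues covered by the alignment
    ref_len: int     # reference protein length in amino acids


def summarize(hits: dict[str, Optional[BestHit]], min_cov: float = 0.25):
    """Return (Found %, Align %, AlignAA) under the assumptions stated above.

    hits maps every reference gene id to its best target hit, or None.
    """
    found = [h for h in hits.values()
             if h is not None and h.aligned_aa / h.ref_len >= min_cov]
    found_pct = 100.0 * len(found) / len(hits) if hits else 0.0
    if not found:
        return found_pct, 0.0, 0.0
    align_pct = 100.0 * sum(h.aligned_aa / h.ref_len for h in found) / len(found)
    align_aa = sum(h.aligned_aa for h in found) / len(found)
    return found_pct, align_pct, align_aa


hits = {
    "ref1": BestHit(aligned_aa=75, ref_len=100),   # 75% covered: found
    "ref2": BestHit(aligned_aa=100, ref_len=200),  # 50% covered: found
    "ref3": BestHit(aligned_aa=30, ref_len=200),   # 15% covered: below cutoff
    "ref4": None,                                  # no hit at all
}
print(summarize(hits))  # prints (50.0, 62.5, 87.5)
```

Under this reading, the reference gene count (ngene=28902 for Arabidopsis, 31054 for Sorghum) would be the denominator of Found.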
Evigene ref: Gilbert, Donald (2013) Gene-omes built from mRNA seq not genome DNA.
Case: Duplicate genes from chromosome duplication

Reliable homeologous genes (ohnologs) in maize that correspond to single loci in rice, sorghum and Arabidopsis were identified by Schnable et al. (doi:10.1073/pnas.1101368108). These are 1750 locus pairs, with the two loci of each pair on separate chromosomes (3500 loci). Of these, 1661 locus pairs are identified in the corn gene sets via alignment to sorghum loci. See further details in corn ohnolog and alternate transcript reconstruction.

Zea_mays Ohnologs x REFERENCE Sorghum
           Evigene  NCBIv3  JGI14
Found      3201     3218    3111
Miss1      25       66      83
Mixup      26       0       200
Align      86.8%    87.9%   83.7%
Sorghum n=1661, corn loci n=3322

Found = contains the ohnolog gene model (align >= 25%)
Miss1 = missing a locus model that the other two gene sets contain
Mixup = transcripts on separate chromosomes classed as alternates of one locus
Align = % alignment of target protein sets to reference proteins
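The Miss1 and Mixup definitions are simple set operations once each gene set's ohnolog hits and chromosome placements are known. A minimal sketch, assuming hypothetical inputs: for Miss1, a map from gene-set name to the ohnolog loci it contains (after the align >= 25% filter); for Mixup, a map from a gene set's locus id to the chromosomes its alternate transcripts map to. Locus ids such as `a1`/`a2` (one ohnolog pair) are made up for illustration.

```python
from __future__ import annotations


def miss1(found: dict[str, set[str]]) -> dict[str, int]:
    """Count, per gene set, ohnolog loci it lacks but all other sets contain."""
    counts: dict[str, int] = {}
    for name, have in found.items():
        others = [loci for other, loci in found.items() if other != name]
        in_all_others = set.intersection(*others) if others else set()
        counts[name] = len(in_all_others - have)
    return counts


def mixups(alt_chroms: dict[str, set[str]]) -> int:
    """Count loci whose alternate transcripts map to more than one chromosome."""
    return sum(1 for chroms in alt_chroms.values() if len(chroms) > 1)


# Two hypothetical ohnolog pairs: a1/a2 and b1/b2, each copy on its own chromosome.
found = {
    "Evigene": {"a1", "a2", "b1", "b2"},
    "NCBIv3": {"a1", "a2", "b1"},
    "JGI14": {"a1", "b1", "b2"},
}
print(miss1(found))  # prints {'Evigene': 0, 'NCBIv3': 1, 'JGI14': 1}

# A set that lumps copies a1 (chr1) and a2 (chr5) into one "locus" has a Mixup.
print(mixups({"locA": {"chr1", "chr5"}, "locB": {"chr3"}}))  # prints 1
```

With three gene sets, "all other sets" in `miss1` is exactly "the other two gene sets" of the Miss1 definition.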
Case: Mediator of RNA polymerase II transcription subunit gene family

Mediator of RNA polymerase II transcription subunit genes form a well-conserved animal and plant gene family of 25 to 30 loci, with proteins ranging in size from 100 aa to 2000 aa. The Arabidopsis reference contains 44 of these loci, though several have an uncertain or weak association with the family. Of those, 36 are found in the Sorghum reference set, and 36 across all maize gene sets, at >= 25% protein alignment identity. Generally these are well-expressed housekeeping genes, and every gene set should be able to find and assemble or model them. Gene set comparisons show this is not always so: modellers and assemblers are prone to miss some. The shorter ones may be joined to the ends of other genes, and the longer ones may be only partly assembled or modelled.
           ------ Gene Sets ------   ------ Gene Assemblers ------
Stat.      Evigene  NCBIv3  JGI14    Oases  IDBA  SOAP  Trinity
Found      36       31      32       36     36    34    34
Align%     92       82      79       92     86    86    83
AlignAA    504      423     422      500    478   480   456
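The two failure modes named for this family, short genes joined onto the ends of other genes and long genes only partly assembled, can be flagged from an alignment against the reference protein. The sketch below assumes each target protein comes with its length, its reference ortholog's length, and the number of reference residues aligned; the cutoffs `partial_below=0.6` and `fused_above=1.5` are arbitrary illustrative values, not thresholds used in this comparison.

```python
def diagnose(query_len: int, ref_len: int, aligned_aa: int,
             partial_below: float = 0.6, fused_above: float = 1.5) -> str:
    """Label one target protein against its reference ortholog.

    Cutoffs are illustrative assumptions, not values from the source.
    """
    if aligned_aa / ref_len < partial_below:
        return "partial"  # too little of the reference protein is covered
    if query_len > fused_above * aligned_aa:
        return "fused"    # long unaligned extension, e.g. joined to a neighbour gene
    return "ok"


# Hypothetical Mediator-like cases spanning the 100 aa to 2000 aa size range.
print(diagnose(query_len=400, ref_len=100, aligned_aa=95))    # prints fused
print(diagnose(query_len=820, ref_len=2000, aligned_aa=800))  # prints partial
print(diagnose(query_len=510, ref_len=500, aligned_aa=480))   # prints ok
```

A model can be both fragmentary and fused; this sketch reports only the first condition it meets, checking coverage before length.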
-- Don Gilbert, gilbertd at_indiana_edu