Duplicate Genes from chromosome duplication

Reliable homeologous genes (ohnologs) in maize that are conserved with single-copy loci in rice, sorghum and Arabidopsis were identified by Schnable et al. 2011 (doi:10.1073/pnas.1101368108). These comprise 1750 locus pairs, with each member of a pair on a separate chromosome (3500 loci). Of these, 1661 locus pairs are identified in the corn gene sets via alignment to sorghum loci.

Quality Summary of Gene Reconstruction Methods for Maize Gene Duplicates (Ohnologs)

KMER Effect on Reconstruction of Maize Gene Duplicates (Ohnologs), 4 Qualities

Alternate Transcripts of Maize Genes

Alternate transcripts may be reconstructed more accurately by combining several methods, as each locus and its alternate set have differing properties (size, complexity, amount of shared exons, expression level, etc.). Assembler Kmer size affects both sequence accuracy (measured by alignment identity to reference alternate transcripts) and the number found (measured by alignments to distinct reference alternate transcripts). Alternate transcript and gene duplicate reconstruction share similarities: both must resolve highly identical duplicated sequences with variable expression, and both gain accuracy from large-Kmer assemblies. They differ in that alternates share long sequence spans of common exons.
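As a toy illustration of why larger Kmers help separate highly identical duplicates, the sketch below (not Evigene's or any assembler's code) counts how many k-mers of one invented 60-bp copy are absent from a second copy that carries two substitutions. At k=1 every k-mer is shared; as k grows, more k-mers span a substitution and become copy-specific, which is the signal an assembler needs to keep the copies apart. The sequences and function names are hypothetical.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_fraction(a: str, b: str, k: int) -> float:
    """Fraction of the distinct k-mers of a that do not occur in b."""
    ka, kb = kmers(a, k), kmers(b, k)
    return len(ka - kb) / len(ka) if ka else 0.0


def substitute(seq: str, pos: int, base: str) -> str:
    """Return seq with the base at 0-based position pos replaced."""
    return seq[:pos] + base + seq[pos + 1:]


# Hypothetical 60-bp duplicate pair: copy B differs from copy A at two sites.
COPY_A = (
    "ATGGCTTCAG" "GAACCGTTAG" "CATCGGATAC"
    "TTGACCAGTC" "GAAGCTTAGG" "CATGCTAACG"
)
COPY_B = substitute(substitute(COPY_A, 20, "T"), 40, "A")

if __name__ == "__main__":
    for k in (1, 5, 15, 25):
        frac = unique_fraction(COPY_A, COPY_B, k)
        print(f"k={k:2d}  copy-specific k-mers in A: {frac:.2f}")
```

Alternates behave similarly across the exons they share: k-mers lying entirely within a common exon are identical in both isoforms at any k, so only k-mers spanning an isoform-specific exon or junction distinguish them.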

Of the five maize gene sets, the Evigene5 gene assembly of short Illumina reads reconstructs substantially more reference alternate isoforms, and has greater alignment identity to reference proteins, than the other four: the gene models on chromosomes by Gramene/Ensembl and by NCBI, the genes assembled from PacBio long reads, and the JGI gene assembly of short reads.

Summary for Gene Sets in Reconstruction of Maize Gene Alternates, Sorghum ref

Summary for Gene Sets in Reconstruction of Maize Gene Alternates, Arabidopsis ref

KMER Effect on Reconstruction of Maize Gene Alternates, 4 Qualities

Conserved ohnologs found: nRef=1697, nOhnoHit2=1453, nOhnoref=3150
Alternates found: nRefLoci=6165 (Hit1), nAlternates=2251 (Hit2) for Sorghum reference (Sbicolor_313)
Alternates found: nRefLoci=9290 (Hit1), nAlternates=3841 (Hit2) for Arabidopsis reference (arath15)

Qualities for matching conserved ohnologs and alternates
nHit = number of ohnolog loci found (1 or 2 copies), pHit1 = % found of 1st copy, pHit2 = % found of 2nd (independent) copy;
for the alternates summary, pHit1 = % found primary isoforms, pHit2 = % found alternates
Kmer = assembler read-shred size, except for combined gene sets
pIg = % identity to maize V4 chromosome assembly, pIr = % identity to Sorghum reference gene CDS, for found loci (nHit)
pIgr = product pIg x pIr / 100, i.e. % identity in both dimensions, reference genes and maize chromosomes
rpIg, rpIr, rpIgr = same as above, but relative to all ohnolog loci (nOhnoref)
sumPIGR = sum over genes of the pIgr metric (sumPIGR/nHit = pIgr, sumPIGR/nOhnoref = rpIgr)
A perfect quality score would be 100%, except that the maximal % identity to Sorghum gene CDS is ~90%
Overall quality metric is rpIgr: % identity in both dimensions, reference genes and chromosomes, over all conserved ohnologs
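The quality metrics defined above can be computed from per-locus identities roughly as in this sketch. The input layout (one (pIg, pIr) pair per found locus), the function name summarize, and the QualitySummary class are hypothetical, not Evigene's actual tables or scripts; rpIg and rpIr are assumed to follow the same sum/nOhnoref pattern the text gives for rpIgr. Because sumPIGR sums per-locus products, the summary pIgr is the mean of products, which in general differs from the product of mean pIg and mean pIr.

```python
from dataclasses import dataclass


@dataclass
class QualitySummary:
    n_hit: int
    pIg: float
    pIr: float
    pIgr: float
    rpIg: float
    rpIr: float
    rpIgr: float
    sumPIGR: float


def summarize(found: list[tuple[float, float]], n_ohnoref: int) -> QualitySummary:
    """Summarize (pIg, pIr) percent identities, one pair per found locus.

    n_ohnoref is the total number of reference ohnolog loci (nOhnoref).
    """
    n_hit = len(found)
    sum_pig = sum(g for g, _ in found)
    sum_pir = sum(r for _, r in found)
    # Per-locus pIgr = pIg x pIr / 100, summed over found loci (sumPIGR).
    sum_pigr = sum(g * r / 100.0 for g, r in found)

    def mean(total: float) -> float:
        # Averages over found loci only (divide by nHit).
        return total / n_hit if n_hit else 0.0

    return QualitySummary(
        n_hit=n_hit,
        pIg=mean(sum_pig),
        pIr=mean(sum_pir),
        pIgr=mean(sum_pigr),
        # Relative metrics divide by all reference loci, so missed loci count as 0.
        rpIg=sum_pig / n_ohnoref,
        rpIr=sum_pir / n_ohnoref,
        rpIgr=sum_pigr / n_ohnoref,
        sumPIGR=sum_pigr,
    )


if __name__ == "__main__":
    # Invented example: 2 of 4 reference loci found.
    s = summarize([(100.0, 90.0), (98.0, 80.0)], n_ohnoref=4)
    print(f"{s.pIgr:.2f} {s.rpIgr:.2f}")  # → 84.20 42.10
```

In this invented example the product of the means would be 99.0 x 85.0 / 100 = 84.15, slightly below the mean of per-locus products (84.20).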

Source gene sets
Complete maize gene sets:
    Evigene5R = evg5corn (Sep.2016), NCBIv3G = NCBI-EGAP on V3chr (2014), JGI14v3R = JGI gene assembly on V3chr (2014),
    CSHL6EnsG/CSHL32s4v = Gramene/Ensembl Plants v32 (Sep.2016) on V4chr,
    CSHL6PacR = Gramene PacBio gene assembly (June.2016) on V4chr
Evigene subset assemblies:
    idba = cornhi12m3idba, soap = cornhi8m4msoap, trin = cornhi8mtrin, velv = cornhi8m9agvelv

