Index of /EvidentialGene/plants/pine

      Name                          Last modified       Size

[DIR] Parent Directory              16-Dec-2016 22:19   -
[TXT] evigene_pinetree_genes.html   21-Dec-2016 14:55   7k
[DIR] pine_evigene201308/           16-Dec-2016 22:15   -
[DIR] refprot_align/                20-Dec-2016 13:10   -

Pine tree gene sets, EvidentialGene compared with other methods

The EvidentialGene recipe for reconstructing accurate gene sets compares well, in this pine tree example, against three other approaches: genes modeled on the genome, genes assembled with Trinity from Illumina read pairs, and genes assembled from longer but less accurate PacBio reads. Evigene methods in summary: start with >=100 million Illumina read pairs, run several gene assemblers with varying k-mer and other options, then reduce this over-assembly to a species-accurate gene set with the Evigene-R pipeline. See eugenes.org/EvidentialGene/ or sourceforge.net/projects/evidentialgene/

Conserved gene accuracy and completeness are measured by protein homology to reference species genes, for gene sets of five pine tree species (loblolly, sugar and 3 other pine trees). Results are summarized here for EvidentialGene methods in comparison with the transcript assembly and genome-predicted gene sets of treegenesdb.org and of GenBank-TSA.

Pine tree gene sets compared

      REFERENCE       Arabidopsis           Grape (Vitis vin.)
Geneset      Rank Found% Align% AlignAA   Found% Align% AlignAA  Source/Method
Pta.Tra13Evg   1   97.8   80.9   374.1     98.5   82.7   428.2   Evigene of multi-kmer, 4 assemblers, Illumina, 2013 
Pta.Gmod1v     5   89.5   70.5   318.0     89.3   71.0   361.6   Genome gene models, 2015, v1.1
Pta.Tra16Pb    8   85.4   67.1   289.3     84.5   66.7   314.2   PacBio asm, 2016
Pla.Tra16IlPb  3   88.7   75.3   351.7     90.2   76.7   396.1   mix of Trinity/Illumina and PacBio asm, 2016 
Pla.Gmod1v     5   87.5   70.1   317.4     86.9   70.0   356.4   Genome gene models, 2016
Ppa.Tra15Evg   1   95.5   80.4   375.6     96.9   82.4   426.8   Evigene of multi-kmer, 3 assemblers, Illumina, 2015  
Pca.Tra15Il    3   92.0   74.7   344.6     94.7   77.1   393.5   Trinity of Illumina, 2015
Pal.Tra15Il    7   81.3   67.6   344.5     84.7   70.8   397.5   Trinity of Illumina, 2015

Source/Method Ranked by gene set completeness

Geneset     Rank DiffAln%    Source/Method
Pta.Tra13Evg   1     0   Evigene of multi-kmer, 4 assemblers, Illumina; LPG.2013
Ppa.Tra15Evg   1     0   Evigene of multi-kmer, 3 assemblers, Illumina; TSA.GECO 2015
Pca.Tra15Il    3    -6   Trinity 1-kmer asm of Illumina pairs; TSA.GBLJ 2015
Pla.Tra16IlPb  3    -6   mix of Trinity and PacBio asms; TSA.GEUZ, SPG.2016
Pta.Gmod1v     5   -11   Genome gene models, LPG.2015, v1.1
Pla.Gmod1v     5   -11   Genome gene models, SPG.2016, v1
Pal.Tra15Il    7   -13   Trinity 1-kmer asm of Illumina, TSA.GDQR 2015
Pta.Tra16Pb    8   -15   PacBio asm, LPG.2016

Statistics:
Found = % of reference proteins with a significant alignment to the test gene set
Align = % alignment of target protein sets to reference proteins
AlignAA = average alignment size (in amino acids) to reference proteins
DiffAln = Difference in % alignment from Rank 1
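
The DiffAln column can be approximated from the Align% columns of the comparison table. The sketch below is not the author's script: it assumes DiffAln is the mean, over the two references, of each gene set's Align% minus that of the top-ranked Pta.Tra13Evg set. The names `ALIGN_PCT` and `diff_aln` are hypothetical. Under that assumption the results agree with the published DiffAln values to within about one percentage point.

```python
# Align% values copied from the comparison table: (Arabidopsis, Grape)
ALIGN_PCT = {
    "Pta.Tra13Evg":  (80.9, 82.7),
    "Pta.Gmod1v":    (70.5, 71.0),
    "Pta.Tra16Pb":   (67.1, 66.7),
    "Pla.Tra16IlPb": (75.3, 76.7),
    "Pla.Gmod1v":    (70.1, 70.0),
    "Ppa.Tra15Evg":  (80.4, 82.4),
    "Pca.Tra15Il":   (74.7, 77.1),
    "Pal.Tra15Il":   (67.6, 70.8),
}


def diff_aln(align_pct, top="Pta.Tra13Evg"):
    """Mean Align% difference from the top-ranked gene set.

    Assumed formula (not confirmed by the source): average the per-reference
    differences over both reference species.
    """
    top_vals = align_pct[top]
    result = {}
    for name, vals in align_pct.items():
        per_ref = [v - t for v, t in zip(vals, top_vals)]
        result[name] = sum(per_ref) / len(per_ref)
    return result


diffs = diff_aln(ALIGN_PCT)

# Print gene sets from most to least complete, one decimal place
for name, d in sorted(diffs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:14s} {d:6.1f}")
```

Sorting by this value gives the same broad ordering as the ranked table: the two Evigene sets first, then the Trinity and mixed assemblies, then the genome gene models, with the PacBio-only assembly last.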

Species Geneset key:

  • Pta = Pinus taeda (loblolly); PRJNA174450 for genome annotation;
    Pta.Tra13Evg = Evigene of multi-kmer (Oases,Soap) + Trinity assemblies of Pinus taeda, 2013 august at /eugenes.org/EvidentialGene/plants/pine/pine_evigene201308/publicset/
    Pta.Gmod1v = Maker gene models of Pinus taeda, 2015, v1.1 at /treegenesdb.org/ftp/Genome_Data/genome/pinerefseq/Pita/
    Pta.Tra16Pb = PacBio sequences + assemblies of Pinus taeda genes, 2016
  • Pla = Pinus lambertiana (sugar);
    Pla.Tra16IlPb = TSA.GEUZ PRJNA174450 2016; project mix of Illumina/Trinity 1-kmer and PacBio asm
    Pla.Gmod1v = Maker gene models of Pinus lamb. (sugar), 2016 at /treegenesdb.org/ftp/Genome_Data/genome/pinerefseq/Pila/
  • Ppa = Pinus patula; TSA.GECO PRJNA301922 2015; Evigene of multi-kmer, 3 assemblers, Illumina pairs from GenBank-TSA; doi:10.1186/s12864-015-2277-7
  • Pca = Pinus canariensis; TSA.GBLJ PRJNA255888 2015; Trinity 1-kmer asm of Illumina pairs from GenBank-TSA
  • Pal = Pinus albicaulis; TSA.GDQR PRJNA294917 2015; Trinity 1-kmer asm of Illumina pairs from GenBank-TSA
  • REFERENCES
    Arabidopsis thaliana model plant, Araport 2015 version, nprotein=nnnnn, nloci=28902
    Vitis vinifera grape, NCBI RefSeq 2014 version, nprotein=35618, nloci=nnnnn
Don Gilbert, gilbertd at_indiana_edu
update: 18 Dec 2016


Developed at the Genome Informatics Lab of Indiana University Biology Department