Index of /EvidentialGene/plants/pine/pine_evigene201308
Name Last modified Size
Parent Directory 20-Dec-2016 12:57 -
pine9genesets_homolstats.txt 09-May-2018 15:19 4k
pine9set.info 16-Dec-2016 21:50 1k
pine9set.stats 16-Dec-2016 22:09 6k
pine_evgTv1pub130827_readme.txt 21-Feb-2021 23:10 2k
publicset/ 21-Feb-2021 23:10 -
Pine tree gene sets, EvidentialGene compared with other methods
The Evigene reconstruction of loblolly pine tree genes, from August 2013,
is in pine_evigene201308/publicset/
The EvidentialGene recipe for reconstructing accurate gene sets compares well, for this example of
pine trees, with these other methods: genes modeled on the genome, genes assembled with Trinity from
Illumina read pairs, and genes assembled from longer but less accurate PacBio reads.
Evigene methods in summary: start with >=100 million Illumina read pairs, run several gene assemblers
with varying k-mer and other options, then reduce this over-assembly to a species-accurate gene set
with the Evigene-R pipeline.
Conserved gene accuracy and completeness are measured with protein
homology to reference species genes, for gene sets of pine tree species
(loblolly, sugar and 3 other pine trees), and summarized here for
EvidentialGene methods in comparison with transcript assemblies and
genome-predicted gene sets of treegenesdb.org, and from GenBank-TSA.
Pine tree gene sets compared
REFERENCE            Arabidopsis              Grape (Vitis vin.)
Geneset        Rank  Found%  Align%  AlignAA  Found%  Align%  AlignAA  Source/Method
Pta.Tra13Evg   1     97.8    80.9    374.1    98.5    82.7    428.2    Evigene of multi-kmer, 4 assemblers, Illumina, 2013
Pta.Gmod1v     5     89.5    70.5    318.0    89.3    71.0    361.6    Genome gene models, 2016
Pta.Tra16Pb    8     85.4    67.1    289.3    84.5    66.7    314.2    PacBio asm, 2016
Pla.Tra16IlPb  3     88.7    75.3    351.7    90.2    76.7    396.1    mix of Trinity/Illumina and PacBio asm, 2016
Pla.Gmod1v     5     87.5    70.1    317.4    86.9    70.0    356.4    Genome gene models, 2016
Ppa.Tra15Evg   1     95.5    80.4    375.6    96.9    82.4    426.8    Evigene of multi-kmer, 3 assemblers, Illumina, 2015
Pca.Tra15Il    3     92.0    74.7    344.6    94.7    77.1    393.5    Trinity of Illumina, 2015
Pal.Tra15Il    7     81.3    67.6    344.5    84.7    70.8    397.5    Trinity of Illumina, 2015
Source/Method Ranked by gene set completeness
Geneset        Rank  DiffAln%  Source/Method
Pta.Tra13Evg   1       0       Evigene of multi-kmer, 4 assemblers, Illumina; LPG.2013
Ppa.Tra15Evg   1       0       Evigene of multi-kmer, 3 assemblers, Illumina; TSA.GECO 2015
Pca.Tra15Il    3      -6       Trinity 1-kmer asm of Illumina pairs; TSA.GBLJ 2015
Pla.Tra16IlPb  3      -6       mix of Trinity and PacBio asms; TSA.GEUZ, SPG.2016
Pta.Gmod1v     5     -11       Genome gene models; LPG.2015, v1.1
Pla.Gmod1v     5     -11       Genome gene models; SPG.2016, v1
Pal.Tra15Il    7     -13       Trinity 1-kmer asm of Illumina; TSA.GDQR 2015
Pta.Tra16Pb    8     -15       PacBio asm; LPG.2016
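The DiffAln% column can be approximated from the Align% columns of the comparison table. The sketch below is one possible reading, not the author's script: it assumes DiffAln is the Align% difference from Pta.Tra13Evg, averaged over the two reference species. The names ALIGN_PCT and diff_aln are made up for this example, and the results may differ by about a point from the rounded values listed above.

```python
# Align% values (Arabidopsis, grape) copied from the comparison table above.
ALIGN_PCT = {
    "Pta.Tra13Evg": (80.9, 82.7),
    "Pta.Gmod1v": (70.5, 71.0),
    "Pta.Tra16Pb": (67.1, 66.7),
    "Pla.Tra16IlPb": (75.3, 76.7),
    "Pla.Gmod1v": (70.1, 70.0),
    "Ppa.Tra15Evg": (80.4, 82.4),
    "Pca.Tra15Il": (74.7, 77.1),
    "Pal.Tra15Il": (67.6, 70.8),
}


def diff_aln(geneset, best="Pta.Tra13Evg"):
    """Mean Align% difference from the best gene set.

    ASSUMPTION: the difference is averaged over both reference species;
    the readme does not state how the two references are combined.
    """
    best_vals = ALIGN_PCT[best]
    vals = ALIGN_PCT[geneset]
    return sum(v - b for v, b in zip(vals, best_vals)) / len(best_vals)


if __name__ == "__main__":
    # List gene sets from most to least complete under this assumption.
    for name in sorted(ALIGN_PCT, key=diff_aln, reverse=True):
        print(f"{name:14s} {diff_aln(name):6.1f}")
```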
Statistics:
Found = % of reference proteins with a significant alignment to the test gene set
Align = % alignment of test gene set proteins to reference proteins
AlignAA = average alignment size (in amino acids) to reference proteins
DiffAln = Difference in % alignment from Rank 1
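As an illustration of how the Found, Align and AlignAA statistics relate to each other, here is a minimal sketch that computes them from a table of best hits per reference protein. It is not the Evigene code. The function name homology_stats, the input layout, and the choice to compute Align% and AlignAA over found reference proteins only are all assumptions made for this example.

```python
def homology_stats(ref_lengths, best_hits):
    """Summarize homology of a test gene set against reference proteins.

    ref_lengths: {reference protein id: protein length in amino acids}
    best_hits:   {reference protein id: aligned length (aa) of its best
                  significant hit in the test gene set}
    ASSUMPTION: Align% and AlignAA are computed over found proteins only.
    """
    found = [ref for ref in ref_lengths if ref in best_hits]
    if not found:
        return {"Found%": 0.0, "Align%": 0.0, "AlignAA": 0.0}
    aligned_total = sum(best_hits[ref] for ref in found)
    ref_total = sum(ref_lengths[ref] for ref in found)
    return {
        "Found%": 100.0 * len(found) / len(ref_lengths),
        "Align%": 100.0 * aligned_total / ref_total,
        "AlignAA": aligned_total / len(found),
    }


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    refs = {"a": 100, "b": 200, "c": 300, "d": 400}
    hits = {"a": 80, "b": 150, "c": 300}
    print(homology_stats(refs, hits))
```

In practice the best-hit table would come from a protein alignment of the test gene set against the reference proteins, filtered by a significance cutoff.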
Species Geneset key:
Pta = Pinus taeda (loblolly); PRJNA174450 for genome annotation;
Pta.Tra13Evg = Evigene of multi-kmer (Oases, Soap) + Trinity assemblies of Pinus taeda, August 2013
at /eugenes.org/EvidentialGene/plants/pine/pine_evigene201308/publicset/
Pta.Gmod1v = Maker gene models of Pinus taeda, 2015, v1.1
at /treegenesdb.org/ftp/Genome_Data/genome/pinerefseq/Pita/
Pta.Tra16Pb = PacBio sequences + assemblies of Pinus taeda genes, 2016
Pla = Pinus lambertiana (sugar);
Pla.Tra16IlPb = TSA.GEUZ PRJNA174450 2016; project mix of Illumina/Trinity 1-kmer and PacBio asm
Pla.Gmod1v = Maker gene models of Pinus lambertiana (sugar), 2016
at /treegenesdb.org/ftp/Genome_Data/genome/pinerefseq/Pila/
Ppa = Pinus patula; TSA.GECO PRJNA301922 2015; Evigene of multi-kmer, 3 assemblers, Illumina pairs from GenBank-TSA;
doi:10.1186/s12864-015-2277-7
Pca = Pinus canariensis; TSA.GBLJ PRJNA255888 2015; Trinity 1-kmer asm of Illumina pairs from GenBank-TSA
Pal = Pinus albicaulis; TSA.GDQR PRJNA294917 2015; Trinity 1-kmer asm of Illumina pairs from GenBank-TSA
REFERENCES
Arabidopsis thaliana model plant, Araport 2015 version, nprotein=nnnnn, nloci=28902
Vitis vinifera grape, NCBI RefSeq 2014 version, nprotein=35618, nloci=nnnnn
Don Gilbert, gilbertd at_indiana_edu
update: 18 Dec 2016