Index of /EvidentialGene/vertebrates/pig/pig18evigene/publicset
Name Last modified Size
Parent Directory 17-May-2019 15:32 -
pig18evigene_m4wf.ann.txt.gz 28-Jul-2018 15:44 36.3M
pig18evigene_m4wf.genesum.txt 30-Jul-2018 19:49 2k
pig18evigene_m4wf.mainalt.tab.gz 30-Jun-2018 15:01 7.4M
pig18evigene_m4wf.mrna.gmap.gff.gz 02-Nov-2018 14:38 148M
pig18evigene_m4wf.pubids.gz 28-Jul-2018 15:52 31.3M
pig18evigene_m4wf.public_aa.fa.gz 28-Jul-2018 00:18 109M
pig18evigene_m4wf.public_cds.fa.gz 28-Jul-2018 00:18 193M
pig18evigene_m4wf.public_mrna.fa.gz 28-Jul-2018 00:18 280M
pig18evigene_m4wf.tsadesc.cmt 03-Jul-2018 21:10 1k
pig18evigene_m4wf.xcull_aa.fa.gz 30-Jun-2018 15:10 32.7M
pig18evigene_m4wf.xcull_cds.fa.gz 30-Jun-2018 15:12 52.1M
pig18evigene_m4wf.xcull_mrna.fa.gz 30-Jun-2018 15:07 79.3M
pig18evigene_readme.txt 31-Jul-2018 13:35 3k
Pig gene set improvement with EvidentialGene using its new SRA2Genes pipeline.
The SRA2Genes pipeline collects several EvidentialGene methods into a
complete, automated gene set reconstruction pipeline. It fetches
public RNA-seq gene pieces from NCBI SRA, over-assembles them into many
millions of gene models by varying assembly methods and data slices, then
reduces the over-assembly to its most accurate non-redundant coding
gene loci and alternates. This is followed by annotation with reference/related
species proteins and gene names, checks for contaminants, and
formatting of gene sequence sets to publication quality for public database
submission.
Preliminary pig18evigene gene set info is at
http://eugenes.org/EvidentialGene/vertebrates/pig/
The Evigene software package including omnibus evgpipe_sra2genes.pl is available at
http://arthropods.eugenes.org/EvidentialGene/other/evigene_old/
Completeness and accuracy comparisons are to the NCBI RefSeq gene set of
the pig, which is modeled on the chromosome assembly. The Evigene set is built from
RNA assembly only, without using chromosomes or other species' genes for
reconstruction. Those lines of gene evidence are used instead for validating and
reclassifying the RNA constructs.
TABLE G3. Sus_scrofa gene sets compared for gene evidence recovery
G3a. Conserved vertebrate genes in pig gene sets (BUSCO v9)
Geneset  Align  Full  Frag  Miss  Best
---------------------------------------------------------
Evigene  447    2568  10    8     776 (30%), 1730 same (67%)
NCBI     440    2567  2     17    80 ( 3%)
Ensembl  431    2552  20    14    na
---------------------------------------------------------
G3b. Reference Human (Homo_sapiens, NCBI 2018 RefSeq)
Geneset  Found  Align  Frag  Best
------------------------------------------
Evigene  99.3%  96.0   1.7   20   55% equal
NCBI     99.3%  97.2   0.6   25
------------------------------------------
for 37,883 human protein isoforms that are uniquely found in either pig gene set
The G3a scores are measured against the BUSCO vertebrate subset of OrthoDB v9. The Align
score is the average alignment (aa) to conserved (ancestral) reference proteins, and
Full/Frag/Miss are the complete, fragment and missing statistics from the BUSCO
calculation of an HMM search for those ancestral vertebrate one-copy genes: Full =
complete alignment to conserved proteins, Frag = fragment alignment, Miss = no
alignment. Best = percentage of best alignments per gene set
in pairwise matches to each reference gene.
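
As a rough consistency check on how the G3a columns relate, the Python sketch below
recomputes the Best percentages from the counts in the table. It is not part of the
Evigene package; the names G3A, BEST, busco_total and percent are hypothetical and
used only for illustration. It assumes that the Evigene-best, NCBI-best and "same"
counts together cover every BUSCO gene, which the row totals are consistent with.

```python
# Counts copied from Table G3a (BUSCO vertebrate subset of OrthoDB v9).
# Dictionary and function names are illustrative, not from Evigene.
G3A = {
    "Evigene": {"full": 2568, "frag": 10, "miss": 8},
    "NCBI":    {"full": 2567, "frag": 2,  "miss": 17},
    "Ensembl": {"full": 2552, "frag": 20, "miss": 14},
}

# Best column: genes where one gene set aligns better, or both align the same.
# Assumption: these three counts partition the BUSCO genes.
BEST = {"Evigene": 776, "NCBI": 80, "same": 1730}


def busco_total(row):
    """Number of BUSCO genes scored for one gene set (Full + Frag + Miss)."""
    return row["full"] + row["frag"] + row["miss"]


def percent(count, total):
    """Percentage rounded to a whole number, as printed in the table."""
    return round(100.0 * count / total)


totals = {name: busco_total(row) for name, row in G3A.items()}
best_total = sum(BEST.values())
best_pct = {name: percent(n, best_total) for name, n in BEST.items()}

if __name__ == "__main__":
    for name, total in totals.items():
        print(f"{name}: {total} BUSCO genes scored")
    for name, pct in best_pct.items():
        print(f"Best {name}: {pct}%")
```

Each gene set sums to the same number of BUSCO genes, and the Best counts sum to that
same total, so the percentages in the Best column share a single denominator.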
A more complete orthology assessment (G3b) is done using 4 vertebrates,
human, mouse, cow and zebrafish, all drawn from NCBI's RefSeq models. Although
any single gene set can be presumed to have mistakes, cross-species
alignments indicate biological accuracy, because errors should not be
correlated between species, especially for the Evigene set, which did not use
any cross-species models for reconstruction.