Index of /EvidentialGene/plants/cacao/genes
Name Last modified Size
TCacao_Genome_data.readme.txt 25-May-2013 14:29 2k
cacao11genes_pub3i.aa.gz 08-Mar-2012 23:47 10.7M
cacao11genes_pub3i.attr.tbl.gz 24-Mar-2012 14:35 6.0M
cacao11genes_pub3i.attr.txt 24-Mar-2012 14:36 24.0M
cacao11genes_pub3i.cds.gz 01-Mar-2012 20:13 16.6M
cacao11genes_pub3i.gff.gz 08-Mar-2012 23:25 27.9M
cacao11genes_pub3i.good.aa.gz 08-Mar-2012 23:47 8.2M
cacao11genes_pub3i.good.cds.gz 01-Mar-2012 20:26 12.3M
cacao11genes_pub3i.good.gff.gz 11-Jul-2012 15:17 21.7M
cacao11genes_pub3i.good.ids 02-Mar-2012 14:50 737k
cacao11genes_pub3i.good.shortaa 23-Aug-2012 21:40 1k
cacao11genes_pub3i.good.tr.gz 01-Mar-2012 20:23 20.3M
cacao11genes_pub3i.readme 23-Aug-2012 21:58 7k
cacao11genes_pub3i.stats.txt 09-Mar-2012 14:42 3k
cacao11genes_pub3i.tr.gz 01-Mar-2012 20:10 27.6M
ensemblplants_cacao_about.txt 15-Jun-2014 14:29 8k
genbank_submit -
genome/ 08-Apr-2019 15:41 -
phytozome_8/ 23-Aug-2012 21:43 -
versions/ 08-Apr-2019 15:38 -
Theobroma cacao public gene set pub3i
08 March 2012, D. Gilbert, gilbertd at indiana edu
Gene class counts
21806 Class:Strong : >= 66% expression/homology evidence
5101 Class:Medium : >= 33% expression/homology evidence
2691 Class:Weak : >= 5% evidence, worth considering if more evidence turns up
13035 Class:Transposon : >= 33% transposon and no/weak expression
3465 Class:Poor : mixed bag of partial models
550 Class:None : no evidence
14977 Class:AltStrong : alternate transcripts from EST/rna assemblies
20 Class:AltMedium
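The percentage thresholds above can be read as a simple lookup. The Python sketch below is an illustration only, not Evigene's classifier. It assumes a single evidence percentage per model, does not model the Transposon, Poor or Alt* classes, and returns None below the 5% Weak cutoff. The names CLASS_THRESHOLDS and evidence_class are hypothetical.

```python
from typing import Optional

# Minimum percent of expression/homology evidence for each main class,
# taken from the "Gene class counts" table (hypothetical helper data).
CLASS_THRESHOLDS = [(66, "Class:Strong"), (33, "Class:Medium"), (5, "Class:Weak")]


def evidence_class(evidence_pct: float) -> Optional[str]:
    """Return the main quality class for an evidence percentage.

    Simplified sketch: Transposon, Poor and Alt* classes need more than one
    percentage, so they are not modelled; values below 5% return None.
    """
    for minimum, label in CLASS_THRESHOLDS:
        if evidence_pct >= minimum:
            return label
    return None
```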
pub3i.good.ids: main=29408, alts=14996; includes Class:(Strong|Medium|Weak) and Alt transcripts.
The first transcript ID of each gene ends with 't1', but it is not always the best of the alternates.
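Because alternates share the gene ID and differ only in the 'tN' suffix, transcript IDs can be grouped by gene. A minimal Python sketch, assuming every transcript ID ends in 't' plus an isoform number (as Thecc1EG000002t1 does); split_transcript_id and group_isoforms are hypothetical helper names. It only groups IDs: since t1 is not always the best alternate, choosing a representative still needs the quality columns.

```python
import re
from collections import defaultdict


def split_transcript_id(transcript_id: str) -> tuple[str, int]:
    """Split e.g. 'Thecc1EG000002t1' into ('Thecc1EG000002', 1)."""
    # Greedy (.+) leaves only the final 't<digits>' suffix for the isoform.
    match = re.fullmatch(r"(.+)t(\d+)", transcript_id)
    if match is None:
        raise ValueError(f"unexpected transcript ID: {transcript_id!r}")
    return match.group(1), int(match.group(2))


def group_isoforms(transcript_ids) -> dict[str, list[int]]:
    """Map each gene ID to the sorted isoform numbers seen for it."""
    genes = defaultdict(list)
    for transcript_id in transcript_ids:
        gene_id, isoform = split_transcript_id(transcript_id)
        genes[gene_id].append(isoform)
    return {gene_id: sorted(isoforms) for gene_id, isoforms in genes.items()}
```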
pub3i.good corrects CDS-exon errors carried over from pub3h and pub3g: off-by-1 errors, missing strand
and partly mangled proteins, which came from transcript-gene-assembly software weak on CDS/protein methods.
pub3h>3i: nupdate=570, ndrop=108
34 changemrna (CDS/exon/protein changes)
32 renamelocus ; 8 renamealt
491 newgoodlocus ; 5 other = newgoodlocus (shifted from notgood to good subset)
73 dropoverlaplocus; 21 droplocus ; 14 other = drop
pub3g>3h:
2826 updated transcripts: 785 with CDS exon changes (333 main, 452 alts), 400 altered proteins,
1641 strand additions, 41 dropped records.
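The change-log totals decompose into the listed categories. Assuming the categories are disjoint (the readme does not say so explicitly), the sums below reproduce nupdate=570, ndrop=108, the 785 CDS exon changes and the 2826 updated transcripts.

```python
# Change-log counts copied from the pub3h>3i and pub3g>3h notes above.
PUB3H_TO_3I_UPDATES = {
    "changemrna": 34,
    "renamelocus": 32,
    "renamealt": 8,
    "newgoodlocus": 491,
    "other (newgoodlocus)": 5,
}
PUB3H_TO_3I_DROPS = {"dropoverlaplocus": 73, "droplocus": 21, "other (drop)": 14}

# pub3g>3h: CDS exon changes split into main and alternate transcripts,
# plus altered proteins and strand additions.
PUB3G_TO_3H_CDS_EXON = {"main": 333, "alts": 452}
PUB3G_TO_3H_OTHER = {"altered proteins": 400, "strand additions": 1641}

nupdate = sum(PUB3H_TO_3I_UPDATES.values())
ndrop = sum(PUB3H_TO_3I_DROPS.values())
cds_exon = sum(PUB3G_TO_3H_CDS_EXON.values())
updated = cds_exon + sum(PUB3G_TO_3H_OTHER.values())
print(nupdate, ndrop, cds_exon, updated)  # 570 108 785 2826
```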
CDS sequences in the good set translate to their protein sequences; there are CDS mismatches in the non-good set.
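A minimal Python sketch of checking that a CDS translates to its protein. It assumes the standard genetic code (NCBI translation table 1), a CDS read in frame 1, and that a trailing stop '*' may or may not be present in the protein file. translate and cds_matches_protein are hypothetical helpers, not part of Evigene.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated with bases in
# T, C, A, G order, first base varying slowest.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a CDS in frame 1; codons with ambiguous bases become 'X'.

    A trailing partial codon (len % 3 bases) is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


def cds_matches_protein(cds: str, protein: str) -> bool:
    """True if the CDS translation equals the protein, ignoring a trailing stop."""
    return translate(cds).rstrip("*") == protein.rstrip("*")
```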
------------------------------------------------------------
2012.07.11: cacao11genes_pub3i.good.gff: corrected 1 record, transcript ID=Thecc1EG015900t2,
which had "puevd3b" in the score column instead of a numeric score.
2012.08.23: cacao11genes_pub3i.good.shortaa contains 29 proteins shorter than 40 aa,
which are excluded from cacao11genes_pub3i.good.aa. They show no significant homology
and are likely from either non-coding genes or gene fragments.
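The short-protein split can be sketched in Python as below, assuming size means the number of residues in the FASTA sequence (whether a trailing '*' counts toward the 40 aa cutoff is not stated, so it is counted here). read_fasta and split_by_length are hypothetical helpers.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_by_length(records, min_len=40):
    """Partition (header, protein) records into (kept, short), where short
    proteins have fewer than min_len residues."""
    kept, short = [], []
    for header, protein in records:
        (short if len(protein) < min_len else kept).append((header, protein))
    return kept, short
```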
------------------------------------------------------------
The cacao mitochondrial genome and its associated genes,
M16_mito_v1.0, have been withdrawn from public use for now (6 Dec 2011).
This covers 214 genes, mostly of Class Strong (112) or Medium; about 40 are 1-1 orthologs
to genes in the other tested plant gene sets. Their IDs are listed in cacao11genes_pub3g.mitoremoved.ids
------------------------------------------------------------
Gene data files:
cacao11genes_pub3i.aa protein FASTA
cacao11genes_pub3i.cds coding DNA (FASTA)
cacao11genes_pub3i.tr transcript DNA (FASTA)
cacao11genes_pub3i.attr.txt gene annotation table (tabbed)
cacao11genes_pub3i.gff gene location/annotation format
cacao11genes_pub3i.good.ids IDs of Class:Strong|Medium|Weak (Alt included)
cacao11genes_pub3i.good.{aa,tr,cds} fasta subset of Class:Strong|Medium|Weak
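To pull a subset such as the good set out of a full FASTA file, here is a minimal Python sketch. It assumes good.ids holds one ID per line (only the first column is used) and that each FASTA header starts with the transcript ID; all helper names are hypothetical. Because the 29 short proteins were later moved into good.shortaa, a subset built this way from the .aa file may not match cacao11genes_pub3i.good.aa exactly.

```python
import gzip


def load_ids(lines):
    """Collect the first whitespace-separated token of each non-blank line."""
    return {line.split()[0] for line in lines if line.strip()}


def subset_fasta(lines, keep_ids):
    """Yield the lines of FASTA records whose ID (first header token) is in keep_ids."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            keep = bool(fields) and fields[0] in keep_ids
        if keep:
            yield line


def write_good_subset(ids_path, fasta_gz_path, out_path):
    """Write the records of a gzipped FASTA whose IDs are listed in ids_path."""
    with open(ids_path) as ids_file:
        keep_ids = load_ids(ids_file)
    with gzip.open(fasta_gz_path, "rt") as fasta, open(out_path, "w") as out:
        out.writelines(subset_fasta(fasta, keep_ids))
```

For example, `write_good_subset("cacao11genes_pub3i.good.ids", "cacao11genes_pub3i.aa.gz", "good_subset.aa")`, where the output file name is arbitrary.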
Annotation fields in gene.attr.txt, shown for two example genes; the same values are in the mRNA lines of the gene GFF.
transcriptID  Thecc1EG000002t1               Thecc1EG000005t1
geneID        Thecc1EG000002                 Thecc1EG000005
isoform       1                              1
quality1      Class:Strong                   Class:Strong
quality2      Express:Strong                 Express:Strong
quality3      Homology:OrthologStrong        Homology:OrthologStrong
quality4      Intron:Strong                  Intron:Strong
quality5      Protein:complete               Protein:complete
aaSize        205                            1269
cdsSize1      62%                            77%
cdsSize2      618/977                        3810/4930
Name1         Cystathionine beta-synthase..  Kinesin-like calmodulin-binding..
Name2         82%T                           74%T
oname1        Uncharacterized protein        Uncharacterized protein
oname2        87%U                           77%U
groupname     Cystathionine beta-synthase    Kinesin-like calmodulin-binding..
Dbxref1       TAIR:AT5G10860.1               TAIR:AT5G65930.2
Dbxref2       82%                            74%
ortholog1     frave:gene01181                ricco:29682.m000589
ortholog2     87%                            83%
paralog1      Thecc1EG034062t1               Thecc1EG000957t1
paralog2      51%                            12%
uniprot1      UniRef50_B9I794                UniRef50_B9GJK9
uniprot2      87%                            77%
genegroup1    PLA9_G6641                     PLA9_G3639
genegroup2    1/11/9                         1/13/9
cacaoGD09     CGD0000016/C99.77              na
cacaoTCR1     na                             Tc01_t000060/C99.83
intron1       100%                           100%
intron2       10/10                          46/46
express1      94%                            82%
express2      75                             99
estgroup      LeafPistil                     LeafPistil
location      scaffold_1:7897-10405:+        scaffold_1:17413-27097:+
oid           rna8b:r8L_g13025t00001         mar7g.mar11f:AUGepir7p1s1g7t1
score         7946                           40120
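Since the same fields appear on GFF mRNA lines, they can be read from column 9 of the GFF. A minimal Python sketch following the GFF3 attribute syntax (tag=value pairs separated by ';', percent-encoded). It assumes the attribute tags in cacao11genes_pub3i.gff match the field names in the table above, which is not confirmed here; parse_gff_attributes and mrna_attributes are hypothetical helpers.

```python
from urllib.parse import unquote


def parse_gff_attributes(column9: str) -> dict[str, str]:
    """Parse a GFF3 attribute column ('tag=value;tag=value') into a dict.

    Tags and values are percent-decoded (e.g. '%3B' -> ';') as GFF3 requires.
    Multi-valued attributes (comma-separated) are left as single strings.
    """
    attrs = {}
    for field in column9.strip().split(";"):
        if not field.strip():
            continue  # tolerate trailing or doubled semicolons
        tag, _, value = field.partition("=")
        attrs[unquote(tag.strip())] = unquote(value)
    return attrs


def mrna_attributes(gff_lines):
    """Yield the attribute dict of every mRNA feature line."""
    for line in gff_lines:
        if line.startswith("#"):
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) == 9 and columns[2] == "mRNA":
            yield parse_gff_attributes(columns[8])
```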
Guide to cacao Evigene annotation table columns and GFF mRNA attributes:
transcriptID (ID= of the GFF mRNA)
geneID (gene= of the GFF mRNA; also its Parent=)
isoform : alternate transcript number if > 1, matches ID suffix (t2,t3...)
quality : evidence quality values: Class (quality1), then Expression, Homology, Intron, Protein (quality2 to quality5)
aaSize : protein aa length
cdsSize : CDS length as percent of transcript (cdsSize1), and CDS length / transcript length (cdsSize2)
Name : homology-derived gene name (Name1) with its source and percent alignment (Name2),
where P = Plant9 family, U = UniProt, T = TAIR (e.g. 88%P, 62%T, 74%U)
oname : other name (from next best classifier above)
Dbxref : cross reference gene IDs to TAIR, UniProt
express : expressed span as percent of transcript
estgroup : tissue groups with significant expression: Leaf, Pistil and/or Bean
ortholog : protein orthology percent identity, and protein IDs
paralog : protein paralogy percent identity, and gene ID
genegroup : gene family ID from OrthoMCL grouping of 9 plant species
genegroup2 : 1/11/9 found 1 cacao gene / 11 plant genes / 9 plant species (of 9 max)
cacaoGD09 : equivalent Cacao CGD (Mars v0.9) gene
cacaoTCR1 : equivalent Cacao Tc (Cirad v1.0) gene
intron : evidence intron splice sites matched (10/10 sites for 5 matched introns)
location : genome location
oid : original model ID
score : evidence score sum
scorevec : evidence score vector
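Several columns pack more than one number into a field. The Python sketch below parses Name2/oname2, genegroup2, intron2 and location values of the forms shown in the example table. The helpers are hypothetical; they assume intron2 reads as matched/total splice sites with two sites per intron, and that location is seqid:start-end:strand.

```python
import re

# Source codes used in Name2/oname2, per the guide above.
NAME_SOURCES = {"P": "Plant9 family", "U": "UniProt", "T": "TAIR"}
LOCATION_RE = re.compile(r"(?P<seqid>[^:]+):(?P<start>\d+)-(?P<end>\d+):(?P<strand>[+\-.])")


def parse_name_align(value: str) -> tuple[int, str]:
    """'82%T' -> (82, 'TAIR'): percent alignment and name source."""
    pct, sep, code = value.partition("%")
    if not sep or code not in NAME_SOURCES:
        raise ValueError(f"unexpected name alignment value: {value!r}")
    return int(pct), NAME_SOURCES[code]


def parse_genegroup2(value: str) -> tuple[int, int, int]:
    """'1/11/9' -> (cacao genes, plant genes, plant species) in the family."""
    parts = value.split("/")
    if len(parts) != 3:
        raise ValueError(f"unexpected genegroup2 value: {value!r}")
    cacao, plants, species = (int(p) for p in parts)
    return cacao, plants, species


def parse_intron2(value: str) -> tuple[int, int, int]:
    """'10/10' -> (matched splice sites, total splice sites, matched introns).

    Assumes two splice sites (donor, acceptor) per intron, so the intron
    count is matched // 2; a lone matched site is rounded down.
    """
    matched, total = (int(p) for p in value.split("/"))
    return matched, total, matched // 2


def parse_location(value: str) -> tuple[str, int, int, str]:
    """'scaffold_1:7897-10405:+' -> ('scaffold_1', 7897, 10405, '+')."""
    match = LOCATION_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"unexpected location value: {value!r}")
    return match["seqid"], int(match["start"]), int(match["end"]), match["strand"]
```

Assuming 1-based inclusive coordinates as in GFF, the span of location scaffold_1:7897-10405:+ is 10405 - 7897 + 1 = 2509 bases.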
Quality notes:
Values are generally Strong/Medium/Weak/None
Class: gene quality class from the sum of evidence parts; Transposon and Poor are special classes
Express: Strong/Medium/Weak for percent of transcript with expression
Homology: Ortholog if the best match is in another species, Paralog if it is in this species
Protein: complete or partial
Intron: Strong/Medium/Weak depending on % and total of splice sites matched