Index of /EvidentialGene/vertebrates/zebrafish/zebrafish17evigene/aaeval
Name                                   Last modified      Size
alndiff_info.txt                       26-Apr-2018 17:51  2k
zf17evg_16nc-human.alndiff             26-Apr-2018 17:06  2.5M
zf17evg_16nc-human_pubmed_top.alndiff  25-Apr-2018 14:59  539k
zf17evg_17ens-human.alndiff            26-Apr-2018 17:26  2.6M
zf17evg_18nc-human.alndiff             26-Apr-2018 17:06  2.5M
zfish17evg_busco_verts.txt             28-Dec-2017 14:06  1k
zfish17evgm1_ncbi_ens.aligndiff.txt    28-Dec-2017 14:16  2k
zfish17evgm6_ncbi_ens.aligndiff.txt    25-Apr-2018 21:22  4k
Alignment difference tables (alndiff) are derived from
  blastp -query reference.aa -db fishgenesets.aa
where reference.aa is the human gene set (here the NCBI RefSeq 2018 update: 20100 loci, 113607 proteins)
and fishgenesets.aa holds 4 zebrafish gene sets: Evigene 2017-Dec, Ensembl/Zfin-2018-Nov, NCBI-2016, and NCBI-2018.
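The notes above do not say how the blastp output is reduced to the alignaa values in the tables. As a hedged illustration only, the Python sketch below assumes tabular blastp output (-outfmt 6, not stated above), takes column 4 (alignment length) as the aligned size, and keeps the longest single alignment per reference protein and fish set. The set-tag rule in the hypothetical fish_set_of helper is also an assumption; Evigene's own scripts may instead sum HSPs or count identities.

```python
def fish_set_of(subject_id: str) -> str:
    """Hypothetical convention: the set tag is the prefix before ':' (e.g. 'zf16nc');
    unprefixed IDs (e.g. vDanrer6pEVm...) are treated as Evigene models."""
    return subject_id.split(":", 1)[0] if ":" in subject_id else "evg"

def best_alignments(tabular_lines):
    """Map (query_id, fish_set) -> (subject_id, align_len), keeping the longest
    single alignment (HSP) seen for each reference protein in each fish set."""
    best = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        # Assumed -outfmt 6 columns: qseqid sseqid pident length mismatch gapopen ...
        query_id, subject_id, align_len = fields[0], fields[1], int(fields[3])
        key = (query_id, fish_set_of(subject_id))
        if key not in best or align_len > best[key][1]:
            best[key] = (subject_id, align_len)
    return best
```

Keeping one best hit per fish set gives exactly the two alignaa values that an alndiff row compares for each reference protein.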
BUSCO summary statistics (conserved vertebrate proteins) are given in
zfish17evg_busco_verts.txt.
Each alndiff table lists, per reference gene, the pairwise alignments of the compared fish genes
to it and their difference, with summary counts at the end. The columns (tab-delimited) are:
RefprotID Ref_aasize EvigeneID,alignaa Diff_aa ComparedID,alignaa Refdbx_string GeneSym nPubMed RefName
hum:NP_001157979.1 8525 vDanrer6pEVm000010t1,6116 -64 zf16nc:XP_017213272.1,6180 gid:4703,t1,NM_001164507 NEB pn:59 nebulin isoform 1
hum:NP_775871.2 8384 vDanrer6pEVm000232t5,1434 64 zf16nc:XP_017209615.1,1370 gid:283463,t1,NM_173600 MUC19 pn:19 Mucin-19
hum:NP_001092093.2 7968 vDanrer6pEVm000009t2,6295 594 zf16nc:XP_017209375.1,5701 gid:84033,t1,NM_001098623 OBSCN pn:36 obscurin isoform b
For example, in the MUC19 row above:
  hum:NP_775871.2 is the human reference protein, 8384 aa long;
  vDanrer6pEVm000232t5 is the Evigene model, with a 1434 aa alignment;
  zf16nc:XP_017209615.1 is the NCBI model, with a 1370 aa alignment;
  Diff_aa = 1434 - 1370 = 64 (positive values favor Evigene);
  Refdbx_string gives NCBI GeneID 283463, isoform t1, and RefSeq mRNA ID NM_173600;
  MUC19 is the gene symbol, and pn:19 is its PubMed citation count;
  Mucin-19 is the gene name.
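A minimal Python (3.9+) sketch for reading one alndiff row, assuming the tab-delimited column order listed above and the "ID,alignaa" comma pairing seen in the example rows. Rows for reference genes without a fish match may be formatted differently; this sketch does not handle them (int() would raise on a non-numeric alignaa). The class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class AlnDiffRow:
    ref_id: str       # RefprotID, e.g. hum:NP_775871.2
    ref_aasize: int   # Ref_aasize
    evg_id: str       # EvigeneID
    evg_align: int    # Evigene alignaa
    diff_aa: int      # Diff_aa (+ favors Evigene)
    cmp_id: str       # ComparedID
    cmp_align: int    # compared alignaa
    refdbx: str       # Refdbx_string, e.g. gid:283463,t1,NM_173600
    gene_sym: str     # GeneSym
    n_pubmed: int     # nPubMed, parsed from "pn:19"
    ref_name: str     # RefName (may contain spaces)

def split_id_align(field: str) -> tuple[str, int]:
    """Split an 'ID,alignaa' field on its last comma."""
    ident, _, align = field.rpartition(",")
    return ident, int(align)

def parse_alndiff_row(line: str) -> AlnDiffRow:
    cols = line.rstrip("\n").split("\t")
    evg_id, evg_align = split_id_align(cols[2])
    cmp_id, cmp_align = split_id_align(cols[4])
    return AlnDiffRow(
        ref_id=cols[0], ref_aasize=int(cols[1]),
        evg_id=evg_id, evg_align=evg_align, diff_aa=int(cols[3]),
        cmp_id=cmp_id, cmp_align=cmp_align,
        refdbx=cols[5], gene_sym=cols[6],
        n_pubmed=int(cols[7].removeprefix("pn:")),
        ref_name="\t".join(cols[8:]),
    )

# The MUC19 row from the example above, with tabs restored.
line = "\t".join(["hum:NP_775871.2", "8384", "vDanrer6pEVm000232t5,1434", "64",
                  "zf16nc:XP_017209615.1,1370", "gid:283463,t1,NM_173600",
                  "MUC19", "pn:19", "Mucin-19"])
row = parse_alndiff_row(line)
print(row.evg_align, row.cmp_align, row.diff_aa)  # 1434 1370 64
```

Splitting on the last comma matters because Refdbx_string itself contains commas; only the paired ID,alignaa fields are split.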
The summary stats at the end of each table give alignment averages, the number of reference
genes found and the number aligned over >= 95%, counts of equivalent and best fish models,
and counts of fragment (tiny) or extra-large fish proteins (the #outsize line uses cutoffs of
<50% and >150%). Dupref counts the second and later reference genes whose matching fish genes,
in both compared sets, were already matched to a prior reference gene. These duplicates are
listed in the table but ignored for summary stats. Nref is the total number of reference genes
compared, minus duplicate refs. Refhit counts those found by either fish gene set.
Here nref + dupref = 20191 human reference genes (the primary isoform of each locus).
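The counting rules above can be sketched in Python as follows. This is an interpretation, not Evigene's code: a row counts toward dupref when both of its fish models were already matched to an earlier reference gene; "found" means a nonzero alignment; cov95 is assumed to mean alignment >= 95% of the reference protein size; and "equal" is taken as an exact tie in aligned length, although the real tables may allow a tolerance. Each row is a plain tuple, with (None, 0) standing in for a missing fish model.

```python
def summarize(rows):
    """Count summary stats for one pair of fish gene sets (a, b).

    rows: (ref_id, ref_size, a_id, a_align, b_id, b_align) tuples in table order,
    with (None, 0) standing in for a missing fish model.
    """
    seen_a, seen_b = set(), set()
    stats = dict.fromkeys(
        ["nref", "dupref", "refhit", "afound", "bfound",
         "acov95", "bcov95", "equal", "abest", "bbest"], 0)
    for _ref_id, ref_size, a_id, a_aln, b_id, b_aln in rows:
        # Assumed dupref rule: both fish models already matched an earlier ref gene.
        if a_id and b_id and a_id in seen_a and b_id in seen_b:
            stats["dupref"] += 1
            continue
        if a_id:
            seen_a.add(a_id)
        if b_id:
            seen_b.add(b_id)
        stats["nref"] += 1
        if a_aln == 0 and b_aln == 0:
            continue  # not found by either set, so not a refhit
        stats["refhit"] += 1
        stats["afound"] += int(a_aln > 0)
        stats["bfound"] += int(b_aln > 0)
        # Assumed cov95 rule: alignment covers >= 95% of the reference protein.
        stats["acov95"] += int(a_aln >= 0.95 * ref_size)
        stats["bcov95"] += int(b_aln >= 0.95 * ref_size)
        diff = a_aln - b_aln  # same sign convention as Diff_aa
        if diff == 0:
            stats["equal"] += 1
        elif diff > 0:
            stats["abest"] += 1
        else:
            stats["bbest"] += 1
    return stats
```

By construction nref + dupref equals the number of rows, and equal + abest + bbest equals refhit, matching the relations stated for the real tables.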
#stat a,b=zf17evg,zfish16, nref=17486, refhit=15358, dupref=2705, score=Align,4
#stat alndiff(a-b)=13, align%(a,b)=90.8 89.3, align(a,b)=538.9 525.8, sumd=200382
#outsize[<50%,>150%] (atiny,abig,btiny,bbig)=80,0.5% 390,2.5% 192,1.2% 418,2.7%
#best (equal,abest,bbest)=7790,50.7% 6041,39.3% 1527,9.9%
#best (afound,bfound,acov95,bcov95)=15312,87.5% 15197,86.9% 10151,66% 9573,62.3%
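As a consistency check on the summary lines above, the printed percentages can be reproduced from the counts. The denominators are inferred from the arithmetic, not stated in these notes: found% appears to be relative to nref, cov95% and best% appear to be relative to refhit, and the values appear to be truncated, not rounded, to one decimal. A short Python sketch of that check:

```python
# Counts copied from the #stat and #best lines above.
NREF, REFHIT, DUPREF = 17486, 15358, 2705

def tenths_truncated(count: int, total: int) -> int:
    """Percent in tenths (875 means 87.5%), truncated toward zero."""
    return (1000 * count) // total

def tenths_rounded(count: int, total: int) -> int:
    """Percent in tenths, rounded to nearest."""
    return round(1000 * count / total)

# (label, count, inferred denominator, printed percent in tenths)
CHECKS = [
    ("afound", 15312, NREF, 875), ("bfound", 15197, NREF, 869),
    ("acov95", 10151, REFHIT, 660), ("bcov95", 9573, REFHIT, 623),
    ("equal", 7790, REFHIT, 507), ("abest", 6041, REFHIT, 393),
    ("bbest", 1527, REFHIT, 99),
]

assert NREF + DUPREF == 20191          # all human reference genes
assert 7790 + 6041 + 1527 == REFHIT    # equal + abest + bbest partition refhit

for label, count, total, printed in CHECKS:
    # Truncation matches every printed value; rounding misses afound and acov95.
    print(label, tenths_truncated(count, total) == printed,
          tenths_rounded(count, total) == printed)
```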