
Index of /EvidentialGene/vertebrates/zebrafish/zebrafish17evigene/aaeval

      Name                                    Last modified       Size

[DIR] Parent Directory                        26-Apr-2018 20:50      -
[TXT] zfish17evgm6_ncbi_ens.aligndiff.txt     25-Apr-2018 21:22     4k
[TXT] zfish17evgm1_ncbi_ens.aligndiff.txt     28-Dec-2017 14:16     2k
[TXT] zfish17evg_busco_verts.txt              28-Dec-2017 14:06     1k
[TXT] zf17evg_18nc-human.alndiff              26-Apr-2018 17:06   2.5M
[TXT] zf17evg_17ens-human.alndiff             26-Apr-2018 17:26   2.6M
[TXT] zf17evg_16nc-human_pubmed_top.alndiff   25-Apr-2018 14:59   539k
[TXT] zf17evg_16nc-human.alndiff              26-Apr-2018 17:06   2.5M
[TXT] alndiff_info.txt                        26-Apr-2018 17:51     2k

Alignment difference tables (alndiff) are derived from blastp -query reference.aa -db fishgenesets.aa
where reference.aa is the human gene set (NCBI RefSeq, 2018 update; 20100 loci, 113607 proteins)
and fishgenesets.aa holds 4 zebrafish gene sets: Evigene 2017-Dec, Ensembl/Zfin-2018-Nov, NCBI-2016, and NCBI-2018.
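
As a rough sketch of how such a search might be scripted, the Python below only builds the command lines. The -query and -db arguments come from the description above; the makeblastdb step, the tabular -outfmt 6 output, the e-value cutoff, and the output file name are assumptions for illustration, not the settings used to produce these tables.

```python
import shlex
from typing import List


def makeblastdb_cmd(db_fasta: str) -> List[str]:
    """Command to index a protein FASTA file as a BLAST database (assumed step)."""
    return ["makeblastdb", "-in", db_fasta, "-dbtype", "prot"]


def blastp_cmd(query: str, db: str, out: str, evalue: str = "1e-5") -> List[str]:
    """blastp command line; only -query and -db are taken from the notes above.

    -outfmt 6 (tabular), -evalue and -out are illustrative assumptions.
    """
    return [
        "blastp",
        "-query", query,
        "-db", db,
        "-outfmt", "6",
        "-evalue", evalue,
        "-out", out,
    ]


# Print the commands rather than running them; pass each list to
# subprocess.run(cmd, check=True) on a machine with BLAST+ installed.
for cmd in (
    makeblastdb_cmd("fishgenesets.aa"),
    blastp_cmd("reference.aa", "fishgenesets.aa", "human_vs_fish.blastp.tsv"),
):
    print(shlex.join(cmd))
```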

BUSCO summary statistics (conserved vertebrate proteins) are given in

Each alndiff table row gives, for one reference protein, the pairwise alignments of two
fish gene sets and the difference between them; summary counts follow at the end of the
table.  Columns (tab-delimited) are

RefprotID     Ref_aasize  EvigeneID,alignaa      Diff_aa  ComparedID,alignaa         Refdbx_string          GeneSym  nPubMed  RefName

hum:NP_001157979.1  8525  vDanrer6pEVm000010t1,6116  -64  zf16nc:XP_017213272.1,6180  gid:4703,t1,NM_001164507	NEB   pn:59	nebulin isoform 1
hum:NP_775871.2     8384  vDanrer6pEVm000232t5,1434   64  zf16nc:XP_017209615.1,1370  gid:283463,t1,NM_173600	MUC19  pn:19	Mucin-19
hum:NP_001092093.2  7968  vDanrer6pEVm000009t2,6295  594  zf16nc:XP_017209375.1,5701  gid:84033,t1,NM_001098623	OBSCN  pn:36	obscurin isoform b

For human protein hum:NP_775871.2, protein size is 8384 aa,
  Evigene model is vDanrer6pEVm000232t5, with 1434 aa aligned
  NCBI model is zf16nc:XP_017209615.1, with 1370 aa aligned
  Diff_aa is 64 = 1434 - 1370 (positive values favor Evigene)
  Refdbx_string is NCBI GeneID:283463, isoform t1, RefSeq mRNA ID NM_173600
  Gene symbol is MUC19, PubMed citation count is pn:19
  Gene name is Mucin-19
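
The column layout above can be read with a few lines of Python. This is a minimal sketch, not the Evigene code: the AlnDiffRow type and parse_alndiff_line function are hypothetical names, and it assumes every row carries all nine fields (rows with a missing fish model or gene symbol may need extra handling).

```python
from typing import NamedTuple


class AlnDiffRow(NamedTuple):
    ref_id: str       # RefprotID, e.g. hum:NP_775871.2
    ref_aasize: int   # Ref_aasize
    evg_id: str       # EvigeneID
    evg_align: int    # aligned aa for the Evigene model
    diff_aa: int      # Diff_aa (Evigene minus compared)
    cmp_id: str       # ComparedID
    cmp_align: int    # aligned aa for the compared model
    ref_dbx: str      # Refdbx_string
    gene_sym: str     # GeneSym
    n_pubmed: int     # nPubMed, stored as pn:NN
    ref_name: str     # RefName, may contain spaces


def parse_alndiff_line(line: str) -> AlnDiffRow:
    """Parse one data row; splits on any whitespace, keeping RefName whole."""
    parts = line.strip().split(None, 8)
    if len(parts) != 9:
        raise ValueError(f"expected 9 fields, got {len(parts)}: {line!r}")
    ref_id, ref_size, evg, diff, cmp_, dbx, sym, pubmed, name = parts
    evg_id, evg_aln = evg.rsplit(",", 1)
    cmp_id, cmp_aln = cmp_.rsplit(",", 1)
    return AlnDiffRow(
        ref_id, int(ref_size), evg_id, int(evg_aln), int(diff),
        cmp_id, int(cmp_aln), dbx, sym, int(pubmed.split(":", 1)[1]), name,
    )


row = parse_alndiff_line(
    "hum:NP_775871.2\t8384\tvDanrer6pEVm000232t5,1434\t64\t"
    "zf16nc:XP_017209615.1,1370\tgid:283463,t1,NM_173600\tMUC19\tpn:19\tMucin-19"
)
# Diff_aa equals the Evigene alignment minus the compared alignment.
assert row.diff_aa == row.evg_align - row.cmp_align == 64
```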

Summary stats at the end of each table give alignment averages, numbers of reference genes
found, and numbers aligned at >= 95%, plus counts of equivalent and best fish models and
counts of fragment proteins (tiny) or extra-large proteins (big).  Dupref is the count of
second and later fish gene matches, where both compared fish genes have already matched
another reference gene.  These duplicates are ignored for summary stats, but are listed in
the table.  Nref is the total of matched reference genes, minus duplicate refs.  Refhit
counts those found by one or the other fish gene set.
Here nref + dupref = 20191 human reference genes (primary isoform for each locus).
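
One way to picture the duplicate and best-model bookkeeping is the sketch below. It is an interpretation under stated assumptions, not the Evigene implementation: tally_models is a hypothetical helper, a row counts as dupref when both of its fish IDs have appeared in earlier rows, "equal" is taken as a difference within an assumed tolerance (equal_tol), and reference genes with no fish hit are not modeled.

```python
from typing import Dict, Iterable, Tuple


def tally_models(
    rows: Iterable[Tuple[str, str, int]], equal_tol: int = 0
) -> Dict[str, int]:
    """Tally rows of (evigene_id, compared_id, diff_aa).

    equal_tol is an assumed tolerance in aa; the real cutoff is not given here.
    """
    seen_a, seen_b = set(), set()
    counts = {"nref": 0, "dupref": 0, "equal": 0, "abest": 0, "bbest": 0}
    for a_id, b_id, diff in rows:
        if a_id in seen_a and b_id in seen_b:
            counts["dupref"] += 1   # both fish genes matched earlier refs
            continue                # duplicates are left out of the stats
        seen_a.add(a_id)
        seen_b.add(b_id)
        counts["nref"] += 1
        if abs(diff) <= equal_tol:
            counts["equal"] += 1
        elif diff > 0:
            counts["abest"] += 1    # positive difference favors set a
        else:
            counts["bbest"] += 1
    return counts


# The three example rows from the table above.
example = [
    ("vDanrer6pEVm000010t1", "zf16nc:XP_017213272.1", -64),
    ("vDanrer6pEVm000232t5", "zf16nc:XP_017209615.1", 64),
    ("vDanrer6pEVm000009t2", "zf16nc:XP_017209375.1", 594),
]
print(tally_models(example))
```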

#stat a,b=zf17evg,zfish16, nref=17486, refhit=15358, dupref=2705, score=Align,4 
#stat alndiff(a-b)=13, align%(a,b)=90.8 89.3, align(a,b)=538.9 525.8, sumd=200382
#outsize[<50%,>150%] (atiny,abig,btiny,bbig)=80,0.5% 390,2.5% 192,1.2% 418,2.7%
#best (equal,abest,bbest)=7790,50.7% 6041,39.3% 1527,9.9%
#best (afound,bfound,acov95,bcov95)=15312,87.5% 15197,86.9% 10151,66% 9573,62.3%
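
The reported counts can be cross-checked with simple arithmetic. In the Python sketch below the numbers are copied from the #stat lines above; the choice of refhit as the denominator for the best-model percentages and for the average difference is inferred from the values and is not stated in these notes.

```python
# Values copied from the summary lines above.
nref, refhit, dupref = 17486, 15358, 2705
equal, abest, bbest = 7790, 6041, 1527
align_a, align_b = 538.9, 525.8
sumd = 200382


def pct(count: int, total: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# Reference genes split into counted and duplicate refs.
assert nref + dupref == 20191

# Every hit reference gene is classed as equal, a-best or b-best.
assert equal + abest + bbest == refhit

# Best-model percentages use refhit as denominator (inferred).
assert (pct(equal, refhit), pct(abest, refhit), pct(bbest, refhit)) == (50.7, 39.3, 9.9)

# Average difference: sum of differences over hit refs, about 13 aa,
# which agrees with the gap between the two average alignments.
assert round(sumd / refhit) == 13
assert round(align_a - align_b, 1) == 13.1
```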

Developed at the Genome Informatics Lab of Indiana University Biology Department