Arthropod ARPx13, non-insect-centric protein gene sets
Draft3, 24 Feb 2012 by D. Gilbert, gilbertd at indiana edu
ref http://arthropods.eugenes.org/arthropods/orthologs/ARP3x/

======= orthomcl gene groups draft 3 ================

Common gene families presence, ncommon=5112, min taxa=10
Species   Have   Miss
daphmag   4997    115
daphplx   4968    144
tribol    4967    145
locust    4952    160
wasp      4952    160
drosmel   4916    196
zfish     4888    224
human     4887    225
aphid     4830    282
ixodes    4576    536
tetur     4358    754
shrimp    4325    787
dogtick   2407   2705
-----------------------

Clade presence for gene families, ncommon=22426,
# dropped 2/5 insects for balance, -locust -aphid
Clade     Only   Miss  OutAny  OutOnly  OutMiss
Crust      101    580    6793      144      213
Tick        64   1171    6281       69      471
inSect     519   1683    6432      157      340
--------------------------------------------
  OutAny  = 1+ species in clade has outgroup gene family;
  OutOnly = 2+ species in clade have outgroup gene family, none of other clades have.
  OutMiss = no species in clade has outgroup, both other clades have.
  Only    = all species in clade have family, none of other clades have
  Miss    = no species in clade has family, both other clades have

Summary gene group counts, using orthomcl clustering of reciprocal best hit blastp
species  inGene  oGene nGroup  Uniq1   UDup  Orth1  OrDup  OrGrp OrMis1  Guniq   Gmax   Gmin
# Crustaceans
daphmag  109520  38049  20334     na  20313   9092   8644  12354      5   7980   1703      5
daphplx   44923  27825  14456     na  10784  10410   6631  11866     10   2590    669     10
shrimp    84835  28999  13397     na  19164   5330   4505   7017    122   6380    727    121
# Ticks/Chelicerata
ixodes    20249  11817   8594     na   2079   7239   2499   7945    110    649    263    110
tetur     17072  11194   6685     na   4000   5227   1967   5937    147    748    217    148
dogtick   86879  41142  11157     na  31914   4076   5152   4865   1189   6292    385   1176  # still very poor
# Insects
aphid     31962  24954   9724     na   8848   6778   9328   8068     43   1656    530     42
drosmel   14289  11523   8449     na   2519   6925   2079   7627     38    822    191     38
locust    83837  26797  14280     na  14667   6826   5304   8705     15   5575    750     15
tribol    16985  12523   8919     na   2119   7584   2820   8429     29    490    231     29
wasp      24296  18662   9605     na   7544   7480   3638   8259     23   1346    248     23
# Outgroups
human     21830  18820  11829     na   2310   8282   8228  11089     61    740   1088     12
zfish     24150  19916  11777     na   3130   8139   8647  11171     84    606   1349     14
------------------------------------------------------------------------------------------------------
  inGene = number of input genes (including fragments for some species)
  oGene  = number of genes with reciprocal best hits used by orthomcl
  nGroup = number of gene family groups (2+genes), orthology + species-unique
  OrGrp  = count of ortho groups (nGroup = OrGrp + unique paralog groups)
  Uniq1  = species-unique single gene
  UDup   = species-unique duplicated paralog genes
  Orth1  = count of single ortho gene
  OrDup  = count of duplicated ortho gene
  OrMis1 = groups missing gene all others have (ignoring human)

======= Data sources for arpx13 gene groups ================

Crustacea/
  daphplx_evg10jgi6b_cd.aa : n=44923 Daphnia pulex 2010, http://arthropods.eugenes.org/EvidentialGene/daphnia/daphnia_genes2010/
  daphmag2_estvel5asm.aa   : n=109520 Daphnia magna 2011, pre-release gene set
  shrimp3velaug_cd.aa      : n=84835 Pandalus latirostris, http://www.ncbi.nlm.nih.gov/pubmed/22016807

Ticks/Chelicerata/Acari
  ixodes2011v11_cd.aa      : n=20249 Ixodes scapularis 2011, vectorbase.org
  mites_tetur_cd.aa        : n=17072 Tetranychus urticae, http://www.nature.com/nature/journal/v479/n7374/pdf/nature10640.pdf
  dogtickvelcap5_cd8.aa    : n=86879 Dermacentor variabilis, http://www.ncbi.nlm.nih.gov/pubmed/20060044

inSects/
  aphid2bo3all_cd.aa       : n=31963 pea aphid 2011, http://arthropods.eugenes.org/EvidentialGene/pea_aphid2/genes-bestof3/
  drosmel.noalt.aa         : n=14289 Drosophila melanogaster, ncbi refseq 2011
  locust1vel_cd2.aa        : n=83837 Locusta migratoria http://www.ncbi.nlm.nih.gov/pubmed/21209894 :2010; 22 Gb PE Rnaseq
  trica.noalt.aa           : n=16985 Tribolium cast., UniProt 2011
  wasp.noalt.aa            : n=24296 Nasonia vitripennis 2012, http://arthropods.eugenes.org/EvidentialGene/nasonia/

Outgroups/
  human_ncbi_cd.aa         : n=21830 Homo sapiens, ncbi refseq 2011
  zfish_ncbi_cd.aa         : n=24150 Danio rerio, zebrafish, ncbi refseq 2011
#...............................................................................
  _cd   = cd-hit -c 0.9 filter to remove alternates, fragments and nearly same proteins.
  noalt = removed alternate transcripts identified from gene id

===== protein size distribution =============
  n = number of unique proteins found
  n1000 = number >= 1000aa, n500 = no. >= 500aa; aveaa= average; max,qnt,min= max,quantiles,min size
  qnt = quantiles at 0.10, .25, .50, .75, .90 counts from max to min size

daphmag2_estvel5asm : n=109520; n1000=1009; n500=5000; ave=146; max,qnt,min=6299 309 149 87 55 40 29
daphplx_evg10jgi6b  : n=44923; n1000=2287; n500=8810; ave=341; max,qnt,min=7809 720 427 239 111 60 11
shrimp3velaug_cd    : n=84835; n1000=494; n500=3232; ave=122; max,qnt,min=3281 261 116 66 46 36 11
ixodes2011v11_cd    : n=20249; n1000=549; n500=2738; ave=285; max,qnt,min=4588 579 357 199 115 77 32
mites_tetur_cd      : n=17072; n1000=881; n500=3789; ave=361; max,qnt,min=18253 755 467 272 119 48 20
dogtickvelcap5_cd8  : n=86879; n1000=24; n500=219; ave=91; max,qnt,min=1820 139 104 86 57 43 29
aphid2bo3all_cd     : n=31963; n1000=2274; n500=8954; ave=419; max,qnt,min=20627 863 536 301 158 91 25
drosmel.noalt       : n=14289; n1000=1508; n500=5268; ave=521; max,qnt,min=22971 1028 628 390 217 128 11
locust1vel_cd2      : n=83837; n1000=1279; n500=4785; ave=149; max,qnt,min=8523 329 131 74 50 39 29
trica.noalt         : n=16985; n1000=1346; n500=4824; ave=447; max,qnt,min=21117 902 535 331 171 102 10
wasp.noalt          : n=24296; n1000=1464; n500=5698; ave=385; max,qnt,min=16711 786 479 265 138 90 9
human_ncbi_cd       : n=21830; n1000=2682; n500=8947; ave=573; max,qnt,min=33423 1099 690 426 261 157 24
zfish_ncbi_cd       : n=24150; n1000=2628; n500=9095; ave=538; max,qnt,min=32757 1035 643 403 256 156 26
#...........................................................................
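
The size-distribution fields defined above (n, n1000, n500, ave, max, qnt, min) could be computed from a protein FASTA file roughly as in the sketch below. This is only an illustration, not the script used to produce this report. The function names `read_fasta_lengths` and `size_summary` are hypothetical, the quantile rule (take the value at position fraction*n in the list sorted from largest to smallest) is an assumption about what "counts from max to min size" means, and no de-duplication of proteins is attempted. The demo input is made-up toy data, not one of the gene sets listed here.

```python
import io


def read_fasta_lengths(handle):
    """Yield the amino-acid length of each record in a FASTA stream."""
    length = None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def size_summary(lengths, fractions=(0.10, 0.25, 0.50, 0.75, 0.90)):
    """Summarize protein sizes with the same fields as the report.

    Assumption: quantiles are read off the list sorted from max to min,
    so the 0.10 quantile is a large size and the 0.90 quantile a small one.
    """
    sizes = sorted(lengths, reverse=True)
    n = len(sizes)
    if n == 0:
        raise ValueError("no protein records found")
    qnt = [sizes[min(n - 1, int(f * n))] for f in fractions]
    return {
        "n": n,
        "n1000": sum(1 for s in sizes if s >= 1000),
        "n500": sum(1 for s in sizes if s >= 500),
        "ave": round(sum(sizes) / n),
        "max": sizes[0],
        "qnt": qnt,
        "min": sizes[-1],
    }


# Toy FASTA with ten made-up proteins (poly-A sequences of known length).
demo_lengths = (1200, 600, 300, 150, 100, 80, 60, 50, 40, 30)
demo_fasta = "".join(f">p{k}\n{'A' * k}\n" for k in demo_lengths)

stats = size_summary(read_fasta_lengths(io.StringIO(demo_fasta)))
print(
    "demo : n={n}; n1000={n1000}; n500={n500}; ave={ave}; "
    "max,qnt,min={max} {q} {min}".format(
        q=" ".join(str(v) for v in stats["qnt"]), **stats
    )
)
```

For a real file, the same call would take an open file handle in place of the `io.StringIO` wrapper, e.g. `size_summary(read_fasta_lengths(open("some_proteins.aa")))`, where the file name is a placeholder.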