Arabidopsis thaliana gene set reconstructed with EvidentialGene

METHODS

Component over-assembly gene sets from de-novo assemblies of 3 RNA-seq sources,
without use of chromosome assemblies or genes of other species
-------------------------------------------------
evg3arath
  RNA source project http://www.ncbi.nlm.nih.gov/bioproject/PRJNA323955
  At cultivar:Col-0; Illumina HiSeq 2000 paired end sequencing, root and other tissues
  From Virginia Tech, Song Li; 2016-11-18, doi: 10.1016/j.devcel.2016.10.012

  5,102,350 gene assemblies were generated from this RNA sample with 11 assembly runs.
  These were classified/reduced to 37,393 locus transcripts, plus 128,412 alternates,
  with Evigene methods.

evg4arath
  RNA source project http://www.ncbi.nlm.nih.gov/bioproject/PRJNA316113
  At cultivar:Col-0; Illumina HiSeq 2500, from Univ. Cambridge
  RNA source project http://www.ncbi.nlm.nih.gov/bioproject/PRJNA336053
  At cultivar:Col-0; Illumina HiSeq 2000, paired end sequencing, from

  5,039,388 gene assemblies were generated from these two RNA samples with 10 assembly runs.
  These were classified/reduced to 36,347 locus transcripts, plus 135,007 alternates,
  with Evigene methods.

evg5arath
  The two Evigene gene sets are merged into a public set, evg5, using transcript-only
  classification: evg3, from a single source for strict comparison of methods, and
  evg4, which adds 2 other RNA sources for more complete genes.
  This produced a set of 34,299 locus transcripts, plus 132,604 alternates.

  This set is further classified by mapping to At TAIR10 chromosomes, including
  reassignment of locus transcripts among alternates and paralogs, and removal of
  redundant same-locus extra transcripts. Alternates were validated with RNA-seq
  mapped introns, and those lacking unique alternate spliced exon/intron chains
  were removed.

Genes assembled 2017-Jan/Feb, by D.G. Gilbert, gilbert at indiana edu

EvidentialGene methods for producing and reducing RNA over-assemblies are outlined at
http://eugenes.org/EvidentialGene/about/

RNA assemblers for Illumina paired short reads:
  velvet/oases: velvet1.2.10, oases_0.2.08, 2013
  soap-trans:   SOAPdenovo-Trans v1.03, 2013
  idba_tran:    idba-1.1.1, 2013, https://code.google.com/p/hku-idba/
  trinity:      trinity/v2.4.0, 2017.02

--------------------------------
pacbio16arath
  Pac-Bio RNA data and assembly, using PacBio software/methods
  RNA source project http://www.ncbi.nlm.nih.gov/bioproject/PRJNA306427
  From Virginia Tech, Song Li; 2016-11-18, same author, tissue, RNA source as used
  for Illumina, PRJNA323955
  doi: 10.1016/j.devcel.2016.10.012

  RNA genes are extracted and assembled from 12 raw SRA PacBio RNA data entries
  (hdf5 files, e.g.
  http://sra-download.ncbi.nlm.nih.gov/srapub_files/SRR3655756_SRR3655756_hdf5.tgz),
  with Pacific Biosciences SMRTAnalysis software, ConsensusTools.sh and pbtranscript.py,
  of smrtanalysis_2.3.0.140936. The software was obtained from Pacific Biosciences,
  installed, and run as directed.

  Only the RNA source data of project PRJNA306427 is used in this assembly, although
  external data sets such as a chromosome assembly can be added to further improve
  genes (in the manner of Cufflinks, StringTie or other assemblers that merge RNA
  short reads with chromosomes).

  These Pac-Bio gene assemblies are not used in the evg3arath or evg4arath gene sets,
  so that results can be compared. However, they can be used as input transcripts,
  mixed with others, as part of the over-assembly that EvidentialGene methods reduce
  to an accurate, non-redundant gene set.
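
The scale of the over-assembly reduction reported above for the two Illumina
component sets can be checked with a little arithmetic. The sketch below is
illustrative only: the counts are copied from the evg3arath and evg4arath
entries, while the function name, dictionary keys and the two summary ratios
are assumptions made for this example and are not part of the Evigene software.

```python
# Counts copied from the evg3arath and evg4arath entries in this document.
# All names here (GENE_SETS, reduction_summary, the dict keys) are hypothetical.
GENE_SETS = {
    "evg3arath": {"assemblies": 5_102_350, "loci": 37_393, "alternates": 128_412},
    "evg4arath": {"assemblies": 5_039_388, "loci": 36_347, "alternates": 135_007},
}


def reduction_summary(assemblies, loci, alternates):
    """Summarize how far an over-assembly was reduced.

    kept                 = locus transcripts + alternates retained
    kept_percent         = retained transcripts as a percentage of all assemblies
    alternates_per_locus = mean number of alternates per locus transcript
    """
    kept = loci + alternates
    return {
        "kept": kept,
        "kept_percent": 100.0 * kept / assemblies,
        "alternates_per_locus": alternates / loci,
    }


if __name__ == "__main__":
    for name, counts in GENE_SETS.items():
        summary = reduction_summary(**counts)
        print(
            f"{name}: kept {summary['kept']:,} of {counts['assemblies']:,} assemblies "
            f"({summary['kept_percent']:.2f}%), "
            f"{summary['alternates_per_locus']:.2f} alternates per locus"
        )
```

In both sets only a few percent of the input assemblies are retained as locus
or alternate transcripts.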
--------------------------------
Chromosome assemblies used
  At TAIR10, cultivar:Col-0, NCBI genomes GCF_000001735.3_TAIR10, PRJNA116
  At Ler 2016, cultivar:Ler, NCBI genomes GCA_001651475.1, PRJNA311266

Reference gene sets:
  At Araport11 gene set, 2016 release, built on At TAIR10 chromosomes
  Theobroma cacao, Thecc7, 2017 Evigene set update, eugenes.org/EvidentialGene/plants/cacao/
  Citrus clementina, from NCBI Genomes, GCF_000493195.1_Citrus_clementina_v1.0

Alternate At gene model set for comparison
  AUGUSTUS models built on At Ler 2016 chromosome assembly, obtained from
  NCBI genome annotation, GCA_001651475.1, PRJNA311266
--------------------------------