SjD Laboratory

Resources and Software

CruzDB The UCSC Genome Browser is a great resource for annotations, regulation, variation, and many other kinds of data for a growing number of taxa. We developed CruzDB to make that data simple to use, so that sophisticated analyses can be done without resorting to awk-ful, error-prone manipulations.

Software availability: https://github.com/brentp/cruzdb

Reference: Pedersen BS, Yang IV, De S. (2013) CruzDB: software for annotation of genomic intervals with UCSC genome-browser database. Bioinformatics. 29(23):3003-6. PMID: 24037212



SASE-hunter Non-coding regulatory mutations appear to be more frequent than previously suspected and play important roles in oncogenesis. Using a computational method called SASE-hunter, developed here, we identified a novel signature of accelerated somatic evolution marked by a significant excess of somatic mutations localized in a genomic locus, and prioritized those loci that carried the signature in multiple cancer patients. In a pan-cancer analysis of 906 samples from 12 tumor types, we detected SASE in the promoters of several genes, including known cancer genes such as MYC, BCL2, RBM5, and WWOX. These signatures were associated with over-expression, and also correlated with the age of onset of cancer, aggressiveness of the disease, and survival, suggesting that SASE-hunter detects a hitherto under-appreciated and clinically important class of regulatory changes in cancer genomes. 
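
To make the idea of "a significant excess of somatic mutations localized in a genomic locus" concrete, here is a minimal sketch of one generic way to test for such an excess. It is not the statistic implemented in SASE-hunter. It assumes mutations fall on the locus as a Poisson process at a known background rate, and every name and number in it (`poisson_upper_tail`, the locus length, the rate, the sample count) is hypothetical and chosen only for illustration.

```python
import math


def poisson_upper_tail(observed, expected):
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    # Accumulate P(X = i) for i < observed, then take the complement.
    term = math.exp(-expected)  # P(X = 0)
    cdf = term
    for i in range(1, observed):
        term *= expected / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


# Hypothetical inputs, not taken from the paper.
locus_length_bp = 1000     # size of the promoter window
background_rate = 2e-6     # assumed mutations per bp per sample
n_samples = 100            # number of tumor samples pooled
observed_mutations = 5     # mutations seen in the window across samples

# Expected count under the background model.
expected = locus_length_bp * background_rate * n_samples
p_value = poisson_upper_tail(observed_mutations, expected)
print(f"expected={expected:.2f}, observed={observed_mutations}, p={p_value:.3g}")
```

A small p-value here only says the window carries more mutations than the assumed background predicts. A real analysis would also need a locally estimated background rate and correction for testing many loci.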

Software availability: https://github.com/kylessmith/SASE-hunter

Reference: Smith KS, Yadav VK, Pedersen BS, Shaknovich RS, Geraci MW, Pollard KS, De S. Signatures of accelerated somatic evolution in gene promoters in multiple cancer types. Nucleic Acids Res (in press).


SomVarIUS Somatic variant calling typically requires paired tumor-normal tissue samples. Yet paired normal tissues are not always available in clinical settings or for archival samples. We present SomVarIUS, a computational method for detecting somatic variants using high-throughput sequencing data from unpaired tissue samples. We evaluate the performance of the method using genomic data from synthetic and real tumor samples. SomVarIUS identifies somatic variants in exome-seq data of ~150X coverage with at least 67.7% precision and 64.6% recall when compared with paired-tissue somatic variant calls in real tumor samples. We demonstrate the utility of SomVarIUS by identifying somatic mutations in formalin-fixed samples, and by tracking clonal dynamics of oncogenic mutations in targeted deep sequencing data from pre- and post-treatment leukemia samples.
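
The precision and recall figures above compare calls made from unpaired samples against paired-tissue calls treated as the reference set. The sketch below shows how those two quantities are computed from two call sets. The function name `precision_recall`, the tuple layout `(chrom, pos, ref, alt)`, and the example variants are all hypothetical; none of this is part of SomVarIUS itself.

```python
def precision_recall(called, truth):
    """Compare a set of variant calls against a reference call set.

    Each variant is a hashable key, here a (chrom, pos, ref, alt) tuple.
    precision = TP / (TP + FP), recall = TP / (TP + FN).
    """
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Made-up calls from an unpaired-sample caller.
unpaired_calls = [
    ("chr1", 1000, "A", "T"),
    ("chr2", 500, "G", "C"),
    ("chr3", 42, "C", "T"),
]

# Made-up reference calls from a paired tumor-normal analysis.
paired_calls = [
    ("chr1", 1000, "A", "T"),
    ("chr2", 500, "G", "C"),
    ("chr7", 77, "T", "G"),
    ("chr9", 9, "G", "A"),
]

precision, recall = precision_recall(unpaired_calls, paired_calls)
print(f"precision={precision:.3f}, recall={recall:.3f}")
```

Matching on exact position and alleles is the simplest choice. Real comparisons usually normalize variant representation first so that equivalent calls are not counted as mismatches.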


Reference: Smith KS, Yadav VK, Pei S, Pollyea DA, Jordan CT, De S. (2016) SomVarIUS: Somatic variant identification from unpaired tissue samples. Bioinformatics. [Epub ahead of print] PMID: 26589277.