Archive for September, 2010
RepMaestro: scalable repeat detection on disk-based genome sequences
Posted by Waleed Ghalwash in MedWorm.com on September 26th, 2010
Motivation: We investigate the problem of exact repeat detection on large genomic sequences. Most existing approaches based on suffix trees and suffix arrays (SAs) are limited either to small sequences or to those that are memory resident. We introduce RepMaestro, a software tool that adapts existing in-memory enhanced SA algorithms so that they scale efficiently to large, disk-resident sequences. Supermaximal repeats, maximal unique matches (MuMs) and pairwise branching tandem repeats have been used to demonstrate the practicality of our approach. This is the first such study to use an enhanced SA to detect these repeats in large genome sequences.
Results: The detection of supermaximal repeats was observed to be up to two times faster than Vmatch, but more importantly, was shown to scale eff…
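For readers unfamiliar with suffix arrays, the following is a minimal in-memory sketch of how a suffix array plus a longest-common-prefix (LCP) array exposes exact repeats. It is not RepMaestro's disk-based method, and it finds only the single longest repeated substring rather than supermaximal repeats. The function names are hypothetical, and the naive sort-based construction is only suitable for short strings.

```python
def suffix_array(text):
    """Naive suffix array: suffix start positions in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    """Kasai's algorithm: lcp[r] is the length of the common prefix of the
    suffixes at sa[r - 1] and sa[r]; lcp[0] is 0 by convention."""
    n = len(text)
    rank = [0] * n
    for r, start in enumerate(sa):
        rank[start] = r
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


def longest_repeat(text):
    """Return (motif, positions) for the longest substring occurring at
    least twice; ("", []) if no character repeats."""
    if len(text) < 2:
        return "", []
    sa = suffix_array(text)
    lcp = lcp_array(text, sa)
    best = max(range(1, len(sa)), key=lambda r: lcp[r])
    length = lcp[best]
    if length == 0:
        return "", []
    motif = text[sa[best]:sa[best] + length]
    positions = [i for i in range(len(text)) if text.startswith(motif, i)]
    return motif, positions


print(longest_repeat("ACGTACGT"))
```

Adjacent entries in the sorted suffix order share the longest prefixes, which is why a single scan of the LCP array is enough to locate the repeat.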
Association screening of common and rare genetic variants by penalized regression
Posted by Waleed Ghalwash in MedWorm.com on September 26th, 2010
This article extends our recent research on penalized estimation methods in genome-wide association studies to the realm of rare variants.
Results: The new strategy is tested on both simulated and real data. Our findings on breast cancer data replicate previous results and shed light on variant effects within genes.
Availability: Rare variant discovery by group penalized regression is now implemented in the free program Mendel at http://www.genetics.ucla.edu/software/
Contact: huazhou@ucla.edu
Supplementary information: Supplementary data are available at Bioinformatics online. (Source: Bioinformatics)
Multi-objective pairwise RNA sequence alignment
Posted by Waleed Ghalwash in MedWorm.com on September 26th, 2010
Motivation: With an increase in the number of known biological functions of non-coding RNAs, the importance of RNA sequence alignment has risen. The RNA sequence alignment problem has been investigated by many researchers as a mono-objective optimization problem, where contributions from sequence similarity and secondary structure are taken into account through a single objective function. Since there is a trade-off between these two objective functions, we usually cannot obtain a single solution that has both the best sequence similarity score and the best structure score simultaneously. Multi-objective optimization is a widely used framework for optimization problems with conflicting objective functions. So far, no one has examined how good alignments we can obtain by applying multi-objec…
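The trade-off described above can be illustrated with a small sketch of Pareto filtering: among candidate alignments scored on two objectives, keep those that no other candidate beats on both. This is a generic illustration, not the authors' algorithm; the tuple layout, the function name and the example scores are all made up for the purpose.

```python
def pareto_front(alignments):
    """Return the candidates that are not dominated by any other candidate.

    Each candidate is (name, sequence_score, structure_score), and higher
    is better for both scores. A candidate is dominated when another one
    is at least as good on both scores and strictly better on one.
    """
    front = []
    for cand in alignments:
        dominated = any(
            other[1] >= cand[1]
            and other[2] >= cand[2]
            and (other[1] > cand[1] or other[2] > cand[2])
            for other in alignments
        )
        if not dominated:
            front.append(cand)
    return front


# Hypothetical scores for five candidate alignments.
candidates = [
    ("A", 10, 2),
    ("B", 8, 5),
    ("C", 7, 4),
    ("D", 3, 9),
    ("E", 10, 1),
]
print([name for name, _, _ in pareto_front(candidates)])
```

A mono-objective method would collapse the two scores into one number and return a single candidate, whereas the Pareto front keeps every non-dominated compromise for later inspection.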
An alignment-free model for comparison of regulatory sequences
Posted by Waleed Ghalwash in MedWorm.com on September 26th, 2010
Motivation: Some recent comparative studies have revealed that regulatory regions can retain function over large evolutionary distances, even though the DNA sequences are divergent and difficult to align. It is also known that such enhancers can drive very similar expression patterns. This poses a challenge for the in silico detection of biologically related sequences, as they can only be discovered using alignment-free methods.
Results: Here, we present a new computational framework called Regulatory Region Scoring (RRS) model for the detection of functional conservation of regulatory sequences using predicted occupancy levels of transcription factors of interest. We demonstrate that our model can detect the functional and/or evolutionary links between some non-alignable enhancers with a …
Ultra-fast FFT protein docking on graphics processors
Posted by Waleed Ghalwash in MedWorm.com on September 26th, 2010
Motivation: Modelling protein–protein interactions (PPIs) is an increasingly important aspect of structural bioinformatics. However, predicting PPIs using in silico docking techniques is computationally very expensive. Developing very fast protein docking tools will be useful for studying large-scale PPI networks, and could contribute to the rational design of new drugs.
Results: The Hex spherical polar Fourier protein docking algorithm has been implemented on Nvidia graphics processor units (GPUs). On a GTX 285 GPU, an exhaustive and densely sampled 6D docking search can be calculated in just 15 s using multiple 1D fast Fourier transforms (FFTs). This represents a 45-fold speed-up over the corresponding calculation on a single CPU, being at least two orders of magnitude faster…
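To show why FFTs help in this kind of search, here is a one-dimensional toy, assuming nothing about the Hex implementation itself: the correlation theorem lets a single forward/inverse FFT pair score every cyclic shift of one profile against another at once. It is a plain-Python CPU sketch rather than GPU code, it does not use spherical polar basis functions, and all function names and the example profiles are hypothetical.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT (unnormalized).
    The input length must be a power of two."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def circular_correlation(a, b):
    """scores[k] = sum over n of a[(n + k) % N] * b[n], for real inputs,
    computed as the inverse FFT of FFT(a) * conj(FFT(b))."""
    n = len(a)
    fa = fft([complex(x) for x in a])
    fb = fft([complex(x) for x in b])
    product = [x * y.conjugate() for x, y in zip(fa, fb)]
    return [v.real / n for v in fft(product, inverse=True)]


def best_shift(a, b):
    """Index of the cyclic shift with the highest correlation score."""
    scores = circular_correlation(a, b)
    return max(range(len(scores)), key=lambda k: scores[k])


# Made-up 1D profiles: 'a' contains the pattern of 'b' shifted by two.
a = [0, 0, 1, 2, 1, 0, 0, 0]
b = [1, 2, 1, 0, 0, 0, 0, 0]
print(best_shift(a, b))
```

Scoring each of the N shifts directly costs on the order of N² multiplications, while the FFT route costs on the order of N log N.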
A Bayesian method for 3D macromolecular structure inference using class average images from single particle electron microscopy
Posted by Waleed Ghalwash in MedWorm.com on September 26th, 2010
We present a new method for performing ab initio inference of the 3D structures of macromolecules from single particle electron cryo-microscopy experiments using class average images.
Results: We demonstrate this algorithm on one phantom, one synthetic dataset and three real (experimental) datasets (ATP synthase, V-type ATPase and GroEL). Structures consistent with the known structures were inferred for all datasets.
Availability: The software and source code for this method are available for download from our website: http://compbio.cs.toronto.edu/cryoem/
Contact: ndjaitly@cs.toronto.edu; lilien@cs.toronto.edu
Supplementary information: Supplementary data are available at Bioinformatics online. (Source: Bioinformatics)
Cross-species queries of large gene expression databases
Posted by Waleed Ghalwash in MedWorm.com on September 26th, 2010
Motivation: Expression databases, including the Gene Expression Omnibus and ArrayExpress, have experienced significant growth over the past decade and now hold hundreds of thousands of arrays from multiple species. Since most drugs are initially tested on model organisms, the ability to compare expression experiments across species may help identify pathways that are activated in a similar way in humans and other organisms. However, while several methods exist for finding co-expressed genes in the same species as a query gene, looking at co-expression of homologs or arbitrary genes in other species is challenging. Unlike sequence, which is static, expression is dynamic and changes between tissues, conditions and time. Thus, to carry out cross-species analysis using these databases, we need…
Automated analysis of time-lapse fluorescence microscopy images: from live cell images to intracellular foci
Posted by Waleed Ghalwash in MedWorm.com on September 26th, 2010
Motivation: Complete, accurate and reproducible analysis of intracellular foci from fluorescence microscopy image sequences of live cells requires full automation of all processing steps involved: cell segmentation and tracking followed by foci segmentation and pattern analysis. Integrated systems for this purpose are lacking.
Results: Extending our previous work in cell segmentation and tracking, we developed a new system for performing fully automated analysis of fluorescent foci in single cells. The system was validated by applying it to two common tasks: intracellular foci counting (in DNA damage repair experiments) and cell-phase identification based on foci pattern analysis (in DNA replication experiments). Experimental results show that the system performs comparably to expert human…
Gene function prediction using semantic similarity clustering and enrichment analysis in the malaria parasite Plasmodium falciparum
Posted by Waleed Ghalwash in MedWorm.com on September 26th, 2010
Motivation: Functional genomics data provide a rich source of information that can be used in the annotation of the thousands of genes of unknown function found in most sequenced genomes. However, previous gene function prediction programs were mostly developed for relatively well-annotated organisms that often have a large amount of functional genomics data. Here, we present a novel method for predicting gene function that uses clustering of genes by semantic similarity, a naïve Bayes classifier and ‘enrichment analysis’ to predict gene function for a genome that is less well annotated but does have a severe effect on human health, that of the malaria parasite Plasmodium falciparum.
Results: Predictions for the molecular function, biological process and cellular component o…
ChEA: transcription factor regulation inferred from integrating genome-wide ChIP-X experiments
Posted by Waleed Ghalwash in MedWorm.com on September 26th, 2010
Motivation: Experiments such as ChIP-chip, ChIP-seq, ChIP-PET and DamID (the four methods referred to herein as ChIP-X) are used to profile the binding of transcription factors to DNA at a genome-wide scale. Such experiments provide hundreds to thousands of potential binding sites for a given transcription factor in proximity to gene coding regions.
Results: In order to integrate data from such studies and utilize it for further biological discovery, we collected interactions from such experiments to construct a mammalian ChIP-X database. The database contains 189 933 interactions, manually extracted from 87 publications, describing the binding of 92 transcription factors to 31 932 target genes. We used the database to analyze mRNA expression data where we perform gene-list enrichment analysi…
