Archive for February, 2011
Bio::Phylo – phyloinformatic analysis using Perl
Posted by Waleed Ghalwash in MedWorm.com on February 28th, 2011
Conclusions:
Bio::Phylo is composed of 59 richly documented Perl5 modules. It has been deployed successfully on a variety of computer architectures (including various Linux distributions, Mac OS X versions, Windows, Cygwin and UNIX-like systems). It is available as open source (GPL) software from http://search.cpan.org/dist/Bio-Phylo (Source: BMC Bioinformatics – Latest articles)
PrePrint: Superposition and Alignment of Labeled Point Clouds
Posted by Waleed Ghalwash in MedWorm.com on February 28th, 2011
Geometric objects are often represented approximately in terms of a finite set of points in three-dimensional Euclidean space. In this paper, we extend this representation to what we call labeled point clouds. A labeled point cloud is a finite set of points, where each point is associated not only with a position in three-dimensional space but also with a discrete class label that represents a specific property. Proceeding from this representation, we address the question of how to compare two labeled point clouds in terms of their similarity. Using fuzzy modeling techniques, we develop a suitable similarity measure as well as an efficient evolutionary algorithm to compute it. Moreover, we consider the problem of establishing an alignment of the structures. In this paper, we therefore de…
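To make the representation concrete, here is a minimal sketch that stores a labeled point cloud as (position, label) pairs and scores two clouds, in a fixed placement, with a Gaussian overlap that only rewards pairs sharing the same label. The kernel width `sigma` and the names `overlap` and `similarity` are assumptions made for this example. This is not the paper's fuzzy similarity measure, and it leaves out the search over rotations and translations that a full comparison would need.

```python
import math
from typing import List, Tuple

# A labeled point: a 3-D position plus a discrete class label (hypothetical layout).
LabeledPoint = Tuple[Tuple[float, float, float], str]


def overlap(a: List[LabeledPoint], b: List[LabeledPoint], sigma: float = 1.0) -> float:
    """Sum of Gaussian kernels over all same-label point pairs (illustrative measure)."""
    total = 0.0
    for pos_a, label_a in a:
        for pos_b, label_b in b:
            if label_a != label_b:
                continue  # points with different labels contribute nothing
            d2 = sum((x - y) ** 2 for x, y in zip(pos_a, pos_b))
            total += math.exp(-d2 / (2 * sigma ** 2))
    return total


def similarity(a: List[LabeledPoint], b: List[LabeledPoint], sigma: float = 1.0) -> float:
    """Normalize the overlap by the self-overlaps so the score lies in [0, 1]."""
    denom = math.sqrt(overlap(a, a, sigma) * overlap(b, b, sigma))
    return overlap(a, b, sigma) / denom if denom > 0 else 0.0


cloud = [((0.0, 0.0, 0.0), "donor"), ((1.5, 0.0, 0.0), "acceptor")]
shifted = [((0.5, 0.0, 0.0), "donor"), ((2.0, 0.0, 0.0), "acceptor")]
print(similarity(cloud, cloud), similarity(cloud, shifted))
```

The normalization works like a cosine similarity: identical clouds score 1, and clouds whose labels never coincide score 0, however close their positions are.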
PrePrint: On Lattice Protein Structure Prediction Revisited
Posted by Waleed Ghalwash in MedWorm.com on February 28th, 2011
Protein structure prediction is regarded as a highly challenging problem for both the biology and the computational communities. Many approaches have been developed in recent years, moving to increasingly complex lattice models or even off-lattice models. This paper presents a Large Neighborhood Search (LNS) to find the native state for the Hydrophobic-Polar (HP) model on the Face Centered Cubic (FCC) lattice, or, in other words, a self-avoiding walk on the FCC lattice having a maximum number of H-H contacts. The algorithm starts with a tabu-search algorithm, whose solution is then improved by a combination of constraint programming and LNS. This hybrid algorithm improves on earlier approaches in the literature over several well-known instances and demonstrates the potential of constra…
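The objective can be stated concretely. In the HP model, a protein is a string over {H, P}, a conformation is a self-avoiding walk on the lattice, and the score is the number of H-H contacts: pairs of H residues that sit on neighboring lattice sites but are not consecutive in the chain. The minimal sketch below assumes those standard definitions and uses the 12 FCC neighbor vectors (all permutations of (±1, ±1, 0)) to validate a walk and count its contacts. It does not implement the tabu search, constraint programming or LNS steps, and the function names are illustrative.

```python
from itertools import permutations, product
from typing import List, Tuple

Point = Tuple[int, int, int]

# The 12 nearest-neighbor vectors of the FCC lattice: permutations of (±1, ±1, 0).
FCC_NEIGHBORS = {
    p
    for sx, sy in product((1, -1), repeat=2)
    for p in permutations((sx, sy, 0))
}


def is_valid_walk(coords: List[Point]) -> bool:
    """True if the walk never revisits a site and every step is an FCC neighbor move."""
    if len(set(coords)) != len(coords):
        return False
    return all(
        tuple(b - a for a, b in zip(p, q)) in FCC_NEIGHBORS
        for p, q in zip(coords, coords[1:])
    )


def hh_contacts(sequence: str, coords: List[Point]) -> int:
    """Count non-consecutive H-H pairs that occupy neighboring lattice sites."""
    position = {c: i for i, c in enumerate(coords)}
    contacts = 0
    for i, (residue, c) in enumerate(zip(sequence, coords)):
        if residue != "H":
            continue
        for d in FCC_NEIGHBORS:
            j = position.get((c[0] + d[0], c[1] + d[1], c[2] + d[2]))
            # j > i + 1 counts each pair once and skips chain neighbors
            if j is not None and j > i + 1 and sequence[j] == "H":
                contacts += 1
    return contacts


walk = [(0, 0, 0), (1, 1, 0), (1, 0, 1)]
print(is_valid_walk(walk), hh_contacts("HHH", walk))  # True 1
```

In the usual convention the HP energy is the negative of this count, so finding the native state means finding a valid walk that maximizes `hh_contacts`.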
PrePrint: Some Mathematical Refinements Concerning Error Minimization in the Genetic Code
Posted by Waleed Ghalwash in MedWorm.com on February 28th, 2011
The genetic code has been shown to be highly error-robust compared with randomly selected codes, but significantly less error-robust than a particular code found by a heuristic algorithm. We formulate this optimisation problem as a Quadratic Assignment Problem and thus verify that the code found by the heuristic is the global optimum. We also argue that it is strongly misleading to compare the genetic code only with codes sampled from the fixed block model, because the real code space is orders of magnitude larger. We thus enlarge the space from which random codes can be sampled from approximately 2.433 × 10^18 codes to approximately 5.908 × 10^45 codes. We do this by leaving the fixed block model and using the wobble rules to formulate the characteristics acceptable for a genetic code. By…
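The first figure can be checked directly. In the fixed block model, the synonymous codon blocks of the standard code are kept and only the assignment of the 20 amino acids to those blocks is permuted. That gives 20! possible codes, which matches the approximately 2.433 × 10^18 quoted above. A quick check in Python:

```python
import math

# Fixed block model: keep the codon blocks of the standard code (stop codons
# unchanged) and permute which of the 20 amino acids is assigned to which block.
fixed_block_codes = math.factorial(20)

print(fixed_block_codes)           # 2432902008176640000
print(f"{fixed_block_codes:.3e}")  # 2.433e+18
```

The larger figure of about 5.908 × 10^45 depends on the wobble-rule constraints developed in the paper and is not reproduced here.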
PrePrint: Selecting Oligonucleotide Probes for Whole-Genome Tiling Arrays with a Cross-Hybridization Potential
Posted by Waleed Ghalwash in MedWorm.com on February 28th, 2011
For designing oligonucleotide tiling arrays, popular current methods still rely on simple criteria such as Hamming distance or longest common factors, neglecting base stacking effects, which contribute strongly to binding energies. Consequently, probes are often prone to cross-hybridization, which reduces the signal-to-noise ratio and complicates downstream analysis. We propose the first computationally efficient method that uses hybridization energy to identify specific oligonucleotide probes. Our Cross Hybridization Potential (CHP) is computed with a Nearest Neighbor Alignment, which efficiently estimates a lower bound for the Gibbs free energy of the duplex formed by two DNA sequences of bounded length. It is derived from our simplified reformulation of t-gap insertion-deletion-like metrics. The …
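For contrast with the energy-based criterion, the two simple criteria named above are easy to state in code. The sketch below computes the Hamming distance between equal-length probes and the length of their longest common factor (longest common substring) using standard dynamic programming. The function names are illustrative, and neither function models base stacking or the paper's Cross Hybridization Potential.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def longest_common_factor(a: str, b: str) -> int:
    """Length of the longest substring shared by a and b (O(len(a) * len(b)) DP)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # extend the common suffix ending at a[i-1] and b[j-1]
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


print(hamming("ACGTACGT", "ACGTTCGT"))                # 1
print(longest_common_factor("ACGTACGT", "TTACGTAA"))  # 5
```

Both scores look only at sequence identity. Two probe pairs with the same Hamming distance can therefore differ considerably in duplex stability, because the base stacking contributions the abstract points to are ignored.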
Identification of new homologs of PD-(D/E)XK nucleases by support vector machines trained on data derived from profile-profile alignments
Posted by Waleed Ghalwash in MedWorm.com on February 28th, 2011
PD-(D/E)XK nucleases, initially represented only by Type II restriction enzymes, now comprise a large and extremely diverse superfamily of proteins. They participate in many different nucleic acid transactions, including DNA degradation, recombination, repair and RNA processing. Different PD-(D/E)XK families, although sharing a structurally conserved core, typically display little or no detectable sequence similarity except for the active site motifs. This makes it challenging to identify new superfamily members with standard homology search techniques. To tackle this problem, we developed a method for the detection of PD-(D/E)XK families based on the binary classification of profile–profile alignments using support vector machines (SVMs). Using a number of both superfamily-s…
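The core idea is a binary classifier that separates superfamily members from non-members using features derived from profile-profile alignments. It can be illustrated with a minimal linear SVM trained by Pegasos-style subgradient descent on the regularized hinge loss. The two-dimensional feature vectors below are made-up, standardized toy values, and `train_linear_svm` and `predict` are hypothetical names. The excerpt does not describe the paper's actual features, kernel or training procedure. The loop visits the data in a fixed order rather than sampling at random, so the result is reproducible.

```python
from typing import List, Sequence

Vector = List[float]


def train_linear_svm(
    xs: Sequence[Sequence[float]], ys: Sequence[int], lam: float = 0.01, epochs: int = 50
) -> Vector:
    """Pegasos-style subgradient descent on the regularized hinge loss.

    Labels must be +1 (superfamily member) or -1 (non-member). A constant 1.0 is
    appended to each feature vector so the last weight acts as a bias term.
    """
    data = [(list(x) + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for x, y in data:  # fixed order keeps the run deterministic
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]  # regularization (shrink) step
            if margin < 1:  # hinge loss is active: step toward the example
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w: Vector, x: Sequence[float]) -> int:
    """Classify a feature vector as +1 (member) or -1 (non-member)."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Toy, made-up feature vectors: +1 = superfamily member, -1 = non-member.
X = [(3, 3), (4, 2), (3, 4), (-3, -3), (-2, -4), (-4, -2)]
Y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, Y)
print([predict(w, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

A kernelized SVM from an established library would be the practical choice for real data. The hand-written loop only shows what "binary classification with an SVM" means at the level of the optimization.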
A lightweight, flow-based toolkit for parallel and distributed bioinformatics pipelines
Posted by Waleed Ghalwash in MedWorm.com on February 28th, 2011
Conclusions:
PaPy offers a modular framework for the creation and deployment of parallel and distributed data-processing workflows. Pipelines derive their functionality from user-written, data-coupled components, so PaPy can also be viewed as a lightweight toolkit for extensible, flow-based bioinformatics data processing. The simplicity and flexibility of distributed PaPy pipelines may help users bridge the gap between traditional desktop/workstation and grid computing. PaPy is freely distributed as open-source Python code at http://muralab.org/PaPy, and includes extensive documentation and annotated usage examples. (Source: BMC Bioinformatics – Latest articles)
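The flow-based idea can be sketched with the standard library alone: user-written functions are chained into a pipeline, and data items flow through it in parallel. The code below is not PaPy's API. It is a minimal illustration that assumes each stage is a plain Python function applied item by item, uses `concurrent.futures` as the parallel back end, and relies on two hypothetical components, `clean` and `gc_content`.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Callable, Iterable, List

Stage = Callable[[object], object]


def compose(stages: List[Stage]) -> Stage:
    """Chain stages so that each item flows through them in order."""
    return lambda item: reduce(lambda acc, stage: stage(acc), stages, item)


def run_pipeline(stages: List[Stage], items: Iterable[object], workers: int = 4) -> List[object]:
    """Apply the composed pipeline to every item in parallel, preserving input order."""
    pipeline = compose(stages)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(pipeline, items))


# Hypothetical user-written components for a toy sequence-processing workflow.
def clean(seq: str) -> str:
    return seq.strip().upper()


def gc_content(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


results = run_pipeline([clean, gc_content], [" acgt\n", "GGCC", "atat "])
print(results)  # [0.5, 1.0, 0.0]
```

A thread pool is used here because it accepts lambdas and closures. For CPU-bound stages, a process pool with module-level functions, or a distributed back end of the kind PaPy targets, would be the natural substitute.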
Information Metrics in Genetic Epidemiology
Posted by Waleed Ghalwash in MedWorm.com on February 28th, 2011
Information-theoretic metrics have been proposed for studying gene-gene and gene-environment interactions in genetic epidemiology. Although these metrics have proven very promising, they are typically interpreted in the context of communications and information transmission, diminishing their tangibility for epidemiologists and statisticians. In this paper, we clarify the interpretation of information-theoretic metrics. In particular, we develop the methods so that their relation to the global properties of probability models is made clear and contrast them with log-linear models for multinomial data. Hopefully, a better understanding of their properties and probabilistic implications will promote their acceptance and correct usage in genetic epidemiology. Our novel development also sugges…
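One such metric is the mutual information between a genotype and a disease status, which can be computed from a contingency table of counts. It connects directly to log-linear-style testing: the likelihood-ratio (G) statistic for independence equals 2N times the plug-in mutual information measured in nats. The sketch below uses a made-up 3 × 2 genotype-by-status table purely for illustration. It does not reproduce the paper's own development of these metrics.

```python
import math
from typing import List


def _margins(table: List[List[float]]):
    n = sum(sum(row) for row in table)
    return n, [sum(row) for row in table], [sum(col) for col in zip(*table)]


def mutual_information(table: List[List[float]]) -> float:
    """Plug-in mutual information (in nats) between the row and column variables."""
    n, rows, cols = _margins(table)
    mi = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count > 0:  # empty cells contribute nothing (0 * log 0 = 0)
                p_ij = count / n
                mi += p_ij * math.log(p_ij / ((rows[i] / n) * (cols[j] / n)))
    return mi


def g_statistic(table: List[List[float]]) -> float:
    """Likelihood-ratio statistic for independence: 2 * sum O * ln(O / E)."""
    n, rows, cols = _margins(table)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:
                expected = rows[i] * cols[j] / n
                g += observed * math.log(observed / expected)
    return 2 * g


# Made-up counts: rows are genotypes AA, Aa, aa; columns are cases, controls.
counts = [[30, 20], [40, 40], [10, 30]]
total = sum(map(sum, counts))
print(mutual_information(counts), g_statistic(counts), 2 * total * mutual_information(counts))
```

The mutual information is zero exactly when the table's joint distribution factorizes into its margins, which is the independence model of a log-linear analysis.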
New Technology Pinpoints Genetic Differences Between Cancer, Non-cancer Patients
Posted by Waleed Ghalwash in MedWorm.com on February 28th, 2011
A group of researchers led by scientists from the Virginia Bioinformatics Institute (VBI) at Virginia Tech has developed a new technology that detects distinct genetic changes differentiating cancer patients from healthy individuals and that could serve as a future cancer predisposition test. The multidisciplinary team, which includes researchers from the University of Texas Southwestern Medical Center, has created a design for a new DNA microarray that allows them to measure the two million microsatellites (short, repetitive DNA sequences) found within the human genome using 300,000 probes… (Source: Health News from Medical News Today)
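Microsatellites are short tandem repeats such as (CA)n. The sketch below scans a sequence for perfect tandem repeats with a regular expression, to illustrate the kind of feature the array measures. The motif length range (1 to 6 bases) and the minimum of three copies are assumptions chosen for this example, not the VBI team's criteria. Practical microsatellite callers also handle imperfect repeats.

```python
import re
from typing import List, Tuple

# A motif of 1-6 bases (group 2) followed by at least two more copies of itself.
# The lazy {1,6}? prefers the shortest motif, so "CACACA" is reported as (CA)x3.
# Motif length and copy-number thresholds are illustrative assumptions.
TANDEM = re.compile(r"(([ACGT]{1,6}?)\2{2,})")


def find_microsatellites(seq: str) -> List[Tuple[int, str, int]]:
    """Return (start, motif, copy_number) for perfect tandem repeats in seq."""
    hits = []
    for m in TANDEM.finditer(seq.upper()):
        motif = m.group(2)
        hits.append((m.start(), motif, len(m.group(1)) // len(motif)))
    return hits


print(find_microsatellites("GGCACACACATT"))  # [(2, 'CA', 4)]
```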
Database identifies FDA-approved drugs with potential to be repurposed for treatment of orphan diseases
Posted by Waleed Ghalwash in Oxford journals on February 28th, 2011
Facing substantial obstacles to developing new therapies for rare diseases, some sponsors are looking to ‘repurpose’ drugs already approved for other conditions and use those therapies to treat rare diseases. In an effort to facilitate such repurposing and speed the delivery of new therapies to people who need them, we have established a new resource, the Rare Disease Repurposing Database (RDRD). The advantages of repurposed compounds include their demonstrated efficacy (in some clinical contexts), their observed toxicity profiles and their clearly described manufacturing controls. To create the RDRD, we matched the US Food and Drug Administration (FDA) orphan designation database to FDA drug and biological product approval lists. The RDRD lists 236 products that have received orphan status designation—that is, were found to be ‘promising’ for the treatment of a rare disease—and though not yet approved for marketing for that rare disease, they are already approved for marketing to treat some other disease or condition. The RDRD contains three tables: Orphan-designated products with at least one marketing approval for a common disease indication (N = 109); orphan-designated products with at least one marketing approval for a rare disease indication (N = 76); and orphan-designated products with marketing approvals for both common and rare disease indications (N = 51). While the data included in the database is a re-configuration/cross-indexing of information already released by the FDA, it offers sponsors a new tool for finding special opportunities to develop niche therapies for rare disease patients.
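Building the three tables can be pictured as a classification step over the matched records. Because 109 + 76 + 51 = 236, the three tables appear to partition the listed products, so the sketch below treats the categories as mutually exclusive. It assumes, hypothetically, that the matching has already produced, for each orphan-designated product, the indications it is approved for, each flagged as rare or common. The product names, indications and data layout are invented for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical input: for each orphan-designated product, the indications it is
# already approved for, each tagged True if that indication is a rare disease.
Approvals = Dict[str, List[Tuple[str, bool]]]


def build_tables(approvals: Approvals) -> Dict[str, List[str]]:
    """Split products into common-only, rare-only and both-approval tables."""
    tables: Dict[str, List[str]] = {"common": [], "rare": [], "both": []}
    for product, indications in sorted(approvals.items()):
        has_rare = any(is_rare for _, is_rare in indications)
        has_common = any(not is_rare for _, is_rare in indications)
        if has_common and has_rare:
            tables["both"].append(product)
        elif has_common:
            tables["common"].append(product)
        elif has_rare:
            tables["rare"].append(product)
        # products with no marketing approval at all are not repurposing candidates
    return tables


example = {
    "drug_a": [("hypertension", False)],
    "drug_b": [("rare_disease_x", True)],
    "drug_c": [("asthma", False), ("rare_disease_y", True)],
}
print(build_tables(example))
```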
