Archive for September, 2009

Search similar protein structures with classification, sequence and 3d alignments.

Authors: Lu Z, Zhao Z, Garcia S, Krishnaswamy K, Fu B
We have developed an algorithm and web tool to search for similar protein structures in the PDB (Protein Data Bank). The algorithm combines a series of methods, including protein classification, geometric feature extraction, sequence alignment, and 3D structure alignment. Given a protein structure, the tool can efficiently discover similar structures among the hundreds of thousands of structures stored in the PDB. Our experimental results show that it is more accurate than other well-known protein search systems, including PSI-BLAST, 3D-BLAST, and SSM, in finding proteins that are structurally similar to the query protein, and that its speed is competitive with those systems. The algorithm has been fully implemented and is accessible…
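
Sequence alignment is one stage of the pipeline described above. As a rough illustration of what such a stage computes, here is a minimal global-alignment (Needleman-Wunsch) scorer. It is a generic textbook sketch, not the authors' implementation: the function name `nw_score` and the unit match/mismatch/gap scores are assumptions, and a real protein search would use a substitution matrix such as BLOSUM62 with affine gap penalties.

```python
def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Global alignment score of sequences a and b (Needleman-Wunsch).

    Hypothetical helper with illustrative unit scores; keeps only one
    previous DP row, so memory is O(len(b)).
    """
    # Row 0: aligning an empty prefix of a against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = cur[j - 1] + gap   # gap in a
            cur[j] = max(diag, up, left)
        prev = cur
    return prev[-1]


print(nw_score("ACGT", "ACGT"))  # four matches
```

In a filter-then-refine design, a cheap score like this would typically prune candidates before a more expensive 3D structure alignment is run.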

Protein fold classification with genetic algorithms and feature selection.

Authors: Chen P, Liu C, Burge L, Mahmood M, Southerland W, Gloster C
Protein fold classification is a key step in predicting protein tertiary structures. This paper proposes a novel approach to classifying protein folds based on genetic algorithms and feature selection. Our dataset is divided into a training dataset and a test dataset. Each individual in the genetic algorithm represents a selection function over the feature vectors of the training dataset. A support vector machine is applied to each individual to evaluate its fitness value (fold classification rate). The aim of the genetic algorithm is to search for the individual that produces the highest fold classification rate. The best individual is then applied to the feature vectors of the test dataset a…
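
The search loop described above can be sketched as a genetic algorithm over binary feature masks. This is a minimal illustration under stated assumptions, not the authors' code: tournament selection, one-point crossover, bit-flip mutation and elitism are common choices we picked, and `toy_fitness` is a hypothetical stand-in for the SVM fold classification rate the paper uses as fitness.

```python
import random
from typing import Callable, List, Tuple

Mask = List[int]  # 1 = feature selected, 0 = feature dropped


def tournament(pop: List[Mask], fitness: Callable[[Mask], float], rng: random.Random) -> Mask:
    # Binary tournament: the fitter of two random individuals becomes a parent.
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b


def ga_feature_selection(n_features: int, fitness: Callable[[Mask], float],
                         pop_size: int = 20, generations: int = 30,
                         p_mut: float = 0.05, seed: int = 0) -> Tuple[Mask, List[float]]:
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    history = []  # best fitness after each generation
    for _ in range(generations):
        children = [best[:]]  # elitism: the best individual always survives
        while len(children) < pop_size:
            p1, p2 = tournament(pop, fitness, rng), tournament(pop, fitness, rng)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
        history.append(fitness(best))
    return best, history


def toy_fitness(mask: Mask) -> float:
    # Hypothetical: pretend features 0-3 are informative and the rest are noise.
    # In the paper's setting this would be an SVM's fold classification rate.
    return sum(mask[:4]) - 0.1 * sum(mask[4:])


best_mask, hist = ga_feature_selection(10, toy_fitness)
print(best_mask, hist[-1])
```

Because the best mask is copied into every new generation, the recorded best fitness can never decrease.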

Predicting local quality of a sequence-structure alignment.

We present two complementary techniques, FragQA and PosQA, to accurately predict the local quality of a sequence-structure (i.e. sequence-template) alignment generated by comparative modeling (i.e. homology modeling and threading). FragQA and PosQA predict local quality from two different perspectives. Unlike existing methods, FragQA directly predicts the cRMSD between a continuously aligned fragment determined by an alignment and the corresponding fragment in the native structure, while PosQA predicts the quality of an individual aligned position. Both FragQA and PosQA use SVM (Support Vector Machine) regression to make predictions from similar information extracted from a single given alignment. Experimental results demonstrate that FragQA performs well on predicting local f…
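
FragQA's regression target, the cRMSD of a fragment, is the coordinate RMSD after optimal rigid superposition. The sketch below computes it with the Kabsch algorithm via NumPy's SVD. It illustrates the quantity being predicted, not FragQA itself; the helper names `crmsd` and `fragment_crmsd` and the `(model_index, native_index)` pair format are hypothetical.

```python
import numpy as np


def crmsd(P, Q) -> float:
    """RMSD between two equal-length (N, 3) coordinate sets after optimal
    superposition (Kabsch algorithm)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P - P.mean(axis=0)  # remove translation
    Qc = Q - Q.mean(axis=0)
    H = Pc.T @ Qc  # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Guard against reflections so R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))


def fragment_crmsd(model, native, aligned_pairs) -> float:
    # Hypothetical helper: aligned_pairs lists (model_index, native_index)
    # for one continuously aligned fragment.
    P = [model[i] for i, _ in aligned_pairs]
    Q = [native[j] for _, j in aligned_pairs]
    return crmsd(P, Q)
```

A rotated and translated copy of a fragment should therefore give a cRMSD of essentially zero, while any internal distortion gives a positive value.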

Efficient simulation of ligand-receptor binding processes using the conformation dynamics approach.

Authors: Bujotzek A, Weber M
The understanding of biological ligand-receptor binding processes is relevant to a variety of research topics and assists the rational design of novel drug molecules. Computer simulation can help to advance this understanding but, due to the high dimensionality of the corresponding systems, suffers from severe computational cost. Based on the framework provided by conformation dynamics and transition state theory, a novel heuristic approach to simulating ligand-receptor binding processes is introduced that does not depend on calculating lengthy molecular dynamics trajectories. First, the relevant portion of conformational space is partitioned with meshless methods. Then, each region is sampled separately using hybrid Monte Carlo. Finally, the dynamical bi…
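
The sampling step above uses hybrid (Hamiltonian) Monte Carlo. Below is a minimal one-dimensional sketch: leapfrog integration followed by a Metropolis test on the total energy. The double-well potential `U` is a hypothetical toy landscape, and the optional `region` predicate is our assumption for how sampling could be confined to one partition region (proposals leaving the region are rejected, i.e. the outside is treated as infinite energy). This is not the authors' implementation.

```python
import math
import random
from typing import Callable, List, Optional, Tuple


def U(x: float) -> float:
    # Hypothetical toy potential: a symmetric double well with minima at x = +/-1.
    return (x * x - 1.0) ** 2


def grad_U(x: float) -> float:
    return 4.0 * x * (x * x - 1.0)


def leapfrog(x: float, p: float, eps: float, n_steps: int) -> Tuple[float, float]:
    p -= 0.5 * eps * grad_U(x)  # initial half step for momentum
    for step in range(n_steps):
        x += eps * p  # full step for position
        if step != n_steps - 1:
            p -= eps * grad_U(x)  # full step for momentum
    p -= 0.5 * eps * grad_U(x)  # final half step for momentum
    return x, p


def hmc(x0: float, n_samples: int, eps: float = 0.1, n_steps: int = 20,
        region: Optional[Callable[[float], bool]] = None,
        seed: int = 0) -> Tuple[List[float], float]:
    rng = random.Random(seed)
    x, samples, accepted = x0, [], 0
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)  # fresh momentum from N(0, 1)
        x_new, p_new = leapfrog(x, p, eps, n_steps)
        h_old = U(x) + 0.5 * p * p
        h_new = U(x_new) + 0.5 * p_new * p_new
        inside = region is None or region(x_new)
        # Metropolis acceptance on the change in total energy.
        if inside and rng.random() < math.exp(min(0.0, h_old - h_new)):
            x, accepted = x_new, accepted + 1
        samples.append(x)
    return samples, accepted / n_samples


# Sample only the right-hand well, as if it were one partition region.
right_well, acc_rate = hmc(1.0, 500, region=lambda x: x > 0)
```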

Iterative two-pass algorithm for missing data imputation in SNP arrays.

Authors: Sinoquet C
Although the quality of high-throughput genotyping techniques keeps improving, missing data remain fairly common. Studies have shown that even a low percentage of missing SNPs is detrimental to the reliability of downstream analyses such as SNP-disease association tests. This paper investigates the potential for improving the accuracy of an SNP inference method based on the algorithm previously designed by Roberts and co-workers (NPUTE, 2007). This initial algorithm performs a single scan of an SNP array, inferring missing SNPs in the context of sliding windows. We have first designed a variant, KNNWinOpti, which fully exploits backward and forward dependencies between the overlapping windows and thus restores the genuine dependency of inference upon direction sc…
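
The sliding-window idea behind the single-scan baseline can be sketched as nearest-neighbour imputation: for each missing genotype, compare the sample with every other sample over a window of flanking SNPs and copy the value of the closest one. This is a simplified illustration under our own assumptions (a window centred on each SNP, k = 1, mean absolute difference as the distance, genotypes coded 0/1/2 with `None` for missing); it is neither NPUTE's exact procedure nor the paper's two-pass KNNWinOpti.

```python
import math
from typing import List, Optional

Genotypes = List[List[Optional[int]]]  # rows = samples, columns = SNPs


def impute_knn_window(G: Genotypes, half_window: int = 2) -> Genotypes:
    """Single-pass 1-nearest-neighbour imputation inside a window around each SNP.

    Distances are always computed on the original matrix G, so imputed values
    do not feed back into later decisions.
    """
    n_snps = len(G[0])
    out = [row[:] for row in G]
    for i, row in enumerate(G):
        for j, g in enumerate(row):
            if g is not None:
                continue
            lo, hi = max(0, j - half_window), min(n_snps, j + half_window + 1)
            best_val, best_dist = None, math.inf
            for k, other in enumerate(G):
                if k == i or other[j] is None:
                    continue  # a neighbour must be observed at the target SNP
                diffs = [abs(row[t] - other[t]) for t in range(lo, hi)
                         if t != j and row[t] is not None and other[t] is not None]
                if not diffs:
                    continue
                d = sum(diffs) / len(diffs)
                if d < best_dist:
                    best_dist, best_val = d, other[j]
            out[i][j] = best_val  # stays None if no usable neighbour exists
    return out


G = [[0, 1, 2, None, 1],
     [0, 1, 2, 2, 1],
     [2, 0, 0, 0, 2]]
print(impute_knn_window(G)[0])  # sample 0 borrows SNP 3 from sample 1
```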

A novel coherence measure for discovering scaling biclusters from gene expression data.

Authors: Mukhopadhyay A, Maulik U, Bandyopadhyay S
Biclustering methods are used to identify subsets of genes that are co-regulated in subsets of experimental conditions in microarray gene expression data. Many biclustering algorithms rely on optimizing the mean squared residue to discover biclusters in a gene expression dataset. It has recently been proved that mean squared residue is good only at capturing constant and shifting biclusters; scaling biclusters cannot be detected using this metric. In this article, a new coherence measure called scaling mean squared residue (SMSR) is proposed. It has been proved theoretically that the proposed measure detects scaling patterns effectively and is invariant to local or global scaling of the input dataset. The …
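
The contrast drawn above can be checked numerically. The sketch computes the classical mean squared residue (Cheng and Church) and a scaling variant. For the SMSR we assume the form $\frac{1}{|I||J|}\sum_{i,j} (a_{iJ} a_{Ij} - a_{ij} a_{IJ})^2 / (a_{iJ}^2 a_{Ij}^2)$, where $a_{iJ}$, $a_{Ij}$ and $a_{IJ}$ are row, column and overall means. This is our reading of the measure, so treat it as an assumption. The function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def _means(A: Matrix) -> Tuple[List[float], List[float], float]:
    rows = [sum(r) / len(r) for r in A]         # a_iJ
    cols = [sum(c) / len(c) for c in zip(*A)]   # a_Ij
    total = sum(rows) / len(rows)               # a_IJ (rows have equal length)
    return rows, cols, total


def msr(A: Matrix) -> float:
    # Mean squared residue: zero for constant and shifting (additive) biclusters.
    rows, cols, total = _means(A)
    n, m = len(A), len(A[0])
    return sum((A[i][j] - rows[i] - cols[j] + total) ** 2
               for i in range(n) for j in range(m)) / (n * m)


def smsr(A: Matrix) -> float:
    # Assumed SMSR form: zero for scaling (multiplicative) biclusters.
    # Requires non-zero row and column means.
    rows, cols, total = _means(A)
    n, m = len(A), len(A[0])
    return sum((rows[i] * cols[j] - A[i][j] * total) ** 2 / (rows[i] ** 2 * cols[j] ** 2)
               for i in range(n) for j in range(m)) / (n * m)


shifting = [[r + c for c in (1, 2, 4)] for r in (1, 2, 3)]  # a_ij = r_i + c_j
scaling = [[r * c for c in (1, 2, 4)] for r in (1, 2, 3)]   # a_ij = r_i * c_j
print(msr(shifting), msr(scaling), smsr(scaling))
```

For a scaling pattern $a_{ij} = r_i c_j$, the products $a_{iJ} a_{Ij}$ and $a_{ij} a_{IJ}$ are both equal to $r_i c_j \bar r \bar c$, which is why the assumed SMSR vanishes while the MSR does not.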

Asymptotics of canonical and saturated RNA secondary structures.

Authors: Clote P, Kranakis E, Krizanc D, Salvy B
It is a classical result of Stein and Waterman that the asymptotic number of RNA secondary structures is $1.104366 \cdot n^{-3/2} \cdot 2.618034^{n}$. In this paper, we study combinatorial asymptotics for two special subclasses of RNA secondary structures: canonical and saturated structures. Canonical secondary structures are defined to have no lonely (isolated) base pairs. This class of secondary structures was introduced by Bompfünewerer et al., who noted that the run time of the Vienna RNA Package is substantially reduced when computations are restricted to canonical structures. Here we provide an explanation for the speed-up by proving that the asymptotic number of canonical RNA secondary structures is $2.1614 \cdot n^{-3/2} \cdot 1.96798^{n}$ and that the …
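
The Stein-Waterman asymptotic quoted above can be checked against exact counts. The sketch counts secondary structures on $n$ nucleotides, assuming the usual convention that a base pair $(i, j)$ needs $j - i \ge 2$ (at least one unpaired base in every hairpin loop), via the recurrence "last base unpaired, or paired with some $k$ that splits the problem in two". The canonical and saturated counts studied in the paper need different recurrences and are not reproduced here.

```python
import math
from typing import List


def count_secondary_structures(n_max: int) -> List[int]:
    """s[n] = number of RNA secondary structures on n nucleotides
    (pairs (i, j) require j - i >= 2)."""
    s = [1] * (n_max + 1)  # s[0] = s[1] = s[2] = 1: no pair fits yet
    for n in range(3, n_max + 1):
        total = s[n - 1]  # nucleotide n is unpaired
        for k in range(1, n - 1):  # nucleotide n pairs with k, 1 <= k <= n - 2
            # k - 1 bases lie left of the pair, n - k - 1 bases lie inside it.
            total += s[k - 1] * s[n - k - 1]
        s[n] = total
    return s


def ratio_to_asymptotic(n: int, s_n: int) -> float:
    # Compare in log space: s_n exceeds the float range for large n, and
    # math.log accepts arbitrarily large Python ints.
    log_asym = math.log(1.104366) - 1.5 * math.log(n) + n * math.log(2.618034)
    return math.exp(math.log(s_n) - log_asym)


s = count_secondary_structures(500)
print(s[:8])                           # 1, 1, 1, 2, 4, 8, 17, 37
print(ratio_to_asymptotic(500, s[500]))  # approaches 1 as n grows
```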

A note on the calculation of N-statistics.

Authors: Almudevar A
A class of statistics suitable for testing against equality of multivariate distributions was described by Klebanov and co-workers in 2007. Referred to as N-statistics, their discriminating ability is based on various forms of distance kernels in $\mathbb{R}^d$, the intention being to capture distinct forms of deviation from equality. This makes them particularly suitable for large-scale genomic screening applications, in which such a variety of alternatives can be anticipated. One of these kernels, denoted as $L_4$, introduces weighting by directional densities; hence the evaluation of $L_4$ requires integration on the unit sphere in $\mathbb{R}^d$. In this note we introduce a methodology for the evaluation of integrals related to $L_4$. It is shown that for a class of directional densitie…
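
To make the idea of a distance-kernel statistic concrete, here is a minimal two-sample N-statistic using the plain Euclidean kernel $L(x, y) = \lVert x - y \rVert$, in the form $2\,\mathbb{E}L(X, Y) - \mathbb{E}L(X, X') - \mathbb{E}L(Y, Y')$ with all expectations replaced by sample averages. We assume this standard N-distance form; the directional-density kernel $L_4$ that the note addresses is not reproduced, and the function name is hypothetical.

```python
import math
from typing import Sequence

Point = Sequence[float]


def n_statistic(X: Sequence[Point], Y: Sequence[Point]) -> float:
    """Sample N-statistic with the Euclidean distance kernel.

    Near 0 when the two samples look alike and positive when they differ.
    """
    m, n = len(X), len(Y)
    cross = sum(math.dist(x, y) for x in X for y in Y) / (m * n)
    within_x = sum(math.dist(a, b) for a in X for b in X) / (m * m)
    within_y = sum(math.dist(a, b) for a in Y for b in Y) / (n * n)
    return 2.0 * cross - within_x - within_y


X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
Y = [(x + 3.0, y) for x, y in X]  # same shape, shifted location
print(n_statistic(X, X), n_statistic(X, Y))
```

In practice, a permutation test over the pooled sample would typically be used to turn such a statistic into a p-value.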

A bioinformatics approach to ascertaining the rarity of HLA alleles

A project of the 15th International Histocompatibility Workshop examined the rarity of human leukocyte antigen (HLA) alleles. A section was constructed in the website, www.allelefrequencies.net to contain this data from different sources. A mechanism to search the data was implemented for use by any individual. (Source: Tissue Antigens)

Teaching expression proteomics: From the wet-lab to the laptop

Expression proteomics has become, in recent years, a key genome-wide expression approach in fundamental and applied life sciences. This postgenomic technology aims at the quantitative analysis of all the proteins or protein forms (the so-called proteome) of a given organism in a given environmental and genetic context. Providing effective training in this area is challenging because of its demanding laboratory procedures and laborious computational data analysis. However, effective training of undergraduates and postgraduates in this field is highly recommended to prepare them for the challenges of postgenomic research and of medical, industrial and other economic activities. Since 2004, the area of Biological Sciences at the Department of Chemical and Biological Engineering of Institut…