Archive for June, 2010
A New-Fangled FES-k-Means Clustering Algorithm for Disease Discovery and Visual Analytics
Posted by Waleed Ghalwash in MedWorm.com on June 29th, 2010
This study further evaluates the performance of the Fast, Efficient, and Scalable k-means algorithm (FES-k-means), which was designed to improve the overall efficiency of the original k-means clustering technique. FES-k-means uses a hybrid approach that combines a k-d tree data structure, which speeds up nearest-neighbor queries, with the original k-means algorithm and an adaptation rate proposed by Mashor. The algorithm was tested on two real datasets and one synthetic dataset. It was run twice on each of the three datasets: once on data trained by the MIL-SOM method and once on the untrained data, in order to evaluate its competence. This…
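To make the k-means component of the abstract concrete, here is a minimal sketch of the plain (Lloyd-style) k-means loop in Python. It is not the FES-k-means implementation: the nearest-center search is brute force, where FES-k-means is described as using a k-d tree, and Mashor's adaptation rate is left out. The function name `kmeans` and its parameters are illustrative assumptions.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd-style k-means on a list of equal-length tuples.

    Assumption: brute-force nearest-center search stands in for the
    k-d tree query mentioned in the abstract; no adaptation rate is used.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # pick k distinct points as initial centers
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to the center with the smallest squared distance.
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            clusters[nearest].append(p)
        new_centers = []
        for j, members in enumerate(clusters):
            if members:
                # New center is the coordinate-wise mean of the cluster members.
                new_centers.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # Keep the old center if a cluster ends up empty.
                new_centers.append(centers[j])
        if new_centers == centers:
            break  # assignments are stable, so the loop has converged
        centers = new_centers
    return centers, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(points, k=2)
```

On this toy input the two well-separated groups of three points each end up in their own cluster. The assignment step is the part a k-d tree would accelerate, since it is a nearest-neighbor query over the current centers.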
Uncovering the Complexity of Transcriptomes with RNA-Seq
Posted by Waleed Ghalwash in MedWorm.com on June 29th, 2010
In recent years, the introduction of massively parallel sequencing platforms for Next Generation Sequencing (NGS) protocols, able to sequence hundreds of thousands of DNA fragments simultaneously, has dramatically changed the landscape of genetic studies. RNA-Seq for transcriptome studies, ChIP-Seq for DNA-protein interactions, and CNV-Seq for large genome nucleotide variations are only some of the intriguing new applications supported by these innovative platforms. Among them, RNA-Seq is perhaps the most complex NGS application. Expression levels of specific genes, differential splicing, and allele-specific expression of transcripts can be accurately determined by RNA-Seq experiments to address many biology-related issues. All these attributes are not readily achievable from previously widespread hybr…
Stability of Ranked Gene Lists in Large Microarray Analysis Studies
Posted by Waleed Ghalwash in MedWorm.com on June 29th, 2010
This paper presents an empirical study that aims to explain the relationship between the number of samples and the stability of different gene selection techniques for microarray datasets. Unlike other similar studies, where the number of genes in a ranked gene list is variable, this study uses an alternative approach in which stability is observed at different numbers of samples used for gene selection. Three different metrics of stability, including one that is novel in bioinformatics, were used to estimate the stability of the ranked gene lists. The results demonstrate that the univariate selection methods produce significantly more stable ranked gene lists than the multivariate selection methods used in this study. More specifically, thousands of samples are needed for these multiva…
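The abstract does not name its three stability metrics, so the following Python sketch uses a common stand-in: the mean pairwise Jaccard overlap of the top-k genes across ranked lists obtained from different sample subsets. The function names, the gene labels, and the choice of Jaccard overlap are all assumptions made for illustration.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard overlap of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def mean_pairwise_stability(ranked_lists, top_k):
    """Average Jaccard overlap of the top_k genes over all pairs of lists."""
    if len(ranked_lists) < 2:
        raise ValueError("need at least two ranked lists to compare")
    tops = [ranked[:top_k] for ranked in ranked_lists]
    pairs = list(combinations(tops, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Hypothetical ranked lists, e.g. from three resampled subsets of the data.
ranked_lists = [
    ["g1", "g2", "g3", "g4"],
    ["g2", "g1", "g5", "g3"],
    ["g1", "g2", "g3", "g6"],
]
stability = mean_pairwise_stability(ranked_lists, top_k=3)
```

A value near 1 means the selection method keeps returning the same top genes as the samples change, and a value near 0 means the lists barely overlap. Repeating the calculation at several sample sizes gives the kind of stability-versus-samples curve the study describes.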
Fertility After Cesarean Delivery Among Somali-Born Women Resident in the USA
Posted by Waleed Ghalwash in MedWorm.com on June 29th, 2010
Abstract: We evaluated the reproductive impact of cesarean versus vaginal delivery in Somali immigrants. Data were extracted for 106 Somali women delivering vaginally (64%) or by cesarean section (36%) between 1994 and 2006. Index delivery (vaginal versus cesarean) was compared to the cumulative incidence rate of subsequent deliveries. The incidence rate of a delivery after a vaginal delivery was 3.3% (CI: 0–7.8%), 55.4% (CI: 40.1–66.8%) and 74.4% (CI: 59.0–84.0%) at 1, 2 and 3 years. Cesarean delivery led to a second delivery incidence rate of 2.9% (95% CI: 0–8.2%), 25.9% (95% CI: 9.8–39.2%) and 58.1% (95% CI: 27.0–72.2%) at 1, 2 and 3 years. Somali women delivering vaginally were 1.56 times (95% CI: 0.94–2.57; P = 0.084) more likely to have a subsequ…
Metamotifs – a generative model for building families of nucleotide position weight matrices
Posted by Waleed Ghalwash in MedWorm.com on June 29th, 2010
Conclusions:
We show that metamotifs can be used as PWM priors in the NestedMICA motif inference algorithm to dramatically increase the sensitivity of motif inference. Metamotifs were also successfully applied to a motif classification problem in which sequence motif features were used to predict the family of protein DNA-binding domains that would interact with a given motif. The metamotif-based classifier is shown to compare favourably to previous related methods. The metamotif has great potential for further use in machine learning tasks, especially those related to de novo computational sequence motif inference. The metamotif methods presented have been incorporated into the NestedMICA suite. (Source: BMC Bioinformatics – Latest articles)
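For readers unfamiliar with position weight matrices (PWMs), the objects that metamotifs model, here is a small Python sketch of how a PWM scores a DNA sequence with a log-odds sum. It does not implement metamotifs or NestedMICA. The matrix values, the uniform background of 0.25, and the function names are assumptions chosen for illustration.

```python
import math

def log_odds_score(pwm, seq, background=0.25):
    """Sum of log2(p(base at position) / background) over a window."""
    if len(seq) != len(pwm):
        raise ValueError("sequence window and PWM must have the same length")
    return sum(math.log2(col[base] / background) for col, base in zip(pwm, seq))

def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    return max(
        (log_odds_score(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )

# Hypothetical 3-column PWM that prefers the word "ACG".
pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
score, start = best_hit(pwm, "TTACGTT")
```

Each column of the PWM is a probability distribution over the four nucleotides. A prior over PWMs, which is the role the abstract gives to metamotifs, would in turn describe which column distributions are plausible for a given motif family.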
One stop shop
Posted by Waleed Ghalwash in MedWorm.com on June 24th, 2010
Nature Reviews Molecular Cell Biology 11, 463 (2010). doi:10.1038/nrm2933
Author: Rachel David
http://www.ebi.ac.uk/ena/index.html
Last month saw the launch of the new European Nucleotide Archive (ENA), which is developed and maintained by the European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL–EBI). Now Europe’s largest database of DNA and RNA sequence information, the ENA is a ‘one stop shop’ (Source: Nature Reviews Molecular Cell Biology)
TabSQL: a MySQL tool to facilitate mapping user data to public databases
Posted by Waleed Ghalwash in MedWorm.com on June 24th, 2010
With advances in high-throughput genomics and proteomics, it is challenging for biologists to deal with large data files and to map their data to annotations in public databases. We developed TabSQL, a MySQL-based application tool, for viewing, filtering and querying data files with large numbers of rows. TabSQL provides functions for downloading and installing table files from public databases including the Gene Ontology database (GO), the Ensembl databases, and genome databases from the UCSC genome bioinformatics site. Any other database that provides tab-delimited flat files can also be imported. The downloaded gene annotation tables can be queried together with users’ data in TabSQL using either a graphical interface or the command line. TabSQL allows queries across the user’s data and publi…
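The kind of query the abstract describes, joining a user's table against a downloaded annotation table, can be sketched as follows. This is not TabSQL and does not use its interface: it uses Python's built-in `sqlite3` module with an in-memory database in place of MySQL, and the table names, column names, and values are hypothetical placeholders.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL backend (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE user_data (gene_id TEXT, fold_change REAL);
    CREATE TABLE annotation (gene_id TEXT, go_term TEXT);
    """
)

# Hypothetical user measurements and annotation rows.
conn.executemany(
    "INSERT INTO user_data VALUES (?, ?)",
    [("geneA", 2.5), ("geneB", 0.4), ("geneC", 3.1)],
)
conn.executemany(
    "INSERT INTO annotation VALUES (?, ?)",
    [("geneA", "GO:EXAMPLE1"), ("geneC", "GO:EXAMPLE2")],
)

# Join the user's data to the annotation table and filter on a threshold.
rows = conn.execute(
    """
    SELECT u.gene_id, u.fold_change, a.go_term
    FROM user_data AS u
    JOIN annotation AS a ON a.gene_id = u.gene_id
    WHERE u.fold_change > 2.0
    ORDER BY u.gene_id
    """
).fetchall()
conn.close()
```

The inner join keeps only genes that appear in both tables, so unannotated rows drop out of the result. A `LEFT JOIN` would keep them with an empty annotation instead, which may be preferable when checking how much of a dataset maps to the public database.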
TagCleaner: Identification and removal of tag sequences from genomic and metagenomic datasets
Posted by Waleed Ghalwash in MedWorm.com on June 24th, 2010
Conclusions:
TagCleaner is a publicly available web application that is able to automatically detect and efficiently remove tag sequences from metagenomic datasets. It is easily configurable and provides a user-friendly interface. The interactive web interface facilitates export functionality for subsequent data processing, and is available at http://edwards.sdsu.edu/tagcleaner. (Source: BMC Bioinformatics – Latest articles)
A high-throughput pipeline for the design of real-time PCR signatures
Posted by Waleed Ghalwash in MedWorm.com on June 24th, 2010
Conclusions:
TOPSI is a computationally efficient, fully integrated tool for high-throughput design of PCR signatures common to multiple bacterial genomes. TOPSI is freely available for download at http://www.bhsai.org/downloads/topsi.tar.gz. (Source: BMC Bioinformatics – Latest articles)
Locating Multiple Interacting Quantitative Trait Loci with the Zero-Inflated Generalized Poisson Regression
Posted by Waleed Ghalwash in MedWorm.com on June 24th, 2010
We consider the problem of locating multiple interacting quantitative trait loci (QTL) influencing traits measured in counts. In many applications the distribution of the count variable has a spike at zero. Zero-inflated generalized Poisson regression (ZIGPR) allows for an additional probability mass at zero and hence an improvement in the detection of significant loci. Classical model selection criteria often overestimate the number of QTL. Therefore, modified versions of the Bayesian Information Criterion (mBIC and EBIC) were successfully used for QTL mapping. We apply these criteria based on ZIGPR as well as simpler models. An extensive simulation study shows their good power for detecting QTL while controlling the false discovery rate. We illustrate how the inability of the Poisson distributi…
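The "additional probability mass at zero" can be illustrated with the simpler zero-inflated Poisson distribution. The Python sketch below is a simplification and not the ZIGPR model from the abstract: it omits the extra dispersion parameter of the generalized Poisson and any regression on covariates. The parameter names `lam` (Poisson mean) and `omega` (zero-inflation probability) are assumptions.

```python
import math

def zip_pmf(y, lam, omega):
    """Zero-inflated Poisson probability of observing count y.

    With probability omega the count is a structural zero; otherwise it
    is drawn from a Poisson distribution with mean lam.
    """
    if y < 0:
        return 0.0
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    if y == 0:
        # Zeros come from both the inflation component and the Poisson part.
        return omega + (1.0 - omega) * poisson
    return (1.0 - omega) * poisson

# Compare the probability of a zero with and without inflation.
p_zero_plain = zip_pmf(0, lam=2.0, omega=0.0)
p_zero_inflated = zip_pmf(0, lam=2.0, omega=0.3)
```

With `omega=0.0` the function reduces to the ordinary Poisson distribution, so the difference between the two values is the excess of zeros that a plain Poisson model would fail to account for.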
