<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Bioinformatics &#187; Bioinformatics Web</title>
	<atom:link href="http://bioinformatics.me/category/bioinformatics-web/feed/" rel="self" type="application/rss+xml" />
	<link>http://bioinformatics.me</link>
	<description>BioData make sense!</description>
	<lastBuildDate>Thu, 18 Aug 2011 12:47:42 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Statistical Aspects of Datamining-Video Course-Watch Online</title>
		<link>http://bioinformatics.me/statistical-aspects-of-datamining-video-course-watch-online/</link>
		<comments>http://bioinformatics.me/statistical-aspects-of-datamining-video-course-watch-online/#comments</comments>
		<pubDate>Sat, 29 Aug 2009 18:40:21 +0000</pubDate>
		<dc:creator>Waleed Ghalwash</dc:creator>
				<category><![CDATA[Bioinformatics Web]]></category>

		<guid isPermaLink="false">http://bioinformatics.me/statistical-aspects-of-datamining-video-course-watch-online/</guid>
		<description><![CDATA[
Data mining is used to discover patterns and relationships in data, with an emphasis on large observational databases. Despite the obvious connections between data mining and statistical data analysis, most of the methodologies used in data mining have so far originated in fields other than statistics. This video course gives an introduction to data mining, [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.bioinformaticsweb.org/wp-content/uploads/2009/08/data_mining_video_tutorial.gif" alt="data_mining_video_tutorial" width="262" height="198" class="aligncenter size-full wp-image-579" /></p>
<p>Data mining is used to discover patterns and relationships in data, with an emphasis on large observational databases. Despite the obvious connections between data mining and statistical data analysis, most of the methodologies used in data mining have so far originated in fields other than statistics. This video course gives an introduction to data mining and its statistical aspects, with training in the R statistics software.</p>
<p>Course: Statistical Aspects of Data Mining<br />
Course teacher: Professor David Mease<br />
URL: http://www.stats202.com/original_index.html<br />
Courtesy: Stanford University, Google Tech Talks</p>
]]></content:encoded>
			<wfw:commentRss>http://bioinformatics.me/statistical-aspects-of-datamining-video-course-watch-online/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to make Phylogram-Video Tutorial</title>
		<link>http://bioinformatics.me/how-to-make-phylogram-video-tutorial/</link>
		<comments>http://bioinformatics.me/how-to-make-phylogram-video-tutorial/#comments</comments>
		<pubDate>Tue, 18 Aug 2009 13:19:53 +0000</pubDate>
		<dc:creator>Waleed Ghalwash</dc:creator>
				<category><![CDATA[Bioinformatics Web]]></category>

		<guid isPermaLink="false">http://bioinformatics.me/how-to-make-phylogram-video-tutorial/</guid>
		<description><![CDATA[
A phylogenetic tree or phylogram, sometimes called the &#8216;Tree of Life&#8217;, shows the evolutionary relationships among various biological species or other entities that are believed to have a common ancestor. Each node with descendants represents the most recent common ancestor of those descendants, with edge lengths in the tree corresponding to time estimates. Each node [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.bioinformaticsweb.org/wp-content/uploads/2009/08/phylogenetics_introduction_video1-300x202.png" alt="phylogenetics_introduction_video1" width="300" height="202" class="aligncenter size-medium wp-image-575" /><br />
A phylogenetic tree or phylogram, sometimes called the &#8216;Tree of Life&#8217;, shows the evolutionary relationships among various biological species or other entities that are believed to have a common ancestor. Each node with descendants represents the most recent common ancestor of those descendants, with edge lengths in the tree corresponding to time estimates. Each node in a phylogenetic tree is called a taxonomic unit.</p>
<p>This video tutorial explains how to make your own phylogram. The steps include retrieving a HomoloGene entry from the NCBI website and using EBI ClustalW to create the phylogram.</p>
]]></content:encoded>
			<wfw:commentRss>http://bioinformatics.me/how-to-make-phylogram-video-tutorial/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bioinformatics and drug discovery</title>
		<link>http://bioinformatics.me/bioinformatics-and-drug-discovery-2/</link>
		<comments>http://bioinformatics.me/bioinformatics-and-drug-discovery-2/#comments</comments>
		<pubDate>Sun, 16 Aug 2009 16:25:02 +0000</pubDate>
		<dc:creator>Waleed Ghalwash</dc:creator>
				<category><![CDATA[Bioinformatics Web]]></category>

		<guid isPermaLink="false">http://bioinformatics.me/bioinformatics-and-drug-discovery-2/</guid>
		<description><![CDATA[
In recent years, we have seen an explosion in the amount of biological information that is available. Various databases are doubling in size every 15 months and we now have the complete genome sequences of more than 100 organisms. It appears that the ability to generate vast quantities of data has surpassed the ability to [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.bioinformaticsweb.org/wp-content/uploads/2009/07/drug_discovery_bionformatics-300x200.jpg" alt="drug_discovery_bionformatics" width="500" height="300" class="aligncenter size-medium wp-image-380" /></p>
<p>In recent years, we have seen an explosion in the amount of biological information that is available. Various databases are doubling in size every 15 months and we now have the complete genome sequences of more than 100 organisms. It appears that the ability to generate vast quantities of data has surpassed the ability to use this data meaningfully. The pharmaceutical industry has embraced genomics as a source of drug targets. It also recognises that the field of bioinformatics is crucial for validating these potential drug targets and for determining which ones are the most suitable for entering the drug development pipeline.</p>
<p align="justify">Recently, there has been a change in the way that medicines are being developed due to our increased understanding of molecular biology. In the past, new synthetic organic molecules were tested in animals or in whole organ preparations. This has been replaced with a molecular target approach in which in-vitro screening of compounds against purified, recombinant proteins or genetically modified cell lines is carried out with a high throughput. This change has come about as a consequence of better and ever improving knowledge of the molecular basis of disease.</p>
<p align="justify">All marketed drugs today target only about 500 gene products. The elucidation of the human genome, which has an estimated 30,000 to 40,000 genes, presents immense new opportunities for drug discovery and simultaneously creates a potential bottleneck regarding the choice of targets to support the drug discovery pipeline. The major advances in genomics and sequencing mean that finding an attractive target is no longer a problem; finding the targets that are most likely to succeed has become the challenge. The focus of bioinformatics in the drug discovery process has therefore shifted from target identification to target validation.</p>
<p align="justify">Many factors need to be taken into account concerning a candidate target, drawn from a multitude of heterogeneous resources. The types of information that one needs to gather about potential targets include nucleotide and protein sequence information, homologues, mapping information, function prediction, pathway information, disease associations, variants, structural information, gene and protein expression data and species/taxonomic distribution, among others. Different bioinformatics tools can be used to gather this information. Accumulating this information into databases about potential targets means that pharmaceutical companies can save themselves much of the time, effort and expense that would otherwise go into bench work on targets that will ultimately fail. The information that is gathered helps to characterise the different targets into families and subfamilies. It also classifies the behaviour of the different molecules in a biochemical and cellular context. Decisions about which families provide the best potential targets are guided by a number of criteria. It is important that the potential target has a suitable structure for interacting with drug molecules. Structural genomics helps to prioritise the families in terms of their 3D structures.</p>
<p align="justify">Sometimes we want to develop broad spectrum drugs that are effective against a wide range of pathogenic species while at other times we want to develop narrow spectrum drugs that are highly specific to a particular organism. Comparative genomics helps to find protein families that are widely taxonomically dispersed and those that are unique to a particular organism.</p>
<p align="justify">For example, when we want to develop a broad spectrum antibiotic, we are looking for targets that are present in a large number of bacteria yet have no close homologues in humans. This means that the antibiotic will be effective against many bacteria, killing them while causing no harm to the human host. In order to determine the role our potential drug target plays in a particular disease mechanism, we use DNA and protein chips. These chips can measure the amount of transcript or protein expressed by a cell at different times or in different states (healthy versus diseased).</p>
<p align="justify">Clustering algorithms are used to organise this expression data into different biologically relevant clusters. We can then compare the expression profiles from the diseased and healthy cells to help us understand the role our gene or protein plays in a disease process. All of these computational tools can help to compose a detailed picture about a protein family, its involvement in a disease process and its potential as a possible drug target.</p>
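<p align="justify">To make the clustering step concrete, here is a minimal sketch in Python. It assumes a toy table of expression levels measured in a healthy and a diseased state, and it uses plain k-means with two hand-picked starting centroids. The gene names, the numbers and the choice of k-means are all assumptions made for illustration; the text above does not say which clustering algorithm is used in practice.</p>
<pre>
```python
# Toy expression matrix: each gene has (healthy, diseased) expression levels.
# All gene names and numbers are invented for illustration.
profiles = {
    "geneA": (1.0, 1.1),
    "geneB": (0.9, 1.0),
    "geneC": (1.0, 4.2),
    "geneD": (1.2, 3.9),
    "geneE": (1.1, 0.9),
}


def squared_distance(p, q):
    """Squared Euclidean distance between two expression profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a list of profiles."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def kmeans(data, centroids, iterations=10):
    """Plain k-means: assign each gene to its nearest centroid, then update."""
    clusters = {}
    for _ in range(iterations):
        clusters = {i: [] for i in range(len(centroids))}
        for name, point in data.items():
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(point, centroids[i]))
            clusters[nearest].append(name)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            mean_point([data[name] for name in clusters[i]]) if clusters[i] else centroids[i]
            for i in range(len(centroids))
        ]
    return clusters


# Seed the two centroids with two of the profiles (a simple deterministic choice).
clusters = kmeans(profiles, [profiles["geneA"], profiles["geneC"]])
for index, members in clusters.items():
    print(index, sorted(members))
```
</pre>
<p align="justify">In this toy data set, the genes whose expression rises in the diseased state fall into one cluster and the unchanged genes into the other. Real expression studies involve thousands of genes and many conditions, and usually rely on an established statistics package rather than hand-written code.</p>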
<p align="justify">Following on from the genomics explosion and the huge increase in the number of potential drug targets, there has been a move from the classical linear approach of drug discovery to a non-linear, high throughput approach. The field of bioinformatics has become a major part of the drug discovery pipeline, playing a key role in validating drug targets. By integrating data from many inter-related yet heterogeneous resources, bioinformatics can help in our understanding of complex biological processes and help improve drug discovery.</p>
<p align="justify"><strong><em>Source: </em></strong><em> 2can </em></p>
<p align="justify"><strong>Drug Design based on Bioinformatics Tools </strong></p>
<p align="justify">The process of designing a new drug using bioinformatics tools has opened a new area of research. Computational techniques assist in searching for drug targets and in designing drugs in silico, although the process still takes a long time and a lot of money. In order to design a new drug, one needs to follow the path below.</p>
<div>
<ul>
<li><strong>Identify Target Disease: </strong> One needs to know all about the disease and existing or traditional remedies. It is also important to look at very similar afflictions and their known treatments.<br />
Target identification alone is not sufficient to achieve a successful treatment of a disease. A real drug needs to be developed. This drug must influence the target protein in such a way that it does not interfere with normal metabolism. One way to achieve this is to block the activity of the protein with a small molecule. Bioinformatics methods have been developed to virtually screen the target for compounds that bind and inhibit the protein. Another possibility is to find other proteins that regulate the activity of the target by binding and forming a complex.</li>
<li><strong>Study Interesting Compounds: </strong> One needs to identify and study the lead compounds that have some activity against a disease. These may be only marginally useful and may have severe side effects. These compounds provide a starting point for refinement of the chemical structures.</li>
<li><strong>Detect the Molecular Bases for Disease: </strong>If it is known that a drug must bind to a particular spot on a particular protein or nucleotide then a drug can be tailor made to bind at that site. This is often modeled computationally using any of several different techniques. Traditionally, the primary way of determining what compounds would be tested computationally was provided by the researchers&#8217; understanding of molecular interactions. A second method is the brute force testing of large numbers of compounds from a database of available structures.</li>
<li><strong>Rational drug design techniques: </strong> These techniques attempt to reproduce the researchers&#8217; understanding of how to choose likely compounds, built into a software package that is capable of modeling a very large number of compounds in an automated way. Many different algorithms have been used for this type of testing, many of which were adapted from artificial intelligence applications. The complexity of biological systems makes it very difficult to determine the structures of large biomolecules. Ideally, an experimentally determined (X-ray or NMR) structure is desired, but biomolecules are very difficult to crystallize.</li>
<li><strong>Refinement of compounds: </strong> Once a number of lead compounds have been found, computational and laboratory techniques have proven very successful in refining the molecular structures to give greater drug activity and fewer side effects. This is done both in the laboratory and computationally by examining the molecular structures to determine which aspects are responsible for both the drug activity and the side effects.</li>
<li><strong>Quantitative Structure Activity Relationships (QSAR): </strong>This computational technique can be used to identify which functional groups in a compound matter, in order to refine the drug. QSAR consists of computing every possible number that can describe a molecule and then doing an enormous curve fit to find out which aspects of the molecule correlate well with the drug activity or side effect severity. This information can then be used to suggest new chemical modifications for synthesis and testing.</li>
<li><strong>Solubility of Molecule: </strong> One needs to check whether the molecule is water soluble or readily soluble in fatty tissue, since this affects which part of the body it becomes concentrated in. The ability to get a drug to the correct part of the body is an important factor in its potency. Ideally there is a continual exchange of information between the researchers doing QSAR studies, synthesis and testing. These techniques are frequently used and often very successful since they do not rely on knowing the biological basis of the disease, which can be very difficult to determine.</li>
<li><strong>Drug Testing: </strong>Once a drug has been shown to be effective by an initial assay technique, much more testing must be done before it can be given to human patients. Animal testing is the primary type of testing at this stage. Eventually, the compounds, which are deemed suitable at this stage, are sent on to clinical trials. In the clinical trials, additional side effects may be found and human dosages are determined.<br />
<strong><em>Source: </em></strong><em>By Dr. G. P. S. Raghava, Institute of Microbial Technology, Sector 39-A, Chandigarh, India.</em></li>
</ul>
</div>
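<p align="justify">The QSAR item in the list above describes a curve fit between numbers that describe a molecule and its measured activity. As a rough sketch of the simplest possible case, the Python below fits a straight line between a single hypothetical descriptor and a hypothetical activity value, then reports how well the line explains the data. The descriptor, the activity values and the restriction to one descriptor are assumptions for illustration; a real QSAR study fits many descriptors at once.</p>
<pre>
```python
# Hypothetical data: one descriptor value per compound (for example a
# lipophilicity score) and a measured activity. All numbers are invented.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
activity = [2.1, 3.9, 6.2, 7.8, 10.0]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Fraction of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


slope, intercept = fit_line(descriptor, activity)
fit_quality = r_squared(descriptor, activity, slope, intercept)
print(round(slope, 2), round(intercept, 2), round(fit_quality, 3))
```
</pre>
<p align="justify">A descriptor with a fit quality close to 1 correlates well with activity and is a candidate for guiding the next round of chemical modifications; a descriptor with a value near 0 tells us little.</p>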
]]></content:encoded>
			<wfw:commentRss>http://bioinformatics.me/bioinformatics-and-drug-discovery-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bioinformatics-Virtual Drug Development</title>
		<link>http://bioinformatics.me/bioinformatics-virtual-drug-development-2/</link>
		<comments>http://bioinformatics.me/bioinformatics-virtual-drug-development-2/#comments</comments>
		<pubDate>Sun, 16 Aug 2009 16:25:02 +0000</pubDate>
		<dc:creator>Waleed Ghalwash</dc:creator>
				<category><![CDATA[Bioinformatics Web]]></category>

		<guid isPermaLink="false">http://bioinformatics.me/bioinformatics-virtual-drug-development-2/</guid>
		<description><![CDATA[
These days, computers are an integral part of genomics-based drug discovery, helping researchers find drug targets by comparing databases of genomic information with annotations about functional information, by analyzing the data that comes in from various wetlab experiments, and by simply keeping track of the huge amounts of biological data being unearthed in life sciences [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.bioinformaticsweb.org/wp-content/uploads/2009/07/virutal_drug_development-300x255.jpg" alt="virutal_drug_development" width="500" height="255" class="aligncenter size-medium wp-image-384" /></p>
<p>These days, computers are an integral part of genomics-based drug discovery, helping researchers find drug targets by comparing databases of genomic information with annotations about functional information, by analyzing the data that comes in from various wetlab experiments, and by simply keeping track of the huge amounts of biological data being unearthed in life sciences research. This is the role of bioinformatics, a field that has exploded in importance over the last few years as companies have begun to realize they are drowning in raw data.</p>
<p align="justify">But now the uses of computers for other parts of the discovery and development process are coming to the fore. Theoretically, researchers could now test virtual drug compounds against virtual protein targets, study the virtual pharmacokinetics of their optimized virtual lead in what amounts to virtual animals, study its effects on virtual organs, design a virtual clinical trial to test assumptions and variances, and even answer some regulatory questions through simulation. Somewhere in that process, a chemist has to actually mix up a compound and conduct some experiment but buckets of silicon are being added to the discovery and development process every day, with the hope that the wet lab will one day become as dry as a sand box.</p>
<p align="justify">Twentieth century biology has been about cataloging the elements of life. Every day, we have a little more of the recipe of life, stretching before us as an almost endless line of As, Gs, Cs, and Ts&#8211;forming general sequences common to most living organisms, gene sequences common to most humans, polymorphisms peculiar to small subpopulations. But this static information amounts to little more than a parts catalog, a shopping list for a living organism. A vital thrust of 21st century biology will be the animation of these static parts.</p>
<p align="justify">After all, a long string of base pair letters is like, well, a long string of letters. It makes for a less interesting read than a telephone directory, and while it tells you how to dial up all sorts of important proteins, most sequences alone tell you little more about a person than does their phone number. We cannot yet predict protein folding from amino acid sequence, nor can we accurately predict protein function from protein shape. We can, of course, correlate certain polymorphisms with likely disease outcomes, and we learn more every day. But the more we learn about the importance of these new variables, the more we have to take into consideration when developing clinical strategies, undertaking drug development, and designing clinical trials. And gene sequence, even when linked to functional information, will only be one of many variables to consider in optimally designing therapeutic interventions and treating disease.</p>
<p align="justify">One way of animating our growing store of static information is through computer simulation. It is an area that is beginning to emerge slowly in the life sciences, with only a handful of academic and commercial players active in the area. But for a fledgling discipline, there is a great variety in the scope of work being undertaken. While academic labs try to create accurate simulations of red blood cells and simple bacteria, the private companies are taking on bolder projects&#8211;simulating human organs and even human diseases in their entirety.</p>
]]></content:encoded>
			<wfw:commentRss>http://bioinformatics.me/bioinformatics-virtual-drug-development-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Genome Projects</title>
		<link>http://bioinformatics.me/genome-projects-2/</link>
		<comments>http://bioinformatics.me/genome-projects-2/#comments</comments>
		<pubDate>Sun, 16 Aug 2009 16:25:02 +0000</pubDate>
		<dc:creator>Waleed Ghalwash</dc:creator>
				<category><![CDATA[Bioinformatics Web]]></category>

		<guid isPermaLink="false">http://bioinformatics.me/genome-projects-2/</guid>
		<description><![CDATA[
Genome Projects
Genome projects are scientific endeavours that aim to map the genome of a living being or of a species (be it an animal, a plant, a fungus, a bacterium, an archaean, a protist or a virus), that is, the complete set of genes carried by this living being or virus. The Human Genome Project [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.bioinformaticsweb.org/wp-content/uploads/2009/07/genomeprojects_bioinformatics.jpg" alt="genomeprojects_bioinformatics" width="400" height="373" class="aligncenter size-full wp-image-387" /><br />
<strong>Genome Projects</strong></p>
<p align="justify">Genome projects are scientific endeavours that aim to map the genome of a living being or of a species (be it an animal, a plant, a fungus, a bacterium, an archaean, a protist or a virus), that is, the complete set of genes carried by this living being or virus. The Human Genome Project was such a project. Some have argued that the era of genomics is one of the more fundamental advances in human history.</p>
<p align="justify"><strong>Genome sequencing</strong></p>
<p align="justify">There are essentially two ways to sequence a genome. The BAC-to-BAC method, the first to be employed in human genome studies, is slow but sure. Also referred to as the map-based method, it evolved from procedures developed by a number of researchers during the late 1980s and 1990s, and it continues to develop and change.</p>
<p align="justify">The other technique, known as whole genome shotgun sequencing, brings speed into the picture, enabling researchers to do the job in months to a year. The whole genome shotgun approach was championed by J. Craig Venter in the mid-1990s.</p>
<p align="justify"><strong>1.BAC to BAC Sequencing</strong></p>
<p align="justify">The BAC to BAC approach first creates a crude physical map of the whole genome before sequencing the DNA. Constructing a map requires cutting the chromosomes into large pieces and figuring out the order of these big chunks of DNA before taking a closer look and sequencing all the fragments.</p>
<p align="justify">1. Several copies of the genome are randomly cut into large pieces, each on the order of 100,000 base pairs (bp) long.</p>
<p align="justify">2. Each of these fragments is inserted into a BAC, a bacterial artificial chromosome. A BAC is a man-made piece of DNA that can replicate inside a bacterial cell. The whole collection of BACs containing the entire human genome is called a BAC library, because each BAC is like a book in a library that can be accessed and copied.</p>
<p align="justify">3.These pieces are fingerprinted to give each piece a unique identification tag that determines the order of the fragments. Fingerprinting involves cutting each BAC fragment with a single enzyme and finding common sequence landmarks in overlapping fragments that determine the location of each BAC along the chromosome. Then overlapping BACs with markers every 100,000 bp form a map of each chromosome.</p>
<p align="justify">4. Each BAC is then broken randomly into 1,500 bp pieces and placed in another artificial piece of DNA called M13. This collection is known as an M13 library.</p>
<p align="justify">5. All the M13 libraries are sequenced. 500 bp from one end of each fragment are sequenced, generating millions of sequences. These sequences are fed into a computer program called PHRAP that looks for common sequences that join two fragments together.</p>
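<p align="justify">The fingerprinting step above can be illustrated with a small Python sketch. It assumes each BAC is represented by the set of fragment sizes produced when it is cut with one enzyme, and it orders the clones by walking from each clone to the one that shares the most fragment sizes with it. The clone names, the fragment sizes and the greedy walk are all hypothetical simplifications; real fingerprint mapping has to allow for measurement error in the fragment sizes and uses dedicated software.</p>
<pre>
```python
# Hypothetical fingerprints: the set of restriction fragment sizes (in bp)
# observed when each BAC clone is cut with the same enzyme.
fingerprints = {
    "BAC1": {1200, 3400, 5100, 7800},
    "BAC2": {5100, 7800, 2300, 9100},
    "BAC3": {2300, 9100, 4400, 6600},
}


def shared_fragments(first, second):
    """Number of fragment sizes two clones have in common."""
    return len(fingerprints[first].intersection(fingerprints[second]))


def order_clones(start):
    """Greedy walk: from each clone, step to the unvisited clone sharing most fragments."""
    order = [start]
    remaining = set(fingerprints) - {start}
    while remaining:
        current = order[-1]
        # Sort candidates so that ties are broken by name, deterministically.
        nxt = max(sorted(remaining), key=lambda name: shared_fragments(current, name))
        order.append(nxt)
        remaining.remove(nxt)
    return order


tiling_path = order_clones("BAC1")
print(tiling_path)
```
</pre>
<p align="justify">Clones that share fragment sizes are assumed to overlap on the chromosome, so the resulting order is a crude stand-in for the physical map described above.</p>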
<p align="justify"><strong>2.Whole Genome Shotgun Sequencing</strong></p>
<p align="justify">The shotgun sequencing method goes straight to the job of decoding, bypassing the need for a physical map. Therefore, it is much faster.</p>
<p align="justify">1.Multiple copies of the genome are randomly shredded into pieces that are 2,000 base pairs (bp) long by squeezing the DNA through a pressurized syringe. This is done a second time to generate pieces that are 10,000 bp long.</p>
<p align="justify">2.Each 2,000 and 10,000 bp fragment is inserted into a plasmid, which is a piece of DNA that can replicate in bacteria. The two collections of plasmids containing 2,000 and 10,000 bp chunks of human DNA are known as plasmid libraries.</p>
<p align="justify">3. Both the 2,000 and the 10,000 bp plasmid libraries are sequenced. 500 bp from each end of each fragment are decoded, generating millions of sequences. Sequencing both ends of each insert is critical for assembling the entire chromosome.</p>
<p align="justify">Computer algorithms assemble the millions of sequenced fragments into a continuous stretch resembling each chromosome.</p>
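<p align="justify">As a rough idea of what such an assembly algorithm does, the Python sketch below repeatedly merges the two reads with the longest overlap until no overlaps remain. This greedy strategy, the minimum overlap of three bases and the tiny reads are assumptions chosen to keep the example short; production assemblers also use the paired-end information mentioned in step 3, tolerate sequencing errors and handle repeats.</p>
<pre>
```python
def overlap(left, right, min_length=3):
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_length - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_length=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(reads):
            for j, right in enumerate(reads):
                if i == j:
                    continue
                size = overlap(left, right, min_length)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # no remaining overlaps; the contigs stay separate
        merged = reads[best_i] + reads[best_j][best_size:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Invented reads drawn from the made-up sequence "ATGCGTACGTTAGC".
reads = ["ATGCGTAC", "GTACGTTA", "GTTAGC"]
contigs = greedy_assemble(reads)
print(contigs)
```
</pre>
<p align="justify">With these three reads the sketch rebuilds the original made-up sequence as a single contig. If two groups of reads shared no overlap, they would be returned as separate contigs.</p>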
<p>For comprehensive access to information on complete and ongoing genome projects, as well as metagenomes and metadata, <a href="http://genomesonline.org/index2.htm" target="_blank">visit GOLD</a>.
</p>
]]></content:encoded>
			<wfw:commentRss>http://bioinformatics.me/genome-projects-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Proteomics-Introduction</title>
		<link>http://bioinformatics.me/proteomics-introduction-2/</link>
		<comments>http://bioinformatics.me/proteomics-introduction-2/#comments</comments>
		<pubDate>Sun, 16 Aug 2009 16:25:02 +0000</pubDate>
		<dc:creator>Waleed Ghalwash</dc:creator>
				<category><![CDATA[Bioinformatics Web]]></category>

		<guid isPermaLink="false">http://bioinformatics.me/proteomics-introduction-2/</guid>
		<description><![CDATA[ 
Proteomics-Introduction
Definition: &#8220;The analysis of complete complements of proteins. Proteomics includes not only the identification and quantification of proteins, but also the determination of their localization, modifications, interactions, activities, and, ultimately, their function. Initially encompassing just two-dimensional (2D) gel electrophoresis for protein separation and identification, proteomics now refers to any procedure that characterizes large [...]]]></description>
			<content:encoded><![CDATA[<p> <img src="http://www.bioinformaticsweb.org/wp-content/uploads/2009/07/proteomics_introduction.jpg" alt="Proteomics-Introduction" width="368" height="350" class="aligncenter size-full wp-image-391" /><br />
Proteomics-Introduction<br />
Definition: &#8220;The analysis of complete complements of proteins. Proteomics includes not only the identification and quantification of proteins, but also the determination of their localization, modifications, interactions, activities, and, ultimately, their function. Initially encompassing just two-dimensional (2D) gel electrophoresis for protein separation and identification, proteomics now refers to any procedure that characterizes large sets of proteins. The explosive growth of this field is driven by multiple forces &#8211; genomics and its revelation of more and more new proteins; powerful protein technologies, such as newly developed mass spectrometry approaches, global [yeast] two-hybrid techniques, and spin-offs from DNA arrays; and innovative computational tools and methods to process, analyze, and interpret prodigious amounts of data.&#8221;</p>
<p>The theme of molecular biology research, in the past, has been oriented around the gene rather than the protein. This is not to say that researchers have neglected to study proteins, but rather that the approaches and techniques most commonly used have looked primarily at the nucleic acids and then later at the protein(s) implicated.</p>
<p>The main reason for this has been that the technologies available, and the inherent characteristics of nucleic acids, have made the genes the low hanging fruit. This situation has changed recently and continues to change as larger scale, higher throughput methods are developed for both nucleic acids and proteins. The majority of processes that take place in a cell are not performed by the genes themselves, but rather by the proteins that they code for.</p>
<p>A disease can arise when a gene/protein is over- or under-expressed, when a mutation in a gene results in a malformed protein, or when post-translational modifications alter a protein&#8217;s function. Thus, to truly understand a biological process, the relevant proteins must be studied directly. But studying proteins poses more challenges than studying genes, because of their complex 3-D structure, which is tied to their function much like the parts of a machine.</p>
<p>Proteomics is defined as the systematic large-scale analysis of protein expression under normal and perturbed (stressed, diseased, and/or drugged) states, and generally involves the separation, identification, and characterization of all of the proteins in a cell or tissue sample. The meaning of the term has also been expanded, and is now used loosely to refer to the approach of analyzing which proteins a particular type of cell synthesizes, how much the cell synthesizes, how cells modify proteins after synthesis, and how all of those proteins interact.</p>
<p>There are orders of magnitude more proteins than genes in an organism. Once alternative splicing (several variants per gene) and post-translational modifications (over 100 known) are taken into account, there are estimated to be a million or more distinct protein forms.</p>
<p>Fortunately there are features such as folds and motifs, which allow them to be categorized into groups and families, making the task of studying them more tractable. There is a broad range of technologies used in proteomics, but the central paradigm has been the use of 2-D gel electrophoresis (2D-GE) followed by mass spectrometry (MS). 2D-GE is used to first separate the proteins by isoelectric point and then by size.</p>
<p>The individual proteins are subsequently removed from the gel and prepared, then analyzed by MS to determine their identity and characteristics. There are various types of mass analyzers used in proteomics MS including quadrupole, time-of-flight (TOF), and ion trap, and each has its own particular capabilities. Tandem arrangements are often used, such as quadrupole-TOF, to provide more analytical power. The recent development of soft ionization techniques, namely matrix-assisted laser desorption ionization (MALDI) and electro-spray ionization (ESI), has allowed large biomolecules to be introduced into the mass analyzer without completely decomposing their structures, or even without breaking them at all, depending on the design of the experiment.</p>
<p>There are techniques which incorporate liquid chromatography (LC) with MS, and others that use LC by itself. Robotics has been applied to automate several steps in the 2D-GE/MS process, such as spot excision and enzyme digests. To determine a protein&#8217;s structure, X-ray diffraction (XRD) and nuclear magnetic resonance (NMR) techniques are being improved to reach higher throughput and better performance.</p>
<p>For example, automated high-throughput crystallization methods are being used upstream of XRD to alleviate that bottleneck. For NMR, cryo-probes and flow probes shorten analysis time and decrease sample volume requirements. The hope is that determining about 10,000 protein structures will be enough to characterize the estimated 5,000 or so folds, which will feed into more reliable in silico structural prediction methods.</p>
<p>Structure by itself does not provide all of the desired information, but is a major step in the right direction. Protein chips are being developed for many of the processes in proteomics. For example, researchers are developing protocols for protein microarrays at institutions such as Harvard and Stanford as well as at several companies. These chips &#8211; grids of attached peptide fragments, attached antibodies, or gel &#8220;pads&#8221; with proteins suspended inside &#8211; will be used for various experiments such as protein-protein interaction studies and differential expression analysis.</p>
<p>They can also be used to filter out high abundance proteins before further experiments; one of the major challenges in proteomics is isolating and analyzing the low abundance proteins, which are thought to be the most important. There are many other types of protein chips, and the number will continue to grow. For example, microfluidics chips can combine the sample preparation steps prior to MS, such as enzyme digests, with nanoelectrospray ionization, all on the one chip. Or, the samples can be ionized directly off of the surface of the chip, similar to a MALDI target. Microfluidics chips are also being combined with NMR.</p>
<p>In the next few years, various protein chips will be used increasingly in diagnostic applications as well. The bioinformatics side of proteomics includes both databases and analysis software. There are many public and private databases containing protein data ranging from sequences, to functions, to post-translational modifications. Typically, a researcher will first perform 2D-GE followed by MS; this will result in a fingerprint, molecular weight, or even sequence for each protein of interest, which can then be used to query databases for similarities or other information.</p>
<p>Swiss-Prot and TrEMBL, developed in a collaboration between the Swiss Institute of Bioinformatics and the European Bioinformatics Institute, are currently the major databases dedicated to cataloging protein data, but there are dozens of more specialized databases and tools. New bioinformatics approaches are constantly being introduced. Recent customized versions of PSI-BLAST can, for example, utilize not only the curated protein entries in Swiss-Prot but also linguistic analyses of biomedical journal articles to help determine protein family relationships. Publicly available databases and tools are popular, but there are also several companies offering subscriptions to proprietary databases, which often include protein-protein interaction maps generated using the yeast two-hybrid (Y2H) system.</p>
<p>The proteomics market comprises instrument manufacturers, bioinformatics companies, laboratory product suppliers, service providers, and other biotech-related companies that can defy categorization. A given company often spans more than one of these areas. Many of the companies involved in the proteomics market are actually doing drug discovery as their major focus, while partnering with, or providing services or subscriptions to, other companies to generate short-term revenues. The market for proteomics products and services was estimated to be $1.0B in 2000, growing at a compound annual growth rate (CAGR) of 42% to about $5.8B in 2005.</p>
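<p>The growth figures quoted above can be checked with the standard compound-growth formula. The sketch below is only an arithmetic illustration; the function name <code>project_market</code> is a hypothetical helper introduced here, not something from the original estimate.</p>

```python
def project_market(start_billion, annual_rate, years):
    """Compound growth: value_n = value_0 * (1 + rate) ** years."""
    return start_billion * (1 + annual_rate) ** years


# Figures quoted in the text: $1.0B in 2000, 42% per year, five years to 2005.
projected_2005 = project_market(1.0, 0.42, 5)
print(f"Projected 2005 market: ${projected_2005:.2f}B")
```

<p>Five years of 42% growth multiplies the starting value by roughly 5.8, which agrees with the quoted 2005 figure.</p>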
<p>The major drivers will continue to be the biopharmaceutical industry&#8217;s pursuit of blockbuster drugs and the recent technological advances which have allowed large-scale studies of genes and proteins. Alliances are becoming increasingly important in this field, because it is challenging for a single company to find all of the expertise needed to cover the different activities involved in proteomics. Synergies must be created by combining forces. For example, many companies working with mass spectrometry, both the manufacturers and end-user labs, are collaborating with protein-chip companies. The technologies are a natural fit for many applications, such as microfluidic chips which provide nanoelectrospray ionization into a mass spectrometer.</p>
<p>There are many combinations of diagnostics, instrumentation, chip, and bioinformatics companies which create effective partnerships. In general, proteomics appears to hold great promise in the pursuit of biological knowledge. There has been a general realization that the large-scale approach to biology, as opposed to the strictly hypothesis-driven approach, will rapidly generate much more useful information.</p>
<p>The two approaches are not mutually exclusive, and the happy medium seems to be the formation of broad hypotheses which are subsequently investigated by designing large-scale experiments and selecting the appropriate data. Proteomics and genomics, and other varieties of &#8216;omics&#8217;, will all continue to complement each other in providing the tools and information for this type of research. </p>
]]></content:encoded>
			<wfw:commentRss>http://bioinformatics.me/proteomics-introduction-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Introduction to Microarray</title>
		<link>http://bioinformatics.me/introduction-to-microarray-2/</link>
		<comments>http://bioinformatics.me/introduction-to-microarray-2/#comments</comments>
		<pubDate>Sun, 16 Aug 2009 16:25:02 +0000</pubDate>
		<dc:creator>Waleed Ghalwash</dc:creator>
				<category><![CDATA[Bioinformatics Web]]></category>

		<guid isPermaLink="false">http://bioinformatics.me/introduction-to-microarray-2/</guid>
		<description><![CDATA[
Microarray-Definition
A 2D array, typically on a glass, filter, or silicon wafer, upon which genes or gene fragments are deposited or synthesized in a predetermined spatial order allowing them to be made available as probes in a high-throughput, parallel manner.
Microarrays that consist of ordered sets of DNA fixed to solid surfaces provide pharmaceutical firms with a [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.bioinformaticsweb.org/wp-content/uploads/2009/07/microarray_introduction.jpg" alt="microarray_introduction" width="439" height="427" class="aligncenter size-full wp-image-394" /><br />
<strong>Microarray-Definition</strong><br />
A 2D array, typically on a glass, filter, or silicon wafer, upon which genes or gene fragments are deposited or synthesized in a predetermined spatial order allowing them to be made available as probes in a high-throughput, parallel manner.</p>
<div>Microarrays that consist of ordered sets of DNA fixed to solid surfaces provide pharmaceutical firms with a means to identify drug targets.<br />
In the future, the emerging technology promises to help physicians decide the most effective drug treatments for individual patients.</div>
<p align="justify">Microarrays are simply ordered sets of DNA molecules of known sequence. Usually rectangular, they can consist of a few hundred to hundreds of thousands of sets. Each individual feature goes on the array at a precisely defined location on the substrate. The identity of the DNA molecule fixed to each feature never changes. Scientists use that fact in calculating their experimental results. Microarray analysis permits scientists to detect thousands of genes in a small sample simultaneously and to analyze the expression of those genes. As a result, it promises to enable biotechnology and pharmaceutical companies to identify drug targets &#8211; the proteins with which drugs actually interact. Since it can also help identify individuals with similar biological patterns, microarray analysis can assist drug companies in choosing the most appropriate candidates for participating in clinical trials of new drugs. In the future, this emerging technology has the potential to help medical professionals select the most effective drugs, or those with the fewest side effects, for individual patients.</p>
<p align="justify"><strong>Potential of Microarray analysis:<br />
</strong>The academic research community stands to benefit from microarray technology just as much as the pharmaceutical industry. The ability to use it in place of existing technology will allow researchers to perform experiments faster and more cheaply, and will enable them to concentrate on analyzing the results of microarray experiments rather than simply performing the experiments. This research could then lead to a better understanding of the disease process. That will require many different levels of research. While the field of expression has received most attention so far, looking at the gene copy level and protein level is just as important. Microarray technology has potential applications in each of these three levels.</p>
<p align="justify">Identifying drug targets provided the initial market for microarrays. A good drug target has extraordinary value for developing pharmaceuticals. By comparing the ways in which genes are expressed in a normal and a diseased heart, for example, scientists might be able to identify the genes, and hence the associated proteins, that are part of the disease process. Researchers could then use that information to synthesize drugs that interact with these proteins, thus reducing the disease&#8217;s effect on the body.</p>
<p align="justify">Gene sequences can be measured simultaneously and calculated instantly when an ordered set of DNA molecules of known sequence (a microarray) is used. Consequently, scientists can evaluate an entire set of genes at once, rather than looking at physiological changes one gene at a time. For example, Genetics Institute, a biotechnology company in Cambridge, Massachusetts, built an array consisting of genes for cytokines, which are proteins that affect cell physiology during the inflammatory response, among other effects. The full set of DNA molecules contained more than 250 genes. While that number was not large by current standards of microarrays, it vastly outnumbered the one or two genes examined in typical pre-microarray experiments. The Genetics Institute scientists used the array to study how changes experienced by cells in the immune system during the inflammatory response are reflected in the behavior of all 250 genes at the same time. This experiment established the potential for using the patterns of response to help locate points in the body at which drugs could prove most effective.</p>
<p align="justify"><strong>Microarray Products:<br />
</strong>Within that basic technological foundation, microarray companies have created a variety of products and services. They range in price, and involve several different technical approaches. A kit containing a simple array with limited density can cost as little as $1,100, while a versatile system favored by R&amp;D laboratories in pharmaceutical and biotechnology companies costs more than $200,000. The differences among products lie in the basic components and the precise nature of the DNA on the arrays.</p>
<p align="justify">The type of molecule placed on the array units also varies according to circumstances. The most commonly used molecule is cDNA, or complementary DNA, which is derived from messenger RNA and cloned. Since they are derived from a distinct messenger RNA, each feature represents an expressed gene.</p>
<p align="justify"><strong>Microarray-Identifying interactions: </strong></p>
<p align="justify">To detect interactions at microarray features, scientists must label the test sample in such a way that an appropriate instrument can recognize it. Since the minute size of microarray features limits the amount of material that can be located at any feature, detection methods must be extremely sensitive.</p>
<p align="justify">Other than a few low-end systems that use radioactive or chemiluminescent tagging, most microarrays use fluorescent tags as their means of identification. These labels can be delivered to the DNA units in several different ways. One simple and flexible approach involves attaching a fluorophore such as fluorescein or Cy3 to the oligonucleotide layer. While relatively simple, this approach has low sensitivity because it delivers only one unit of label per interaction. Technologists can achieve more sensitivity by multiplexing the labeled entity &#8212; that is, delivering more than one unit of label per interaction.</p>
<p align="justify"><strong>Microarrays and bioinformatics</strong><br />
<strong>Experimental design</strong><br />
Due to the biological complexity of gene expression, careful experimental design is of critical importance if statistically and biologically valid conclusions are to be drawn from the data.<br />
<strong>Standardization</strong><br />
The lack of standardization in arrays presents an interoperability problem in bioinformatics, which hinders the exchange of array data. Various grass-roots open-source projects are attempting to facilitate the exchange and analysis of data produced with non-proprietary chips. The &#8220;Minimum Information About a Microarray Experiment&#8221; (MIAME) checklist helps define the level of detail that should exist and is being adopted by many journals as a requirement for the submission of papers incorporating microarray results. MIAME describes possible content but is not a format: many formats can support the MIAME requirements, yet there is no way to computationally determine semantic compliance.<br />
The FDA is conducting a project to develop standards and quality control metrics that will eventually allow the use of microarray data in drug discovery, clinical practice and regulatory decision-making.</p>
<p align="justify"><strong>Statistical analysis</strong><br />
The analysis of DNA microarrays poses a large number of statistical problems, including the normalization of the data. There are dozens of proposed normalization methods in the published literature; as in many other cases where authorities disagree, a sound conservative approach is to try a number of popular normalization methods and compare the conclusions reached: how sensitive are the main conclusions to the method chosen? From a hypothesis-testing perspective, the large number of genes present on a single array means that the experimenter must take into account a multiple testing problem: even if each gene is extremely unlikely to randomly yield a result of interest, the combination of all the genes is likely to show at least one or a few occurrences of this result which are false positives.<br />
A basic difference between microarray data analysis and much traditional biomedical research is the dimensionality of the data. A large clinical study might collect, say, 100 data items per patient for thousands of patients. A medium-size microarray study will obtain many thousands of numbers per sample for perhaps a hundred samples. Many analysis techniques treat each sample as a single point in a space with thousands of dimensions, then attempt by various techniques to reduce the dimensionality of the data to something humans can visualize.</p>
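<p align="justify">The multiple testing problem described above is commonly handled by adjusting the significance threshold. The sketch below shows two widely used corrections, Bonferroni and the Benjamini-Hochberg step-up procedure, in plain Python. The source does not name either method, so treat them as one reasonable choice among several; the function names and the p-values are hypothetical and purely illustrative.</p>

```python
def bonferroni(p_values, alpha=0.05):
    """Indices of tests whose p-value passes the Bonferroni threshold alpha / m."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if alpha / m >= p]


def benjamini_hochberg(p_values, alpha=0.05):
    """Indices of tests rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    # Sort test indices by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k whose p-value is at most (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if (rank / m) * alpha >= p_values[idx]:
            cutoff = rank
    # Reject every hypothesis ranked at or below the cutoff.
    return sorted(order[:cutoff])


# Hypothetical p-values for eight genes (illustrative only, not real data).
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print("Unadjusted (p at most 0.05):", [i for i, x in enumerate(p) if 0.05 >= x])
print("Bonferroni:", bonferroni(p))
print("Benjamini-Hochberg:", benjamini_hochberg(p))
```

<p align="justify">Bonferroni controls the chance of any false positive and is therefore strict when thousands of genes are tested; Benjamini-Hochberg controls the expected fraction of false positives among the reported genes and usually retains more of them.</p>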
<p align="justify"><strong>Relation between probe and gene</strong><br />
The relation between a probe and the mRNA that it is expected to detect is problematic. On the one hand, some mRNAs may cross-hybridize with probes in the array that are supposed to detect another mRNA. On the other hand, probes that are designed to detect the mRNA of a particular gene may be relying on genomic EST information that is incorrectly associated with that gene.</p>
]]></content:encoded>
			<wfw:commentRss>http://bioinformatics.me/introduction-to-microarray-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Human Genome Project (HGP) &#8211; facts</title>
		<link>http://bioinformatics.me/human-genome-project-hgp-facts-2/</link>
		<comments>http://bioinformatics.me/human-genome-project-hgp-facts-2/#comments</comments>
		<pubDate>Sun, 16 Aug 2009 16:25:01 +0000</pubDate>
		<dc:creator>Waleed Ghalwash</dc:creator>
				<category><![CDATA[Bioinformatics Web]]></category>

		<guid isPermaLink="false">http://bioinformatics.me/human-genome-project-hgp-facts-2/</guid>
		<description><![CDATA[
Ten facts from Human Genome Project (HGP)
# There are between 30,000 and 40,000 genes in the human genome.
Some previous estimates suggested there could be 100,000 or more human genes.
# A human being can be made from a gene count only twice as great as that of a fly or worm.
There are 26,000 genes in the [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.bioinformaticsweb.org/wp-content/uploads/2009/07/humangenomeproject.jpg" alt="humangenomeproject" width="300" height="300" class="aligncenter size-full wp-image-364" /></p>
<p><strong>Ten facts from Human Genome Project (HGP)</strong><br />
# There are between 30,000 and 40,000 genes in the human genome.<br />
Some previous estimates suggested there could be 100,000 or more human genes.<br />
# A human being can be made from a gene count only twice as great as that of a fly or worm.<br />
There are 26,000 genes in the plant thale cress; 18,000 in the nematode worm; 13,000 in a fruitfly, 6,000 in yeast, and 4,000 in the tuberculosis microbe.<br />
# We are not fruitflies or worms because some of our genes work differently &#8211; we have more &#8220;control genes.&#8221;<br />
As we trace the increase of complexity from single cell creatures, through small animals like worms and flies, and up to us, what we appear to be adding is control genes. Evolution is not so much adding new genes performing wholly new functions &#8211; what it&#8217;s chiefly doing is to increase the variety and subtlety of genes that control other genes.<br />
# Hundreds of genes appear to have come from bacteria &#8211; one of which has been associated with depression.<br />
We don&#8217;t understand the mechanism of transfer, and indeed it&#8217;s possible that the bacteria have picked up our genes rather than the other way round &#8211; though this seems less likely. But either way it&#8217;s a tremendous reminder of the unity of life, and of the fact that we don&#8217;t live in a cocoon isolated from other species.<br />
# Most mutations occur in males.<br />
It&#8217;s about a two-fold difference. One suggested reason is the larger number of cell divisions in the male germ line (sperm).<br />
# More than one million SNPs have been identified.<br />
Looking at the genetic differences between people &#8211; one variation every 500 to 1,000 bases (letters) &#8211; will usher in a new era of personalised medicine. Currently more than 1.4 million of these variations, known as SNPs (single nucleotide polymorphisms) have been found. Overall, humans are 99.8% genetically similar.</p>
<p># The purpose of the 97% of &#8220;junk&#8221; DNA is being discovered.<br />
We have got stronger hints than before that the repeat family called Alu may play some important function. We have always suspected that we couldn&#8217;t simply divide the genome into 3% of good stuff (genes) and 97% of junk. Here we are beginning to see some of the functions of the &#8216;junk&#8217;. Exactly as one would expect the junk has a function &#8211; rather more diffuse than the hard information carried by the genes, but nevertheless functional in some way. It may help to move genes around.<br />
# Just 483 existing &#8220;targets&#8221; in the body account for all the pharmaceutical drugs on the market.<br />
The HGP and the SNPs research will provide thousands of extra &#8220;doorways&#8221; or destinations for new medicines and drugs to work on. Already new ways of tackling asthma, Alzheimer&#8217;s disease and depression are being looked at, using new genetic targets.<br />
# Understanding of how the body works is dramatically increasing due to HGP knowledge.<br />
Apart from new drugs, the HGP research is pointing to a vastly increased knowledge of how the human body works &#8211; with better explanations now available for a range of conditions or biological responses. One small example is that the mystery of bitter taste has been solved &#8211; a new family of proteins (which come from genes) that control this response have been found in taste buds.<br />
# Understanding of how we evolved as human beings is being rapidly advanced through &#8220;genetic archaeology.&#8221;<br />
Genetic sequencing information is providing more evidence of how we diverged from monkeys 25 million years ago. What genetics is also clearly showing is our close relationship with other life forms. In the words of John Sulston: &#8220;We are confirming Darwin &#8211; that is the most useful take home message from this. It is the unity of life, or Nature being conservative, or the idea of the Blind Watchmaker &#8211; the notion of evolution as a constant reworking or random recombining of parts.&#8221;</p>
<p>DID YOU KNOW ?</p>
<p>    * If the DNA sequence of the human genome were compiled in books, the equivalent of 200 volumes the size of a Manhattan telephone book (at 1000 pages each) would be needed to hold it all.<br />
    * It would take about 9.5 years to read out loud (without stopping) the 3 billion bases in a person&#8217;s genome sequence. This is calculated on a reading rate of 10 bases per second, equaling 600 bases/minute, 36,000 bases/hour, 864,000 bases/day, 315,360,000 bases/year.<br />
    * One million bases (called a megabase and abbreviated Mb) of DNA sequence data is roughly equivalent to 1 megabyte of computer data storage space.</p>
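<p>The reading-time and storage figures above follow from simple arithmetic, sketched below. The function names are hypothetical helpers introduced for illustration. The one-byte-per-base default matches the megabase-to-megabyte equivalence quoted above; the two-bits-per-base option is an added assumption that only the four letters A, C, G and T need to be encoded.</p>

```python
GENOME_BASES = 3_000_000_000   # genome size quoted in the text
BASES_PER_SECOND = 10          # reading rate quoted in the text


def reading_time_years(bases, rate_per_second):
    """Years needed to read the sequence aloud without stopping."""
    bases_per_year = rate_per_second * 60 * 60 * 24 * 365
    return bases / bases_per_year


def storage_megabytes(bases, bits_per_base=8):
    """Storage in megabytes (10**6 bytes) at a given encoding density."""
    return bases * bits_per_base / 8 / 1_000_000


print(f"Reading time: {reading_time_years(GENOME_BASES, BASES_PER_SECOND):.1f} years")
print(f"One megabase at 1 byte per base: {storage_megabytes(1_000_000):.0f} MB")
print(f"Whole genome at 1 byte per base: {storage_megabytes(GENOME_BASES):.0f} MB")
print(f"Whole genome at 2 bits per base: {storage_megabytes(GENOME_BASES, 2):.0f} MB")
```

<p>At 10 bases per second a year holds 315,360,000 bases, so 3 billion bases take about 9.5 years, matching the figure above.</p>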
]]></content:encoded>
			<wfw:commentRss>http://bioinformatics.me/human-genome-project-hgp-facts-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bioinformatics Glossary</title>
		<link>http://bioinformatics.me/bioinformatics-glossary-3/</link>
		<comments>http://bioinformatics.me/bioinformatics-glossary-3/#comments</comments>
		<pubDate>Sun, 16 Aug 2009 16:25:01 +0000</pubDate>
		<dc:creator>Waleed Ghalwash</dc:creator>
				<category><![CDATA[Bioinformatics Web]]></category>

		<guid isPermaLink="false">http://bioinformatics.me/bioinformatics-glossary-3/</guid>
		<description><![CDATA[
Bioinformatics Glossary
A
Accession number
An identifier supplied by the curators of the major biological databases upon submission of a novel entry that uniquely identifies that sequence (or other) entry.
Active site
The amino acid residues at the catalytic site of an enzyme. These residues provide the binding and activation energy needed to place the substrate into its transition state [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.bioinformaticsweb.org/wp-content/uploads/2009/07/bioinformatics_glossary.jpeg" alt="bioinformatics_glossary" width="600" height="345" class="aligncenter size-full wp-image-367" /></p>
<p>Bioinformatics Glossary<br />
<strong>A</strong></p>
<p>Accession number</p>
<p>An identifier supplied by the curators of the major biological databases upon submission of a novel entry that uniquely identifies that sequence (or other) entry.</p>
<p>Active site</p>
<p>The amino acid residues at the catalytic site of an enzyme. These residues provide the binding and activation energy needed to place the substrate into its transition state and bridge the energy barrier of the reaction undergoing catalysis</p>
<p>Adenine</p>
<p>A purine base found in DNA and RNA</p>
<p>Agents</p>
<p>Independent, autonomous, software modules that can search the Internet for data or content pertinent to a particular application, such as a gene, protein, or biological system.</p>
<p>Agricultural biotechnology (AgBio)</p>
<p>The application of rDNA technology to agriculturally important plants and organisms.</p>
<p>Algorithm</p>
<p>A series of steps defining a procedure or formula for solving a problem, that can be coded into a programming language and executed. Bioinformatics algorithms typically are used to process, store, analyze, visualize and make predictions from biological data.</p>
<p>Alignment</p>
<p>The result of a comparison of two or more gene or protein sequences in order to determine their degree of base or amino acid similarity. Sequence alignments are used to determine the similarity, homology, function or other degree of relatedness between two or more genes or gene products.</p>
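<p>As a minimal sketch of how the degree of similarity in an alignment can be quantified, the snippet below computes percent identity for two sequences that have already been aligned, with gaps written as <code>-</code>. The function name is hypothetical, and the choice of denominator (every column where at least one sequence has a residue) is an assumption; alignment tools differ on this convention.</p>

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of compared alignment columns where both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" and y == "-":
            continue  # skip columns that are gaps in both sequences
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy example: seven columns, one of which is a gap in the second sequence.
print(f"{percent_identity('GATTACA', 'GAT-ACA'):.1f}% identity")
```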
<p>Allele</p>
<p>A given form of a gene that occupies a specific position or locus on a chromosome. Variant forms of genes occurring at the same locus are said to be alleles of one another.</p>
<p>Alternative splicing</p>
<p>The process, occurring in higher organisms, by which the segments (exons) of a single gene are joined in different combinations during mRNA splicing, so that one gene can give rise to several alternate forms of a protein.</p>
<p>Alternative splice-form</p>
<p>One of the possible alternate combinations of exons into a folded protein that are possible by recombining multiple gene segments during mRNA splicing in higher organisms.</p>
<p>Alu family</p>
<p>A common set of dispersed DNA sequences found throughout the human genome; each is about 300 bases long and they are repeated at least 500,000 times. Alu sequences are speculated to have originated from viral RNA sequences that integrated into human DNA thousands of years ago.</p>
<p>Amino acid</p>
<p>One of the 20 chemical building blocks that are joined by amide (peptide) linkages to form a polypeptide chain of a protein</p>
<p>Analogy</p>
<p>Reasoning by which the function of a novel gene or protein sequence may be deduced from comparisons with other gene or protein sequences of known function.  Identifying analogous or homologous genes via similarity searching and alignment is one of the chief uses of Bioinformatics. (See also alignment, similarity search.)</p>
<p>Annotation</p>
<p>A combination of comments, notations, references, and citations, either in free format or utilizing a controlled vocabulary, that together describe all the experimental and inferred information about a gene or protein.  Annotations can also be applied to the description of other biological systems.  Batch, automated annotation of bulk biological sequence is one of the key uses of Bioinformatics tools.</p>
<p>Anticodon</p>
<p>The triplet of contiguous bases on tRNA that binds to the codon sequence of nucleotides on mRNA. Example: the mRNA codon GGG, which codes for glycine, pairs with the tRNA anticodon CCC.</p>
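<p>The pairing between a codon and its anticodon can be sketched as a reverse complement over the RNA alphabet. This is a simplified illustration that assumes strict Watson-Crick pairing and ignores wobble pairing at the third codon position; the function name is hypothetical.</p>

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon):
    """Return the anticodon (written 5' to 3') that pairs with an mRNA codon."""
    # Pairing is antiparallel, so complement each base and reverse the order.
    return "".join(RNA_COMPLEMENT[base] for base in reversed(codon.upper()))


print(anticodon_for("GGG"))  # CCC
print(anticodon_for("AUG"))  # CAU
```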
<p>Antigen</p>
<p>Any foreign molecule that stimulates an immune response in a vertebrate organism. Many antigens are proteins such as the surface proteins of foreign organisms.</p>
<p>Antisense</p>
<p>DNA or RNA composed of the complementary sequence to the target DNA/RNA. Also used to describe a therapeutic strategy that uses antisense DNA or RNA sequences to bind specific gene DNA sequences or mRNA implicated in disease, physically blocking them and thereby inhibiting their expression.</p>
<p>Assay</p>
<p>A method for measuring a biological activity. This may be enzyme activity, binding affinity, or protein turnover. Most assays utilize a measurable parameter such as color, fluorescence or radioactivity to correlate with the biological activity.</p>
<p>Assembly</p>
<p>Compilation of overlapping sequences from one or more related genes that have been clustered together based on their degree of sequence identity or similarity. Sequence assembly may be used to piece together &#8220;shotgun&#8221; sequencing fragments (see shotgun sequencing) based upon overlapping restriction enzyme digests, or may be used to identify and index novel genes from &#8220;single-pass&#8221; cDNA sequencing efforts.</p>
<p>Autoradiography</p>
<p>A method used to locate radioisotope-labeled materials which have been separated in gels or are present in blots. The location of the radiolabeled material is determined by overlaying the test material with a photographic film that is sensitive to the radioisotope.</p>
<p><strong>B</strong></p>
<p>Bacterial artificial chromosome (BAC)</p>
<p>Cloning vector that can incorporate large fragments of DNA. (See also YACs.)</p>
<p>Bacteriophage</p>
<p>A virus that infects bacteria. The bacteriophage DNA has served as a basis for cloning vectors, and is also utilized to create phage libraries containing human or other genes.</p>
<p>Baculovirus</p>
<p>An insect virus which forms the basis of a protein expression system</p>
<p>Base pair</p>
<p>A pair of nitrogenous bases (a purine and a pyrimidine), held together by hydrogen bonds, that form the core of DNA and RNA, i.e., the A:T, G:C and A:U interactions.</p>
<p>Beta sheet</p>
<p>A three dimensional arrangement taken up by polypeptide chains that consists of alternating strands linked by hydrogen bonds. The alternating strands together form a sheet that is frequently twisted. One of the secondary structural elements characteristic of proteins.</p>
<p>Bioinformatics</p>
<p>The field of endeavor that relates to the collection, organization and analysis of large amounts of biological data using networks of computers and databases (usually with reference to the genome project and DNA sequence information)</p>
<p>Bivalent</p>
<p>Having two binding sites; having 2 free electrons available for binding.</p>
<p>Blunt-end (ligation)</p>
<p>The joining of DNA fragments that contain no overhang at either end and consequently no DNA bases available for hybridization (cf. sticky-end ligation).</p>
<p><strong>C</strong></p>
<p>Carboxyl group</p>
<p>The -COOH functional group, acidic in nature, found in all amino acids</p>
<p>cDNA (complementary DNA)</p>
<p>A DNA strand copied from mRNA using reverse transcriptase. A cDNA library represents all of the expressed DNA in a cell.</p>
<p>cDNA library</p>
<p>A set of DNA fragments prepared from the total mRNA obtained from a selected cell, tissue or organism.</p>
<p>Cell</p>
<p>The basic unit of any living organism.</p>
<p>Cell Cycle</p>
<p>The life cycle of a cell, which is marked by cell division and separated into four phases: G1, S, G2, and M. DNA replication is confined to the S (synthesis) phase, and chromosomal separation to the M (mitotic) phase.</p>
<p>Chimeric clone</p>
<p>A cloning artifact created by a foreign gene being inserted into a vector in an incorrect orientation, resulting in the expression of a protein consisting of a fusion of two different gene products.</p>
<p>Chromat</p>
<p>Data file output from most popular DNA sequencers. Chromat files consist of the fluorescent traces generated by the sequencer for each of the four chemical bases, A, C, G, and T, together with the sequence and measures of the error in the traces at each sequence position.</p>
<p>Chromatin</p>
<p>The chromosome as it appears in its condensed state, composed of DNA and associated proteins (mainly histones).</p>
<p>Chromosome</p>
<p>The structure in the cell nucleus that contains all of the cellular DNA together with a number of proteins that compact and package the DNA.</p>
<p>Clinical trials</p>
<p>Research studies that involve patients. Biotechnology companies typically use clinical trials to assess the efficacy and safety of new therapies and to answer scientific questions. Typically, there are 3 phases during a clinical trial. Phase I is designed to evaluate the safety of the product in humans; phase II analyses the effects of dose escalation, and phase III definitively evaluates the clinical efficacy of the product.</p>
<p>Clone</p>
<p>A population of genetically identical cells or DNA molecules.</p>
<p>Cloning</p>
<p>The formation of clones or exact genetic replicas.</p>
<p>Cluster</p>
<p>The grouping of similar objects in a multidimensional space.  Clustering is used for constructing new features which are abstractions of the existing features of those objects. The quality of the clustering depends crucially on the distance metric in the space. In bioinformatics, clustering is performed on sequences, high-throughput expression and other experimental data. Clusters of partial or complete gene sequences can be used to identify the complete (contiguous) sequence and to better identify its function. Clustering expression data enables the researcher to discern patterns of co-regulation in groups of genes.</p>
<p>Coding regions (CDS)</p>
<p>The portion of a genomic sequence bounded by start and stop codons that identifies the sequence of the protein being coded for by a particular gene.</p>
<p>Codon</p>
<p>A sequence of three adjacent nucleotides that designates a specific amino acid or a start/stop site for translation.</p>
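<p>As an illustration of reading codons three bases at a time, here is a minimal Python sketch. The table is deliberately partial (only the codons used in the example, taken from the standard genetic code), and the function name translate and the test sequence are made up for this example.</p>

```python
# Partial codon table from the standard genetic code; only the codons used below.
CODON_TABLE = {
    "ATG": "M",  # methionine, also the usual start codon
    "GCC": "A",
    "AAA": "K",
    "TGG": "W",
    "TAA": "*", "TAG": "*", "TGA": "*",  # stop codons
}

def translate(dna):
    """Translate from the first ATG up to the first stop codon."""
    start = dna.find("ATG")
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")  # X = not in the partial table
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)

peptide = translate("CCATGGCCAAATGGTAACC")
```

<p>A real translator would use the full 64-codon table; this one only shows the triplet-by-triplet lookup.</p>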
<p>Combinatorial chemistry</p>
<p>The use of chemical methods to generate all possible combinations of chemicals starting with a subset of compounds. The building blocks may be peptides, nucleic acids or small molecules. The libraries of compounds formed by this methodology are used to probe for new pharmaceutical reagents (see high-throughput screening).</p>
<p>Complementarity determining region (CDR)</p>
<p>The hypervariable regions of an antibody molecule, consisting of three loops from the heavy chain and three from the light chain, that together form the antigen-binding site.</p>
<p>Complexity (of gene sequence)</p>
<p>The term &#8220;low complexity sequence&#8221; may be thought of as synonymous with regions of locally biased amino acid composition. In these regions, the sequence composition deviates from the random model that underlies the calculation of the statistical significance (P-value) of an alignment. Such alignments among low complexity sequences are statistically but not biologically significant, i.e., one cannot infer homology (common ancestry) or functional similarity.</p>
<p>Configuration</p>
<p>(in software) The complete ordering and description of all parts of a software or database system.  Configuration management is the use of software to identify, inventory and maintain the component modules that together comprise one or more systems or products.</p>
<p>Conformation</p>
<p>The precise three-dimensional arrangement of atoms and bonds in a molecule describing its geometry and hence its molecular function.</p>
<p>Consensus sequence</p>
<p>A single sequence derived from an alignment of multiple constituent sequences that represents a &#8220;best fit&#8221; for all those sequences. A &#8220;voting&#8221; or other selection procedure is used to determine which residue (nucleotide or amino acid) is placed at a given position in the event that not all of the constituent sequences have the identical residue at that position.</p>
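<p>The &#8220;voting&#8221; procedure can be sketched in a few lines of Python. This is one possible selection rule (simple majority per column, ties broken alphabetically), not the only one in use; the function name consensus and the example sequences are illustrative.</p>

```python
from collections import Counter

def consensus(aligned):
    """Majority-vote consensus over equal-length, already aligned sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    result = []
    for column in zip(*aligned):
        counts = Counter(column)
        # Highest count wins; ties are broken alphabetically so the result is deterministic.
        best = min(counts, key=lambda residue: (-counts[residue], residue))
        result.append(best)
    return "".join(result)

example = consensus(["ACGT", "ACGA", "TCGA"])
```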
<p>Constitutive synthesis (expression)</p>
<p>Synthesis of mRNA and protein at an unchanging or constant rate regardless of a cell&#8217;s requirements (see housekeeping genes).</p>
<p>Contig</p>
<p>A length of contiguous sequence assembled from partial, overlapping sequences, generated from a &#8220;shotgun&#8221; sequencing project.  Contigs are typically created computationally, by comparing the overlapping ends of several sequencing reads generated by restriction enzyme digestion of a segment of genomic DNA.  The creation of contigs in the presence of sequencing errors, ambiguities and the presence of repeats is one of the most computationally challenging aspects of the role of Bioinformatics in genome analysis.</p>
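<p>A toy version of the overlap comparison, assuming error-free reads and a single pair to merge, might look like the following Python sketch. Real assemblers must also handle sequencing errors and repeats, which this example ignores; the function names and reads are made up.</p>

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def merge(a, b, min_len=3):
    """Join two reads into one contiguous sequence, or return None if they do not overlap."""
    n = overlap(a, b, min_len)
    return a + b[n:] if n else None

contig = merge("ATGCGTAC", "GTACCTGA")
```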
<p>Convergence</p>
<p>The end-point of any algorithm that uses iteration or recursion to guide a series of data processing steps. An algorithm is usually said to have reached convergence when the difference between the computed and observed steps falls below a pre-defined threshold.</p>
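<p>A minimal Python sketch of a convergence test follows. It assumes the common variant in which the change between successive estimates is compared against the threshold; the update rule (the Babylonian method for the square root of 2) and all names are chosen only for illustration.</p>

```python
def iterate_until_converged(update, x0, tol=1e-9, max_iter=1000):
    """Apply update repeatedly until successive estimates differ by less than tol."""
    x = x0
    for step in range(1, max_iter + 1):
        new_x = update(x)
        if abs(new_x - x) < tol:
            return new_x, step
        x = new_x
    raise RuntimeError("did not converge within max_iter steps")

# Babylonian update for the square root of 2, starting from 1.0.
root, steps = iterate_until_converged(lambda x: 0.5 * (x + 2.0 / x), 1.0)
```

<p>The max_iter guard matters in practice: an iteration that never falls below the threshold would otherwise loop forever.</p>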
<p>Cosmids</p>
<p>DNA vectors that allow the insertion of long fragments of DNA (up to 50 kbases).</p>
<p>Crystal structure</p>
<p>Term used to describe the high resolution molecular structure derived by X-ray crystallographic analysis of protein or other biomolecular crystals.</p>
<p>Cytoplasm</p>
<p>The medium of the cell between the nucleus and the cell membrane.</p>
<p>Cytosine</p>
<p>A pyrimidine base found in DNA and RNA.</p>
<p><strong>D</strong></p>
<p>Data Cleaning</p>
<p>A process whereby automated or semi-automated algorithms are used to process experimental data, including noise, experimental errors and other artifacts, in order to generate and store high-quality data for use in subsequent analysis. Data cleaning is typically required in high-throughput sequencing where compression or other experimental artifacts limit the amount of sequence data generated from each sequencing run or &#8220;read.&#8221;</p>
<p>Data Mining</p>
<p>The ability to query very large databases in order to satisfy a hypothesis (&#8220;top-down&#8221; data mining); or to interrogate a database in order to generate new hypotheses based on rigorous statistical correlations (&#8220;bottom-up&#8221; data mining).</p>
<p>Data Processing</p>
<p>Data processing is defined as the systematic performance of operations upon data such as handling, merging, sorting, and computing. The semantic content of the original data should not be changed, but the semantic content of the processed data may be changed.</p>
<p>Data Warehouses</p>
<p>Vast arrays of heterogeneous (biological) data, stored within a single logical data repository, that are accessible to different querying and manipulation methods.</p>
<p>Database</p>
<p>Any file system by which data gets stored following a logical process.  (see also relational database)</p>
<p>Deconvolution</p>
<p>Mathematical procedure to separate out the overlapping effects of molecules such as mixtures of compounds in a high-throughput screen, or mixtures of cDNAs in a high density array.</p>
<p>Deletion</p>
<p>A chromosomal alteration in which a portion of the chromosome or the underlying DNA is lost.</p>
<p>Deletion mapping</p>
<p>Process in which different deletions in a region of DNA are created and used to map the functionally critical areas of that DNA. For example, the minimal region of DNA required for a test promoter can be ascertained by systematic deletions in the region of interest.</p>
<p>Dendrogram</p>
<p>A graphical procedure for representing the output of a hierarchical clustering method. A dendrogram is strictly defined as a binary tree with a distinguished root, that has all the data items at its leaves. Conventionally, all the leaves are shown at the same level of the drawing. The ordering of the leaves is arbitrary, as is their horizontal position. The heights of the internal nodes may be arbitrary, or may be related to the metric information used to form the clustering.</p>
<p>Dimer</p>
<p>A composite molecule formed by the binding of two molecules (see homo and heterodimers).</p>
<p>Disulphide bond</p>
<p>Covalent link formed between the sulphur atoms of two different cysteine residues in a protein. Important in maintaining the folded structure of a protein, and also for linking different proteins in a complex.</p>
<p>DNA (deoxyribonucleic acid)</p>
<p>The chemical that forms the basis of the genetic material in virtually all organisms. DNA is composed of the four nitrogenous bases Adenine, Cytosine, Guanine, and Thymine, which are covalently bonded to a backbone of deoxyribose-phosphate to form a DNA strand. Two complementary strands (where all Gs pair with Cs and As with Ts) form a double helical structure which is held together by hydrogen bonding between the cognate bases.</p>
<p>DNA fingerprinting</p>
<p>A technique for identifying human individuals based on a restriction enzyme digest of tandemly repeated DNA sequences that are scattered throughout the human genome, but are unique to each individual.</p>
<p>DNA microarrays</p>
<p>The deposition of oligonucleotides or cDNAs onto an inert substrate such as glass or silicon. Thousands of molecules may be organized spatially into a high-density matrix. These DNA chips may be probed to allow expression monitoring of many thousands of genes simultaneously. Uses include study of polymorphisms in genes, de novo sequencing or molecular diagnosis of disease.</p>
<p>DNA polymerase</p>
<p>An enzyme that catalyzes the synthesis of DNA from a DNA template given the deoxyribonucleotide precursors.</p>
<p>DNA probes</p>
<p>Short single stranded DNA molecules of specific base sequence, labeled either radioactively or immunologically, that are used to detect and identify the complementary base sequence in a gene or genome by hybridizing specifically to that gene or sequence.</p>
<p>DNA sequencing</p>
<p>The technique in which the specific sequence of bases forming a particular DNA region is deciphered.</p>
<p>DNase (Deoxyribonuclease)</p>
<p>One of a series of enzymes that can digest DNA.</p>
<p>Domain (protein)</p>
<p>A region of special biological interest within a single protein sequence. However, a domain may also be defined as a region within the three-dimensional structure of a protein that may encompass regions of several distinct protein sequences that accomplishes a specific function. A domain class is a group of domains that share a common set of well-defined properties or characteristics.</p>
<p>Drug</p>
<p>An agent that affects a biological process. Specifically, a molecule whose molecular structure can be correlated with its pharmacological activity.</p>
<p>Drug discovery cycle</p>
<p>The cycle of events required to develop a new drug. Typically this involves research, preclinical testing and clinical development, and can take from 5 to 12 years.</p>
<p><strong>E</strong></p>
<p>Electronic Northerns</p>
<p>The use of an electronic database of cDNA sequences (or probes derived from them) in order to measure the relative levels of mRNAs expressed in different cells or tissues. An example of the use of an electronic Northern might be to identify the differences in the genes expressed in prostate cancer and those in benign prostate hyperplasia, by subtracting the database of one from the other and seeing which cDNAs remain.</p>
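<p>The subtraction step in the prostate example can be sketched with Python sets. The cDNA identifiers below are hypothetical placeholders, and the sketch only compares presence or absence; it does not model relative expression levels.</p>

```python
# Hypothetical cDNA identifiers observed in two tissue libraries (illustrative only).
cancer_library = {"cdna_001", "cdna_002", "cdna_003", "cdna_004"}
benign_library = {"cdna_002", "cdna_004", "cdna_005"}

def subtract_library(sample, reference):
    """Return, sorted, the cDNAs present in sample but absent from reference."""
    return sorted(sample - reference)

cancer_only = subtract_library(cancer_library, benign_library)
benign_only = subtract_library(benign_library, cancer_library)
```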
<p>Electrophoresis</p>
<p>The use of an external electric field to separate large biomolecules on the basis of their charge by running them through acrylamide or agarose gels.</p>
<p>Enhancers</p>
<p>DNA sequences that can greatly increase the transcription rates of genes even though they may be far upstream or downstream from the promoter they stimulate.</p>
<p>Enzyme</p>
<p>A class of proteins that are capable of catalyzing chemical reactions (the making or breaking of chemical bonds). They do so by orienting their substrates into a suitable geometry in a particular location (the active site) where electrophilic or nucleophilic amino acid residues can participate in the reaction. Enzymes are protein catalysts that speed up chemical reactions that would otherwise be prohibitively slow under physiological conditions.</p>
<p>Epigenomics</p>
<p>The study of complex expression networks or linkages both spatially (within the body) and temporally (at different times in development).</p>
<p>Equilibrium constant</p>
<p>Value that describes the equilibrium state of the reversible reaction between two molecular species.</p>
<p>Eukaryote</p>
<p>A cell or organism with a distinct membrane-bound nucleus as well as specialized membrane-based organelles (see also prokaryote).</p>
<p>Exon</p>
<p>The region of DNA within a gene that codes for a polypeptide chain or domain. Typically a mature protein is composed of several domains coded by different exons within a single gene.</p>
<p>Expressed Sequence Tags (ESTs)</p>
<p>A small sequence from an expressed gene that can be amplified by PCR. ESTs act as physical markers for cloning and full length sequencing of the cDNAs of expressed genes. Typically identified by purifying mRNAs, converting to cDNAs, and then sequencing a portion of the cDNAs.</p>
<p>Expression (gene or protein)</p>
<p>A measure of the presence, amount, and time-course of one or more gene products in a particular cell or tissue.  Expression studies are typically performed at the RNA (mRNA) or protein level in order to determine the number, type, and level of genes that may be up-regulated or down-regulated during a cellular process, in response to an external stimulus, or in sickness or disease.  Gene chips and proteomics now allow the study of expression profiles of sets of genes or even entire genomes.</p>
<p>Expression profile</p>
<p>The level and duration of expression of one or more genes, selected from a particular cell or tissue type, generally obtained by a variety of high-throughput methods, such as sample sequencing, serial analysis, or microarray-based detection.</p>
<p>Expression vector</p>
<p>A cloning vector that is engineered to allow the expression of protein from a cDNA. The expression vector provides an appropriate promoter and restriction sites that allow insertion of cDNA.</p>
<p><strong>F</strong></p>
<p>Fingerprint</p>
<p>A fingerprint is a set of motifs used to predict the occurrence of similar motifs, in either an individual sequence or in a database. Fingerprints are refined by iterative scanning of a composite protein sequence database.  A composite or multiple-motif fingerprint contains a number of aligned motifs taken from different parts of a multiple alignment.  True family members are then easy to identify by virtue of possessing all elements of the fingerprint, while subfamily members may be identified by possessing only part of it.</p>
<p>Frameshift</p>
<p>A deletion or insertion (including duplication) of a number of bases that is not a multiple of three, which causes the reading frame of a structural gene to shift from the normal series of triplets.</p>
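<p>To make the shift concrete, the Python sketch below splits a sequence into triplets before and after a single-base deletion. The sequence and function names are invented for the example; the only rule assumed is that a length change that is not a multiple of three alters the reading frame.</p>

```python
def codons(seq):
    """Split seq into complete triplets, dropping any trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]

def shifts_frame(length_change):
    """True if inserting or deleting this many bases alters the reading frame."""
    return length_change % 3 != 0

reference = "ATGGCCAAATGG"
mutant = reference[:3] + reference[4:]  # delete one base right after the start codon

reference_codons = codons(reference)
mutant_codons = codons(mutant)
```

<p>Every triplet downstream of the deletion changes, which is why frameshifts are usually far more disruptive than single-base substitutions.</p>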
<p>Functional genomics</p>
<p>The use of genomic information to delineate protein structure, function, pathways and networks. Function may be determined by &#8220;knocking out&#8221; or &#8220;knocking in&#8221; expressed genes in model organisms such as worm, fruitfly, yeast or mouse.</p>
<p>Fusion protein</p>
<p>The protein resulting from the genetic joining and expression of two different genes (see chimeric clone).</p>
<p><strong>G</strong></p>
<p>Gaps (affine gaps)</p>
<p>A gap is defined as any maximal, consecutive run of spaces in a single string of a given alignment. Gaps help create alignments that better conform to underlying biological models and more closely fit patterns that one expects to find in a meaningful alignment. The idea is to take into account the number of gaps (contiguous runs) and not only the number of spaces when calculating an alignment. Affine gaps contain a component for gap insertion and a component for gap extension, where the extension penalty is usually much lower than the insertion penalty. This mimics biological reality, as multiple gaps would imply multiple mutations, but a single mutation can lead to a long gap quite easily.</p>
<p>Gap penalties</p>
<p>The penalty applied to a similarity score for the introduction of an insertion or deletion gap, the extension of a gap, or both. Gap penalties are usually subtracted from a cumulative score being determined for the comparison of two or more sequences via an optimization algorithm that attempts to maximize that score.</p>
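<p>An affine gap penalty can be sketched in Python as follows. The convention assumed here is that the first space of a gap costs gap_open and each further space costs gap_extend; other tools charge the extension for every space including the first. The default values 10.0 and 0.5 are arbitrary illustrations.</p>

```python
import re

def gap_runs(row):
    """Lengths of each maximal run of '-' characters in one alignment row."""
    return [len(match.group()) for match in re.finditer(r"-+", row)]

def affine_penalty(row, gap_open=10.0, gap_extend=0.5):
    """Total penalty for a row: gap_open per gap plus gap_extend per additional space."""
    return sum(gap_open + gap_extend * (n - 1) for n in gap_runs(row))

one_long_gap = affine_penalty("A----C")      # one gap of four spaces
four_short_gaps = affine_penalty("A-C-G-T-")  # four gaps of one space each
```

<p>Both rows contain four spaces, but the single long gap is penalized far less than four separate gaps, which is the behaviour the affine scheme is meant to produce.</p>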
<p>Gel electrophoresis</p>
<p>A technique by which molecules are separated by size or charge by passing them through a gel under the influence of an external electric field.</p>
<p>Gene Index</p>
<p>A listing of the number, type, label and sequence of all the genes identified within the genome of a given organism. Gene indices are usually created by assembling overlapping EST sequences into clusters, and then determining if each cluster corresponds to a unique gene. Methods by which a cluster can be identified as representing a unique gene include identification of long open reading frames (ORFs), comparison to genomic sequence, and detection of SNPs or other features in the cluster that are known to exist in the gene.  </p>
<p>GenBank</p>
<p>Data bank of genetic sequences operated by a division of the National Institutes of Health.</p>
<p>Gene</p>
<p>Classically, a unit of inheritance. In practice, a gene is a segment of DNA on a chromosome that encodes a protein and all the regulatory sequences (promoter) required to control expression of that protein.</p>
<p>Gene chips (also Gene arrays)</p>
<p>The covalent attachment of oligonucleotides or cDNA directly onto a small glass or silicon chip in organized arrays. Over 50,000 different DNA fragments can be presented on a single chip providing a high throughput parallel method of probing gene expression, genotype or gene function.</p>
<p>Gene expression</p>
<p>The conversion of information from gene to protein via transcription and translation.</p>
<p>Gene families</p>
<p>Subsets of genes containing homologous sequences which usually correlate with a common function.</p>
<p>Gene library</p>
<p>A collection of cloned DNA fragments created by restriction endonuclease digestion that represent part or all of an organism&#8217;s genome.</p>
<p>Gene product</p>
<p>The product, either RNA or protein, that results from expression of a gene. The amount of gene product reflects the activity of the gene.</p>
<p>Gene therapy</p>
<p>The use of genetic material for therapeutic purposes. The therapeutic gene is typically delivered using recombinant virus or liposome based delivery systems.</p>
<p>Genetic code</p>
<p>The mapping of all possible codons into the 20 amino acids including the start and stop codons.</p>
<p>Genetic engineering (Recombinant DNA technology)</p>
<p>The procedures used to isolate, splice and manipulate DNA outside the cell. Genetic Engineering allows a recombinantly engineered DNA segment to be introduced into a foreign cell or organism, and be able to replicate and function normally.</p>
<p>Genetic marker</p>
<p>Any gene that can be readily recognized by its phenotypic effect, and which can be used as a marker for a cell, chromosome, or individual carrying that gene. Also, any detectable polymorphism used to identify a specific gene.</p>
<p>Genome</p>
<p>The complete genetic content of an organism.</p>
<p>Genomic DNA (sequence)</p>
<p>DNA sequence typically obtained from mammalian or other higher-order species, which includes both intron and exon sequence (coding sequence), as well as non-coding regulatory sequences such as promoter, and enhancer sequences.</p>
<p>Genomics</p>
<p>The analysis of the entire genome of a chosen organism.</p>
<p>Genotype</p>
<p>Strictly, all of the genes possessed by an individual. In practice, the particular alleles present in a specific genetic locus.</p>
<p>Glycosylation</p>
<p>The addition of carbohydrate groups (sugars), e.g. to polypeptide chains.</p>
<p>Guanine (G)</p>
<p>One of the nitrogenous purine bases found in DNA and RNA</p>
<p><strong>H</strong></p>
<p>Hairpin</p>
<p>A double-helical region in a single DNA or RNA strand formed by the hydrogen-bonding between adjacent inverse complementary sequences to form a hairpin shaped structure.</p>
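<p>A crude check for hairpin potential is to test whether the start of a strand is the reverse complement of its end. The Python sketch below does only that; the stem length, minimum loop size and example sequences are assumptions for illustration, and no thermodynamic stability is considered.</p>

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Complement each base, then reverse the strand."""
    return seq.translate(COMPLEMENT)[::-1]

def can_form_hairpin(seq, stem=4, min_loop=3):
    """True if the first `stem` bases can pair with the last `stem` bases."""
    if len(seq) < 2 * stem + min_loop:
        return False
    return seq[:stem] == reverse_complement(seq[-stem:])

pairs = can_form_hairpin("GCATTTTTATGC")      # GCAT ... ATGC: inverse complementary ends
no_pairs = can_form_hairpin("GCATTTTTGCAT")   # ends are identical, not complementary
```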
<p>Haploid</p>
<p>A cell or organism containing only one set of chromosomes without the homologous pairs (cf. diploid).</p>
<p>Heterodimer</p>
<p>Protein composed of two different chains or subunits.</p>
<p>Heteroduplex</p>
<p>Hybrid structure formed by the annealing of two DNA strands (or an RNA and DNA) that have sufficient complementarity in their sequence to allow hydrogen bonding.</p>
<p>Hidden Markov model (HMM)</p>
<p>A joint statistical model for an ordered sequence of variables.  The result of stochastically perturbing the variables in a Markov chain (the original variables are thus &#8220;hidden&#8221;), where the Markov chain has discrete variables which select the &#8220;state&#8221; of the HMM at each step. The perturbed values can be continuous and are the &#8220;outputs&#8221; of the HMM. A Hidden Markov Model is equivalently a coupled mixture model where the joint distribution over states is a Markov chain. Hidden Markov models are valuable in bioinformatics because they allow a search or alignment algorithm to be trained using unaligned or unweighted input sequences; and because they allow position-dependent scoring parameters such as gap penalties, thus more accurately modeling the consequences of evolutionary events on sequence families.</p>
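<p>To show how hidden states and outputs interact, here is a small Python sketch of the forward algorithm, which sums the probability of an observed sequence over all hidden state paths. The two-state model (H for GC-rich, L for AT-rich regions) and every probability in it are hypothetical values chosen for the example.</p>

```python
def forward(observations, states, start_p, trans_p, emit_p):
    """Probability of the observed outputs, summed over all hidden state paths."""
    # alpha[s] = P(outputs seen so far, current hidden state is s)
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for symbol in observations[1:]:
        alpha = {
            s: emit_p[s][symbol] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())

# Hypothetical model parameters (illustrative only).
STATES = ("H", "L")
START = {"H": 0.5, "L": 0.5}
TRANS = {"H": {"H": 0.5, "L": 0.5}, "L": {"H": 0.4, "L": 0.6}}
EMIT = {
    "H": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "L": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}

p_single_g = forward("G", STATES, START, TRANS, EMIT)
p_sequence = forward("GGCACT", STATES, START, TRANS, EMIT)
```

<p>For long sequences the products underflow, so practical implementations work with logarithms or rescale alpha at each step.</p>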
<p>High-throughput screening</p>
<p>The method by which very large numbers of compounds are screened against a putative drug target in either cell-free or whole-cell assays. Typically, these screenings are carried out in 96-well plates using automated, robotic station based technologies or in higher-density array (&#8220;chip&#8221;) formats.</p>
<p>HLA complex</p>
<p>Another name for the MHC in humans; refers to the &#8220;Human Leukocyte Antigen&#8221; complex located on chromosome 6.</p>
<p>Homeobox</p>
<p>A highly conserved region in a homeotic gene composed of 180 bases (60 amino acids) that specifies a protein domain (the homeodomain) that serves as a master genetic regulatory element in cell differentiation during development in species as diverse as worms, fruitflies, and humans.</p>
<p>Homeodomain</p>
<p>A 60 amino-acid protein domain coded for by the homeobox region of a homeotic gene.</p>
<p>Homeotic gene</p>
<p>A gene that controls the activity of other genes involved in the development of a body plan. Homeotic genes have been found in organisms ranging from plants to humans.</p>
<p>Homology</p>
<p>(strict) Two or more biological species, systems or molecules that share a common evolutionary ancestor. (general) Two or more gene or protein sequences that share a significant degree of similarity, typically measured by the amount of identity (in the case of DNA), or conservative replacements (in the case of protein), that they register along their lengths. Sequence &#8220;homology&#8221; searches are typically performed with a query DNA or protein sequence to identify known genes or gene products that share significant similarity and hence might inform on the ancestry, heritage and possible function of the query gene.</p>
<p>Housekeeping genes</p>
<p>Genes that are always expressed (i.e. they are said to be constitutively expressed) due to their constant requirement by the cell.</p>
<p>Human Anti-Murine Antibody Response (HAMA)</p>
<p>An immune response generated in humans to antibodies raised in murine (e.g. mouse or rat) cells.</p>
<p>Hybridization</p>
<p>The interaction of complementary nucleic acid strands. This can occur between two DNA strands or between DNA and RNA strands, and is the basis of many techniques such as Southern and northern blots.</p>
<p>Hydrogen bond</p>
<p>A weak chemical interaction between an electronegative atom (e.g. nitrogen or oxygen) and a hydrogen atom that is covalently attached to another atom. Hydrogen bonds hold the two strands of the DNA double helix together and are also the primary interaction between water molecules.</p>
<p>Hydrophilicity</p>
<p>(lit. water-loving) The degree to which a molecule is soluble in water. Hydrophilicity depends to a large degree on the charge and polarizability of the molecule and its ability to form transient hydrogen-bonds with (polar) water molecules.</p>
<p>Hydrophobicity</p>
<p>(lit. water-hating) The degree to which a molecule is insoluble in water, and hence is soluble in lipids. If a molecule lacking polar groups is placed in water, it will be entropically driven to find a hydrophobic environment (such as the interior of a protein or a membrane).</p>
<p><strong>I</strong></p>
<p>Idiotype</p>
<p>Antibody variants localized to the variable portion of an immunoglobulin that are recognised by their antigenic determinants. The determinants are composed from the antigen-combining site or CDRs. Every unique antigenic determinant has a specific antibody with its own unique idiotype.</p>
<p>Immunoglobulin</p>
<p>A member of the globulin protein family consisting of two light and two heavy chains linked by disulfide bonds. All antibodies are immunoglobulins.</p>
<p>in silico (biology)</p>
<p>(Lit. computer mediated). The use of computers to simulate, process, or analyse a biological experiment.</p>
<p>in situ hybridization</p>
<p>A variation of the DNA/RNA hybridization procedure in which the denatured DNA is in place in the cell and is then challenged with RNA or DNA extracted from another source. (See also fluorescence in situ hybridization).</p>
<p>Integration</p>
<p>The physical insertion of DNA into the host cell genome. The process is used by retroviruses, where a specific enzyme catalyses it, or can occur at random sites with other DNA (e.g. transposons).</p>
<p>Intracellular signalling</p>
<p>The communication of a molecular message from the surface of the cell to the nucleus via the participation of a series of molecules, including receptors, enzymes, proteins, and small-molecules. The end result of the signalling process is the up- or down-regulation of a particular series of genes that may be involved in cell growth, division or differentiation.</p>
<p>Introns</p>
<p>Nucleotide sequences found in the structural genes of eukaryotes that are non-coding and interrupt the sequences containing information that codes for polypeptide chains. Intron sequences are spliced out of their RNA transcripts before maturation and protein synthesis. (cf. Exons)</p>
<p>Isoschizomers</p>
<p>Two different restriction enzymes which recognize and cut DNA at the same recognition site, e.g. Sma I and Xma I both recognize and cut the sequence CCCGGG.</p>
<p>Isozymes</p>
<p>Two or more enzymes capable of catalyzing the same reaction but varying in their specificity due to differences in their structures and hence their efficiencies under different environmental conditions.</p>
<p>Iteration</p>
<p>A series of steps in an algorithm whereby the processing of data is performed repetitively until the result exceeds a particular threshold. Iteration is often used in multiple sequence alignments whereby each set of pairwise alignments are compared with every other, starting with the most similar pairs and progressing to the least similar, until there are no longer any sequence-pairs remaining to be aligned.</p>
<p><strong>J</strong></p>
<p>Junk DNA</p>
<p>Term used to describe the excess DNA that is present in the genome beyond that required to encode proteins. A misleading term since these regions are likely to be involved in gene regulation, and other as yet unidentified functions.</p>
<p><strong>K</strong></p>
<p>Karyotype</p>
<p>The constitution (typically number and size) of chromosomes in a cell or individual.</p>
<p>Knockout mice (gene targeting)</p>
<p>Mice which have been engineered to lack a chosen gene. The gene is inactivated in so-called embryonic stem cells using the technique of homologous recombination. These cells are then introduced into an early-stage embryo (blastocyst), which is then transplanted into a recipient mouse. The subsequent progeny lack the targeted gene in some cells. This technique is used to determine the function of the chosen gene.</p>
<p><strong>L</strong></p>
<p>&#8220;Lab on a chip&#8221;</p>
<p>Term describing microdevices that allow rapid, microanalytical analysis of DNA or protein in a single, fully integrated system. Typically, these devices are miniature surfaces, made of silicon, glass or plastic, which carry the necessary microdevices (pumps, valves, microfluidic controllers, and detectors) that allow sample separation and analysis. These devices are used in drug discovery, genetic testing and separation science.</p>
<p>Lead compound</p>
<p>A candidate compound identified as the best &#8220;hit&#8221; (tight binder) after screening of a combinatorial (or other) compound library, that is then taken into further rounds of screening to determine its suitability as a drug.</p>
<p>Lead optimization</p>
<p>The process of converting a putative lead compound (&#8220;hit&#8221;) into a therapeutic drug with maximal activity and minimal side effects, typically using a combination of computer-based drug design, medicinal chemistry and pharmacology.</p>
<p>Leucine zipper</p>
<p>Protein motif which binds DNA in which 4-5 Leucines are found at 7 amino acid intervals. This motif is present typically in transcription factors and other proteins that bind DNA.</p>
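<p>The spacing rule described above (4-5 leucines at 7 amino acid intervals) can be turned into a crude pattern scan. The Python sketch below checks only that spacing; it is not a validated zipper predictor, and the pattern name and test sequences are made up for the example.</p>

```python
import re

# L, then 3 or 4 repeats of "any six residues followed by L":
# that is 4 to 5 leucines spaced exactly seven positions apart.
HEPTAD_LEUCINES = re.compile(r"L(?:.{6}L){3,4}")

def find_leucine_repeat(protein):
    """Return (start, end) of the first candidate region, or None if there is none."""
    match = HEPTAD_LEUCINES.search(protein)
    return match.span() if match else None

candidate = find_leucine_repeat("MK" + "LAAAAAA" * 3 + "LGG")
too_short = find_leucine_repeat("LAAAAAALAAAAAAL")
```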
<p>Lexicon</p>
<p>In Bioinformatics, a lexicon refers to a pre-defined list of terms that together completely define the contents of a particular database.<br />
(strict.) The component in the grammar which is in bare form a list of words or lexical entries.</p>
<p>Library</p>
<p>A large collection of compounds, peptides, cDNAs or genes which may be screened in order to isolate cognate molecules.</p>
<p>Ligand</p>
<p>Any small molecule that binds to a protein or receptor; the cognate partner of many cellular proteins, enzymes, and receptors.</p>
<p>Linkage</p>
<p>The association of genes (or genetic loci) on the same chromosome. Genes that are linked together tend to be transmitted together.</p>
<p>Linkage map</p>
<p>A genetic map of a chromosome or genome delineated by mapping the positions of genes to their chromosomes by their linkage to readily identifiable genetic loci.</p>
<p>Locus</p>
<p>The specific position occupied by a gene on a chromosome. At a given locus, any one of the variant forms of a gene may be present. The variants are said to be alleles of that gene.</p>
<p><strong>M</strong></p>
<p>Map unit</p>
<p>A measure of genetic distance between two linked genes that corresponds to a recombination frequency of 1%.</p>
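<p>As a worked example of the definition, the Python sketch below converts offspring counts into map units. The counts are hypothetical, and the simple percentage is only a reasonable distance estimate for closely linked genes, since double crossovers go undetected at larger distances.</p>

```python
def map_units(recombinant_offspring, total_offspring):
    """Genetic distance in map units: recombination frequency expressed in percent."""
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    return 100.0 * recombinant_offspring / total_offspring

# Hypothetical cross: 12 recombinant offspring out of 400 scored.
distance = map_units(12, 400)
```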
<p>Markov chain</p>
<p>Any multivariate probability density whose independence diagram is a chain. The variables are ordered, and each variable &#8220;depends&#8221; only on its neighbors in the sense of being conditionally independent of the others. Markov chains are an integral component of hidden Markov models.</p>
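<p>For a discrete first-order chain, the neighbor-only dependence means the probability of a whole sequence factors into an initial probability and one transition probability per step. The Python sketch below computes that product in log space; the two-symbol chain and its probabilities are hypothetical.</p>

```python
import math

def log_probability(sequence, initial, transition):
    """Log-probability of a sequence under a first-order Markov chain."""
    logp = math.log(initial[sequence[0]])
    # Each symbol depends only on the symbol immediately before it.
    for previous, current in zip(sequence, sequence[1:]):
        logp += math.log(transition[previous][current])
    return logp

# Hypothetical chain over two symbols (illustrative values only).
INITIAL = {"A": 0.5, "C": 0.5}
TRANSITION = {"A": {"A": 0.9, "C": 0.1}, "C": {"A": 0.2, "C": 0.8}}

p_aac = math.exp(log_probability("AAC", INITIAL, TRANSITION))
```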
<p>Meiosis</p>
<p>A process within the cell nucleus that results in the reduction of the chromosome number from diploid (two copies of each chromosome) to haploid (a single copy) through two reductive divisions in germ cells.</p>
<p>Melting (of DNA)</p>
<p>The denaturation of double-stranded DNA into two single strands by the application of heat. (Denaturation breaks the hydrogen bonds holding the double-stranded DNA together).</p>
<p>Messenger RNA (mRNA)</p>
<p>The complementary RNA copy of DNA formed from a single-stranded DNA template during transcription that migrates from the nucleus to the cytoplasm where it is processed into a sequence carrying the information to code for a polypeptide domain.</p>
<p>Methylation</p>
<p>The addition of -CH3 (methyl) groups to a target site. Typically such addition occurs on to the cytosine bases of DNA. (see maternal imprinting).</p>
<p>Microarray</p>
<p>A 2D array, typically on a glass, filter, or silicon wafer, upon which genes or gene fragments are deposited or synthesized in a predetermined spatial order allowing them to be made available as probes in a high-throughput, parallel manner.</p>
<p>Microfluidics</p>
<p>The miniaturization of chemical reactions or pharmacological assays into microscopic tubes or vessels in order to greatly increase their throughput, by placing many of them side-by-side in an array.</p>
<p>Mimetics</p>
<p>Compounds that mimic the function of other molecules via their high degree of structural (conformational) similarity, and hence physicochemical properties.</p>
<p>Missense mutation</p>
<p>A point mutation in which one codon (triplet of bases) is changed into another designating a different amino acid.</p>
<p>Mitosis</p>
<p>The nuclear division that results in the replication of the genetic material and its redistribution into each of the daughter cells during cell division.</p>
<p>Modeling</p>
<p>In bioinformatics, modeling usually refers to molecular modeling, a process whereby the three-dimensional architecture of biological molecules is interpreted (or predicted), visually represented, and manipulated in order to determine their molecular properties. (general) A series of mathematical equations or procedures which simulate a real-life process, given a set of assumptions, boundary parameters, and initial conditions.</p>
<p>Monomer</p>
<p>A single unit of any biological molecule or macromolecule, such as an amino acid, nucleotide, polypeptide domain, or protein.</p>
<p>Monovalent</p>
<p>Having one binding site; strictly, an atom with only one free electron available for binding in its highest energy shell.</p>
<p>Motif</p>
<p>A conserved element of a protein sequence alignment that usually correlates with a particular function. Motifs are generated from a local multiple protein sequence alignment corresponding to a region whose function or structure is known. It is sufficient that it is conserved, and is hence likely to be predictive of any subsequent occurrence of such a structural/functional region in any other novel protein sequence.</p>
<p>Multigene family</p>
<p>A set of genes derived by duplication of an ancestral gene, followed by independent mutational events resulting in a series of independent genes either clustered together on a chromosome or dispersed throughout the genome.</p>
<p>Multiple (sequence) alignment</p>
<p>A multiple alignment of k sequences is a rectangular array, consisting of characters taken from the alphabet A, that satisfies the following conditions: there are exactly k rows; ignoring the gap character, row number i is exactly the sequence s<sub>i</sub>; and each column contains at least one character different from &#8220;-&#8221;. In practice multiple sequence alignments include a cost/weight function that defines the penalty for the insertion of gaps (the &#8220;-&#8221; character) and weights identities and conservative substitutions accordingly. Multiple alignment algorithms attempt to create the optimal alignment, defined as the one with the lowest cost/weight score.</p>
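<p>The three conditions in this definition can be checked mechanically. The following is a minimal Python sketch, not part of the original glossary; the function name <code>is_valid_alignment</code> and its arguments are illustrative assumptions, and no cost/weight function is modelled.</p>

```python
def is_valid_alignment(rows, sequences, gap="-"):
    """Check the conditions of the definition for k = len(sequences).

    `rows` are the aligned (gapped) strings, `sequences` the original
    ungapped sequences s_1..s_k. Both names are hypothetical.
    """
    # Condition 1: there are exactly k rows.
    if len(rows) != len(sequences):
        return False
    # The array must be rectangular: every row has the same length.
    if len({len(row) for row in rows}) > 1:
        return False
    # Condition 2: ignoring the gap character, row i is exactly s_i.
    for row, seq in zip(rows, sequences):
        if row.replace(gap, "") != seq:
            return False
    # Condition 3: every column holds at least one non-gap character.
    return all(any(ch != gap for ch in column) for column in zip(*rows))
```

<p>For example, the rows AC-GT, ACAGT and A--GT form a valid alignment of ACGT, ACAGT and AGT, whereas two copies of AC-GT do not, because the third column would consist of gaps only.</p>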
<p>Multiplex sequencing</p>
<p>Approach to high-throughput sequencing that uses several pooled DNA samples run through gels simultaneously and then separated and analyzed.</p>
<p>Mutagen</p>
<p>Any agent that can cause an increase in the rate of mutations in an organism.</p>
<p>Mutation</p>
<p>An inheritable alteration to the genome that includes genetic (point or single base) changes, or larger scale alterations such as chromosomal deletions or rearrangements. </p>
]]></content:encoded>
			<wfw:commentRss>http://bioinformatics.me/bioinformatics-glossary-3/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bioinformatics Glossary</title>
		<link>http://bioinformatics.me/bioinformatics-glossary-4/</link>
		<comments>http://bioinformatics.me/bioinformatics-glossary-4/#comments</comments>
		<pubDate>Sun, 16 Aug 2009 16:25:01 +0000</pubDate>
		<dc:creator>Waleed Ghalwash</dc:creator>
				<category><![CDATA[Bioinformatics Web]]></category>

		<guid isPermaLink="false">http://bioinformatics.me/bioinformatics-glossary-4/</guid>
		<description><![CDATA[
N 
Naked DNA 
Pure, isolated DNA devoid of any proteins that may bind to it.
NCEs (New Chemical Entities) 
Compounds identified as potential drugs that are sent from research and development into clinical trials to determine their suitability. 
Nested PCR 
The second round amplification of an already PCR-amplified sequence using a new pair of primers [...]]]></description>
			<content:encoded><![CDATA[<p><img class="aligncenter size-full wp-image-367" src="http://www.bioinformaticsweb.org/wp-content/uploads/2009/07/bioinformatics_glossary.jpeg" alt="bioinformatics_glossary" width="600" height="345" /></p>
<p align="center"><strong>N </strong></p>
<p align="justify"><strong>Naked DNA </strong></p>
<p align="justify">Pure, isolated DNA devoid of any proteins that may bind to it.</p>
<p align="justify"><strong>NCEs (New Chemical Entities) </strong></p>
<p align="justify">Compounds identified as potential drugs that are sent from research and development into clinical trials to determine their suitability.</p>
<p align="justify"><strong>Nested PCR </strong></p>
<p align="justify">The second round amplification of an already PCR-amplified sequence using a new pair of primers which are internal to the original primers. Typically done when a single PCR reaction generates insufficient amounts of product.</p>
<p align="justify"><strong>Neural net </strong></p>
<p align="justify">A neural net is an interconnected assembly of simple processing elements, units or nodes, whose functionality is loosely based on the animal brain. The processing ability of the network is stored in the inter-unit connection strengths, or weights, obtained by a process of adaptation to, or learning from, a set of training patterns. Neural nets are used in bioinformatics to map data and make predictions, such as taking a multiple alignment of a protein family as a training set in order to identify novel members of the family from their sequence data alone.</p>
<p align="justify"><strong>Nonsense mutation </strong></p>
<p align="justify">A point mutation in which a codon specifying an amino acid is converted into a nonsense (stop) codon.</p>
<p align="justify"><strong>Northern blotting </strong></p>
<p align="justify">A technique to identify RNA molecules by hybridization that is analogous to Southern blotting (see Southern blotting).</p>
<p align="justify"><strong>Nuclease </strong></p>
<p align="justify">Any enzyme that can cleave the phosphodiester bonds of nucleic acid backbones.</p>
<p align="justify"><strong>Nucleoside </strong></p>
<p align="justify">A five-carbon sugar covalently attached to a nitrogenous base.</p>
<p align="justify"><strong>Nucleotide </strong></p>
<p align="justify">A nucleic acid unit composed of a five-carbon sugar joined to a phosphate group and a nitrogenous base.</p>
<p align="center"><strong>O </strong></p>
<p align="justify"><strong>Object-Relational Database </strong></p>
<p align="justify">Object databases combine the elements of object orientation and object-oriented programming languages with database capabilities. They provide more than persistent storage of programming language objects. Object databases extend the functionality of object programming languages (e.g., C++, Smalltalk, or Java) to provide full-featured database programming capability. The result is a high level of congruence between the data model for the application and the data model of the database.  Object-relational databases are used in Bioinformatics to map molecular biological objects (such as sequences, structures, maps and pathways) to their underlying representations (typically within the rows and columns of relational database tables.) This enables the user to deal with the biological objects in a more intuitive manner, as they would in the laboratory, without having to worry about the underlying data model of their representation.</p>
<p align="justify"><strong>Oligonucleotide </strong></p>
<p align="justify">A short molecule consisting of several linked nucleotides (typically between 10 and 60) covalently attached by phosphodiester bonds.</p>
<p align="justify"><strong>Open reading frame (ORF) </strong></p>
<p align="justify">Any stretch of DNA that potentially encodes a protein. Open reading frames start with a start codon, and end with a termination codon. No termination codons may be present internally. The identification of an ORF is the first indication that a segment of DNA may be part of a functional gene.</p>
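<p align="justify">The rule described above (a start codon, no internal termination codon, then a termination codon) can be turned into a simple scan. This is a minimal Python sketch under stated assumptions: it searches only the three forward reading frames, treats ATG as the only start codon, and uses the standard termination codons TAA, TAG and TGA. The function name <code>find_orfs</code> is hypothetical.</p>

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard termination codons


def find_orfs(dna, min_codons=1):
    """Return (start, end, sequence) for each forward-strand ORF.

    `end` is exclusive and includes the termination codon.
    `min_codons` counts the codons before the termination codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the currently open start codon, if any
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)
```

<p align="justify">A real gene finder would also scan the reverse complement and apply a minimum length; both are left out here to keep the sketch short.</p>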
<p align="justify"><strong>Operator </strong></p>
<p align="justify">A segment of DNA that interacts with the products of regulatory genes and facilitates the transcription of one or more structural genes.</p>
<p align="justify"><strong>Operon </strong></p>
<p align="justify">A unit of transcription consisting of one or more structural genes, an operator, and a promoter.</p>
<p align="justify"><strong>Ortholog </strong></p>
<p align="justify">Orthologs are genes in different species that evolved from a common ancestral gene by speciation. Normally, orthologs retain the same function in the course of evolution. Identification of orthologs is critical for reliable prediction of gene function in newly sequenced genomes. (See also Paralogs.)</p>
<p align="justify"><strong>Overlapping clones </strong></p>
<p align="justify">Collection of cloned sequences made by generating randomly overlapping DNA fragments with infrequently cutting restriction enzymes.</p>
<p align="center"><strong>P </strong></p>
<p align="justify"><strong>Palindrome </strong></p>
<p align="justify">A region of DNA with a symmetrical arrangement of bases occurring about a single point such that the base sequences on either side of that point are identical (if the strands are both read in the same direction), e.g. 5&#8242; GAATTC 3&#8242; whose complementary sequence is 3&#8242; CTTAAG 5&#8242;.</p>
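<p align="justify">In sequence terms, a palindromic site is one that equals its own reverse complement. The short Python sketch below illustrates this with the GAATTC example; the function names are illustrative assumptions and only the unambiguous bases A, C, G and T are handled.</p>

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the complementary strand, read 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(dna):
    """True if both strands read the same in the 5' to 3' direction."""
    return dna.upper() == reverse_complement(dna)
```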
<p align="justify"><strong>Pattern </strong></p>
<p align="justify">Molecular biological patterns usually occur at the level of the characters making up the gene or protein sequence. A pattern language must be defined in order to apply different criteria to different positions of a sequence. In order to have position-specific comparison done by a computer, a pattern-matching algorithm must allow alternative residues at a given position, repetitions of a residue, exclusion of alternative residues, weighting, and ideally, combinatorial representation.</p>
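<p align="justify">Regular expressions are one common way to express such position-specific patterns. The sketch below uses Python&#8217;s standard <code>re</code> module with a made-up pattern (it does not describe any real protein family) that shows a fixed residue, a repetition, an exclusion and a set of alternatives. Weighting is not covered.</p>

```python
import re

# Hypothetical pattern, written directly as a regular expression:
#   C        a fixed residue
#   .{2,4}   two to four residues of any kind (repetition)
#   [^P]     any residue except proline (exclusion)
#   [LIVM]   one of several alternative residues
PATTERN = re.compile(r"C.{2,4}C[^P][LIVM]")


def find_matches(protein):
    """Return (start, matched_text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in PATTERN.finditer(protein)]
```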
<p align="justify"><strong>Pathways </strong></p>
<p align="justify">Bioinformatics strives to define representations of key biological datatypes, algorithms and inference procedures, including sequences, structures, biological pathways and reactions. Representing and computing with biological pathways requires ontologies for representing pathway knowledge; user interfaces to these databases; physico-chemical properties of enzymes and their substrates in pathways; and pathway analysis of whole genomes, including identifying common patterns across species and species differences.</p>
<p align="justify"><strong>Paralog </strong></p>
<p align="justify">Paralogs are genes related by duplication within a genome. Orthologs retain the same function in the course of evolution, whereas paralogs evolve new functions, even if these are related to the original one.</p>
<p align="justify"><strong>Parameters </strong></p>
<p align="justify">Parameters are user-selectable values, typically experimentally determined, that govern the boundaries of an algorithm or program. For instance, selection of the appropriate input parameters governs the success of a search algorithm. Some of the most common search parameters in bioinformatics tools include the stringency of an alignment search tool, and the weights (penalties) provided for mismatches and gaps.</p>
<p align="justify"><strong>Peptide </strong></p>
<p align="justify">A short stretch of amino acids each covalently coupled by a peptide (amide) bond.</p>
<p align="justify"><strong>Peptide bond (amide bond) </strong></p>
<p align="justify">A covalent bond formed between two amino acids when the amino group of one is linked to the carboxy group of another (resulting in the elimination of one water molecule).</p>
<p align="justify"><strong>Phage (Bacteriophage) </strong></p>
<p align="justify">A virus that infects bacterial cells and serves as a useful vector for introducing genes into bacteria for a number of purposes.</p>
<p align="justify"><strong>Phage display </strong></p>
<p align="justify">A technique in which phage are engineered to fuse a foreign peptide or protein with their capsid (surface) proteins and hence display it on their surfaces. The immobilized phage may then be used as a screen to see what ligands bind to the expressed fusion protein exhibited (displayed) on the phage surface.</p>
<p align="justify"><strong>Pharmacogenomics </strong></p>
<p align="justify">The use of (DNA-based) genotyping in order to target pharmaceutical agents to specific patient populations. Genetic differences are known to affect responses to many types of drug therapy, and pharmacogenomics analysis serves to customize the use of pharmaceuticals for specific subgroups of patients. The rationale for this approach is that observed gene expression differences may correlate with, and explain, the differences in side effects and efficacy of drugs in humans.</p>
<p align="justify"><strong>Pharmacophore </strong></p>
<p align="justify">The three-dimensional spatial arrangement of atoms, substituents, functional groups, or chemical features that together are sufficient to describe the pharmacologically active components of a drug molecule or molecule series.</p>
<p align="justify"><strong>Phenotype </strong></p>
<p align="justify">Any observable feature of an organism that is the result of one or more genes.</p>
<p align="justify"><strong>Phylum </strong></p>
<p align="justify">The segmentation of the animal kingdom into about 30 major groups collectively known as phyla. The members of each phylum share the same basic structure and organization. For instance, fish, birds, and human beings belong to one phylum &#8211; the Chordata &#8211; because all possess a notochord at some stage of development.</p>
<p align="justify"><strong>Physical map </strong></p>
<p align="justify">A physical map consists of a linearly ordered set of DNA fragments encompassing the genome or region of interest. Physical maps are of two types, macro-restriction maps and ordered clone maps. The former consists of an ordered set of large DNA fragments generated by using restriction enzymes whose recognition sequences are infrequently represented in the genome. An ordered clone map consists of an overlapping collection of cloned DNA fragments. The DNA may be cloned into any one of the available vector systems&#8211;YACs, cosmids, phage, or even plasmids. Major advantages of ordered clone maps are that they are of high resolution and directly provide the clones for further study.</p>
<p align="justify"><strong>Plasmid </strong></p>
<p align="justify">Any replicating DNA element that can exist in the cell independently of the chromosomes. Synthetic plasmids are used for DNA cloning. Most commonly found in bacterial cells.</p>
<p align="justify"><strong>Pleiotropy </strong></p>
<p align="justify">The multiple effects on an organism&#8217;s phenotype due to a single gene or allele, e.g. the cytokines, which can bind to multiple cellular receptors and affect growth and multiple immune pathways.</p>
<p align="justify"><strong>Point mutation </strong></p>
<p align="justify">A mutation in which a single nucleotide in a DNA sequence is substituted by another nucleotide.</p>
<p align="justify"><strong>Poly(A) tail </strong></p>
<p align="justify">The stretch of Adenine (A) residues at the 3&#8242; end of eukaryotic mRNA that is added to the pre-mRNA as it is processed, before its transport from the nucleus to the cytoplasm and subsequent translation at the ribosome.</p>
<p align="justify"><strong>Polyadenylation site </strong></p>
<p align="justify">A site on the 3&#8242;-end of messenger RNA (mRNA) that signals the addition of a series of Adenines during the RNA processing step and before the mRNA migrates to the cytoplasm. These so-called poly(A) &#8220;tails&#8221; increase mRNA stability and allow one to isolate mRNA from cells by PCR-amplification using poly(T) primers.</p>
<p align="justify"><strong>Polygenic inheritance </strong></p>
<p align="justify">Inheritance involving alleles at many genetic loci.</p>
<p align="justify"><strong>Polymerase chain reaction (PCR) </strong></p>
<p align="justify">Technique used to amplify or generate large amounts of replica DNA of a segment of any DNA whose &#8220;flanking&#8221; sequences are known. Oligonucleotide primers which bind these flanking sequences are used by an enzyme (Taq polymerase) to copy the sequence in between the primers. Cycles of heat to break apart the DNA strands, cooling to allow the primers to bind, and heating again to allow the enzyme to copy the intervening sequence lead to a doubling of DNA at each cycle. The reactions are typically carried out on a regulated heating block and consist of 30-35 cycles of repeated amplification of all the DNA present. Single molecules of &#8220;target&#8221; DNA can be amplified to microgram amounts of DNA. The target DNA can be of any origin.</p>
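<p align="justify">The doubling at each cycle implies exponential growth in copy number. The Python sketch below is an idealized illustration, not a model of a real reaction: the <code>efficiency</code> parameter (1.0 meaning perfect doubling) is an assumption introduced here, and plateau effects in late cycles are ignored.</p>

```python
def copies_after(cycles, start_copies=1, efficiency=1.0):
    """Idealized copy number after a given number of PCR cycles.

    With efficiency = 1.0 every template is copied once per cycle,
    so the amount doubles: start_copies * 2 ** cycles.
    """
    return start_copies * (1.0 + efficiency) ** cycles
```

<p align="justify">Under perfect doubling, a single target molecule would yield 2 to the power of 30 copies, roughly a billion, after 30 cycles.</p>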
<p align="justify"><strong>Polymorphism </strong></p>
<p align="justify">(lit. many forms) The existence of a gene in a population in at least two different forms at a frequency far higher than that attributable to recurrent mutation alone. Variations in a population may be measured by determining the rate of mutation in polymorphic genes (see SNPs).</p>
<p align="justify"><strong>Polypeptide </strong></p>
<p align="justify">A single chain of covalently attached amino acids joined by peptide bonds. Polypeptide chains usually fold into a compact, stable form (a domain) that is part (or all) of the final protein.</p>
<p align="justify"><strong>Positional cloning </strong></p>
<p align="justify">Method used to define the location of a gene on a chromosome and use this information to identify and clone the gene. The location of the gene is determined by linkage analysis of DNA from a large family containing afflicted and normal members to identify linkages between the transmission of the disease gene and observable genetic markers. This information is then used to screen (by chromosomal jumping and walking) the location for putative genes. The disease gene must be compared between the afflicted and normal family members and be shown to be different in the two groups. The full sequencing of the gene will then provide information regarding the characteristics and function of the gene product, and a potential explanation for the cause of the disease.</p>
<p align="justify"><strong>Post-transcriptional modification </strong></p>
<p align="justify">Alterations made to pre-mRNA before it leaves the nucleus and becomes mature mRNA.</p>
<p align="justify"><strong>Post-translational modification </strong></p>
<p align="justify">Alterations made to a protein after its synthesis at the ribosome. These modifications, such as the addition of carbohydrate or fatty acid chains, may be critical to the function of the protein.</p>
<p align="justify"><strong>Primary sequence (protein) </strong></p>
<p align="justify">The linear sequence of a polypeptide or protein.</p>
<p align="justify"><strong>Primary structure (protein) </strong></p>
<p align="justify">see primary sequence.</p>
<p align="justify"><strong>Primer </strong></p>
<p align="justify">A short oligonucleotide that provides a free 3&#8242; hydroxyl for DNA or RNA synthesis by the appropriate polymerase (DNA polymerase or RNA polymerase).</p>
<p align="justify"><strong>Probe </strong></p>
<p align="justify">Any biochemical that is labelled or tagged in some way so that it can be used to identify or isolate a gene, RNA, or protein.</p>
<p align="justify"><strong>Profile </strong></p>
<p align="justify">Sequence profiles are usually derived from multiple alignments of sequences with a known relationship, and consist of tables of position-specific scores and gap-penalties. Each position in the profile contains scores for all of the possible amino acids, as well as one penalty score for opening and one for continuing a gap at the specified position. Attempts have been made to further improve the sensitivity of the profile by refining the procedures to construct a profile starting from a given multiple alignment. Other representations for sequence domains or motifs do not necessarily require the presence of a correct and complete multiple alignment, such as hidden Markov models.</p>
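<p align="justify">As a rough illustration of position-specific scores, the Python sketch below derives one score table per column from an ungapped alignment. It is a simplification under several assumptions made for this example: a uniform background frequency, a fixed pseudocount, log-odds scores in bits, and no gap penalties. The function names are hypothetical.</p>

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, alphabet=AMINO_ACIDS, pseudocount=1.0):
    """Return one {residue: log-odds score} table per alignment column."""
    background = 1.0 / len(alphabet)  # assumption: uniform background
    profile = []
    for column in zip(*alignment):
        counts = Counter(ch for ch in column if ch in alphabet)
        total = sum(counts.values()) + pseudocount * len(alphabet)
        profile.append({
            res: math.log2((counts[res] + pseudocount) / total / background)
            for res in alphabet
        })
    return profile


def score(profile, sequence):
    """Sum the position-specific scores of an ungapped sequence."""
    return sum(column[res] for column, res in zip(profile, sequence))
```

<p align="justify">Residues that dominate a column receive positive scores and unseen residues negative ones, so a sequence resembling the family scores higher than an unrelated one.</p>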
<p align="justify"><strong>Prokaryote </strong></p>
<p align="justify">An organism or cell that lacks a membrane-bounded nucleus. Bacteria (including the cyanobacteria, formerly called blue-green algae) and archaea are the surviving prokaryotes (cf. Eukaryote).</p>
<p align="justify"><strong>Promoter (site) </strong></p>
<p align="justify">A promoter site is defined by its recognition by eukaryotic RNA polymerase II; its activity in a higher eukaryote; by experimental evidence, or homology and sufficient similarity to an experimentally defined promoter; and by observed biological function.</p>
<p align="justify"><strong>Protein families </strong></p>
<p align="justify">Sets of proteins that share a common evolutionary origin reflected by their relatedness in function which is usually reflected by similarities in sequence, or in primary, secondary or tertiary structure. Subsets of proteins with related structure and function.</p>
<p align="justify"><strong>Proteome </strong></p>
<p align="justify">The entire protein complement of a given organism.</p>
<p align="justify"><strong>Proteomics </strong></p>
<p align="justify">The study of the proteome. Typically, the cataloging of all the expressed proteins in a particular cell or tissue type, obtained by identifying the proteins from cell extracts using a combination of 2D gel electrophoresis and mass spectrometry. More generally, the large-scale analysis of protein composition and function (cf. genomics).</p>
<p align="justify"><strong>Purine </strong></p>
<p align="justify">A nitrogen-containing compound with a double-ring structure. The parent compound of Adenine and Guanine.</p>
<p align="justify"><strong>Pyrimidine </strong></p>
<p align="justify">A nitrogen-containing compound with a single six-membered ring structure. The parent compound of Thymine, Cytosine, and Uracil.</p>
<p align="center"><strong>Q </strong></p>
<p align="justify"><strong>Query (sequence) </strong></p>
<p align="justify">A DNA, RNA, or protein sequence used to search a sequence database in order to identify close or remote family members (homologs) of known function, or sequences with similar active sites or regions (analogs), from which the function of the query may be deduced.</p>
<p align="center"><strong>R </strong></p>
<p align="justify"><strong>Rational drug design (Structure based drug design) </strong></p>
<p align="justify">The development of drugs based on the 3-dimensional molecular structure of a particular target.</p>
<p align="justify"><strong>Reading frame </strong></p>
<p align="justify">A sequence of codons beginning with an initiation codon and ending with a termination codon, typically of at least 150 bases (50 amino acids), coding for a polypeptide or protein chain (see ORF and URF).</p>
<p align="justify"><strong>Reagents </strong></p>
<p align="justify">Sources of biological or chemical material that can be used as the starting blocks in laboratory experiments. Reagents can range from chemicals needed to perform a particular chemical reaction, constituents of a laboratory protocol, or clones to be used in a large-scale gene expression study.</p>
<p align="justify"><strong>Recessive </strong></p>
<p align="justify">Any trait that is expressed phenotypically only when the allele responsible is present in both copies of a gene (cf. dominant).</p>
<p align="justify"><strong>Recombinant DNA (rDNA) </strong></p>
<p align="justify">DNA molecules resulting from the fusion of DNA from different sources. Also, the technology employed for splicing DNA from different sources and for amplifying the resultant heterogeneous DNA.</p>
<p align="justify"><strong>Recombination </strong></p>
<p align="justify">A new combination of alleles resulting from the rearrangement occurring by crossing-over or by independent assortment (see crossing over).</p>
<p align="justify"><strong>Recursion </strong></p>
<p align="justify">An algorithmic procedure whereby an algorithm calls itself on a smaller or simpler instance of the problem until a terminating (base) condition is met, at which point the chain of calls returns. Recursion is a powerful and concise way to process data, although each call carries some overhead, so it is not always more efficient than an equivalent iterative loop.</p>
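<p align="justify">A small Python sketch of a recursive procedure follows. The example, which enumerates every string of length k over the DNA alphabet, was chosen for illustration and is not taken from the glossary; the function name <code>all_kmers</code> is hypothetical.</p>

```python
def all_kmers(k, alphabet="ACGT"):
    """Return every string of length k over the alphabet, recursively."""
    if k == 0:
        # Terminating condition: without it the calls would never stop.
        return [""]
    # The function calls itself on a smaller problem (length k - 1) ...
    shorter = all_kmers(k - 1, alphabet)
    # ... and extends each shorter string by one letter.
    return [prefix + letter for prefix in shorter for letter in alphabet]
```

<p align="justify">Each call reduces k by one, so the recursion is guaranteed to reach the terminating case k = 0.</p>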
<p align="justify"><strong>Regulatory gene </strong></p>
<p align="justify">A DNA sequence that functions to control the expression of other genes by producing a protein that modulates the synthesis of their products (typically by binding to the gene promoter). (cf. Structural gene).</p>
<p align="justify"><strong>Relational Database </strong></p>
<p align="justify">A database that follows E. F. Codd&#8217;s 12 rules, a series of mathematical and logical steps for the organization and systemization of data into a software system that allows easy retrieval, updating, and expansion. An RDBMS stores data in a database consisting of one or more tables of rows and columns. The rows correspond to a record (tuple); the columns correspond to attributes (fields) in the record. In an RDBMS, a view, defined as a subset of the database that is the result of the evaluation of a query, is a table. RDBMSs use Structured Query Language (SQL) for data definition, data management, and data access and retrieval. Relational and object-relational databases are used extensively in bioinformatics to store sequence and other biological data.</p>
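<p align="justify">The ideas of tables, rows, columns, views and SQL can be tried out with Python&#8217;s built-in <code>sqlite3</code> module. The table name, column names and accession values below are invented for this sketch and do not refer to any real database.</p>

```python
import sqlite3

# An in-memory database, discarded when the connection is closed.
conn = sqlite3.connect(":memory:")

# Data definition: one table whose columns are the record's attributes.
conn.execute(
    "CREATE TABLE sequences ("
    "accession TEXT PRIMARY KEY, organism TEXT, length INTEGER)"
)

# Data management: each inserted row is one record (tuple).
conn.executemany(
    "INSERT INTO sequences VALUES (?, ?, ?)",
    [
        ("SEQ001", "Homo sapiens", 1500),      # hypothetical records
        ("SEQ002", "Mus musculus", 420),
        ("SEQ003", "Escherichia coli", 2300),
    ],
)

# A view is the stored result definition of a query and is used like a table.
conn.execute(
    "CREATE VIEW long_sequences AS "
    "SELECT accession, length FROM sequences WHERE length >= 1000"
)

# Data retrieval through the view.
rows = conn.execute(
    "SELECT accession, length FROM long_sequences ORDER BY accession"
).fetchall()
```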
<p align="justify"><strong>Relational Database Management Systems (RDBMS) </strong></p>
<p align="justify">A software system that includes a database architecture, query language, and data loading and updating tools and other ancillary software that together allow the creation of a relational database application.</p>
<p align="justify"><strong>Repeats (repeat sequences) </strong></p>
<p align="justify">Repeat sequences and approximate repeats occur throughout the DNA of higher organisms (mammals). For example, the <em>Alu </em> sequences, about 300 characters long, appear hundreds of thousands of times in human DNA with about 87% homology to a consensus <em>Alu </em> string. Some short substrings such as TATA-boxes, poly-A and (TG)* also appear more often than by chance. Repeat sequences may also occur within genes, as mutations or alterations to those genes. Repetitive sequences, especially mobile elements, have many applications in genetic research. DNA transposons and retroposons are routinely used for insertional mutagenesis, gene mapping, gene tagging, and gene transfer in several model systems.</p>
<p align="justify"><strong>Repetitive elements </strong></p>
<p align="justify">Repetitive elements provide important clues about chromosome dynamics, evolutionary forces, and mechanisms for exchange of genetic information between organisms. The most ubiquitous class of repetitive elements in the DNA sequence in primate genomes is the <em>Alu </em> family of interspersed repeats, which have arisen in the last 65 million years of evolution. <em>Alu </em> repeats belong to a class of sequences defined as short interspersed elements (SINEs). Approximately 500,000 <em>Alu </em> SINEs exist within the human genome, representing about 5% of the genome by mass.</p>
<p align="justify"><strong>Replication </strong></p>
<p align="justify">The synthesis of an informationally identical macromolecule (e.g. DNA) from a template molecule.</p>
<p align="justify"><strong>Repressor </strong></p>
<p align="justify">The protein product of a regulatory gene that combines with a specific operator (regulatory DNA sequence) and hence blocks the transcription of genes in an operon.</p>
<p align="justify"><strong>Restriction enzyme (restriction endonuclease) </strong></p>
<p align="justify">A type of enzyme that recognizes specific DNA sequences (usually palindromic sequences 4, 6, 8 or 16 base pairs in length) and produces cuts on both strands of DNA containing those sequences only. The &#8220;molecular scissors&#8221; of rDNA technology.</p>
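<p align="justify">A digest of a linear sequence can be simulated by locating each recognition site and cutting at a fixed offset within it. The Python sketch below defaults to the EcoRI site GAATTC, which is cut after the first G; the function name and parameters are assumptions for illustration, and only the top strand is modelled, so the staggered cut on the opposite strand is not shown.</p>

```python
def digest(dna, site="GAATTC", cut_offset=1):
    """Cut a linear sequence at every occurrence of `site`.

    `cut_offset` is the number of bases of the site that stay on the
    left-hand fragment (1 for EcoRI, which cuts G^AATTC).
    Returns the list of fragments in order.
    """
    dna = dna.upper()
    fragments = []
    previous = 0                      # start of the current fragment
    position = dna.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(dna[previous:cut])
        previous = cut
        position = dna.find(site, position + 1)
    fragments.append(dna[previous:])  # remainder after the last cut
    return fragments
```

<p align="justify">Joining the returned fragments reproduces the input sequence, which is a convenient sanity check.</p>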
<p align="justify"><strong>Restriction fragment length polymorphisms (RFLPs) </strong></p>
<p align="justify">Variation within the DNA sequences of organisms of a given species that can be identified by fragmenting the sequences using restriction enzymes, since the variation lies within the restriction site. RFLPs can be used to measure the diversity of a gene in a population.</p>
<p align="justify"><strong>Restriction map </strong></p>
<p align="justify">A physical map or depiction of a gene (or genome) derived by ordering overlapping restriction fragments produced by digestion of the DNA with a number of restriction enzymes.</p>
<p align="justify"><strong>Reverse Genetics </strong></p>
<p align="justify">The use of protein information to elucidate the genetic sequence encoding that protein. Used to describe the process of gene isolation starting with a panel of afflicted patients (see positional cloning).</p>
<p align="justify"><strong>Reverse transcriptase </strong></p>
<p align="justify">A DNA polymerase that can synthesise a complementary DNA (cDNA) strand using RNA as a template &#8211; a so-called RNA-dependent DNA polymerase.</p>
<p align="justify"><strong>Reverse transcriptase-PCR (RT-PCR) </strong></p>
<p align="justify">Procedure in which PCR amplification is carried out on DNA that is first generated by the conversion of mRNA to cDNA using reverse transcriptase.</p>
<p align="justify"><strong>Ribonucleic acid (RNA) </strong></p>
<p align="justify">A category of nucleic acids in which the component sugar is ribose, consisting of nucleotides carrying the four bases Cytosine, Uracil, Guanine, and Adenine. The three types of RNA are messenger RNA (mRNA), transfer RNA (tRNA) and ribosomal RNA (rRNA).</p>
<p align="center"><strong>S </strong></p>
<p align="justify"><strong>Secondary structure (protein) </strong></p>
<p align="justify">The organization of the peptide backbone of a protein that occurs as a result of hydrogen bonds, e.g. alpha helix, beta pleated sheet.</p>
<p align="justify"><strong>Selectivity </strong></p>
<p align="justify">Selectivity of bioinformatics similarity search algorithms is defined as the significance threshold for reporting database sequence matches. As an example, for BLAST searches, the parameter E is interpreted as the upper bound on the expected frequency of chance occurrence of a match within the context of the entire database search.  E may be thought of as the number of matches one expects to observe by chance alone during the database search.</p>
<p align="justify"><strong>Sense strand </strong></p>
<p align="justify">The strand of double-stranded DNA whose sequence corresponds to that of the RNA transcript (with T in place of U); the complementary antisense strand acts as the template for RNA synthesis. Typically only one gene product is produced per gene, corresponding to the sense strand only. (Some viruses have open reading frames in both the sense and the antisense strands).</p>
<p align="justify"><strong>Sensitivity </strong></p>
<p align="justify">Sensitivity of bioinformatics similarity search algorithms centers around two areas: first, how well the method can detect biologically meaningful relationships between two related sequences in the presence of mutations and sequencing errors; second, how the heuristic nature of the algorithm affects the probability that a matching sequence will not be detected. At the user&#8217;s discretion, the speed of most similarity search programs can be sacrificed in exchange for greater sensitivity &#8211; with an emphasis on detecting lower scoring matches.</p>
<p align="justify"><strong>Sequence Tagged Site (STS) </strong></p>
<p align="justify">A unique sequence from a known chromosomal location that can be amplified by PCR. STSs act as physical markers for genomic mapping and cloning.</p>
<p align="justify"><strong>Sexual PCR (Molecular Diversity) </strong></p>
<p align="justify">Sexual PCR is a form of PCR in which similar, but not identical, DNA sequences are reassembled to obtain novel juxtapositions, simulating the result of genetic recombination. The result is the creation of an array of related genes which may possess improved characteristics. By repeated rounds of recombination, selection and PCR-based amplification vastly improved gene-products, such as enzymes with greater activity, may be generated and selected.</p>
<p align="justify"><strong>Shotgun cloning </strong></p>
<p align="justify">The cloning of an entire gene segment or genome by generating a random set of fragments using restriction endonucleases to create a gene library that can be subsequently mapped and sequenced to reconstruct the entire genome.</p>
<p align="justify"><strong>Similarity (homology) search </strong></p>
<p align="justify">Given a newly sequenced gene, there are two main approaches to the prediction of structure and function from the amino acid sequence. Homology methods are the most powerful and are based on the detection of significant extended sequence similarity to a protein of known structure, or of a sequence pattern characteristic of a protein family. Statistical methods are less successful but more general and are based on the derivation of structural preference values for single residues, pairs of residues, short oligopeptides or short sequence patterns. The transfer of structure/function information to a potentially homologous protein is straightforward when the sequence similarity is high and extended in length, but the assessment of the structural significance of sequence similarity can be difficult when sequence similarity is weak or restricted to a short region.</p>
<p align="justify"><strong>Signal sequence (leader sequence) </strong></p>
<p align="justify">A short sequence added to the amino-terminal end of a polypeptide chain that forms an amphipathic helix allowing the nascent polypeptide to migrate through membranes such as the endoplasmic reticulum or the cell membrane. It is cleaved from the polypeptide after the protein has crossed the membrane.</p>
<p align="justify"><strong>Single nucleotide polymorphisms (SNPs) </strong></p>
<p align="justify">Variations of single base pairs scattered throughout the human genome that serve as measures of the genetic diversity in humans. About 1 million SNPs are estimated to be present in the human genome, and SNPs are useful markers for gene mapping studies.</p>
<p align="justify"><strong>Single-pass sequencing </strong></p>
<p align="justify">Rapid sequencing of large segments of the genome of an organism by isolating as many expressed (cDNA) sequences as possible and performing single sequencer runs on their 5&#8242; or 3&#8242; ends. Single-pass sequencing typically results in individual, error-prone sequencing reads of 400-700 bases, depending on the type of sequencer used. However, if many of these are generated from numerous clones from different tissues, they may be overlapped and assembled to remove the errors and generate a contiguous sequence for the entire expressed gene.</p>
<p align="justify"><strong>Site </strong></p>
<p align="justify">Sites in sequences can be located either in DNA (e.g. binding sites, cleavage sites) or in proteins. In order to identify a site in DNA, ambiguity symbols are used to allow several different symbols at one position. Proteins, however, need a different mechanism (see Pattern). Restriction enzyme cleavage sites, for instance, have the following properties:  limited length (typically, less than 20 base pairs); definition of the cleavage site and its appearance (3&#8242;, 5&#8242; overhang or blunt); definition of the binding site.</p>
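<p align="justify">The use of ambiguity symbols to describe a DNA site can be sketched in Python as follows. The table holds the standard IUPAC nucleotide codes; the function names are hypothetical, and the site GTYRAC is used only as an example of a recognition sequence containing ambiguous positions.</p>

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def site_to_regex(site):
    """Expand a site written with ambiguity symbols into a regex string."""
    return "".join(IUPAC[symbol] for symbol in site.upper())


def find_sites(site, dna):
    """Return the 0-based start of every occurrence, overlapping ones included."""
    pattern = re.compile("(?=" + site_to_regex(site) + ")")
    return [match.start() for match in pattern.finditer(dna.upper())]


if __name__ == "__main__":
    # Y matches C or T, R matches A or G, so both GTTAAC and GTCGAC are found.
    print(find_sites("GTYRAC", "AAGTTAACGGGTCGACTT"))
```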
<p align="justify"><strong>Southern blotting </strong></p>
<p align="justify">A procedure for the identification of DNA by transferring a fragment separated on an agarose gel to a nitrocellulose filter, where it can be hybridized with a complementary &#8220;probe&#8221; sequence.</p>
<p align="justify"><strong>Splice site </strong></p>
<p align="justify">The sequence found at the 5&#8242; and 3&#8242; region of exon/intron boundaries, usually defined by a consensus sequence:</p>
<p align="justify"><em>Intron </em></p>
<p align="justify">5&#8242; CAGGTAAGT&#8212;&#8212;&#8212;TNCAGG 3&#8242;</p>
<p align="justify">A G C T</p>
<p align="justify">N represents any nucleotide; the bottom line represents alternative nucleotides at the indicated positions.</p>
<p align="justify"><strong>Splice form </strong></p>
<p align="justify">By using alternative splicing, a single message precursor from DNA can generate an entire family of mRNAs and proteins. This can be utilized to create specificity in cell-cell or cell-ligand interactions. A cell may produce a given protein, but it will be a different splice form of the protein than that produced by an adjacent cell. In this manner, the two cells have the potential to interact differently with other cells or molecules. Two places where this has been extremely important are the production of cell-surface specificity proteins in the immune and nervous systems.</p>
<p align="justify"><strong>Splicing </strong></p>
<p align="justify">The joining together of separate DNA or RNA component parts. For example, RNA splicing in eukaryotes involves the removal of introns and the stitching together of the exons from the pre-mRNA transcript before maturation.</p>
<p align="justify"><strong>Solvent accessibility </strong></p>
<p align="justify">The surface area (typically measured in square angstroms) of a biological molecule, usually a protein, that is exposed to solvent in its native, folded form. Determining the solvent accessibility of a protein helps define which amino acids in its molecular sequence are on the exterior of the molecule, and thus available to participate in interactions with other molecules.</p>
<p align="justify"><strong>Structural gene </strong></p>
<p align="justify">Gene which encodes a structural protein (cf. Regulatory gene).</p>
<p align="justify"><strong>Structure prediction </strong></p>
<p align="justify">Algorithms that predict the secondary, tertiary and sometimes even quaternary structure of proteins from their sequences.  Determining protein structure from sequence has been dubbed &#8220;the second half of the Genetic Code&#8221; since it is the folded tertiary structure of a protein that governs how it functions as a gene product.  As yet most structure prediction methods are only partially successful, and typically work best for certain well-defined classes of proteins.</p>
<p align="justify"><strong>Substitution matrix </strong></p>
<p align="justify">A model of protein evolution at the sequence level resulting in the development of a set of widely used substitution matrices. These are frequently called Dayhoff, MDM (Mutation Data Matrix), BLOSUM or PAM (Percent Accepted Mutation) matrices. They are derived from global alignments of closely related sequences.  Matrices for greater evolutionary distances are extrapolated from those for lesser ones.</p>
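<p align="justify">The extrapolation step can be sketched as repeated multiplication of a short-distance mutation probability matrix, followed by conversion to log-odds scores. The two-letter alphabet, the probabilities and the function names below are toy assumptions for illustration; they are not values taken from any published matrix.</p>

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def extrapolate(one_step, steps):
    """Raise a one-step mutation probability matrix to the given power."""
    n = len(one_step)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        result = mat_mul(result, one_step)
    return result


def log_odds(matrix, freqs):
    """Score = 10 * log10(P(i replaced by j) / background frequency of j)."""
    return [[10 * math.log10(matrix[i][j] / freqs[j]) for j in range(len(freqs))]
            for i in range(len(freqs))]


if __name__ == "__main__":
    # Toy alphabet of two residues; each row sums to 1.
    one_step = [[0.99, 0.01], [0.01, 0.99]]
    distant = extrapolate(one_step, 250)
    print(log_odds(distant, freqs=[0.5, 0.5]))
```

<p align="justify">At larger distances the diagonal probabilities shrink toward the background frequencies, so identities score less strongly and substitutions are penalized less.</p>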
<p align="justify"><strong>Subtraction library </strong></p>
<p align="justify">A cDNA library that contains only cDNAs uniquely expressed in a given cell or tissue. For example, T cells and B cells will express many common RNAs, as well as a very small percentage which will be unique to T cells and B cells respectively. To make a T cell subtraction library, the cDNA from a T cell library is hybridized with a vast excess of B cell RNA. The commonly expressed genes will result in RNA-cDNA hybrids which can be removed (or subtracted) to leave only T cell-specific cDNAs.</p>
<p align="center"><strong>T </strong></p>
<p align="justify"><strong>Tentative Consensus (TC) </strong></p>
<p align="justify">The identification of a sequence from an EST cluster that represents part or all of a complete gene.  TCs are usually determined by clustering ESTs allowing for sequencing errors, artefacts such as chimeric clones, and naturally occurring biological phenomena such as alternative splicing.  Creation of a cluster allows one to generate a consensus sequence and then identify a long open reading frame which would suggest the possibility of that consensus representing a <em>bona fide </em> gene.</p>
<p align="justify"><strong>Tentative Human Consensus sequences (THCs) </strong></p>
<p align="justify">A consensus sequence generated from human EST fragments. THCs may be validated by comparison against databases of known human gene sequences, human genomic sequences, or by identification of the ORFs or other sequence features contained within the consensus as belonging to a known human gene product.</p>
<p align="justify"><strong>Tertiary structure </strong></p>
<p align="justify">Folding of a protein chain via interactions of its side chains, including formation of disulphide bonds between cysteine residues.</p>
<p align="justify"><strong>Thymine </strong></p>
<p align="justify">A pyrimidine base found in DNA but not in RNA.</p>
<p align="justify"><strong>Tissue </strong></p>
<p align="justify">Section of an organ that consists of a largely homogenous population of cell types. Since many organs are multifunctional, they have developed highly specialized cell types to perform different functions. Identifying the section of an organ that is homogenous for a particular cell type ensures that the gene expression profiles extracted from those cells will accurately resemble the class of cells that make up the tissue.</p>
<p align="justify"><strong>Transcript </strong></p>
<p align="justify">The single-stranded mRNA chain that is assembled from a gene template.</p>
<p align="justify"><strong>Transcription </strong></p>
<p align="justify">The assembly of complementary single-stranded RNA on a DNA template.</p>
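<p align="justify">A minimal Python sketch of this base-pairing rule is given below. It assumes the template strand is written 5&#8242; to 3&#8242; and returns the RNA also written 5&#8242; to 3&#8242;, which is why the template is read in reverse; the function name is hypothetical.</p>

```python
# RNA base that pairs with each DNA template base.
PAIRING = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_5_to_3):
    """Return the RNA (5'->3') complementary to a DNA template given 5'->3'."""
    # The two strands are antiparallel, so walk the template from its 3' end.
    return "".join(PAIRING[base] for base in reversed(template_5_to_3.upper()))


if __name__ == "__main__":
    print(transcribe("TTACAT"))  # AUGUAA
```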
<p align="justify"><strong>Transcription factors </strong></p>
<p align="justify">A group of regulatory proteins that are required for transcription in eukaryotes. Transcription factors bind to the promoter region of a gene and facilitate transcription by RNA polymerase.</p>
<p align="justify"><strong>Transfer RNA (tRNA) </strong></p>
<p align="justify">A small RNA molecule that recognizes a specific amino acid, transports it to a specific codon in the mRNA, and positions it properly in the nascent polypeptide chain.</p>
<p align="justify"><strong>Transformation </strong></p>
<p align="justify">A genetic alteration to a cell as a result of the incorporation of DNA from a genetically different cell or virus; can also refer to the introduction of DNA into bacterial cells for genetic manipulation.</p>
<p align="justify"><strong>Transgene </strong></p>
<p align="justify">A foreign gene that is introduced into a cell or whole organism (e.g. transgenic mice) for therapeutic or experimental purposes.</p>
<p align="justify"><strong>Translation </strong></p>
<p align="justify">The process of converting RNA to protein by the assembly of a polypeptide chain from an mRNA molecule at the ribosome.</p>
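<p align="justify">The decoding step can be sketched in Python using the standard genetic code. The simplifications here are assumptions made for brevity: translation starts at the first AUG found, stops at the first in-frame stop codon, and the input contains only the letters A, C, G and U. The function name is hypothetical.</p>

```python
# Standard genetic code, codons ordered with bases U, C, A, G at each position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(mrna):
    """Translate from the first AUG to the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing to translate
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[pos:pos + 3]]
        if residue == "*":  # stop codon
            break
        protein.append(residue)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("GGAUGGCUUGGUAAGG"))  # MAW
```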
<p align="justify"><strong>Transmembrane region </strong></p>
<p align="justify">The region of a transmembrane protein that actually spans the membrane.  Transmembrane regions are usually hydrophobic in order to be thermodynamically compatible with the lipid bilayer portion of the membrane.  They may consist of either alpha-helical or beta-strand secondary structure elements, but in either case the external residues (the ones facing the membrane) are invariably hydrophobic while the internal residues may be hydrophilic (as in the case of a pore or channel) or polar.  One common transmembrane structural domain is the seven-helix bundle seen in numerous channel proteins.</p>
<p align="center"><strong>U </strong></p>
<p align="justify"><strong>Unidentified reading frame (URF) </strong></p>
<p align="justify">An open reading frame encoding a protein of undefined function.</p>
<p align="justify"><strong>Uracil </strong></p>
<p align="justify">Nitrogenous pyrimidine base found in RNA but not DNA.</p>
<p align="center"><strong>V </strong></p>
<p align="justify"><strong>Variable numbers of tandem repeats (VNTRs) </strong></p>
<p align="justify">DNA sequence blocks of 2-60 base pairs which are repeated from two to more than 20 times in different individuals. This polymorphism makes VNTRs very useful DNA markers used in genomic mapping, linkage analysis and also DNA fingerprinting.</p>
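<p align="justify">Because individuals differ in the number of repeat copies, a VNTR allele can be described by counting consecutive copies of the repeat unit. The sketch below does this for a known motif; the function name and the example sequences are made up for illustration.</p>

```python
import re


def max_tandem_copies(sequence, motif):
    """Largest number of back-to-back copies of motif found in sequence."""
    # The lookahead lets a run start at every position, so overlapping
    # candidate runs are all considered.
    pattern = re.compile("(?=((?:%s)+))" % re.escape(motif))
    runs = (match.group(1) for match in pattern.finditer(sequence))
    return max((len(run) // len(motif) for run in runs), default=0)


if __name__ == "__main__":
    allele_one = "GG" + "ACGT" * 3 + "TT"
    allele_two = "GG" + "ACGT" * 5 + "TT"
    # Two alleles at the same locus that differ only in repeat count.
    print(max_tandem_copies(allele_one, "ACGT"), max_tandem_copies(allele_two, "ACGT"))
```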
<p align="justify"><strong>Variation (genetic) </strong></p>
<p align="justify">Variation in genetic sequences and the detection of DNA sequence variants genome-wide allow studies relating the distribution of sequence variation to a population history. This in turn allows one to determine the density of SNPs or other markers needed for gene mapping studies.  Quantitation of these variations, together with analytical tools for studying sequence variation, also relates genetic variations to phenotype.</p>
<p align="justify"><strong>Vector </strong></p>
<p align="justify">Any agent that transfers material (typically DNA) from one host to another. Typically DNA vectors are autonomous DNA elements (such as plasmids) that can be manipulated and integrated into a host&#8217;s DNA or recombinant viruses.</p>
<p align="justify"><strong>Virtual libraries </strong></p>
<p align="justify">The creation and storage of vast collections of molecular structures in an electronic database. These databases may be queried for subsets that exhibit specific physicochemical features, or may be &#8220;virtually screened&#8221; for their ability to bind a drug target. This process may be performed prior to the synthesis and testing of the molecules themselves.</p>
<p align="justify"><strong>Visualization </strong></p>
<p align="justify">Visualization is the process of representing abstract scientific data as images that can aid in understanding the meaning of the data.</p>
<p align="center"><strong>W </strong></p>
<p align="justify"><strong>Weight matrix </strong></p>
<p align="justify">The density of binding sites in a gene or sequence can be used to derive a ratio of density for each element in a pattern of interest. The combined individual density ratios of all elements are then collectively used to build a scoring profile known as a weight matrix. This profile can be used to test the prediction of the identification of the selected pattern and the ability of the algorithm to discriminate them from non-pattern sequences.</p>
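<p align="justify">One common way to realize such a scoring profile is a log-odds position weight matrix built from aligned example sites. The sketch below is a minimal version under stated assumptions: a uniform background of 0.25 per base, a pseudocount of 1, and hypothetical function names and example sites.</p>

```python
import math


def build_weight_matrix(sites, background=0.25, pseudocount=1.0):
    """Per-position log2 ratio of observed base frequency to background."""
    matrix = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        scores = {}
        for base in "ACGT":
            # Pseudocount avoids log(0) for bases never seen at this position.
            freq = (column.count(base) + pseudocount) / (len(sites) + 4 * pseudocount)
            scores[base] = math.log2(freq / background)
        matrix.append(scores)
    return matrix


def score(matrix, candidate):
    """Sum the per-position scores for a candidate of the same length."""
    return sum(column[base] for column, base in zip(matrix, candidate))


if __name__ == "__main__":
    known_sites = ["TATAAT", "TATAAT", "TACAAT", "TATACT"]
    profile = build_weight_matrix(known_sites)
    # A sequence resembling the sites scores higher than an unrelated one.
    print(score(profile, "TATAAT"), score(profile, "GCGCGC"))
```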
<p align="justify"><strong>Western blot </strong></p>
<p align="justify">Technique in which specific antibodies are used to identify their antigens from a mixture of proteins. Typically, these protein mixtures are first separated by electrophoresis and then transferred onto nylon sheets by electrotransfer. Radiolabeled or enzyme-linked antibodies are incubated with the sheets and unbound antibodies are washed away, allowing the position of the bound antibody to be revealed by autoradiography or by the color formed upon addition of a substrate.</p>
<p align="justify"><strong>Wild type </strong></p>
<p align="justify">Form of a gene or allele that is considered the &#8220;standard&#8221; or most common.</p>
<p align="center"><strong>X </strong></p>
<p align="justify"><strong>X chromosome </strong></p>
<p align="justify">In mammals, the sex chromosome that is found in two copies in the homogametic sex (female in humans) and one copy in the heterogametic sex (male in humans).</p>
<p align="center"><strong>Y </strong></p>
<p align="justify"><strong>Yeast 2-hybrid system </strong></p>
<p align="justify">A yeast-based method used to simultaneously identify, and clone the gene for, proteins interacting with a known protein. The basis of this method is a &#8220;transcriptional reporter assay&#8221; (see definition) in which reporter gene expression is dependent on two domains. The first domain is linked to the known protein. The second domain is genetically linked to a library. If the library is screened against the known protein the two domains will interact only if a protein from the library binds the known protein, resulting in transcription activation of the reporter gene, and a blue color. The &#8220;blue yeast clone&#8221; will contain the gene encoding the newly identified protein.</p>
<p align="center"><strong>Z </strong></p>
<p align="justify"><strong>Z-DNA </strong></p>
<p align="justify">A conformation of DNA existing as a left-handed double helix (the phosphate-sugar backbone forms a left-handed zig-zag course), which may play a role in gene regulation.</p>
<p align="justify"><strong>Zinc fingers </strong></p>
<p align="justify">A protein motif formed by the interaction of repeated cysteine and histidine residues with a zinc ion. The spacing of the repeats results in finger-like arrangements of the protein loops, which interact with DNA. These motifs are typically found in transcription factors.</p>
]]></content:encoded>
			<wfw:commentRss>http://bioinformatics.me/bioinformatics-glossary-4/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Genomic glossary</title>
		<link>http://bioinformatics.me/genomic-glossary-2/</link>
		<comments>http://bioinformatics.me/genomic-glossary-2/#comments</comments>
		<pubDate>Sun, 16 Aug 2009 16:25:01 +0000</pubDate>
		<dc:creator>Waleed Ghalwash</dc:creator>
				<category><![CDATA[Bioinformatics Web]]></category>

		<guid isPermaLink="false">http://bioinformatics.me/genomic-glossary-2/</guid>
		<description><![CDATA[
biological atlas:  Maps describing different aspects of protein function should be compiled into a &#8220;biological atlas&#8221; By integrating the information contained in the atlas, increasingly meaningful biological hypotheses could be formulated.
cDNA maps:  Shows the locations of expressed DNA regions (exons) on the chromosomal map. Because they represent expressed genomic regions, cDNAs are thought [...]]]></description>
			<content:encoded><![CDATA[<p><img class="aligncenter size-full wp-image-367" src="http://www.bioinformaticsweb.org/wp-content/uploads/2009/07/bioinformatics_glossary.jpeg" alt="bioinformatics_glossary" width="600" height="345" /></p>
<p align="justify"><strong>biological atlas: </strong> Maps describing different aspects of protein function should be compiled into a &#8220;biological atlas&#8221;. By integrating the information contained in the atlas, increasingly meaningful biological hypotheses could be formulated.</p>
<p align="justify"><strong>cDNA maps: </strong> Shows the locations of expressed DNA regions (exons) on the chromosomal map. Because they represent expressed genomic regions, cDNAs are thought to identify the parts of the genome with the most biological and medical significance. A cDNA map can provide the chromosomal location for genes whose functions are currently unknown. For disease- gene hunters, the map can also suggest a set of candidate genes to test when the approximate location of a disease gene has been mapped by genetic linkage techniques.</p>
<p align="justify"><strong>cell mapping: </strong> The determination of the subcellular location of proteins and of protein-protein interactions by the purification of organelles or protein complexes followed by mass-spectrometric identification of the components. Most proteins are thought to exist in the cell not as free entities but as part of &#8216;cellular machines&#8217; which perform cellular functions cooperatively. Systematic identification of protein complexes would permit these machines to be defined and allow &#8216;physical maps&#8217; to be created for a variety of cell types and states. Such information is of great value for the assignment of protein function.</p>
<p align="justify"><strong>cell maps: </strong> A cell map specifies the proteins that constitute a given organelle within a given cell type. Cell maps for normal and diseased cells can be constructed which give insight into the role proteins have in disease and can guide the drug development process.</p>
<p align="justify"><strong>chromosomal maps: </strong> Genes or other identifiable DNA fragments are assigned to their respective chromosomes, with distances measured in base pairs. These markers can be physically associated with particular bands (identified by cytogenetic staining) primarily by in situ hybridization, a technique that involves tagging the DNA marker with an observable label (e.g., one that fluoresces or is radioactive). The location of the labeled probe can be detected after it binds to its complementary DNA [cDNA] strand in an intact chromosome.</p>
<p align="justify"><strong>chromosome mapping: </strong>Any method used for determining the location of and relative distances between genes on a chromosome.</p>
<p align="justify"><strong>clone-based maps: </strong> The physical map of the human genome published by Nature is a clone-based physical map of 3.2 gigabases (25 times larger than any previously mapped genome). This approach involved generating an overlapping series of clones for the whole genome. With a fingerprinted BAC map, clones could be selected for sequencing, ensuring comprehensive coverage of the genome.</p>
<p align="justify"><strong>comparative genome mapping: </strong> Comparative genome mapping in the sequence-based era: early experience with human chromosome 7</p>
<p align="justify"><strong>contig mapping </strong>: Overlapping of cloned or sequenced DNA to construct a continuous region of a gene, chromosome or genome.</p>
<p align="justify"><strong>contig maps </strong>: Contig maps are important because they provide the ability to study a complete, and often large segment of the genome by examining a series of overlapping clones which then provide an unbroken succession of information about that region.</p>
<p align="justify">The bottom-up approach involves cutting the chromosome into small pieces, each of which is cloned and ordered. The ordered fragments form contiguous DNA blocks (contigs). Currently, the resulting library of clones varies in size from 10,000 bp to 1 Mb. An advantage of this approach is the accessibility of these stable clones to other researchers. Contig construction can be verified by FISH [fluorescence in situ hybridization], which localizes cosmids to specific regions within chromosomal bands. Contig maps consist of a linked library of small overlapping clones representing a complete chromosomal segment. While useful for finding genes localized to a small area (under 2 Mb), contig maps are difficult to extend over large stretches of a chromosome because not all regions are clonable. DNA probe techniques can be used to fill in the gaps, but they are time consuming.</p>
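<p align="justify">The idea of joining ordered, overlapping clones into a contig can be sketched in Python as follows. This is a toy illustration under stated assumptions: the clones are already in the correct order, overlaps are exact, and a minimum overlap of 3 bases is required. The function names are hypothetical, and real assemblers must additionally handle sequencing errors and repeats.</p>

```python
def overlap(left, right, min_len=3):
    """Length of the longest suffix of left that equals a prefix of right."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def build_contig(ordered_clones, min_len=3):
    """Merge ordered clones into one contiguous sequence."""
    contig = ordered_clones[0]
    for clone in ordered_clones[1:]:
        size = overlap(contig, clone, min_len)
        if size == 0:
            # Corresponds to a region with no overlapping clone.
            raise ValueError("gap: no overlap with next clone")
        contig += clone[size:]
    return contig


if __name__ == "__main__":
    print(build_contig(["ATGCGTAC", "GTACCTGA", "CTGAAATT"]))  # ATGCGTACCTGAAATT
```

<p align="justify">The error raised for a missing overlap mirrors the practical difficulty noted above: a contig cannot be extended across a region for which no clone is available.</p>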
<p align="justify"><strong>cosmid maps: </strong> &#8220;Constructing chromosome- and region-specific cosmid maps of the human genome&#8221;</p>
<p align="justify"><strong>cytogenetic maps: </strong> The visual appearance of a chromosome when stained and examined under a microscope. Particularly important are visually distinct regions, called light and dark bands, which give each of the chromosomes a unique appearance. This feature allows a person&#8217;s chromosomes to be studied in a clinical test known as a karyotype, which allows scientists to look for chromosomal alterations.</p>
<p align="justify"><strong>epitope mapping </strong>: Methods used for studying the interactions of antibodies with specific regions of protein antigens. Important applications of epitope mapping are found within the area of immunochemistry.</p>
<p align="justify"><strong>evolutionary genetics: </strong> Evolutionary study of genes has been purely theoretical, but it can provide useful information for guiding gene mapping. People are now finding, for example, that a lot of things are not true associations; instead, they are an artifact of association. You can make such mistakes when you are looking at two individuals who share a common ancestry. Understanding the phylogeny helps us, for example, understand horizontal gene transfer between microorganisms. For humans or other sexually reproducing organisms, the use of phylogenetic information improves resolution for making associations by helping to avoid type I errors &#8211; that is, finding an association that is actually merely due to sharing a recent common ancestor, or, in other words, being closely related.</p>
<p align="justify"><strong>expression imbalance map (EIM) </strong>: A new visualization method for detecting mRNA expression imbalance regions, reflecting genomic losses and gains at a much higher resolution than conventional technologies such as comparative genomic hybridization (CGH). Simple spatial mapping of the microarray expression profiles on chromosomal location provides little information about genomic structure, because mRNA expression levels do not completely reflect genomic copy number and some microarray probes would be of low quality. The EIM, which does not employ arbitrary selection of thresholds in conjunction with hypergeometric distribution-based algorithm, has a high tolerance of these complex factors.</p>
<p align="justify"><strong>expression mapping: </strong> The creation of quantitative maps of protein expression from cell or tissue extracts, akin to the EST maps commercially available. This approach relies on 2D gel maps and image analysis, and opens up the possibility of studying cellular pathways and their perturbation by disease, drug action or other biological stimuli at the whole-proteome level. Expression mapping is a valuable tool in the discovery of disease markers and its use in gaining information in toxicological and drug-action studies seems assured. It is unclear at present how successful this approach will be in elucidating cellular pathways and their importance in disease processes, and how much the precise measurement of protein levels matters when compared with the rough guide provided by the measurement of mRNA levels; the ability to measure protein-level changes directly would seem to carry inherent advantages and it seems likely that expression proteomics will be a useful tool in drug target discovery and in studying the effects of various biological stimuli on the cell.</p>
<p align="justify"><strong>functional maps: </strong>In addition to the raw data, it will be important to design the proper visualization tools to graphically represent the functional relationships contained in different maps &#8230; Finally, it will be important to consider the possibility that functional maps need to be related back to particular tissues or even cell types.</p>
<p align="justify"><strong>gene mapping: </strong> Determination of the relative positions of genes on a DNA molecule (chromosome or plasmid) and of the distance, in linkage units or physical units, between them. [DOE]</p>
<p align="justify"><strong>genetic linkage map: </strong>Shows the relative locations of specific DNA markers along the chromosome. Any inherited physical or molecular characteristic that differs among individuals and is easily detectable in the laboratory is a potential genetic marker. [Primer on Molecular Genetics, Oak Ridge National Lab, US] <a href="http://www.ornl.gov/hgmis/publicat/primer/prim2.html#1">http://www.ornl.gov/hgmis/publicat/primer/prim2.html#1</a> Related term: linkage maps.</p>
<p align="justify"><strong>genetic maps: </strong> Also known as a linkage map. A chromosome map of a species that shows the position of its known genes and/or markers relative to each other, rather than as specific physical points on each chromosome.</p>
<p align="justify">The value of the genetic map is that an inherited disease can be located on the map by following the inheritance of a DNA marker present in affected individuals (but absent in unaffected individuals), even though the molecular basis of the disease may not yet be understood nor the responsible gene identified. Genetic maps have been used to find the exact chromosomal location of the genes for several important diseases, including cystic fibrosis, sickle cell disease, Tay-Sachs disease, fragile X syndrome, and myotonic dystrophy. [Primer on Molecular Genetics, Oak Ridge National Lab, US] <a href="http://www.ornl.gov/hgmis/publicat/primer/prim2.html#1">http://www.ornl.gov/hgmis/publicat/primer/prim2.html</a></p>
<p align="justify"><strong>genome control maps: </strong>Would identify all the components of the transcriptional machinery that have roles at any particular promoter and the contribution that specific components make to coordinate regulation of genes. The map will facilitate modeling of the molecular mechanisms that regulate gene expression and implicate components of the transcription apparatus in functional interactions with gene-specific regulators.</p>
<p align="justify"><strong>genome fingerprint map: </strong> The collection of all fingerprint clone contigs placed in a genome-wide map.</p>
<p align="justify"><strong>genome map: </strong> A reconstruction of the entire set of chromosomes for a given organism, showing the relative position of every gene.</p>
<p align="justify"><strong>genome scale metabolic maps: </strong>Annotated genomic data, along with legacy data on the cell&#8217;s biochemistry and physiology, can be used to construct genome-scale metabolic maps. The challenge now is to formulate reliable mathematical descriptions of the integrated function of these maps. It has proven difficult, if not impossible, to formulate detailed theory-based models of these genome-scale maps. An alternative approach that is data driven and constraints based will be described. It is an iterative model-building process.</p>
<p align="justify"><strong>genomic cartography: </strong> [Fry's] research focuses on methods of visualizing large amounts of data from dynamic information sources. The work uses ideas from distributed and adaptive systems to form organic representations that react and respond to the input data. This work is currently directed towards Genomic Cartography which is a study into new methods to represent the data found in the human genome.</p>
<p align="justify"><strong>genomic mapping: </strong>While a few technologies for functional analysis on a genomic basis are being developed at present, additional approaches and technologies for genomic interpretation that can be applied efficiently and economically at the level of an entire genome will be required for comprehensive analyses. Informatics will continue to play an important role in achieving all of these goals, as well as in ensuring the maintenance and accessibility of the forthcoming data. The development and application of new technologies for acquisition, management, analysis, and dissemination of genomic data are still required.</p>
<p align="justify"><strong>haplotype map: </strong> Francis Collins, director of the NHGRI, speaking at BIO 2001 (San Diego CA, US, June 2001) announced plans for a public-private effort to create a human haplotype map. Creators hope this so-called haplotype map will be a tool for pinning down the genes that contribute to the development of complex diseases such as cancer, diabetes, and mental illness.</p>
<p align="justify"><strong>haplotype mapping: </strong> Is often carried out as part of a genome scan. In a population isolate, the appearance of a rare Mendelian disease is almost always attributable to a single founder gene or mutation. The disease allele can be identified by searching for a common haplotype signature shared among patients. As the ancestral haplotype signature is passed from generation to generation, it is disrupted by recombination. Partial conservation of the haplotype signature in a patient strongly suggests that the disease locus resides in the conserved region of the haplotype.</p>
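<p align="justify">The idea of a conserved haplotype region can be sketched in a few lines. This is only an illustration: the function name, the one-character-per-marker encoding and the example haplotypes are hypothetical, and real haplotype mapping relies on statistical tests rather than exact matching.</p>
<pre><code>def conserved_markers(ancestral: str, patients: list[str]) -> list[int]:
    """Indices of markers where every patient still carries the ancestral allele."""
    return [
        i
        for i, allele in enumerate(ancestral)
        if all(p[i] == allele for p in patients)
    ]


# Hypothetical data: one character per marker, ordered along the chromosome.
ancestral = "ACGTTAGC"
patients = ["GTGTTAGC", "ACGTTACT", "AAGTTAGC"]

# Recombination has disrupted the flanks, leaving a shared core of markers.
print(conserved_markers(ancestral, patients))  # prints [2, 3, 4, 5]
</code></pre>
<p align="justify">In this toy example markers 2 to 5 are shared by all patients, so the disease locus would be sought in that interval.</p>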
<p align="justify"><strong>high-resolution genetic maps: </strong>2-5 cM [centiMorgans]. Genetic mapping resolution has been increased through the application of recombinant DNA technology, including in vitro radiation-induced chromosome fragmentation and cell fusions (joining human cells with those of other species to form hybrid cells) to create panels of cells with specific and varied human chromosomal components.</p>
<p align="justify"><strong>high-resolution physical mapping: </strong> The two current approaches are termed top-down (producing a macrorestriction map) and bottom-up (resulting in a contig map). With either strategy the maps represent ordered sets of DNA fragments that are generated by cutting genomic DNA with restriction enzymes. The fragments are then amplified by cloning or by polymerase chain reaction (PCR) methods. Electrophoretic techniques are used to separate the fragments according to size into different bands, which can be visualized by direct DNA staining or by hybridization with DNA probes of interest. The use of purified chromosomes separated either by flow sorting from human cell lines or in hybrid cell lines allows a single chromosome to be mapped.</p>
<p align="justify"><strong>homology map: </strong> The Davis Human/Mouse Homology Map, a table comparing genes in homologous segments of DNA from human and mouse sources, sorted by position in each genome. A total of 1793 loci are presented, most of which are genes. The authors did not include pseudogenes, members of multigene families where specific homology relationships could not be determined, nor any other genes for which homology was in doubt. In addition, for 568 of the loci there are provisional assignments of markers that link the homology map with that of the Gene Map of the Human Genome. These links also provide a rough approximation of the position of markers in the Genethon linkage map. In constructing this table, the authors first ordered genes so as to best maintain order according to both human cytogenetic position and mouse genetic map position. Within these homologous regions, genes were ordered according to the mouse genetic mapping data.</p>
<p align="justify"><strong>International SNP Map Working Group: </strong> Identifies and localizes 1.42 million SNPs in the human genome. ["A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms" International SNP Map Working Group Nature 409: 928-933, 15 Feb. 2001] <a href="http://www.nature.com/cgi-taf/DynaPage.taf?%20">http://www.nature.com/cgi-taf </a></p>
<p align="justify"><strong>linkage disequilibrium: </strong>Evidence for linkage disequilibrium can be helpful in mapping disease genes since it suggests that the two [alleles] may be very close to one another.</p>
<p align="justify"><strong>linkage maps: </strong> A map of the relative positions of genetic loci on a chromosome, determined on the basis of how often the loci are inherited together. Distance is measured in centimorgans (cM).</p>
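<p align="justify">As a rough illustration of how co-inheritance frequency becomes a map distance, the sketch below estimates a recombination fraction from offspring counts and converts it to centimorgans. The function names and the counts are hypothetical. For closely linked loci, 1% recombination corresponds to about 1 cM; the Haldane mapping function used here for larger distances is an assumption added for the example (it ignores crossover interference) and is not part of the definition above.</p>
<pre><code>import math


def recombination_fraction(recombinants: int, total: int) -> float:
    """Fraction of offspring in which the two loci were NOT inherited together."""
    if total == 0:
        raise ValueError("total must be positive")
    return recombinants / total


def haldane_cm(r: float) -> float:
    """Map distance in cM under Haldane's model: d = -50 * ln(1 - 2r).

    math.log raises ValueError once r reaches 0.5 (unlinked loci),
    where the distance is undefined.
    """
    return -50.0 * math.log(1.0 - 2.0 * r)


# Hypothetical cross: 10 recombinant offspring out of 100 scored.
r = recombination_fraction(10, 100)
print(r, round(haldane_cm(r), 1))
</code></pre>
<p align="justify">With these made-up counts the recombination fraction is 0.1, which the Haldane function converts to a little over 11 cM, slightly more than the naive estimate of 10 cM.</p>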
<p align="justify"><strong>localizome mapping: </strong> One can imagine comprehensive mapping projects of the &#8220;localizome&#8221;, with the goal of recording not only where all proteins of a proteome can be found but also when.</p>
<p align="justify"><strong>locus: </strong>Any genomic site, whether functional or not, that can be mapped through formal genetic analysis.</p>
<p align="justify"><strong>macrorestriction map: </strong> Describes the order and distance between enzyme cutting (cleavage) sites &#8230; In top-down mapping, a single chromosome is cut (with rare-cutter restriction enzymes) into large pieces, which are ordered and subdivided; the smaller pieces are then mapped further. The resulting macrorestriction maps depict the order of and distance between sites at which rare-cutter enzymes cleave. This approach yields maps with more continuity and fewer gaps between fragments than contig maps, but map resolution is lower and may not be useful in finding particular genes; in addition, this strategy generally does not produce long stretches of mapped sites. Currently, this approach allows DNA pieces to be located in regions measuring about 100,000 bp to 1 Mb.</p>
<p align="justify"><strong>mapping: </strong> The determination of the relative positions of genes within the chromosomes or of restriction sites along a DNA molecule. The process of determining the position of a locus on the chromosome relative to other loci.</p>
<p align="justify"><strong>nucleotide mapping: </strong> Two-dimensional separation and analysis of nucleotides.</p>
<p align="justify"><strong>optical mapping: </strong>An enabling technology for whole genome analysis which involves the capture of individual DNA molecules, obtained directly from genomic DNA, followed by digestion in situ by selected restriction endonucleases. The resulting fragments are then visualized directly to produce detailed optical restriction maps. This methodology allows patterns of sequence variation to be detected across entire genomes, without the need for DNA amplification, and, unlike other genomewide scanning methods, provides detailed haplotype information by analyzing individual DNA molecules. OpGen will provide optical mapping services to three main markets, i.e., genome sequencing projects, cancer diagnostics, and genetic association studies.</p>
<p align="justify"><strong>peptide mapping: </strong>Two-dimensional separation and analysis of peptides. The characteristic pattern of fragments formed by the separation of a mixture of peptides resulting from hydrolysis of a protein or peptide.</p>
<p align="justify"><strong>peptide maps: </strong> One way to research the structure of a protein is to seek to identify the specific sequence of amino acids which form the protein. Tr. at 202-3. An initial approach is to cut the protein into fragments, called peptides, using enzymes or chemicals which reliably divide the chain at predictable points. Tr. at 523. These fragments can be isolated and analyzed to create &#8220;peptide maps,&#8221; which show the pattern of pieces from these breaks as a unique &#8220;fingerprint.&#8221; &#8230; A peptide map does not disclose the precise order of the amino acids in a protein, although it may point to areas of difference between similar proteins.</p>
<p align="justify"><strong>phenome mapping: </strong> The conceptual matrix for a comprehensive &#8220;phenome&#8221; mapping project would be as follows: one axis represents all available knockouts while the other represents a large series of standardized phenotypes that can be screened.</p>
<p align="justify"><strong>phenome maps: </strong> Can be thought of as lists of similar phenotypes that could be referred to as &#8220;pheno-clusters&#8221;.</p>
<p align="justify"><strong>physical mapping: </strong>The procedure of physical mapping divides coarsely into two steps. First, large pieces of DNA (contigs), a library of cloned fragments, are ordered according to their position in the genome. Different experimental techniques are used to do that; roughly, these are clone-probe hybridization mapping, restriction mapping, radiation-hybrid mapping and optical mapping. &#8230; Second, the cloned fragments are cut by restriction enzymes, and the smaller DNA fragments obtained are sequenced in detail (shotgun sequencing); the overall sequence is then obtained by sequence assembly.</p>
<p align="justify"><strong>physical maps: </strong> A map of the locations of identifiable landmarks on DNA (e.g. restriction enzyme cutting sites, genes) regardless of inheritance. Distance is measured in base pairs. For the human genome, the lowest-resolution physical map consists of the banding patterns on the 24 different chromosomes; the highest-resolution map would be the complete nucleotide sequence of the chromosomes.</p>
<p align="justify">A chromosome map of a species that shows the specific physical locations of its genes and/or markers on each chromosome. Physical maps are particularly important when searching for disease genes by positional cloning strategies and for DNA sequencing.</p>
<p align="justify"><strong>positional cloning: </strong> Requires a genetic map with a large number of markers (especially in the region of interest), and the use of physical mapping and DNA sequencing technologies to isolate and sequence the targeted gene.</p>
<p align="justify"><strong>protein expression map: </strong> Since 2D electrophoresis (2DE) gel patterns not only reveal the amounts of protein but are also unrivaled in their ability to detect post-translational modifications, the 2DE protein map provides much more relevant information about cellular dynamics than the corresponding expression map at the mRNA level. By comparing the 2DE gel patterns of samples exposed to different physiological conditions or different drug treatments it is possible to identify groups of proteins with related functions or whose expression is interdependent (expression proteomics).</p>
<p align="justify"><strong>Protein Expression Mapping PEM: </strong> Details the distribution and abundance of protein in specific samples, under defined physiological conditions. [CHI Proteomics] Quantitative study of global changes in protein expression in tissues, cells or body fluids using 2D gels and image analysis. Currently carried out by 2D gel electrophoresis, though alternatives are under investigation.</p>
<p align="justify"><strong>protein interaction maps: </strong> Hybrigenics&#8217; comprehensive protein interaction maps are built using automated yeast two-hybrid methodology in pathogens and in cDNA of normal and diseased tissues.</p>
<p align="justify"><strong>protein linkage maps: </strong>With respect to a genome-wide use of the two-hybrid assay in the case of yeast, the goal is to find which proteins in the yeast genome interact with every other protein. This process would generate protein linkage maps, delineating large networks of interacting proteins. The approximately 6,000 yeast proteins can potentially interact in 18 million pairwise combinations.</p>
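<p align="justify">The figure of 18 million follows from counting unordered pairs of distinct proteins. A minimal check, assuming exactly 6,000 proteins and ignoring self-interactions (the function name is hypothetical):</p>
<pre><code>import math


def pairwise_combinations(n_proteins: int) -> int:
    """Number of unordered pairs of distinct proteins: n * (n - 1) / 2."""
    return math.comb(n_proteins, 2)


print(pairwise_combinations(6000))  # prints 17997000
</code></pre>
<p align="justify">That is 6,000 &#215; 5,999 / 2 = 17,997,000, or roughly 18 million pairs to test.</p>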
<p align="justify"><strong>proteome map: </strong> A number of organizations have announced plans to produce a map of the proteome, including Myriad Genetics, Large Scale Biology, CuraGen and others.</p>
<p align="justify"><strong>QTL mapping Quantitative Trait Loci mapping: </strong> A phenotype-driven approach to gene function. As such it permits the discovery of new genes and can be contrasted with gene-driven approaches such as knock-out and knock-in mice, which allow for the study of known genes. QTL reflect natural genetic variations as they exist in the mouse strains under study. We are limited to detecting those genes that vary among the available strains. However, the natural variations among mouse strains are vast and largely untapped.</p>
<p align="justify"><strong>RFLP (Restriction Fragment Length Polymorphism): </strong> See Genetic variations glossary. Polymorphic sequences that result in RFLPs are used as markers on both physical maps and genetic linkage maps. RFLPs are usually caused by mutation at a cutting site.</p>
<p align="justify"><strong>Radiation Hybrid RH maps: </strong> Chromosome maps that are calculated from RH score vectors. An RH score vector is the pattern of assay results of a particular STS (marker) on a particular panel. The vector consists of 1&#8217;s (did amplify) and 0&#8217;s (did not amplify). Simplistically speaking, the more similar two score vectors are, the closer the markers are on the chromosome.</p>
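<p align="justify">The &#8220;simplistically speaking&#8221; comparison of score vectors can be sketched as follows. The similarity measure (fraction of matching positions), the function name and the three example vectors are all hypothetical; actual RH map construction uses statistical models rather than this raw count.</p>
<pre><code>def score_similarity(a: str, b: str) -> float:
    """Fraction of panel cell lines on which two markers give the same result."""
    if len(a) != len(b) or not a:
        raise ValueError("score vectors must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


# Hypothetical score vectors for three STS markers on a 10-line panel:
# "1" = did amplify, "0" = did not amplify.
marker_a = "1100101100"
marker_b = "1100101000"
marker_c = "0011010011"

print(score_similarity(marker_a, marker_b))  # prints 0.9
print(score_similarity(marker_a, marker_c))  # prints 0.0
</code></pre>
<p align="justify">On this toy panel markers A and B would be placed close together, while A and C show no evidence of lying on the same fragments.</p>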
<p align="justify"><strong>radiation hybrid mapping: </strong> A method for ordering genetic loci along chromosomes. The method involves fusing irradiated donor cells with host cells from another species. Following cell fusion, fragments of DNA from the irradiated cells become integrated into the chromosomes of the host cells. Molecular probing of DNA obtained from the fused cells is used to determine if two or more genetic loci are located within the same fragment of donor cell DNA.</p>
<p align="justify"><strong>restriction map: </strong>A description of restriction endonuclease cleavage sites within a piece of DNA. Generating such a map is usually the first step in characterizing an unknown DNA, and a prerequisite to manipulating it for other purposes. Typically, restriction enzymes that cleave DNA infrequently (e.g. those with 6 bp recognition sites) and are relatively inexpensive are used to produce a map.</p>
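<p align="justify">A computational analogue of a restriction map is a list of cleavage positions and the resulting fragment lengths. The sketch below uses the 6 bp EcoRI recognition site GAATTC, which is cut between G and AATTC; the function names and the short linear sequence are made up for illustration, and only one strand is considered.</p>
<pre><code>def find_sites(dna: str, site: str) -> list[int]:
    """0-based start positions of every occurrence of the recognition site."""
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start)
        start = dna.find(site, start + 1)
    return positions


def fragment_lengths(dna: str, site: str, cut_offset: int) -> list[int]:
    """Lengths of the linear fragments left after cutting at each site.

    cut_offset is the number of bases into the site at which the enzyme cuts.
    """
    cuts = [p + cut_offset for p in find_sites(dna, site)]
    bounds = [0] + cuts + [len(dna)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


ECORI_SITE = "GAATTC"  # EcoRI cuts after the first base: G^AATTC
dna = "TTGAATTCAAACCCGGGAATTCTT"  # hypothetical 24 bp sequence

print(find_sites(dna, ECORI_SITE))           # prints [2, 16]
print(fragment_lengths(dna, ECORI_SITE, 1))  # prints [3, 14, 7]
</code></pre>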
<p align="justify"><strong>restriction mapping: </strong> Use of restriction endonucleases to analyze and generate a physical map of genomes, genes, or other segments of DNA.</p>
<p align="justify"><strong>SNP maps: </strong> A collection of SNPs that can be superimposed over the existing genome map, creating greater detail and facilitating further genetic studies.</p>
<p align="justify">Current estimates indicate that a very dense marker map (30,000 &#8211; 1,000,000 variants) would be required to perform haplotype-based association studies. We have constructed a SNP map of the human genome with sufficient density to study human haplotype structure, enabling future study of human medical and population genetics.</p>
<p align="justify"><strong>telomere maps: </strong> Telomeres are the tips of the chromosomes. They are crucial in maintaining the chromosomes&#8217; stability and are important in the cell cycle and ageing. Because of the way the physical maps are constructed, many telomeres of chromosomes are left out.</p>
<p align="justify"><strong>transcript maps: </strong>In only a year or two, most human genes will be sequence-tagged and placed on various physical maps. Such a &#8216;transcript map&#8217; (or &#8216;expression map&#8217;) of the genome will be an important part of the sequencing infrastructure, as well as a critical resource for the positional candidate approach to gene cloning. One of the specific goals of the US Human Genome Project is the construction of a high resolution STS map of the genome. &#8230; One of the early problems with gene-based STSs was that there simply were not enough unique human gene sequences to bother with. But all of that changed with the advent of EST sequencing, at which time several groups began mapping ESTs, albeit on a limited scale and only to the resolution of a chromosome assignment.</p>
<p align="justify"><strong>transcriptome maps: </strong> Consist of &#8220;expression clusters&#8221; of co-regulated genes. Challenges ahead for computational biology include the integration of clusters obtained for the transcriptome, the interactome, the phenome, and the localizome.</p>
<p align="justify"><strong>whole genome clone-based maps: </strong>In their paper, the International Human Genome Mapping Consortium describe how they constructed the first whole-genome physical map, how they created the templates from which the genome was sequenced and how the map was essential for the accurate assembly of the human genome by the publicly funded effort. Four short reports accompanying the whole-genome mapping paper (Bruls; Bentley; Kucherlapati; Page) describe alternative mapping strategies that were implemented for chromosomes 12, 14 and Y, as well as a host of other chromosomes. Information from all these papers was integrated into the whole-genome paper and demonstrates how a rich resource of mapping information can be generated by the cooperation of international independent efforts.</p>
<p align="justify"><strong>YAC maps: </strong>Yeast artificial chromosome maps, a type of physical map.</p>
]]></content:encoded>
			<wfw:commentRss>http://bioinformatics.me/genomic-glossary-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bioinformatics- Sequence analysis</title>
		<link>http://bioinformatics.me/bioinformatics-sequence-analysis-2/</link>
		<comments>http://bioinformatics.me/bioinformatics-sequence-analysis-2/#comments</comments>
		<pubDate>Sun, 16 Aug 2009 16:25:01 +0000</pubDate>
		<dc:creator>Waleed Ghalwash</dc:creator>
				<category><![CDATA[Bioinformatics Web]]></category>

		<guid isPermaLink="false">http://bioinformatics.me/bioinformatics-sequence-analysis-2/</guid>
		<description><![CDATA[
Sequence analysis is the application of Information Technologies to Molecular Biology. It deals with biological sequences, and processes them to extract significant information that may yield new insights and guidelines in the understanding of biological organisms.
Basics for sequence analysis 
Proteins
A protein is typically built of a series of basic blocks called amino acids , chained [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.bioinformaticsweb.org/wp-content/uploads/2009/07/sequence_analysis_bioinformatics.gif" alt="sequence_analysis_bioinformatics" width="472" height="292" class="aligncenter size-full wp-image-377" /></p>
<p align="justify">Sequence analysis is the application of Information Technologies to Molecular Biology. It deals with biological sequences, and processes them to extract significant information that may yield new insights and guidelines in the understanding of biological organisms.</p>
<p align="justify"><strong>Basics for sequence analysis </strong></p>
<h3>Proteins</h3>
<p>A protein is typically built of a series of basic blocks called amino acids, chained together in a linear sequence. Amino acids come in a variety of shapes and properties: they may be small or bulky, hydrophobic or hydrophilic, electrically charged or neutral, etc., which allows very complex shapes and interactions to be produced.</p>
<p align="justify">Amino acids are commonly referred to by name or by an abbreviation of either three letters or one letter. This allows for more efficient descriptions of how they are chained together to build a protein:</p>
<div>
<table border="1" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td>
<p align="center">Neutral-Nonpolar</p>
</td>
<td>
<p align="center">3-letter</p>
</td>
<td>
<p align="center">1-letter</p>
</td>
</tr>
<tr>
<td>
<p align="center">Glycine</p>
</td>
<td>
<p align="center">Gly</p>
</td>
<td>
<p align="center">G</p>
</td>
</tr>
<tr>
<td>
<p align="center">L-Alanine</p>
</td>
<td>
<p align="center">Ala</p>
</td>
<td>
<p align="center">A</p>
</td>
</tr>
<tr>
<td>
<p align="center">L-Valine</p>
</td>
<td>
<p align="center">Val</p>
</td>
<td>
<p align="center">V</p>
</td>
</tr>
<tr>
<td>
<p align="center">L-Isoleucine</p>
</td>
<td>
<p align="center">Ile</p>
</td>
<td>
<p align="center">I</p>
</td>
</tr>
<tr>
<td>
<p align="center">L-Leucine</p>
</td>
<td>
<p align="center">Leu</p>
</td>
<td>
<p align="center">L</p>
</td>
</tr>
<tr>
<td>
<p align="center">L-Phenylalanine</p>
</td>
<td>
<p align="center">Phe</p>
</td>
<td>
<p align="center">F</p>
</td>
</tr>
<tr>
<td>
<p align="center">L-Proline</p>
</td>
<td>
<p align="center">Pro</p>
</td>
<td>
<p align="center">P</p>
</td>
</tr>
<tr>
<td>
<p align="center">L-Methionine</p>
</td>
<td>
<p align="center">Met</p>
</td>
<td>
<p align="center">M</p>
</td>
</tr>
<tr>
<td>
<p align="center">Neutral-Polar</p>
</td>
<td>
<p align="center"></p>
</td>
<td>
<p align="center"></p>
</td>
</tr>
<tr>
<td>
<p align="center">L-Serine</p>
</td>
<td>
<p align="center">Ser</p>
</td>
<td>
<p align="center">S</p>
</td>
</tr>
<tr>
<td>
<p align="center">L-Threonine</p>
</td>
<td>
<p align="center">Thr</p>
</td>
<td>
<p align="center">T</p>
</td>
</tr>
<tr>
<td>
<p align="center">L-Tyrosine</p>
</td>
<td>
<p align="center">Tyr</p>
</td>
<td>
<p align="center">Y</p>
</td>
</tr>
<tr>
<td>
<p align="center">L-Tryptophan</p>
</td>
<td>
<p align="center">Trp</p>
</td>
<td>
<p align="center">W</p>
</td>
</tr>
<tr>
<td>
<p align="center">L-Asparagine</p>
</td>
<td>
<p align="center">Asn</p>
</td>
<td>
<p align="center">N</p>
</td>
</tr>
<tr>
<td>
<p align="center">L-Glutamine</p>
</td>
<td>
<p align="center">Gln</p>
</td>
<td>
<p align="center">Q</p>
</td>
</tr>
<tr>
<td>
<p align="center">L-Cysteine</p>
</td>
<td>
<p align="center">Cys</p>
</td>
<td>
<p align="center">C</p>
</td>
</tr>
<tr>
<td>
<p align="center">Acidic</p>
</td>
<td>
<p align="center"></p>
</td>
<td>
<p align="center"></p>
</td>
</tr>
<tr>
<td>
<p align="center">L-Aspartic acid</p>
</td>
<td>
<p align="center">Asp</p>
</td>
<td>
<p align="center">D</p>
</td>
</tr>
<tr>
<td>
<p align="center">L-Glutamic acid</p>
</td>
<td>
<p align="center">Glu</p>
</td>
<td>
<p align="center">E</p>
</td>
</tr>
<tr>
<td>
<p align="center">Basic</p>
</td>
<td>
<p align="center"></p>
</td>
<td>
<p align="center"></p>
</td>
</tr>
<tr>
<td>
<p align="center">L-Lysine</p>
</td>
<td>
<p align="center">Lys</p>
</td>
<td>
<p align="center">K</p>
</td>
</tr>
<tr>
<td>
<p align="center">L-Arginine</p>
</td>
<td>
<p align="center">Arg</p>
</td>
<td>
<p align="center">R</p>
</td>
</tr>
<tr>
<td>
<p align="center">L-Histidine</p>
</td>
<td>
<p align="center">His</p>
</td>
<td>
<p align="center">H</p>
</td>
</tr>
</tbody>
</table>
</div>
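<p align="justify">To show how the abbreviations shorten a description, the following sketch converts a chain written with three-letter codes into one-letter codes, using the pairs listed in the table. The function name, the hyphen-separated input format and the example peptide are assumptions made for the illustration.</p>
<pre><code># Three-letter to one-letter codes, as listed in the table above.
THREE_TO_ONE = {
    "Gly": "G", "Ala": "A", "Val": "V", "Ile": "I", "Leu": "L",
    "Phe": "F", "Pro": "P", "Met": "M", "Ser": "S", "Thr": "T",
    "Tyr": "Y", "Trp": "W", "Asn": "N", "Gln": "Q", "Cys": "C",
    "Asp": "D", "Glu": "E", "Lys": "K", "Arg": "R", "His": "H",
}


def to_one_letter(chain: str) -> str:
    """Convert a hyphen-separated three-letter chain into a one-letter string."""
    return "".join(THREE_TO_ONE[name] for name in chain.split("-"))


print(to_one_letter("Met-Lys-Trp-Val"))  # prints MKWV
</code></pre>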
<h3>Nucleic Acids</h3>
<p align="justify">For nucleic acids the number of basic building blocks is much smaller: each nucleic acid chain is composed of a series of only four possible nucleotides, which furthermore provide a very limited set of interactions.</p>
<p align="justify">Nucleic acids come in two flavors: DNA (DeoxyriboNucleic Acid) and RNA (RiboNucleic Acid). Both of them consist of a series of nucleotides that are glued one after the other to constitute the sequence of blocks that make up the functional chain.</p>
<p align="justify">Nucleotides are composed of a phosphate group, a sugar (ribose in RNA, and deoxyribose in DNA) and a base, which is what distinguishes one nucleotide from another. The base may be guanine, cytosine, adenine or thymine in the case of DNA, or guanine, cytosine, adenine or uracil in the case of RNA. They can be referred to by their one-letter abbreviations G, C, A, T and U. Interactions are mainly driven by the establishment of hydrogen bonds, which can only be established between thymine (or uracil) and adenine (two hydrogen bonds) and between cytosine and guanine (three hydrogen bonds).</p>
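<p align="justify">These pairing rules are simple enough to write down as lookup tables. The sketch below derives the partner strand of a DNA sequence and counts the hydrogen bonds holding the two strands together. The names are hypothetical, and the reversal of the partner strand reflects the fact, not covered above, that paired strands run in opposite directions.</p>
<pre><code># Pairing partners and hydrogen bonds per base pair.
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
H_BONDS = {"A": 2, "T": 2, "U": 2, "G": 3, "C": 3}


def reverse_complement(dna: str) -> str:
    """Sequence of the partner strand, read in its own 5' to 3' direction."""
    return "".join(DNA_PAIR[base] for base in reversed(dna))


def hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds formed when the strand is fully paired."""
    return sum(H_BONDS[base] for base in strand)


print(reverse_complement("ATGC"))  # prints GCAT
print(hydrogen_bonds("ATGC"))      # prints 10
</code></pre>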
<p align="justify">As we said previously, the main role of nucleic acids is to convey all the genetic information needed to make proteins and control the building process. Protein sequences are coded by nucleic acids using groups of three nucleotides (codons), each of which codes for a given amino acid. The code is more or less universal, with few exceptions, and includes redundancy to increase the fidelity of the reading process when making duplicates or translating the information:</p>
<div>
<table border="1" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td>
<p align="center">UUU</p>
</td>
<td>
<p align="center">Phe</p>
</td>
<td>
<p align="center">UCU</p>
</td>
<td>
<p align="center">Ser</p>
</td>
<td>
<p align="center">UAU</p>
</td>
<td>
<p align="center">Tyr</p>
</td>
<td>
<p align="center">UGU</p>
</td>
<td>
<p align="center">Cys</p>
</td>
</tr>
<tr>
<td>
<p align="center">UUC</p>
</td>
<td>
<p align="center">Phe</p>
</td>
<td>
<p align="center">UCC</p>
</td>
<td>
<p align="center">Ser</p>
</td>
<td>
<p align="center">UAC</p>
</td>
<td>
<p align="center">Tyr</p>
</td>
<td>
<p align="center">UGC</p>
</td>
<td>
<p align="center">Cys</p>
</td>
</tr>
<tr>
<td>
<p align="center">UUA</p>
</td>
<td>
<p align="center">Leu</p>
</td>
<td>
<p align="center">UCA</p>
</td>
<td>
<p align="center">Ser</p>
</td>
<td>
<p align="center">UAA</p>
</td>
<td>
<p align="center">Stop</p>
</td>
<td>
<p align="center">UGA</p>
</td>
<td>
<p align="center">Stop</p>
</td>
</tr>
<tr>
<td>
<p align="center">UUG</p>
</td>
<td>
<p align="center">Leu</p>
</td>
<td>
<p align="center">UCG</p>
</td>
<td>
<p align="center">Ser</p>
</td>
<td>
<p align="center">UAG</p>
</td>
<td>
<p align="center">Stop</p>
</td>
<td>
<p align="center">UGG</p>
</td>
<td>
<p align="center">Trp</p>
</td>
</tr>
<tr>
<td>
<p align="center">CUU</p>
</td>
<td>
<p align="center">Leu</p>
</td>
<td>
<p align="center">CCU</p>
</td>
<td>
<p align="center">Pro</p>
</td>
<td>
<p align="center">CAU</p>
</td>
<td>
<p align="center">His</p>
</td>
<td>
<p align="center">CGU</p>
</td>
<td>
<p align="center">Arg</p>
</td>
</tr>
<tr>
<td>
<p align="center">CUC</p>
</td>
<td>
<p align="center">Leu</p>
</td>
<td>
<p align="center">CCC</p>
</td>
<td>
<p align="center">Pro</p>
</td>
<td>
<p align="center">CAC</p>
</td>
<td>
<p align="center">His</p>
</td>
<td>
<p align="center">CGC</p>
</td>
<td>
<p align="center">Arg</p>
</td>
</tr>
<tr>
<td>
<p align="center">CUA</p>
</td>
<td>
<p align="center">Leu</p>
</td>
<td>
<p align="center">CCA</p>
</td>
<td>
<p align="center">Pro</p>
</td>
<td>
<p align="center">CAA</p>
</td>
<td>
<p align="center">Gln</p>
</td>
<td>
<p align="center">CGA</p>
</td>
<td>
<p align="center">Arg</p>
</td>
</tr>
<tr>
<td>
<p align="center">CUG</p>
</td>
<td>
<p align="center">Leu</p>
</td>
<td>
<p align="center">CCG</p>
</td>
<td>
<p align="center">Pro</p>
</td>
<td>
<p align="center">CAG</p>
</td>
<td>
<p align="center">Gln</p>
</td>
<td>
<p align="center">CGG</p>
</td>
<td>
<p align="center">Arg</p>
</td>
</tr>
<tr>
<td>
<p align="center">AUU</p>
</td>
<td>
<p align="center">Ile</p>
</td>
<td>
<p align="center">ACU</p>
</td>
<td>
<p align="center">Thr</p>
</td>
<td>
<p align="center">AAU</p>
</td>
<td>
<p align="center">Asn</p>
</td>
<td>
<p align="center">AGU</p>
</td>
<td>
<p align="center">Ser</p>
</td>
</tr>
<tr>
<td>
<p align="center">AUC</p>
</td>
<td>
<p align="center">Ile</p>
</td>
<td>
<p align="center">ACC</p>
</td>
<td>
<p align="center">Thr</p>
</td>
<td>
<p align="center">AAC</p>
</td>
<td>
<p align="center">Asn</p>
</td>
<td>
<p align="center">AGC</p>
</td>
<td>
<p align="center">Ser</p>
</td>
</tr>
<tr>
<td>
<p align="center">AUA</p>
</td>
<td>
<p align="center">Ile</p>
</td>
<td>
<p align="center">ACA</p>
</td>
<td>
<p align="center">Thr</p>
</td>
<td>
<p align="center">AAA</p>
</td>
<td>
<p align="center">Lys</p>
</td>
<td>
<p align="center">AGA</p>
</td>
<td>
<p align="center">Arg</p>
</td>
</tr>
<tr>
<td>
<p align="center">AUG</p>
</td>
<td>
<p align="center">Met</p>
</td>
<td>
<p align="center">ACG</p>
</td>
<td>
<p align="center">Thr</p>
</td>
<td>
<p align="center">AAG</p>
</td>
<td>
<p align="center">Lys</p>
</td>
<td>
<p align="center">AGG</p>
</td>
<td>
<p align="center">Arg</p>
</td>
</tr>
<tr>
<td>
<p align="center">GUU</p>
</td>
<td>
<p align="center">Val</p>
</td>
<td>
<p align="center">GCU</p>
</td>
<td>
<p align="center">Ala</p>
</td>
<td>
<p align="center">GAU</p>
</td>
<td>
<p align="center">Asp</p>
</td>
<td>
<p align="center">GGU</p>
</td>
<td>
<p align="center">Gly</p>
</td>
</tr>
<tr>
<td>
<p align="center">GUC</p>
</td>
<td>
<p align="center">Val</p>
</td>
<td>
<p align="center">GCC</p>
</td>
<td>
<p align="center">Ala</p>
</td>
<td>
<p align="center">GAC</p>
</td>
<td>
<p align="center">Asp</p>
</td>
<td>
<p align="center">GGC</p>
</td>
<td>
<p align="center">Gly</p>
</td>
</tr>
<tr>
<td>
<p align="center">GUA</p>
</td>
<td>
<p align="center">Val</p>
</td>
<td>
<p align="center">GCA</p>
</td>
<td>
<p align="center">Ala</p>
</td>
<td>
<p align="center">GAA</p>
</td>
<td>
<p align="center">Glu</p>
</td>
<td>
<p align="center">GGA</p>
</td>
<td>
<p align="center">Gly</p>
</td>
</tr>
<tr>
<td>
<p align="center">GUG</p>
</td>
<td>
<p align="center">Val*</p>
</td>
<td>
<p align="center">GCG</p>
</td>
<td>
<p align="center">Ala</p>
</td>
<td>
<p align="center">GAG</p>
</td>
<td>
<p align="center">Glu</p>
</td>
<td>
<p align="center">GGG</p>
</td>
<td>
<p align="center">Gly</p>
</td>
</tr>
</tbody>
</table>
<table border="0" cellpadding="0">
<tbody>
<tr>
<td>* GUG may also code for the initiator Met. This triplet is therefore &#8220;ambiguous&#8221;.</td>
</tr>
</tbody>
</table>
</div>
<p align="justify">Regulation of expression is encoded as specific patterns that are to be recognized by the translation machinery under appropriate circumstances.</p>
<p align="justify"><strong>Sequence databases:</strong></p>
<p align="justify">For overview of database &#8211; <a href="http://www.geocities.com/bioinformaticsweb/data.html">click here </a></p>
<p align="justify">For complete sequence database listing &#8211; <a href="http://www.geocities.com/bioinformaticsweb/datalink.html">click here </a></p>
<p align="justify"><strong>Overview of sequence analysis tools</strong></p>
<p align="justify"><a>Sequence Comparison </a></p>
<p align="justify">An alignment is an arrangement of two sequences that shows where they are similar and where they differ. An optimal alignment is one that exhibits the most similarities and the fewest differences. Broadly, there are three categories of methods for sequence comparison.</p>
<p align="justify">•  <strong>Segment methods </strong> compare all overlapping segments of a predetermined length (e.g., 10 amino acids) from one sequence to all segments from the other. This is the approach used in dotplots.</p>
<p align="justify">•  <strong>Optimal global alignment </strong> methods allow the best overall score for the comparison of the two sequences to be obtained, including a consideration of gaps. These programs align sequences over their whole length.</p>
<p align="justify">•  <strong>Optimal local alignment </strong> algorithms seek to identify the best local similarities between two sequences also including explicit consideration of gaps. Alignment may only be over a short span of sequence.</p>
<p align="justify">
<h3><a>Dotplots </a></h3>
<p align="justify">The most intuitive representation of the comparison between two sequences is using dotplots. One sequence is represented on each axis and significant matching regions are distributed along diagonals in the matrix.</p>
<p align="justify">There are two different algorithms that are commonly used in creating dotplots. The first method involves matching identical regions of sequence and plotting a dot in these areas. The second involves using &#8220;sliding windows&#8221; to compare two sequences using a threshold score ` <a href="http://portal.rfcgr.mrc.ac.uk/Courses/Jemboss_3day/Footnotes/Chapt5Fd0e1627.html">* </a>&#8216; value. A window is a run of adjacent nucleotide or amino acid residues of a chosen size, and the threshold score is chosen to reflect the degree of sequence similarity required. Each window of sequence A is compared to each window of sequence B, and a dot is placed in that region only if the match score equals or exceeds the set threshold level.</p>
<p align="justify"><strong>Online tool links: </strong></p>
<p align="justify"><a href="http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html" target="_blank">Dotlet Programme </a></p>
<p align="justify"><a href="http://www.isrec.isb-sib.ch/java/dotlet/dotlet_examples.html" target="_blank">Learn dotlet by example</a></p>
<p align="justify"><a><strong>Sequence alignment </strong></a></p>
<p align="justify">The algorithms we will be using are more rigorous than those used for searching databases, so a careful pairwise alignment is still worthwhile even if you have retrieved a sequence from a database using something like <strong>BLAST</strong>. The basic idea behind the sequence alignment programs is to align the two sequences in such a way as to produce the highest score &#8211; a scoring matrix is used to add points to the score for each match and subtract them for each mismatch. The matrices commonly used for scoring protein alignments are more complex than the simple match/mismatch matrices used for DNA sequences such as the one we saw earlier; the scores that form the protein matrices are designed to reflect similarity between the different amino acids rather than simply scoring identities. Over time various mutations occur in sequences; the scoring matrices attempt to cope with substitutions, but insertions and deletions require some extra parameters to allow the introduction of gaps in the alignment. There are penalties both for the creation of gaps and for the extension of existing ones; the default gap parameters given in alignment programs have been found empirically to work well with test sequences, but you should experiment with different gap penalties.</p>
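<p align="justify">To make the scoring scheme concrete, the sketch below scores an alignment that has already been written out with &#8220;-&#8221; for gaps. The function name and all numeric values are illustrative assumptions, and it uses a simple match/mismatch score rather than a protein matrix. The gap convention assumed here charges the opening penalty for the first position of a gap and the extension penalty for each further position.</p>

```python
def score_alignment(aligned_a, aligned_b, match=1, mismatch=-1,
                    gap_open=5, gap_extend=1):
    """Score two equal-length aligned strings in which '-' marks a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    in_gap_a = in_gap_b = False  # is a gap currently open in a / in b?
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            gap_in_a = x == "-"
            continuing = in_gap_a if gap_in_a else in_gap_b
            # Extending an existing gap is cheaper than opening a new one.
            total -= gap_extend if continuing else gap_open
            in_gap_a, in_gap_b = gap_in_a, not gap_in_a
        else:
            total += match if x == y else mismatch
            in_gap_a = in_gap_b = False
    return total


if __name__ == "__main__":
    print(score_alignment("GATTACA", "GA--ACA"))  # one gap of length 2
    print(score_alignment("GATTACA", "GA-T-CA"))  # two gaps of length 1
```

<p align="justify">Both example alignments contain five matches, but the one with a single two-residue gap scores higher than the one with two separate gaps, which is the behaviour the separate opening and extension penalties are meant to produce.</p>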
<h3><a>BLAST </a></h3>
<p align="justify"><strong>BLAST </strong> (Basic Local Alignment Search Tool) is a heuristic method to find the highest scoring locally optimal alignments between a query sequence and a database. Previous versions of <strong>BLAST </strong> did not allow gapped alignments, but <strong>BLAST2 </strong> (from the HGMP-RC telnet and www menus) does. A gapped <strong>BLAST </strong> search allows gaps (deletions and insertions) to be introduced into the alignments that are returned. Allowing gaps means that similar regions are not broken into several segments. The scoring of these gapped alignments tends to reflect biological relationships more closely.</p>
<p align="justify">The <strong>BLAST </strong> algorithm and family of programs rely on work on the statistics of local sequence alignments by Altschul et al. The statistics allow us to estimate the probability of obtaining an alignment with a particular score. The <strong>BLAST </strong> algorithm permits nearly all sequence matches above a cutoff ` <a href="http://portal.rfcgr.mrc.ac.uk/Courses/Jemboss_3day/Footnotes/Chapt6Fd0e2779.html">* </a>&#8216; to be located efficiently in a database.</p>
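<p align="justify">As a rough illustration of how such statistics are used, the sketch below applies the widely quoted Karlin-Altschul relation E = K&#183;m&#183;n&#183;e<sup>-&#955;S</sup>, where S is the raw alignment score, m the query length and n the database length. The default values of &#955; and K are placeholders of the kind reported for ungapped protein scoring and should be treated as assumptions; real programs estimate them for the scoring system in use.</p>

```python
import math


def e_value(raw_score, query_len, db_len, lam=0.318, k=0.134):
    """Expected number of chance alignments scoring at least raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def bit_score(raw_score, lam=0.318, k=0.134):
    """Normalised score that no longer depends on the scoring system."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def p_value(expect):
    """Probability of seeing at least one such alignment by chance."""
    return -math.expm1(-expect)


if __name__ == "__main__":
    m, n = 250, 50_000_000  # hypothetical query and database lengths
    for s in (40, 60, 80):
        e = e_value(s, m, n)
        print(s, round(bit_score(s), 1), e, p_value(e))
```

<p align="justify">The expected count falls exponentially as the score rises, which is why a modest increase in score can turn an unremarkable hit into a significant one.</p>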
<p align="justify">The algorithm operates as follows:</p>
<p align="justify">•  <strong>BLAST </strong> scans the database for words (typically 3-mers for proteins) that score at least T (a designated threshold value) when aligned with a word in the query sequence &#8211; such aligned pairs are called hits.</p>
<p align="justify">•  If a second non-overlapping hit is found within a distance A of the first and on the same diagonal, the first hit is extended between the database and query sequences in both directions. Extension continues, scoring all the time, until the running score drops below the maximum score seen so far by a value X. The resulting local alignment is called an HSP (high-scoring segment pair) or MSP (maximum scoring segment pair).</p>
<p align="justify">•  If the alignment score of the HSP exceeds a given value Sg (the gapped score), then a gapped extension of the HSP is initiated.</p>
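<p align="justify">The first two steps above can be sketched as follows. This is a deliberately simplified illustration and not the BLAST implementation: it treats only identical words as hits (instead of all words scoring at least T against a substitution matrix), uses a plain match/mismatch score, and omits the gapped extension step entirely. All function names and default parameter values are assumptions made for the example.</p>

```python
def find_word_hits(query, subject, word_size=3):
    """Return (query_pos, subject_pos) pairs where identical words occur."""
    index = {}
    for i in range(len(query) - word_size + 1):
        index.setdefault(query[i:i + word_size], []).append(i)
    hits = []
    for j in range(len(subject) - word_size + 1):
        for i in index.get(subject[j:j + word_size], []):
            hits.append((i, j))
    return hits


def two_hit_seeds(hits, word_size=3, max_distance=10):
    """Keep a hit only if a second, non-overlapping hit lies on the same
    diagonal within max_distance (the distance A in the text)."""
    by_diagonal = {}
    for i, j in hits:
        by_diagonal.setdefault(j - i, []).append(i)
    seeds = []
    for diagonal, positions in by_diagonal.items():
        positions.sort()
        for n, first in enumerate(positions):
            later = positions[n + 1:]
            if any(word_size <= second - first <= max_distance for second in later):
                seeds.append((first, first + diagonal))
    return seeds


def _extend_one_way(query, subject, qi, sj, step, x_drop, match, mismatch):
    """Walk along the diagonal until the running score has dropped x_drop
    below the best score seen; return the best gain and its length."""
    best_gain = gain = 0
    best_len = length = 0
    while 0 <= qi < len(query) and 0 <= sj < len(subject):
        gain += match if query[qi] == subject[sj] else mismatch
        length += 1
        if gain > best_gain:
            best_gain, best_len = gain, length
        elif best_gain - gain >= x_drop:
            break
        qi += step
        sj += step
    return best_gain, best_len


def extend_hit(query, subject, i, j, word_size=3, x_drop=3, match=1, mismatch=-1):
    """Ungapped extension of a seed in both directions; returns the HSP."""
    seed = sum(match if query[i + k] == subject[j + k] else mismatch
               for k in range(word_size))
    right_gain, right_len = _extend_one_way(
        query, subject, i + word_size, j + word_size, 1, x_drop, match, mismatch)
    left_gain, left_len = _extend_one_way(
        query, subject, i - 1, j - 1, -1, x_drop, match, mismatch)
    q_segment = query[i - left_len:i + word_size + right_len]
    s_segment = subject[j - left_len:j + word_size + right_len]
    return seed + left_gain + right_gain, q_segment, s_segment


if __name__ == "__main__":
    query, subject = "MKVLAAGIVG", "XXMKVLAAGIVGXX"
    for i, j in two_hit_seeds(find_word_hits(query, subject)):
        print(extend_hit(query, subject, i, j))
```

<p align="justify">Requiring two nearby hits on one diagonal discards most isolated chance hits before any extension work is done, which is the source of the speed-up described in the text.</p>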
<p align="justify">Earlier versions of <strong>BLAST </strong> looked only for single hits and extended them all; however, the extensions did not incorporate gaps and thus missed some potentially interesting matches. The gapped extension currently used takes much longer to execute, but overall speed is improved by requiring two non-overlapping close hits before the initial extension is triggered, and by choosing the value of Sg so that only about one gapped extension is triggered per 50 database sequences.</p>
<p align="justify">These modifications to <strong>BLAST </strong> mean that it now runs three times faster than earlier versions and in trials it found more statistically significant alignments than the old <strong>BLAST </strong>.</p>
<h3><em>BLAST FAMILY OF PROGRAMS </em></h3>
<p align="justify">The <strong>BLAST </strong> family of programs allows all combinations of DNA or protein query sequences with searches against DNA or protein databases. (Most of the time these programs are used through an interface.)</p>
<p align="justify">•  <strong>blastp: </strong> compares an amino acid query sequence against a protein sequence database.</p>
<p align="justify">•  <strong>blastn: </strong> compares a nucleotide query sequence against a nucleotide sequence database.</p>
<p align="justify">•  <strong>blastx: </strong> compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence database.</p>
<p align="justify">•  <strong>tblastn: </strong> compares a protein query sequence against a nucleotide sequence database dynamically translated in all six reading frames (both strands).</p>
<p align="justify">•  <strong>tblastx: </strong> compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.</p>
<p align="justify">•  <strong>PSI-Blast: </strong> Position-Specific Iterated <strong>BLAST </strong>. This is potentially a very sensitive method to pull out significant hits in a protein-protein database search. It first performs a gapped BLAST database search and then uses the information from any significant alignments returned to construct a position-specific score matrix, which replaces the query sequence for the next round of database searching. <strong>PSI-Blast </strong> may be iterated until no new significant alignments are found.</p>
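<p align="justify">The six-frame conceptual translation used by blastx, tblastn and tblastx can be illustrated with the standard genetic code from the codon table above. The sketch below is a minimal example and not code from any BLAST program; the function names are assumptions, stop codons are written as &#8220;*&#8221;, and incomplete trailing codons are ignored.</p>

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in T, C, A, G order for each codon position.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from the start of dna; unknown codons become X."""
    return "".join(CODON_TABLE.get(dna[p:p + 3], "X")
                   for p in range(0, len(dna) - 2, 3))


def six_frame_translation(sequence):
    """Translate both strands in all three reading frames.

    Keys are '+1', '+2', '+3' for the given strand and '-1', '-2', '-3'
    for the reverse complement. RNA input (U) is accepted.
    """
    dna = sequence.upper().replace("U", "T")
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frame_translation("ATGGCCAAA").items():
        print(name, protein)
```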
<p align="justify"><strong>Online tool links: </strong></p>
<p align="justify"><a href="http://www.ncbi.nlm.nih.gov/BLAST/" target="_blank">NCBI &#8211; Blast </a></p>
<p align="justify"><a href="http://www.ebi.ac.uk/blast2/index.html" target="_blank">WU &#8211; Blast@ EBI </a></p>
<p align="justify">
<p align="justify"><a><strong>Global sequence alignment </strong></a></p>
<p align="justify">A global alignment is one that compares the two sequences over their entire lengths, and is appropriate for comparing sequences that are expected to share similarity over the whole length. The alignment maximises regions of similarity and minimises gaps using the scoring matrices and gap parameters provided to the program.</p>
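<p align="justify">A classic way to compute a global alignment is the Needleman-Wunsch dynamic programming method. The text does not name the algorithm behind the tools it links to, so the sketch below is offered only as a representative example. It assumes a simple match/mismatch score and a single linear gap penalty rather than a protein matrix with separate opening and extension penalties.</p>

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_a, aligned_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1
    return score[len(a)][len(b)], "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


if __name__ == "__main__":
    total, top, bottom = needleman_wunsch("GATTACA", "GATACA")
    print(total)
    print(top)
    print(bottom)
```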
<p align="justify"><strong>Online tool link: </strong></p>
<p align="justify"><a href="http://www.ebi.ac.uk/clustalw/index.html" target="_blank">Clustalw @ EBI </a></p>
<p align="justify"><a><strong>Local sequence alignment </strong></a></p>
<p align="justify">Global sequence alignment algorithms align sequences over their entire lengths. You do need to think about whether that type of alignment makes sense for your sequences. For our example, where we expect each exon to be represented in the sequences and in the same order, it has worked well &#8211; however, how well do you think this approach would work with, for example, multidomain proteins that share one domain but not others, or sequences where there have been regions of duplication? A second comparison method, local alignment, searches for regions of local similarity and need not include the entire length of the sequences.</p>
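<p align="justify">Local alignment is commonly computed with the Smith-Waterman method, which differs from global dynamic programming in two ways: cell scores are never allowed to fall below zero, and the traceback starts at the highest-scoring cell instead of the corner. The sketch below is a minimal example under assumed match, mismatch and linear gap values; it is not taken from any of the tools linked here.</p>

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            # The extra 0 lets an alignment restart anywhere.
            score[i][j] = max(0,
                              score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell until a zero-scoring cell is reached.
    aligned_a, aligned_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        pair = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + pair:
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


if __name__ == "__main__":
    # Two sequences that share one region but differ in their flanks.
    print(smith_waterman("CCCCGATTACACCCC", "GGGGATTACAGGGG"))
```

<p align="justify">Only the shared region is reported; the unrelated flanking residues, which would be forced into a global alignment, are left out.</p>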
<p align="justify"><strong>Online tool links: </strong></p>
<p align="justify"><a href="http://www.ebi.ac.uk/emboss/align/index.html" target="_blank">Pairwise alignment at EBI </a></p>
<p align="justify"><a href="http://www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi" target="_blank">Pairwise alignment at NCBI </a></p>
<h3><a>Protein Sequence Analysis </a></h3>
<p align="justify">You can get a variety of clues by looking for patterns and motifs in your sequence:</p>
<p align="justify">•  These are often derived from multiple sequence alignments.</p>
<p align="justify">•  Conserved protein domains or regions can be very useful in trying to determine which protein family a sequence belongs to, catalytic sites, carbohydrate binding sites etc.</p>
<p align="justify">•  Various research groups have created their own databases and search tools; it might be worth using a variety of these.</p>
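<p align="justify">Short motifs are often written as patterns and can be searched for with regular expressions. The sketch below uses the commonly cited N-glycosylation consensus, N-{P}-[ST]-{P} in PROSITE-style notation (Asn, any residue except Pro, Ser or Thr, any residue except Pro), purely as an illustration; the pattern choice, the function name and the example sequence are assumptions, and a match only flags a candidate site.</p>

```python
import re

# N-{P}-[ST]-{P} written as a regular expression. The lookahead allows
# overlapping occurrences to be reported.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif(protein, pattern=N_GLYCOSYLATION):
    """Return (zero-based position, matched text) for every occurrence."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(protein)]


if __name__ == "__main__":
    for position, site in find_motif("MNGSAANPSANKTL"):
        print(position, site)
```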
<h3><em>FIND HOMOLOGOUS (PARALOGOUS AND ORTHOLOGOUS) SEQUENCES </em></h3>
<p align="justify">Using a database similarity search can give you a great deal of information:</p>
<p align="justify">•  Homologues may be well annotated and their function documented in the literature.</p>
<p align="justify">•  Simply comparing your sequence with homologues can tell you a lot.</p>
<p align="justify">•  Phylogenetic analysis may reveal evolutionary relationships between proteins and help you decide which family or super family a protein belongs to.</p>
<p align="justify">•  N.B. Be aware of convergent evolution.</p>
<h3><em>HAVING SOME IDEA OF STRUCTURE MAY HELP YOU PREDICT POSSIBLE FUNCTIONS </em></h3>
<p align="justify">Knowing the protein fold(s) together with conserved domains (or even residues) may tell you what type of functions this protein could have.</p>
]]></content:encoded>
			<wfw:commentRss>http://bioinformatics.me/bioinformatics-sequence-analysis-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Introduction to Microarray</title>
		<link>http://bioinformatics.me/introduction-to-microarray/</link>
		<comments>http://bioinformatics.me/introduction-to-microarray/#comments</comments>
		<pubDate>Sat, 11 Jul 2009 00:16:28 +0000</pubDate>
		<dc:creator>Waleed Ghalwash</dc:creator>
				<category><![CDATA[Bioinformatics Web]]></category>

		<guid isPermaLink="false">http://bioinformatics.me/introduction-to-microarray/</guid>
		<description><![CDATA[
Microarray-Definition
A 2D array, typically on glass, a filter, or a silicon wafer, upon which genes or gene fragments are deposited or synthesized in a predetermined spatial order allowing them to be made available as probes in a high-throughput, parallel manner.
Microarrays that consist of ordered sets of DNA fixed to solid surfaces provide pharmaceutical firms with a [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.bioinformaticsweb.org/wp-content/uploads/2009/07/microarray_introduction.jpg" alt="microarray_introduction" width="439" height="427" class="aligncenter size-full wp-image-394" /><br />
<strong>Microarray-Definition</strong><br />
A 2D array, typically on glass, a filter, or a silicon wafer, upon which genes or gene fragments are deposited or synthesized in a predetermined spatial order allowing them to be made available as probes in a high-throughput, parallel manner.</p>
<div>Microarrays that consist of ordered sets of DNA fixed to solid surfaces provide pharmaceutical firms with a means to identify drug targets.<br />
In the future, the emerging technology promises to help physicians decide the most effective drug treatments for individual patients.</div>
<p align="justify">Microarrays are simply ordered sets of DNA molecules of known sequence. Usually rectangular, they can consist of a few hundred to hundreds of thousands of sets. Each individual feature goes on the array at a precisely defined location on the substrate. The identity of the DNA molecule fixed to each feature never changes, and scientists rely on that fact when calculating their experimental results. Microarray analysis permits scientists to detect thousands of genes in a small sample simultaneously and to analyze the expression of those genes. As a result, it promises to enable biotechnology and pharmaceutical companies to identify drug targets &#8211; the proteins with which drugs actually interact. Since it can also help identify individuals with similar biological patterns, microarray analysis can assist drug companies in choosing the most appropriate candidates for participating in clinical trials of new drugs. In the future, this emerging technology has the potential to help medical professionals select the most effective drugs, or those with the fewest side effects, for individual patients.</p>
<p align="justify"><strong>Potential of Microarray analysis:<br />
</strong>The academic research community stands to benefit from microarray technology just as much as the pharmaceutical industry. The ability to use it in place of existing technology will allow researchers to perform experiments faster and more cheaply, and will enable them to concentrate on analyzing the results of microarray experiments rather than simply performing the experiments. This research could then lead to a better understanding of the disease process. That will require many different levels of research. While the field of expression has received most attention so far, looking at the gene copy level and protein level is just as important. Microarray technology has potential applications in each of these three levels.</p>
<p align="justify">Identifying drug targets provided the initial market for microarrays. A good drug target has extraordinary value for developing pharmaceuticals. By comparing the ways in which genes are expressed in a normal and a diseased heart, for example, scientists might be able to identify the genes, and hence the associated proteins, that are part of the disease process. Researchers could then use that information to synthesize drugs that interact with these proteins, thus reducing the disease&#8217;s effect on the body.</p>
<p align="justify">When a microarray, an ordered set of DNA molecules of known sequence, is used, many genes can be measured simultaneously and the results calculated quickly. Consequently, scientists can evaluate an entire set of genes at once, rather than looking at physiological changes one gene at a time. For example, Genetics Institute, a biotechnology company in Cambridge, Massachusetts, built an array consisting of genes for cytokines, which are proteins that affect cell physiology during the inflammatory response, among other effects. The full set of DNA molecules contained more than 250 genes. While that number was not large by current standards of microarrays, it vastly outnumbered the one or two genes examined in typical pre-microarray experiments. The Genetics Institute scientists used the array to study how changes experienced by cells in the immune system during the inflammatory response are reflected in the behavior of all 250 genes at the same time. This experiment established the potential for using the patterns of response to help locate points in the body at which drugs could prove most effective.</p>
<p align="justify"><strong>Microarray Products:<br />
</strong>Within that basic technological foundation, microarray companies have created a variety of products and services. They range in price and involve several different technical approaches. A kit containing a simple array with limited density can cost as little as $1,100, while a versatile system favored by R&amp;D laboratories in pharmaceutical and biotechnology companies costs more than $200,000. The differences among products lie in the basic components and the precise nature of the DNA on the arrays.</p>
<p align="justify">The type of molecule placed on the array units also varies according to circumstances. The most commonly used molecule is cDNA, or complementary DNA, which is derived from messenger RNA and cloned. Since each cDNA is derived from a distinct messenger RNA, each feature represents an expressed gene.</p>
<p align="justify"><strong>Microarray-Identifying interactions: </strong></p>
<p align="justify">To detect interactions at microarray features, scientists must label the test sample in such a way that an appropriate instrument can recognize it. Since the minute size of microarray features limits the amount of material that can be located at any feature, detection methods must be extremely sensitive.</p>
<p align="justify">Other than a few low-end systems that use radioactive or chemiluminescent tagging, most microarrays use fluorescent tags as their means of identification. These labels can be delivered to the DNA units in several different ways. One simple and flexible approach involves attaching a fluorophore such as fluorescein or Cy3 to the oligonucleotide layer. While relatively simple, this approach has low sensitivity because it delivers only one unit of label per interaction. Technologists can achieve more sensitivity by multiplexing the labeled entity &#8212; that is, delivering more than one unit of label per interaction.</p>
<p align="justify"><strong>Microarrays and bioinformatics</strong></p>
<p align="justify"><strong>Experimental design</strong><br />
Due to the biological complexity of gene expression, the considerations of experimental design that are discussed in the expression profiling article are of critical importance if statistically and biologically valid conclusions are to be drawn from the data.</p>
<p align="justify"><strong>Standardization</strong><br />
The lack of standardization in arrays presents an interoperability problem in bioinformatics, which hinders the exchange of array data. Various grass-roots open-source projects are attempting to facilitate the exchange and analysis of data produced with non-proprietary chips. The &#8220;Minimum Information About a Microarray Experiment&#8221; (MIAME) checklist helps define the level of detail that should exist and is being adopted by many journals as a requirement for the submission of papers incorporating microarray results. MIAME describes possible content but is not a format; many formats can support the MIAME requirements, yet there is no way to computationally determine semantic compliance.<br />
The FDA is conducting an ongoing project to develop standards and quality control metrics that will eventually allow the use of microarray data in drug discovery, clinical practice and regulatory decision-making.</p>
<p align="justify"><strong>Statistical analysis</strong><br />
The analysis of DNA microarrays poses a large number of statistical problems, including the normalization of the data. There are dozens of proposed normalization methods in the published literature; as in many other cases where authorities disagree, a sound conservative approach is to try a number of popular normalization methods and compare the conclusions reached: how sensitive are the main conclusions to the method chosen? From a hypothesis-testing perspective, the large number of genes present on a single array means that the experimenter must take into account a multiple testing problem: even if each gene is extremely unlikely to randomly yield a result of interest, the combination of all the genes is likely to show at least one or a few occurrences of this result which are false positives.<br />
A basic difference between microarray data analysis and much traditional biomedical research is the dimensionality of the data. A large clinical study might collect, say, 100 data items per patient for thousands of patients. A medium-size microarray study will obtain many thousands of numbers per sample for perhaps a hundred samples. Many analysis techniques treat each sample as a single point in a space with thousands of dimensions, then attempt by various techniques to reduce the dimensionality of the data to something humans can visualize.</p>
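<p align="justify">The multiple testing problem can be made concrete with a little arithmetic and two standard corrections. The sketch below is illustrative only: the array size and significance level are hypothetical, and the Bonferroni and Benjamini-Hochberg procedures are common choices that the text above does not itself prescribe.</p>

```python
def expected_false_positives(n_genes, alpha):
    """Expected number of genes passing by chance if none is truly changed."""
    return n_genes * alpha


def bonferroni(p_values, alpha=0.05):
    """Reject only tests with p at or below alpha divided by the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, fdr=0.05):
    """Step-up procedure controlling the false discovery rate.

    Find the largest rank r (1-based, p-values sorted ascending) whose
    p-value is at most fdr * r / m, then reject the r smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda idx: p_values[idx])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= fdr * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical array: 10,000 genes each tested at the 0.01 level.
    print(expected_false_positives(10_000, 0.01))
    p = [0.03, 0.03, 0.03, 0.04]
    print(bonferroni(p))
    print(benjamini_hochberg(p))
```

<p align="justify">Bonferroni guards against even a single false positive and becomes very strict as the number of genes grows, whereas controlling the false discovery rate tolerates a small proportion of false positives among the genes reported.</p>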
<p align="justify"><strong>Relation between probe and gene</strong><br />
The relation between a probe and the mRNA that it is expected to detect is problematic. On the one hand, some mRNAs may cross-hybridize probes in the array that are supposed to detect another mRNA. On the other hand, probes that are designed to detect the mRNA of a particular gene may be relying on genomic EST information that is incorrectly associated with that gene.</p>
]]></content:encoded>
			<wfw:commentRss>http://bioinformatics.me/introduction-to-microarray/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Proteomics-Introduction</title>
		<link>http://bioinformatics.me/proteomics-introduction/</link>
		<comments>http://bioinformatics.me/proteomics-introduction/#comments</comments>
		<pubDate>Sat, 11 Jul 2009 00:11:54 +0000</pubDate>
		<dc:creator>Waleed Ghalwash</dc:creator>
				<category><![CDATA[Bioinformatics Web]]></category>

		<guid isPermaLink="false">http://bioinformatics.me/proteomics-introduction/</guid>
		<description><![CDATA[ 
Proteomics-Introduction
Definition: &#8220;The analysis of complete complements of proteins. Proteomics includes not only the identification and quantification of proteins, but also the determination of their localization, modifications, interactions, activities, and, ultimately, their function. Initially encompassing just two-dimensional (2D) gel electrophoresis for protein separation and identification, proteomics now refers to any procedure that characterizes large [...]]]></description>
			<content:encoded><![CDATA[<p> <img src="http://www.bioinformaticsweb.org/wp-content/uploads/2009/07/proteomics_introduction.jpg" alt="Proteomics-Introduction" width="368" height="350" class="aligncenter size-full wp-image-391" /><br />
Proteomics-Introduction<br />
Definition: &#8220;The analysis of complete complements of proteins. Proteomics includes not only the identification and quantification of proteins, but also the determination of their localization, modifications, interactions, activities, and, ultimately, their function. Initially encompassing just two-dimensional (2D) gel electrophoresis for protein separation and identification, proteomics now refers to any procedure that characterizes large sets of proteins. The explosive growth of this field is driven by multiple forces &#8211; genomics and its revelation of more and more new proteins; powerful protein technologies, such as newly developed mass spectrometry approaches, global [yeast] two-hybrid techniques, and spin-offs from DNA arrays; and innovative computational tools and methods to process, analyze, and interpret prodigious amounts of data.&#8221;</p>
<p>The theme of molecular biology research, in the past, has been oriented around the gene rather than the protein. This is not to say that researchers have neglected to study proteins, but rather that the approaches and techniques most commonly used have looked primarily at the nucleic acids and then later at the protein(s) implicated.</p>
<p>The main reason for this has been that the technologies available, and the inherent characteristics of nucleic acids, have made the genes the low hanging fruit. This situation has changed recently and continues to change as larger scale, higher throughput methods are developed for both nucleic acids and proteins. The majority of processes that take place in a cell are not performed by the genes themselves, but rather by the proteins that they code for.</p>
<p>A disease can arise when a gene/protein is over- or under-expressed, when a mutation in a gene results in a malformed protein, or when post-translational modifications alter a protein&#8217;s function. Thus, to truly understand a biological process, the relevant proteins must be studied directly. Proteins are, however, more challenging to study than genes, because their complex 3-D structure is tied to their function, much like the parts of a machine.</p>
<p>Proteomics is defined as the systematic large-scale analysis of protein expression under normal and perturbed (stressed, diseased, and/or drugged) states, and generally involves the separation, identification, and characterization of all of the proteins in a cell or tissue sample. The meaning of the term has also been expanded, and is now used loosely to refer to the approach of analyzing which proteins a particular type of cell synthesizes, how much the cell synthesizes, how cells modify proteins after synthesis, and how all of those proteins interact.</p>
<p>There are orders of magnitude more proteins than genes in an organism. Taking into account alternative splicing (several variants per gene) and post-translational modifications (over 100 known), the number of distinct proteins is estimated to be a million or more.</p>
<p>Fortunately there are features such as folds and motifs, which allow them to be categorized into groups and families, making the task of studying them more tractable. There is a broad range of technologies used in proteomics, but the central paradigm has been the use of 2-D gel electrophoresis (2D-GE) followed by mass spectrometry (MS). 2D-GE is used to first separate the proteins by isoelectric point and then by size.</p>
<p>The individual proteins are subsequently removed from the gel and prepared, then analyzed by MS to determine their identity and characteristics. There are various types of mass analyzers used in proteomics MS, including quadrupole, time-of-flight (TOF), and ion trap, and each has its own particular capabilities. Tandem arrangements, such as quadrupole-TOF, are often used to provide more analytical power. The recent development of soft ionization techniques, namely matrix-assisted laser desorption ionization (MALDI) and electrospray ionization (ESI), has allowed large biomolecules to be introduced into the mass analyzer without completely decomposing their structures, or even without breaking them at all, depending on the design of the experiment.</p>
<p>There are techniques which incorporate liquid chromatography (LC) with MS, and others that use LC by itself. Robotics have been applied to automate several steps in the 2D-GE/MS process, such as spot excision and enzyme digests. To determine a protein&#8217;s structure, X-ray diffraction (XRD) and nuclear magnetic resonance (NMR) techniques are being improved to reach higher throughput and better performance.</p>
<p>For example, automated high-throughput crystallization methods are being used upstream of XRD to alleviate that bottleneck. For NMR, cryo-probes and flow probes shorten analysis time and decrease sample volume requirements. The hope is that determining about 10,000 protein structures will be enough to characterize the estimated 5,000 or so folds, which will feed into more reliable in silico structural prediction methods.</p>
<p>Structure by itself does not provide all of the desired information, but is a major step in the right direction. Protein chips are being developed for many of the processes in proteomics. For example, researchers are developing protocols for protein microarrays at institutions such as Harvard and Stanford as well as at several companies. These chips &#8211; grids of attached peptide fragments, attached antibodies, or gel &#8220;pads&#8221; with proteins suspended inside &#8211; will be used for various experiments such as protein-protein interaction studies and differential expression analysis.</p>
<p>They can also be used to filter out high abundance proteins before further experiments; one of the major challenges in proteomics is isolating and analyzing the low abundance proteins, which are thought to be the most important. There are many other types of protein chips, and the number will continue to grow. For example, microfluidics chips can combine the sample preparation steps prior to MS, such as enzyme digests, with nanoelectrospray ionization, all on the one chip. Or, the samples can be ionized directly off of the surface of the chip, similar to a MALDI target. Microfluidics chips are also being combined with NMR.</p>
<p>In the next few years, various protein chips will be used increasingly in diagnostic applications as well. The bioinformatics side of proteomics includes both databases and analysis software. There are many public and private databases containing protein data ranging from sequences, to functions, to post-translational modifications. Typically, a researcher will first perform 2D-GE followed by MS; this will result in a peptide mass fingerprint, a molecular weight, or even a sequence for each protein of interest, which can then be used to query databases for similarities or other information.</p>
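<p>To show what a fingerprint query compares against, the sketch below performs an in silico digest of a protein sequence and computes the mass of each resulting peptide. It is an illustration under stated assumptions: the enzyme is taken to be trypsin, modelled by the usual simplified rule of cleaving after K or R unless the next residue is P; the masses are standard monoisotopic residue masses; and the function names and example sequence are hypothetical. Real search tools also handle missed cleavages, modifications and charge states.</p>

```python
# Monoisotopic residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def tryptic_digest(protein):
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for pos, residue in enumerate(protein):
        next_residue = protein[pos + 1] if pos + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:pos + 1])
            start = pos + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified, uncharged peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER


def mass_fingerprint(protein):
    """Sorted list of (mass, peptide) pairs for the digested protein."""
    return sorted((peptide_mass(p), p) for p in tryptic_digest(protein))


if __name__ == "__main__":
    for mass, peptide in mass_fingerprint("AKGRPCKMFVLR"):
        print(f"{mass:10.4f}  {peptide}")
```

<p>A database search then looks for the protein whose predicted peptide masses best match the masses observed in the spectrum, within the instrument&#8217;s tolerance.</p>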
<p>Swiss-Prot and TrEMBL, developed in a collaboration between the Swiss Institute of Bioinformatics and the European Bioinformatics Institute, are currently the major databases dedicated to cataloging protein data, but there are dozens of more specialized databases and tools. New bioinformatics approaches are constantly being introduced. Recent customized versions of PSI-BLAST can, for example, utilize not only the curated protein entries in Swiss-Prot but also linguistic analyses of biomedical journal articles to help determine protein family relationships. Publicly available databases and tools are popular, but there are also several companies offering subscriptions to proprietary databases, which often include protein-protein interaction maps generated using the yeast two-hybrid (Y2H) system.</p>
<p>The proteomics market comprises instrument manufacturers, bioinformatics companies, laboratory product suppliers, service providers, and other biotech-related companies that can defy categorization. A given company often spans more than one of these areas. Many of the companies involved in the proteomics market actually have drug discovery as their major focus, while partnering with, or providing services or subscriptions to, other companies to generate short-term revenues. The market for proteomics products and services was estimated to be $1.0B in 2000, growing at a compound annual growth rate (CAGR) of 42% to about $5.8B in 2005.</p>
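<p>As a quick check of the arithmetic behind those figures, the short Python sketch below compounds the $1.0B estimate for 2000 at 42% a year for five years. The function names are illustrative only and are not taken from any market report.</p>

```python
def project_market(start_value, annual_growth, years):
    """Compound a starting value at a fixed annual growth rate."""
    return start_value * (1.0 + annual_growth) ** years


def implied_growth_rate(start_value, end_value, years):
    """Annual growth rate that turns start_value into end_value over the period."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures quoted in the text: $1.0B in 2000, 42% CAGR, five years to 2005.
projected_2005 = project_market(1.0, 0.42, 5)
print(round(projected_2005, 2))  # 5.77 (billions of dollars)

# Working backwards from the rounded $5.8B figure gives the growth rate.
print(implied_growth_rate(1.0, 5.8, 5))
```

<p>The projection comes out at roughly $5.8B, and working backwards from $5.8B gives a rate of about 42%, so the quoted numbers are consistent with each other.</p>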
<p>The major drivers will continue to be the biopharmaceutical industry&#8217;s pursuit of blockbuster drugs and the recent technological advances which have allowed large-scale studies of genes and proteins. Alliances are becoming increasingly important in this field, because it is challenging for companies to find all of the necessary expertise to cover the different activities involved in proteomics. Synergies must be created by combining forces. For example, many companies working with mass spectrometry, both the manufacturers and end-user labs, are collaborating with protein-chip companies. The technologies are a natural fit for many applications, such as microfluidic chips which provide nanoelectrospray ionization into a mass spectrometer.</p>
<p>There are many combinations of diagnostics, instrumentation, chip, and bioinformatics companies which create effective partnerships. In general, proteomics appears to hold great promise in the pursuit of biological knowledge. There has been a general realization that the large-scale approach to biology, as opposed to the strictly hypothesis-driven approach, will rapidly generate much more useful information.</p>
<p>The two approaches are not mutually exclusive, and the happy medium seems to be the formation of broad hypotheses which are subsequently investigated by designing large-scale experiments and selecting the appropriate data. Proteomics and genomics, and other varieties of &#8216;omics&#8217;, will all continue to complement each other in providing the tools and information for this type of research. </p>
]]></content:encoded>
			<wfw:commentRss>http://bioinformatics.me/proteomics-introduction/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Genome Projects</title>
		<link>http://bioinformatics.me/genome-projects/</link>
		<comments>http://bioinformatics.me/genome-projects/#comments</comments>
		<pubDate>Fri, 10 Jul 2009 23:58:28 +0000</pubDate>
		<dc:creator>Waleed Ghalwash</dc:creator>
				<category><![CDATA[Bioinformatics Web]]></category>

		<guid isPermaLink="false">http://bioinformatics.me/genome-projects/</guid>
		<description><![CDATA[
Genome Projects
Genome projects are scientific endeavours that aim to map the genome of a living being or of a species (be it an animal, a plant, a fungus, a bacterium, an archaean, a protist or a virus), that is, the complete set of genes carried by this living being or virus. The Human Genome Project [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.bioinformaticsweb.org/wp-content/uploads/2009/07/genomeprojects_bioinformatics.jpg" alt="genomeprojects_bioinformatics" width="400" height="373" class="aligncenter size-full wp-image-387" /><br />
<strong>Genome Projects</strong></p>
<p align="justify">Genome projects are scientific endeavours that aim to map the genome of a living being or of a species (be it an animal, a plant, a fungus, a bacterium, an archaean, a protist or a virus), that is, the complete set of genes carried by this living being or virus. The Human Genome Project was such a project. Some have argued that the era of genomics is one of the more fundamental advances in human history.</p>
<p align="justify"><strong>Genome sequencing</strong></p>
<p align="justify">There are essentially two ways to sequence a genome. The BAC-to-BAC method, the first to be employed in human genome studies, is slow but sure. This approach, also referred to as the map-based method, evolved from procedures developed by a number of researchers during the late 1980s and 1990s, and it continues to develop and change.</p>
<p align="justify">The other technique, known as whole genome shotgun sequencing, brings speed into the picture, enabling researchers to do the job in months to a year. The whole genome shotgun approach was championed by J. Craig Venter and colleagues in the mid-1990s.</p>
<p align="justify"><strong>1. BAC-to-BAC Sequencing</strong></p>
<p align="justify">The BAC-to-BAC approach first creates a crude physical map of the whole genome before sequencing the DNA. Constructing a map requires cutting the chromosomes into large pieces and figuring out the order of these big chunks of DNA before taking a closer look and sequencing all the fragments.</p>
<p align="justify">1. Several copies of the genome are randomly cut into large pieces, with lengths measured in base pairs (bp).</p>
<p align="justify">2. Each of these fragments is inserted into a BAC, a bacterial artificial chromosome. A BAC is a man-made piece of DNA that can replicate inside a bacterial cell. The whole collection of BACs containing the entire human genome is called a BAC library, because each BAC is like a book in a library that can be accessed and copied.</p>
<p align="justify">3. These pieces are fingerprinted to give each piece a unique identification tag that determines the order of the fragments. Fingerprinting involves cutting each BAC fragment with a single enzyme and finding common sequence landmarks in overlapping fragments that determine the location of each BAC along the chromosome. Overlapping BACs with markers every 100,000 bp then form a map of each chromosome.</p>
<p>Each BAC is then broken randomly into 1,500 bp pieces and placed in another artificial piece of DNA called M13. This collection is known as an M13 library.</p>
<p align="justify">All the M13 libraries are sequenced. 500 bp from one end of each fragment are sequenced, generating millions of sequences. These sequences are fed into a computer program called PHRAP that looks for common sequences that join two fragments together.</p>
<p align="justify"><strong>2. Whole Genome Shotgun Sequencing</strong></p>
<p align="justify">The shotgun sequencing method goes straight to the job of decoding, bypassing the need for a physical map. Therefore, it is much faster.</p>
<p align="justify">1. Multiple copies of the genome are randomly shredded into pieces that are 2,000 base pairs (bp) long by squeezing the DNA through a pressurized syringe. This is done a second time to generate pieces that are 10,000 bp long.</p>
<p align="justify">2. Each 2,000 and 10,000 bp fragment is inserted into a plasmid, which is a piece of DNA that can replicate in bacteria. The two collections of plasmids containing 2,000 and 10,000 bp chunks of human DNA are known as plasmid libraries.</p>
<p align="justify">3. Both the 2,000 and the 10,000 bp plasmid libraries are sequenced. 500 bp from each end of each fragment are decoded, generating millions of sequences. Sequencing both ends of each insert is critical for assembling the entire chromosome.</p>
<p align="justify">Computer algorithms assemble the millions of sequenced fragments into a continuous stretch resembling each chromosome.</p>
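<p align="justify">The idea of joining fragments through their shared sequence can be sketched in a few lines of Python. This is only a toy greedy merge on short invented reads, assuming error-free reads from a single strand; it is not how PHRAP or any production assembler works, and the function names are hypothetical.</p>

```python
def overlap_length(left, right, min_overlap):
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing overlaps any more; leave the contigs separate
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Three invented reads that overlap end to end.
print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))  # ['ATGGCGTACCTGA']
```

<p align="justify">Real genomes contain repeats and sequencing errors, which is one reason the paired reads from both ends of each insert matter so much in practice.</p>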
<p>For comprehensive access to information on complete and ongoing genome projects, as well as metagenomes and metadata, <a href="http://genomesonline.org/index2.htm" target="_blank">visit GOLD</a>.
</p>
]]></content:encoded>
			<wfw:commentRss>http://bioinformatics.me/genome-projects/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bioinformatics-Virtual Drug Development</title>
		<link>http://bioinformatics.me/bioinformatics-virtual-drug-development/</link>
		<comments>http://bioinformatics.me/bioinformatics-virtual-drug-development/#comments</comments>
		<pubDate>Fri, 10 Jul 2009 23:36:34 +0000</pubDate>
		<dc:creator>Waleed Ghalwash</dc:creator>
				<category><![CDATA[Bioinformatics Web]]></category>

		<guid isPermaLink="false">http://bioinformatics.me/bioinformatics-virtual-drug-development/</guid>
		<description><![CDATA[
These days, computers are an integral part of genomics-based drug discovery, helping researchers find drug targets by comparing databases of genomic information with annotations about functional information, by analyzing the data that comes in from various wetlab experiments, and by simply keeping track of the huge amounts of biological data being unearthed in life sciences [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.bioinformaticsweb.org/wp-content/uploads/2009/07/virutal_drug_development-300x255.jpg" alt="virutal_drug_development" width="500" height="255" class="aligncenter size-medium wp-image-384" /></p>
<p>These days, computers are an integral part of genomics-based drug discovery, helping researchers find drug targets by comparing databases of genomic information with annotations about functional information, by analyzing the data that comes in from various wetlab experiments, and by simply keeping track of the huge amounts of biological data being unearthed in life sciences research. This is the role of bioinformatics, a field that has exploded in importance over the last few years as companies have begun to realize they are drowning in raw data.</p>
<p align="justify">But now the uses of computers for other parts of the discovery and development process are coming to the fore. Theoretically, researchers could now test virtual drug compounds against virtual protein targets, study the virtual pharmacokinetics of their optimized virtual lead in what amounts to virtual animals, study its effects on virtual organs, design a virtual clinical trial to test assumptions and variances, and even answer some regulatory questions through simulation. Somewhere in that process, a chemist has to actually mix up a compound and conduct some experiments, but buckets of silicon are being added to the discovery and development process every day, with the hope that the wet lab will one day become as dry as a sand box.</p>
<p align="justify">Twentieth century biology has been about cataloging the elements of life. Every day, we have a little more of the recipe of life, stretching before us as an almost endless line of As, Gs, Cs, and Ts&#8211;forming general sequences common to most living organisms, gene sequences common to most humans, polymorphisms peculiar to small subpopulations. But this static information amounts to little more than a parts catalog, a shopping list for a living organism. A vital thrust of 21st century biology will be the animation of these static parts.</p>
<p align="justify">After all, a long string of base pair letters is like, well, a long string of letters. It makes for a less interesting read than a telephone directory, and while it tells you how to dial up all sorts of important proteins, most sequences alone tell you little more about a person than does their phone number. We cannot yet predict protein folding from amino acid sequence, nor can we accurately predict protein function from protein shape. We can, of course, correlate certain polymorphisms with likely disease outcomes, and we learn more every day. But the more we learn about the importance of these new variables, the more we have to take into consideration when developing clinical strategies, undertaking drug development, and designing clinical trials. And gene sequence, even when linked to functional information, will only be one of many variables to consider in optimally designing therapeutic interventions and treating disease.</p>
<p align="justify">One way of animating our growing store of static information is through computer simulation. It is an area that is beginning to emerge slowly in the life sciences, with only a handful of academic and commercial players active in the area. But for a fledgling discipline, there is great variety in the scope of work being undertaken. While academic labs try to create accurate simulations of red blood cells and simple bacteria, the private companies are taking on bolder projects&#8211;simulating human organs and even human diseases in their entirety.</p>
]]></content:encoded>
			<wfw:commentRss>http://bioinformatics.me/bioinformatics-virtual-drug-development/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bioinformatics and drug discovery</title>
		<link>http://bioinformatics.me/bioinformatics-and-drug-discovery/</link>
		<comments>http://bioinformatics.me/bioinformatics-and-drug-discovery/#comments</comments>
		<pubDate>Fri, 10 Jul 2009 23:29:06 +0000</pubDate>
		<dc:creator>Waleed Ghalwash</dc:creator>
				<category><![CDATA[Bioinformatics Web]]></category>

		<guid isPermaLink="false">http://bioinformatics.me/bioinformatics-and-drug-discovery/</guid>
		<description><![CDATA[
In recent years, we have seen an explosion in the amount of biological information that is available. Various databases are doubling in size every 15 months and we now have the complete genome sequences of more than 100 organisms. It appears that the ability to generate vast quantities of data has surpassed the ability to [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.bioinformaticsweb.org/wp-content/uploads/2009/07/drug_discovery_bionformatics-300x200.jpg" alt="drug_discovery_bionformatics" width="500" height="300" class="aligncenter size-medium wp-image-380" /></p>
<p>In recent years, we have seen an explosion in the amount of biological information that is available. Various databases are doubling in size every 15 months and we now have the complete genome sequences of more than 100 organisms. It appears that the ability to generate vast quantities of data has surpassed the ability to use this data meaningfully. The pharmaceutical industry has embraced genomics as a source of drug targets. It also recognises that the field of bioinformatics is crucial for validating these potential drug targets and for determining which ones are the most suitable for entering the drug development pipeline.</p>
<p align="justify">Recently, there has been a change in the way that medicines are being developed due to our increased understanding of molecular biology. In the past, new synthetic organic molecules were tested in animals or in whole organ preparations. This has been replaced with a molecular target approach in which in-vitro screening of compounds against purified, recombinant proteins or genetically modified cell lines is carried out with a high throughput. This change has come about as a consequence of better and ever improving knowledge of the molecular basis of disease.</p>
<p align="justify">All marketed drugs today target only about 500 gene products. The elucidation of the human genome, which has an estimated 30,000 to 40,000 genes, presents immense new opportunities for drug discovery and simultaneously creates a potential bottleneck regarding the choice of targets to support the drug discovery pipeline. The major advances in genomics and sequencing mean that finding an attractive target is no longer a problem; finding the targets that are most likely to succeed has become the challenge. The focus of bioinformatics in the drug discovery process has therefore shifted from target identification to target validation.</p>
<p align="justify">Many factors concerning a candidate target need to be taken into account, drawing on a multitude of heterogeneous resources. The types of information that one needs to gather about potential targets include nucleotide and protein sequence information, homologues, mapping information, function prediction, pathway information, disease associations, variants, structural information, gene and protein expression data and species/taxonomic distribution, among others. Different bioinformatics tools can be used to gather this information. The accumulation of this information into databases about potential targets means that pharmaceutical companies can save themselves much of the time, effort and expense that would otherwise go into bench work on targets that will ultimately fail. The information that is gathered helps to characterise the different targets into families and subfamilies. It also classifies the behaviour of the different molecules in a biochemical and cellular context. Decisions about which families provide the best potential targets are guided by a number of criteria. It is important that the potential target has a suitable structure for interacting with drug molecules. Structural genomics helps to prioritise the families in terms of their 3D structures.</p>
<p align="justify">Sometimes we want to develop broad spectrum drugs that are effective against a wide range of pathogenic species while at other times we want to develop narrow spectrum drugs that are highly specific to a particular organism. Comparative genomics helps to find protein families that are widely taxonomically dispersed and those that are unique to a particular organism.</p>
<p align="justify">For example, when we want to develop a broad spectrum antibiotic, we are looking for targets that are present in a large number of bacteria yet have no similar homologues in humans. This means that the antibiotic will be effective against many bacteria, killing them while causing no harm to the human host. In order to determine the role our potential drug target plays in a particular disease mechanism, we use DNA and protein chips. These chips can measure the amount of transcript or protein expressed by a cell at different times or in different states (healthy versus diseased).</p>
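<p align="justify">The broad spectrum selection rule described above (present in many bacteria, absent from the human host) can be expressed as a simple set filter. The Python sketch below is a minimal illustration on invented data: the family names, species labels, the 0.8 coverage cut-off and the function name are all assumptions, and real homology detection would come from sequence searches rather than a ready-made table.</p>

```python
def broad_spectrum_candidates(family_presence, pathogens, host="human", min_fraction=0.8):
    """Keep families found in most pathogens and absent from the host.

    family_presence maps a protein family name to the set of species carrying it.
    """
    candidates = []
    for family, species in sorted(family_presence.items()):
        if host in species:
            continue  # a human homologue exists, so the drug might harm the patient
        coverage = len(species.intersection(pathogens)) / len(pathogens)
        if coverage >= min_fraction:
            candidates.append((family, coverage))
    return candidates


# Invented presence table for three hypothetical protein families.
pathogens = {"bacterium_1", "bacterium_2", "bacterium_3", "bacterium_4"}
family_presence = {
    "family_A": {"bacterium_1", "bacterium_2", "bacterium_3", "bacterium_4"},
    "family_B": {"bacterium_1", "bacterium_2", "bacterium_3", "bacterium_4", "human"},
    "family_C": {"bacterium_2"},
}
print(broad_spectrum_candidates(family_presence, pathogens))  # [('family_A', 1.0)]
```

<p align="justify">Lowering the coverage cut-off and requiring a single species instead would turn the same filter into a search for narrow spectrum targets.</p>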
<p align="justify">Clustering algorithms are used to organise this expression data into different biologically relevant clusters. We can then compare the expression profiles from the diseased and healthy cells to help us understand the role our gene or protein plays in a disease process. All of these computational tools can help to compose a detailed picture about a protein family, its involvement in a disease process and its potential as a possible drug target.</p>
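<p align="justify">As a rough illustration of how expression profiles can be grouped, the Python sketch below uses a very simple "leader" clustering based on Pearson correlation. The gene names, expression values and the 0.9 threshold are invented for the example, and real analyses typically rely on hierarchical or k-means clustering from a statistics package rather than this toy.</p>

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no shape information
    return cov / sqrt(var_x * var_y)


def leader_cluster(profiles, threshold=0.9):
    """Put each gene in the first cluster whose leader it correlates with."""
    clusters = []  # each cluster is a list of gene names; the first is the leader
    for gene, profile in profiles.items():
        for members in clusters:
            leader = members[0]
            if pearson(profiles[leader], profile) >= threshold:
                members.append(gene)
                break
        else:
            clusters.append([gene])
    return clusters


# Invented expression levels measured at four time points.
profiles = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [2.0, 4.0, 6.0, 8.5],
    "gene_c": [4.0, 3.0, 2.0, 1.0],
    "gene_d": [8.0, 6.0, 4.5, 2.0],
}
print(leader_cluster(profiles))  # [['gene_a', 'gene_b'], ['gene_c', 'gene_d']]
```

<p align="justify">Genes whose expression rises together end up in one cluster and genes whose expression falls together in another, which is the kind of grouping that is then compared between healthy and diseased cells.</p>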
<p align="justify">Following on from the genomics explosion and the huge increase in the number of potential drug targets, there has been a move from the classical linear approach of drug discovery to a non-linear and high throughput approach. The field of bioinformatics has become a major part of the drug discovery pipeline, playing a key role in validating drug targets. By integrating data from many inter-related yet heterogeneous resources, bioinformatics can help in our understanding of complex biological processes and help improve drug discovery.</p>
<p align="justify"><strong><em>Source: </em></strong><em> 2can </em></p>
<p align="justify"><strong>Drug Design based on Bioinformatics Tools </strong></p>
<p align="justify">The process of designing a new drug using bioinformatics tools has opened a new area of research. Computational techniques assist in searching for drug targets and in designing drugs in silico, but the process still takes a long time and a lot of money. To design a new drug, one needs to follow the path below.</p>
<div>
<ul>
<li><strong>Identify Target Disease: </strong> One needs to know all about the disease and existing or traditional remedies. It is also important to look at very similar afflictions and their known treatments.<br />
Target identification alone is not sufficient to achieve a successful treatment of a disease. A real drug needs to be developed. This drug must influence the target protein in such a way that it does not interfere with normal metabolism. One way to achieve this is to block the activity of the protein with a small molecule. Bioinformatics methods have been developed to virtually screen the target for compounds that bind and inhibit the protein. Another possibility is to find other proteins that regulate the activity of the target by binding and forming a complex.</li>
<li><strong>Study Interesting Compounds: </strong> One needs to identify and study the lead compounds that have some activity against a disease. These may be only marginally useful and may have severe side effects. These compounds provide a starting point for refinement of the chemical structures.</li>
<li><strong>Detect the Molecular Bases for Disease: </strong>If it is known that a drug must bind to a particular spot on a particular protein or nucleotide then a drug can be tailor made to bind at that site. This is often modeled computationally using any of several different techniques. Traditionally, the primary way of determining what compounds would be tested computationally was provided by the researchers&#8217; understanding of molecular interactions. A second method is the brute force testing of large numbers of compounds from a database of available structures.</li>
<li><strong>Rational drug design techniques: </strong> These techniques attempt to reproduce the researchers&#8217; understanding of how to choose likely compounds, built into a software package that is capable of modeling a very large number of compounds in an automated way. Many different algorithms have been used for this type of testing, many of which were adapted from artificial intelligence applications. The complexity of biological systems makes it very difficult to determine the structures of large biomolecules. Ideally, an experimentally determined (X-ray or NMR) structure is desired, but biomolecules are very difficult to crystallize.</li>
<li><strong>Refinement of compounds: </strong> Once a number of lead compounds have been found, computational and laboratory techniques have been very successful in refining the molecular structures to give a greater drug activity and fewer side effects. This is done both in the laboratory and computationally by examining the molecular structures to determine which aspects are responsible for both the drug activity and the side effects.</li>
<li><strong>Quantitative Structure Activity Relationships (QSAR): </strong>This computational technique is used to identify which functional groups in a compound matter, so that the drug can be refined. QSAR consists of computing every possible number that can describe a molecule and then doing an enormous curve fit to find out which aspects of the molecule correlate well with the drug activity or side effect severity. This information can then be used to suggest new chemical modifications for synthesis and testing.</li>
<li><strong>Solubility of Molecule: </strong> One needs to check whether the target molecule is water soluble or readily soluble in fatty tissue, because this affects which part of the body it becomes concentrated in. The ability to get a drug to the correct part of the body is an important factor in its potency. Ideally there is a continual exchange of information between the researchers doing QSAR studies, synthesis and testing. These techniques are frequently used and often very successful, since they do not rely on knowing the biological basis of the disease, which can be very difficult to determine.</li>
<li><strong>Drug Testing: </strong>Once a drug has been shown to be effective by an initial assay technique, much more testing must be done before it can be given to human patients. Animal testing is the primary type of testing at this stage. Eventually, the compounds that are deemed suitable at this stage are sent on to clinical trials. In the clinical trials, additional side effects may be found and human dosages are determined.<br />
<strong><em>Source: </em></strong><em>By Dr. G. P. S. Raghava, Institute of Microbial Technology, Sector 39-A, Chandigarh, India.</em></li>
</ul>
</div>
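<p align="justify">The curve-fitting idea behind QSAR, mentioned in the list above, can be illustrated with a deliberately tiny Python sketch: fit a straight line of activity against each descriptor and rank the descriptors by how well they explain the activity. The descriptor names, the numbers and the function names are all invented for the example, and real QSAR models use many descriptors at once with proper validation.</p>

```python
def fit_line(x, y):
    """Least-squares line y = slope * x + intercept, plus the R-squared of the fit.

    Assumes both x and y vary; a constant column would divide by zero.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def rank_descriptors(descriptors, activity):
    """Sort descriptor names by how much of the activity variation they explain."""
    scored = []
    for name, values in descriptors.items():
        r_squared = fit_line(values, activity)[2]
        scored.append((name, r_squared))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Invented data for four hypothetical compounds.
descriptors = {
    "descriptor_lipophilicity": [1.0, 2.0, 3.0, 4.0],
    "descriptor_weight": [300.0, 250.0, 320.0, 280.0],
}
activity = [2.1, 3.9, 6.2, 7.8]

for name, r_squared in rank_descriptors(descriptors, activity):
    print(name, round(r_squared, 3))
```

<p align="justify">In this made-up data set the first descriptor tracks the activity almost perfectly while the second barely correlates with it, so a chemist would focus modifications on the property the first descriptor measures.</p>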
]]></content:encoded>
			<wfw:commentRss>http://bioinformatics.me/bioinformatics-and-drug-discovery/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bioinformatics- Sequence analysis</title>
		<link>http://bioinformatics.me/bioinformatics-sequence-analysis/</link>
		<comments>http://bioinformatics.me/bioinformatics-sequence-analysis/#comments</comments>
		<pubDate>Fri, 10 Jul 2009 23:19:32 +0000</pubDate>
		<dc:creator>Waleed Ghalwash</dc:creator>
				<category><![CDATA[Bioinformatics Web]]></category>

		<guid isPermaLink="false">http://bioinformatics.me/bioinformatics-sequence-analysis/</guid>
		<description><![CDATA[
Sequence analysis is the application of Information Technologies to Molecular Biology. It deals with biological sequences, and processes them to extract significant information that may yield new insights and guidelines in the understanding of biological organisms
Basics for sequence analysis 
Proteins
A protein is typically built of a series of basic blocks called amino acids , chained [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.bioinformaticsweb.org/wp-content/uploads/2009/07/sequence_analysis_bioinformatics.gif" alt="sequence_analysis_bioinformatics" width="472" height="292" class="aligncenter size-full wp-image-377" /></p>
<p align="justify">Sequence analysis is the application of Information Technologies to Molecular Biology. It deals with biological sequences, and processes them to extract significant information that may yield new insights and guidelines in the understanding of biological organisms.</p>
<p align="justify"><strong>Basics for sequence analysis </strong></p>
<h3>Proteins</h3>
<p>A protein is typically built of a series of basic blocks called amino acids, chained together in a linear sequence. Amino acids come in a variety of shapes and properties: they may be small or bulky, hydrophobic or hydrophilic, electrically charged or neutral, and so on, which allows very complex shapes and interactions to be produced.</p>
<p align="justify">Amino acids are commonly referred to by name or by an abbreviation, usually of three letters or one letter. This allows for more efficient descriptions of how they are chained together to build a protein:</p>
<div>
<table border="1" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td>
<p align="center">Neutral-Nonpolar</p>
</td>
<td>
<p align="center">3-letter</p>
</td>
<td>
<p align="center">1-letter</p>
</td>
</tr>
<tr>
<td>
<p align="center">Glycine</p>
</td>
<td>
<p align="center">Gly</p>
</td>
<td>
<p align="center">G</p>
</td>
</tr>
<tr>
<td>
<p align="center">L-Alanine</p>
</td>
<td>
<p align="center">Ala</p>
</td>
<td>
<p align="center">A</p>
</td>
</tr>
<tr>
<td>
<p align="center">L-Valine</p>
</td>
<td>
<p align="center">Val</p>
</td>
<td>
<p align="center">V</p>
</td>
</tr>
<tr>
<td>
<p align="center">L-Isoleucine</p>
</td>
<td>
<p align="center">Ile</p>
</td>
<td>
<p align="center">I</p>
</td>
</tr>
<tr>
<td>
<p align="center">L-Leucine</p>
</td>
<td>
<p align="center">Leu</p>
</td>
<td>
<p align="center">L</p>
</td>
</tr>
<tr>
<td>
<p align="center">L-Phenylalanine</p>
</td>
<td>
<p align="center">Phe</p>
</td>
<td>
<p align="center">F</p>
</td>
</tr>
<tr>
<td>
<p align="center">L-Proline</p>
</td>
<td>
<p align="center">Pro</p>
</td>
<td>
<p align="center">P</p>
</td>
</tr>
<tr>
<td>
<p align="center">L-Methionine</p>
</td>
<td>
<p align="center">Met</p>
</td>
<td>
<p align="center">M</p>
</td>
</tr>
<tr>
<td>
<p align="center">Neutral-Polar</p>
</td>
<td>
<p align="center">
</td>
<td>
<p align="center">
</td>
</tr>
<tr>
<td>
<p align="center">L-Serine</p>
</td>
<td>
<p align="center">Ser</p>
</td>
<td>
<p align="center">S</p>
</td>
</tr>
<tr>
<td>
<p align="center">L-Threonine</p>
</td>
<td>
<p align="center">Thr</p>
</td>
<td>
<p align="center">T</p>
</td>
</tr>
<tr>
<td>
<p align="center">L-Tyrosine</p>
</td>
<td>
<p align="center">Tyr</p>
</td>
<td>
<p align="center">Y</p>
</td>
</tr>
<tr>
<td>
<p align="center">L-Tryptophan</p>
</td>
<td>
<p align="center">Trp</p>
</td>
<td>
<p align="center">W</p>
</td>
</tr>
<tr>
<td>
<p align="center">L-Asparagine</p>
</td>
<td>
<p align="center">Asn</p>
</td>
<td>
<p align="center">N</p>
</td>
</tr>
<tr>
<td>
<p align="center">L-Glutamine</p>
</td>
<td>
<p align="center">Gln</p>
</td>
<td>
<p align="center">Q</p>
</td>
</tr>
<tr>
<td>
<p align="center">L-Cysteine</p>
</td>
<td>
<p align="center">Cys</p>
</td>
<td>
<p align="center">C</p>
</td>
</tr>
<tr>
<td>
<p align="center">Acidic</p>
</td>
<td>
<p align="center">
</td>
<td>
<p align="center">
</td>
</tr>
<tr>
<td>
<p align="center">L-Aspartic acid</p>
</td>
<td>
<p align="center">Asp</p>
</td>
<td>
<p align="center">D</p>
</td>
</tr>
<tr>
<td>
<p align="center">L-Glutamic acid</p>
</td>
<td>
<p align="center">Glu</p>
</td>
<td>
<p align="center">E</p>
</td>
</tr>
<tr>
<td>
<p align="center">Basic</p>
</td>
<td>
<p align="center">
</td>
<td>
<p align="center">
</td>
</tr>
<tr>
<td>
<p align="center">L-Lysine</p>
</td>
<td>
<p align="center">Lys</p>
</td>
<td>
<p align="center">K</p>
</td>
</tr>
<tr>
<td>
<p align="center">L-Arginine</p>
</td>
<td>
<p align="center">Arg</p>
</td>
<td>
<p align="center">R</p>
</td>
</tr>
<tr>
<td>
<p align="center">L-Histidine</p>
</td>
<td>
<p align="center">His</p>
</td>
<td>
<p align="center">H</p>
</td>
</tr>
</tbody>
</table>
</div>
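<p align="justify">The table above translates directly into a lookup. The Python sketch below converts a chain written with three-letter codes into the compact one-letter form and back; the hyphen separator and the function names are choices made for this example, not a standard file format.</p>

```python
# Three-letter to one-letter codes, copied from the table above.
THREE_TO_ONE = {
    "Gly": "G", "Ala": "A", "Val": "V", "Ile": "I", "Leu": "L",
    "Phe": "F", "Pro": "P", "Met": "M", "Ser": "S", "Thr": "T",
    "Tyr": "Y", "Trp": "W", "Asn": "N", "Gln": "Q", "Cys": "C",
    "Asp": "D", "Glu": "E", "Lys": "K", "Arg": "R", "His": "H",
}
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(chain, separator="-"):
    """Turn 'Met-Gly-Ala' into 'MGA'. Unknown codes raise KeyError."""
    return "".join(THREE_TO_ONE[code] for code in chain.split(separator))


def to_three_letter(sequence, separator="-"):
    """Turn 'MGA' into 'Met-Gly-Ala'. Unknown letters raise KeyError."""
    return separator.join(ONE_TO_THREE[letter] for letter in sequence)


print(to_one_letter("Met-Gly-Ala-Trp"))  # MGAW
print(to_three_letter("MKV"))            # Met-Lys-Val
```

<p align="justify">The one-letter form is what most sequence databases and alignment tools use, because it keeps long proteins readable as a single string.</p>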
<h3>Nucleic Acids</h3>
<p align="justify">For nucleic acids, the number of basic building blocks is much smaller: each nucleic acid chain is composed of a series of only four possible nucleotides, which furthermore provide for a very limited set of interactions.</p>
<p align="justify">Nucleic acids come in two flavors: DNA (DeoxyriboNucleic Acid) and RNA (RiboNucleic Acid). Both of them consist of a series of nucleotides that are glued one after the other to constitute the sequence of blocks that make up the functional chain.</p>
<p align="justify">Nucleotides are composed of a phosphate group, a sugar (ribose in RNA, and deoxyribose in DNA) and a base, which marks the specific difference among nucleotides. The base may be one of guanine, cytosine, adenine and thymine in the case of DNA, or guanine, cytosine, adenine and uracil for RNA. They can be referred to by their one-letter abbreviations G, C, A, T and U. Interactions are mainly driven by the establishment of hydrogen bonds, which can only be established between thymine (or uracil) and adenine (two hydrogen bonds) and between cytosine and guanine (three hydrogen bonds).</p>
<p align="justify">As we said previously, the main role of nucleic acids is to convey all the genetic information needed to make proteins and control the building process. Protein sequences are coded by nucleic acids using groups of three nucleotides (codons), each of which codes for a given amino acid. The code is nearly universal, with few exceptions, and includes redundancy to increase the fidelity of the reading process when making duplicates or translating the information:</p>
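<p align="justify">These pairing rules are easy to express in code. The Python sketch below builds the complementary DNA strand, counts the hydrogen bonds holding a double-stranded stretch together, and rewrites a DNA sequence in the RNA alphabet; the function names are illustrative and the input is assumed to contain only the upper-case letters A, C, G and T.</p>

```python
# Pairing partners and hydrogen bonds per base pair, as described in the text.
DNA_PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G"}
HYDROGEN_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def complement(dna):
    """Base-by-base pairing partner of a DNA strand."""
    return "".join(DNA_PARTNER[base] for base in dna)


def reverse_complement(dna):
    """The opposite strand, read in its own conventional direction."""
    return complement(dna)[::-1]


def hydrogen_bonds(dna):
    """Total hydrogen bonds between a strand and its complement."""
    return sum(HYDROGEN_BONDS[base] for base in dna)


def to_rna_alphabet(dna):
    """Replace thymine with uracil, the base RNA uses in its place."""
    return dna.replace("T", "U")


print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
print(hydrogen_bonds("ATGC"))      # 10
print(to_rna_alphabet("ATGC"))     # AUGC
```

<p align="justify">Because G-C pairs carry three bonds and A-T pairs only two, stretches rich in G and C are held together more tightly than stretches rich in A and T.</p>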
<div>
<table border="1" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td>
<p align="center">UUU</p>
</td>
<td>
<p align="center">Phe</p>
</td>
<td>
<p align="center">UCU</p>
</td>
<td>
<p align="center">Ser</p>
</td>
<td>
<p align="center">UAU</p>
</td>
<td>
<p align="center">Tyr</p>
</td>
<td>
<p align="center">UGU</p>
</td>
<td>
<p align="center">Cys</p>
</td>
</tr>
<tr>
<td>
<p align="center">UUC</p>
</td>
<td>
<p align="center">Phe</p>
</td>
<td>
<p align="center">UCC</p>
</td>
<td>
<p align="center">Ser</p>
</td>
<td>
<p align="center">UAC</p>
</td>
<td>
<p align="center">Tyr</p>
</td>
<td>
<p align="center">UGC</p>
</td>
<td>
<p align="center">Cys</p>
</td>
</tr>
<tr>
<td>
<p align="center">UUA</p>
</td>
<td>
<p align="center">Leu</p>
</td>
<td>
<p align="center">UCA</p>
</td>
<td>
<p align="center">Ser</p>
</td>
<td>
<p align="center">UAA</p>
</td>
<td>
<p align="center">Stop</p>
</td>
<td>
<p align="center">UGA</p>
</td>
<td>
<p align="center">Stop</p>
</td>
</tr>
<tr>
<td>
<p align="center">UUG</p>
</td>
<td>
<p align="center">Leu</p>
</td>
<td>
<p align="center">UCG</p>
</td>
<td>
<p align="center">Ser</p>
</td>
<td>
<p align="center">UAG</p>
</td>
<td>
<p align="center">Stop</p>
</td>
<td>
<p align="center">UGG</p>
</td>
<td>
<p align="center">Trp</p>
</td>
</tr>
<tr>
<td>
<p align="center">CUU</p>
</td>
<td>
<p align="center">Leu</p>
</td>
<td>
<p align="center">CCU</p>
</td>
<td>
<p align="center">Pro</p>
</td>
<td>
<p align="center">CAU</p>
</td>
<td>
<p align="center">His</p>
</td>
<td>
<p align="center">CGU</p>
</td>
<td>
<p align="center">Arg</p>
</td>
</tr>
<tr>
<td>
<p align="center">CUC</p>
</td>
<td>
<p align="center">Leu</p>
</td>
<td>
<p align="center">CCC</p>
</td>
<td>
<p align="center">Pro</p>
</td>
<td>
<p align="center">CAC</p>
</td>
<td>
<p align="center">His</p>
</td>
<td>
<p align="center">CGC</p>
</td>
<td>
<p align="center">Arg</p>
</td>
</tr>
<tr>
<td>
<p align="center">CUA</p>
</td>
<td>
<p align="center">Leu</p>
</td>
<td>
<p align="center">CCA</p>
</td>
<td>
<p align="center">Pro</p>
</td>
<td>
<p align="center">CAA</p>
</td>
<td>
<p align="center">Gln</p>
</td>
<td>
<p align="center">CGA</p>
</td>
<td>
<p align="center">Arg</p>
</td>
</tr>
<tr>
<td>
<p align="center">CUG</p>
</td>
<td>
<p align="center">Leu</p>
</td>
<td>
<p align="center">CCG</p>
</td>
<td>
<p align="center">Pro</p>
</td>
<td>
<p align="center">CAG</p>
</td>
<td>
<p align="center">Gln</p>
</td>
<td>
<p align="center">CGG</p>
</td>
<td>
<p align="center">Arg</p>
</td>
</tr>
<tr>
<td>
<p align="center">AUU</p>
</td>
<td>
<p align="center">Ile</p>
</td>
<td>
<p align="center">ACU</p>
</td>
<td>
<p align="center">Thr</p>
</td>
<td>
<p align="center">AAU</p>
</td>
<td>
<p align="center">Asn</p>
</td>
<td>
<p align="center">AGU</p>
</td>
<td>
<p align="center">Ser</p>
</td>
</tr>
<tr>
<td>
<p align="center">AUC</p>
</td>
<td>
<p align="center">Ile</p>
</td>
<td>
<p align="center">ACC</p>
</td>
<td>
<p align="center">Thr</p>
</td>
<td>
<p align="center">AAC</p>
</td>
<td>
<p align="center">Asn</p>
</td>
<td>
<p align="center">AGC</p>
</td>
<td>
<p align="center">Ser</p>
</td>
</tr>
<tr>
<td>
<p align="center">AUA</p>
</td>
<td>
<p align="center">Ile</p>
</td>
<td>
<p align="center">ACA</p>
</td>
<td>
<p align="center">Thr</p>
</td>
<td>
<p align="center">AAA</p>
</td>
<td>
<p align="center">Lys</p>
</td>
<td>
<p align="center">AGA</p>
</td>
<td>
<p align="center">Arg</p>
</td>
</tr>
<tr>
<td>
<p align="center">AUG</p>
</td>
<td>
<p align="center">Met</p>
</td>
<td>
<p align="center">ACG</p>
</td>
<td>
<p align="center">Thr</p>
</td>
<td>
<p align="center">AAG</p>
</td>
<td>
<p align="center">Lys</p>
</td>
<td>
<p align="center">AGG</p>
</td>
<td>
<p align="center">Arg</p>
</td>
</tr>
<tr>
<td>
<p align="center">GUU</p>
</td>
<td>
<p align="center">Val</p>
</td>
<td>
<p align="center">GCU</p>
</td>
<td>
<p align="center">Ala</p>
</td>
<td>
<p align="center">GAU</p>
</td>
<td>
<p align="center">Asp</p>
</td>
<td>
<p align="center">GGU</p>
</td>
<td>
<p align="center">Gly</p>
</td>
</tr>
<tr>
<td>
<p align="center">GUC</p>
</td>
<td>
<p align="center">Val</p>
</td>
<td>
<p align="center">GCC</p>
</td>
<td>
<p align="center">Ala</p>
</td>
<td>
<p align="center">GAC</p>
</td>
<td>
<p align="center">Asp</p>
</td>
<td>
<p align="center">GGC</p>
</td>
<td>
<p align="center">Gly</p>
</td>
</tr>
<tr>
<td>
<p align="center">GUA</p>
</td>
<td>
<p align="center">Val</p>
</td>
<td>
<p align="center">GCA</p>
</td>
<td>
<p align="center">Ala</p>
</td>
<td>
<p align="center">GAA</p>
</td>
<td>
<p align="center">Glu</p>
</td>
<td>
<p align="center">GGA</p>
</td>
<td>
<p align="center">Gly</p>
</td>
</tr>
<tr>
<td>
<p align="center">GUG</p>
</td>
<td>
<p align="center">Val*</p>
</td>
<td>
<p align="center">GCG</p>
</td>
<td>
<p align="center">Ala</p>
</td>
<td>
<p align="center">GAG</p>
</td>
<td>
<p align="center">Glu</p>
</td>
<td>
<p align="center">GGG</p>
</td>
<td>
<p align="center">Gly</p>
</td>
</tr>
</tbody>
</table>
<table border="0" cellpadding="0">
<tbody>
<tr>
<td>* GUG may also code for the initiator Met. This triplet is therefore &#8220;ambiguous&#8221;.</td>
</tr>
</tbody>
</table>
</div>
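<p align="justify">As a rough sketch, the table above can be turned into a lookup and used to translate an mRNA reading frame. The code below uses one-letter amino acid codes instead of the three-letter codes in the table, and treats GUG as Val only; both choices, along with the names <code>CODON_TABLE</code> and <code>translate</code>, are assumptions made for illustration.</p>

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acids listed in UCAG order for the first, second and
# third base of each codon; "*" stands for Stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna):
    """Translate an mRNA reading frame, stopping at the first Stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("AUGUUUGGCUAA"))
```

<p align="justify">The redundancy mentioned in the text shows up directly in the table: 64 codons map onto 20 amino acids plus Stop, so most amino acids have several codons.</p>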
<p align="justify">Regulation of expression is encoded as specific patterns that are to be recognized by the translation machinery under appropriate circumstances.</p>
<p align="justify"><strong>Sequence databases:</strong></p>
<p align="justify">For an overview of sequence databases &#8211; <a href="http://www.geocities.com/bioinformaticsweb/data.html">click here </a></p>
<p align="justify">For a complete listing of sequence databases &#8211; <a href="http://www.geocities.com/bioinformaticsweb/datalink.html">click here </a></p>
<p align="justify"><strong>Overview of sequence analysis tools</strong></p>
<p align="justify"><a>Sequence Comparison </a></p>
<p align="justify">An alignment is an arrangement of two sequences that shows where they are similar and where they differ. An optimal alignment is one that exhibits the most similarities and the fewest differences. Broadly, there are three categories of methods for sequence comparison.</p>
<p align="justify">•  <strong>Segment methods </strong> compare all overlapping segments of a predetermined length (e.g., 10 amino acids) from one sequence to all segments from the other. This is the approach used in dotplots.</p>
<p align="justify">•  <strong>Optimal global alignment </strong> methods allow the best overall score for the comparison of the two sequences to be obtained, including a consideration of gaps. These programs align sequences over their whole length.</p>
<p align="justify">•  <strong>Optimal local alignment </strong> algorithms seek to identify the best local similarities between two sequences also including explicit consideration of gaps. Alignment may only be over a short span of sequence.</p>
<h3><a>Dotplots </a></h3>
<p align="justify">The most intuitive representation of the comparison between two sequences is using dotplots. One sequence is represented on each axis and significant matching regions are distributed along diagonals in the matrix.</p>
<p align="justify">Two algorithms are commonly used to create dotplots. The first matches identical regions of sequence and plots a dot wherever they occur. The second uses &#8220;sliding windows&#8221; to compare the two sequences against a threshold score <a href="http://portal.rfcgr.mrc.ac.uk/Courses/Jemboss_3day/Footnotes/Chapt5Fd0e1627.html">*</a>. A window size is chosen as a run of adjacent nucleotide or amino acid residues, and a threshold score is chosen to reflect the degree of similarity required. Each window of sequence A is compared with each window of sequence B, and a dot is placed at that position only if the match score equals or exceeds the threshold.</p>
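<p align="justify">The sliding-window method can be sketched as follows. This is a minimal version that scores a window by counting identical residues; real tools typically score with a substitution matrix. The function names and the default <code>window</code> and <code>threshold</code> values are assumptions chosen for illustration.</p>

```python
def dotplot(seq_a, seq_b, window=3, threshold=3):
    """Return the set of (i, j) positions where the windows starting at
    seq_a[i] and seq_b[j] share at least `threshold` identical residues."""
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                a == b for a, b in zip(seq_a[i:i + window], seq_b[j:j + window])
            )
            if matches >= threshold:
                dots.add((i, j))
    return dots


def render(seq_a, seq_b, dots):
    """Draw the plot as text: seq_a across the top, seq_b down the side."""
    lines = ["  " + seq_a]
    for j, residue in enumerate(seq_b):
        row = "".join("*" if (i, j) in dots else "." for i in range(len(seq_a)))
        lines.append(residue + " " + row)
    return "\n".join(lines)


dots = dotplot("GATTACA", "GATTACA")
print(render("GATTACA", "GATTACA", dots))
```

<p align="justify">Comparing a sequence with itself puts dots along the main diagonal. Lowering the threshold relative to the window size admits weaker matches and adds off-diagonal noise.</p>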
<p align="justify"><strong>Online tool links: </strong></p>
<p align="justify"><a href="http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html" target="_blank">Dotlet Programme </a></p>
<p align="justify"><a href="http://www.isrec.isb-sib.ch/java/dotlet/dotlet_examples.html" target="_blank">Learn dotlet by example</a></p>
<p align="justify"><a><strong>Sequence alignment </strong></a></p>
<p align="justify">The algorithms we will be using are more rigorous than those used for searching databases, so even if you have retrieved a sequence from a database using something like <strong>BLAST</strong>, it is worth realigning it with these programs. The basic idea behind sequence alignment programs is to align the two sequences so as to produce the highest score: a scoring matrix adds points for each match and subtracts them for each mismatch. The matrices commonly used for scoring protein alignments are more complex than the simple match/mismatch matrices used for DNA sequences; the scores in the protein matrices are designed to reflect similarity between the different amino acids rather than simply scoring identities. Over time various mutations occur in sequences. The scoring matrices cope with substitutions, but insertions and deletions require extra parameters to allow gaps to be introduced into the alignment. There are penalties both for creating a gap and for extending an existing one. The default gap parameters in alignment programs have been found empirically to work well with test sequences, but you should experiment with different gap penalties.</p>
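<p align="justify">To make the scoring scheme concrete, here is a small sketch that scores an alignment that has already been written out, using a simple match/mismatch scheme and separate penalties for opening and extending a gap. The function name and all numeric values are illustrative assumptions, not the defaults of any particular program; for proteins, a substitution matrix lookup would replace the match/mismatch test.</p>

```python
def score_alignment(row_a, row_b, match=1, mismatch=-1, gap_open=-5, gap_extend=-1):
    """Score two equal-length aligned rows in which '-' marks a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    score = 0
    gap_in = None  # which row the currently open gap is in, if any
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            where = "a" if a == "-" else "b"
            # Opening a gap costs more than extending one that is already open.
            score += gap_extend if gap_in == where else gap_open
            gap_in = where
        else:
            score += match if a == b else mismatch
            gap_in = None
    return score


print(score_alignment("GATTACA", "GA--ACA"))
print(score_alignment("GATTACA", "GAT-ACA"))
```

<p align="justify">Changing <code>gap_open</code> and <code>gap_extend</code> shifts the balance between one long gap and several short ones, which is why it is worth experimenting with these penalties.</p>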
<h3><a>BLAST </a></h3>
<p align="justify"><strong>BLAST </strong> (Basic Local Alignment Search Tool) is a heuristic method to find the highest scoring locally optimal alignments between a query sequence and a database. Previous versions of <strong>BLAST </strong> did not allow gapped alignments, but <strong>BLAST2 </strong> (from the HGMP-RC telnet and www menus) does. A gapped <strong>BLAST </strong> search allows gaps (deletions and insertions) to be introduced into the alignments that are returned. Allowing gaps means that similar regions are not broken into several segments. The scoring of these gapped alignments tends to reflect biological relationships more closely.</p>
<p align="justify">The <strong>BLAST </strong> algorithm and family of programs rely on work on the statistics of local sequence alignments by Altschul et al. The statistics allow us to estimate the probability of obtaining an alignment with a particular score. The <strong>BLAST </strong> algorithm permits nearly all sequence matches above a cutoff <a href="http://portal.rfcgr.mrc.ac.uk/Courses/Jemboss_3day/Footnotes/Chapt6Fd0e2779.html">*</a> to be located efficiently in a database.</p>
<p align="justify">The algorithm operates as follows:</p>
<p align="justify">•  <strong>BLAST </strong> scans the database for words (typically 3-mers for proteins) that score at least T (a designated threshold value) when aligned with a word in the query sequence &#8211; such aligned pairs are called hits.</p>
<p align="justify">•  If a second non-overlapping hit is found within a distance A of the first and on the same diagonal, the first hit is extended between the database and query sequences in both directions. Extension continues, scoring all the time, until the running score drops below the maximum score seen so far by a value X. The resulting local alignment is called an HSP (high-scoring segment pair) or MSP (maximum scoring segment pair).</p>
<p align="justify">•  If the alignment score of the HSP exceeds a given value Sg (the gapped score), then a gapped extension of the HSP is initiated.</p>
<p align="justify">Earlier versions of <strong>BLAST </strong> looked only for single hits and extended them all; however, the extensions did not incorporate gaps and thus missed some potentially interesting matches. The gapped extension currently used takes much longer to execute, but speed is improved overall by requiring two non-overlapping close hits before the initial extension is triggered, and the value of Sg is chosen so that only about one gapped extension is triggered per 50 database sequences.</p>
<p align="justify">These modifications to <strong>BLAST </strong> mean that it now runs three times faster than earlier versions and in trials it found more statistically significant alignments than the old <strong>BLAST </strong>.</p>
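<p align="justify">The seeding stage described above can be sketched roughly as follows. This is a simplification, not the BLAST implementation: it accepts only identical words (BLAST also accepts similar words scoring at least T for proteins), it measures the distance A between hit start positions in the query, and it stops before the extension step. All names and default values are hypothetical.</p>

```python
from collections import defaultdict


def word_hits(query, subject, word_size=3):
    """Yield (query_pos, subject_pos) for every identical shared word."""
    index = defaultdict(list)
    for i in range(len(query) - word_size + 1):
        index[query[i:i + word_size]].append(i)
    for j in range(len(subject) - word_size + 1):
        for i in index.get(subject[j:j + word_size], []):
            yield i, j


def two_hit_seeds(query, subject, word_size=3, max_distance=10):
    """Return (diagonal, first_hit, second_hit) for each pair of
    non-overlapping hits on the same diagonal within max_distance."""
    by_diagonal = defaultdict(list)
    for i, j in word_hits(query, subject, word_size):
        by_diagonal[j - i].append(i)
    seeds = []
    for diagonal, positions in sorted(by_diagonal.items()):
        last = None
        for pos in sorted(positions):
            if last is not None:
                if pos - last < word_size:
                    continue  # overlaps the most recent hit, so ignore it
                if pos - last <= max_distance:
                    seeds.append((diagonal, last, pos))
            last = pos
    return seeds


print(two_hit_seeds("MKVLAAGIW", "TTMKVQQQGIW"))
```

<p align="justify">Requiring two hits on one diagonal discards most isolated word matches before any extension is attempted, which is where the overall speed gain described above comes from.</p>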
<h3><em>BLAST FAMILY OF PROGRAMS </em></h3>
<p align="justify">The <strong>BLAST </strong> family of programs allows all combinations of DNA or protein query sequences with searches against DNA or protein databases. (Most of the time use of these is behind an interface.)</p>
<p align="justify">•  <strong>blastp: </strong> compares an amino acid query sequence against a protein sequence database.</p>
<p align="justify">•  <strong>blastn: </strong> compares a nucleotide query sequence against a nucleotide sequence database.</p>
<p align="justify">•  <strong>blastx: </strong> compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence database.</p>
<p align="justify">•  <strong>tblastn: </strong> compares a protein query sequence against a nucleotide sequence database dynamically translated in all six reading frames (both strands).</p>
<p align="justify">•  <strong>tblastx: </strong> compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.</p>
<p align="justify">•  <strong>PSI-Blast: </strong> Position-Specific Iterated <strong>BLAST </strong>. This is potentially a very sensitive method for pulling out significant hits in a protein-protein database search. It first performs a gapped BLAST database search and then uses the information from any significant alignments returned to construct a position-specific score matrix, which replaces the query sequence for the next round of database searching. <strong>PSI-Blast </strong> may be iterated until no new significant alignments are found.</p>
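<p align="justify">The six-frame conceptual translation used by blastx, tblastn and tblastx can be illustrated with a short sketch. It assumes the standard genetic code written with T instead of U, translates straight through Stop codons (shown as <code>*</code>), and labels frames +1 to +3 and -1 to -3 by offset from the start of each strand; frame-naming conventions vary between tools, so treat the labels and function names as assumptions.</p>

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate_frame(dna):
    """Translate every complete codon; '*' marks a Stop codon."""
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna):
    """Return {frame label: protein} for three frames on each strand."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames["+%d" % (offset + 1)] = translate_frame(dna[offset:])
        frames["-%d" % (offset + 1)] = translate_frame(reverse[offset:])
    return frames


for label, protein in sorted(six_frame_translation("ATGGCCTAA").items()):
    print(label, protein)
```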
<p align="justify"><strong>Online tool links: </strong></p>
<p align="justify"><a href="http://www.ncbi.nlm.nih.gov/BLAST/" target="_blank">NCBI &#8211; Blast </a></p>
<p align="justify"><a href="http://www.ebi.ac.uk/blast2/index.html" target="_blank">WU &#8211; Blast@ EBI </a></p>
<p align="justify"><a><strong>Global sequence alignment </strong></a></p>
<p align="justify">A global alignment is one that compares the two sequences over their entire lengths, and is appropriate for comparing sequences that are expected to share similarity over the whole length. The alignment maximises regions of similarity and minimises gaps using the scoring matrices and gap parameters provided to the program.</p>
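<p align="justify">One common way to compute a global alignment score is the Needleman-Wunsch dynamic programming recurrence. The sketch below is a minimal version under stated assumptions: it uses a single linear gap penalty instead of separate open and extend penalties, a plain match/mismatch score instead of a scoring matrix, and it returns only the score, not the alignment itself. The function name and default values are hypothetical.</p>

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score with a linear gap penalty."""
    # Row 0: aligning an empty prefix of a with b[:j] costs j gaps.
    previous = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] with an empty prefix of b costs i gaps.
        current = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            current.append(max(diagonal, previous[j] + gap, current[j - 1] + gap))
        previous = current
    # The bottom-right cell covers both sequences over their whole length.
    return previous[-1]


print(global_alignment_score("GATTACA", "GATACA"))
```

<p align="justify">Because the score is read from the last cell, every residue of both sequences is accounted for, either matched, mismatched or placed against a gap.</p>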
<p align="justify"><strong>Online tool link: </strong></p>
<p align="justify"><a href="http://www.ebi.ac.uk/clustalw/index.html" target="_blank">Clustalw @ EBI </a></p>
<p align="justify"><a><strong>Local sequence alignment </strong></a></p>
<p align="justify">Global sequence alignment algorithms align sequences over their entire lengths, so you need to think about whether that type of alignment makes sense for your sequences. It works well where we expect each exon to be represented in both sequences and in the same order. However, how well do you think this approach would work with, for example, multidomain proteins that share one domain but not others, or sequences where there have been regions of duplication? A second comparison method, local alignment, searches for regions of local similarity and need not include the entire length of the sequences.</p>
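<p align="justify">A standard way to compute a local alignment score is the Smith-Waterman recurrence, sketched below under simplifying assumptions: a linear gap penalty, a plain match/mismatch score, and only the best score is returned. The function name and default values are hypothetical.</p>

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman: best score over all pairs of substrings of a and b."""
    best = 0
    previous = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        current = [0]
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at zero lets an alignment start afresh at any position.
            cell = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            current.append(cell)
            best = max(best, cell)
        previous = current
    return best


# Two sequences that share one region but have unrelated flanks.
print(local_alignment_score("CCCCGATTACACCCC", "TTGATTACATT"))
```

<p align="justify">The unrelated flanking residues do not lower the score here, which is the behaviour wanted for multidomain proteins that share only one domain.</p>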
<p align="justify"><strong>Online tool links: </strong></p>
<p align="justify"><a href="http://www.ebi.ac.uk/emboss/align/index.html" target="_blank">Pairwise alignment at EBI </a></p>
<p align="justify"><a href="http://www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi" target="_blank">Pairwise alignment at NCBI </a></p>
<h3><a>Protein Sequence Analysis </a></h3>
<p align="justify">You can get a variety of clues by looking for patterns and motifs in your sequence:</p>
<p align="justify">•  These are often derived from multiple sequence alignments.</p>
<p align="justify">•  Conserved protein domains or regions can be very useful in trying to determine which protein family a sequence belongs to, catalytic sites, carbohydrate binding sites etc.</p>
<p align="justify">•  Various research groups have created their own databases and search tools; it might be worth using a variety of these.</p>
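<p align="justify">Searching a protein for a short motif can be sketched with a regular expression. The example uses the N-glycosylation site pattern N-{P}-[ST]-{P} in PROSITE-style notation, chosen here purely as an illustration; the test sequence and the function name <code>find_motif</code> are made up.</p>

```python
import re

# N-{P}-[ST]-{P}: Asn, anything but Pro, Ser or Thr, anything but Pro.
# The lookahead lets overlapping sites be reported.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif(protein, pattern=N_GLYCOSYLATION):
    """Return (1-based position, matched residues) for every motif hit."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(protein.upper())]


print(find_motif("MKNGTAANPSLNNTSQ"))
```

<p align="justify">A match to such a short pattern is only a clue: short motifs occur by chance, so hits are usually checked against conserved domains or homologous sequences.</p>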
<h3><em>FIND HOMOLOGOUS (PARALOGOUS AND ORTHOLOGOUS) SEQUENCES </em></h3>
<p align="justify">Using a database similarity search can give you a great deal of information:</p>
<p align="justify">•  Homologues may be well annotated and their function documented in the literature.</p>
<p align="justify">•  Simply comparing your sequence with homologues can tell you a lot.</p>
<p align="justify">•  Phylogenetic analysis may reveal evolutionary relationships between proteins and help you decide which family or super family a protein belongs to.</p>
<p align="justify">•  N.B. Be aware of convergent evolution.</p>
<h3><em>HAVING SOME IDEA OF STRUCTURE MAY HELP YOU PREDICT POSSIBLE FUNCTIONS </em></h3>
<p align="justify">Knowing the protein fold(s) together with conserved domains (or even residues) may tell you what type of functions this protein could have.</p>
]]></content:encoded>
			<wfw:commentRss>http://bioinformatics.me/bioinformatics-sequence-analysis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Genomic glossary</title>
		<link>http://bioinformatics.me/genomic-glossary/</link>
		<comments>http://bioinformatics.me/genomic-glossary/#comments</comments>
		<pubDate>Fri, 10 Jul 2009 23:12:47 +0000</pubDate>
		<dc:creator>Waleed Ghalwash</dc:creator>
				<category><![CDATA[Bioinformatics Web]]></category>

		<guid isPermaLink="false">http://bioinformatics.me/genomic-glossary/</guid>
		<description><![CDATA[
biological atlas:  Maps describing different aspects of protein function should be compiled into a &#8220;biological atlas&#8221;. By integrating the information contained in the atlas, increasingly meaningful biological hypotheses could be formulated.
cDNA maps:  Shows the locations of expressed DNA regions (exons) on the chromosomal map. Because they represent expressed genomic regions, cDNAs are thought [...]]]></description>
			<content:encoded><![CDATA[<p><img class="aligncenter size-full wp-image-367" src="http://www.bioinformaticsweb.org/wp-content/uploads/2009/07/bioinformatics_glossary.jpeg" alt="bioinformatics_glossary" width="600" height="345" /></p>
<p align="justify"><strong>biological atlas: </strong> Maps describing different aspects of protein function should be compiled into a &#8220;biological atlas&#8221;. By integrating the information contained in the atlas, increasingly meaningful biological hypotheses could be formulated.</p>
<p align="justify"><strong>cDNA maps: </strong> Shows the locations of expressed DNA regions (exons) on the chromosomal map. Because they represent expressed genomic regions, cDNAs are thought to identify the parts of the genome with the most biological and medical significance. A cDNA map can provide the chromosomal location for genes whose functions are currently unknown. For disease-gene hunters, the map can also suggest a set of candidate genes to test when the approximate location of a disease gene has been mapped by genetic linkage techniques.</p>
<p align="justify"><strong>cell mapping: </strong> The determination of the subcellular location of proteins and of protein-protein interactions by the purification of organelles or protein complexes followed by mass-spectrometric identification of the components. Most proteins are thought to exist in the cell not as free entities but as part of &#8216;cellular machines&#8217; which perform cellular functions cooperatively. Systematic identification of protein complexes would permit these machines to be defined and allow &#8216;physical maps&#8217; to be created for a variety of cell types and states. Such information is of great value for the assignment of protein function.</p>
<p align="justify"><strong>cell maps: </strong> A cell map specifies the proteins that constitute a given organelle within a given cell type. Cell maps for normal and diseased cells can be constructed which give insight into the role proteins have in disease and can guide the drug development process.</p>
<p align="justify"><strong>chromosomal maps: </strong> Genes or other identifiable DNA fragments are assigned to their respective chromosomes, with distances measured in base pairs. These markers can be physically associated with particular bands (identified by cytogenetic staining) primarily by in situ hybridization, a technique that involves tagging the DNA marker with an observable label (e.g., one that fluoresces or is radioactive). The location of the labeled probe can be detected after it binds to its complementary DNA strand in an intact chromosome.</p>
<p align="justify"><strong>chromosome mapping: </strong>Any method used for determining the location of and relative distances between genes on a chromosome.</p>
<p align="justify"><strong>clone-based maps: </strong> The physical map of the human genome published by Nature is a clone-based physical map of 3.2 gigabases (25 times larger than any previously mapped genome). This approach involved generating an overlapping series of clones for the whole genome. With a fingerprinted BAC map, clones could be selected for sequencing, ensuring comprehensive coverage of the genome.</p>
<p align="justify"><strong>comparative genome mapping: </strong> Comparative genome mapping in the sequence-based era: early experience with human chromosome 7</p>
<p align="justify"><strong>contig mapping </strong>: Overlapping of cloned or sequenced DNA to construct a continuous region of a gene, chromosome or genome.</p>
<p align="justify"><strong>contig maps </strong>: Contig maps are important because they provide the ability to study a complete, and often large segment of the genome by examining a series of overlapping clones which then provide an unbroken succession of information about that region.</p>
<p align="justify">The bottom-up approach involves cutting the chromosome into small pieces, each of which is cloned and ordered. The ordered fragments form contiguous DNA blocks (contigs). Currently, the resulting library of clones varies in size from 10,000 bp to 1 Mb. An advantage of this approach is the accessibility of these stable clones to other researchers. Contig construction can be verified by FISH [fluorescence in situ hybridization], which localizes cosmids to specific regions within chromosomal bands. Contig maps consist of a linked library of small overlapping clones representing a complete chromosomal segment. While useful for finding genes localized to a small area (under 2 Mb), contig maps are difficult to extend over large stretches of a chromosome because not all regions are clonable. DNA probe techniques can be used to fill in the gaps, but they are time consuming.</p>
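<p align="justify">The idea of joining overlapping fragments into a contig can be illustrated with a toy sketch. It assumes the fragments are short sequence strings already placed in map order and that overlaps are exact; real contig maps are usually built from clone fingerprints or marker content and must tolerate errors. The function names and the <code>min_length</code> parameter are hypothetical.</p>

```python
def overlap(left, right, min_length=3):
    """Length of the longest suffix of `left` that is a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_length - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def assemble(ordered_fragments, min_length=3):
    """Join fragments that are already in map order into one contig."""
    contig = ordered_fragments[0]
    for fragment in ordered_fragments[1:]:
        size = overlap(contig, fragment, min_length)
        if size == 0:
            # No overlap means an unclonable or unsampled region in the map.
            raise ValueError("gap in the map: fragments do not overlap")
        contig += fragment[size:]
    return contig


print(assemble(["ATGGCGT", "GCGTACG", "ACGTTAG"]))
```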
<p align="justify"><strong>cosmid maps: </strong> &#8220;Constructing chromosome- and region-specific cosmid maps of the human genome&#8221;</p>
<p align="justify"><strong>cytogenetic maps: </strong> The visual appearance of a chromosome when stained and examined under a microscope. Particularly important are visually distinct regions, called light and dark bands, which give each of the chromosomes a unique appearance. This feature allows a person&#8217;s chromosomes to be studied in a clinical test known as a karyotype, which allows scientists to look for chromosomal alterations.</p>
<p align="justify"><strong>epitope mapping </strong>: Methods used for studying the interactions of antibodies with specific regions of protein antigens. Important applications of epitope mapping are found within the area of immunochemistry.</p>
<p align="justify"><strong>evolutionary genetics: </strong> Evolutionary study of genes has been purely theoretical, but it can provide useful information for guiding gene mapping. People are now finding, for example, that a lot of things are not true associations; instead, they are an artifact of association. You can make such mistakes when you are looking at two individuals who share a common ancestry. Understanding the phylogeny helps us, for example, understand horizontal gene transfer between microorganisms. For humans or other sexually reproducing organisms, the use of phylogenetic information improves resolution for making associations by helping to avoid type I errors &#8211; that is, finding an association that is actually merely due to sharing a recent common ancestor, or, in other words, being closely related.</p>
<p align="justify"><strong>expression imbalance map EIM </strong>: A new visualization method for detecting mRNA expression imbalance regions, reflecting genomic losses and gains at a much higher resolution than conventional technologies such as comparative genomic hybridization (CGH). Simple spatial mapping of the microarray expression profiles on chromosomal location provides little information about genomic structure, because mRNA expression levels do not completely reflect genomic copy number and some microarray probes would be of low quality. The EIM, which does not employ arbitrary selection of thresholds in conjunction with a hypergeometric distribution-based algorithm, has a high tolerance of these complex factors.</p>
<p align="justify"><strong>expression mapping: </strong> The creation of quantitative maps of protein expression from cell or tissue extracts, akin to the EST maps commercially available. This approach relies on 2D gel maps and image analysis, and opens up the possibility of studying cellular pathways and their perturbation by disease, drug action or other biological stimuli at the whole-proteome level &#8230; Expression mapping is a valuable tool in the discovery of disease markers and its use in gaining information in toxicological and drug-action studies seems assured. It is unclear at present how successful this approach will be in elucidating cellular pathways and their importance in disease processes, and how much the precise measurement of protein levels matters when compared with the rough guide provided by the measurement of mRNA levels &#8230; the ability to measure protein-level changes directly would seem to carry inherent advantages and it seems likely that expression proteomics will be a useful tool in drug target discovery and in studying the effects of various biological stimuli on the cell.</p>
<p align="justify"><strong>functional maps: </strong>In addition to the raw data, it will be important to design the proper visualization tools to graphically represent the functional relationships contained in different maps &#8230; Finally, it will be important to consider the possibility that functional maps need to be related back to particular tissues or even cell types.</p>
<p align="justify"><strong>gene mapping: </strong> Determination of the relative positions of genes on a DNA molecule (chromosome or plasmid) and of the distance, in linkage units or physical units, between them. [DOE]</p>
<p align="justify"><strong>genetic linkage map: </strong>Shows the relative locations of specific DNA markers along the chromosome. Any inherited physical or molecular characteristic that differs among individuals and is easily detectable in the laboratory is a potential genetic marker. [Primer on Molecular Genetics, Oak Ridge National Lab, US] <a href="http://www.ornl.gov/hgmis/publicat/primer/prim2.html#1">http://www.ornl.gov/hgmis/publicat/primer/prim2.html#1 </a> Related term: linkage maps.</p>
<p align="justify"><strong>genetic maps: </strong> Also known as a linkage map. A chromosome map of a species that shows the position of its known genes and/or markers relative to each other, rather than as specific physical points on each chromosome.</p>
<p align="justify">The value of the genetic map is that an inherited disease can be located on the map by following the inheritance of a DNA marker present in affected individuals (but absent in unaffected individuals), even though the molecular basis of the disease may not yet be understood nor the responsible gene identified. Genetic maps have been used to find the exact chromosomal location of several important disease genes, including cystic fibrosis, sickle cell disease, Tay-Sachs disease, fragile X syndrome, and myotonic dystrophy. [Primer on Molecular Genetics, Oak Ridge National Lab, US] <a href="http://www.ornl.gov/hgmis/publicat/primer/prim2.html#1">http://www.ornl.gov/hgmis/publicat/primer/prim2.html </a></p>
<p align="justify"><strong>genome control maps: </strong>Would identify all the components of the transcriptional machinery that have roles at any particular promoter and the contribution that specific components make to coordinate regulation of genes. The map will facilitate modeling of the molecular mechanisms that regulate gene expression and implicate components of the transcription apparatus in functional interactions with gene-specific regulators.</p>
<p align="justify"><strong>genome fingerprint map: </strong> The collection of all fingerprint clone contigs placed in a genome-wide map.</p>
<p align="justify"><strong>genome map: </strong> A reconstruction of the entire set of chromosomes for a given organism, showing the relative position of every gene.</p>
<p align="justify"><strong>genome scale metabolic maps: </strong>Annotated genomic data, along with legacy data on the cell&#8217;s biochemistry and physiology, can be used to construct genome-scale metabolic maps. The challenge now is to formulate reliable mathematical descriptions of the integrated function of these maps. It has proven difficult, if not impossible, to formulate detailed theory-based models of these genome-scale maps. An alternative approach that is data driven and constraints based will be described. It is an iterative model-building process.</p>
<p align="justify"><strong>genomic cartography: </strong> [Fry's] research focuses on methods of visualizing large amounts of data from dynamic information sources. The work uses ideas from distributed and adaptive systems to form organic representations that react and respond to the input data. This work is currently directed towards Genomic Cartography which is a study into new methods to represent the data found in the human genome.</p>
<p align="justify"><strong>genomic mapping: </strong>While a few technologies for functional analysis on a genomic basis are being developed at present, additional approaches and technologies for genomic interpretation that can be applied efficiently and economically at the level of an entire genome will be required for comprehensive analyses. Informatics will continue to play an important role in achieving all of these goals, as well as in ensuring the maintenance and accessibility of the forthcoming data. The development and application of new technologies for acquisition, management, analysis, and dissemination of genomic data are still required.</p>
<p align="justify"><strong>haplotype map: </strong> Francis Collins, director of the NHGRI, speaking at BIO 2001 (San Diego CA, US, June 2001) announced plans for a public-private effort to create a human haplotype map. Creators hope this so-called haplotype map will be a tool for pinning down the genes that contribute to the development of complex diseases such as cancer, diabetes, and mental illness.</p>
<p align="justify"><strong>haplotype mapping: </strong> Is often carried out as part of a genome scan. In a population isolate, the appearance of a rare Mendelian disease is almost always attributable to a single founder gene or mutation. The disease allele can be identified by searching for a common haplotype signature shared among patients. As the ancestral haplotype signature is passed from generation to generation, it is disrupted by recombination. Partial conservation of the haplotype signature in a patient strongly suggests that the disease locus resides in the conserved region of the haplotype.</p>
<p align="justify"><strong>high-resolution genetic maps: </strong>2-5 cM [centiMorgans]. Genetic mapping resolution has been increased through the application of recombinant DNA technology, including in vitro radiation-induced chromosome fragmentation and cell fusions (joining human cells with those of other species to form hybrid cells) to create panels of cells with specific and varied human chromosomal components.</p>
<p align="justify"><strong>high-resolution physical mapping: </strong> The two current approaches are termed top-down (producing a macrorestriction map) and bottom-up (resulting in a contig map). With either strategy the maps represent ordered sets of DNA fragments that are generated by cutting genomic DNA with restriction enzymes. The fragments are then amplified by cloning or by polymerase chain reaction (PCR) methods. Electrophoretic techniques are used to separate the fragments according to size into different bands, which can be visualized by direct DNA staining or by hybridization with DNA probes of interest. The use of purified chromosomes separated either by flow sorting from human cell lines or in hybrid cell lines allows a single chromosome to be mapped.</p>
<p align="justify"><strong>homology map: </strong> The Davis Human/Mouse Homology Map, a table comparing genes in homologous segments of DNA from human and mouse sources, sorted by position in each genome. A total of 1793 loci are presented, most of which are genes. The authors did not include pseudogenes, members of multigene families where specific homology relationships could not be determined, nor any other genes for which homology was in doubt. In addition, for 568 of the loci there are provisional assignments of markers that link the homology map with that of the Gene Map of the Human Genome. These links also provide a rough approximation of the position of markers in the Genethon linkage map. In constructing this table, the authors first ordered genes so as to best maintain order according to both human cytogenetic position and mouse genetic map position. Within these homologous regions, genes were ordered according to the mouse genetic mapping data.</p>
<p align="justify"><strong>International SNP Map Working Group: </strong> Identifies and localizes 1.42 million SNPs in the human genome. ["A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms" International SNP Map Working Group Nature 409: 928-933, 15 Feb. 2001] <a href="http://www.nature.com/cgi-taf/DynaPage.taf?%20">http://www.nature.com/cgi-taf</a></p>
<p align="justify"><strong>linkage disequilibrium: </strong>Evidence for linkage disequilibrium can be helpful in mapping disease genes since it suggests that the two [alleles] may be very close to one another.</p>
<p align="justify"><strong>linkage maps: </strong> A map of the relative positions of genetic loci on a chromosome, determined on the basis of how often the loci are inherited together. Distance is measured in centimorgans (cM).</p>
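<p align="justify">As a rough illustration of how a linkage distance is derived from inheritance data, the sketch below converts offspring counts into centimorgans, using the convention that 1 cM corresponds to about a 1% recombination frequency. The function name and the counts are hypothetical, and the linear conversion is only a reasonable approximation for closely linked loci; over longer distances a mapping function is normally applied instead.</p>
<pre>
```python
def map_distance_cm(recombinant_offspring, total_offspring):
    """Approximate map distance in centimorgans from offspring counts."""
    if total_offspring == 0:
        raise ValueError("total_offspring must be positive")
    # 1 cM corresponds to roughly a 1% recombination frequency,
    # so the distance is the recombinant percentage.
    return 100.0 * recombinant_offspring / total_offspring
```
</pre>
<p align="justify">For example, 12 recombinants among 400 offspring would place the two loci about 3 cM apart under this assumption.</p>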
<p align="justify"><strong>localizome mapping: </strong> One can imagine comprehensive mapping projects of the &#8220;localizome&#8221;, with the goal of recording not only where all proteins of a proteome can be found but also when.</p>
<p align="justify"><strong>locus: </strong>Any genomic site, whether functional or not, that can be mapped through formal genetic analysis.</p>
<p align="justify"><strong>macrorestriction map: </strong> Describes the order and distance between enzyme cutting (cleavage) sites &#8230; In top-down mapping, a single chromosome is cut (with rare-cutter restriction enzymes) into large pieces, which are ordered and subdivided; the smaller pieces are then mapped further. The resulting macrorestriction maps depict the order of and distance between sites at which rare-cutter enzymes cleave. This approach yields maps with more continuity and fewer gaps between fragments than contig maps, but map resolution is lower and may not be useful in finding particular genes; in addition, this strategy generally does not produce long stretches of mapped sites. Currently, this approach allows DNA pieces to be located in regions measuring about 100,000 bp to 1 Mb.</p>
<p align="justify"><strong>mapping: </strong> The determination of the relative positions of genes within the chromosomes or of restriction sites along a DNA molecule. The process of determining the position of a locus on the chromosome relative to other loci.</p>
<p align="justify"><strong>nucleotide mapping: </strong> Two-dimensional separation and analysis of nucleotides.</p>
<p align="justify"><strong>optical mapping: </strong>An enabling technology for whole genome analysis which involves the capture of individual DNA molecules, obtained directly from genomic DNA, followed by digestion in situ by selected restriction endonucleases. The resulting fragments are then visualized directly to produce detailed optical restriction maps. This methodology allows patterns of sequence variation to be detected across entire genomes, without the need for DNA amplification, and, unlike other genomewide scanning methods, provides detailed haplotype information by analyzing individual DNA molecules. OpGen will provide optical mapping services to three main markets, i.e., genome sequencing projects, cancer diagnostics, and genetic association studies.</p>
<p align="justify"><strong>peptide mapping: </strong>Two-dimensional separation and analysis of peptides. The characteristic pattern of fragments formed by the separation of a mixture of peptides resulting from hydrolysis of a protein or peptide.</p>
<p align="justify"><strong>peptide maps: </strong> One way to research the structure of a protein is to seek to identify the specific sequence of amino acids which form the protein. Tr. at 202-3. An initial approach is to cut the protein into fragments, called peptides, using enzymes or chemicals which reliably divide the chain at predictable points. Tr. at 523. These fragments can be isolated and analyzed to create &#8220;peptide maps,&#8221; which show the pattern of pieces from these breaks as a unique &#8220;fingerprint.&#8221; &#8230; A peptide map does not disclose the precise order of the amino acids in a protein, although it may point to areas of difference between similar proteins.</p>
<p align="justify"><strong>phenome mapping: </strong> The conceptual matrix for a comprehensive &#8220;phenome&#8221; mapping project would be as follows: one axis represents all available knockouts while the other represents a large series of standardized phenotypes that can be screened.</p>
<p align="justify"><strong>phenome maps: </strong> Can be thought of as lists of similar phenotypes that could be referred to as &#8220;pheno-clusters&#8221;.</p>
<p align="justify"><strong>physical mapping: </strong>The procedure of physical mapping divides coarsely into two steps. First, large pieces of DNA (contigs), a library of cloned fragments, are ordered according to their position in the genome. Different experimental techniques are used to do that; roughly, these are clone-probe hybridization mapping, restriction mapping, radiation-hybrid mapping and optical mapping. &#8230; Second, the cloned fragments are cut by restriction enzymes into smaller DNA fragments, which are sequenced in detail (shotgun sequencing), and the overall sequence is obtained by sequence assembly.</p>
<p align="justify"><strong>physical maps: </strong> A map of the locations of identifiable landmarks on DNA (e.g. restriction enzyme cutting sites, genes) regardless of inheritance. Distance is measured in base pairs. For the human genome, the lowest-resolution physical map is the banding patterns on the 24 different chromosomes; the highest-resolution map would be the complete nucleotide sequence of the chromosomes.</p>
<p align="justify">A chromosome map of a species that shows the specific physical locations of its genes and/or markers on each chromosome. Physical maps are particularly important when searching for disease genes by positional cloning strategies and for DNA sequencing.</p>
<p align="justify"><strong>positional cloning: </strong> Requires a genetic map with a large number of markers (especially in the region of interest), and the use of physical mapping and DNA sequencing technologies to isolate and sequence the targeted gene.</p>
<p align="justify"><strong>protein expression map: </strong> Since 2D electrophoresis not only reveals the amounts of protein but is also unrivaled in its ability to detect post-translational modifications, the 2DE protein map provides much more relevant information about cellular dynamics than the corresponding expression map at the mRNA level. By comparing the 2DE gel patterns of samples exposed to different physiological conditions or different drug treatments it is possible to identify groups of proteins with related functions or whose expression is interdependent (expression proteomics).</p>
<p align="justify"><strong>Protein Expression Mapping PEM: </strong> Details the distribution and abundance of protein in specific samples, under defined physiological conditions. [CHI Proteomics] Quantitative study of global changes in protein expression in tissues, cells or body fluids using 2D gels and image analysis. Currently carried out by 2D gel electrophoresis, though alternatives are under investigation.</p>
<p align="justify"><strong>protein interaction maps: </strong> Hybrigenics&#8217; comprehensive protein interaction maps are built using automated yeast two-hybrid methodology in pathogens and in cDNA of normal and diseased tissues.</p>
<p align="justify"><strong>protein linkage maps: </strong>With respect to a genome-wide use of the two-hybrid assay in the case of yeast, the goal is to find which proteins in the yeast genome interact with every other protein. This process would generate protein linkage maps, delineating large networks of interacting proteins. The approximately 6,000 yeast proteins can potentially interact in 18 million pairwise combinations.</p>
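<p align="justify">The figure of 18 million follows from counting unordered pairs of distinct proteins, n(n-1)/2. A minimal sketch of that arithmetic is given here; the function name is hypothetical, and self-interactions are assumed to be excluded from the count.</p>
<pre>
```python
from math import comb


def pairwise_interactions(protein_count):
    """Number of distinct unordered protein pairs, excluding self-pairs."""
    return comb(protein_count, 2)
```
</pre>
<p align="justify">With 6,000 proteins this gives 17,997,000 candidate pairs, which rounds to the 18 million quoted above.</p>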
<p align="justify"><strong>proteome map: </strong> A number of organizations have announced plans to produce a map of the proteome, including Myriad Genetics, Large Scale Biology, CuraGen and others.</p>
<p align="justify"><strong>QTL mapping Quantitative Trait Loci mapping: </strong> A phenotype-driven approach to gene function. As such it permits the discovery of new genes and can be contrasted with gene-driven approaches such as knock-out and knock-in mice, which allow for the study of known genes. QTL reflect natural genetic variations as they exist in the mouse strains under study. We are limited to detecting those genes that vary among the available strains. However, the natural variations among mouse strains are vast and largely untapped.</p>
<p align="justify"><strong>RFLP (Restriction Fragment Length Polymorphism): </strong> See Genetic variations glossary. Polymorphic sequences that result in RFLPs are used as markers on both physical maps and genetic linkage maps. RFLPs are usually caused by mutation at a cutting site.</p>
<p align="justify"><strong>Radiation Hybrid RH maps: </strong> Chromosome maps that are calculated from RH score vectors. An RH score vector is the pattern of assay results of a particular STS (marker) on a particular panel. The vector consists of 1&#8217;s (did amplify) and 0&#8217;s (did not amplify). Simplistically speaking, the more similar two score vectors are, the closer the markers are on the chromosome.</p>
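<p align="justify">The idea that similar score vectors imply nearby markers can be sketched with a simple agreement fraction. This is only an illustration of the simplistic reading above, with a hypothetical function name and made-up vectors; actual RH map construction relies on statistical models rather than raw agreement.</p>
<pre>
```python
def score_vector_similarity(vector_a, vector_b):
    """Fraction of panel positions where two RH score vectors agree."""
    if len(vector_a) != len(vector_b):
        raise ValueError("score vectors must come from the same panel")
    if not vector_a:
        raise ValueError("score vectors must not be empty")
    # Count positions where both markers amplified (1) or both failed (0).
    matches = sum(1 for a, b in zip(vector_a, vector_b) if a == b)
    return matches / len(vector_a)
```
</pre>
<p align="justify">Two markers scored [1, 0, 0, 1, 1, 0, 1, 0] and [1, 0, 0, 1, 0, 0, 1, 0] on the same panel agree at 7 of 8 positions, so under this crude measure they would be judged closer than a pair agreeing at only 2 of 8.</p>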
<p align="justify"><strong>radiation hybrid mapping: </strong> A method for ordering genetic loci along chromosomes. The method involves fusing irradiated donor cells with host cells from another species. Following cell fusion, fragments of DNA from the irradiated cells become integrated into the chromosomes of the host cells. Molecular probing of DNA obtained from the fused cells is used to determine if two or more genetic loci are located within the same fragment of donor cell DNA.</p>
<p align="justify"><strong>restriction map: </strong>A description of restriction endonuclease cleavage sites within a piece of DNA. Generating such a map is usually the first step in characterizing an unknown DNA, and a prerequisite to manipulating it for other purposes. Typically, restriction enzymes that cleave DNA infrequently (e.g. those with 6 bp recognition sites) and are relatively inexpensive are used to produce a map.</p>
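<p align="justify">When the sequence is already known, the positions of a 6 bp recognition site can be listed computationally. The sketch below assumes the EcoRI site GAATTC as the default and uses a hypothetical function name; it reports where each recognition site starts, not the exact bond that the enzyme cleaves within the site.</p>
<pre>
```python
def find_recognition_sites(sequence, site="GAATTC"):
    """Return 0-based start positions of every occurrence of `site`."""
    sequence = sequence.upper()
    site = site.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        # Step one base forward so overlapping occurrences are not missed.
        start = sequence.find(site, start + 1)
    return positions
```
</pre>
<p align="justify">The distances between consecutive positions give the approximate fragment sizes that a complete digest of a linear molecule would produce.</p>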
<p align="justify"><strong>restriction mapping: </strong> Use of restriction endonucleases to analyze and generate a physical map of genomes, genes, or other segments of DNA.</p>
<p align="justify"><strong>SNP maps: </strong> A collection of SNPs that can be superimposed over the existing genome map, creating greater detail and facilitating further genetic studies.</p>
<p align="justify">Current estimates indicate that a very dense marker map (30,000 &#8211; 1,000,000 variants) would be required to perform haplotype-based association studies. We have constructed a SNP map of the human genome with sufficient density to study human haplotype structure, enabling future study of human medical and population genetics.</p>
<p align="justify"><strong>telomere maps: </strong> Telomeres are the tips of the chromosomes. They are crucial in maintaining the chromosomes&#8217; stability and are important in the cell cycle and ageing. Because of the way the physical maps are constructed, many telomeres of chromosomes are left out.</p>
<p align="justify"><strong>transcript maps: </strong>In only a year or two, most human genes will be sequence-tagged and placed on various physical maps. Such a &#8216;transcript map&#8217; (or &#8216;expression map&#8217;) of the genome will be an important part of the sequencing infrastructure, as well as a critical resource for the positional candidate approach to gene cloning. One of the specific goals of the US Human Genome Project is the construction of a high resolution STS map of the genome. &#8230; One of the early problems with gene-based STSs was that there simply were not enough unique human gene sequences to bother with. But all of that changed with the advent of EST sequencing, at which time several groups began mapping ESTs, albeit on a limited scale and only to the resolution of a chromosome assignment.</p>
<p align="justify"><strong>transcriptome maps: </strong> Consist of &#8220;expression clusters&#8221; of co-regulated genes. Challenges ahead for computational biology include the integration of clusters obtained for the transcriptome, the interactome, the phenome, and the localizome.</p>
<p align="justify"><strong>whole genome clone-based maps: </strong>In their paper, the International Human Genome Mapping Consortium describe how they constructed the first whole-genome physical map, how they created the templates from which the genome was sequenced, and how the map was essential for the accurate assembly of the human genome by the publicly funded effort. Four short reports accompanying the whole-genome mapping paper (Bruls; Bentley; Kucherlapati; Page) describe alternative mapping strategies that were implemented for chromosomes 12, 14 and Y, as well as a host of other chromosomes. Information from all these papers was integrated into the whole-genome paper and demonstrates how a rich resource of mapping information can be generated by the cooperation of international independent efforts.</p>
<p align="justify"><strong>YAC maps: </strong>Yeast artificial chromosome maps, a type of physical map.</p>
]]></content:encoded>
			<wfw:commentRss>http://bioinformatics.me/genomic-glossary/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bioinformatics Glossary</title>
		<link>http://bioinformatics.me/bioinformatics-glossary-2/</link>
		<comments>http://bioinformatics.me/bioinformatics-glossary-2/#comments</comments>
		<pubDate>Fri, 10 Jul 2009 23:08:23 +0000</pubDate>
		<dc:creator>Waleed Ghalwash</dc:creator>
				<category><![CDATA[Bioinformatics Web]]></category>

		<guid isPermaLink="false">http://bioinformatics.me/bioinformatics-glossary-2/</guid>
		<description><![CDATA[
N 
Naked DNA 
Pure, isolated DNA devoid of any proteins that may bind to it.
NCEs (New Chemical Entity) 
Compounds identified as potential drugs that are sent from research and development into clinical trials to determine their suitability . 
Nested PCR 
The second round amplification of an already PCR-amplified sequence using a new pair of primers [...]]]></description>
			<content:encoded><![CDATA[<p><img class="aligncenter size-full wp-image-367" src="http://www.bioinformaticsweb.org/wp-content/uploads/2009/07/bioinformatics_glossary.jpeg" alt="bioinformatics_glossary" width="600" height="345" /></p>
<p align="center"><strong>N </strong></p>
<p align="justify"><strong>Naked DNA </strong></p>
<p align="justify">Pure, isolated DNA devoid of any proteins that may bind to it.</p>
<p align="justify"><strong>NCEs (New Chemical Entity) </strong></p>
<p align="justify">Compounds identified as potential drugs that are sent from research and development into clinical trials to determine their suitability.</p>
<p align="justify"><strong>Nested PCR </strong></p>
<p align="justify">The second round amplification of an already PCR-amplified sequence using a new pair of primers which are internal to the original primers. Typically done when a single PCR reaction generates insufficient amounts of product.</p>
<p align="justify"><strong>Neural net </strong></p>
<p align="justify">A neural net is an interconnected assembly of simple processing elements, units or nodes, whose functionality is loosely based on the animal brain. The processing ability of the network is stored in the inter-unit connection strengths, or weights, obtained by a process of adaptation to, or learning from, a set of training patterns. Neural nets are used in bioinformatics to map data and make predictions, such as taking a multiple alignment of a protein family as a training set in order to identify novel members of the family from their sequence data alone.</p>
<p align="justify"><strong>Nonsense mutation </strong></p>
<p align="justify">A point mutation in which a codon specific for an amino-acid is converted into a nonsense codon.</p>
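<p align="justify">A minimal sketch of this definition, assuming the standard genetic code and codons written as DNA on the coding strand (stop codons TAA, TAG and TGA). The function name is hypothetical.</p>
<pre>
```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_nonsense_mutation(original_codon, mutated_codon):
    """True if a single-base change turns a sense codon into a stop codon."""
    original = original_codon.upper()
    mutated = mutated_codon.upper()
    if len(original) != 3 or len(mutated) != 3:
        raise ValueError("codons must be exactly three bases long")
    # A point mutation changes exactly one of the three bases.
    differences = sum(1 for a, b in zip(original, mutated) if a != b)
    return (
        differences == 1
        and original not in STOP_CODONS
        and mutated in STOP_CODONS
    )
```
</pre>
<p align="justify">For example, TGG (tryptophan) changed to TGA is a nonsense mutation, whereas TGG changed to TGT (cysteine) is not, because the new codon still specifies an amino acid.</p>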
<p align="justify"><strong>Northern blotting </strong></p>
<p align="justify">A technique to identify RNA molecules by hybridization that is analogous to Southern blotting (see Southern blotting).</p>
<p align="justify"><strong>Nuclease </strong></p>
<p align="justify">Any enzyme that can cleave the phosphodiester bonds of nucleic acid backbones.</p>
<p align="justify"><strong>Nucleoside </strong></p>
<p align="justify">A five-carbon sugar covalently attached to a nitrogenous base.</p>
<p align="justify"><strong>Nucleotide </strong></p>
<p align="justify">A nucleic acid unit composed of a five-carbon sugar joined to a phosphate group and a nitrogenous base.</p>
<p align="center"><strong>O </strong></p>
<p align="justify"><strong>Object-Relational Database </strong></p>
<p align="justify">Object databases combine the elements of object orientation and object-oriented programming languages with database capabilities. They provide more than persistent storage of programming language objects. Object databases extend the functionality of object programming languages (e.g., C++, Smalltalk, or Java) to provide full-featured database programming capability. The result is a high level of congruence between the data model for the application and the data model of the database. Object-relational databases are used in bioinformatics to map molecular biological objects (such as sequences, structures, maps and pathways) to their underlying representations (typically within the rows and columns of relational database tables). This enables the user to deal with the biological objects in a more intuitive manner, as they would in the laboratory, without having to worry about the underlying data model of their representation.</p>
<p align="justify"><strong>Oligonucleotide </strong></p>
<p align="justify">A short molecule consisting of several linked nucleotides (typically between 10 and 60) covalently attached by phosphodiester bonds.</p>
<p align="justify"><strong>Open reading frame (ORF) </strong></p>
<p align="justify">Any stretch of DNA that potentially encodes a protein. Open reading frames start with a start codon, and end with a termination codon. No termination codons may be present internally. The identification of an ORF is the first indication that a segment of DNA may be part of a functional gene.</p>
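<p align="justify">The definition above translates fairly directly into a scan for a start codon followed, in the same reading frame, by the first termination codon. The sketch below is illustrative only: it assumes ATG as the start codon, the standard stop codons, and the forward strand alone, and the function name and the min_codons parameter are hypothetical.</p>
<pre>
```python
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence, min_codons=2):
    """Return (start, end) pairs of ORFs on the forward strand.

    Positions are 0-based; `end` is exclusive and includes the stop codon.
    """
    sequence = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the sequence one codon at a time in this reading frame.
        for i in range(frame, len(sequence) - 2, 3):
            codon = sequence[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                codon_count = (i + 3 - start) // 3
                if codon_count >= min_codons:
                    orfs.append((start, i + 3))
                # The first in-frame stop closes the ORF.
                start = None
    return orfs
```
</pre>
<p align="justify">A start codon with no in-frame stop before the end of the sequence is not reported, in keeping with the requirement that an ORF ends with a termination codon. A fuller search would also scan the reverse complement.</p>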
<p align="justify"><strong>Operator </strong></p>
<p align="justify">A segment of DNA that interacts with the products of regulatory genes and thereby controls the transcription of one or more structural genes.</p>
<p align="justify"><strong>Operon </strong></p>
<p align="justify">A unit of transcription consisting of one or more structural genes, an operator, and a promoter.</p>
<p align="justify"><strong>Ortholog </strong></p>
<p align="justify">Orthologs are genes in different species that evolved from a common ancestral gene by speciation. Normally, orthologs retain the same function in the course of evolution. Identification of orthologs is critical for reliable prediction of gene function in newly sequenced genomes. (See also Paralogs.)</p>
<p align="justify"><strong>Overlapping clones </strong></p>
<p align="justify">Collection of cloned sequences made by generating randomly overlapping DNA fragments with infrequently cutting restriction enzymes.</p>
<p align="center"><strong>P </strong></p>
<p align="justify"><strong>Palindrome </strong></p>
<p align="justify">A region of DNA with a symmetrical arrangement of bases occurring about a single point such that the base sequences on either side of that point are identical (if the strands are both read in the same direction), e.g. 5&#8242; GAATTC 3&#8242; whose complementary sequence is 3&#8242; CTTAAG 5&#8242;.</p>
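<p align="justify">In computational terms, a DNA palindrome is a sequence that equals its own reverse complement. A minimal sketch, assuming an unambiguous A/C/G/T alphabet; the function names are hypothetical.</p>
<pre>
```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def is_palindrome(sequence):
    """True if the sequence equals its own reverse complement."""
    return sequence.upper() == reverse_complement(sequence)
```
</pre>
<p align="justify">GAATTC passes this check because reading its complementary strand 5&#8242; to 3&#8242; also gives GAATTC.</p>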
<p align="justify"><strong>Pattern </strong></p>
<p align="justify">Molecular biological patterns usually occur at the level of the characters making up the gene or protein sequence. A pattern language must be defined in order to apply different criteria to different positions of a sequence. In order to have position-specific comparison done by a computer, a pattern-matching algorithm must allow alternative residues at a given position, repetitions of a residue, exclusion of alternative residues, weighting, and ideally, combinatorial representation.</p>
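<p align="justify">Regular expressions cover several of the requirements listed above: alternative residues, repetitions, and exclusions (weighting and combinatorial representation need other tools). The sketch below uses Python&#8217;s re module; the first pattern is modelled on the PROSITE-style P-loop motif [AG]-x(4)-G-K-[ST], the second is a deliberately simplified glycosylation-like pattern, and the function name and test sequences are hypothetical.</p>
<pre>
```python
import re

# PROSITE-style pattern [AG]-x(4)-G-K-[ST] written as a regular expression:
# [AG] allows alternative residues, .{4} repeats "any residue" four times.
P_LOOP = re.compile(r"[AG].{4}GK[ST]")

# [^P] excludes a residue at a position: N, then anything but P, then S or T.
N_GLYC = re.compile(r"N[^P][ST]")


def find_pattern(pattern, protein_sequence):
    """Return (start, matched_text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in pattern.finditer(protein_sequence)]
```
</pre>
<p align="justify">Start positions are 0-based. Because finditer reports non-overlapping matches only, overlapping occurrences of a motif would need a lookahead-based pattern instead.</p>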
<p align="justify"><strong>Pathways </strong></p>
<p align="justify">Bioinformatics strives to define representations of key biological datatypes, algorithms and inference procedures, including sequences, structures, biological pathways and reactions. Representing and computing with biological pathways requires ontologies for representing pathway knowledge; user interfaces to these databases; physico-chemical properties of enzymes and their substrates in pathways; and pathway analysis of whole genomes, including identifying common patterns across species and species differences.</p>
<p align="justify"><strong>Paralog </strong><br />
Paralogs are genes related by duplication within a genome. Orthologs retain the same function in the course of evolution, whereas paralogs evolve new functions, even if these are related to the original one.</p>
<p align="justify"><strong>Parameters </strong></p>
<p align="justify">Parameters are user-selectable values, typically experimentally determined, that govern the boundaries of an algorithm or program. For instance, selection of the appropriate input parameters governs the success of a search algorithm. Some of the most common search parameters in bioinformatics tools include the stringency of an alignment search tool, and the weights (penalties) provided for mismatches and gaps.</p>
<p align="justify"><strong>Peptide </strong></p>
<p align="justify">A short stretch of amino acids each covalently coupled by a peptide (amide) bond.</p>
<p align="justify"><strong>Peptide bond (amide bond) </strong></p>
<p align="justify">A covalent bond formed between two amino acids when the amino group of one is linked to the carboxy group of another (resulting in the elimination of one water molecule).</p>
<p align="justify"><strong>Phage (Bacteriophage) </strong></p>
<p align="justify">A virus that infects bacterial cells and serves as a useful vector for introducing genes into bacteria for a number of purposes.</p>
<p align="justify"><strong>Phage display </strong></p>
<p align="justify">A technique in which phage are engineered to fuse a foreign peptide or protein with their capsid (surface) proteins and hence display it on their surfaces. The immobilized phage may then be used as a screen to see what ligands bind to the expressed fusion protein exhibited (displayed) on the phage surface.</p>
<p align="justify"><strong>Pharmacogenomics </strong></p>
<p align="justify">The use of (DNA-based) genotyping in order to target pharmaceutical agents to specific patient populations. Genetic differences are known to affect responses to many types of drug therapy, and pharmacogenomics analysis serves to customize the use of pharmaceuticals for specific subgroups of patients. The rationale for this approach is that observed gene expression differences may correlate with, and explain, the differences in side effects and efficacy to drugs in humans.</p>
<p align="justify"><strong>Pharmacophore </strong></p>
<p align="justify">The three-dimensional spatial arrangement of atoms, substituents, functional groups, or chemical features that together are sufficient to describe the pharmacologically active components of a drug molecule or molecule series.</p>
<p align="justify"><strong>Phenotype </strong></p>
<p align="justify">Any observable feature of an organism that is the result of one or more genes.</p>
<p align="justify"><strong>Phylum </strong></p>
<p align="justify">The animal kingdom is divided into about 30 major groups known as phyla. The members of each phylum share the same basic structure and organization. For instance, fish, birds, and human beings belong to one phylum &#8211; the Chordata &#8211; because all have a dorsal nerve cord (the spinal cord in vertebrates).</p>
<p align="justify"><strong>Physical map </strong></p>
<p align="justify">A physical map consists of a linearly ordered set of DNA fragments encompassing the genome or region of interest. Physical maps are of two types, macro-restriction maps and ordered clone maps. The former consists of an ordered set of large DNA fragments generated by using restriction enzymes whose recognition sequences are infrequently represented in the genome. An ordered clone map consists of an overlapping collection of cloned DNA fragments. The DNA may be cloned into any one of the available vector systems (YACs, cosmids, phage, or even plasmids). Major advantages of ordered clone maps are that they are of high resolution and directly provide the clones for further study.</p>
<p align="justify"><strong>Plasmid </strong></p>
<p align="justify">Any replicating DNA element that can exist in the cell independently of the chromosomes. Synthetic plasmids are used for DNA cloning. Most commonly found in bacterial cells.</p>
<p align="justify"><strong>Pleiotropy </strong></p>
<p align="justify">The multiple effects on an organism&#8217;s phenotype due to a single gene or allele, e.g. the cytokines, which can bind to multiple cellular receptors and affect growth and multiple immune pathways.</p>
<p align="justify"><strong>Point mutation </strong></p>
<p align="justify">A mutation in which a single nucleotide in a DNA sequence is substituted by another nucleotide.</p>
<p align="justify"><strong>Poly(A) tail </strong></p>
<p align="justify">The stretch of Adenine (A) residues at the 3&#8242; end of eukaryotic mRNA that is added to the pre-mRNA as it is processed, before its transport from the nucleus to the cytoplasm and subsequent translation at the ribosome.</p>
<p align="justify"><strong>Polyadenylation site </strong></p>
<p align="justify">A site on the 3&#8242;-end of messenger RNA (mRNA) that signals the addition of a series of Adenines during the RNA processing step and before the mRNA migrates to the cytoplasm. These so-called poly(A) &#8220;tails&#8221; increase mRNA stability and allow one to isolate mRNA from cells by PCR-amplification using poly(T) primers.</p>
<p align="justify"><strong>Polygenic inheritance </strong></p>
<p align="justify">Inheritance involving alleles at many genetic loci.</p>
<p align="justify"><strong>Polymerase chain reaction (PCR) </strong></p>
<p align="justify">Technique used to amplify or generate large amounts of replica DNA of a segment of any DNA whose &#8220;flanking&#8221; sequences are known. Oligonucleotide primers which bind these flanking sequences are used by an enzyme (Taq polymerase) to copy the sequence in between the primers. Cycles of heat to break apart the DNA strands, cooling to allow the primers to bind, and heating again to allow the enzyme to copy the intervening sequence lead to a doubling of DNA at each cycle. The reactions are typically carried out on a regulated heating block and consist of 30-35 cycles of repeated amplification of all the DNA present. Single molecules of &#8220;target&#8221; DNA can be amplified to microgram amounts of DNA. The target DNA can be of any origin.</p>
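<p align="justify">The doubling at each cycle means the copy number grows as a power of two. The sketch below shows that arithmetic under the idealized assumption of perfect doubling in every cycle; the function name is hypothetical, and real reactions fall short of this as reagents are consumed.</p>
<pre>
```python
def ideal_pcr_copies(starting_copies, cycles):
    """Copies after `cycles` rounds, assuming perfect doubling each cycle."""
    return starting_copies * 2 ** cycles
```
</pre>
<p align="justify">Under this assumption a single target molecule yields 2 to the power 30, or 1,073,741,824, copies after 30 cycles.</p>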
<p align="justify"><strong>Polymorphism </strong></p>
<p align="justify">(lit. many forms) The existence of a gene in a population in at least two different forms at a frequency far higher than that attributable to recurrent mutation alone. Variations in a population may be measured by determining the rate of mutation in polymorphic genes (see SNPs).</p>
<p align="justify"><strong>Polypeptide </strong></p>
<p align="justify">A single chain of covalently attached amino acids joined by peptide bonds. Polypeptide chains usually fold into a compact, stable form (a domain) that is part (or all) of the final protein.</p>
<p align="justify"><strong>Positional cloning </strong></p>
<p align="justify">Method used to define the location of a gene on a chromosome and use this information to identify and clone the gene. The location of the gene is determined by linkage analysis of DNA from a large family containing afflicted and normal members to identify linkages between the transmission of the disease gene and observable genetic markers. This information is then used to screen (by chromosomal jumping and walking) the location for putative genes. The disease gene must be compared between the afflicted and normal family members and be shown to be different in the two groups. The full sequencing of the gene will then provide information regarding the characteristics and function of the gene product, and a potential explanation for the cause of the disease.</p>
<p align="justify"><strong>Post-transcriptional modification </strong></p>
<p align="justify">Alterations made to pre-mRNA before it leaves the nucleus and becomes mature mRNA.</p>
<p align="justify"><strong>Post-translational modification </strong></p>
<p align="justify">Alterations made to a protein after its synthesis at the ribosome. These modifications, such as the addition of carbohydrate or fatty acid chains, may be critical to the function of the protein.</p>
<p align="justify"><strong>Primary sequence (protein) </strong></p>
<p align="justify">The linear sequence of a polypeptide or protein.</p>
<p align="justify"><strong>Primary structure (protein) </strong></p>
<p align="justify">see primary sequence.</p>
<p align="justify"><strong>Primer </strong></p>
<p align="justify">A short oligonucleotide that provides a free 3&#8242; hydroxyl for DNA or RNA synthesis by the appropriate polymerase (DNA polymerase or RNA polymerase).</p>
<p align="justify"><strong>Probe </strong></p>
<p align="justify">Any biochemical that is labelled or tagged in some way so that it can be used to identify or isolate a gene, RNA, or protein.</p>
<p align="justify"><strong>Profile </strong></p>
<p align="justify">Sequence profiles are usually derived from multiple alignments of sequences with a known relationship, and consist of tables of position-specific scores and gap-penalties. Each position in the profile contains scores for all of the possible amino acids, as well as one penalty score for opening and one for continuing a gap at the specified position. Attempts have been made to further improve the sensitivity of the profile by refining the procedures to construct a profile starting from a given multiple alignment. Other representations for sequence domains or motifs do not necessarily require the presence of a correct and complete multiple alignment, such as hidden Markov models.</p>
<p align="justify"><strong>Prokaryote </strong></p>
<p align="justify">An organism or cell that lacks a membrane-bounded nucleus. Bacteria (including the cyanobacteria, formerly called blue-green algae) and archaea are the surviving prokaryotes (cf. Eukaryote).</p>
<p align="justify"><strong>Promoter (site) </strong></p>
<p align="justify">A promoter site is defined by its recognition by eukaryotic RNA polymerase II; its activity in a higher eukaryote; by experimental evidence, or homology and sufficient similarity to an experimentally defined promoter; and by observed biological function.</p>
<p align="justify"><strong>Protein families </strong></p>
<p align="justify">Sets of proteins that share a common evolutionary origin, reflected by their relatedness in function and usually by similarities in sequence or in primary, secondary or tertiary structure. The term is also used for subsets of proteins with related structure and function.</p>
<p align="justify"><strong>Proteome </strong></p>
<p align="justify">The entire protein complement of a given organism.</p>
<p align="justify"><strong>Proteomics </strong></p>
<p align="justify">The study of the proteome. Typically, the cataloging of all the expressed proteins in a particular cell or tissue type, obtained by identifying the proteins from cell extracts using a combination of 2D gel electrophoresis and mass spectrometry. The large scale analysis of the protein composition and function. (cf genomics)</p>
<p align="justify"><strong>Purine </strong></p>
<p align="justify">A nitrogen-containing compound with a double-ring structure. The parent compound of Adenine and Guanine.</p>
<p align="justify"><strong>Pyrimidine </strong></p>
<p align="justify">A nitrogen-containing compound with a single six-membered ring structure. The parent compound of Thymine, Cytosine and Uracil.</p>
<p align="center"><strong>Q </strong></p>
<p align="justify"><strong>Query (sequence) </strong></p>
<p align="justify">A DNA, RNA or protein sequence used to search a sequence database in order to identify close or remote family members (homologs) of known function, or sequences with similar active sites or regions (analogs), from which the function of the query may be deduced.</p>
<p align="center"><strong>R </strong></p>
<p align="justify"><strong>Rational drug design (Structure based drug design) </strong></p>
<p align="justify">The development of drugs based on the 3-dimensional molecular structure of a particular target.</p>
<p align="justify"><strong>Reading frame </strong></p>
<p align="justify">A sequence of codons beginning with an initiation codon and ending with a termination codon, typically of at least 150 bases (50 amino acids), coding for a polypeptide or protein chain (see ORF and URF).</p>
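<p align="justify">As a rough illustration of this definition, the sketch below scans the three forward frames of a DNA string for runs that start at ATG and end at a stop codon. The function name <code>find_orfs</code> and the default <code>min_length</code> of 150 bases are assumptions made for the example; the reverse strand is not examined.</p>

```python
# Hypothetical helper: scan the three forward frames of a DNA string for ORFs.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_length=150):
    """Return (start, end, frame) for each ATG...stop run of at least min_length bases."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG seen since the last stop codon
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3  # end is exclusive and includes the stop codon
                if end - start >= min_length:
                    orfs.append((start, end, frame))
                start = None
    return orfs

example = "CC" + "ATG" + "GCT" * 50 + "TAA" + "GG"
print(find_orfs(example))
```

<p align="justify">Positions are 0-based, and the reported length counts the start and stop codons. A fuller tool would also scan the reverse complement.</p>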
<p align="justify"><strong>Reagents </strong></p>
<p align="justify">Sources of biological or chemical material that can be used as the starting blocks in laboratory experiments. Reagents can range from chemicals needed to perform a particular chemical reaction, constituents of a laboratory protocol, or clones to be used in a large-scale gene expression study.</p>
<p align="justify"><strong>Recessive </strong></p>
<p align="justify">Any trait that is expressed phenotypically only when the responsible allele is present on both copies of a gene, i.e. in the homozygous state (cf. dominant).</p>
<p align="justify"><strong>Recombinant DNA (rDNA) </strong></p>
<p align="justify">DNA molecules resulting from the fusion of DNA from different sources. The technology employed for splicing DNA from different sources and for amplifying the resultant heterogenous DNA.</p>
<p align="justify"><strong>Recombination </strong></p>
<p align="justify">A new combination of alleles resulting from the rearrangement occurring by crossing-over or by independent assortment (see crossing over).</p>
<p align="justify"><strong>Recursion </strong></p>
<p align="justify">An algorithmic procedure whereby an algorithm calls itself on a smaller instance of the problem until a terminating condition (the base case) is met, at which point the calls return and the algorithm exits. Recursion is a powerful procedure with which to process data, although it is not always the most computationally efficient approach.</p>
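<p align="justify">A minimal sketch of recursion, assuming a task chosen only for illustration: listing every DNA string of length k. The function name <code>all_kmers</code> is hypothetical. The base case (k equal to 0) is what stops the chain of self-calls.</p>

```python
def all_kmers(k, alphabet="ACGT"):
    """Recursively list every string of length k over the alphabet."""
    if k == 0:
        # Base case: the only string of length 0 is the empty string.
        return [""]
    # Recursive case: solve the smaller problem, then extend each result by one base.
    shorter = all_kmers(k - 1, alphabet)
    return [prefix + base for prefix in shorter for base in alphabet]

print(all_kmers(2)[:5])
print(len(all_kmers(3)))
```

<p align="justify">The output grows as 4 to the power k, which is one reason a recursive formulation is not automatically an efficient one.</p>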
<p align="justify"><strong>Regulatory gene </strong></p>
<p align="justify">A DNA sequence that functions to control the expression of other genes by producing a protein that modulates the synthesis of their products (typically by binding to the gene promoter). (cf. Structural gene).</p>
<p align="justify"><strong>Relational Database </strong></p>
<p align="justify">A database that follows E. F. Codd&#8217;s 12 rules, a series of mathematical and logical steps for the organization and systemization of data into a software system that allows easy retrieval, updating, and expansion. An RDBMS stores data in a database consisting of one or more tables of rows and columns. The rows correspond to a record (tuple); the columns correspond to attributes (fields) in the record. In an RDBMS, a view, defined as a subset of the database that is the result of the evaluation of a query, is a table. RDBMSs use Structured Query Language (SQL) for data definition, data management, and data access and retrieval. Relational and object-relational databases are used extensively in bioinformatics to store sequence and other biological data.</p>
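<p align="justify">The ideas of tables, rows, columns and views can be sketched with Python's built-in <code>sqlite3</code> module. The table name, column names and the three records below are invented for the example and do not come from any real sequence database.</p>

```python
import sqlite3

# Hypothetical schema: table, column and accession names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequences (accession TEXT PRIMARY KEY, organism TEXT, residues TEXT)"
)
# Each tuple is one row (record); each position is one column (attribute).
conn.executemany(
    "INSERT INTO sequences VALUES (?, ?, ?)",
    [
        ("SEQ001", "Homo sapiens", "ATGGCC"),
        ("SEQ002", "Mus musculus", "ATGGCA"),
        ("SEQ003", "Homo sapiens", "ATGTTT"),
    ],
)
# A view is a stored query whose result can be read like a table.
conn.execute(
    "CREATE VIEW human_sequences AS "
    "SELECT accession, residues FROM sequences WHERE organism = 'Homo sapiens'"
)
human_rows = conn.execute(
    "SELECT accession, residues FROM human_sequences ORDER BY accession"
).fetchall()
print(human_rows)
```

<p align="justify">The same SQL statements would work, with minor dialect differences, against a server-based RDBMS.</p>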
<p align="justify"><strong>Relational Database Management Systems (RDBMS) </strong></p>
<p align="justify">A software system that includes a database architecture, query language, and data loading and updating tools and other ancillary software that together allow the creation of a relational database application.</p>
<p align="justify"><strong>Repeats (repeat sequences) </strong></p>
<p align="justify">Repeat sequences and approximate repeats occur throughout the DNA of higher organisms (mammals). For example, the <em>Alu </em> sequences of length about 300 characters, appear hundreds of thousands of times in Human DNA with about 87% homology to a consensus <em>Alu </em> string. Some short substrings such as TATA-boxes, poly-A and (TG)* also appear more often than by chance. Repeat sequences may also occur within genes, as mutations or alterations to those genes. Repetitive sequences, especially mobile elements, have many applications in genetic research. DNA transposons and retroposons are routinely used for insertional mutagenesis, gene mapping, gene tagging, and gene transfer in several model systems.</p>
<p align="justify"><strong>Repetitive elements </strong></p>
<p align="justify">Repetitive elements provide important clues about chromosome dynamics, evolutionary forces, and mechanisms for exchange of genetic information between organisms. The most ubiquitous class of repetitive elements in the DNA sequence in primate genomes is the <em>Alu </em> family of interspersed repeats, which have arisen in the last 65 million years of evolution. <em>Alu </em> repeats belong to a class of sequences defined as short interspersed elements (SINEs). Approximately 500,000 <em>Alu </em> SINEs exist within the human genome, representing about 5% of the genome by mass.</p>
<p align="justify"><strong>Replication </strong></p>
<p align="justify">The synthesis of an informationally identical macromolecule (e.g. DNA) from a template molecule.</p>
<p align="justify"><strong>Repressor </strong></p>
<p align="justify">The protein product of a regulatory gene that combines with a specific operator (regulatory DNA sequence) and hence blocks the transcription of genes in an operon.</p>
<p align="justify"><strong>Restriction enzyme (restriction endonuclease) </strong></p>
<p align="justify">A type of enzyme that recognizes specific DNA sequences (usually palindromic sequences 4, 6, 8 or 16 base pairs in length) and produces cuts on both strands of DNA containing those sequences only. The &#8220;molecular scissors&#8221; of rDNA technology.</p>
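<p align="justify">A small sketch of the two properties mentioned here, assuming the EcoRI recognition site GAATTC, which is cut after the first G. The function names are hypothetical, and only the top strand is modelled, so the staggered cut on the complementary strand is not shown.</p>

```python
def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(dna))

def digest(dna, site="GAATTC", cut_offset=1):
    """Cut dna at every occurrence of site; cut_offset is the cut position within the site."""
    fragments = []
    previous = 0
    position = dna.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(dna[previous:cut])
        previous = cut
        position = dna.find(site, position + 1)
    fragments.append(dna[previous:])  # whatever remains after the last cut
    return fragments

# A palindromic site reads the same on both strands in the 5' to 3' direction.
print(reverse_complement("GAATTC") == "GAATTC")
print(digest("AAGAATTCGGGAATTCTT"))
```

<p align="justify">Sorting the fragment lengths produced by several enzymes is the starting point for building a restriction map.</p>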
<p align="justify"><strong>Restriction fragment length polymorphisms (RFLPs) </strong></p>
<p align="justify">Variation within the DNA sequences of organisms of a given species that can be identified by fragmenting the sequences using restriction enzymes, since the variation lies within the restriction site. RFLPs can be used to measure the diversity of a gene in a population.</p>
<p align="justify"><strong>Restriction map </strong></p>
<p align="justify">A physical map or depiction of a gene (or genome) derived by ordering overlapping restriction fragments produced by digestion of the DNA with a number of restriction enzymes.</p>
<p align="justify"><strong>Reverse Genetics </strong></p>
<p align="justify">The use of protein information to elucidate the genetic sequence encoding that protein. Used to describe the process of gene isolation starting with a panel of afflicted patients (see positional cloning).</p>
<p align="justify"><strong>Reverse transcriptase </strong></p>
<p align="justify">A DNA polymerase that can synthesise a complementary DNA (cDNA) strand using RNA as a template &#8211; a so-called RNA-dependent DNA polymerase.</p>
<p align="justify"><strong>Reverse transcriptase-PCR (RT-PCR) </strong></p>
<p align="justify">Procedure in which PCR amplification is carried out on DNA that is first generated by the conversion of mRNA to cDNA using reverse transcriptase.</p>
<p align="justify"><strong>Ribonucleic acid (RNA) </strong></p>
<p align="justify">A category of nucleic acids in which the component sugar is ribose and consisting of four nucleotides containing the bases Cytosine, Uracil, Guanine, and Adenine. The three main types of RNA are messenger RNA (mRNA), transfer RNA (tRNA) and ribosomal RNA (rRNA).</p>
<p align="center"><strong>S </strong></p>
<p align="justify"><strong>Secondary structure (protein) </strong></p>
<p align="justify">The organization of the peptide backbone of a protein that occurs as a result of hydrogen bonds, e.g. alpha helix, beta-pleated sheet.</p>
<p align="justify"><strong>Selectivity </strong></p>
<p align="justify">Selectivity of bioinformatics similarity search algorithms is defined as the significance threshold for reporting database sequence matches. As an example, for BLAST searches, the parameter E is interpreted as the upper bound on the expected frequency of chance occurrence of a match within the context of the entire database search.  E may be thought of as the number of matches one expects to observe by chance alone during the database search.</p>
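<p align="justify">The expected number of chance matches is commonly written in Karlin-Altschul form as E = K m n exp(-lambda S), where S is the raw alignment score, m the query length and n the database length. The glossary does not give this formula, so the sketch below is an illustration only; the default <code>lam</code> and <code>k</code> values are assumed placeholders of the kind reported for a protein scoring system, and real programs estimate them for the chosen matrix and gap costs.</p>

```python
import math

def expected_chance_hits(raw_score, query_length, database_length, lam=0.267, k=0.041):
    """Karlin-Altschul style estimate of matches expected by chance alone.

    lam and k are assumed, illustrative parameters, not values taken from this glossary.
    """
    return k * query_length * database_length * math.exp(-lam * raw_score)

def is_reported(raw_score, query_length, database_length, e_threshold=10.0):
    """Apply a significance threshold: report only matches with E at or below it."""
    return expected_chance_hits(raw_score, query_length, database_length) <= e_threshold

print(is_reported(40, 250, 1e9), is_reported(120, 250, 1e9))
```

<p align="justify">Lowering <code>e_threshold</code> makes the search more selective: fewer chance matches are reported, at the risk of discarding weak but genuine ones.</p>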
<p align="justify"><strong>Sense strand </strong></p>
<p align="justify">The strand of double-stranded DNA whose sequence corresponds to that of the RNA transcript (with T in place of U); its complement, the antisense strand, acts as the template strand for RNA synthesis. Typically only one gene product is produced per gene, corresponding to the sense strand only. (Some viruses have open reading frames in both the sense and the antisense strands).</p>
<p align="justify"><strong>Sensitivity </strong></p>
<p align="justify">Sensitivity of bioinformatics similarity search algorithms centers on two questions: first, how well the method can detect biologically meaningful relationships between two related sequences in the presence of mutations and sequencing errors; second, how the heuristic nature of the algorithm affects the probability that a matching sequence will not be detected. At the user&#8217;s discretion, the speed of most similarity search programs can be sacrificed in exchange for greater sensitivity &#8211; with an emphasis on detecting lower scoring matches.</p>
<p align="justify"><strong>Sequence Tagged Site (STS) </strong></p>
<p align="justify">A unique sequence from a known chromosomal location that can be amplified by PCR. STSs act as physical markers for genomic mapping and cloning.</p>
<p align="justify"><strong>Sexual PCR (Molecular Diversity) </strong></p>
<p align="justify">Sexual PCR is a form of PCR in which similar, but not identical, DNA sequences are reassembled to obtain novel juxtapositions, simulating the result of genetic recombination. The result is the creation of an array of related genes which may possess improved characteristics. By repeated rounds of recombination, selection and PCR-based amplification vastly improved gene-products, such as enzymes with greater activity, may be generated and selected.</p>
<p align="justify"><strong>Shotgun cloning </strong></p>
<p align="justify">The cloning of an entire gene segment or genome by generating a random set of fragments using restriction endonucleases to create a gene library that can be subsequently mapped and sequenced to reconstruct the entire genome.</p>
<p align="justify"><strong>Similarity (homology) search </strong></p>
<p align="justify">Given a newly sequenced gene, there are two main approaches to the prediction of structure and function from the amino acid sequence. Homology methods are the most powerful and are based on the detection of significant extended sequence similarity to a protein of known structure, or of a sequence pattern characteristic of a protein family. Statistical methods are less successful but more general and are based on the derivation of structural preference values for single residues, pairs of residues, short oligopeptides or short sequence patterns. The transfer of structure/function information to a potentially homologous protein is straightforward when the sequence similarity is high and extended in length, but the assessment of the structural significance of sequence similarity can be difficult when sequence similarity is weak or restricted to a short region.</p>
<p align="justify"><strong>Signal sequence (leader sequence) </strong></p>
<p align="justify">A short sequence added to the amino-terminal end of a polypeptide chain that forms an amphipathic helix allowing the nascent polypeptide to migrate through membranes such as the endoplasmic reticulum or the cell membrane. It is cleaved from the polypeptide after the protein has crossed the membrane.</p>
<p align="justify"><strong>Single nucleotide polymorphisms (SNPs) </strong></p>
<p align="justify">Variations of single base pairs scattered throughout the human genome that serve as measures of the genetic diversity in humans. About 1 million SNPs are estimated to be present in the human genome, and SNPs are useful markers for gene mapping studies.</p>
<p align="justify"><strong>Single-pass sequencing </strong></p>
<p align="justify">Rapid sequencing of large segments of the genome of an organism by isolating as many expressed (cDNA) sequences as possible and performing single sequencer runs on their 5&#8242; or 3&#8242; ends. Single-pass sequencing typically results in individual, error-prone sequencing reads of 400-700 bases, depending on the type of sequencer used. However, if many of these are generated from numerous clones from different tissues, they may be overlapped and assembled to remove the errors and generate a contiguous sequence for the entire expressed gene.</p>
<p align="justify"><strong>Site </strong></p>
<p align="justify">Sites in sequences can be located either in DNA (e.g. binding sites, cleavage sites) or in proteins. In order to identify a site in DNA, ambiguity symbols are used to allow several different symbols at one position. Proteins, however, need a different mechanism (see Pattern). Restriction enzyme cleavage sites, for instance, have the following properties:  limited length (typically, less than 20 base pairs); definition of the cleavage site and its appearance (3&#8242;, 5&#8242; overhang or blunt); definition of the binding site.</p>
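<p align="justify">The use of ambiguity symbols can be sketched by translating each IUPAC nucleotide code into a regular-expression character class. The function name <code>site_positions</code> is hypothetical, and the example site GTYRAC (Y is C or T, R is A or G) is used only as an illustration of an ambiguous recognition sequence.</p>

```python
import re

# IUPAC nucleotide ambiguity codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def site_positions(dna, site):
    """Return 0-based start positions where the ambiguous site matches dna."""
    # The lookahead (?=...) lets overlapping occurrences be reported as well.
    pattern = re.compile("(?=" + "".join(IUPAC[symbol] for symbol in site) + ")")
    return [match.start() for match in pattern.finditer(dna)]

print(site_positions("AAGTTAACGGGTCGACTT", "GTYRAC"))
```

<p align="justify">Here both GTTAAC and GTCGAC satisfy the same ambiguous pattern, which is the point of allowing several symbols at one position.</p>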
<p align="justify"><strong>Southern blotting </strong></p>
<p align="justify">A procedure for the identification of DNA by transferring a fragment isolated on an agarose gel to a nitrocellulose filter where it can be hybridized with a complementary &#8220;probe&#8221; sequence.</p>
<p align="justify"><strong>Splice site </strong></p>
<p align="justify">The sequence found at the 5&#8242; and 3&#8242; region of exon/intron boundaries, usually defined by a consensus sequence:</p>
<p align="justify">5&#8242; (C/A)AG|GT(A/G)AGT [<em>intron</em>] (T/C)N(C/T)AG|G 3&#8242;</p>
<p align="justify">N represents any nucleotide; nucleotides in parentheses are alternatives at the indicated position, and | marks an exon/intron boundary.</p>
<p align="justify"><strong>Splice form </strong></p>
<p align="justify">By using alternative splicing, a single message precursor from DNA can generate an entire family of mRNAs and proteins. This can be utilized to create specificity in cell-cell or cell-ligand interactions. A cell may produce a given protein, but it will be a different splice-form of the protein than that produced by an adjacent cell. In this manner, the two cells have the potential to interact differently with other cells or molecules. Two places where this has been extremely important are the production of cell-surface specificity proteins in the immune and nervous systems.</p>
<p align="justify"><strong>Splicing </strong></p>
<p align="justify">The joining together of separate DNA or RNA component parts. For example, RNA splicing in eukaryotes involves the removal of introns and the stitching together of the exons from the pre-mRNA transcript before maturation.</p>
<p align="justify"><strong>Solvent accessibility </strong></p>
<p align="justify">The surface area (typically measured in square angstroms) of a biological molecule, usually a protein, that is exposed to solvent in its native, folded form. Determining the solvent accessibility of a protein helps define which amino acids in its molecular sequence are on the exterior of the molecule, and thus available to participate in interactions with other molecules.</p>
<p align="justify"><strong>Structural gene </strong></p>
<p align="justify">A gene that encodes a product other than a regulatory factor, such as a structural protein or an enzyme (cf. Regulatory gene).</p>
<p align="justify"><strong>Structure prediction </strong></p>
<p align="justify">Algorithms that predict the secondary, tertiary and sometimes even quaternary structure of proteins from their sequences.  Determining protein structure from sequence has been dubbed &#8220;the second half of the Genetic Code&#8221; since it is the folded tertiary structure of a protein that governs how it functions as a gene product.  As yet most structure prediction methods are only partially successful, and typically work best for certain well-defined classes of proteins.</p>
<p align="justify"><strong>Substitution matrix </strong></p>
<p align="justify">A table of scores for replacing one residue with another, derived from a model of protein evolution at the sequence level. The widely used families are the Dayhoff, MDM (Mutation Data Matrix) or PAM (Point Accepted Mutation) matrices and the BLOSUM matrices. PAM matrices are derived from global alignments of closely related sequences, and matrices for greater evolutionary distances are extrapolated from those for lesser ones; BLOSUM matrices are instead derived from ungapped local alignment blocks.</p>
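<p align="justify">How such a matrix is applied can be sketched by summing pair scores over an ungapped alignment. The scores in <code>TOY_MATRIX</code> are made up for the example and are not values from PAM or BLOSUM; the function names are likewise hypothetical.</p>

```python
# Toy scores for illustration only; these are NOT values from PAM or BLOSUM.
TOY_MATRIX = {
    ("L", "L"): 4, ("I", "I"): 4, ("D", "D"): 6,
    ("L", "I"): 2, ("L", "D"): -4, ("I", "D"): -3,
}

def pair_score(a, b, matrix=TOY_MATRIX):
    """Look up a symmetric score regardless of residue order."""
    return matrix[(a, b)] if (a, b) in matrix else matrix[(b, a)]

def alignment_score(seq_a, seq_b, matrix=TOY_MATRIX):
    """Sum pair scores over two equal-length, ungapped aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(pair_score(a, b, matrix) for a, b in zip(seq_a, seq_b))

print(alignment_score("LID", "ILD"))  # conservative swaps still score positively
print(alignment_score("LID", "DID"))  # a dissimilar substitution lowers the score
```

<p align="justify">Conservative substitutions (here L with I) are rewarded and dissimilar ones (L with D) are penalised, which is what lets alignment programs recognise distant relatives.</p>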
<p align="justify"><strong>Subtraction library </strong></p>
<p align="justify">A cDNA library that only contains cDNAs uniquely expressed in a given cell or tissue. For example, T cells and B cells will express many common RNAs, as well as a very small percentage which will be unique for T cells and B cells respectively. To make a T cell subtraction library, the cDNA from a T cell library is hybridized with a vast excess of B cell RNA. The commonly expressed genes will result in RNA-cDNA hybrids which can be removed (or subtracted) to leave only T cell specific cDNAs.</p>
<p align="center"><strong>T </strong></p>
<p align="justify"><strong>Tentative Consensus (TC) </strong></p>
<p align="justify">The identification of a sequence from an EST cluster that represents part or all of a complete gene.  TCs are usually determined by clustering ESTs allowing for sequencing errors, artefacts such as chimeric clones, and naturally occurring biological phenomena such as alternative splicing.  Creation of a cluster allows one to generate a consensus sequence and then identify a long open reading frame which would suggest the possibility of that consensus representing a <em>bona fide </em> gene.</p>
<p align="justify"><strong>Tentative Human Consensus sequences (THCs) </strong></p>
<p align="justify">A consensus sequence generated from human EST fragments. THCs may be validated by comparison against databases of known human gene sequences, human genomic sequences, or by identification of the ORFs or other sequence features contained within the consensus as belonging to a known human gene product.</p>
<p align="justify"><strong>Tertiary structure </strong></p>
<p align="justify">Folding of a protein chain via interactions of its side chains, including formation of disulphide bonds between cysteine residues.</p>
<p align="justify"><strong>Thymine </strong></p>
<p align="justify">A pyrimidine base found in DNA but not in RNA.</p>
<p align="justify"><strong>Tissue </strong></p>
<p align="justify">Section of an organ that consists of a largely homogenous population of cell types. Since many organs are multifunctional, they have developed highly specialized cell types to perform different functions. Identifying the section of an organ that is homogenous for a particular cell type ensures that the gene expression profiles extracted from those cells will accurately resemble the class of cells that make up the tissue.</p>
<p align="justify"><strong>Transcript </strong></p>
<p align="justify">The single-stranded mRNA chain that is assembled from a gene template.</p>
<p align="justify"><strong>Transcription </strong></p>
<p align="justify">The assembly of complementary single-stranded RNA on a DNA template.</p>
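<p align="justify">A minimal sketch of this base-pairing rule, assuming the template strand is supplied in the 5&#8242; to 3&#8242; direction. The function name <code>transcribe</code> is hypothetical, and promoters, terminators and RNA processing are ignored.</p>

```python
def transcribe(template_strand):
    """Return the RNA (5' to 3') assembled on a DNA template strand given 5' to 3'."""
    pairing = {"A": "U", "T": "A", "G": "C", "C": "G"}
    # The RNA is antiparallel to its template, so read the template in reverse.
    return "".join(pairing[base] for base in reversed(template_strand.upper()))

print(transcribe("GGCCAT"))
```

<p align="justify">The result, AUGGCC, matches the complementary DNA strand ATGGCC with U in place of T.</p>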
<p align="justify"><strong>Transcription factors </strong></p>
<p align="justify">A group of regulatory proteins that are required for transcription in eukaryotes. Transcription factors bind to the promoter region of a gene and facilitate transcription by RNA polymerase.</p>
<p align="justify"><strong>Transfer RNA (tRNA) </strong></p>
<p align="justify">A small RNA molecule that recognizes a specific amino acid, transports it to a specific codon in the mRNA, and positions it properly in the nascent polypeptide chain.</p>
<p align="justify"><strong>Transformation </strong></p>
<p align="justify">A genetic alteration to a cell as a result of the incorporation of DNA from a genetically different cell or virus; can also refer to the introduction of DNA into bacterial cells for genetic manipulation.</p>
<p align="justify"><strong>Transgene </strong></p>
<p align="justify">A foreign gene that is introduced into a cell or whole organism (e.g. transgenic mice) for therapeutic or experimental purposes.</p>
<p align="justify"><strong>Translation </strong></p>
<p align="justify">The process of converting RNA to protein by the assembly of a polypeptide chain from an mRNA molecule at the ribosome.</p>
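<p align="justify">The decoding step can be sketched with the standard genetic code, built here from the conventional UCAG ordering of codons. The function name <code>translate</code> is hypothetical; the sketch starts at the first AUG, stops at the first stop codon, and ignores alternative genetic codes.</p>

```python
BASES = "UCAG"
# Standard genetic code in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(mrna):
    """Translate from the first AUG to the first stop codon; return one-letter residues."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)

print(translate("GGAUGGCCAAAUGGUAAGG"))
```

<p align="justify">The example reads the codons AUG, GCC, AAA and UGG before reaching the stop codon UAA, giving the peptide MAKW.</p>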
<p align="justify"><strong>Transmembrane region </strong></p>
<p align="justify">The region of a transmembrane protein that actually spans the membrane.  Transmembrane regions are usually hydrophobic in order to be thermodynamically compatible with the lipid bilayer portion of the membrane.  They may consist of either alpha-helical or beta-strand secondary structure elements, but in either case the external residues (the ones facing the membrane) are invariably hydrophobic while the internal residues may be hydrophilic (as in the case of a pore or channel) or polar.  One common transmembrane structural domain is the seven-helix bundle seen in numerous channel proteins.</p>
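<p align="justify">Because membrane-spanning segments are hydrophobic, a common first-pass screen is a sliding-window hydropathy average. The sketch below uses the Kyte-Doolittle scale; the window of 19 residues and the threshold of 1.6 are assumed settings often quoted for helix detection, and the function names are hypothetical. It does not address beta-strand membrane proteins.</p>

```python
# Kyte-Doolittle hydropathy values (positive means hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(protein, window=19):
    """Mean hydropathy of each full window along the sequence."""
    scores = [KYTE_DOOLITTLE[residue] for residue in protein]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]

def candidate_tm_windows(protein, window=19, threshold=1.6):
    """Start positions (0-based) of windows hydrophobic enough to suggest a TM helix."""
    return [i for i, mean in enumerate(hydropathy_profile(protein, window))
            if mean >= threshold]

toy_protein = "DDDDD" + "L" * 19 + "KKKKK"  # made-up sequence with one hydrophobic stretch
print(candidate_tm_windows(toy_protein))
```

<p align="justify">Windows that overlap the hydrophobic stretch pass the threshold, while a fully charged sequence yields no candidates.</p>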
<p align="center"><strong>U </strong></p>
<p align="justify"><strong>Unidentified reading frame (URF) </strong></p>
<p align="justify">An open reading frame encoding a protein of undefined function.</p>
<p align="justify"><strong>Uracil </strong></p>
<p align="justify">Nitrogenous pyrimidine base found in RNA but not DNA.</p>
<p align="center"><strong>V </strong></p>
<p align="justify"><strong>Variable numbers of tandem repeats (VNTRs) </strong></p>
<p align="justify">DNA sequence blocks of 2-60 base pairs which are repeated from two to more than 20 times in different individuals. This polymorphism makes VNTRs very useful DNA markers used in genomic mapping, linkage analysis and also DNA fingerprinting.</p>
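<p align="justify">Copy-number differences of this kind can be sketched by counting back-to-back copies of a repeat unit with a regular expression. The function name <code>longest_tandem_run</code>, the CAG unit and the two allele strings are invented for the example.</p>

```python
import re

def longest_tandem_run(dna, unit):
    """Return the largest number of back-to-back copies of unit found in dna."""
    pattern = re.compile("(?:" + re.escape(unit) + ")+")
    runs = [len(match.group(0)) // len(unit) for match in pattern.finditer(dna)]
    return max(runs, default=0)

# Two made-up alleles that differ only in the number of repeat copies.
allele_a = "GG" + "CAG" * 4 + "TT"
allele_b = "GG" + "CAG" * 9 + "TT"
print(longest_tandem_run(allele_a, "CAG"), longest_tandem_run(allele_b, "CAG"))
```

<p align="justify">The two alleles give counts of 4 and 9, the sort of length difference that makes VNTRs usable as markers.</p>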
<p align="justify"><strong>Variation (genetic) </strong></p>
<p align="justify">Variation in genetic sequences and the detection of DNA sequence variants genome-wide allow studies relating the distribution of sequence variation to a population history. This in turn allows one to determine the density of SNPs or other markers needed for gene mapping studies.  Quantitation of these variations together with analytical tools for studying sequence variation also relate genetic variations to phenotype.</p>
<p align="justify"><strong>Vector </strong></p>
<p align="justify">Any agent that transfers material (typically DNA) from one host to another. Typically DNA vectors are autonomous DNA elements (such as plasmids) that can be manipulated and integrated into a host&#8217;s DNA or recombinant viruses.</p>
<p align="justify"><strong>Virtual libraries </strong></p>
<p align="justify">The creation and storage of vast collections of molecular structures in an electronic database. These databases may be queried for subsets that exhibit specific physicochemical features, or may be &#8220;virtually screened&#8221; for their ability to bind a drug target. This process may be performed prior to the synthesis and testing of the molecules themselves.</p>
<p align="justify"><strong>Visualization </strong></p>
<p align="justify">Visualization is the process of representing abstract scientific data as images that can aid in understanding the meaning of the data.</p>
<p align="center"><strong>W </strong></p>
<p align="justify"><strong>Weight matrix </strong></p>
<p align="justify">The density of binding sites in a gene or sequence can be used to derive a ratio of density for each element in a pattern of interest. The combined individual density ratios of all elements are then collectively used to build a scoring profile known as a weight matrix. This profile can be used to test the prediction of the identification of the selected pattern and the ability of the algorithm to discriminate them from non-pattern sequences.</p>
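<p align="justify">One common way to build such a scoring profile is as log-odds scores per position, computed from a set of aligned binding sites. This is a sketch under assumptions: the four example sites are made up, the background frequency of 0.25 per base and the pseudocount of 1 are arbitrary choices, and the function names are hypothetical.</p>

```python
import math

def build_weight_matrix(sites, background=0.25, pseudocount=1.0):
    """Return one dict of log2-odds scores per column of equal-length aligned DNA sites."""
    matrix = []
    for column in zip(*sites):
        total = len(column) + 4 * pseudocount
        matrix.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in "ACGT"
        })
    return matrix

def score_window(matrix, window):
    """Sum the position-specific scores for one candidate site."""
    return sum(column[base] for column, base in zip(matrix, window))

def best_match(matrix, dna):
    """Return the 0-based start of the highest-scoring window in dna."""
    width = len(matrix)
    starts = range(len(dna) - width + 1)
    return max(starts, key=lambda i: score_window(matrix, dna[i:i + width]))

toy_sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]  # hypothetical aligned sites
weights = build_weight_matrix(toy_sites)
print(best_match(weights, "GGCGTATAATGCGC"))
```

<p align="justify">Scoring known sites against unrelated sequence is how the discriminating power of the matrix can be checked.</p>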
<p align="justify"><strong>Western blot </strong></p>
<p align="justify">Technique in which specific antibodies are used to identify their antigens from a mixture of proteins. Typically, these protein mixtures are first separated by electrophoresis and then transferred onto nylon sheets by electrotransfer. Radiolabeled or enzyme-linked antibodies are incubated with the sheets and unbound antibodies washed away, allowing the position of the bound antibody to be revealed by autoradiography or by the color formed upon addition of a substrate.</p>
<p align="justify"><strong>Wild type </strong></p>
<p align="justify">Form of a gene or allele that is considered the &#8220;standard&#8221; or most common.</p>
<p align="center"><strong>X </strong></p>
<p align="justify"><strong>X chromosome </strong></p>
<p align="justify">In mammals, the sex chromosome that is found in two copies in the homogametic sex (female in humans) and one copy in the heterogametic sex (male in humans).</p>
<p align="center"><strong>Y </strong></p>
<p align="justify"><strong>Yeast 2-hybrid system </strong></p>
<p align="justify">A yeast-based method used to simultaneously identify, and clone the gene for, proteins interacting with a known protein. The basis of this method is a &#8220;transcriptional reporter assay&#8221; (see definition) in which reporter gene expression is dependent on two domains. The first domain is linked to the known protein. The second domain is genetically linked to a library. If the library is screened against the known protein the two domains will interact only if a protein from the library binds the known protein, resulting in transcription activation of the reporter gene, and a blue color. The &#8220;blue yeast clone&#8221; will contain the gene encoding the newly identified protein.</p>
<p align="center"><strong>Z </strong></p>
<p align="justify"><strong>Z-DNA </strong></p>
<p align="justify">A conformation of DNA existing as a left-handed double helix (the phosphate-sugar backbone forms a left-handed zig-zag course), which may play a role in gene regulation.</p>
<p align="justify"><strong>Zinc fingers </strong></p>
<p align="justify">A protein motif formed by the interaction of repeated cysteine and histidine residues with a zinc ion. The spacing of the repeats results in finger-like loops of the protein chain, which interact with DNA. These motifs are typically found in transcription factors.</p>
]]></content:encoded>
			<wfw:commentRss>http://bioinformatics.me/bioinformatics-glossary-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

