The CompBio Dude: January 2009

Saturday, January 31, 2009

Google Talk: Current Issues in Computational Biology and Bioinformatics

Gary Bader, an Assistant Professor at the Terrence Donnelly Centre for Cellular and Biomolecular Research (CCBR) at the University of Toronto, gave a talk on bioinformatics and computational biology that is featured in the Google Tech Talks series.

It's an introductory talk, aimed at people with a computational background who are new to bioinformatics.

Friday, January 30, 2009

Protein Wikis

Genome Biology has published an article: "Proteopedia - a scientific 'wiki' bridging the rift between three-dimensional structure and function of biomacromolecules"

New media is making inroads into scientific publishing. The traditional model of scientific publishing is peer-review then publish, but the wiki model is publish then peer-review. While it's not to the point that a wiki edit would show up on someone's publication list, wiki-style articles for high-throughput genomics/proteomics experiments make sense.
A protein structure used to be enough work and new research material to support an entire PhD dissertation, but now, with high-throughput protein structure determination pipelines like PSI, a more informal publishing structure is needed. Wikis offer a way for scientific notes to be published for each of the catalogued protein structures, without having to write a publication for every one of the 50K+ protein structures stored in the Protein Data Bank.

There are a few different protein structure wikis that are getting started:

Proteopedia : Seeks to annotate known protein structures with biologically relevant information.

Topsan : Sub-project of the Protein Structure Initiative and the Joint Center for Structural Genomics. Used to annotate proteins generated in a high-throughput protein structure determination pipeline. Many of the targets were originally selected in batches and have no known biological information.

PDBWiki : Seems more geared toward discussion of the characteristics of the models themselves (i.e. density maps and collision errors).

Amazing DNA animation

From IO9

Drew Berry of the Walter and Eliza Hall Institute of Medical Research has produced a rather amazing video detailing the life of DNA. From nucleosome wrapping, to DNA replication, to amino acid production, the animations provide a striking view of the molecular processes behind life.

Tuesday, January 13, 2009

Online searches and drug combinations

Ars Technica has a nice writeup on a PLoS Computational Biology paper, "Search Algorithms as a Framework for the Optimization of Drug Combinations".

The abstract:

Combination therapies are often needed for effective clinical outcomes in the management of complex diseases, but presently they are generally based on empirical clinical experience. Here we suggest a novel application of search algorithms—originally developed for digital communication—modified to optimize combinations of therapeutic interventions. In biological experiments measuring the restoration of the decline with age in heart function and exercise capacity in Drosophila melanogaster, we found that search algorithms correctly identified optimal combinations of four drugs using only one-third of the tests performed in a fully factorial search. In experiments identifying combinations of three doses of up to six drugs for selective killing of human cancer cells, search algorithms resulted in a highly significant enrichment of selective combinations compared with random searches. In simulations using a network model of cell death, we found that the search algorithms identified the optimal combinations of 6–9 interventions in 80–90% of tests, compared with 15–30% for an equivalent random search. These findings suggest that modified search algorithms from information theory have the potential to enhance the discovery of novel therapeutic drug combinations. This report also helps to frame a biomedical problem that will benefit from an interdisciplinary effort and suggests a general strategy for its solution.
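To make the "fewer tests than a fully factorial search" idea concrete, here is a minimal sketch. It is not the algorithm from the paper: it uses a plain greedy forward search, and the `toy_response` scoring function is entirely made up for illustration. It only shows how a guided search can reach the best combination while measuring fewer combinations than exhaustive enumeration.

```python
from itertools import product

N_DRUGS = 4


def toy_response(combo):
    """Hypothetical response for a combination (tuple of 0/1 flags per drug).

    The coefficients are invented for illustration, including one synergistic
    pair (a, c) and one antagonistic pair (b, c).
    """
    a, b, c, d = combo
    return 2 * a + b + 3 * c - d + 2 * a * c - 3 * b * c


def exhaustive_search(score):
    """Fully factorial search: measure every on/off combination."""
    combos = list(product((0, 1), repeat=N_DRUGS))
    return max(combos, key=score), len(combos)


def greedy_search(score):
    """Add one drug at a time, keeping the single best addition each round."""
    tested = {}

    def measure(combo):
        # Cache results so each combination counts as one experiment.
        if combo not in tested:
            tested[combo] = score(combo)
        return tested[combo]

    current = (0,) * N_DRUGS
    current_score = measure(current)
    while True:
        candidates = [current[:i] + (1,) + current[i + 1:]
                      for i in range(N_DRUGS) if current[i] == 0]
        if not candidates:
            break
        best = max(candidates, key=measure)
        if measure(best) <= current_score:
            break  # no single addition improves the response
        current, current_score = best, measure(best)
    return current, len(tested)


if __name__ == "__main__":
    print("exhaustive:", exhaustive_search(toy_response))
    print("greedy:    ", greedy_search(toy_response))
```

A greedy search like this can get stuck on a local optimum when drugs only help in combination, which is presumably part of why the authors reach for more careful search algorithms.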

Monday, January 12, 2009

Personal Genomes

The price of SNP analysis based on DNA microarrays has fallen to the point of making a $400 test commercially viable. The New York Times ran an article about a reporter getting his personal genome analyzed. One of the themes that some have picked up on is that this type of analysis is still a very nascent technology. Correlations between specific SNP markers and particular diseases have been suggested in the literature, but we are still a long way from true understanding. The recent call to arms for improvements in Systems Biology research illustrates how much is left to be done.
The Personal Genome Project seeks to fully sequence the exomic content of 100,000 personal genomes and provide the data openly with correlated medical histories. Preliminary information on the first ten subjects has been released. And while the actual sequencing data is not downloadable from the web site, they do encourage you to contact them if you are interested in research collaborations.

Nova recently aired a show about the project:

Top 25 most Dangerous Programming Errors

The SANS Institute has published a list of the top 25 most dangerous programming errors. What does this have to do with computational biology? From my observation, it seems that software in compbio labs goes by the following timeline:

1) Research and develop a new technique
2) Set up a web server before the paper goes to print
3) Profit!!! (writing more grants)

The software is written during research and development of a new analytical technique. This means very little software design goes into its development. Once there is a working technique, a paper is written and a web server to provide the tool is set up. The web service is mostly just advertising for the paper and helps argue the point that the lab is worthy of more grant money. Very little time is actually spent on proper software engineering, and even less time on security analysis. This could turn a lot of compbio labs into rather soft targets for hackers.
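As a rough illustration of the kind of thing the list covers, improper input validation and OS command injection are two entries that a quick lab web server can easily trip over. The sketch below assumes a hypothetical web form that accepts a protein sequence and hands it to a command-line tool; the function names are mine, and `cat` is only a stand-in for whatever analysis binary a lab would actually run.

```python
import re
import subprocess

# Only the 20 standard amino acid letters are accepted (an assumption made
# for this example; a real tool may need ambiguity codes as well).
VALID_PROTEIN = re.compile(r"[ACDEFGHIKLMNPQRSTVWY]+")


def clean_sequence(raw, max_len=10000):
    """Validate user-submitted text before it gets anywhere near a tool."""
    seq = "".join(raw.split()).upper()
    if not seq or len(seq) > max_len or not VALID_PROTEIN.fullmatch(seq):
        raise ValueError("input is not a plain protein sequence")
    return seq


def run_tool(seq, tool="cat"):
    """Run the (stand-in) tool with the sequence on stdin.

    The command is passed as a list and no shell is involved, so characters
    like ';' or '|' in user input are never interpreted as shell syntax.
    """
    result = subprocess.run([tool], input=seq, capture_output=True,
                            text=True, timeout=60, check=True)
    return result.stdout


if __name__ == "__main__":
    print(run_tool(clean_sequence("mkt llv\nagh")))
```

The point is less the specific regular expression than the habit: reject anything unexpected up front, and never build a shell command line by pasting user input into a string.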

Via InformationWeek

Friday, January 09, 2009

Systems Biology is important, starting.... Now

Systems Biology is apparently important, at least that is what 110 scientists from Europe are saying. Science Daily and Genome Web Daily News are reporting that scientists from the European Science Foundation have published a report entitled "Advancing Systems Biology for Medical Applications" (SSA LSSG-CT-2006-037673). This paper stresses the importance of developing systems biology techniques for improving medicine.

For the layman, Systems Biology refers to the study of the system of biochemical interactions, both the core components and the complex network of reactions that occur between them. Its genesis occurred around the time that the Human Genome Project was completing and the total estimate of protein-coding genes was rapidly plummeting from initial expectations. Given the complexity of the human body, predictions reached up to 150K. But after all was said and done, estimates pegged the number at around 20K. Only 20,000 genes to make a human, and it takes 41,000 genes to make rice.

If the complexity didn't come from the total number of genes, then it came from the complex network of those genes interacting.
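A quick back-of-the-envelope calculation shows why the network view matters. The sketch below just counts how many distinct gene pairs exist among roughly 20K genes. That is an upper bound on pairwise relationships, not a count of real interactions, and it ignores anything involving three or more partners.

```python
from math import comb


def possible_pairs(n_genes):
    """Number of distinct unordered gene pairs among n_genes genes."""
    return comb(n_genes, 2)


if __name__ == "__main__":
    # Roughly 20K protein-coding genes, the estimate quoted above.
    print(possible_pairs(20000))  # 199990000
```

So even a "small" gene count leaves on the order of 200 million candidate pairs to sort through.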

For the cynic, basically nothing has changed. Everyone already knew that systems biology was important and the next step in understanding complex organisms. All this is about is reminding politicians that if they want results, they had better be willing to sign some rather large checks.

Thursday, January 08, 2009

Grep'ing Green Genes by TaxonID

16S rRNA is a component of the prokaryotic ribosome. It is necessary for survival, so it is very well conserved in prokaryotic genomes. It also has some 'hypervariable' regions that tend to mutate as a species evolves. For these two reasons it is a good marker for phylogenetic mapping. Green Genes is a project to provide a comprehensive database of sampled 16S sequences. Sometimes you want to start from an NCBI Taxon ID, for example E. coli, which has the NCBI Taxon ID 562, and obtain a list of associated 16S rRNA sequences.

Start by obtaining a copy of the Green Genes database at http://greengenes.lbl.gov/Download/Sequence_Data/Greengenes_format/greengenes16SrRNAgenes.txt.gz

Assuming we have a list of taxon codes, one per line, in a file 'taxon.list':

gunzip -c greengenes16SrRNAgenes.txt.gz | ./green_genes_taxon_grep.py taxon.list

'green_genes_taxon_grep.py' code:


#!/usr/bin/env python3
# Read Greengenes-format records from stdin and print, as FASTA, every record
# whose ncbi_tax_id is listed in the taxon file given as the first argument.

import sys
import re


def get_fasta(title, seq):
    # Build one FASTA entry, wrapping the sequence at 60 characters per line.
    out_str = ">%s\n" % title
    for i in range(0, len(seq), 60):
        out_str += "%s\n" % seq[i:i+60]
    return out_str


# Load the taxon IDs to report, one per line, skipping blank lines.
taxon_list = set()
with open(sys.argv[1]) as taxon_file:
    for a in taxon_file:
        if a.strip():
            taxon_list.add(a.strip())

re_begin    = re.compile(r'^BEGIN')
re_end      = re.compile(r'^END')
re_seq      = re.compile(r'aligned_seq=(.*)')
re_dot      = re.compile(r'[\.\-]')
re_taxon_id = re.compile(r'^ncbi_tax_id=(.*)')
re_name     = re.compile(r'^prokMSAname=(.*)')
re_msa_id   = re.compile(r'^prokMSA_id=(.*)')
re_ncbi_acc = re.compile(r'^ncbi_acc_w_ver=(.*)')

report = False
cur_seq = cur_name = cur_msa_id = cur_ncbi_acc = ""
for a in sys.stdin:
    a = a.rstrip()
    if re_begin.search(a):
        # New record: forget the match flag and fields of the previous one.
        report = False
        cur_seq = cur_name = cur_msa_id = cur_ncbi_acc = ""
    elif re_end.search(a):
        if report:
            title_str = "%s %s %s" % (cur_msa_id, cur_ncbi_acc, cur_name)
            # get_fasta already ends with a newline.
            print(get_fasta(title_str, cur_seq), end="")
    elif re_seq.search(a):
        # Drop alignment gap characters ('.' and '-') to get the raw sequence.
        cur_seq = re_dot.sub("", re_seq.search(a).group(1))
    elif re_name.search(a):
        cur_name = re_name.search(a).group(1)
    elif re_msa_id.search(a):
        cur_msa_id = re_msa_id.search(a).group(1)
    elif re_ncbi_acc.search(a):
        cur_ncbi_acc = re_ncbi_acc.search(a).group(1)
    elif re_taxon_id.search(a):
        # Flag the record for output if its taxon ID is on our list.
        if re_taxon_id.search(a).group(1) in taxon_list:
            report = True
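If you would rather work with whole records than a chain of regular expressions, the same BEGIN / key=value / END layout can be read into dictionaries. This is only a sketch: the sample record below is invented for illustration (its ID, name and accession are placeholders, not real Greengenes entries), and it assumes every field sits on a single `key=value` line, as the script above does.

```python
import io

# Invented sample record; the values are placeholders, not real data.
SAMPLE = """BEGIN
prokMSA_id=12345
prokMSAname=Example strain
ncbi_acc_w_ver=XX000000.1
ncbi_tax_id=562
aligned_seq=..AC-GT..
END
"""


def iter_records(handle):
    """Yield one dict of field name -> value per BEGIN...END block."""
    record = None
    for line in handle:
        line = line.rstrip()
        if line.startswith("BEGIN"):
            record = {}
        elif line.startswith("END"):
            if record is not None:
                yield record
            record = None
        elif record is not None and "=" in line:
            key, value = line.split("=", 1)
            record[key] = value


def ungapped(aligned_seq):
    """Remove the '.' and '-' alignment characters from a sequence."""
    return aligned_seq.replace(".", "").replace("-", "")


if __name__ == "__main__":
    wanted = {"562"}
    for rec in iter_records(io.StringIO(SAMPLE)):
        if rec.get("ncbi_tax_id") in wanted:
            print(">%s %s" % (rec.get("prokMSA_id"), rec.get("prokMSAname")))
            print(ungapped(rec.get("aligned_seq", "")))
```

Holding a full record in a dict costs a little memory per record, but it makes it easy to filter on any field, not just the taxon ID.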

Coding Organisms

Drew Endy, from MIT and OpenWetWare, is featured on ForaTV, giving a talk on designing organisms. We've previously mentioned his talks on genetic design.

Wednesday, January 07, 2009

Linux Watch: Open Discovery

Open Discovery is a Fedora 9 derived, USB-based distribution with open source bioinformatics tools pre-installed.

It's nice that they are bundling all of this in a USB-bootable distribution for all those bioinformaticians who prefer Windows. However, I'm curious why they chose to go for a whole new distribution rather than simply creating a new YUM repository, like RPM Fusion, that can be added to an existing standard Fedora install.

If you are interested, Open Discovery includes:

Via Bioinformatics.org


Homegrown Molecular Biology

From Yahoo News:

Using homemade lab equipment and the wealth of scientific knowledge available online, these hobbyists are trying to create new life forms through genetic engineering — a field long dominated by Ph.D.s toiling in university and corporate laboratories.

What a negative view of science.... I wouldn't call what I do toiling. Of course I work at a computer terminal, not in the wet lab.

Weird to think that in the same year the Nobel Prize was given out for the work done on Green Fluorescent Protein (GFP), you can use it for home projects.

But if you are interested in setting up a DNA lab like the one mentioned in the article, check out the projects mentioned in 'Make Magazine'

Volume 07

HMMER 3.0 Alpha Incoming

The Eddy lab's blog, Cryptogenomicon, has posted a note about the incoming HMMER 3.0 alpha. It sounds like they're hoping for a "won’t explode and kill you" alpha, but with claims like "HMMER is now about as fast as BLAST", this may be an alpha you want to get in on.

The alpha drops Monday, January 12th, 2009.
