Friday, October 20, 2006


Spent today re-familiarizing myself with the power combo of Bison/Flex. Maybe because I'm a masochist, maybe because I wanted to prove that the CS class in programming languages, where we learned how to design compilers, was useful.

Bison is based on YACC, which stands for Yet Another Compiler Compiler. Flex is a little program that generates C programs that tokenize text based on regular expressions. What that means is that you take a piece of text and scan it with flex, and it turns it into a series of identified tokens, e.g. "1 + 2 is good" can be turned into the token string Number(1) Operator(+) Number(2) Word(is) Word(good). Then you use Bison to parse that string of tokens according to a grammar you write; the parser it generates is a table-driven state machine with a stack.
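
To make the tokenizing step concrete, here is a rough sketch in Python rather than flex itself. The token names (Number, Operator, Word) come from the example above; the regular expressions and the `tokenize` function are my own assumptions about what the rules might look like, not actual flex input.

```python
import re

# Illustrative token rules; a flex file would list similar patterns.
# The exact regular expressions here are assumptions, not taken from flex.
TOKEN_SPEC = [
    ("Number", r"\d+(?:\.\d+)?"),
    ("Word", r"[A-Za-z_]+"),
    ("Operator", r"[+\-*/]"),
    ("Skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (kind, value) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "Skip":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(" ".join(f"{kind}({value})" for kind, value in tokenize("1 + 2 is good")))
# prints: Number(1) Operator(+) Number(2) Word(is) Word(good)
```

In the real setup, flex would generate a C scanner from rules like these and hand each token to the Bison parser.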

You can find a nice little example at

Anyway, I have a set of features derived from several different aspects of protein sequence/structure alignment. I want to apply machine learning to these features to figure out if these alignments are any good. But I haven't quite settled on the set of features I want to use. So I'm writing this little parser that will take configuration lines, like "%feature_1 / %feature_15 + %feature_2", and run through the database of samples, and produce the training files I will need for the machine learning.
Plus, I figure if I write this well enough, I can use it again later for other stuff.
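
For a feel of what evaluating one of those configuration lines involves, here is a small hand-written sketch in Python. It is a stand-in for the Bison/Flex version, not the actual parser: the function names (`lex_config`, `evaluate`), the choice of a plain dict per sample, and the feature values are all made up for illustration. It handles `%name` references, numbers, `+ - * /`, unary minus, and parentheses with the usual precedence.

```python
import re

# One token per match: a %feature reference, a number, or an operator/paren.
TOKEN = re.compile(r"\s*(?:(?P<feature>%\w+)|(?P<number>\d+(?:\.\d+)?)|(?P<op>[-+*/()]))")


def lex_config(line):
    """Split a configuration line into (kind, text) tokens."""
    line = line.rstrip()
    tokens, pos = [], 0
    while pos < len(line):
        m = TOKEN.match(line, pos)
        if m is None:
            raise ValueError(f"cannot tokenize at position {pos}: {line[pos:]!r}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


def evaluate(line, sample):
    """Evaluate a line against one sample (a hypothetical dict of feature values)."""
    tokens = lex_config(line)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        kind, text = peek()
        if kind is None:
            raise ValueError("unexpected end of expression")
        advance()
        if kind == "feature":
            return float(sample[text[1:]])  # strip the leading '%'
        if kind == "number":
            return float(text)
        if text == "(":
            value = expr()
            if peek() != ("op", ")"):
                raise ValueError("expected ')'")
            advance()
            return value
        if text == "-":
            return -factor()
        raise ValueError(f"unexpected token {text!r}")

    def term():  # '*' and '/' bind tighter than '+' and '-'
        value = factor()
        while peek() in (("op", "*"), ("op", "/")):
            _, op = advance()
            rhs = factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def expr():
        value = term()
        while peek() in (("op", "+"), ("op", "-")):
            _, op = advance()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    result = expr()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos][1]!r}")
    return result


# Made-up sample values, purely for illustration.
sample = {"feature_1": 6.0, "feature_15": 3.0, "feature_2": 1.5}
print(evaluate("%feature_1 / %feature_15 + %feature_2", sample))  # prints 3.5
```

Running `evaluate` once per configuration line and per sample would give one row of a training file; in the Bison version the three nested functions correspond roughly to three grammar rules.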
