Today while working on a project I needed to get access to the Gene Ontology hierarchy. While there a number of GO browsers such as Amigo, I needed access to the raw data to generate a graph that I could then slice and dice. A few minutes with Python led to a simple solution.
The program parses the OBO 1.2 formatted GO data file (either by directly downloading it or from a local file) and outputs a flat dictionary listing the term ID’s, names, namespace etc and a network representation of the GO hierarchy in ncol format. It uses a simple (and relatively non-robust) class to represent the data as an undirected graph (not really correct), though it’d be easy to use something like igraph to start doing some real network analysis. It’s certainly not a comprehensive solution, but I thought I’d put it out there.