The ONSChallenge has been running for some time now and the simple web query form that tied in the data from Google Docs along with web services from IU has turned out to be pretty handy. With more and more data becoming available, I had done some initial exploratory analysis of the measured solubilities. One thing that is useful to the experimentalists is a suggestion of which compound to test next. This could be made on the basis of many factors – availability, ease of synthesis and so on. But one way to look at it is to examine what types of compounds have been tested previously, and suggest that the subsequent compounds be very different from those that have been tested.
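One way to make the "very different from what has been tested" idea concrete is a max-min dissimilarity pick over fingerprints. The sketch below is only an illustration, not the analysis actually used for the ONSChallenge data: it assumes each compound is represented by a set of "on" fingerprint bit positions, and the function names and toy data are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0  # convention: two empty fingerprints are treated as identical
    return len(a & b) / len(a | b)


def most_dissimilar(tested, candidates):
    """Return the name of the candidate whose closest tested compound is least similar.

    tested and candidates map a compound name to its set of on-bits
    (an assumed representation for this sketch). tested must not be empty.
    """
    def nearest_similarity(fp):
        # Similarity to the most similar compound that has already been tested
        return max(tanimoto(fp, t) for t in tested.values())

    return min(candidates, key=lambda name: nearest_similarity(candidates[name]))


# Toy data: bit positions are made up for illustration
tested = {"compound_a": {1, 2, 3}, "compound_b": {2, 3, 4}}
candidates = {"compound_x": {1, 2, 3, 4}, "compound_y": {7, 8, 9}}
suggestion = most_dissimilar(tested, candidates)
```

In practice the fingerprints would come from a cheminformatics toolkit, and factors such as availability and ease of synthesis would still need to be weighed separately.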
Archive for December, 2008
Being fond of cooking, I’ve tended to collect recipes, utensils and gadgets. One thing that had been missing was a cast iron skillet. I’d been hearing about the wonders of these (naturally non-stick over time, holds heat, evenly distributes heat) for a long time and have been disillusioned with the non-stick stuff (though a small non-stick pan for eggs is handy). So we finally decided to pick up a Lodge cast iron skillet. Though it’s sold as pre-seasoned, we seasoned it once before use.
Our first attempt at using it was to make pan-seared steak for Christmas lunch, using directions (1, 2) from Alton Brown. We started with a juicy 12 oz ribeye, seasoned with kosher salt and coarse ground pepper. Searing it for 90 s on the stovetop and then putting it into a 500 °F oven for 3 minutes on each side resulted in a beautiful medium steak. While the steak was resting, we put together a simple sauce with red wine, shallots and the brown bits from the pan.
The result was heavenly! Looks like cooking will be fun with the new skillet.
Over the last few years there has been a lot of activity in the area of Open Source cheminformatics software. Being a contributor to the CDK as well as a supporter of Open Source and Open Data efforts in general, I was delighted to be given the chance to talk about these topics at the BioIT World Conference & Expo. I’ll be talking about the state of the art in Open Source cheminformatics, highlighting the advantages and pitfalls of using this type of software, using examples from toolkits, workbenches, pipelining tools and so on. In addition, I’ll be talking a little bit about Open Data, its importance, and the possibilities that arise from combining Open Source software and Open Data.
Here’s the announcement of the actual meeting:
Join the life sciences community in Boston, MA next April 27-29, 2009 for the 7th Annual Bio-IT World Conference & Expo (www.bio-itworldexpo.com). Since its debut in 2002, Bio-IT has established itself as a premier event showcasing the myriad applications of IT and informatics to biomedical research and the drug discovery enterprise. The 2009 program will feature best practice case studies and joint partner presentations relevant to the technologies, research, and regulatory issues of life science, pharmaceutical, clinical, health, and IT professionals.
News of the ChemSpider Journal of Chemistry has been posted in various places. This effort is interesting as it is a combination of features that are currently available in different forms. Like other Open Access journals, the CJC will follow the BOAI and hence be Open Access. In addition, it will exhibit markup of the text, as is done by the RSC journals (which are not OA). I’m especially interested in this latter feature for automated processing of articles. While it is good to see the combination of these features, it is also interesting to see that the journal will use a just-in-time (JIT) approach and allow online peer review and commentaries. In this sense, it can be expected to be an especially good venue for ONS style projects.
I think this effort will be an interesting experiment, especially given that many “traditional” chemists may not have blogs and wikis to support a JIT approach, and that a journal might be more acceptable. I recently joined the editorial board. I’m eager to see how the journal evolves, am pleased to be able to contribute to this effort, and encourage others to do so as well.
A few days back, Hari on FriendFeed asked how one could get a CAS number from a PubChem compound ID (CID). The reverse, that is, finding a CID for a given CAS number, is generally quite easy, as shown by Rich here and here. Since I was trying to get some writing done, this was a good excuse for a quick hack to solve the problem.
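One plausible piece of such a hack is picking CAS-like strings out of the names associated with a CID. The sketch below is not the approach described in the full post; it assumes the synonym list for a compound has already been retrieved by some other means and simply keeps the entries that match the CAS format and pass the standard CAS check-digit rule. The function names are hypothetical.

```python
import re

# CAS Registry Number layout: 2-7 digits, 2 digits, 1 check digit
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(text):
    """Return True if text looks like a CAS number and its check digit is correct."""
    match = CAS_PATTERN.match(text.strip())
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight digits 1, 2, 3, ... starting from the rightmost non-check digit
    total = sum(int(d) * weight
                for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == check


def cas_numbers(synonyms):
    """Keep only the synonyms that are valid CAS numbers (input list is assumed)."""
    return [s.strip() for s in synonyms if is_valid_cas(s)]


# Example with a hand-written synonym list for water
found = cas_numbers(["water", "7732-18-5", "H2O", "7732-18-4"])
```

A string that passes the checksum is only well-formed; whether it is actually the registry number for the compound still depends on the quality of the synonym source.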
I met with Jean-Claude Bradley yesterday and we had a pretty useful hack session, allowing him to easily incorporate chemical and cheminformatics functionality into a GoogleDocs spreadsheet.
A common task that Jean-Claude wanted to automate was the calculation of milligrams (or milliliters) of a chemical required for a certain molarity. So what we need for this calculation is the compound name, desired molarity, molecular weight and the density. Importantly, the people who’d like to use this will provide compound names and not a directly parseable SMILES. So we’d also like to (optionally) get the SMILES. Finally, he wanted to be able to do this in a Google spreadsheet – rather than a specific web page or stand alone program.
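The arithmetic behind this calculation can be sketched in a few lines. This is only an illustration of the unit conversions, not the spreadsheet functions from the hack session: the function names are hypothetical, and the target solution volume is an extra input assumed here because an amount cannot be computed from molarity alone.

```python
def mass_mg(molarity, volume_ml, mol_weight):
    """Milligrams of solute needed.

    molarity in mol/L, volume_ml is the final solution volume in mL (assumed input),
    mol_weight in g/mol. mol/L * mL gives mmol, and mmol * g/mol gives mg.
    """
    return molarity * volume_ml * mol_weight


def liquid_volume_ml(molarity, volume_ml, mol_weight, density):
    """Milliliters of a liquid solute needed, given its density in g/mL."""
    grams = mass_mg(molarity, volume_ml, mol_weight) / 1000.0
    return grams / density


# Example with made-up numbers: 10 mL of a 0.5 M solution,
# solute of molecular weight 100 g/mol and density 0.8 g/mL
needed_mg = mass_mg(0.5, 10, 100.0)
needed_ml = liquid_volume_ml(0.5, 10, 100.0, 0.8)
```

The molecular weight and density would be looked up from the compound name (optionally via its SMILES), which is the part the web services are needed for.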
A while back I wrote about some updates I had made to the CDK fingerprinting code to improve performance. Recently Egon and Jonathan Alvarsson (Uppsala) had made even more improvements. Some of them are simple fixes (making a String final, using Set rather than List) while others are more significant (efficient caching of paths). In combination, they have improved performance by over 50%, compared to my last update. Egon has put up a nice summary of performance runs here. Excellent work guys!