Posts Tagged ‘performance’
Quick Comments on an Analysis of Antithrombotics
Posted in cheminformatics, tagged database, factor Xa, inchikey, performance, pubchem, python, REST on January 5, 2009
Joerg has written a nice blog post on using Open Source software and data to analyse the occurrence of antithrombotics. More specifically, he was trying to answer the question:
Which XRay ligands are closest to the Fontaine et al. structure-activity relationship data for allowing structure-based drug design?
using Blue Obelisk tools and ChemSpider and where [...]
The Speedups Keep on Coming
Posted in cheminformatics, software, tagged benchmark, cdk, fingerprint, performance on December 4, 2008
A while back I wrote about some updates I had made to the CDK fingerprinting code to improve performance. Recently, Egon and Jonathan Alvarsson (Uppsala) have made even more improvements. Some of them are simple fixes (making a String[] final, using a Set rather than a List), while others are more significant (efficient caching of paths). In [...]
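The List-to-Set change is easy to underestimate, so here is a small Python analogy of it when collecting unique path strings. This is not the CDK code: `unique_with_list`, `unique_with_set` and the synthetic path strings are hypothetical, and the printed timings will vary from machine to machine.

```python
import timeit
from typing import Iterable, List, Set


def unique_with_list(paths: Iterable[str]) -> List[str]:
    """Deduplicate with a list: every membership test is a linear scan."""
    seen: List[str] = []
    for p in paths:
        if p not in seen:
            seen.append(p)
    return seen


def unique_with_set(paths: Iterable[str]) -> Set[str]:
    """Deduplicate with a set: hashed inserts, duplicates silently ignored."""
    seen: Set[str] = set()
    for p in paths:
        seen.add(p)
    return seen


if __name__ == "__main__":
    # Hypothetical workload: 1000 distinct path strings, each seen twice.
    paths = [f"C-C-O-{i}" for i in range(1000)] * 2
    assert set(unique_with_list(paths)) == unique_with_set(paths)
    t_list = timeit.timeit(lambda: unique_with_list(paths), number=3)
    t_set = timeit.timeit(lambda: unique_with_set(paths), number=3)
    print(f"list: {t_list:.4f} s, set: {t_set:.4f} s")
```

Both give the same unique paths, but the list version does a linear scan per lookup, so deduplicating n paths is O(n²) overall, while the set version is amortised O(n).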
Brute Force – Inelegant, But Sometimes Useful
Posted in research, software, tagged benchmark, database, nearest neighbor, performance, postgres, similarity, spatial index on November 20, 2008
A few days back I posted on improving query times in Pub3D by going from a monolithic database (17M rows) to a partitioned version (~3M rows in each of 6 separate databases) and then running queries in parallel. I also noted that we were improving query times further by using an R-tree spatial index.
Andrew Dalke [...]
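As a rough illustration of the partition-and-merge idea, here is a Python sketch of a brute-force k-nearest-neighbour search that scans each partition in its own thread and then merges the per-partition results. The in-memory partitions stand in for the separate Postgres databases, and `knn_partition` and `knn_partitioned` are hypothetical names. Because a pure-Python distance loop holds the GIL, threads only pay off when each worker mostly waits on I/O, as with a database query.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Row = Tuple[str, Vector]  # (compound id, descriptor vector)
Hit = Tuple[float, str]   # (distance, compound id)


def knn_partition(partition: List[Row], query: Vector, k: int) -> List[Hit]:
    """Brute force: compute the distance from the query to every row."""
    return heapq.nsmallest(k, ((math.dist(query, vec), cid) for cid, vec in partition))


def knn_partitioned(partitions: List[List[Row]], query: Vector, k: int) -> List[Hit]:
    """Search every partition concurrently, then merge the local top-k lists.

    The global top-k is always contained in the union of the local top-k
    lists, so the merge gives the same answer as one scan over all rows.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        local = list(pool.map(lambda part: knn_partition(part, query, k), partitions))
    return heapq.nsmallest(k, (hit for hits in local for hit in hits))


if __name__ == "__main__":
    parts = [
        [("a", (0.0, 0.0)), ("b", (5.0, 5.0))],
        [("c", (1.0, 0.0)), ("d", (10.0, 10.0))],
    ]
    print(knn_partitioned(parts, (0.0, 0.0), k=2))  # [(0.0, 'a'), (1.0, 'c')]
```

A spatial index such as an R-tree replaces the full scan inside each partition. The merge step stays the same.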
Java Port of VFLib Works and it’s Blazing
Posted in cheminformatics, software, tagged benchmark, isomorphism, matching, performance, substructure, vf2 on November 18, 2008
Some time back I described how I was porting the VFLib algorithms to Java so that we could use them for substructure search, since the current UniversalIsomorphismTester is, in general, pretty slow for this task. While I had translated VFLib’s implementation of the Ullmann algorithm and shown that it outperformed the CDK method, it turned out [...]
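To make the idea concrete, here is a small backtracking substructure matcher in Python. It is not a port of VFLib and implements neither Ullmann's refinement step nor the full VF2 feasibility rules. It only shares their core strategy: grow a partial atom mapping and prune a candidate pair as soon as it conflicts with bonds already mapped. The graph representation (dicts of atom labels and adjacency sets) and the function names are hypothetical, and bond orders are ignored.

```python
from collections import deque
from typing import Dict, List, Optional, Set

Labels = Dict[int, str]          # atom index -> element symbol
Adjacency = Dict[int, Set[int]]  # atom index -> bonded atom indices (every atom is a key)


def bfs_order(adj: Adjacency) -> List[int]:
    """Order query atoms so each one (after the first per component) is bonded to an earlier one."""
    order: List[int] = []
    seen: Set[int] = set()
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nb in sorted(adj[node]):
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
    return order


def find_substructure(q_labels: Labels, q_adj: Adjacency,
                      t_labels: Labels, t_adj: Adjacency) -> Optional[Dict[int, int]]:
    """Return one query->target atom mapping, or None if the query is absent."""
    order = bfs_order(q_adj)
    mapping: Dict[int, int] = {}
    used: Set[int] = set()

    def feasible(q: int, t: int) -> bool:
        if t in used or q_labels[q] != t_labels[t]:
            return False
        if len(t_adj[t]) < len(q_adj[q]):  # target atom has too few bonds
            return False
        # Every bond to an already-mapped query atom must exist in the target.
        return all(mapping[qn] in t_adj[t] for qn in q_adj[q] if qn in mapping)

    def extend(i: int) -> bool:
        if i == len(order):
            return True
        q = order[i]
        anchors = [mapping[qn] for qn in q_adj[q] if qn in mapping]
        # Only neighbours of a mapped neighbour's image can be candidates.
        candidates = sorted(t_adj[anchors[0]]) if anchors else sorted(t_labels)
        for t in candidates:
            if feasible(q, t):
                mapping[q] = t
                used.add(t)
                if extend(i + 1):
                    return True
                del mapping[q]
                used.discard(t)
        return False

    return dict(mapping) if extend(0) else None


if __name__ == "__main__":
    # Query: a C-O fragment. Target: the heavy atoms of ethanol, C-C-O.
    q_labels, q_adj = {0: "C", 1: "O"}, {0: {1}, 1: {0}}
    t_labels, t_adj = {0: "C", 1: "C", 2: "O"}, {0: {1}, 1: {0, 2}, 2: {1}}
    print(find_substructure(q_labels, q_adj, t_labels, t_adj))  # {0: 1, 1: 2}
```

Visiting query atoms in BFS order means most steps only try the neighbours of an already-matched atom, which is where this style of matcher gets much of its speed over blind enumeration.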
Multi-threaded Database Access with Python
Posted in software, tagged database, parallel, performance, postgres, python, threads on November 14, 2008
Pub3D contains about 17.3 million 3D structures for PubChem compounds, stored in a Postgres database. One of the things we wanted to do was 3D similarity searching, and to achieve that we’ve been employing the Ballester and Graham-Richards method. In this post I’m going to talk about performance: how we went from a single [...]
Faster Substructure Search in the CDK
Posted in software, tagged cdk, graph, isomorphism, performance, substructure, Ullman on September 19, 2008
The CDK uses the UniversalIsomorphismTester to perform graph and subgraph isomorphism. However, it’s not very efficient, and this shows when performing substructure searches over large collections. A quick test comparing the CDK code with OpenBabel’s obgrep showed that the CDK is nearly forty times slower than OpenBabel. Improvements in this code will enhance [...]
Faster Fingerprinting
Posted in software, tagged cdk, dfs, fingerprint, hash, optimize, path, performance on September 12, 2008
In my last post I reported some timing measurements for various operations. One of them was fingerprinting using the path-based hashing Fingerprinter class in the CDK. As reported, it took nearly 4 minutes to process a 1000-molecule subset of ZINC. Not good.
So I spent a little time last night hacking on the code, primarily [...]
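For context on what a path-based hashed fingerprint does, here is a minimal Python sketch: enumerate every simple path up to a maximum number of atoms with a depth-first search, turn each path into a direction-independent string, and hash that string into a fixed-size bit set. This is not the CDK Fingerprinter. The molecule representation, the function names and the defaults (`size=1024`, `max_atoms=7`) are illustrative assumptions, and the CDK's own hashing and depth settings may differ.

```python
import hashlib
from typing import Dict, List, Set

Labels = Dict[int, str]          # atom index -> element symbol
Adjacency = Dict[int, Set[int]]  # atom index -> bonded atom indices


def enumerate_paths(labels: Labels, adj: Adjacency, max_atoms: int) -> Set[str]:
    """All simple paths of up to max_atoms atoms, as direction-independent strings."""
    paths: Set[str] = set()

    def dfs(node: int, visited: Set[int], trail: List[str]) -> None:
        forward = "-".join(trail)
        backward = "-".join(reversed(trail))
        paths.add(min(forward, backward))  # same path walked from either end -> one key
        if len(trail) == max_atoms:
            return
        for nb in adj[node]:
            if nb not in visited:
                visited.add(nb)
                trail.append(labels[nb])
                dfs(nb, visited, trail)
                trail.pop()
                visited.remove(nb)

    for start in adj:
        dfs(start, {start}, [labels[start]])
    return paths


def path_fingerprint(labels: Labels, adj: Adjacency,
                     size: int = 1024, max_atoms: int = 7) -> Set[int]:
    """Hash each path into one of `size` bit positions; return the set bits."""
    bits: Set[int] = set()
    for path in enumerate_paths(labels, adj, max_atoms):
        # hashlib rather than hash(): Python randomises str hashes per process.
        digest = hashlib.sha256(path.encode("utf-8")).digest()
        bits.add(int.from_bytes(digest[:4], "big") % size)
    return bits


if __name__ == "__main__":
    # Heavy atoms of ethanol: C-C-O.
    labels = {0: "C", 1: "C", 2: "O"}
    adj = {0: {1}, 1: {0, 2}, 2: {1}}
    print(sorted(enumerate_paths(labels, adj, 3)))
    print(sorted(path_fingerprint(labels, adj, size=64)))
```

The DFS starts from every atom and the number of paths grows quickly with depth in ring systems, so path enumeration tends to dominate the cost. That is why the choice of data structures and the caching of paths matter for speed.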
CDK Performance Measurements
Posted in software, tagged cdk, performance, profiling on September 11, 2008
As part of a larger project, I’ve been doing some profiling on various aspects of the CDK, focusing on core cheminformatics operations. I’m using the excellent YourKit profiler to do the tests. The tests are run on a MacBook Pro (2.16 GHz) with 1 GB RAM, using the latest trunk version of the CDK and JDK 1.5.
The [...]