Posts Tagged ‘cdk’
Deploying Predictive Models
Posted in cheminformatics, tagged cdk, python, qsar, R, REST, rpy2, solubility, web service on January 14, 2009 | 1 Comment »
Over the past few days I’ve been developing some predictive models in R for the solubility data being generated as part of the ONS Solubility Challenge. As I develop the models I put up a brief summary of the results on the wiki. In the end, however, we’d like to use these models to predict [...]
Improved CDK Depiction Service
Posted in software, tagged cdk, depiction on January 14, 2009 | Leave a Comment »
The folks at the EBI have been doing some great work on the CDK. A major effort is underway to revamp JChemPaint, and part of this involves improving the rendering of 2D depictions. While the work is not complete, I rebuilt a version of the CDK 1.2.x branch with the latest rendering code from the jchempaint-primary branch and [...]
Playing with REST Descriptor Services
Posted in cheminformatics, software, tagged cdk, descriptor, google, javascript, REST, web service on January 7, 2009 | 4 Comments »
As part of my work at IU I have been implementing a number of cheminformatics web services. Initially these were SOAP, but I realized that REST interfaces make life much easier (also see here). As a result, a number of these services have simple REST interfaces. One such service provides molecular descriptor calculations, using the [...]
The Speedups Keep on Coming
Posted in cheminformatics, software, tagged benchmark, cdk, fingerprint, performance on December 4, 2008 | 7 Comments »
A while back I wrote about some updates I had made to the CDK fingerprinting code to improve performance. Recently Egon and Jonathan Alvarsson (Uppsala) have made even more improvements. Some of them are simple fixes (making a String[] final, using Set rather than List) while others are more significant (efficient caching of paths). In [...]
Conformational Envelopes
Posted in cheminformatics, research, software, visualization, tagged cdk, conformer, MBR, mds, shape, similarity on November 8, 2008 | Leave a Comment »
Joe Leonard posted a question on the CCL mailing list today regarding “conformation envelopes”. More specifically, he asked:
Has there been work on creating visualizations of “conformer envelopes”, graphical representations of the conformational space occupied (or available) to molecules. Particularly when such visualizations are used to (quickly/visually) compare whether 2 molecules can adopt the same shape [...]
Depicting SMILES Dynamically
Posted in cheminformatics, software, tagged 2d, cdk, depiction, dynamic, html, smiles on October 28, 2008 | 2 Comments »
Some time back I was playing around with dynamic HTML and came across a tutorial that described how to implement the dynamic suggestion feature that is commonly found on many websites (such as Google and Amazon). This set me wondering how I could use this mechanism to dynamically depict a SMILES string as I type it.
Do the CDK Fingerprints Work?
Posted in cheminformatics, software, tagged benchmark, cdk, enrichment, fingerprint, pubchem, similarity on October 11, 2008 | 4 Comments »
In a previous post, I discussed virtual screening benchmarks and some new public datasets for this purpose. I recently improved the performance of the CDK hashed fingerprints and the next question that arose is whether the CDK fingerprints are any good. With these new datasets, I decided to quantitatively measure how the CDK fingerprints compare [...]
Faster Substructure Search in the CDK
Posted in software, tagged cdk, graph, isomorphism, performance, substructure, Ullman on September 19, 2008 | 7 Comments »
The CDK uses the UniversalIsomorphismTester to perform graph and subgraph isomorphism. However, it’s not very efficient, and this shows when performing substructure searches over large collections. A quick test where I compared the CDK code to OpenBabel’s obgrep showed that the CDK is nearly forty times slower than OpenBabel. Improvements in this code will enhance [...]
Faster Fingerprinting
Posted in software, tagged cdk, dfs, fingerprint, hash, optimize, path, performance on September 12, 2008 | 3 Comments »
In my last post I reported some timing measurements for various operations. One of them was fingerprinting using the path-based hashing Fingerprinter class in the CDK. As reported, it took nearly 4 minutes to process a 1000-molecule subset of ZINC. Not good.
So I spent a little time last night hacking on the code, primarily [...]
CDK Performance Measurements
Posted in software, tagged cdk, performance, profiling on September 11, 2008 | 4 Comments »
As part of a larger project, I’ve been doing some profiling on various aspects of the CDK, focusing on core cheminformatics operations. I’m using the excellent YourKit profiler to do the tests. The tests are run on a MacBook Pro (2.16GHz) with 1GB RAM, using the latest trunk version of the CDK and JDK 1.5.
The [...]