Posts Tagged ‘fingerprint’
The Speedups Keep on Coming
Posted in cheminformatics, software, tagged benchmark, cdk, fingerprint, performance on December 4, 2008 | 7 Comments »
A while back I wrote about some updates I had made to the CDK fingerprinting code to improve performance. Recently Egon and Jonathan Alvarsson (Uppsala) have made even more improvements. Some of them are simple fixes (making a String[] final, using Set rather than List) while others are more significant (efficient caching of paths). In [...]
Do the CDK Fingerprints Work?
Posted in cheminformatics, software, tagged benchmark, cdk, enrichment, fingerprint, pubchem, similarity on October 11, 2008 | 4 Comments »
In a previous post, I discussed virtual screening benchmarks and some new public datasets for this purpose. I recently improved the performance of the CDK hashed fingerprints and the next question that arose is whether the CDK fingerprints are any good. With these new datasets, I decided to quantitatively measure how the CDK fingerprints compare [...]
Working With Fingerprints in R (can’t beat C!)
Posted in cheminformatics, software, tagged benchmark, c++, CRAN, fingerprint, R, similarity on October 11, 2008 | Leave a Comment »
Since I do a lot of cheminformatics work in R, I’ve created various functions and packages that make life easier for me as I do my modeling and analysis. Most of them are for private consumption. However, I’ve released a few of them to CRAN since they seem to be generally useful.
One of them is the [...]
Which Bits are Important for Similarity Searches?
Posted in cheminformatics, research, tagged fingerprint, maccs, similarity, tanimoto on October 6, 2008 | Leave a Comment »
The recent paper by Wang and Bajorath is an interesting approach to identifying the important bits in a fingerprint, with respect to a dataset.
Their discussion focuses on the structural key type fingerprints (such as MACCS and the BCI fingerprints) and the problem they are trying to address is the fact that certain structural features may [...]
Faster Fingerprinting
Posted in software, tagged cdk, dfs, fingerprint, hash, optimize, path, performance on September 12, 2008 | 3 Comments »
In my last post I reported some timing measurements for various operations. One of them was fingerprinting using the path-based hashing Fingerprinter class in the CDK. As reported, it took nearly 4 minutes to process a 1000-molecule subset of ZINC. Not good.
So I spent a little time last night hacking on the code, primarily [...]