In a previous post, I discussed virtual screening benchmarks and some new public datasets for this purpose. I recently improved the performance of the CDK hashed fingerprints, and the next question that arose was whether the CDK fingerprints are any good. With these new datasets, I decided to quantitatively measure how the CDK fingerprints compare to some other well-known fingerprints.
Update – there was a small bug in the calculations used to generate the enrichment curves in this post. The bug is now fixed. The conclusions don’t change in a significant way. To get the latest (and more) results you should take a look here.
Benchmarking strategies to compare fingerprint performance have been described by Bender & Glen and Godden et al. We take an active molecule as the query. We then create a “target” collection of non-active molecules (decoys) and also add several other actives. We then evaluate the Tanimoto similarity between the query and each compound in the target collection. The target collection is then ranked in order of decreasing similarity. The hope is that a good fingerprint will cause the actives in the target collection to be highly ranked. The extent to which this occurs is a measure of the effectiveness of the fingerprint. The Curious Wavefunction has a good post on datasets and performance measurements for virtual screening methods.
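The similarity-and-ranking step described above can be sketched in a few lines. This is a minimal illustration, assuming fingerprints are represented as Python sets of "on" bit positions (a hypothetical representation for clarity, not the actual CDK bit-vector type); `tanimoto` and `rank_collection` are names I'm introducing here.

```python
def tanimoto(fp1, fp2):
    """Tanimoto coefficient: shared on-bits divided by the union of on-bits."""
    inter = len(fp1 & fp2)
    union = len(fp1) + len(fp2) - inter
    return inter / union if union else 0.0

def rank_collection(query_fp, target_fps):
    """Return indices into the target collection, sorted by decreasing
    similarity to the query fingerprint."""
    sims = [tanimoto(query_fp, fp) for fp in target_fps]
    return sorted(range(len(target_fps)), key=lambda i: sims[i], reverse=True)
```

For example, ranking `[{9}, {1, 2, 3}]` against the query `{1, 2, 3}` puts the identical fingerprint first.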
How do we measure performance quantitatively? One way to do this is by evaluating enrichment curves. The process first looks at, say, the top 1% of the ranked target collection and counts what fraction of all the actives lie within this portion. The process is then repeated with increasing percentages. The goal of a fingerprint (or any other virtual screening method) is to recover a large fraction of the actives within a small percentage of the target collection, since this would, in theory, allow us to examine just, say, the top 5% of the whole target collection and still find all the actives.
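The procedure above can be sketched as follows. This is a simplified illustration, not the code used for the plots in this post: `is_active` is a hypothetical list of booleans marking actives in ranked order, and `enrichment_curve` is a name I'm introducing.

```python
def enrichment_curve(is_active, percentages):
    """For each percentage of the ranked target collection, return the
    fraction of all actives recovered within that top slice."""
    n = len(is_active)
    total_actives = sum(is_active)
    curve = []
    for pct in percentages:
        top = max(1, int(round(n * pct / 100.0)))  # size of the top slice
        found = sum(is_active[:top])               # actives in the slice
        curve.append(found / total_actives)
    return curve
```

With a ranked list of 10 compounds where the 2 actives sit at the top, `enrichment_curve(..., [10, 100])` gives `[0.5, 1.0]`: half the actives in the top 10%, all of them at 100%.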
Note that while the use of enrichment curves is common, it is not necessarily the most appropriate measure of performance, as discussed by Hawkins et al. Alternative methods such as ROC curves and the RIE metric are more rigorous and robust. But it took 10 minutes to write the code to get the enrichment curves (and factors), so that’s what I’ll use here.
So with this procedure in hand, I considered two of the datasets provided by Rohrer & Baumann. Specifically, I used AIDs 466 and 548 (which were cleaned locally). Each of these datasets had 30 actives and 15,000 decoys. I combined the actives with the decoys and obtained enrichment curves using each of the actives in turn as the query against the 15,030-compound target collection. The final enrichment curve was then obtained by averaging the 30 individual enrichment curves.
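The averaging step is just an element-wise mean over the per-query curves, assuming each curve was evaluated at the same percentage points. A minimal sketch (`average_curves` is a name I'm introducing; the toy input below is illustrative, not data from the actual 30 queries):

```python
def average_curves(curves):
    """Element-wise mean of several enrichment curves that were all
    evaluated at the same percentage points."""
    n = len(curves)
    return [sum(vals) / n for vals in zip(*curves)]
```

For instance, averaging the two curves `[0.0, 1.0]` and `[1.0, 1.0]` yields `[0.5, 1.0]`.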
This procedure was performed using 5 different fingerprints, and in all cases the Tanimoto metric was employed.
- BCI 1052 bit structural keys
- MACCS 166 bit structural keys (implemented in the CDK)
- EState 79 bit structural keys (implemented in the CDK)
- CDK 1024 bit standard hashed (ignores cyclic systems)
- CDK 1024 bit extended hashed (considers cyclic systems)
The plots below compare the performance of the five fingerprints on the AID 466 dataset. Looking at the right-hand plot, we see that overall none of the fingerprints do particularly well. More interesting is the left-hand plot, which focuses on the smaller percentages of the target collection. Here we see that the standard CDK hashed fingerprint actually performs quite similarly to the BCI 1052-bit structural keys. Surprisingly, the extended CDK fingerprint doesn’t seem to do as well, even though it takes more features into account.
The figure below shows the results for the same five fingerprints applied to the AID 548 dataset. As noted above, enrichment curves depend on the dataset being analyzed. For this dataset, all the fingerprints exhibit better performance. In this case, the CDK extended fingerprint appears to do the best, and surprisingly the BCI keys are outperformed by the two CDK hashed fingerprints.
It’s also useful to look at the enrichment factors for, say, the top 5% of the target collection; these are listed below for the five fingerprints.
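An enrichment factor is the fraction of actives found in the top slice divided by the fraction expected from a random ranking. A sketch, using the same hypothetical ranked-booleans representation as above (`enrichment_factor` is a name I'm introducing):

```python
def enrichment_factor(is_active, pct):
    """EF at pct%: actives recovered in the top pct% of the ranked list,
    relative to the number expected by chance in a slice of that size."""
    n = len(is_active)
    total_actives = sum(is_active)
    top = max(1, int(round(n * pct / 100.0)))
    found = sum(is_active[:top])
    # (fraction of actives recovered) / (fraction of collection examined)
    return (found / total_actives) / (top / n)
```

As a sanity check: with 4 actives among 100 compounds and 2 of them ranked in the top 5, the EF at 5% is (2/4) / (5/100) = 10.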
| Dataset | BCI | MACCS | EState | CDK Standard | CDK Extended |
|---------|-----|-------|--------|--------------|--------------|
The overall performance of all the fingerprints is not great – but this is not surprising, since the datasets have been constructed to have a high degree of scaffold diversity. Such datasets are designed not to unduly favor 2D fingerprint methods. While these results are not conclusive, and should be repeated for more datasets (and I’d also like to see how circular fingerprints perform on these datasets), they do suggest that the simplistic CDK hashed fingerprints are not too bad.