Houghten, R. et al, “Strategies for the Use of Mixture-Based Synthetic Combinatorial Libraries: Scaffold Ranking, Direct Testing In Vivo, and Enhanced Deconvolution by Computational Methods”, J. Comb. Chem., 2008, 10, 3-19
Recently a collaborator pointed me to the above article by Houghten and co-workers where they describe the use of mixture-based combinatorial libraries for high-throughput screening (HTS) experiments.
Traditionally, an HTS experiment screens thousands to millions of individual molecules. Of course, it’s all done by robots, so while you have to be careful during setup, you don’t have to do it all by hand. Still, if you can reduce the actual number of individual screens, life becomes easier and cheaper. Houghten et al describe an elegant approach that does just this.
Enter the idea of mixtures. The concept is quite simple: take a mixture of compounds (ranging from hundreds to thousands) and assay the mixture rather than the individual compounds. The first concern that comes to mind is that the individual compounds had better not react with each other. Houghten and his co-workers have published (here and here) a number of studies on this technique, and it appears that such behavior is either not prevalent or can be dealt with.
Ignoring the possibility of reactions within the mixture, how do we identify the individual molecule(s) that are active? The solution used by Houghten et al is simple and elegant. I’ll use the example from the paper in question. Let’s say you’re trying to identify an active tripeptide and have a pool of three amino acids. This gives us 27 possible tripeptides to test. The mixture approach involves the construction of a Positional Scanning Library (PSL). Such a library is composed of three sublibraries, in each of which one of the positions in the tripeptide is fixed. Schematically this would be represented as OXX, XOX and XXO, where O represents the fixed position and X can be any amino acid from the pool. So if our amino acid pool consisted of R, A and T, we’d have the following nine mixtures:

Fix position 1: RXX, AXX, TXX
Fix position 2: XRX, XAX, XTX
Fix position 3: XXR, XXA, XXT

Here each entry corresponds to a mixture of nine compounds, all of which share the same residue at the fixed position.
So now rather than screen 27 compounds we do only 9 screens, one per mixture. If the active tripeptide is RAT, it’s easy to see that the mixtures RXX, XAX and XXT will show activity. In other words, the active tripeptide can be read off from the mixtures that show activity.
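The deconvolution logic above can be sketched in a few lines of Python. This is my own illustration, not code from the paper: it assumes a single active tripeptide (`ACTIVE`) and a noise-free yes/no `assay`, whereas real screens must contend with multiple actives of varying potency.

```python
# Toy simulation of positional-scanning deconvolution.
# Assumption: exactly one active compound and a noise-free assay.
from itertools import product

POOL = ["R", "A", "T"]   # amino acid pool (illustrative)
ACTIVE = "RAT"           # the "unknown" active tripeptide (illustrative)

def assay(mixture):
    """Return True if any peptide in the mixture is active."""
    return ACTIVE in mixture

def psl_deconvolute(pool, length=3):
    """Identify the active via k*n positional mixtures instead of n**k compounds."""
    hits = []
    for pos in range(length):
        for fixed in pool:
            # The mixture of all peptides with `fixed` at position `pos`
            mixture = ["".join(p) for p in product(pool, repeat=length)
                       if p[pos] == fixed]
            if assay(mixture):
                hits.append(fixed)
                break
    return "".join(hits)

print(psl_deconvolute(POOL))  # → RAT, found with 9 screens rather than 27
```

With a 3-residue pool this saves 27 − 9 = 18 screens; the savings grow rapidly with pool size, since k·n mixtures replace n^k individual compounds.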
Now this is a very simplified example – there may be more than one active tripeptide, and they may have differing degrees of activity, so it’s not as easy as simply picking out a single active. In general, one would need to follow up on multiple candidates – but 100 compounds is certainly better than 100,000! And if a mixture is inactive, that’s a whole set of compounds you can avoid following up on. As the numbers go up, the gains become more dramatic (2,000-fold to 70,000-fold reductions, depending on the technique, as shown in Table 3 of the paper).
Of course, peptide libraries are just one possibility. One can consider more general scaffolds (such as pyrrolidine bis-cyclic guanidines) with multiple positions of variation. Since a PSL focuses on the positions of the substituents, it’s possible to have a mixture containing multiple scaffolds. The authors describe a strategy that rearranges the mixtures so that each new mixture represents an individual scaffold. Note that in such a scaffold mixture there is no “fixed position” – but what we can do now is rank scaffolds by their likely activity. Given a ranking, one can then select a scaffold and perform mixture screening with a PSL just for that scaffold.
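As a minimal sketch, the ranking step amounts to sorting scaffold mixtures by a single assay readout per mixture. The scaffold names and activity values below are invented for illustration; they are not from the paper.

```python
# Hypothetical scaffold-ranking step: one assay readout per scaffold
# mixture, then sort. All names and numbers below are made up.
mixture_activity = {
    "scaffold_A": 0.12,   # assumed fractional inhibition of each mixture
    "scaffold_B": 0.87,
    "scaffold_C": 0.45,
}

ranked = sorted(mixture_activity, key=mixture_activity.get, reverse=True)
print(ranked)  # → ['scaffold_B', 'scaffold_C', 'scaffold_A']
```

The top-ranked scaffold would then be the one taken forward into a full positional scan.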
Another nice experiment described in the paper is the use of mixture libraries for in vivo screening. They used a PSL of tetrapeptides, with one position fixed and the other three varied over a 50 amino acid pool, giving 50³ = 125,000 compounds per mixture. The goal was to identify an anti-nociceptive compound that was known to be in the mixture and had high affinity for the µ-opioid receptor. They tested the mixture of compounds and observed activity (compared to some control mixtures known to not contain the active compound). The next step was to fix the second position as well. On doing so, the mixture size went from 125,000 compounds to 2,500 (50²) and the activity of the mixture increased 4X. The key thing here is that they were screening a single mixture and not 125,000 individual compounds (which would simply not be possible in vivo) – essentially converting a low-throughput assay to a pseudo high-throughput format.
So it’s a pretty cool method, but some issues can crop up. I’ve already noted the possibility of compounds reacting within a mixture. This could be addressed computationally: identify a set of functional groups that are known to be reactive (or to react with certain other groups), then identify pairs, triples, etc. of compounds containing these groups and flag them as candidates for such reactions. Or simply have an experienced chemist exclude reactive compounds.
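Here’s a minimal sketch of such a computational filter. For simplicity each compound is hand-labeled with its functional groups; a production version would instead detect the groups by substructure matching (e.g. SMARTS patterns via a toolkit like RDKit). The incompatibility table and the compounds are illustrative assumptions.

```python
# Sketch of flagging potentially cross-reactive pairs in a mixture.
# Functional-group labels and the incompatibility table are assumptions.
from itertools import combinations

# Pairs of functional groups assumed to react with each other
INCOMPATIBLE = {frozenset({"aldehyde", "primary_amine"}),
                frozenset({"acid_chloride", "alcohol"})}

library = {
    "cmpd1": {"aldehyde"},
    "cmpd2": {"primary_amine", "alcohol"},
    "cmpd3": {"ester"},
}

def flag_reactive_pairs(library):
    """Return compound pairs whose functional groups could cross-react."""
    flagged = []
    for (n1, g1), (n2, g2) in combinations(library.items(), 2):
        if any(frozenset({a, b}) in INCOMPATIBLE for a in g1 for b in g2):
            flagged.append((n1, n2))
    return flagged

print(flag_reactive_pairs(library))  # → [('cmpd1', 'cmpd2')]
```

Extending this to triples (an A + B product that then reacts with C) is straightforward with `combinations(..., 3)`, though the number of checks grows quickly with mixture size.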
Another issue with this approach is that the hits arising from screening PSLs or scaffold libraries are not necessarily diverse (less so for the former than the latter). In many scenarios this can be viewed as a drawback, but as Houghten et al note, it could be useful for identifying activity cliffs. A side effect of the lack of diversity is that one can get very good SAR data. Of course, computational diversity analysis methods can be used to enhance the experimental aspects of this procedure, as described in the paper.