Chemical space is the set of all possible drug-like molecules, estimated to contain around $10^{60}$ compounds (1). I like to think of it as a multidimensional space, where each dimension represents a property or feature and each point is a possible molecule.

Source: 2

This means that if we had the perfect set of molecular properties, we could define a desirable subspace within the vastness of chemical space. A practical way to do this is through a Multi-Parameter Optimization (MPO) approach. Instead of relying on a single, perfect scoring function, we define this subspace as the collection of all molecules that simultaneously satisfy multiple property constraints. Using this MPO framework, we can formally define the subspace $S$ as:

$$ S = \{ m \in C \mid p_1(m) \ge T_1 \land p_2(m) \le T_2 \land \dots \land p_k(m) \le T_k \} $$

Here, the subspace $S$ is the set of all molecules $m$ from the chemical space $C$ that satisfy a series of conditions joined by the logical “AND” operator ($\land$). Each condition requires a specific molecular property $p_i(m)$ (like potency, solubility, or toxicity) to meet a given threshold $T_i$. This creates a well-defined region in chemical space containing molecules with a desirable, balanced profile.

As an example, let’s define a subspace $S$ for molecules that are potent, soluble, and not toxic:

Potency (measured by $pIC_{50}$) should be high: $p_1(m) = pIC_{50}(m) \geq 7.0$

Solubility (measured by $\log S$) should be sufficient: $p_2(m)=\log S(m) > -5.0$

Toxicity (a predicted score) should be low: $p_3(m)=Tox(m) \leq 0.4$

The subspace $S$ would then be defined as:

$$ S = \{ m \in C \mid pIC_{50}(m) \geq 7.0 \land \log S(m) > -5.0 \land Tox(m) \leq 0.4 \} $$
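To make the membership test concrete, here is a minimal Python sketch of the example subspace. It assumes each molecule's properties have already been measured or predicted and are available as a plain dictionary. The names `Constraint`, `in_subspace`, the property keys, and all candidate values are hypothetical and chosen only for illustration; they do not come from any particular library or dataset.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass(frozen=True)
class Constraint:
    """One condition p_i(m) <op> T_i from the subspace definition."""
    prop: str                                # key of the property p_i
    compare: Callable[[float, float], bool]  # comparison operator
    threshold: float                         # threshold T_i

    def satisfied_by(self, props: Mapping[str, float]) -> bool:
        return self.compare(props[self.prop], self.threshold)


# The three conditions of the example subspace S.
CONSTRAINTS = (
    Constraint("pIC50", operator.ge, 7.0),   # potency: pIC50 >= 7.0
    Constraint("logS", operator.gt, -5.0),   # solubility: log S > -5.0
    Constraint("tox", operator.le, 0.4),     # predicted toxicity <= 0.4
)


def in_subspace(props: Mapping[str, float], constraints=CONSTRAINTS) -> bool:
    """Return True only if every constraint holds (the logical AND)."""
    return all(c.satisfied_by(props) for c in constraints)


# Made-up property values, for illustration only.
candidates = {
    "mol_A": {"pIC50": 7.8, "logS": -4.2, "tox": 0.10},
    "mol_B": {"pIC50": 6.5, "logS": -3.0, "tox": 0.20},  # fails potency
    "mol_C": {"pIC50": 8.1, "logS": -5.0, "tox": 0.30},  # fails strict log S > -5.0
    "mol_D": {"pIC50": 7.0, "logS": -4.9, "tox": 0.40},  # on the inclusive bounds
}

selected = [name for name, props in candidates.items() if in_subspace(props)]
print(selected)  # ['mol_A', 'mol_D']
```

Keeping the constraints as data rather than hard-coding them in an `if` statement mirrors the flexibility of the MPO definition: adding a property or changing a threshold only means editing the `CONSTRAINTS` tuple.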

This MPO approach provides a clear and flexible way to navigate the immense chemical space. By defining subspaces based on relevant properties, we can focus our search on molecules that have a higher probability of becoming successful drugs.

While the concept is powerful, exploring chemical space remains a significant challenge. Synthesizing and testing more than a vanishingly small fraction of the possible molecules is infeasible. This is where computational methods and artificial intelligence come into play. Generative models, for example, can be trained to generate novel molecules that lie within the desired subspace $S$.

By combining our understanding of chemical space with modern computational tools, we can accelerate drug discovery.


  1. Bohacek, Regine S., et al. “The Art and Practice of Structure-Based Drug Design: A Molecular Modeling Perspective.” Medicinal Research Reviews, vol. 16, no. 1, Jan. 1996, pp. 3–50.

  2. https://extrapolations.com/what-is-chemical-space/