Providing a large and diverse set of compounds requires information tools that can compare groups of compounds, search for structural analogues of hit compounds identified by biological evaluation, and cluster large numbers of compounds into structural groups. The foundation of these tools is the quantitative comparison of molecular structures.
As shown in Figure 1, to compare Compound A and Compound B, each structure is divided into partial fragments. The Tanimoto coefficient, defined as the number of fragments common to both compounds divided by the total number of distinct fragments found in either compound, is then used to quantify the degree of similarity (Figure 2).
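The calculation described above can be sketched in a few lines. This is a minimal illustration, assuming each compound has already been reduced to a set of fragment labels; the function name `tanimoto`, the fragment names, and the convention of returning 0.0 for two empty sets are assumptions made for the example and do not come from the figures.

```python
def tanimoto(fragments_a, fragments_b):
    """Tanimoto coefficient: shared fragments / distinct fragments in either compound."""
    a, b = set(fragments_a), set(fragments_b)
    union = a | b
    if not union:
        # Assumed convention: two compounds with no fragments score 0.0.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical fragment sets for two compounds.
compound_a = {"benzene", "amide", "pyridine", "methyl"}
compound_b = {"benzene", "amide", "furan"}

# 2 shared fragments out of 5 distinct fragments.
score = tanimoto(compound_a, compound_b)
print(score)  # 0.4
```

Equivalently, if Compound A has a fragments, Compound B has b fragments, and c fragments are common to both, the coefficient is c / (a + b - c). It ranges from 0 (no shared fragments) to 1 (identical fragment sets).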
Compounds in the library can be listed in order of structural similarity to a hit compound. Figure 3 shows a group of similar compounds that share a constant scaffold as a partial fragment, and Figure 4 shows similar compounds in which the scaffold itself varies.
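A similarity-ordered listing of this kind could look roughly as follows. This is only a sketch: the compound identifiers (`cpd-1` and so on), the fragment labels, and the helper `rank_by_similarity` are hypothetical, and a real library search would work on computed fingerprints rather than hand-written sets.

```python
def tanimoto(fragments_a, fragments_b):
    """Shared fragments divided by distinct fragments in either compound."""
    a, b = set(fragments_a), set(fragments_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_similarity(hit, library):
    """Return (name, score) pairs sorted from most to least similar to the hit."""
    scored = [(name, tanimoto(hit, frags)) for name, frags in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical hit compound and library entries.
hit = {"indole", "amide", "phenyl"}
library = {
    "cpd-1": {"indole", "amide", "phenyl", "methoxy"},  # same scaffold, extra group
    "cpd-2": {"indole", "ester"},                       # same scaffold, little else shared
    "cpd-3": {"benzofuran", "amide", "phenyl"},         # different scaffold
}

ranked = rank_by_similarity(hit, library)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

In this toy data the analogue with a changed scaffold (`cpd-3`) still ranks above one that keeps the scaffold but shares little else, which shows that a fragment-based ranking can surface both kinds of analogue.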
Clustering groups compounds into hierarchical clusters based on the distance between compounds, which is derived from their similarity. In Ward's method, a typical hierarchical clustering method, clusters are merged step by step, and at each step the merge chosen is the one that least increases the variance within clusters. Thus, once a large number of compounds have been hierarchically clustered, a threshold can be set on the hierarchy to divide the compounds into any desired number of groups, each containing compounds separated by small distances.
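The merge-and-cut procedure can be illustrated with a small, unoptimized sketch. It assumes each compound is represented as a binary fragment vector and that distances are Euclidean, which is the setting in which Ward's criterion is normally defined; the function names, the example vectors, and the threshold value are all hypothetical. Production work would typically rely on an established clustering library instead.

```python
from itertools import combinations


def centroid(points):
    """Mean vector of a list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_clusters(vectors, threshold):
    """Merge the cheapest pair repeatedly; stop once the cheapest merge exceeds threshold."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best_pair, best_cost = None, None
        for i, j in combinations(range(len(clusters)), 2):
            cost = ward_cost(
                [vectors[k] for k in clusters[i]],
                [vectors[k] for k in clusters[j]],
            )
            if best_cost is None or cost < best_cost:
                best_pair, best_cost = (i, j), cost
        if best_cost > threshold:
            break  # cutting the hierarchy at this level
        i, j = best_pair
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


# Hypothetical binary fragment vectors for four compounds.
vectors = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
]

groups = ward_clusters(vectors, threshold=1.0)
print(sorted(sorted(g) for g in groups))  # [[0, 1], [2, 3]]
```

Raising the threshold allows more merges and yields fewer, broader groups; lowering it yields more, tighter groups.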
Hierarchical clustering of this kind is used to recombine the Pilot library, classify selected compounds, and select representative compounds.