… to 700. The following analyses were conducted on the 12 standardized subsets.

[Table: Basic description of the studied libraries. Columns: number of all molecules in each library; number of molecules in each library after processing by different filters; description]

Generation of fragment representations
A total of seven fragment representations were used to characterize the structural features and scaffolds of molecules: ring assemblies, bridge assemblies, rings, chain assemblies, Murcko frameworks [7], RECAP fragments [8], and the Scaffold Tree [9]. The first five types of fragment representations were generated with the Generate Fragments component in Pipeline Pilot 8.5 (PP 8.5) [20]. The RECAP fragments and the Scaffold Tree for each molecule were generated with the sdfrag command in MOE [22]. Because the Scaffold Tree produced by the sdfrag command does not include the original molecules, the missing original molecules were added to the Scaffold Tree SDF files using PP 8.5 (Additional file 1: File S1). The Scaffold Tree (from Level 1 to Level n) was then built in PP 8.5 by defining the fragments at each level for every molecule. Finally, the SDF files of these fragment representations were obtained (Additional file 1: File S1).

Analyses of scaffold diversity
The scaffold diversity of each standardized dataset was characterized by fragment counts and by cumulative scaffold frequency plots (CSFPs), also known as cyclic system retrieval (CSR) curves [23, 24].
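A cumulative scaffold frequency plot, together with PC50C (the percentage of scaffolds that covers 50% of the molecules in a library), can be sketched in a few lines of Python. This is a minimal sketch, not the Pipeline Pilot workflow used in the study. It assumes each molecule has already been reduced to a single scaffold string (for example, a canonical SMILES of its Murcko framework or Level 1 scaffold). It also assumes PC50C is read at the first scaffold rank where coverage reaches 50%, without interpolation. The function names `scaffold_frequencies`, `csfp` and `pc50c` are hypothetical.

```python
from collections import Counter


def scaffold_frequencies(scaffolds):
    """Count how many molecules each unique scaffold represents.

    `scaffolds` holds one scaffold identifier per molecule (assumption:
    e.g. a canonical SMILES string); duplicates collapse into one key.
    """
    return Counter(scaffolds)


def csfp(scaffolds):
    """Return CSFP points as (% of unique scaffolds, cumulative % of molecules).

    Scaffolds are sorted from the most to the least frequent; point k gives
    the share of unique scaffolds in the top k and the share of molecules
    those scaffolds cover.
    """
    freqs = sorted(scaffold_frequencies(scaffolds).values(), reverse=True)
    n_mols = len(scaffolds)
    n_scaf = len(freqs)
    points = []
    covered = 0
    for k, f in enumerate(freqs, start=1):
        covered += f  # cumulative scaffold frequency
        points.append((100.0 * k / n_scaf, 100.0 * covered / n_mols))
    return points


def pc50c(scaffolds):
    """Percentage of scaffolds needed to cover at least 50% of the molecules."""
    for pct_scaf, pct_mol in csfp(scaffolds):
        if pct_mol >= 50.0:
            return pct_scaf
    raise ValueError("empty library")


if __name__ == "__main__":
    # Toy library: 10 molecules sharing 4 scaffolds (5, 3, 1 and 1 molecules).
    library = ["A"] * 5 + ["B"] * 3 + ["C", "D"]
    print(csfp(library))
    print(pc50c(library))
```

In the toy library the most frequent scaffold alone covers half of the molecules, so PC50C is 25% of the scaffolds. A lower PC50C indicates that molecules are concentrated on fewer scaffolds, that is, lower scaffold diversity.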
Duplicated fragments were removed first. The number of unique fragments in each dataset was then counted for ring assemblies, bridge assemblies, rings, chain assemblies, Murcko frameworks, RECAP fragments and Levels 0–1 of the Scaffold Tree, together with the number of molecules each fragment represents (referred to as the scaffold frequency). Next, the scaffolds were sorted by scaffold frequency from the most to the least frequent, and the cumulative percentage of molecules was computed as the cumulative scaffold frequency divided by the total number of molecules [12]. The cumulative percentage of unique fragments can be calculated in the same way. CSFPs were then generated from either the number or the percentage of Murcko frameworks and Level 1 scaffolds, which may represent whole molecules better than the other types of fragments. In each CSFP, PC50C was determined for every scaffold representation to quantify the distribution of molecules over scaffolds. PC50C was defined as the percentage of scaffolds that represent 50% of the molecules in a library [14].

Fig. 2 Box plots of the distributions of molecular weight for the 12 studied databases

Shang et al. J Cheminform (2017) 9:Page 5 of

Generation of Tree Maps
The Tree Maps methodology was used to analyze the structural similarity of the Level 1 scaffolds with the TreeMap software, which can highlight both the structural diversity of scaffolds and the distribution of compounds over scaffolds. Tree Maps have been used as an effective tool to depict structure–activity relationships (SARs) and to analyze scaffold diversity [25]. A traditional tree structure is drawn as a graph with the root node at the top and child nodes below it. In contrast, the Tree Maps proposed by Shneiderman use circles or rectangles in a 2D space-filling layout to allocate space to clustered data.
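The space-filling idea behind Tree Maps can be illustrated with the slice-and-dice layout from Shneiderman's original treemap work. This section does not say which layout algorithm the TreeMap software implements, so this is only an illustration of the principle, not a reproduction of that software. The sketch assumes the Level 1 scaffolds have already been grouped into clusters of structurally similar scaffolds. Each leaf's weight is the number of compounds sharing that scaffold. `Node` and `slice_and_dice` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A treemap node: a cluster (with children) or a scaffold leaf.

    Assumption: a leaf's weight is the number of compounds with that scaffold.
    """
    name: str
    weight: float = 0.0
    children: list = field(default_factory=list)

    def total(self):
        if not self.children:
            return self.weight
        return sum(c.total() for c in self.children)


def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Assign each leaf a rectangle (x, y, w, h) with area proportional to its weight.

    Children split the parent rectangle along x at even depths and along y
    at odd depths (Shneiderman's slice-and-dice scheme).
    """
    if out is None:
        out = {}
    if not node.children:
        out[node.name] = (x, y, w, h)
        return out
    total = node.total()
    if total == 0:
        return out  # nothing to draw for an empty cluster
    offset = 0.0
    for child in node.children:
        frac = child.total() / total
        if depth % 2 == 0:
            slice_and_dice(child, x + offset * w, y, frac * w, h, depth + 1, out)
        else:
            slice_and_dice(child, x, y + offset * h, w, frac * h, depth + 1, out)
        offset += frac
    return out


if __name__ == "__main__":
    # Hypothetical clusters of Level 1 scaffolds with compound counts.
    root = Node("root", children=[
        Node("cluster1", children=[Node("s1", 6), Node("s2", 2)]),
        Node("cluster2", children=[Node("s3", 2)]),
    ])
    for name, rect in slice_and_dice(root, 0.0, 0.0, 10.0, 10.0).items():
        print(name, rect)
```

Because every rectangle's area is proportional to its compound count, heavily populated scaffolds stand out at a glance. Grouping leaves by cluster keeps structurally similar scaffolds adjacent. Circle-packing or squarified layouts follow the same principle with different shapes.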