Toward Safer Medicines: Using Thomson Reuters Data to Refine QSAR Models

May 2013

The ability to accurately predict the toxicity of drug candidates from their chemical structure is critical for guiding experimental drug discovery toward safer medicines. In silico Quantitative Structure-Activity Relationship (QSAR) approaches have been applied successfully as early screens, both to weed out potentially toxic chemicals or chemical scaffolds before further testing and development as drugs, and to identify environmental chemicals that might pose a health risk so that they can be prioritized for additional testing. In two recent publications, a team from Thomson Reuters, using the company’s high-quality, manually curated biomedical databases and its systems pharmacology platform MetaDrug™, investigated the optimal design of QSAR models in two scenarios where QSAR approaches have often failed in the past.

The first publication, “Prediction of organ toxicity endpoints by QSAR modeling based on precise chemical-histopathology annotations,” published in the September 2012 issue of Chemical Biology and Drug Design, describes how the Thomson Reuters group built a carefully curated database of organ toxicity and used it to build QSAR models that identify the potential toxicity of drugs, food additives, industrial chemicals, and other compounds. Through painstaking manual annotation of scientific and regulatory reports, terms for organ damage were associated with the compounds that caused the toxicity. These terms were strictly standardized and organized into a hierarchy based on the type of damage and on the organ structures and cell types affected, and strict guidelines governed the level and type of information required to define each association. Using “atom fragment” fingerprints derived from the chemical structures of a wide variety of compounds in the database, together with a recursive-partitioning algorithm to discriminate toxic from non-toxic fingerprints, the team derived models predictive of various types of damage to the liver and kidney. A rigorous, two-stage validation process showed that the models held up well when challenged with chemicals of various types outside the initial training set.
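The general idea of recursive partitioning over structural fingerprints can be illustrated with a minimal sketch. It is not the team's implementation: the paper's fingerprint definition, splitting criterion, and stopping rules are not described here, so the sketch assumes binary fragment-presence fingerprints, Gini impurity as the split criterion, and a fixed depth limit. The four-bit fingerprints and their toxicity labels are hypothetical toy values, not data from the study.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (assumed split criterion)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the fragment bit whose presence/absence split reduces impurity most, or None."""
    parent = gini(labels)
    best_bit, best_gain = None, 1e-12  # require a strictly positive gain
    for bit in range(len(rows[0])):
        present = [y for x, y in zip(rows, labels) if x[bit]]
        absent = [y for x, y in zip(rows, labels) if not x[bit]]
        if not present or not absent:
            continue  # this fragment does not divide the compounds
        weighted = (len(present) * gini(present) + len(absent) * gini(absent)) / len(rows)
        if parent - weighted > best_gain:
            best_bit, best_gain = bit, parent - weighted
    return best_bit

def grow(rows, labels, depth=0, max_depth=3):
    """Recursively partition the compounds; a leaf stores the majority class."""
    bit = best_split(rows, labels) if depth < max_depth else None
    if bit is None:
        return Counter(labels).most_common(1)[0][0]
    present = [(x, y) for x, y in zip(rows, labels) if x[bit]]
    absent = [(x, y) for x, y in zip(rows, labels) if not x[bit]]
    return {
        "bit": bit,
        "present": grow([x for x, _ in present], [y for _, y in present], depth + 1, max_depth),
        "absent": grow([x for x, _ in absent], [y for _, y in absent], depth + 1, max_depth),
    }

def predict(tree, fingerprint):
    """Walk the tree until a leaf (a class label) is reached."""
    while isinstance(tree, dict):
        tree = tree["present"] if fingerprint[tree["bit"]] else tree["absent"]
    return tree

# Hypothetical toy data: each tuple marks presence (1) or absence (0) of four fragments.
TRAIN_FPS = [(1, 0, 1, 0), (1, 1, 0, 0), (1, 0, 0, 1),
             (0, 1, 1, 0), (0, 0, 1, 1), (0, 1, 0, 1)]
TRAIN_LABELS = [1, 1, 1, 0, 0, 0]  # 1 = toxic, 0 = non-toxic (made up)

tree = grow(TRAIN_FPS, TRAIN_LABELS)
print(tree)
print(predict(tree, (1, 1, 1, 1)))
```

In this toy set a single fragment separates the two classes, so the tree stops after one split. Real fingerprints have hundreds or thousands of bits, and a model of this kind would need validation on compounds outside the training set, as the paper describes.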

The second report, “Assessment of hydroxylated metabolites of polychlorinated biphenyls as potential xenoestrogens: A QSAR comparative analysis,” published in the April 2013 issue of SAR and QSAR in Environmental Research, represents a collaboration between the Centers for Disease Control and Prevention, Thomson Reuters, and Leadscope Inc. This report addresses a different, but related, hurdle in QSAR model development. The authors demonstrate that models for estrogen receptor activation derived from a diverse set of structures of varying estrogenic potential fail to discriminate between estrogenic and non-estrogenic hydroxylated metabolites of polychlorinated biphenyls (OH-PCBs), classifying all hydroxylated derivatives as estrogenic. To predict the true estrogenic potential of these closely related structures, the authors built a focused QSAR model using a training set composed only of OH-PCBs, and then used the model to predict more accurately the activities of 37 OH-PCBs reported in worldwide human biomonitoring studies.

QSAR models predictive of complex endpoints such as organ damage have proven difficult to build in the past. The first paper demonstrates that, given a high-quality, well-defined dataset to train the model, predictors with a useful level of accuracy and precision can be derived, opening the way to applying QSAR methodologies to more clinically relevant toxicities than had previously been thought possible. The second paper further demonstrates that QSAR models must be fit for purpose, and that careful design of the training set can yield focused models that, although limited in scope, are important components of in silico environmental risk assessment strategies.

drug development, pharmacology, pharmaceuticals, Quantitative Structure Activity, QSAR, drug toxicity, estrogenic drugs, drug safety

The data and citation records included in this report are from Thomson Reuters Web of Science™. Web of Science™ is a registered trademark of Thomson Reuters. All rights reserved.