Collation and data-mining of literature bioactivity data for drug discovery

Published: 2011-09-21 Issue: 5 Volume: 39 Pages: 1365-1370
ISSN: 0300-5127
Journal: Biochemical Society Transactions
Language: en

Authors:

Bellis Louisa J.¹, Akhtar Ruth¹, Al-Lazikani Bissan², Atkinson Francis¹, Bento A. Patricia¹, Chambers Jon¹, Davies Mark¹, Gaulton Anna¹, Hersey Anne¹, Ikeda Kazuyoshi¹, Krüger Felix A.¹, Light Yvonne¹, McGlinchey Shaun¹, Santos Rita¹, Stauch Benjamin¹, Overington John P.¹

Affiliations:

1. Computational Chemical Biology, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, U.K.

2. Computational Biology and Chemogenomics, Institute of Cancer Research, 15 Cotswold Road, Sutton, Surrey SM2 5NG, U.K.

Abstract

Translating the huge amount of genomic and biochemical data into new drugs is a costly and difficult task. Historically, there has been comparatively little focus on linking the biochemical and chemical worlds. To address this need, we have developed ChEMBL, an online resource of small-molecule SAR (structure–activity relationship) data, which can be used to support chemical biology, lead discovery and target selection in drug discovery. The database contains the abstracted structures, properties and biological activities of over 700000 distinct compounds, with more than 3 million bioactivity records abstracted from over 40000 publications. Additional public domain resources, such as PubChem BioAssay data, can readily be integrated into the same data model. The compounds in ChEMBL are largely extracted from the primary medicinal chemistry literature, and are therefore usually ‘drug-like’ or ‘lead-like’ small molecules with full experimental context. The data cover a significant fraction of the discovery of modern drugs, and are useful in a wide range of drug design and discovery tasks. In addition to the compound data, ChEMBL contains information on over 8000 protein, cell line and whole-organism ‘targets’, over 4000 of which are proteins linked to their underlying genes. The database can be searched both chemically, using an interactive compound sketch tool, SMILES strings, compound research codes and keywords, and biologically, using a variety of gene identifiers, protein sequences, protein sequence similarity and protein family hierarchies. The retrieved information can then be filtered and downloaded in various formats. ChEMBL can be accessed online at https://www.ebi.ac.uk/chembldb.
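The abstract describes bioactivity records that link a compound (with its structure) to a target and to the publication the measurement was abstracted from, and notes that search results can be filtered and downloaded. A minimal sketch of that idea with a toy in-memory model follows. It is an assumption-laden illustration only: the `Activity` record, its field names, the `filter_activities` and `to_csv` helpers, and every identifier and value in the example data are hypothetical and invented here; they are not ChEMBL's actual schema, interface or data.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


# Hypothetical, simplified bioactivity record loosely following the
# compound / target / publication structure described in the abstract.
# NOT the real ChEMBL schema or field names.
@dataclass(frozen=True)
class Activity:
    compound_id: str    # hypothetical compound identifier
    smiles: str         # compound structure as a SMILES string
    target_id: str      # hypothetical target identifier
    target_type: str    # e.g. "protein", "cell line", "organism"
    activity_type: str  # e.g. "IC50", "Ki"
    value: float
    units: str
    document_id: str    # hypothetical ID of the source publication


def filter_activities(records, target_type=None, activity_type=None,
                      units=None, max_value=None):
    """Keep records matching every criterion that is not None.

    max_value only makes sense together with a single `units` value,
    because values reported in different units are not comparable.
    """
    return [
        r for r in records
        if (target_type is None or r.target_type == target_type)
        and (activity_type is None or r.activity_type == activity_type)
        and (units is None or r.units == units)
        and (max_value is None or r.value <= max_value)
    ]


def to_csv(records):
    """Serialise records to CSV text: a header row, then one row per record."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(Activity)])
    writer.writeheader()
    for r in records:
        writer.writerow(asdict(r))
    return buf.getvalue()


# Toy example data: identifiers and values are invented, not real measurements.
records = [
    Activity("CPD-1", "CC(=O)Oc1ccccc1C(=O)O", "TGT-A", "protein", "IC50", 50.0, "nM", "DOC-1"),
    Activity("CPD-2", "c1ccccc1O", "TGT-A", "protein", "IC50", 5000.0, "nM", "DOC-1"),
    Activity("CPD-3", "CCO", "TGT-B", "cell line", "IC50", 20.0, "nM", "DOC-2"),
]

potent = filter_activities(records, target_type="protein", activity_type="IC50",
                           units="nM", max_value=100.0)
print([r.compound_id for r in potent])  # ['CPD-1']
print(to_csv(potent))
```

Keeping the publication identifier on every record mirrors the abstract's point that each measurement carries its experimental context, so a filtered export can still be traced back to its source paper.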

Publisher

Portland Press Ltd.

Subject

Biochemistry

Link

https://portlandpress.com/biochemsoctrans/article-pdf/39/5/1365/551268/bst0391365.pdf

Cited by 27 articles.

1. Big data and artificial intelligence (AI) methodologies for computer-aided drug design (CADD);Biochemical Society Transactions;2022-01-25

2. Milk-Way Algorithm applied in Imbalanced Dataset;2021-02-23

3. Design of Novel Drug-like Molecules Using Informatics Rich Secondary Metabolites Analysis of Indian Medicinal and Aromatic Plants;Combinatorial Chemistry & High Throughput Screening;2020-12-28

4. Database Resources for Drug Discovery;Computer-Aided Drug Design;2020

5. Evidence-Based Precision Oncology with the Cancer Targetome;Trends in Pharmacological Sciences;2017-12