Here is your answer, Edouard:
1. PCBA – This is an acronym for PubChem BioAssay Database.
How to Access PCBA: You can access PCBA for free! Just go to: http://ift.tt/18M7HgQ
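If you would rather pull assay records with a script than through the website, PubChem also has a REST interface called PUG REST. The snippet below is only a sketch that builds the request URL for one assay. The helper name `assay_url` is made up for this example, and AID 1000 is just a placeholder assay ID, not one taken from the Google paper.

```python
from urllib.parse import quote

# Base address of PubChem's PUG REST service.
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def assay_url(aid: int, operation: str = "description", fmt: str = "JSON") -> str:
    """Build a PUG REST URL for a single BioAssay record (hypothetical helper)."""
    if aid <= 0:
        raise ValueError("AID must be a positive integer")
    return f"{PUG_REST_BASE}/assay/aid/{aid}/{quote(operation)}/{quote(fmt)}"


# Placeholder assay ID, used purely for illustration.
example_url = assay_url(1000)
print(example_url)
```

Opening the printed URL in a browser, or passing it to `urllib.request.urlopen`, should return the assay description; check the PubChem documentation for the other operations and output formats before relying on this.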
Hot tips for using PCBA in searches –
Summary: PCBA was set up by the US NIH and contains over 500,000 descriptions of assay
protocols covering 5,000 protein targets, 30,000 gene targets, and over 130,000
million bioactivity compounds. This is by far the biggest database of molecules, and it
far outstrips ChEMBL (1.3 million bioactivity compounds) and Kaggle (164,000 compounds).
ChEMBL has 5,000 drug targets and Kaggle has 15 drug targets. Google only looked
at 128 experiments from this database. They found only 1.8% of the molecules were active.
Molecules included in PCBA: small molecules (synthetic and natural) and RNAi reagents.
What it does not cover: PCBA does not cover large molecules (antibodies, hormones, etc.)
and does not cover any of the non-coding target genes that are now thought to
account for > 50% of aging, 90% of autoimmune diseases, and 93% of
variations in human diseases. In other words, the protein targets and
protein-coding gene targets in PCBA do not cover the largest component
of aging and human disease, which involves the non-coding portion of the genome.
(See the references and the Science article from May 2011, below.)
2. DUD-E – This is an acronym for Directory of Useful Decoys, Enhanced.
How to access DUD-E: Go to the following site: http://dude.docking.org
From there, you can just browse the DUD-E targets or download the software package.
To generate decoys, go here: http://ift.tt/1xMV2Fz
Summary: This is a database that was set up by the Dept. of Pharmacology at the University of
California, San Francisco (UCSF) and is designed to benchmark molecular docking of
various compounds. It is an enhanced and rebuilt version of DUD, the older
docking benchmark set. It includes 22,886 active compounds and their affinities
against 102 targets (proteins), with 50 decoys for each active. It is free and can be easily
accessed by anyone. The Google authors looked at 102 datasets (protein targets) and found 1.6%
of the compounds to be active.
What it does not cover: DUD-E does not look at assay data (such as assay artifacts). It also does not
cover decoys that are large molecules (such as antibodies, cytokines, hormones, etc.).
It also does not cover RNA (such as RNAi, lncRNA, etc.).
3. MUV – This is an acronym for Maximum Unbiased Validation.
How to Access MUV: First go to the main page for the virtual drug screening site:
Then download the MUV datasets from the following site:
Summary: This is a virtual drug screening database that was set up by the University of Technology,
Carolo-Wilhelmina, in Braunschweig, Germany. It uses a concept called “nearest neighbor
analysis” to design benchmark data sets, based on PubChem bioactivity data. The computer
program first removes assay artifacts (Hill slope filter, frequency of hits filter, autofluorescence
filter, and luciferase inhibitor filter), then it removes actives not embedded in decoys (Ex:
chemical space embedding filter), then it designs datasets of actives (common spread), then
it designs datasets of decoys (common separation). The first MUV version contained 17 datasets
and corresponding decoys. You can also generate further datasets yourself and create
your own MUVs. The Google paper authors only looked at 17 datasets using MUV.
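The first two filtering stages described above can be sketched roughly as follows. This is a toy illustration and not the actual MUV software: the `Compound` fields, the slope range, the hit-count limit, and the embedding radius are all assumptions made up for the example, and the later selection of actives and decoys (common spread and common separation) is left out.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class Compound:
    """Toy compound record; every field here is a hypothetical stand-in."""
    name: str
    descriptor: tuple              # toy descriptor vector (point in chemical space)
    hill_slope: float = 1.0        # slope of the dose-response curve
    hit_count: int = 0             # number of other assays where it was a hit
    autofluorescent: bool = False
    luciferase_inhibitor: bool = False


def passes_artifact_filters(c, slope_range=(0.5, 2.0), max_hits=25):
    """Stage 1: drop likely assay artifacts (placeholder thresholds)."""
    low, high = slope_range
    return (low <= c.hill_slope <= high
            and c.hit_count <= max_hits
            and not c.autofluorescent
            and not c.luciferase_inhibitor)


def is_embedded(active, decoys, radius=1.0):
    """Stage 2: keep an active only if some decoy lies within `radius` of it."""
    return any(dist(active.descriptor, d.descriptor) <= radius for d in decoys)


def filter_actives(actives, decoys):
    """Apply the artifact filters first, then the embedding filter."""
    clean = [a for a in actives if passes_artifact_filters(a)]
    return [a for a in clean if is_embedded(a, decoys)]


actives = [
    Compound("a1", (0.0, 0.0)),
    Compound("a2", (0.0, 0.0), hill_slope=5.0),            # fails slope filter
    Compound("a3", (10.0, 10.0)),                          # no decoy nearby
    Compound("a4", (0.5, 0.0), luciferase_inhibitor=True), # flagged artifact
]
decoys = [Compound("d1", (0.5, 0.5))]

kept = [c.name for c in filter_actives(actives, decoys)]
print(kept)
```

In this made-up data only `a1` survives both stages; the other three are dropped for the reasons noted in the comments.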
Molecules included: Inhibitors, Agonists, Antagonists, Allosteric modulators, and allosteric inhibitors
for virtually any protein target. (Ex: GPCRs, Kinases, Nuclear receptors, RNases, Chaperones,
PPIs, Proteases, Receptor tyrosine kinases, etc.)
Molecules included in MUV: http://ift.tt/1xMV2FF
4. Tox21 – This is an abbreviation for “Toxicology in the 21st Century”. It was actually a contest
where there were applicants, registration, and winners. The winners were announced
on January 26, 2015.
How to access Tox21: First go to the main page for this toxicology database:
Summary: This is one of the first scientific contests to use “crowdsourced” data from independent
researchers to reveal how well they can predict compounds' interference in biochemical
pathways, using only chemical structure data. The contest was set up by the NIH’s
National Center for Advancing Translational Sciences (NCATS), by the EPA, and
by the FDA. The contest had competitors from 18 countries. The winners included:
1. Team Bioinf from Johannes Kepler University in Linz, Austria. They won the Grand
Challenge, the Nuclear Receptor Panel award, and the Stress response panel award.
Team Bioinf accurately predicted their compounds in all 12 assays.
Their challenge assays included the following:
a) the stress response panel, which included the aryl hydrocarbon receptor and the nuclear factor
(erythroid-derived 2)-like 2 antioxidant response element (aka Nrf2/ARE);
b) the estrogen receptor alpha and the Heat Shock Factor response element (HSFRE);
c) the androgen receptor ligand binding domain;
d) the nuclear receptor signaling panel, including the PPARgamma receptor
2. Team AMAZIZ – This team was from the Technical University of Munich.
They won the award for “Best Balanced Accuracy”.
Team AMAZIZ figured out two assays:
– ATAD5 and one for the Mitochondrial membrane potential (MtP)
3. Team Dmlab – This team from Budapest University of Technology figured out three assays:
– Androgen receptor, aromatase, and p53
4. Team Microsomes – This team from Meiji Pharmaceutical University figured out one assay:
– Estrogen receptor alpha ligand binding domain
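For context on the “Best Balanced Accuracy” award mentioned above: balanced accuracy is usually defined as the average of sensitivity (the fraction of actives correctly found) and specificity (the fraction of inactives correctly rejected). That matters for toxicity data, where actives are rare. The sketch below assumes that usual definition; the contest's exact scoring rules are not given here, and the labels are invented.

```python
def balanced_accuracy(y_true, y_pred):
    """Average of sensitivity and specificity for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Invented example: 1 active among 10 compounds, model predicts "inactive" for all.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0] * 10

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                     # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

A model that calls everything inactive looks 90% accurate on this toy set but scores only 0.5 on balanced accuracy, which is the same as guessing.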