
Therapeutics Data Commons is an open-science platform with AI/ML-ready datasets and learning tasks for therapeutics, spanning the discovery and development of safe and effective medicines. TDC also provides an ecosystem of tools, libraries, leaderboards, and community resources, including data functions, strategies for systematic model evaluation, meaningful data splits, data processors, and molecule generation oracles. All resources are integrated and accessible via an open Python library.
Our Vision: Therapeutics machine learning is a field with broad opportunities for expansion, innovation, and impact. The collection of curated datasets, learning tasks, and benchmarks in Therapeutics Data Commons (TDC) serves as a meeting point for domain and machine learning scientists. TDC is the first unifying framework to systematically access and evaluate machine learning across the entire range of therapeutics. We envision that TDC can facilitate algorithmic and scientific advances and considerably accelerate machine-learning model development, validation, and transition into biomedical and clinical implementation.
Note
See the TDC website to learn about machine learning for drug development and discovery and get more information on datasets, tasks, leaderboards, data functions, and other features available in Therapeutics Data Commons.
Cite our NeurIPS 2021 Datasets and Benchmarks Paper:
@article{Huang2021tdc,
  title={Therapeutics Data Commons: Machine Learning Datasets and Tasks for Drug Discovery and Development},
  author={Huang, Kexin and Fu, Tianfan and Gao, Wenhao and Zhao, Yue and Roohani, Yusuf and Leskovec, Jure and Coley, Connor W and Xiao, Cao and Sun, Jimeng and Zitnik, Marinka},
  journal={Proceedings of Neural Information Processing Systems, NeurIPS Datasets and Benchmarks},
  year={2021}
}
Getting Started
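As a starting point, the following is a minimal sketch of loading one dataset through the Python library. It assumes the `PyTDC` package is installed (for example with `pip install PyTDC`) and that network access is available for the first download. The dataset name `Caco2_Wang` and the helper `load_adme_split` are illustrative choices rather than anything defined on this page; see the TDC website for the datasets that are actually available.

```python
def load_adme_split(name="Caco2_Wang"):
    """Load an ADME dataset by name and return its default data split.

    `name` is an illustrative default; any ADME dataset name listed on the
    TDC website should work the same way.
    """
    # Imported lazily so that defining this helper does not require PyTDC.
    from tdc.single_pred import ADME

    data = ADME(name=name)      # downloads and caches the dataset on first use
    return data.get_split()     # dict of DataFrames keyed by split name


if __name__ == "__main__":
    split = load_adme_split()
    for part, frame in split.items():
        # Report the size of each partition (train / valid / test).
        print(part, frame.shape)
```

Other task families listed in the API reference (for example `tdc.multi_pred`) are expected to follow the same load-then-split pattern, though the loader class and dataset names differ per task.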
API References
- tdc.single_pred
- tdc.single_pred.single_pred_dataset module
- tdc.single_pred.adme module
- tdc.single_pred.crispr_outcome module
- tdc.single_pred.develop module
- tdc.single_pred.epitope module
- tdc.single_pred.hts module
- tdc.single_pred.paratope module
- tdc.single_pred.qm module
- tdc.single_pred.test_single_pred module
- tdc.single_pred.tox module
- tdc.single_pred.yields module
- tdc.multi_pred
- tdc.multi_pred.bi_pred_dataset module
- tdc.multi_pred.multi_pred_dataset module
- tdc.multi_pred.antibodyaff module
- tdc.multi_pred.catalyst module
- tdc.multi_pred.ddi module
- tdc.multi_pred.drugres module
- tdc.multi_pred.drugsyn module
- tdc.multi_pred.dti module
- tdc.multi_pred.gda module
- tdc.multi_pred.mti module
- tdc.multi_pred.peptidemhc module
- tdc.multi_pred.ppi module
- tdc.multi_pred.test_multi_pred module
- tdc.generation
- tdc.benchmark_group
- tdc.utils
- tdc.utils.label module
- tdc.utils.label_name_list module
- tdc.utils.load module
  - atom_to_one_hot()
  - bi_distribution_dataset_load()
  - bm_download_wrapper()
  - bm_group_load()
  - dataverse_download()
  - distribution_dataset_load()
  - download_wrapper()
  - extract_atom_from_mol()
  - extract_atom_from_protein()
  - general_load()
  - generation_dataset_load()
  - generation_paired_dataset_load()
  - interaction_dataset_load()
  - multi_dataset_load()
  - oracle_download_wrapper()
  - oracle_load()
  - pd_load()
  - process_crossdock()
  - process_dude()
  - process_pdbbind()
  - process_scpdb()
  - property_dataset_load()
  - receptor_download_wrapper()
  - receptor_load()
  - three_dim_dataset_load()
  - zip_data_download_wrapper()
- tdc.utils.misc module
- tdc.utils.query module
- tdc.utils.retrieve module
- tdc.utils.split module
- tdc.chem_utils
- tdc.chem_utils.featurize module
- tdc.chem_utils.featurize.molconvert submodule
  - MolConvert
  - MoleculeFingerprint
  - atom2onehot()
  - atomstring2atomfeature()
  - bondtype2idx()
  - canonicalize()
  - distance3d()
  - get_atom_features()
  - get_mol()
  - mol2file2smiles()
  - mol2smiles()
  - mol_conformer2graph3d()
  - molfile2PyG()
  - molfile2smiles()
  - onek_encoding_unk()
  - raw3D2pyg()
  - sdffile2coulomb()
  - sdffile2graph3d_lst()
  - sdffile2mol_conformer()
  - sdffile2selfies_lst()
  - sdffile2smiles_lst()
  - selfies2smiles()
  - smiles2DGL()
  - smiles2ECFP2()
  - smiles2ECFP4()
  - smiles2ECFP6()
  - smiles2PyG()
  - smiles2daylight()
  - smiles2graph2D()
  - smiles2maccs()
  - smiles2mol()
  - smiles2morgan()
  - smiles2rdkit2d()
  - smiles2selfies()
  - smiles_lst2coulomb()
  - upper_atom()
  - xyzfile2coulomb()
  - xyzfile2graph3d()
  - xyzfile2selfies()
  - xyzfile2smiles()
- tdc.chem_utils.oracle module
- tdc.chem_utils.oracle.filter submodule
- tdc.chem_utils.oracle.oracle submodule
  - AbsoluteScoreModifier
  - AtomCounter
  - ChainedModifier
  - ClippedScoreModifier
  - GaussianModifier
  - Isomer_scoring
  - Isomer_scoring_prev
  - LinearModifier
  - MPO_meta
  - MinMaxGaussianModifier
  - PyScreener_meta
  - SA()
  - SMARTS_scoring
  - ScoreModifier
  - Score_3d
  - SmoothClippedScoreModifier
  - SquaredModifier
  - ThresholdedLinearModifier
  - Vina_3d
  - Vina_smiles
  - amlodipine_mpo()
  - askcos()
  - calculateScore()
  - canonicalize()
  - cyp3a4_veith()
  - deco_hop()
  - drd2()
  - fexofenadine_mpo()
  - fingerprints_from_mol()
  - get_PHCO_fingerprint()
  - gsk3b()
  - ibm_rxn()
  - isomer_meta()
  - isomer_meta_prev()
  - jnk3
  - load_cyp3a4_veith()
  - load_drd2_model()
  - load_gsk3b_model()
  - load_pickled_model()
  - median_meta
  - molecule_one_retro
  - numBridgeheadsAndSpiro()
  - osimertinib_mpo()
  - parse_molecular_formula()
  - penalized_logp()
  - perindopril_mpo()
  - qed()
  - ranolazine_mpo()
  - readFragmentScores()
  - rediscovery_meta
  - scaffold_hop()
  - similarity()
  - similarity_meta
  - sitagliptin_mpo()
  - sitagliptin_mpo_prev()
  - smiles2formula()
  - smiles_2_fingerprint_AP()
  - smiles_2_fingerprint_ECFP4()
  - smiles_2_fingerprint_ECFP6()
  - smiles_2_fingerprint_FCFP4()
  - smiles_to_rdkit_mol()
  - smina()
  - tree_analysis()
  - valsartan_smarts()
  - zaleplon_mpo()
  - zaleplon_mpo_prev()
- tdc.chem_utils.evaluator module
  - calculate_internal_pairwise_similarities()
  - calculate_pc_descriptors()
  - canonicalize()
  - continuous_kldiv()
  - discrete_kldiv()
  - diversity()
  - fcd_distance()
  - fcd_distance_tf()
  - fcd_distance_torch()
  - get_fingerprints()
  - get_mols()
  - kl_divergence()
  - novelty()
  - single_molecule_validity()
  - unique_lst_of_smiles()
  - uniqueness()
  - validity()
- tdc.base_dataset
- tdc.evaluator
- tdc.metadata
- tdc.oracles
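
The evaluator entries listed above (`uniqueness()`, `novelty()`, and related functions) name metrics commonly used to assess generated molecules. The sketch below illustrates the usual definitions in plain Python; it is not the library's implementation, and the library's exact definitions may differ. It assumes the SMILES strings are already canonicalized, so that identical molecules compare equal as strings, and the example molecules are made up for illustration.

```python
def uniqueness(smiles_list):
    """Fraction of entries in `smiles_list` that are distinct strings."""
    if not smiles_list:
        return 0.0
    return len(set(smiles_list)) / len(smiles_list)


def novelty(generated, training):
    """Fraction of distinct generated strings that do not occur in `training`."""
    unique_generated = set(generated)
    if not unique_generated:
        return 0.0
    # Set difference keeps only molecules absent from the training data.
    novel = unique_generated - set(training)
    return len(novel) / len(unique_generated)


# Illustrative inputs (assumed to be canonical SMILES).
generated = ["CCO", "CCO", "c1ccccc1", "CC(=O)O"]
training = ["CCO", "CCN"]

print("uniqueness:", uniqueness(generated))
print("novelty:", novelty(generated, training))
```

In practice the strings would first be canonicalized with a cheminformatics toolkit, since two different SMILES strings can describe the same molecule and would otherwise be counted as distinct.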