tdc.chem_utils

tdc.chem_utils.featurize module

tdc.chem_utils.featurize.molconvert submodule

class tdc.chem_utils.featurize.molconvert.MolConvert(src='SMILES', dst='Graph2D', radius=2, nBits=1024)[source]

Bases: object

MolConvert: convert a molecule from the src format to the dst format.

Example

    convert = MolConvert(src='SMILES', dst='Graph2D')
    g = convert('Clc1ccccc1C2C(=C(/N/C(=C2/C(=O)OCC)COCCN)C)C(=O)OC')
    # g: graph with edge, node features
    g = convert(['Clc1ccccc1C2C(=C(/N/C(=C2/C(=O)OCC)COCCN)C)C(=O)OC',
                 'CCCOc1cc2ncnc(Nc3ccc4ncsc4c3)c2cc1S(=O)(=O)C(C)(C)C'])
    # g: a list of graphs with edge, node features

If src is 2D, dst can only be a 2D output. If src is 3D, dst can be either a 2D or a 3D output.

src:
  2D - [SMILES, SELFIES]
  3D - [SDF file, XYZ file]

dst:
  2D - [2D Graph (+ PyG, DGL format), Canonical SMILES, SELFIES, Fingerprints]
  3D - [3D graphs (adj matrix entry is (distance, bond type)), Coulomb Matrix]

static eligible_format(src=None)[source]

Given a src format, return all dst formats available for it.

Example

    MolConvert.eligible_format('SMILES')  # ['Graph', 'SMARTS', …]

class tdc.chem_utils.featurize.molconvert.MoleculeFingerprint(fp='ECFP4')[source]

Bases: object

Example

    MolFP = MoleculeFingerprint(fp='ECFP6')
    out = MolFP('Clc1ccccc1C2C(=C(/N/C(=C2/C(=O)OCC)COCCN)C)C(=O)OC')
    # np.array([1, 0, 1, …..])
    out = MolFP(['Clc1ccccc1C2C(=C(/N/C(=C2/C(=O)OCC)COCCN)C)C(=O)OC',
                 'CCCOc1cc2ncnc(Nc3ccc4ncsc4c3)c2cc1S(=O)(=O)C(C)(C)C'])
    # np.array([[1, 0, 1, …..],
    #           [0, 0, 1, …..]])

Supported fingerprints: Basic_Descriptors (atoms, chirality, ….), ECFP2, ECFP4, ECFP6, MACCS, Daylight-type, RDKit2D, Morgan, PubChem

tdc.chem_utils.featurize.molconvert.bondtype2idx(bond_type)[source]
tdc.chem_utils.featurize.molconvert.canonicalize(smiles)[source]
tdc.chem_utils.featurize.molconvert.distance3d(coordinate_1, coordinate_2)[source]
tdc.chem_utils.featurize.molconvert.get_atom_features(atom)[source]
tdc.chem_utils.featurize.molconvert.get_mol(smiles)[source]
tdc.chem_utils.featurize.molconvert.mol2file2smiles(molfile)[source]

Convert a mol2 file into a SMILES string.

Parameters

molfile – str, a mol2 file.

Returns

str, a SMILES string

Return type

smiles

tdc.chem_utils.featurize.molconvert.mol2smiles(mol)[source]
tdc.chem_utils.featurize.molconvert.mol_conformer2graph3d(mol_conformer_lst)[source]

Convert a list of (molecule, conformer) tuples into a list of 3D graphs.

Parameters

mol_conformer_lst – list of (molecule, conformer) tuples

Returns

a list of 3D graphs.

Each graph has (i) idx2atom (dict); (ii) distance_adj_matrix (np.array); (iii) bondtype_adj_matrix (np.array)

Return type

graph3d_lst
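The three components listed above can be pictured with a small pure-Python sketch. It is only an illustration under stated assumptions: the helper name `build_graph3d`, the integer bond codes, the use of nested lists instead of np.array, and the choice to store distances for every atom pair are all hypothetical and not taken from the TDC source. The coordinates are toy values chosen for easy arithmetic, not a real geometry.

```python
import math
from typing import Dict, List, Tuple

Atom = Tuple[str, Tuple[float, float, float]]  # (symbol, (x, y, z))
Bond = Tuple[int, int, int]                    # (atom i, atom j, hypothetical bond code)


def build_graph3d(atoms: List[Atom], bonds: List[Bond]):
    """Build idx2atom, a distance matrix, and a bond-type matrix (assumed layout)."""
    n = len(atoms)
    # (i) map from atom index to atom symbol
    idx2atom: Dict[int, str] = {i: symbol for i, (symbol, _) in enumerate(atoms)}
    # (ii) Euclidean distance between every pair of atoms (assumption: all pairs)
    distance_adj_matrix = [
        [math.dist(atoms[i][1], atoms[j][1]) for j in range(n)] for i in range(n)
    ]
    # (iii) symmetric bond-type matrix; 0 means "no bond" in this sketch
    bondtype_adj_matrix = [[0] * n for _ in range(n)]
    for i, j, code in bonds:
        bondtype_adj_matrix[i][j] = bondtype_adj_matrix[j][i] = code
    return idx2atom, distance_adj_matrix, bondtype_adj_matrix


# Toy input: three atoms, two single bonds (code 1 is a made-up encoding).
atoms = [("O", (0.0, 0.0, 0.0)), ("H", (3.0, 4.0, 0.0)), ("H", (0.0, 0.0, 1.0))]
bonds = [(0, 1, 1), (0, 2, 1)]
idx2atom, dist, btype = build_graph3d(atoms, bonds)
print(idx2atom, dist[0][1], btype[1][2])
```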

tdc.chem_utils.featurize.molconvert.molfile2PyG(molfile)[source]
tdc.chem_utils.featurize.molconvert.molfile2smiles(molfile)[source]

Convert a molfile into a SMILES string.

Parameters

molfile – str, a file.

Returns

str, a SMILES string

Return type

smiles

tdc.chem_utils.featurize.molconvert.onek_encoding_unk(x, allowable_set)[source]
tdc.chem_utils.featurize.molconvert.sdffile2coulomb(sdf)[source]

Convert an SDF file into a list of Coulomb features.

Parameters

sdf – str, file

Returns

np.array

Return type

coulomb feature

tdc.chem_utils.featurize.molconvert.sdffile2graph3d_lst(sdffile)[source]

Convert an SDF file into a list of 3D graphs.

Parameters

sdffile – SDF file

Returns

a list of 3D graphs.

each graph has (i) idx2atom (dict); (ii) distance_adj_matrix (np.array); (iii) bondtype_adj_matrix (np.array)

Return type

graph3d_lst

tdc.chem_utils.featurize.molconvert.sdffile2mol_conformer(sdffile)[source]

convert sdffile into a list of molecule conformers.

Parameters

sdffile – str, file

Returns

a list of molecule conformers.

Return type

smiles_lst

tdc.chem_utils.featurize.molconvert.sdffile2selfies_lst(sdf)[source]

convert sdffile into a list of SELFIES strings.

Parameters

sdf – str, file

Returns

a list of SELFIES strings.

Return type

selfies_lst

tdc.chem_utils.featurize.molconvert.sdffile2smiles_lst(sdffile)[source]

Convert an SDF file into a list of SMILES strings.

Parameters

sdffile – str, file

Returns

a list of SMILES strings.

Return type

smiles_lst

tdc.chem_utils.featurize.molconvert.selfies2smiles(selfies)[source]

Convert selfies into smiles.

Parameters

selfies – str, a SELFIES string.

Returns

str, a SMILES string

Return type

smiles

tdc.chem_utils.featurize.molconvert.smiles2DGL(smiles)[source]

convert SMILES string into dgl.DGLGraph

Parameters
  • smiles – str, a SMILES string

Returns

dgl.DGLGraph()

Return type

g

tdc.chem_utils.featurize.molconvert.smiles2ECFP2(smiles)[source]

Convert smiles into ECFP2 Morgan Fingerprint.

Parameters

smiles – str

Returns

rdkit.DataStructs.cDataStructs.UIntSparseIntVect

Return type

fp

tdc.chem_utils.featurize.molconvert.smiles2ECFP4(smiles)[source]

Convert smiles into ECFP4 Morgan Fingerprint.

Parameters

smiles – str

Returns

rdkit.DataStructs.cDataStructs.UIntSparseIntVect

Return type

fp

tdc.chem_utils.featurize.molconvert.smiles2ECFP6(smiles)[source]

Convert smiles into ECFP6 Morgan Fingerprint.

Parameters

smiles – str, a SMILES string

Returns

rdkit.DataStructs.cDataStructs.UIntSparseIntVect

Return type

fp

tdc.chem_utils.featurize.molconvert.smiles2PyG(smiles)[source]

convert SMILES string into torch_geometric.data.Data

Parameters
  • smiles – str, a SMILES string

Returns

data, torch_geometric.data.Data

tdc.chem_utils.featurize.molconvert.smiles2daylight(s)[source]

Convert smiles into 2048-dim Daylight feature.

Parameters

s – str, a SMILES string

Returns

numpy.array

Return type

fp

tdc.chem_utils.featurize.molconvert.smiles2graph2D(smiles)[source]

convert SMILES string into two-dimensional molecular graph feature

Parameters
  • smiles – str, a SMILES string

Returns

idx2atom: dict, map from atom index to atom symbol, e.g., {0: 'C', 1: 'N', …}

adj_matrix: np.array

tdc.chem_utils.featurize.molconvert.smiles2maccs(s)[source]

Convert smiles into maccs feature.

Parameters

s – str, a SMILES string

Returns

numpy.array

Return type

fp

tdc.chem_utils.featurize.molconvert.smiles2mol(smiles)[source]

Convert SMILES string into rdkit.Chem.rdchem.Mol.

Parameters

smiles – str, a SMILES string.

Returns

rdkit.Chem.rdchem.Mol

Return type

mol

tdc.chem_utils.featurize.molconvert.smiles2morgan(s, radius=2, nBits=1024)[source]

Convert smiles into Morgan Fingerprint.

Parameters
  • s – str, a SMILES string

  • radius – int (default: 2)

  • nBits – int (default: 1024)

Returns

numpy.array

Return type

fp

tdc.chem_utils.featurize.molconvert.smiles2rdkit2d(s)[source]

Convert smiles into 200-dim Normalized RDKit 2D vector.

Parameters

s – str, a SMILES string

Returns

numpy.array

Return type

fp

tdc.chem_utils.featurize.molconvert.smiles2selfies(smiles)[source]

Convert smiles into selfies.

Parameters

smiles – str, a SMILES string

Returns

str, a SELFIES string.

Return type

selfies

tdc.chem_utils.featurize.molconvert.smiles_lst2coulomb(smiles_lst)[source]

convert a list of SMILES strings into coulomb format.

Parameters

smiles_lst – a list of SMILES strings.

Returns

np.array

Return type

features

tdc.chem_utils.featurize.molconvert.upper_atom(atomsymbol)[source]
tdc.chem_utils.featurize.molconvert.xyzfile2coulomb(xyzfile)[source]
tdc.chem_utils.featurize.molconvert.xyzfile2graph3d(xyzfile)[source]
tdc.chem_utils.featurize.molconvert.xyzfile2selfies(xyzfile)[source]

convert xyzfile into SELFIES string.

Parameters

xyzfile – str, file

Returns

str, a SELFIES string.

Return type

selfies

tdc.chem_utils.featurize.molconvert.xyzfile2smiles(xyzfile)[source]

convert xyzfile into smiles string.

Parameters

xyzfile – str, file

Returns

str, a SMILES string

Return type

smiles

tdc.chem_utils.oracle module

tdc.chem_utils.oracle.filter submodule

class tdc.chem_utils.oracle.filter.MolFilter(filters='all', property_filters_flag=True, HBA=[0, 10], HBD=[0, 5], LogP=[-5, 5], MW=[0, 500], Rot=[0, 10], TPSA=[0, 200])[source]

Bases: object

Molecule Filter: filter molecules based on user-specified conditions.

Parameters
  • filters

  • property_filters_flag – bool

  • HBA – [lower_bound, upper_bound]

  • HBD – [lower_bound, upper_bound]

  • LogP – [lower_bound, upper_bound]

  • MW – [lower_bound, upper_bound], molecular weight

  • Rot – [lower_bound, upper_bound]

  • TPSA – [lower_bound, upper_bound]

Returns

list of SMILES strings that pass the filter.
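The property bounds in the signature can be read as simple range checks. The sketch below assumes inclusive [lower_bound, upper_bound] intervals and takes the property values as already computed; the helper names and the two candidate entries are hypothetical placeholders (made-up numbers, not measurements), and nothing here reproduces the structural `filters` option.

```python
from typing import Dict, List, Tuple

# Default bounds copied from the MolFilter signature.
DEFAULT_BOUNDS: Dict[str, Tuple[float, float]] = {
    "HBA": (0, 10), "HBD": (0, 5), "LogP": (-5, 5),
    "MW": (0, 500), "Rot": (0, 10), "TPSA": (0, 200),
}


def passes_bounds(properties: Dict[str, float],
                  bounds: Dict[str, Tuple[float, float]] = DEFAULT_BOUNDS) -> bool:
    """True if every property lies inside its (assumed inclusive) interval."""
    return all(lower <= properties[name] <= upper
               for name, (lower, upper) in bounds.items())


def filter_by_properties(candidates: Dict[str, Dict[str, float]],
                         bounds: Dict[str, Tuple[float, float]] = DEFAULT_BOUNDS) -> List[str]:
    """Keep the keys of all candidates whose properties pass every bound."""
    return [key for key, props in candidates.items() if passes_bounds(props, bounds)]


# Placeholder property tables for two hypothetical molecules.
candidates = {
    "mol_a": {"HBA": 4, "HBD": 1, "LogP": 2.1, "MW": 310.0, "Rot": 5, "TPSA": 75.0},
    "mol_b": {"HBA": 12, "HBD": 6, "LogP": 6.3, "MW": 720.0, "Rot": 14, "TPSA": 240.0},
}
kept = filter_by_properties(candidates)
print(kept)
```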

tdc.chem_utils.oracle.oracle submodule

class tdc.chem_utils.oracle.oracle.AbsoluteScoreModifier(target_value: float)[source]

Bases: tdc.chem_utils.oracle.oracle.ScoreModifier

Score modifier that has a maximum at a given target value, and decreases linearly with increasing distance from the target value.
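A minimal sketch of this shape, assuming the score is 1 minus the absolute distance to the target; the function name `absolute_score` is hypothetical and the exact formula is not confirmed by this page.

```python
def absolute_score(x: float, target_value: float) -> float:
    """Assumed form: 1.0 at the target, decreasing linearly with distance."""
    return 1.0 - abs(x - target_value)


print(absolute_score(3.0, target_value=3.0))  # value at the target
print(absolute_score(3.5, target_value=3.0))  # half a unit away from the target
```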

class tdc.chem_utils.oracle.oracle.AtomCounter(element)[source]

Bases: object

class tdc.chem_utils.oracle.oracle.ChainedModifier(modifiers: List[tdc.chem_utils.oracle.oracle.ScoreModifier])[source]

Bases: tdc.chem_utils.oracle.oracle.ScoreModifier

Calls several modifiers one after the other, for instance:

score = modifier3(modifier2(modifier1(raw_score)))
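The composition above can be sketched with plain callables. The helper `chain` and the two toy modifiers are hypothetical stand-ins, not the ScoreModifier classes themselves.

```python
from functools import reduce
from typing import Callable, List

Modifier = Callable[[float], float]


def chain(modifiers: List[Modifier]) -> Modifier:
    """Return a function that applies each modifier in order, first to last."""
    def apply(raw_score: float) -> float:
        return reduce(lambda score, modifier: modifier(score), modifiers, raw_score)
    return apply


def halve(score: float) -> float:
    return score / 2.0


def clip01(score: float) -> float:
    return max(0.0, min(1.0, score))


chained = chain([halve, clip01])  # same as clip01(halve(raw_score))
print(chained(3.0))
```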

class tdc.chem_utils.oracle.oracle.ClippedScoreModifier(upper_x: float, lower_x=0.0, high_score=1.0, low_score=0.0)[source]

Bases: tdc.chem_utils.oracle.oracle.ScoreModifier

Clips a score between specified low and high scores, and does a linear interpolation in between.

This class works as follows: First the input is mapped onto a linear interpolation between both specified points. Then the generated values are clipped between low and high scores.
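The two steps described above can be sketched as follows. The sketch assumes the "two specified points" are (lower_x, low_score) and (upper_x, high_score); the function name is hypothetical.

```python
def clipped_score(x: float, upper_x: float, lower_x: float = 0.0,
                  high_score: float = 1.0, low_score: float = 0.0) -> float:
    # Step 1: linear interpolation through (lower_x, low_score) and (upper_x, high_score).
    slope = (high_score - low_score) / (upper_x - lower_x)
    y = low_score + slope * (x - lower_x)
    # Step 2: clip the interpolated value to [low_score, high_score].
    return max(low_score, min(high_score, y))


print(clipped_score(5.0, upper_x=10.0))   # midway between the two points
print(clipped_score(20.0, upper_x=10.0))  # beyond upper_x, clipped to high_score
```

Under this assumption, choosing upper_x smaller than lower_x gives a decreasing ramp, since the slope becomes negative.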

class tdc.chem_utils.oracle.oracle.GaussianModifier(mu: float, sigma: float)[source]

Bases: tdc.chem_utils.oracle.oracle.ScoreModifier

Score modifier that reproduces a Gaussian bell shape.
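As an illustration, the sketch below assumes an unnormalized Gaussian whose peak value is 1.0 at x = mu; the function name is hypothetical.

```python
import math


def gaussian_score(x: float, mu: float, sigma: float) -> float:
    """Assumed form: exp(-0.5 * ((x - mu) / sigma)^2), equal to 1.0 at x == mu."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)


print(gaussian_score(2.0, mu=2.0, sigma=1.0))  # at the peak
print(gaussian_score(3.0, mu=2.0, sigma=1.0))  # one sigma away
```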

class tdc.chem_utils.oracle.oracle.Isomer_scoring(target_smiles, means='geometric')[source]

Bases: object

class tdc.chem_utils.oracle.oracle.LinearModifier(slope=1.0)[source]

Bases: tdc.chem_utils.oracle.oracle.ScoreModifier

Score modifier that multiplies the score by a scalar (default: 1, i.e. do nothing).

class tdc.chem_utils.oracle.oracle.MPO_meta(means)[source]

Bases: object

class tdc.chem_utils.oracle.oracle.MinMaxGaussianModifier(mu: float, sigma: float, minimize=False)[source]

Bases: tdc.chem_utils.oracle.oracle.ScoreModifier

Score modifier that reproduces a half Gaussian bell shape. For minimize==True, the function is 1.0 for x <= mu and decreases to zero for x > mu. For minimize==False, the function is 1.0 for x >= mu and decreases to zero for x < mu.
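The half-bell behaviour can be sketched by measuring distance from mu only on the penalized side. This assumes the same unnormalized Gaussian form as a full bell with peak 1.0; the function name is hypothetical.

```python
import math


def min_max_gaussian_score(x: float, mu: float, sigma: float,
                           minimize: bool = False) -> float:
    if minimize:
        distance = max(x, mu) - mu  # zero whenever x <= mu
    else:
        distance = mu - min(x, mu)  # zero whenever x >= mu
    return math.exp(-0.5 * (distance / sigma) ** 2)


print(min_max_gaussian_score(1.0, mu=2.0, sigma=1.0, minimize=True))   # x <= mu
print(min_max_gaussian_score(3.0, mu=2.0, sigma=1.0, minimize=True))   # x > mu
```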

class tdc.chem_utils.oracle.oracle.PyScreener_meta(receptor_pdb_file, box_center, box_size, software_class='vina', ncpu=4, **kwargs)[source]

Bases: object

Evaluate docking score


tdc.chem_utils.oracle.oracle.SA(s)[source]

Evaluate SA score of a SMILES string

Parameters

s – str, a SMILES string

Returns

float

Return type

SAscore

class tdc.chem_utils.oracle.oracle.SMARTS_scoring(target_smarts, inverse)[source]

Bases: object

class tdc.chem_utils.oracle.oracle.ScoreModifier[source]

Bases: object

Interface for score modifiers.

class tdc.chem_utils.oracle.oracle.Score_3d(receptor_pdbqt_file, center, box_size, scorefunction='vina')[source]

Bases: object

Evaluate Vina score (force field) for a conformer binding to a receptor

class tdc.chem_utils.oracle.oracle.SmoothClippedScoreModifier(upper_x: float, lower_x=0.0, high_score=1.0, low_score=0.0)[source]

Bases: tdc.chem_utils.oracle.oracle.ScoreModifier

Smooth variant of ClippedScoreModifier.

Implemented as a logistic function that has the same steepness as ClippedScoreModifier in the center of the logistic function.
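One way to match the steepness: a logistic of height L and rate k has slope L*k/4 at its centre, while the clipped ramp has slope L/(upper_x - lower_x), so k = 4/(upper_x - lower_x). The sketch below uses that relation; the function name is hypothetical and the centring at the midpoint of lower_x and upper_x is an assumption.

```python
import math


def smooth_clipped_score(x: float, upper_x: float, lower_x: float = 0.0,
                         high_score: float = 1.0, low_score: float = 0.0) -> float:
    k = 4.0 / (upper_x - lower_x)         # matches the ramp's slope at the centre
    middle_x = (upper_x + lower_x) / 2.0  # assumed centre of the logistic
    height = high_score - low_score
    return low_score + height / (1.0 + math.exp(-k * (x - middle_x)))


print(smooth_clipped_score(5.0, upper_x=10.0))  # at the centre of the logistic
```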

class tdc.chem_utils.oracle.oracle.SquaredModifier(target_value: float, coefficient=1.0)[source]

Bases: tdc.chem_utils.oracle.oracle.ScoreModifier

Score modifier that has a maximum at a given target value, and decreases quadratically with increasing distance from the target value.
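A minimal sketch of the quadratic fall-off, assuming the score is 1 minus coefficient times the squared distance to the target; the function name is hypothetical.

```python
def squared_score(x: float, target_value: float, coefficient: float = 1.0) -> float:
    """Assumed form: 1.0 at the target, decreasing with the squared distance."""
    return 1.0 - coefficient * (target_value - x) ** 2


print(squared_score(2.0, target_value=2.0))
print(squared_score(3.0, target_value=2.0, coefficient=0.5))
```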

class tdc.chem_utils.oracle.oracle.ThresholdedLinearModifier(threshold: float)[source]

Bases: tdc.chem_utils.oracle.oracle.ScoreModifier

Returns a value of min(input, threshold)/threshold.
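The formula above written out as a plain function; the name `thresholded_linear` is hypothetical.

```python
def thresholded_linear(x: float, threshold: float) -> float:
    """min(input, threshold) / threshold: linear up to the threshold, then flat at 1."""
    return min(x, threshold) / threshold


print(thresholded_linear(2.0, threshold=4.0))   # below the threshold
print(thresholded_linear(10.0, threshold=4.0))  # above the threshold
```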

class tdc.chem_utils.oracle.oracle.Vina_3d(receptor_pdbqt_file, center, box_size, scorefunction='vina')[source]

Bases: object

Perform docking search from a conformer.

class tdc.chem_utils.oracle.oracle.Vina_smiles(receptor_pdbqt_file, center, box_size, scorefunction='vina')[source]

Bases: object

Perform docking search from a conformer.

tdc.chem_utils.oracle.oracle.amlodipine_mpo(test_smiles)[source]
tdc.chem_utils.oracle.oracle.askcos(smiles, host_ip, output='plausibility', save_json=False, file_name='tree_builder_result.json', num_trials=5, max_depth=9, max_branching=25, expansion_time=60, max_ppg=100, template_count=1000, max_cum_prob=0.999, chemical_property_logic='none', max_chemprop_c=0, max_chemprop_n=0, max_chemprop_o=0, max_chemprop_h=0, chemical_popularity_logic='none', min_chempop_reactants=5, min_chempop_products=5, filter_threshold=0.1, return_first='true')[source]

The ASKCOS retrosynthetic analysis oracle function. Please refer to https://github.com/connorcoley/ASKCOS for how to run ASKCOS with Docker on a server that receives requests.

tdc.chem_utils.oracle.oracle.calculateScore(m)[source]
tdc.chem_utils.oracle.oracle.cyp3a4_veith(smiles)[source]
tdc.chem_utils.oracle.oracle.deco_hop(test_smiles)[source]
tdc.chem_utils.oracle.oracle.drd2(smile)[source]

Evaluate DRD2 score of a SMILES string

Parameters

smile – str, a SMILES string

Returns

float

Return type

drd_score

tdc.chem_utils.oracle.oracle.fexofenadine_mpo(test_smiles)[source]
tdc.chem_utils.oracle.oracle.fingerprints_from_mol(mol)[source]
tdc.chem_utils.oracle.oracle.get_PHCO_fingerprint(mol)[source]
tdc.chem_utils.oracle.oracle.gsk3b(smiles)[source]

Evaluate GSK3B score of a SMILES string

Parameters

smiles – str

Returns

float, between 0 and 1.

Return type

gsk3_score

tdc.chem_utils.oracle.oracle.ibm_rxn(smiles, api_key, output='confidence', sleep_time=30)[source]

This function is modified from Dr. Jan Jensen’s code

tdc.chem_utils.oracle.oracle.isomer_meta(target_smiles, means='geometric')[source]
class tdc.chem_utils.oracle.oracle.jnk3[source]

Bases: object

Evaluate JNK3 score of a SMILES string

Parameters

smiles – str

Returns

float, between 0 and 1.

Return type

jnk3_score

tdc.chem_utils.oracle.oracle.load_cyp3a4_veith()[source]
tdc.chem_utils.oracle.oracle.load_drd2_model()[source]
tdc.chem_utils.oracle.oracle.load_gsk3b_model()[source]
class tdc.chem_utils.oracle.oracle.median_meta(target_smiles_1, target_smiles_2, fp1='ECFP6', fp2='ECFP6', modifier_func1=None, modifier_func2=None, means='geometric')[source]

Bases: object

class tdc.chem_utils.oracle.oracle.molecule_one_retro(api_token)[source]

Bases: object

tdc.chem_utils.oracle.oracle.numBridgeheadsAndSpiro(mol, ri=None)[source]
tdc.chem_utils.oracle.oracle.osimertinib_mpo(test_smiles)[source]
tdc.chem_utils.oracle.oracle.parse_molecular_formula(formula)[source]

Parse a molecular formula to get the element types and counts.

Parameters

formula – molecular formula, e.g. "C8H3F3Br"

Returns

A list of tuples containing element types and number of occurrences.
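A regular-expression sketch of this kind of parsing, assuming a flat formula without parentheses or charges; the function name `parse_formula` is hypothetical and this is not necessarily how the library implements it.

```python
import re
from typing import List, Tuple


def parse_formula(formula: str) -> List[Tuple[str, int]]:
    # An element is one uppercase letter plus optional lowercase letters,
    # followed by an optional count (a missing count means 1).
    matches = re.findall(r"([A-Z][a-z]*)(\d*)", formula)
    return [(element, int(count) if count else 1) for element, count in matches]


print(parse_formula("C8H3F3Br"))  # [('C', 8), ('H', 3), ('F', 3), ('Br', 1)]
```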

tdc.chem_utils.oracle.oracle.penalized_logp(s)[source]

Evaluate LogP score of a SMILES string

Parameters

s – str, a SMILES string

Returns

float, between -infinity and +infinity

Return type

logp_score

tdc.chem_utils.oracle.oracle.perindopril_mpo(test_smiles)[source]
tdc.chem_utils.oracle.oracle.qed(smiles)[source]

Evaluate QED score of a SMILES string

Parameters

smiles – str

Returns

float, between 0 and 1.

Return type

qed_score

tdc.chem_utils.oracle.oracle.ranolazine_mpo(test_smiles)[source]
tdc.chem_utils.oracle.oracle.readFragmentScores(name='fpscores')[source]
class tdc.chem_utils.oracle.oracle.rediscovery_meta(target_smiles, fp='ECFP4')[source]

Bases: object

tdc.chem_utils.oracle.oracle.scaffold_hop(test_smiles)[source]
tdc.chem_utils.oracle.oracle.similarity(smiles_a, smiles_b)[source]

Evaluate Tanimoto similarity between 2 SMILES strings

Parameters
  • smiles_a – str, SMILES string

  • smiles_b – str, SMILES string

Returns

float, between 0 and 1.

Return type

similarity score

class tdc.chem_utils.oracle.oracle.similarity_meta(target_smiles, fp='FCFP4', modifier_func=None)[source]

Bases: object

tdc.chem_utils.oracle.oracle.sitagliptin_mpo(test_smiles)[source]
tdc.chem_utils.oracle.oracle.smiles_2_fingerprint_AP(smiles)[source]

Convert smiles into Atom Pair Fingerprint.

Parameters

smiles – str, SMILES string.

Returns

rdkit.DataStructs.cDataStructs.IntSparseIntVect

Return type

fp

tdc.chem_utils.oracle.oracle.smiles_2_fingerprint_ECFP4(smiles)[source]

Convert smiles into ECFP4 Morgan Fingerprint.

Parameters

smiles – str, SMILES string.

Returns

rdkit.DataStructs.cDataStructs.UIntSparseIntVect

Return type

fp

tdc.chem_utils.oracle.oracle.smiles_2_fingerprint_ECFP6(smiles)[source]

Convert smiles into ECFP6 Fingerprint.

Parameters

smiles – str, SMILES string.

Returns

rdkit.DataStructs.cDataStructs.UIntSparseIntVect

Return type

fp

tdc.chem_utils.oracle.oracle.smiles_2_fingerprint_FCFP4(smiles)[source]

Convert smiles into FCFP4 Morgan Fingerprint.

Parameters

smiles – str, SMILES string.

Returns

rdkit.DataStructs.cDataStructs.UIntSparseIntVect

Return type

fp

tdc.chem_utils.oracle.oracle.smiles_to_rdkit_mol(smiles)[source]

Convert smiles into rdkit’s mol (molecule) format.

Parameters

smiles – str, SMILES string.

Returns

rdkit.Chem.rdchem.Mol

Return type

mol

tdc.chem_utils.oracle.oracle.tree_analysis(current)[source]

Analyze the result of the tree builder.

Calculate:
  1. Number of steps
  2. Pi plausibility
  3. Whether a path was found

In case of a celery error, all values are -1.

Returns

  • num_path – number of paths found
  • status – same as implemented in ASKCOS
  • num_step – number of steps
  • p_score – Pi plausibility
  • synthesizability – binary code
  • price – price for synthesizing the query compound

tdc.chem_utils.oracle.oracle.valsartan_smarts(test_smiles)[source]
tdc.chem_utils.oracle.oracle.zaleplon_mpo(test_smiles)[source]

tdc.chem_utils.evaluator module

tdc.chem_utils.evaluator.calculate_internal_pairwise_similarities(smiles_list)[source]

Computes the pairwise similarities of the provided list of smiles against itself.

Parameters

smiles_list – list of str

Returns

Symmetric matrix of pairwise similarities. Diagonal is set to zero.
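The shape of the result (symmetric, zero diagonal) can be illustrated without any chemistry toolkit. The sketch below assumes each molecule is already represented as a set of "on" fingerprint bits and uses Tanimoto similarity; both helper names and the nested-list output are hypothetical.

```python
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two bit sets: |a & b| / |a | b|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def pairwise_similarities(fingerprints: List[Set[int]]) -> List[List[float]]:
    n = len(fingerprints)
    matrix = [[0.0] * n for _ in range(n)]  # diagonal stays at zero
    for i in range(n):
        for j in range(i + 1, n):
            sim = tanimoto(fingerprints[i], fingerprints[j])
            matrix[i][j] = matrix[j][i] = sim  # fill both halves symmetrically
    return matrix


fps = [{1, 2, 3}, {2, 3, 4}, {7}]  # toy bit sets, not real fingerprints
sims = pairwise_similarities(fps)
print(sims)
```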

tdc.chem_utils.evaluator.calculate_pc_descriptors(smiles, pc_descriptors)[source]

Calculate physicochemical descriptors of a list of molecules.

Parameters
  • smiles – list of SMILES strings

  • pc_descriptors – list of strings, names of descriptors to calculate

Returns

list of float

Return type

descriptors

tdc.chem_utils.evaluator.canonicalize(smiles)[source]

Convert SMILES into canonical form.

Parameters

smiles – str, SMILES string

Returns

str, canonical SMILES string.

Return type

smiles

tdc.chem_utils.evaluator.continuous_kldiv(X_baseline: numpy.array, X_sampled: numpy.array) → float[source]

Calculate the KL divergence of two numpy arrays, continuous version.

Parameters
  • X_baseline – numpy array

  • X_sampled – numpy array

Returns

float

Return type

KL divergence

tdc.chem_utils.evaluator.discrete_kldiv(X_baseline: numpy.array, X_sampled: numpy.array) → float[source]

Calculate the KL divergence of two numpy arrays, discrete version.

Parameters
  • X_baseline – numpy array

  • X_sampled – numpy array

Returns

float

Return type

KL divergence
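For intuition, a discrete KL divergence between two samples can be sketched by turning each sample into a frequency distribution over the values seen in either one. Everything specific here is an assumption rather than the library's method: the direction KL(baseline || sampled), the small `eps` smoothing that avoids division by zero, the renormalization, and the function name.

```python
import math
from collections import Counter
from typing import Sequence


def discrete_kl(baseline: Sequence[int], sampled: Sequence[int], eps: float = 1e-10) -> float:
    support = sorted(set(baseline) | set(sampled))
    p_counts, q_counts = Counter(baseline), Counter(sampled)
    # Relative frequencies with a small smoothing term for unseen values.
    p = [p_counts[v] / len(baseline) + eps for v in support]
    q = [q_counts[v] / len(sampled) + eps for v in support]
    p_total, q_total = sum(p), sum(q)
    # KL(P || Q) = sum_i P_i * log(P_i / Q_i) over the renormalized distributions.
    return sum((pi / p_total) * math.log((pi / p_total) / (qi / q_total))
               for pi, qi in zip(p, q))


print(discrete_kl([0, 0, 1, 1], [0, 0, 1, 1]))  # identical samples
print(discrete_kl([0, 0, 1, 1], [0, 1, 1, 1]))  # shifted sample
```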

tdc.chem_utils.evaluator.diversity(list_of_smiles)[source]

Evaluate the internal diversity of a set of molecules. The internal diversity is defined as the average pairwise Tanimoto distance between the Morgan fingerprints.

Parameters

list_of_smiles – list of SMILES strings

Returns

float

Return type

div
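The definition above can be sketched on toy data. This assumes each fingerprint is given as a set of "on" bits and that Tanimoto distance is 1 minus Tanimoto similarity; the function name is hypothetical and no real Morgan fingerprints are computed here.

```python
from itertools import combinations
from typing import List, Set


def internal_diversity(fingerprints: List[Set[int]]) -> float:
    """Average pairwise Tanimoto distance over all unordered pairs."""
    distances = []
    for a, b in combinations(fingerprints, 2):
        union = len(a | b)
        similarity = len(a & b) / union if union else 0.0
        distances.append(1.0 - similarity)
    return sum(distances) / len(distances) if distances else 0.0


fps = [{1, 2}, {2, 3}, {1, 2}]  # toy bit sets
print(internal_diversity(fps))
```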

tdc.chem_utils.evaluator.fcd_distance(generated_smiles_lst, training_smiles_lst)[source]

Evaluate FCD distance between generated smiles set and training smiles set.

Parameters
  • generated_smiles_lst – list (of SMILES string), which are generated.

  • training_smiles_lst – list (of SMILES string), which are used for training.

Returns

float

Return type

fcd_distance

tdc.chem_utils.evaluator.fcd_distance_tf(generated_smiles_lst, training_smiles_lst)[source]

Evaluate FCD distance between generated smiles set and training smiles set using tensorflow.

Parameters
  • generated_smiles_lst – list (of SMILES string), which are generated.

  • training_smiles_lst – list (of SMILES string), which are used for training.

Returns

float

Return type

fcd_distance

tdc.chem_utils.evaluator.fcd_distance_torch(generated_smiles_lst, training_smiles_lst)[source]

Evaluate FCD distance between generated smiles set and training smiles set using PyTorch.

Parameters
  • generated_smiles_lst – list (of SMILES string), which are generated.

  • training_smiles_lst – list (of SMILES string), which are used for training.

Returns

float

Return type

fcd_distance

tdc.chem_utils.evaluator.get_fingerprints(mols, radius=2, length=4096)[source]

Converts molecules to ECFP bitvectors.

Parameters
  • mols – RDKit molecules

  • radius – ECFP fingerprint radius

  • length – number of bits

Returns: a list of fingerprints

tdc.chem_utils.evaluator.get_mols(smiles_list)[source]

Convert SMILES strings to RDKit RDMol objects.

Parameters

smiles_list – list of SMILES strings

Returns

list of RDKit RDMol objects

Return type

mols

tdc.chem_utils.evaluator.kl_divergence(generated_smiles_lst, training_smiles_lst)[source]

Evaluate the KL divergence of set of generated smiles using list of training smiles as reference. KL divergence is defined as the averaged KL divergence of a set of physical chemical descriptors between a set of generated molecules and a set of training molecules.

Parameters
  • generated_smiles_lst – list (of SMILES string), which are generated.

  • training_smiles_lst – list (of SMILES string), which are used for training.

Returns

float

Return type

KL divergence

tdc.chem_utils.evaluator.novelty(generated_smiles_lst, training_smiles_lst)[source]

Evaluate the novelty of a set of generated SMILES using a list of training SMILES as reference. Novelty is defined as the fraction of generated molecules that do not appear in the training set.

Parameters
  • generated_smiles_lst – list (of SMILES string), which are generated.

  • training_smiles_lst – list (of SMILES string), which are used for training.

Returns

float

Return type

novelty
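The definition reduces to a set-membership count. The sketch below assumes the SMILES strings are already canonical, so plain string comparison is enough, and that the fraction is taken over the generated list as given; the function name is hypothetical.

```python
from typing import List


def novelty_fraction(generated: List[str], training: List[str]) -> float:
    """Fraction of generated strings that are absent from the training set."""
    if not generated:
        return 0.0
    training_set = set(training)  # O(1) membership tests
    novel = [smiles for smiles in generated if smiles not in training_set]
    return len(novel) / len(generated)


print(novelty_fraction(["CCO", "CCN", "c1ccccc1"], ["CCO"]))
```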

tdc.chem_utils.evaluator.single_molecule_validity(smiles)[source]

Evaluate the chemical validity of a single molecule in terms of SMILES string

Parameters

smiles – str, SMILES string.

Returns

if the SMILES string is a valid molecule

Return type

Boolean

tdc.chem_utils.evaluator.unique_lst_of_smiles(list_of_smiles)[source]
tdc.chem_utils.evaluator.uniqueness(list_of_smiles)[source]

Evaluate the uniqueness of a list of SMILES strings, i.e., the fraction of unique molecules among a given list.

Parameters

list_of_smiles – list (of SMILES string)

Returns

float

Return type

uniqueness
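As a sketch of the definition, assuming the strings are already canonical so that equal molecules have equal strings; the function name is hypothetical.

```python
from typing import List


def uniqueness_fraction(smiles_list: List[str]) -> float:
    """Number of distinct strings divided by the length of the list."""
    if not smiles_list:
        return 0.0
    return len(set(smiles_list)) / len(smiles_list)


print(uniqueness_fraction(["CCO", "CCO", "CCN", "CCC"]))  # 3 distinct out of 4
```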

tdc.chem_utils.evaluator.validity(list_of_smiles)[source]