tdc.benchmark_group#

tdc.benchmark_group.base_group module#

class tdc.benchmark_group.base_group.BenchmarkGroup(name, path='./data', file_format='csv')[source]#

Bases: object

Base class for benchmark groups. It downloads, processes, and loads a set of benchmarks along with their splits, and provides evaluators and train/validation splitters.

evaluate(pred, testing=True, benchmark=None, save_dict=True)[source]#

Automatically evaluate predictions for one or more benchmarks.

Parameters:
  • pred (dict) – a dictionary with the benchmark name as the key and the prediction array as the value

  • testing (bool, optional) – if True, evaluate against the test set; otherwise, evaluate against the validation set

  • benchmark (str, optional) – name of the benchmark

  • save_dict (bool, optional) – whether or not to save the evaluation result

Returns:

a dictionary keyed by benchmark name, where each value is a dictionary mapping metric names to metric values

Return type:

dict

Raises:

ValueError – benchmark name not found
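
Example (a minimal sketch of calling evaluate through a concrete subclass; the admet_group class is documented below, and the 'Caco2_Wang' benchmark name, the 'Y' label column, and the constant-value predictions are illustrative assumptions):

    from tdc.benchmark_group import admet_group

    group = admet_group(path='./data')
    benchmark = group.get('Caco2_Wang')
    test = benchmark['test']

    # Predictions are keyed by the benchmark name; a constant predictor stands
    # in for a trained model here.
    predictions = {benchmark['name']: [test['Y'].mean()] * len(test)}

    results = group.evaluate(predictions)   # testing=True by default
    print(results)                          # {benchmark name: {metric: value}}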

evaluate_many(preds, save_file_name=None, results_individual=None)[source]#

Evaluate predictions from multiple runs and return the results in the format required for a leaderboard submission.

Parameters:
  • preds (list of dict) – a list of prediction dictionaries, each formatted as the input to the evaluate function.

  • save_file_name (str, optional) – file name to save the result

  • results_individual (list of dict, optional) – precomputed results for each run; if provided, this function does not call the evaluation function again

Returns:

a dictionary where the key is the benchmark name and the value is another dictionary mapping each metric name to a list [mean, std] across runs.

Return type:

dict
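
Example (a sketch of aggregating multiple runs for a leaderboard submission, continuing from the evaluate example above; the five seeds and the mean-label predictor are assumptions used only to keep the sketch self-contained):

    predictions_list = []
    for seed in [1, 2, 3, 4, 5]:
        benchmark = group.get('Caco2_Wang')
        train, valid = group.get_train_valid_split(seed=seed, benchmark=benchmark['name'])
        # A model trained on `train` and tuned on `valid` would normally produce
        # the test predictions; the mean of the training labels is used instead.
        y_pred_test = [train['Y'].mean()] * len(benchmark['test'])
        predictions_list.append({benchmark['name']: y_pred_test})

    results = group.evaluate_many(predictions_list)
    # {benchmark name: {metric name: [mean, std]}}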

get(benchmark)[source]#

Retrieve an individual benchmark.

Parameters:

benchmark (str) – benchmark name

Returns:

a dictionary containing the train_val and test DataFrames and the normalized name of the benchmark

Return type:

dict
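
Example (a minimal sketch; the admet_group subclass and the 'Caco2_Wang' benchmark name are assumptions for illustration):

    from tdc.benchmark_group import admet_group

    group = admet_group(path='./data')
    benchmark = group.get('Caco2_Wang')
    train_val = benchmark['train_val']   # DataFrame for training/validation
    test = benchmark['test']             # held-out test DataFrame
    name = benchmark['name']             # normalized benchmark name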

get_train_valid_split(seed, benchmark, split_type='default')[source]#

Obtain the training and validation split of the train_val file for a given split type.

Parameters:
  • seed (int) – the random seed of the data split

  • benchmark (str) – name of the benchmark

  • split_type (str, optional) – name of the split

Returns:

the training and validation DataFrames

Return type:

pd.DataFrame

Raises:

NotImplementedError – split method not implemented
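
Example (a sketch continuing from the get example above; the seed value is arbitrary):

    train, valid = group.get_train_valid_split(seed=42, benchmark=name)
    print(len(train), len(valid))   # row counts of the two DataFrames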

tdc.benchmark_group.admet_group module#

class tdc.benchmark_group.admet_group.admet_group(path='./data')[source]#

Bases: BenchmarkGroup

Create an ADMET benchmark group object.

Parameters:

path (str, optional) – the path to store/retrieve the ADMET group datasets.
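
Example (a minimal sketch of instantiating the group; the dataset_names attribute used below is an assumption intended only to show how one might list the available benchmarks):

    from tdc.benchmark_group import admet_group

    # Datasets are downloaded to, or loaded from, the given path.
    group = admet_group(path='./data')

    # Assumed attribute: list of benchmarks provided by the group.
    print(group.dataset_names)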

tdc.benchmark_group.docking_group module#

class tdc.benchmark_group.docking_group.docking_group(path='./data', num_workers=None, num_cpus=None, num_max_call=5000)[source]#

Bases: BenchmarkGroup

Create a docking group benchmark loader.

Parameters:
  • path (str, optional) – the folder path to save/load the benchmarks.

  • pyscreener_path (str, optional) – the path to the pyscreener repository, used to compute docking scores.

  • num_workers (int, optional) – number of workers used to parallelize the docking runs

  • num_cpus (int, optional) – number of CPUs assigned to docking

  • num_max_call (int, optional) – maximum number of oracle calls

evaluate(pred, true=None, benchmark=None, m1_api=None, save_dict=True)[source]#

Evaluate predictions for a docking benchmark and compute the realistic metrics.

Parameters:
  • pred (dict) – a nested dictionary where the first-level key is the docking target and the value is another dictionary keyed by the maximum number of oracle calls. The inner value can take one of two forms: (1) a dictionary mapping SMILES strings to docking scores, or (2) a list of SMILES strings, in which case the function generates the docking scores automatically.

  • benchmark (str, optional) – name of the benchmark docking target.

  • m1_api (str, optional) – API token for Molecule.One, used to call the M1 service to generate synthesis scores.

  • save_dict (bool, optional) – whether or not to save the results.

Returns:

a dictionary of results with all realistic metrics generated

Return type:

dict

Raises:

ValueError – the prediction input is not in the expected format
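
Example (a sketch of the nested pred format only; the 'drd3' target name, the SMILES strings, and the 5000-call budget are illustrative assumptions, and actually generating docking scores requires a working pyscreener/docking installation):

    from tdc.benchmark_group import docking_group

    group = docking_group(path='./data', num_max_call=5000)

    # Second input option from the docstring: a list of SMILES strings, for
    # which the docking scores are generated automatically.
    pred = {
        'drd3': {
            5000: ['CCO', 'Cc1ccccc1', 'CC(=O)Nc1ccc(O)cc1'],
        }
    }

    results = group.evaluate(pred, save_dict=False)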

evaluate_many(preds, save_file_name=None, m1_api=None, results_individual=None)[source]#

Evaluate many runs together and output a submission-ready pickle (.pkl) file.

Parameters:
  • preds (list) – a list of predictions across runs, where each item follows the format of pred in the evaluate function.

  • save_file_name (str, optional) – the name of the file to save the result.

  • m1_api (str, optional) – Molecule.One API token for the molecule synthesis score.

  • results_individual (list, optional) – precomputed results from the evaluate function for each run; if provided, the results are not regenerated.

Returns:

the output result dictionary

Return type:

dict
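
Example (a sketch continuing from the evaluate example above; the three identical runs and the save file name are placeholders):

    # Each item follows the same nested format as pred in the evaluate example.
    preds = [pred, pred, pred]   # e.g. one entry per independent run

    results = group.evaluate_many(preds, save_file_name='docking_runs')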

get(benchmark, num_max_call=5000)[source]#

Retrieve one benchmark given the benchmark name (docking target).

Parameters:
  • benchmark (str) – the name of the benchmark

  • num_max_call (int, optional) – maximum number of oracle calls

Returns:

a dictionary containing the oracle function, the molecule library dataset, and the name of the docking target

Return type:

dict
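
Example (a sketch; the 'drd3' target name and the dictionary key names below are assumptions based on the return description above):

    benchmark = group.get('drd3', num_max_call=5000)

    oracle = benchmark['oracle']   # docking oracle function (key name assumed)
    data = benchmark['data']       # molecule library dataset (key name assumed)
    target = benchmark['name']     # docking target name (key name assumed)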

get_train_valid_split(seed, benchmark, split_type='default')[source]#

Not applicable: the docking group does not provide a train/validation split.

Raises:

ValueError – no split for docking group

tdc.benchmark_group.drugcombo_group module#

class tdc.benchmark_group.drugcombo_group.drugcombo_group(path='./data')[source]#

Bases: BenchmarkGroup

Create a drug combination benchmark group.

Parameters:

path (str, optional) – path to save/load benchmarks

get_cell_line_meta_data()[source]#
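
Example (a minimal sketch of loading the drug combination group and its cell line meta data; the contents of the returned meta data are not documented here):

    from tdc.benchmark_group import drugcombo_group

    group = drugcombo_group(path='./data')
    cell_line_meta = group.get_cell_line_meta_data()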

tdc.benchmark_group.dti_dg_group module#

class tdc.benchmark_group.dti_dg_group.dti_dg_group(path='./data')[source]#

Bases: BenchmarkGroup

Create a DTI domain generalization benchmark group.

Parameters:

path (str, optional) – path to save/load benchmarks
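
Example (a minimal sketch; the 'BindingDB_Patent' benchmark name is an assumption for illustration):

    from tdc.benchmark_group import dti_dg_group

    group = dti_dg_group(path='./data')
    benchmark = group.get('BindingDB_Patent')
    train_val, test = benchmark['train_val'], benchmark['test']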