tdc.generation

tdc.generation.generation_dataset module

class tdc.generation.generation_dataset.DataLoader(name, path, print_stats, column_name)[source]

Bases: tdc.base_dataset.DataLoader

A base dataset loader class.

dataset_names

name of the dataset.

Type

str

name

The name of the dataset.

Type

str

path

the path to save the data file.

Type

str

smiles_lst

a list of SMILES strings used as training data for distribution learning.

Type

list

get_data(format='df')[source]

Return the data from the whole dataset.

Parameters

format (str, optional) – the desired format for molecular data.

Returns

a DataFrame of the dataset, or a dictionary holding the same information

Return type

pandas DataFrame/dict

Raises

AttributeError – Use the correct format as input (df, dict)
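The format handling described above can be sketched as follows. This is a minimal illustration, not the library's source: the function name get_data_sketch and the 'smiles' key are assumptions made for the example, and pandas is imported only when the 'df' format is requested.

```python
def get_data_sketch(smiles_lst, format="df"):
    """Hypothetical stand-in for get_data: dispatch on the requested format."""
    if format == "df":
        # pandas is imported lazily so the 'dict' path has no extra dependency.
        import pandas as pd
        return pd.DataFrame({"smiles": smiles_lst})
    if format == "dict":
        return {"smiles": smiles_lst}
    # Any other value is rejected, mirroring the documented AttributeError.
    raise AttributeError("Use the correct format as input (df, dict)")


as_dict = get_data_sketch(["CCO", "c1ccccc1"], format="dict")
```

Calling the sketch with any value other than 'df' or 'dict' raises AttributeError, which is the behavior the Raises entry documents.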

get_split(method='random', seed=42, frac=[0.7, 0.1, 0.2])[source]

Return the data split into train, valid, and test sets.

Parameters
  • method (str) – splitting schemes: random, scaffold

  • seed (int) – random seed, default 42

  • frac (list of float) – ratio of train/val/test split

Returns

the train, valid, and test sets of the dataset

Return type

pandas DataFrame/dict

Raises

AttributeError – Use the correct split method as input (random, scaffold)
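To make the method, seed, and frac parameters concrete, here is a minimal sketch of a seeded random split using only the standard library. It is an illustration under stated assumptions, not the library's implementation: the function name random_split_sketch and the 'train'/'valid'/'test' dictionary keys are hypothetical, and the scaffold method is not covered.

```python
import random


def random_split_sketch(items, seed=42, frac=(0.7, 0.1, 0.2)):
    """Hypothetical helper: shuffle with a fixed seed, then cut by fractions."""
    if abs(sum(frac) - 1.0) > 1e-9:
        raise ValueError("frac must sum to 1")
    shuffled = list(items)
    # A dedicated Random instance keeps the split reproducible for a given seed.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * frac[0])
    n_valid = round(len(shuffled) * frac[1])
    return {
        "train": shuffled[:n_train],
        "valid": shuffled[n_train:n_train + n_valid],
        # The test set takes whatever remains, so no item is dropped.
        "test": shuffled[n_train + n_valid:],
    }


split = random_split_sketch(range(100))
```

Because the seed fixes the shuffle, repeated calls with the same arguments return identical partitions, which is the purpose of the seed parameter.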

print_stats()[source]

Print the basic statistics of the dataset.

class tdc.generation.generation_dataset.DataLoader3D(name, path, print_stats, dataset_names, column_name)[source]

Bases: tdc.base_dataset.DataLoader

A basic class for generation of 3D biomedical entities. (under construction)

df

the dataset in pandas DataFrame format.

Type

pandas DataFrame

name

the name of the dataset.

Type

str

path

the path to save the data file.

Type

str

get_data(format='df', more_features='None')[source]

Return the data from the whole dataset.

Parameters
  • format (str, optional) – the desired format for molecular data.

  • more_features (str, optional) – 3D feature format, choose from [Graph3D, Coulumb]

Returns

a DataFrame of the dataset, or a dictionary holding the same information

Return type

pandas DataFrame/dict

Raises
  • AttributeError – Use the correct format as input (df, dict)

  • ImportError – Please install rdkit by ‘conda install -c conda-forge rdkit’

get_split(method='random', seed=42, frac=[0.7, 0.1, 0.2])[source]

Return the data split into train, valid, and test sets.

Parameters
  • method (str) – splitting schemes: random, scaffold

  • seed (int) – random seed, default 42

  • frac (list of float) – ratio of train/val/test split

Returns

the train, valid, and test sets of the dataset

Return type

pandas DataFrame/dict

Raises

AttributeError – Use the correct split method as input (random, scaffold)

print_stats()[source]

Print the basic statistics of the dataset.

class tdc.generation.generation_dataset.PairedDataLoader(name, path, print_stats, input_name, output_name)[source]

Bases: tdc.base_dataset.DataLoader

A basic class for generation of biomedical entities conditioned on other entities, such as reaction prediction.

dataset_names

the name of the dataset.

Type

str

name

the name of the dataset.

Type

str

path

the path to save the data file.

Type

str

get_data(format='df')[source]

Return the data from the whole dataset.

Parameters

format (str, optional) – the desired format for molecular data.

Returns

a DataFrame of the dataset, or a dictionary holding the same information

Return type

pandas DataFrame/dict

Raises

AttributeError – Use the correct format as input (df, dict)

get_split(method='random', seed=42, frac=[0.7, 0.1, 0.2])[source]

Return the data split into train, valid, and test sets.

Parameters
  • method (str) – splitting schemes: random, scaffold

  • seed (int) – random seed, default 42

  • frac (list of float) – ratio of train/val/test split

Returns

the train, valid, and test sets of the dataset

Return type

pandas DataFrame/dict

Raises

AttributeError – Use the correct split method as input (random, scaffold)

print_stats()[source]

Print the statistics of the dataset.

tdc.generation.molgen module

class tdc.generation.molgen.MolGen(name, path='./data', print_stats=False, column_name='smiles')[source]

Bases: tdc.generation.generation_dataset.DataLoader

Data loader class for the molecular generation task (distribution learning).
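A short usage sketch for this class, based only on the signature above and the inherited get_split method. The wrapper name load_molgen_split is hypothetical, the dataset name 'ZINC' is an assumption for illustration, and running the function requires the tdc package to be installed (it presumably also needs network access to fetch the data on first use).

```python
def load_molgen_split(name="ZINC", path="./data"):
    """Hypothetical wrapper: load a MolGen dataset and return its splits."""
    # Imported inside the function so that defining it does not require tdc.
    from tdc.generation import MolGen

    data = MolGen(name=name, path=path, print_stats=False, column_name="smiles")
    # Arguments shown here are the documented defaults of get_split.
    return data.get_split(method="random", seed=42, frac=[0.7, 0.1, 0.2])
```

Calling load_molgen_split() should return the train, valid, and test sets described under get_split; the exact container type is whatever the installed version of the library returns.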

tdc.generation.reaction module

class tdc.generation.reaction.Reaction(name, path='./data', print_stats=False, input_name='reactant', output_name='product')[source]

Bases: tdc.generation.generation_dataset.PairedDataLoader

Data loader class for the forward reaction prediction task.

tdc.generation.retrosyn module

class tdc.generation.retrosyn.RetroSyn(name, path='./data', print_stats=False, input_name='product', output_name='reactant')[source]

Bases: tdc.generation.generation_dataset.PairedDataLoader

Data loader class for the retrosynthetic prediction task.
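RetroSyn and Reaction share the PairedDataLoader interface and differ in their input_name and output_name defaults (product to reactant here, reactant to product for Reaction). A hedged usage sketch follows: the wrapper name load_retrosyn_split is hypothetical, the dataset name 'USPTO-50K' is an assumption for illustration, and the tdc package must be installed for the call to succeed.

```python
def load_retrosyn_split(name="USPTO-50K", path="./data"):
    """Hypothetical wrapper: load a RetroSyn dataset and return its splits."""
    # Imported inside the function so that defining it does not require tdc.
    from tdc.generation import RetroSyn

    # input_name/output_name are the documented defaults for retrosynthesis:
    # the model input is the product and the target is the reactant.
    data = RetroSyn(
        name=name,
        path=path,
        print_stats=False,
        input_name="product",
        output_name="reactant",
    )
    return data.get_split(method="random", seed=42, frac=[0.7, 0.1, 0.2])
```

Swapping in the Reaction class with input_name='reactant' and output_name='product' would give the forward prediction setup under the same assumptions.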