tdc.generation

tdc.generation.generation_dataset module

class tdc.generation.generation_dataset.DataLoader(name, path, print_stats, column_name)

Bases: DataLoader

A base dataset loader class.

dataset_names

name of the dataset.

Type:

str

name

The name of the dataset.

Type:

str

path

the path to save the data file.

Type:

str

smiles_lst

a list of SMILES strings used as training data for distribution learning.

Type:

list

get_data(format='df')

Return the data from the whole dataset.

Parameters:

format (str, optional) – the desired format for molecular data.

Returns:

a dataframe of the dataset, or a dictionary holding the same information

Return type:

pandas DataFrame/dict

Raises:

AttributeError – Use the correct format as input (df, dict)
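The format dispatch described above can be sketched in plain Python. This is an illustration only, not the TDC implementation: get_data_sketch is a hypothetical name, a list of row dictionaries stands in for the pandas DataFrame, and the 'smiles' column name is assumed from the column_name default shown for MolGen.

```python
def get_data_sketch(smiles_lst, format="df"):
    """Hypothetical stand-in for get_data; not the TDC implementation."""
    if format == "df":
        # A list of row dicts stands in for a pandas DataFrame here.
        return [{"smiles": s} for s in smiles_lst]
    if format == "dict":
        # Column-oriented view of the same data.
        return {"smiles": list(smiles_lst)}
    # Any other value is rejected, mirroring the documented AttributeError.
    raise AttributeError("Use the correct format as input (df, dict)")


rows = get_data_sketch(["CCO", "c1ccccc1"], format="df")
columns = get_data_sketch(["CCO", "c1ccccc1"], format="dict")
```

Passing any other format string, for example 'csv', raises AttributeError in this sketch.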

get_split(method='random', seed=42, frac=[0.7, 0.1, 0.2])

Return the data split into train, valid, and test sets.

Parameters:
  • method (str) – splitting schemes: random, scaffold

  • seed (int) – random seed, default 42

  • frac (list of float) – ratio of train/val/test split

Returns:

the train, valid, and test splits of the dataset

Return type:

pandas DataFrame/dict

Raises:

AttributeError – Use the correct split method as input (random, scaffold)
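To make the roles of seed and frac concrete, here is a minimal sketch of a seeded random split. It is not the TDC implementation: random_split is a hypothetical helper, it works on a plain list rather than a DataFrame, and the 'train'/'valid'/'test' dictionary keys are an assumption about how the three splits are labelled.

```python
import random


def random_split(items, seed=42, frac=(0.7, 0.1, 0.2)):
    """Hypothetical seeded random split into train/valid/test lists."""
    if abs(sum(frac) - 1.0) > 1e-9:
        raise ValueError("frac must sum to 1")
    shuffled = list(items)
    # A dedicated Random instance keeps the shuffle reproducible per seed.
    random.Random(seed).shuffle(shuffled)
    n_train = round(frac[0] * len(shuffled))
    n_valid = round(frac[1] * len(shuffled))
    return {
        "train": shuffled[:n_train],
        "valid": shuffled[n_train:n_train + n_valid],
        # The test set takes whatever remains, so no item is dropped.
        "test": shuffled[n_train + n_valid:],
    }


molecules = [f"mol_{i}" for i in range(10)]
splits = random_split(molecules)
sizes = {key: len(value) for key, value in splits.items()}
print(sizes)  # {'train': 7, 'valid': 1, 'test': 2}
```

Calling the helper twice with the same seed yields the same partition, which is the point of exposing seed as a parameter.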

print_stats()

Print the basic statistics of the dataset.

class tdc.generation.generation_dataset.DataLoader3D(name, path, print_stats, dataset_names, column_name)

Bases: DataLoader

A basic class for generation of 3D biomedical entities. (under construction)

df

the dataset in pandas DataFrame format.

Type:

pandas DataFrame

name

the name of the dataset.

Type:

str

path

the path to save the data file.

Type:

str

get_data(format='df', more_features='None')

Return the data from the whole dataset.

Parameters:
  • format (str, optional) – the desired format for molecular data.

  • more_features (str, optional) – 3D feature format, choose from [Graph3D, Coulumb]

Returns:

a dataframe of the dataset, or a dictionary holding the same information

Return type:

pandas DataFrame/dict

Raises:
  • AttributeError – Use the correct format as input (df, dict)

  • ImportError – Please install rdkit by ‘conda install -c conda-forge rdkit’

get_split(method='random', seed=42, frac=[0.7, 0.1, 0.2])

Return the data split into train, valid, and test sets.

Parameters:
  • method (str) – splitting schemes: random, scaffold

  • seed (int) – random seed, default 42

  • frac (list of float) – ratio of train/val/test split

Returns:

the train, valid, and test splits of the dataset

Return type:

pandas DataFrame/dict

Raises:

AttributeError – Use the correct split method as input (random, scaffold)

print_stats()

Print the basic statistics of the dataset.

class tdc.generation.generation_dataset.PairedDataLoader(name, path, print_stats, input_name, output_name)

Bases: DataLoader

A basic class for generation of biomedical entities conditioned on other entities, such as reaction prediction.

dataset_names

the name of the dataset.

Type:

str

name

the name of the dataset.

Type:

str

path

the path to save the data file.

Type:

str

get_data(format='df')

Return the data from the whole dataset.

Parameters:

format (str, optional) – the desired format for molecular data.

Returns:

a dataframe of the dataset, or a dictionary holding the same information

Return type:

pandas DataFrame/dict

Raises:

AttributeError – Use the correct format as input (df, dict)

get_split(method='random', seed=42, frac=[0.7, 0.1, 0.2])

Return the data split into train, valid, and test sets.

Parameters:
  • method (str) – splitting schemes: random, scaffold

  • seed (int) – random seed, default 42

  • frac (list of float) – ratio of train/val/test split

Returns:

the train, valid, and test splits of the dataset

Return type:

pandas DataFrame/dict

Raises:

AttributeError – Use the correct split method as input (random, scaffold)

print_stats()

Print the statistics of the dataset.

tdc.generation.molgen module

class tdc.generation.molgen.MolGen(name, path='./data', print_stats=False, column_name='smiles')

Bases: DataLoader

Data loader class for the molecular generation task (distribution learning).
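A typical call sequence might look like the sketch below. The wrapper function load_molgen_splits is hypothetical, and the dataset name 'ZINC' is an assumption; substitute any name the loader accepts. The import is done inside the function so the snippet can be read and loaded without the package installed. Constructing the loader is expected to fetch the data file into path if it is not already there.

```python
def load_molgen_splits(name="ZINC", path="./data"):
    """Hypothetical convenience wrapper around MolGen."""
    # Lazy import: requires the tdc package only when the function is called.
    from tdc.generation import MolGen

    data = MolGen(name=name, path=path)
    # Whole dataset in the default dataframe format.
    df = data.get_data(format="df")
    # Train/valid/test splits with the documented default arguments.
    split = data.get_split(method="random", seed=42, frac=[0.7, 0.1, 0.2])
    return df, split
```

Usage would be df, split = load_molgen_splits(), after which split holds the three partitions described under get_split.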

tdc.generation.reaction module

class tdc.generation.reaction.Reaction(name, path='./data', print_stats=False, input_name='reactant', output_name='product')

Bases: PairedDataLoader

Data loader class for the forward reaction prediction task.

tdc.generation.retrosyn module

class tdc.generation.retrosyn.RetroSyn(name, path='./data', print_stats=False, input_name='product', output_name='reactant')

Bases: PairedDataLoader

Data loader class for the retrosynthesis prediction task.

get_split(method='random', seed=42, frac=[0.7, 0.1, 0.2], include_reaction_type=False)

Return the data split into train, valid, and test sets.

Parameters:
  • method (str) – splitting schemes: random, scaffold

  • seed (int) – random seed, default 42

  • frac (list of float) – ratio of train/val/test split

  • include_reaction_type (bool) – whether or not to include reaction type in the split

Returns:

the train, valid, and test splits of the dataset

Return type:

pandas DataFrame/dict

Raises:

AttributeError – Use the correct split method as input (random, scaffold)
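The default input_name and output_name of Reaction and RetroSyn are mirror images of each other: the same reaction record is read reactant-to-product for forward prediction and product-to-reactant for retrosynthesis. The sketch below illustrates that idea only. The helper as_pair, the 'input'/'output' keys, and the example record are all hypothetical and are not taken from the library.

```python
def as_pair(record, input_name, output_name):
    """Hypothetical helper: pick the conditioning and target fields of a record."""
    return {"input": record[input_name], "output": record[output_name]}


# Made-up example record (ethanol + acetic acid giving ethyl acetate).
record = {"reactant": "CCO.CC(=O)O", "product": "CC(=O)OCC"}

# Defaults shown for Reaction: condition on reactants, predict the product.
forward = as_pair(record, input_name="reactant", output_name="product")

# Defaults shown for RetroSyn: condition on the product, predict reactants.
retro = as_pair(record, input_name="product", output_name="reactant")
```

In this sketch forward["input"] equals retro["output"] and vice versa, which is all the two sets of defaults change.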