tdc.single_pred

tdc.single_pred.single_pred_dataset module

class tdc.single_pred.single_pred_dataset.DataLoader(name, path, label_name, print_stats, dataset_names, convert_format)

Bases: tdc.base_dataset.DataLoader

A base data loader class.

Parameters
  • name (str) – the dataset name.

  • path (str) – The path to save the data file

  • label_name (str) – For multi-label dataset, specify the label name

  • print_stats (bool) – Whether to print basic statistics of the dataset

  • dataset_names (list) – A list of dataset names available for a task

  • convert_format (str) – Target format for automatic conversion of SMILES via the MolConvert class; the converted values are stored as a separate column in the dataframe

convert_format

conversion format of an entity

Type

str

convert_result

a placeholder for a list of conversion outputs

Type

list

entity1

a list of the single entities

Type

Pandas Series

entity1_idx

a list of the single entities' indices

Type

Pandas Series

entity1_name

a list of the single entities' names

Type

Pandas Series

file_format

the format of the downloaded dataset

Type

str

label_name

for multi-label dataset, the label name of interest

Type

str

name

dataset name

Type

str

path

path to save and retrieve the dataset

Type

str

y

a list of the single entities' labels

Type

Pandas Series

get_data(format='df')
Parameters

format (str, optional) – the format of the returned dataset, defaults to ‘df’

Returns

a dataframe of the dataset, or a dictionary of the key information in the dataset

Return type

pandas DataFrame/dict

Raises

AttributeError – if format is not one of the supported values (df, dict, DeepPurpose)
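The dispatch on format described above can be sketched in plain Python. This is only an illustration, not the library's implementation: the function name get_data_sketch, the column names Drug_ID, Drug and Y, the list of row records standing in for a pandas DataFrame, and the shape of the DeepPurpose return value are all assumptions made for this example.

```python
def get_data_sketch(ids, entities, labels, format="df", entity_name="Drug"):
    """Return the same data in one of several layouts, chosen by `format`.

    Hypothetical stand-in for DataLoader.get_data; column names are assumed.
    """
    id_col = entity_name + "_ID"
    if format == "df":
        # A list of row records stands in for the pandas DataFrame.
        return [
            {id_col: i, entity_name: e, "Y": y}
            for i, e, y in zip(ids, entities, labels)
        ]
    if format == "dict":
        # One key per column, each holding the full column of values.
        return {id_col: list(ids), entity_name: list(entities), "Y": list(labels)}
    if format == "DeepPurpose":
        # Assumed here to be an (inputs, labels) pair.
        return list(entities), list(labels)
    # Any other value is rejected, mirroring the documented AttributeError.
    raise AttributeError(
        "Please use the correct format input: df, dict, or DeepPurpose"
    )


ids = [0, 1]
smiles = ["CCO", "CCN"]
y = [1.0, 0.0]

records = get_data_sketch(ids, smiles, y)                  # row-oriented
columns = get_data_sketch(ids, smiles, y, format="dict")   # column-oriented
```

Passing any other value, such as format="csv", raises AttributeError in this sketch.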

get_split(method='random', seed=42, frac=[0.7, 0.1, 0.2])
Parameters
  • method (str, optional) – the splitting scheme; one of random, cold_{entity}, or scaffold, defaults to ‘random’

  • seed (int, optional) – the random seed for splitting the dataset, defaults to 42

  • frac (list, optional) – train/valid/test split fractions, defaults to [0.7, 0.1, 0.2]

Returns

a dictionary with three keys (‘train’, ‘valid’, ‘test’); each value is a pandas DataFrame holding that portion of the split dataset

Return type

dict

Raises

AttributeError – if the requested split method is not available.
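To make the seed and frac parameters concrete, here is a minimal sketch of a seeded random split that returns a dictionary with the same three keys. It is not the library's implementation: the helper name random_split is hypothetical, plain lists of row records stand in for pandas DataFrames, and the scaffold and cold_{entity} schemes are not covered.

```python
import random


def random_split(rows, seed=42, frac=(0.7, 0.1, 0.2)):
    """Shuffle rows with a fixed seed and cut them into train/valid/test."""
    if abs(sum(frac) - 1.0) > 1e-9:
        raise ValueError("frac must sum to 1")
    shuffled = list(rows)
    # A dedicated Random instance keeps the split reproducible for a given
    # seed without touching the global random state.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * frac[0])
    n_valid = int(len(shuffled) * frac[1])
    return {
        "train": shuffled[:n_train],
        "valid": shuffled[n_train:n_train + n_valid],
        # The test set takes the remainder, so no row is dropped to rounding.
        "test": shuffled[n_train + n_valid:],
    }


rows = [{"Drug_ID": i, "Y": i % 2} for i in range(100)]
split = random_split(rows)
sizes = {part: len(split[part]) for part in ("train", "valid", "test")}
print(sizes)
```

Calling the helper twice with the same seed yields identical partitions, which is the property that makes a fixed default seed useful for comparing models.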

print_stats()

Print basic data statistics.

tdc.single_pred.adme module

class tdc.single_pred.adme.ADME(name, path='./data', label_name=None, print_stats=False, convert_format=None)

Bases: tdc.single_pred.single_pred_dataset.DataLoader

Data loader class to load datasets in ADME task. More info: https://tdcommons.ai/single_pred_tasks/adme/

Parameters
  • name (str) – the dataset name.

  • path (str, optional) – The path to save the data file, defaults to ‘./data’

  • label_name (str, optional) – For multi-label dataset, specify the label name, defaults to None

  • print_stats (bool, optional) – Whether to print basic statistics of the dataset, defaults to False

  • convert_format (str, optional) – Target format for automatic conversion of SMILES via the MolConvert class; the converted values are stored as a separate column in the dataframe. Defaults to None
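A typical use of this loader, combining the constructor with get_split, might look like the sketch below. It assumes the PyTDC package is installed, that ADME can be imported from tdc.single_pred, and that a dataset named Caco2_Wang is available for this task; the wrapper function load_adme_split is a hypothetical helper introduced only for this example. Constructing the loader downloads the data file into path, so the import is deferred until the function is called.

```python
def load_adme_split(name="Caco2_Wang", path="./data", seed=42):
    """Load one ADME dataset and return its train/valid/test split."""
    # Imported lazily so that defining this helper does not require PyTDC.
    from tdc.single_pred import ADME

    # Assumption: `name` is one of the dataset names available for ADME.
    data = ADME(name=name, path=path)
    # Returns a dict with the keys 'train', 'valid' and 'test'.
    return data.get_split(method="random", seed=seed, frac=[0.7, 0.1, 0.2])
```

Calling load_adme_split() and reading split['train'] gives the training portion as a dataframe.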

tdc.single_pred.crispr_outcome module

class tdc.single_pred.crispr_outcome.CRISPROutcome(name, path='./data', label_name=None, print_stats=False, convert_format=None)

Bases: tdc.single_pred.single_pred_dataset.DataLoader

Data loader class to load datasets in CRISPROutcome task. More info: https://tdcommons.ai/single_pred_tasks/CRISPROutcome/

Parameters
  • name (str) – the dataset name.

  • path (str, optional) – The path to save the data file, defaults to ‘./data’

  • label_name (str, optional) – For multi-label dataset, specify the label name, defaults to None

  • print_stats (bool, optional) – Whether to print basic statistics of the dataset, defaults to False

  • convert_format (str, optional) – Target format for automatic conversion of SMILES via the MolConvert class; the converted values are stored as a separate column in the dataframe. Defaults to None

tdc.single_pred.develop module

class tdc.single_pred.develop.Develop(name, path='./data', label_name=None, print_stats=False, convert_format=None)

Bases: tdc.single_pred.single_pred_dataset.DataLoader

Data loader class to load datasets in Develop task. More info: https://tdcommons.ai/single_pred_tasks/develop/

Parameters
  • name (str) – the dataset name.

  • path (str, optional) – The path to save the data file, defaults to ‘./data’

  • label_name (str, optional) – For multi-label dataset, specify the label name, defaults to None

  • print_stats (bool, optional) – Whether to print basic statistics of the dataset, defaults to False

  • convert_format (str, optional) – Target format for automatic conversion of SMILES via the MolConvert class; the converted values are stored as a separate column in the dataframe. Defaults to None

tdc.single_pred.epitope module

class tdc.single_pred.epitope.Epitope(name, path='./data', label_name=None, print_stats=False, convert_format=None)

Bases: tdc.single_pred.single_pred_dataset.DataLoader

Data loader class to load datasets in Epitope Prediction task. More info: https://tdcommons.ai/single_pred_tasks/epitope/

Parameters
  • name (str) – the dataset name.

  • path (str, optional) – The path to save the data file, defaults to ‘./data’

  • label_name (str, optional) – For multi-label dataset, specify the label name, defaults to None

  • print_stats (bool, optional) – Whether to print basic statistics of the dataset, defaults to False

  • convert_format (str, optional) – Target format for automatic conversion of SMILES via the MolConvert class; the converted values are stored as a separate column in the dataframe. Defaults to None

tdc.single_pred.hts module

class tdc.single_pred.hts.HTS(name, path='./data', label_name=None, print_stats=False, convert_format=None)

Bases: tdc.single_pred.single_pred_dataset.DataLoader

Data loader class to load datasets in HTS task. More info: https://tdcommons.ai/single_pred_tasks/hts/

Parameters
  • name (str) – the dataset name.

  • path (str, optional) – The path to save the data file, defaults to ‘./data’

  • label_name (str, optional) – For multi-label dataset, specify the label name, defaults to None

  • print_stats (bool, optional) – Whether to print basic statistics of the dataset, defaults to False

  • convert_format (str, optional) – Target format for automatic conversion of SMILES via the MolConvert class; the converted values are stored as a separate column in the dataframe. Defaults to None

tdc.single_pred.paratope module

class tdc.single_pred.paratope.Paratope(name, path='./data', label_name=None, print_stats=False, convert_format=None)

Bases: tdc.single_pred.single_pred_dataset.DataLoader

Data loader class to load datasets in Paratope Prediction task. More info: https://tdcommons.ai/single_pred_tasks/paratope/

Parameters
  • name (str) – the dataset name.

  • path (str, optional) – The path to save the data file, defaults to ‘./data’

  • label_name (str, optional) – For multi-label dataset, specify the label name, defaults to None

  • print_stats (bool, optional) – Whether to print basic statistics of the dataset, defaults to False

  • convert_format (str, optional) – Target format for automatic conversion of SMILES via the MolConvert class; the converted values are stored as a separate column in the dataframe. Defaults to None

tdc.single_pred.qm module

class tdc.single_pred.qm.QM(name, path='./data', label_name=None, print_stats=False, convert_format=None)

Bases: tdc.single_pred.single_pred_dataset.DataLoader

Data loader class to load datasets in QM (Quantum Mechanics Modeling) task. More info: https://tdcommons.ai/single_pred_tasks/qm/

Parameters
  • name (str) – the dataset name.

  • path (str, optional) – The path to save the data file, defaults to ‘./data’

  • label_name (str, optional) – For multi-label dataset, specify the label name, defaults to None

  • print_stats (bool, optional) – Whether to print basic statistics of the dataset, defaults to False

  • convert_format (str, optional) – Target format for automatic conversion of SMILES via the MolConvert class; the converted values are stored as a separate column in the dataframe. Defaults to None

tdc.single_pred.test_single_pred module

class tdc.single_pred.test_single_pred.TestSinglePred(name, path='./data', label_name=None, print_stats=False, convert_format=None)

Bases: tdc.single_pred.single_pred_dataset.DataLoader

Data loader class to test the single instance prediction data loader.

Parameters
  • name (str) – the dataset name.

  • path (str, optional) – The path to save the data file, defaults to ‘./data’

  • label_name (str, optional) – For multi-label dataset, specify the label name, defaults to None

  • print_stats (bool, optional) – Whether to print basic statistics of the dataset, defaults to False

  • convert_format (str, optional) – Target format for automatic conversion of SMILES via the MolConvert class; the converted values are stored as a separate column in the dataframe. Defaults to None

tdc.single_pred.tox module

class tdc.single_pred.tox.Tox(name, path='./data', label_name=None, print_stats=False, convert_format=None)

Bases: tdc.single_pred.single_pred_dataset.DataLoader

Data loader class to load datasets in Tox (Toxicity Prediction) task. More info: https://tdcommons.ai/single_pred_tasks/tox/

Parameters
  • name (str) – the dataset name.

  • path (str, optional) – The path to save the data file, defaults to ‘./data’

  • label_name (str, optional) – For multi-label dataset, specify the label name, defaults to None

  • print_stats (bool, optional) – Whether to print basic statistics of the dataset, defaults to False

  • convert_format (str, optional) – Target format for automatic conversion of SMILES via the MolConvert class; the converted values are stored as a separate column in the dataframe. Defaults to None
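For a multi-label dataset, label_name selects which label column the loader exposes. The sketch below assumes PyTDC is installed, that Tox21 is a multi-label dataset available for this task, and that tdc.utils provides retrieve_label_name_list for listing its label names; the wrapper load_tox_label is a hypothetical helper introduced only for this example.

```python
def load_tox_label(name="Tox21", label_index=0, path="./data"):
    """Load a single label of a multi-label toxicity dataset as a dataframe."""
    # Imported lazily so that defining this helper does not require PyTDC.
    from tdc.single_pred import Tox
    from tdc.utils import retrieve_label_name_list

    # Assumption: this returns the label names available for `name`.
    label_list = retrieve_label_name_list(name)
    # Pick one label of interest and pass it to the loader.
    data = Tox(name=name, path=path, label_name=label_list[label_index])
    return data.get_data(format="df")
```

Changing label_index selects a different label from the same dataset without changing any other argument.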

tdc.single_pred.yields module

class tdc.single_pred.yields.Yields(name, path='./data', label_name=None, print_stats=False, convert_format=None)

Bases: tdc.single_pred.single_pred_dataset.DataLoader

Data loader class to load datasets in Yields (Reaction Yields Prediction) task. More info: https://tdcommons.ai/single_pred_tasks/yields/

Parameters
  • name (str) – the dataset name.

  • path (str, optional) – The path to save the data file, defaults to ‘./data’

  • label_name (str, optional) – For multi-label dataset, specify the label name, defaults to None

  • print_stats (bool, optional) – Whether to print basic statistics of the dataset, defaults to False

  • convert_format (str, optional) – Target format for automatic conversion of SMILES via the MolConvert class; the converted values are stored as a separate column in the dataframe. Defaults to None