tdc.single_pred#

tdc.single_pred.single_pred_dataset module#

class tdc.single_pred.single_pred_dataset.DataLoader(name, path, label_name, print_stats, dataset_names, convert_format, raw_format='SMILES')[source]#

Bases: DataLoader

A base data loader class.

Parameters:
  • name (str) – the dataset name.

  • path (str) – The path to save the data file

  • label_name (str) – For multi-label dataset, specify the label name

  • print_stats (bool) – Whether to print basic statistics of the dataset

  • dataset_names (list) – A list of dataset names available for a task

  • convert_format (str) – Automatic conversion of SMILES to other molecular formats in MolConvert class. Stored as separate column in dataframe

convert_format#

conversion format of an entity

Type:

str

convert_result#

a placeholder for a list of conversion outputs

Type:

list

entity1#

a list of the single entities

Type:

Pandas Series

entity1_idx#

a list of the single entities' indices

Type:

Pandas Series

entity1_name#

a list of the single entities' names

Type:

Pandas Series

file_format#

the format of the downloaded dataset

Type:

str

label_name#

for multi-label dataset, the label name of interest

Type:

str

name#

dataset name

Type:

str

path#

path to save and retrieve the dataset

Type:

str

y#

a list of the single entities' labels

Type:

Pandas Series

get_data(format='df')[source]#
Parameters:

format (str, optional) – the format of the returned dataset, defaults to ‘df’

Returns:

a dataframe of the dataset, or a dictionary of key information in the dataset

Return type:

pandas DataFrame/dict

Raises:

AttributeError – if format is not one of the supported values (df, dict, DeepPurpose)
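
As a usage sketch, the helper below loads one dataset through a task-specific subclass and returns it in the requested format. The helper name `load_single_pred`, the constant `VALID_FORMATS`, and the dataset name `Caco2_Wang` are illustrative assumptions, and the up-front format check is an addition of this sketch so that bad input fails before any download starts. Running it requires the PyTDC package and network access on first use.

```python
VALID_FORMATS = ("df", "dict", "DeepPurpose")  # formats named in the Raises note above


def load_single_pred(name, fmt="df", path="./data"):
    """Hypothetical helper: load an ADME dataset and return it via get_data."""
    if fmt not in VALID_FORMATS:
        # Sketch-only check: fail early, before any download is attempted.
        raise ValueError(f"format must be one of {VALID_FORMATS}, got {fmt!r}")
    # Imported lazily so this snippet can be loaded without PyTDC installed.
    from tdc.single_pred import ADME

    data = ADME(name=name, path=path)
    return data.get_data(format=fmt)


# Example call (assumed dataset name; downloads to ./data on first run):
#   df = load_single_pred("Caco2_Wang")
#   print(df.head())
```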

get_split(method='random', seed=42, frac=[0.7, 0.1, 0.2])[source]#
Parameters:
  • method – the splitting scheme, one of random, cold_{entity}, or scaffold, defaults to ‘random’

  • seed – the random seed for splitting the dataset, defaults to 42

  • frac – train/val/test split fractions, defaults to [0.7, 0.1, 0.2]

Returns:

a dictionary with three keys (‘train’, ‘valid’, ‘test’), where each value is a pandas dataframe holding that part of the split dataset

Return type:

dict

Raises:

AttributeError – the input split method is not available.
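
To make the roles of seed and frac concrete, here is a minimal pure-Python sketch of a random split. It is not TDC's implementation: the function name `random_split` is hypothetical, it works on a plain list rather than a dataframe, and the rounding rule (the test part takes the remainder) is an assumption of this sketch.

```python
import random


def random_split(items, frac=(0.7, 0.1, 0.2), seed=42):
    """Shuffle items reproducibly and cut them into train/valid/test parts."""
    if abs(sum(frac) - 1.0) > 1e-9:
        raise ValueError("frac must sum to 1")
    shuffled = list(items)
    # A local RNG keeps the split reproducible without touching global state.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * frac[0])
    n_valid = round(len(shuffled) * frac[1])
    return {
        "train": shuffled[:n_train],
        "valid": shuffled[n_train:n_train + n_valid],
        "test": shuffled[n_train + n_valid:],  # remainder absorbs rounding
    }


parts = random_split(range(10))
print({key: len(value) for key, value in parts.items()})
```

The same seed always yields the same partition, which is what makes results comparable across runs.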

print_stats()[source]#

Print basic data statistics.

tdc.single_pred.adme module#

class tdc.single_pred.adme.ADME(name, path='./data', label_name=None, print_stats=False, convert_format=None)[source]#

Bases: DataLoader

Data loader class to load datasets in ADME task. More info: https://tdcommons.ai/single_pred_tasks/adme/

Parameters:
  • name (str) – the dataset name.

  • path (str, optional) – The path to save the data file, defaults to ‘./data’

  • label_name (str, optional) – For multi-label dataset, specify the label name, defaults to None

  • print_stats (bool, optional) – Whether to print basic statistics of the dataset, defaults to False

  • convert_format (str, optional) – Automatic conversion of SMILES to other molecular formats in MolConvert class. Stored as separate column in dataframe, defaults to None

get_approved_set()[source]#

get_other_species(species=None)[source]#

harmonize(mode=None)[source]#

Remove duplicated experimental readouts.

tdc.single_pred.crispr_outcome module#

class tdc.single_pred.crispr_outcome.CRISPROutcome(name, path='./data', label_name=None, print_stats=False, convert_format=None)[source]#

Bases: DataLoader

Data loader class to load datasets in CRISPROutcome task. More info: https://tdcommons.ai/single_pred_tasks/CRISPROutcome/

Parameters:
  • name (str) – the dataset name.

  • path (str, optional) – The path to save the data file, defaults to ‘./data’

  • label_name (str, optional) – For multi-label dataset, specify the label name, defaults to None

  • print_stats (bool, optional) – Whether to print basic statistics of the dataset, defaults to False

  • convert_format (str, optional) – Automatic conversion of SMILES to other molecular formats in MolConvert class. Stored as separate column in dataframe, defaults to None

tdc.single_pred.develop module#

class tdc.single_pred.develop.Develop(name, path='./data', label_name=None, print_stats=False, convert_format=None)[source]#

Bases: DataLoader

Data loader class to load datasets in Develop task. More info: https://tdcommons.ai/single_pred_tasks/develop/

Parameters:
  • name (str) – the dataset name.

  • path (str, optional) – The path to save the data file, defaults to ‘./data’

  • label_name (str, optional) – For multi-label dataset, specify the label name, defaults to None

  • print_stats (bool, optional) – Whether to print basic statistics of the dataset, defaults to False

  • convert_format (str, optional) – Automatic conversion of SMILES to other molecular formats in MolConvert class. Stored as separate column in dataframe, defaults to None

graphein(graph='distance', node_feature=['amino_acid_one_hot'], distance_threshold=6, config=None, convertor=None)[source]#

tdc.single_pred.epitope module#

class tdc.single_pred.epitope.Epitope(name, path='./data', label_name=None, print_stats=False, convert_format=None)[source]#

Bases: DataLoader

Data loader class to load datasets in Epitope Prediction task. More info: https://tdcommons.ai/single_pred_tasks/epitope/

Parameters:
  • name (str) – the dataset name.

  • path (str, optional) – The path to save the data file, defaults to ‘./data’

  • label_name (str, optional) – For multi-label dataset, specify the label name, defaults to None

  • print_stats (bool, optional) – Whether to print basic statistics of the dataset, defaults to False

  • convert_format (str, optional) – Automatic conversion of SMILES to other molecular formats in MolConvert class. Stored as separate column in dataframe, defaults to None

tdc.single_pred.hts module#

class tdc.single_pred.hts.HTS(name, path='./data', label_name=None, print_stats=False, convert_format=None)[source]#

Bases: DataLoader

Data loader class to load datasets in HTS task. More info: https://tdcommons.ai/single_pred_tasks/hts/

Parameters:
  • name (str) – the dataset name.

  • path (str, optional) – The path to save the data file, defaults to ‘./data’

  • label_name (str, optional) – For multi-label dataset, specify the label name, defaults to None

  • print_stats (bool, optional) – Whether to print basic statistics of the dataset, defaults to False

  • convert_format (str, optional) – Automatic conversion of SMILES to other molecular formats in MolConvert class. Stored as separate column in dataframe, defaults to None

tdc.single_pred.paratope module#

class tdc.single_pred.paratope.Paratope(name, path='./data', label_name=None, print_stats=False, convert_format=None)[source]#

Bases: DataLoader

Data loader class to load datasets in Paratope Prediction task. More info: https://tdcommons.ai/single_pred_tasks/paratope/

Parameters:
  • name (str) – the dataset name.

  • path (str, optional) – The path to save the data file, defaults to ‘./data’

  • label_name (str, optional) – For multi-label dataset, specify the label name, defaults to None

  • print_stats (bool, optional) – Whether to print basic statistics of the dataset, defaults to False

  • convert_format (str, optional) – Automatic conversion of SMILES to other molecular formats in MolConvert class. Stored as separate column in dataframe, defaults to None

tdc.single_pred.qm module#

class tdc.single_pred.qm.QM(name, path='./data', label_name=None, print_stats=False, convert_format=None, raw_format='Raw3D')[source]#

Bases: DataLoader

Data loader class to load datasets in QM (Quantum Mechanics Modeling) task. More info: https://tdcommons.ai/single_pred_tasks/qm/

Parameters:
  • name (str) – the dataset name.

  • path (str, optional) – The path to save the data file, defaults to ‘./data’

  • label_name (str, optional) – For multi-label dataset, specify the label name, defaults to None

  • print_stats (bool, optional) – Whether to print basic statistics of the dataset, defaults to False

  • convert_format (str, optional) – Automatic conversion of SMILES to other molecular formats in MolConvert class. Stored as separate column in dataframe, defaults to None

tdc.single_pred.test_single_pred module#

class tdc.single_pred.test_single_pred.TestSinglePred(name, path='./data', label_name=None, print_stats=False, convert_format=None)[source]#

Bases: DataLoader

Data loader class to test the single instance prediction data loader.

Parameters:
  • name (str) – the dataset name.

  • path (str, optional) – The path to save the data file, defaults to ‘./data’

  • label_name (str, optional) – For multi-label dataset, specify the label name, defaults to None

  • print_stats (bool, optional) – Whether to print basic statistics of the dataset, defaults to False

  • convert_format (str, optional) – Automatic conversion of SMILES to other molecular formats in MolConvert class. Stored as separate column in dataframe, defaults to None

tdc.single_pred.tox module#

class tdc.single_pred.tox.Tox(name, path='./data', label_name=None, print_stats=False, convert_format=None)[source]#

Bases: DataLoader

Data loader class to load datasets in Tox (Toxicity Prediction) task. More info: https://tdcommons.ai/single_pred_tasks/tox/

Parameters:
  • name (str) – the dataset name.

  • path (str, optional) – The path to save the data file, defaults to ‘./data’

  • label_name (str, optional) – For multi-label dataset, specify the label name, defaults to None

  • print_stats (bool, optional) – Whether to print basic statistics of the dataset, defaults to False

  • convert_format (str, optional) – Automatic conversion of SMILES to other molecular formats in MolConvert class. Stored as separate column in dataframe, defaults to None
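
For a multi-label dataset, label_name selects which label column to load. The sketch below assumes the `Tox21` dataset and the `retrieve_label_name_list` utility from `tdc.utils`; the helper name `load_first_label` is hypothetical, and the choice of the first label is arbitrary. It needs PyTDC installed and network access on first use.

```python
def load_first_label(name="Tox21", path="./data"):
    """Hypothetical helper: load a multi-label Tox dataset using its first label."""
    # Imported lazily so this snippet can be loaded without PyTDC installed.
    from tdc.single_pred import Tox
    from tdc.utils import retrieve_label_name_list

    labels = retrieve_label_name_list(name)  # label names available for this dataset
    data = Tox(name=name, path=path, label_name=labels[0])
    return labels[0], data.get_data()


# Example call (downloads to ./data on first run):
#   label, df = load_first_label()
```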

tdc.single_pred.yields module#

class tdc.single_pred.yields.Yields(name, path='./data', label_name=None, print_stats=False, convert_format=None)[source]#

Bases: DataLoader

Data loader class to load datasets in Yields (Reaction Yields Prediction) task. More info: https://tdcommons.ai/single_pred_tasks/yields/

Parameters:
  • name (str) – the dataset name.

  • path (str, optional) – The path to save the data file, defaults to ‘./data’

  • label_name (str, optional) – For multi-label dataset, specify the label name, defaults to None

  • print_stats (bool, optional) – Whether to print basic statistics of the dataset, defaults to False

  • convert_format (str, optional) – Automatic conversion of SMILES to other molecular formats in MolConvert class. Stored as separate column in dataframe, defaults to None