tdc.multi_pred#

tdc.multi_pred.bi_pred_dataset module#

class tdc.multi_pred.bi_pred_dataset.DataLoader(name, path, label_name, print_stats, dataset_names)[source]#

Bases: DataLoader

A base data loader class that each bi-instance prediction task dataloader class can inherit from.

Attributes: TODO

get_data(format='df')[source]#

Return the dataset in the requested format, e.g., a pandas.DataFrame.

Parameters:

format (str, optional) – format of data, the default value is ‘df’ (DataFrame)

Returns:

a dataframe of a dataset/a dictionary for key information in the dataset

Return type:

pandas DataFrame/dict

Raises:

AttributeError – if format is not one of the supported values (df, dict, DeepPurpose).

get_split(method='random', seed=42, frac=[0.7, 0.1, 0.2], column_name=None, time_column=None)[source]#

split dataset into train/validation/test.

Parameters:
  • method (str, optional) – split method, the default value is ‘random’

  • seed (int, optional) – random seed, defaults to 42

  • frac (list, optional) – train/val/test split fractions, defaults to [0.7, 0.1, 0.2]

  • column_name (Optional[Union[str, List[str]]]) – Optional column name(s) to split on for cold splits. Defaults to None.

  • time_column (str, optional) – name of the column to use for time-based splits. Defaults to None.

Returns:

a dictionary with three keys (‘train’, ‘valid’, ‘test’); each value is a pandas DataFrame holding that part of the split dataset.

Return type:

dict

Raises:

AttributeError – the input split method is not available.
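The idea behind a cold split on column_name can be sketched without the library. The snippet below is a minimal, standard-library illustration, not the TDC implementation: the helper name cold_split, the toy rows, and the choice to apply frac to the distinct entities (rather than to the rows) are all assumptions made for the example.

```python
import random


def cold_split(rows, column_name, frac=(0.7, 0.1, 0.2), seed=42):
    """Hypothetical helper: split rows so no entity in `column_name`
    appears in more than one of train/valid/test."""
    # Shuffle the distinct entities, not the rows, with a fixed seed.
    entities = sorted({row[column_name] for row in rows})
    random.Random(seed).shuffle(entities)

    # Assumption: the fractions are applied to the number of entities.
    n_train = round(len(entities) * frac[0])
    n_valid = round(len(entities) * frac[1])
    groups = {
        "train": set(entities[:n_train]),
        "valid": set(entities[n_train:n_train + n_valid]),
        "test": set(entities[n_train + n_valid:]),
    }
    # Each row goes to the partition that owns its entity.
    return {
        name: [row for row in rows if row[column_name] in members]
        for name, members in groups.items()
    }


# Toy data: 10 drugs, each paired with 2 targets.
rows = [
    {"Drug": f"drug{i}", "Target": f"target{j}", "Y": float(i + j)}
    for i in range(10)
    for j in range(2)
]
split = cold_split(rows, column_name="Drug")
```

With a loader instance, the corresponding call would presumably take the form data.get_split(method=..., column_name='Drug'); the name of the cold-split method string is not stated in this reference, so check the source before relying on it.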

neg_sample(frac=1)[source]#

Generate negative samples for the dataset.

Parameters:

frac (int, optional) – the ratio of negative to positive samples, defaults to 1

Returns:

the DataLoader instance itself.
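As a rough illustration of what negative sampling means for pair data, the sketch below draws unobserved pairs at a given negative-to-positive ratio. It is standard-library Python and only an assumption about the general technique: the helper sample_negatives, the toy pairs, and the rule "any unobserved combination counts as a negative" are hypothetical and are not taken from the TDC source.

```python
import itertools
import random


def sample_negatives(positive_pairs, frac=1, seed=42):
    """Hypothetical helper: label observed pairs 1 and add sampled
    unobserved pairs labeled 0, at `frac` negatives per positive."""
    positives = set(positive_pairs)
    left = sorted({a for a, _ in positives})
    right = sorted({b for _, b in positives})

    # Candidate negatives: every combination that was never observed.
    candidates = [
        pair for pair in itertools.product(left, right)
        if pair not in positives
    ]
    # Cap the request at the number of candidates that actually exist.
    n_neg = min(len(candidates), int(len(positives) * frac))
    negatives = random.Random(seed).sample(candidates, n_neg)

    return ([(a, b, 1) for a, b in sorted(positives)]
            + [(a, b, 0) for a, b in negatives])


positives = [("d1", "t1"), ("d1", "t2"), ("d2", "t3")]
labeled = sample_negatives(positives, frac=1)
```

Because neg_sample returns the loader itself, it can be followed directly by another call on the same object, for example data.neg_sample(frac=1).get_split().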

print_stats()[source]#

print the statistics of the dataset

to_graph(threshold=None, format='edge_list', split=True, frac=[0.7, 0.1, 0.2], seed=42, order='descending')[source]#

Convert the dataset to a graph format such as an edge list.

Parameters:
  • threshold (float, optional) – threshold used to binarize the labels, defaults to None

  • format (str, optional) – format of the graph data, defaults to ‘edge_list’

  • split (bool, optional) – whether to split the data into train/valid/test, defaults to True

  • frac (list, optional) – train/val/test split fractions, defaults to [0.7, 0.1, 0.2]

  • seed (int, optional) – random seed, defaults to 42

  • order (str, optional) – order of the label transform, defaults to ‘descending’

Returns:

a dictionary for key information in the dataset

Return type:

dict
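How threshold and order might combine to produce an edge list can be sketched as follows. This is a standalone illustration under a stated assumption, not the library's code: it assumes that ‘descending’ marks values below the threshold as positive edges (as is common for affinity measures where lower means stronger binding) and that ‘ascending’ marks values above it. The helper to_edge_list and the toy rows are hypothetical.

```python
def to_edge_list(rows, threshold, order="descending"):
    """Hypothetical helper: binarize (node1, node2, value) rows and
    keep the pairs labeled positive as graph edges."""
    if order == "descending":
        # Assumption: smaller values count as positive.
        def is_edge(value):
            return value < threshold
    elif order == "ascending":
        # Assumption: larger values count as positive.
        def is_edge(value):
            return value > threshold
    else:
        raise ValueError("order must be 'ascending' or 'descending'")

    return [(a, b) for a, b, value in rows if is_edge(value)]


rows = [("d1", "t1", 5.0), ("d1", "t2", 50.0), ("d2", "t1", 12.0)]
edges_desc = to_edge_list(rows, threshold=30, order="descending")
edges_asc = to_edge_list(rows, threshold=30, order="ascending")
```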

tdc.multi_pred.multi_pred_dataset module#

class tdc.multi_pred.multi_pred_dataset.DataLoader(name, path, print_stats, dataset_names)[source]#

Bases: DataLoader

A base data loader class that each multi-instance prediction task dataloader class can inherit from.

Attributes: TODO

get_data(format='df')[source]#

Return the dataset in the requested format, e.g., a pandas.DataFrame.

Parameters:

format (str, optional) – format of data, the default value is ‘df’ (DataFrame)

Returns:

a dataframe of a dataset/a dictionary for key information in the dataset

Return type:

pandas DataFrame/dict

Raises:

AttributeError – if format is not one of the supported values (df, dict, DeepPurpose).

get_split(method='random', seed=42, frac=[0.7, 0.1, 0.2], column_name=None)[source]#

split dataset into train/validation/test.

Parameters:
  • method (str, optional) – split method, the default value is ‘random’

  • seed (int, optional) – random seed, defaults to 42

  • frac (list, optional) – train/val/test split fractions, defaults to [0.7, 0.1, 0.2]

  • column_name (str or list of str, optional) – column name(s) to split on for cold splits, defaults to None

Returns:

a dictionary with three keys (‘train’, ‘valid’, ‘test’); each value is a pandas DataFrame holding that part of the split dataset.

Return type:

dict

Raises:

AttributeError – the input split method is not available.
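For the default method='random', the effect of seed and frac can be illustrated with a small standard-library sketch. It is a simplified stand-in and not the TDC implementation: the helper random_split and the toy rows are hypothetical, and plain lists are used in place of pandas DataFrames.

```python
import random


def random_split(rows, frac=(0.7, 0.1, 0.2), seed=42):
    """Hypothetical helper: shuffle rows with a fixed seed and cut
    them into train/valid/test according to `frac`."""
    if abs(sum(frac) - 1.0) > 1e-9:
        raise ValueError("frac must sum to 1")

    shuffled = list(rows)
    # A dedicated Random instance keeps the split reproducible.
    random.Random(seed).shuffle(shuffled)

    n_train = round(len(shuffled) * frac[0])
    n_valid = round(len(shuffled) * frac[1])
    return {
        "train": shuffled[:n_train],
        "valid": shuffled[n_train:n_train + n_valid],
        "test": shuffled[n_train + n_valid:],  # remainder goes to test
    }


rows = [(f"drugA{i}", f"drugB{i}", f"cell{i % 4}") for i in range(20)]
split = random_split(rows)
```

Calling the helper twice with the same seed yields the same partition, which is the property the seed argument is meant to provide.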

print_stats()[source]#

print the statistics of the dataset

tdc.multi_pred.antibodyaff module#

class tdc.multi_pred.antibodyaff.AntibodyAff(name, path='./data', label_name=None, print_stats=False)[source]#

Bases: DataLoader

Data loader class to load datasets in Antibody-antigen Affinity Prediction task. More info: https://tdcommons.ai/multi_pred_tasks/antibodyaff/

Task Description: Regression. Given the amino acid sequence of antibody and antigen, predict their binding affinity.

Parameters:
  • name (str) – the dataset name.

  • path (str, optional) – The path to save the data file, defaults to ‘./data’

  • label_name (str, optional) – For multi-label dataset, specify the label name, defaults to None

  • print_stats (bool, optional) – Whether to print basic statistics of the dataset, defaults to False

tdc.multi_pred.catalyst module#

class tdc.multi_pred.catalyst.Catalyst(name, path='./data', label_name=None, print_stats=False)[source]#

Bases: DataLoader

Data loader class to load datasets in Catalyst Prediction task. More info: https://tdcommons.ai/multi_pred_tasks/catalyst/

Task Description: Given reactant and product set X, predict the catalyst Y from a set of most common catalysts.

Parameters:
  • name (str) – the dataset name.

  • path (str, optional) – The path to save the data file, defaults to ‘./data’

  • label_name (str, optional) – For multi-label dataset, specify the label name, defaults to None

  • print_stats (bool, optional) – Whether to print basic statistics of the dataset, defaults to False

tdc.multi_pred.ddi module#

class tdc.multi_pred.ddi.DDI(name, path='./data', label_name=None, print_stats=False)[source]#

Bases: DataLoader

Data loader class to load datasets in Drug-Drug Interaction Prediction task. More info: https://tdcommons.ai/multi_pred_tasks/ddi/

Task Description: Multi-class classification. Given the SMILES strings of two drugs, predict their interaction type.

Parameters:
  • name (str) – the dataset name.

  • path (str, optional) – The path to save the data file, defaults to ‘./data’

  • label_name (str, optional) – For multi-label dataset, specify the label name, defaults to None

  • print_stats (bool, optional) – Whether to print basic statistics of the dataset, defaults to False

print_stats()[source]#

print the statistics of the dataset

tdc.multi_pred.drugres module#

class tdc.multi_pred.drugres.DrugRes(name, path='./data', label_name=None, print_stats=False)[source]#

Bases: DataLoader

Data loader class to load datasets in Drug Response Prediction Task. More info: https://tdcommons.ai/multi_pred_tasks/drugres/

Task Description: Regression. Given the gene expression of cell lines and the SMILES of drug, predict the drug sensitivity level.

Parameters:
  • name (str) – the dataset name.

  • path (str, optional) – The path to save the data file, defaults to ‘./data’

  • label_name (str, optional) – For multi-label dataset, specify the label name, defaults to None

  • print_stats (bool, optional) – Whether to print basic statistics of the dataset, defaults to False

get_gene_symbols()[source]#

Retrieve the gene symbols for the cell line gene expression

tdc.multi_pred.drugsyn module#

class tdc.multi_pred.drugsyn.DrugSyn(name, path='./data', print_stats=False)[source]#

Bases: DataLoader

Data loader class to load datasets in Drug Synergy Prediction task. More info: https://tdcommons.ai/multi_pred_tasks/drugsyn/

Task Description: Regression. Given the gene expression of cell lines and two SMILES strings of the drug combos, predict the drug synergy level.

Parameters:
  • name (str) – the dataset name.

  • path (str, optional) – The path to save the data file, defaults to ‘./data’

  • print_stats (bool, optional) – Whether to print basic statistics of the dataset, defaults to False

tdc.multi_pred.dti module#

class tdc.multi_pred.dti.DTI(name, path='./data', label_name=None, print_stats=False)[source]#

Bases: DataLoader

Data loader class to load datasets in Drug-Target Interaction Prediction task. More info: https://tdcommons.ai/multi_pred_tasks/dti/

Task Description: Regression. Given the target amino acid sequence and the compound SMILES string, predict their binding affinity.

Parameters:
  • name (str) – the dataset name.

  • path (str, optional) – The path to save the data file, defaults to ‘./data’

  • label_name (str, optional) – For multi-label dataset, specify the label name, defaults to None

  • print_stats (bool, optional) – Whether to print basic statistics of the dataset, defaults to False
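A typical way to combine the constructor with get_split is sketched below. The wrapper load_dti_split is a hypothetical convenience function, and the import is done lazily inside it so that the snippet can be imported without PyTDC installed. The dataset name is left to the caller; "DAVIS" in the comment is an assumed example that does not appear in this reference.

```python
def load_dti_split(name, path="./data", method="random", seed=42):
    """Hypothetical wrapper: load a DTI dataset and return its
    train/valid/test split as a dict of DataFrames."""
    # Lazy import: PyTDC is only required when the function is called.
    from tdc.multi_pred import DTI

    # The data file is saved under `path`.
    data = DTI(name=name, path=path)
    return data.get_split(method=method, seed=seed, frac=[0.7, 0.1, 0.2])


# Example call (needs PyTDC and network access; dataset name is assumed):
#   split = load_dti_split("DAVIS")
#   train_df = split["train"]
```

The same pattern should apply to the other task loaders on this page, since they share the name, path, and print_stats constructor arguments.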

harmonize_affinities(mode=None)[source]#

Remove duplicated drug-target pairs that have different binding affinities.
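The accepted values of mode are not listed here, so the following is only a sketch of the general idea: collapse repeated drug-target pairs into one row by aggregating their affinities. The helper harmonize, its aggregate parameter, and the toy rows are hypothetical; using the mean (or the maximum) as the aggregate is an assumption, not a statement about what the library does.

```python
from collections import defaultdict
from statistics import mean


def harmonize(rows, aggregate=mean):
    """Hypothetical helper: merge duplicate (drug, target) pairs by
    aggregating their affinity values."""
    grouped = defaultdict(list)
    for drug, target, affinity in rows:
        grouped[(drug, target)].append(affinity)

    # One output row per unique pair, in first-seen order.
    return [
        (drug, target, aggregate(values))
        for (drug, target), values in grouped.items()
    ]


rows = [("d1", "t1", 4.0), ("d1", "t1", 6.0), ("d2", "t1", 7.5)]
by_mean = harmonize(rows)
by_max = harmonize(rows, aggregate=max)
```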

tdc.multi_pred.gda module#

class tdc.multi_pred.gda.GDA(name, path='./data', label_name=None, print_stats=False)[source]#

Bases: DataLoader

Data loader class to load datasets in Gene-Disease Association Prediction task. More info: https://tdcommons.ai/multi_pred_tasks/gdi/

Task Description: Regression. Given the disease description and the amino acid sequence of the gene, predict their association.

Parameters:
  • name (str) – the dataset name.

  • path (str, optional) – The path to save the data file, defaults to ‘./data’

  • label_name (str, optional) – For multi-label dataset, specify the label name, defaults to None

  • print_stats (bool, optional) – Whether to print basic statistics of the dataset, defaults to False

tdc.multi_pred.mti module#

class tdc.multi_pred.mti.MTI(name, path='./data', label_name=None, print_stats=False)[source]#

Bases: DataLoader

Data loader class to load datasets in MicroRNA-Target Interaction Prediction task. More info: https://tdcommons.ai/multi_pred_tasks/mti/

Task Description: Binary Classification. Given the miRNA mature sequence and target amino acid sequence, predict their likelihood of interaction.

Parameters:
  • name (str) – the dataset name.

  • path (str, optional) – The path to save the data file, defaults to ‘./data’

  • label_name (str, optional) – For multi-label dataset, specify the label name, defaults to None

  • print_stats (bool, optional) – Whether to print basic statistics of the dataset, defaults to False

tdc.multi_pred.peptidemhc module#

class tdc.multi_pred.peptidemhc.PeptideMHC(name, path='./data', label_name=None, print_stats=False)[source]#

Bases: DataLoader

Data loader class to load datasets in Peptide-MHC Binding Prediction task. More info: https://tdcommons.ai/multi_pred_tasks/peptidemhc/

Task Description: Regression. Given the amino acid sequence of peptide and the pseudo amino acid sequence of MHC, predict the binding affinity.

Parameters:
  • name (str) – the dataset name.

  • path (str, optional) – The path to save the data file, defaults to ‘./data’

  • label_name (str, optional) – For multi-label dataset, specify the label name, defaults to None

  • print_stats (bool, optional) – Whether to print basic statistics of the dataset, defaults to False

tdc.multi_pred.ppi module#

class tdc.multi_pred.ppi.PPI(name, path='./data', label_name=None, print_stats=False)[source]#

Bases: DataLoader

Data loader class to load datasets in Protein-Protein Interaction Prediction task. More info: https://tdcommons.ai/multi_pred_tasks/ppi/

Task Description: Binary Classification. Given the target amino acid sequence pairs, predict if they interact or not.

Parameters:
  • name (str) – the dataset name.

  • path (str, optional) – The path to save the data file, defaults to ‘./data’

  • label_name (str, optional) – For multi-label dataset, specify the label name, defaults to None

  • print_stats (bool, optional) – Whether to print basic statistics of the dataset, defaults to False

print_stats()[source]#

print the statistics of the dataset

tdc.multi_pred.test_multi_pred module#

class tdc.multi_pred.test_multi_pred.TestMultiPred(name, path='./data', label_name=None, print_stats=False)[source]#

Bases: DataLoader

Summary

entity1_name#

Description

Type:

str

entity2_name#

Description

Type:

str

two_types#

Description

Type:

bool