tdc.multi_pred
tdc.multi_pred.bi_pred_dataset module
- class tdc.multi_pred.bi_pred_dataset.DataLoader(name, path, label_name, print_stats, dataset_names)
Bases: DataLoader
A base data loader class that each bi-instance prediction task dataloader class can inherit from.
Attributes: TODO
- get_data(format='df')
Generate the data in the requested format, e.g., a pandas.DataFrame.
- Parameters:
format (str, optional) – format of the returned data (‘df’, ‘dict’, or ‘DeepPurpose’), defaults to ‘df’ (DataFrame)
- Returns:
a DataFrame of the dataset, or a dictionary of key information in the dataset
- Return type:
pandas DataFrame/dict
- Raises:
AttributeError – if format is not one of ‘df’, ‘dict’, or ‘DeepPurpose’
- get_split(method='random', seed=42, frac=[0.7, 0.1, 0.2], column_name=None, time_column=None)
Split the dataset into train/validation/test sets.
- Parameters:
method (str, optional) – split method, defaults to ‘random’
seed (int, optional) – random seed, defaults to 42
frac (list, optional) – train/valid/test split fractions, defaults to [0.7, 0.1, 0.2]
column_name (Optional[Union[str, List[str]]]) – optional column name(s) to split on for cold splits, defaults to None
time_column (str, optional) – name of the column holding the time variable for a time-based split, defaults to None
- Returns:
a dictionary with three keys (‘train’, ‘valid’, ‘test’), each mapping to a pandas DataFrame holding that split of the dataset
- Return type:
dict
- Raises:
AttributeError – if the requested split method is not supported
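The idea behind a cold split on column_name can be illustrated with a minimal, stdlib-only sketch. It is not TDC's implementation: the function cold_split, the list-of-dicts row layout, and the column names Drug_ID/Target_ID are hypothetical. The sketch assumes that a cold split partitions the unique values of the chosen column, so every entity (e.g. every drug) lands in exactly one of train/valid/test, and that frac is applied to entities rather than to rows.

```python
import random
from typing import Dict, List, Sequence


def cold_split(
    rows: List[dict],
    column_name: str,
    frac: Sequence[float] = (0.7, 0.1, 0.2),
    seed: int = 42,
) -> Dict[str, List[dict]]:
    """Hypothetical cold split over the unique values of `column_name`."""
    if len(frac) != 3 or abs(sum(frac) - 1.0) > 1e-9:
        raise ValueError("frac must hold three fractions that sum to 1")

    # Unique entities in first-seen order, shuffled reproducibly by seed.
    entities = list(dict.fromkeys(row[column_name] for row in rows))
    random.Random(seed).shuffle(entities)

    # Cut the *entity* list (not the row list) by the fractions.
    n_train = int(frac[0] * len(entities))
    n_valid = int(frac[1] * len(entities))
    assignment = {}
    for i, entity in enumerate(entities):
        if i < n_train:
            assignment[entity] = "train"
        elif i < n_train + n_valid:
            assignment[entity] = "valid"
        else:
            assignment[entity] = "test"

    # Every row follows its entity, so no entity is shared across splits.
    split: Dict[str, List[dict]] = {"train": [], "valid": [], "test": []}
    for row in rows:
        split[assignment[row[column_name]]].append(row)
    return split


if __name__ == "__main__":
    # Toy drug-target pairs; the column names are illustrative only.
    toy = [{"Drug_ID": f"D{i % 10}", "Target_ID": f"T{i % 7}", "Y": float(i)}
           for i in range(50)]
    parts = cold_split(toy, "Drug_ID")
    for name, part in parts.items():
        print(name, len(part), sorted({r["Drug_ID"] for r in part}))
```

Splitting on entities rather than rows is what makes the test set "cold": a model evaluated on it never saw those drugs during training.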
- neg_sample(frac=1)
Perform negative sampling.
- Parameters:
frac (int, optional) – ratio of negative to positive samples, defaults to 1
- Returns:
DataLoader, the instance itself (self).
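What a negative-to-positive ratio means can be sketched as follows, under a hypothetical simplification: the dataset holds only observed positive pairs, and negatives are drawn uniformly from entity pairs that never appear in it. negative_sample is an illustrative name, not part of TDC, and the library's neg_sample may choose pairs differently.

```python
import random
from typing import List, Set, Tuple


def negative_sample(
    positives: List[Tuple[str, str]],
    frac: int = 1,
    seed: int = 42,
) -> List[Tuple[str, str, int]]:
    """Hypothetical sketch: label observed pairs 1 and add
    frac * len(positives) unobserved pairs labeled 0."""
    rng = random.Random(seed)
    observed: Set[Tuple[str, str]] = set(positives)
    # Candidate entities on each side of the pair, sorted for reproducibility.
    left = sorted({a for a, _ in positives})
    right = sorted({b for _, b in positives})

    n_neg = frac * len(positives)
    n_possible = len(left) * len(right) - len(observed)
    if n_neg > n_possible:
        raise ValueError("not enough unobserved pairs for the requested ratio")

    # Rejection sampling: redraw any pair that is already a known positive.
    negatives: Set[Tuple[str, str]] = set()
    while len(negatives) < n_neg:
        pair = (rng.choice(left), rng.choice(right))
        if pair not in observed:
            negatives.add(pair)

    labeled: List[Tuple[str, str, int]] = [(a, b, 1) for a, b in positives]
    labeled += [(a, b, 0) for a, b in sorted(negatives)]
    return labeled


if __name__ == "__main__":
    print(negative_sample([("D1", "T1"), ("D2", "T2")], frac=1))
    # [('D1', 'T1', 1), ('D2', 'T2', 1), ('D1', 'T2', 0), ('D2', 'T1', 0)]
```

Treating unobserved pairs as negatives is an assumption: some of them may be true but unmeasured interactions.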
- to_graph(threshold=None, format='edge_list', split=True, frac=[0.7, 0.1, 0.2], seed=42, order='descending')
Convert the dataset into a graph representation.
- Parameters:
threshold (float, optional) – threshold used to binarize the labels
format (str, optional) – format of the output data, defaults to ‘edge_list’
split (bool, optional) – whether to split the data into train/valid/test sets, defaults to True
frac (list, optional) – train/valid/test split fractions, defaults to [0.7, 0.1, 0.2]
seed (int, optional) – random seed, defaults to 42
order (str, optional) – order used by the label transform when binarizing, defaults to ‘descending’
- Returns:
a dictionary of key information in the dataset
- Return type:
dict
- Raises:
AttributeError – if the threshold is not available
ImportError – if a required package is not installed
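How threshold and order could turn affinity labels into an edge list is sketched below. This is a hypothetical illustration, not TDC's to_graph: to_edge_list is an invented name, and the direction convention (with 'descending', labels below the threshold count as positive, as for a small Kd meaning strong binding; with 'ascending', labels above it) is an assumption.

```python
from typing import List, Optional, Tuple


def to_edge_list(
    records: List[Tuple[str, str, float]],
    threshold: Optional[float] = None,
    order: str = "descending",
) -> List[Tuple[str, str]]:
    """Hypothetical: binarize labels with `threshold`, keep positive pairs as edges."""
    if threshold is None:
        # Mirrors the documented AttributeError when no threshold is available.
        raise AttributeError("a threshold is required to binarize the labels")
    if order not in ("descending", "ascending"):
        raise ValueError("order must be 'descending' or 'ascending'")

    edges: List[Tuple[str, str]] = []
    for source, target, label in records:
        # Assumed convention: 'descending' treats values below the threshold
        # as positive; 'ascending' treats values above it as positive.
        if order == "descending":
            positive = label < threshold
        else:
            positive = label > threshold
        if positive:
            edges.append((source, target))
    return edges


if __name__ == "__main__":
    affinities = [("D1", "T1", 5.0), ("D1", "T2", 500.0), ("D2", "T1", 20.0)]
    print(to_edge_list(affinities, threshold=30.0))
    # [('D1', 'T1'), ('D2', 'T1')]
    print(to_edge_list(affinities, threshold=30.0, order="ascending"))
    # [('D1', 'T2')]
```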
tdc.multi_pred.multi_pred_dataset module
- class tdc.multi_pred.multi_pred_dataset.DataLoader(name, path, print_stats, dataset_names)
Bases: DataLoader
A base data loader class that each multi-instance prediction task dataloader class can inherit from.
Attributes: TODO
- get_data(format='df')
Generate the data in the requested format, e.g., a pandas.DataFrame.
- Parameters:
format (str, optional) – format of the returned data (‘df’, ‘dict’, or ‘DeepPurpose’), defaults to ‘df’ (DataFrame)
- Returns:
a DataFrame of the dataset, or a dictionary of key information in the dataset
- Return type:
pandas DataFrame/dict
- Raises:
AttributeError – if format is not one of ‘df’, ‘dict’, or ‘DeepPurpose’
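- get_split(method='random', seed=42, frac=[0.7, 0.1, 0.2], column_name=None)
Split the dataset into train/validation/test sets.
- Parameters:
method (str, optional) – split method, defaults to ‘random’
seed (int, optional) – random seed, defaults to 42
frac (list, optional) – train/valid/test split fractions, defaults to [0.7, 0.1, 0.2]
column_name (str or list of str, optional) – column name(s) to split on for cold splits, defaults to None
- Returns:
a dictionary with three keys (‘train’, ‘valid’, ‘test’), each mapping to a pandas DataFrame holding that split of the dataset
- Return type:
dict
- Raises:
AttributeError – if the requested split method is not supported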
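The 'random' method can be pictured as shuffling the rows with the given seed and cutting them by frac. The stdlib-only sketch below illustrates that assumption; random_split is a hypothetical name, and TDC's own splitter may round or order rows differently.

```python
import random
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def random_split(
    rows: List[T],
    seed: int = 42,
    frac: Sequence[float] = (0.7, 0.1, 0.2),
) -> Dict[str, List[T]]:
    """Hypothetical 'random' split: seeded shuffle, then cut by fractions."""
    if len(frac) != 3 or abs(sum(frac) - 1.0) > 1e-9:
        raise ValueError("frac must hold three fractions that sum to 1")

    # Shuffle indices (not the caller's list) with a private, seeded RNG.
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)

    n_train = int(frac[0] * len(rows))
    n_valid = int(frac[1] * len(rows))
    cuts = {
        "train": indices[:n_train],
        "valid": indices[n_train:n_train + n_valid],
        "test": indices[n_train + n_valid:],  # remainder absorbs rounding
    }
    return {name: [rows[i] for i in idx] for name, idx in cuts.items()}


if __name__ == "__main__":
    parts = random_split(list(range(100)))
    print({name: len(part) for name, part in parts.items()})
    # {'train': 70, 'valid': 10, 'test': 20}
```

Using a private random.Random(seed) instead of the global generator keeps the split reproducible without disturbing other code that relies on the random module.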
tdc.multi_pred.antibodyaff module
- class tdc.multi_pred.antibodyaff.AntibodyAff(name, path='./data', label_name=None, print_stats=False)
Bases: DataLoader
Data loader class to load datasets in the Antibody-antigen Affinity Prediction task. More info: https://tdcommons.ai/multi_pred_tasks/antibodyaff/
Task Description: Regression. Given the amino acid sequences of an antibody and an antigen, predict their binding affinity.
- Parameters:
name (str) – the dataset name
path (str, optional) – the directory in which to save the data file, defaults to ‘./data’
label_name (str, optional) – for a multi-label dataset, the name of the label to use, defaults to None
print_stats (bool, optional) – whether to print basic statistics of the dataset, defaults to False
tdc.multi_pred.catalyst module
- class tdc.multi_pred.catalyst.Catalyst(name, path='./data', label_name=None, print_stats=False)
Bases: DataLoader
Data loader class to load datasets in the Catalyst Prediction task. More info: https://tdcommons.ai/multi_pred_tasks/catalyst/
Task Description: Given the set of reactants and products X, predict the catalyst Y from a set of the most common catalysts.
- Parameters:
name (str) – the dataset name
path (str, optional) – the directory in which to save the data file, defaults to ‘./data’
label_name (str, optional) – for a multi-label dataset, the name of the label to use, defaults to None
print_stats (bool, optional) – whether to print basic statistics of the dataset, defaults to False
tdc.multi_pred.ddi module
- class tdc.multi_pred.ddi.DDI(name, path='./data', label_name=None, print_stats=False)
Bases: DataLoader
Data loader class to load datasets in the Drug-Drug Interaction Prediction task. More info: https://tdcommons.ai/multi_pred_tasks/ddi/
Task Description: Multi-class classification. Given the SMILES strings of two drugs, predict their interaction type.
- Parameters:
name (str) – the dataset name
path (str, optional) – the directory in which to save the data file, defaults to ‘./data’
label_name (str, optional) – for a multi-label dataset, the name of the label to use, defaults to None
print_stats (bool, optional) – whether to print basic statistics of the dataset, defaults to False
tdc.multi_pred.drugres module
- class tdc.multi_pred.drugres.DrugRes(name, path='./data', label_name=None, print_stats=False)
Bases: DataLoader
Data loader class to load datasets in the Drug Response Prediction task. More info: https://tdcommons.ai/multi_pred_tasks/drugres/
Task Description: Regression. Given the gene expression of a cell line and the SMILES string of a drug, predict the drug sensitivity level.
- Parameters:
name (str) – the dataset name
path (str, optional) – the directory in which to save the data file, defaults to ‘./data’
label_name (str, optional) – for a multi-label dataset, the name of the label to use, defaults to None
print_stats (bool, optional) – whether to print basic statistics of the dataset, defaults to False
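tdc.multi_pred.drugsyn module
- class tdc.multi_pred.drugsyn.DrugSyn(name, path='./data', print_stats=False)
Bases: DataLoader
Data loader class to load datasets in the Drug Synergy Prediction task. More info: https://tdcommons.ai/multi_pred_tasks/drugsyn/
Task Description: Regression. Given the gene expression of a cell line and the SMILES strings of the two drugs in a combination, predict the drug synergy level.
- Parameters:
name (str) – the dataset name
path (str, optional) – the directory in which to save the data file, defaults to ‘./data’
print_stats (bool, optional) – whether to print basic statistics of the dataset, defaults to False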
tdc.multi_pred.dti module
- class tdc.multi_pred.dti.DTI(name, path='./data', label_name=None, print_stats=False)
Bases: DataLoader
Data loader class to load datasets in the Drug-Target Interaction Prediction task. More info: https://tdcommons.ai/multi_pred_tasks/dti/
Task Description: Regression. Given the amino acid sequence of a target and the SMILES string of a compound, predict their binding affinity.
- Parameters:
name (str) – the dataset name
path (str, optional) – the directory in which to save the data file, defaults to ‘./data’
label_name (str, optional) – for a multi-label dataset, the name of the label to use, defaults to None
print_stats (bool, optional) – whether to print basic statistics of the dataset, defaults to False
tdc.multi_pred.gda module
- class tdc.multi_pred.gda.GDA(name, path='./data', label_name=None, print_stats=False)
Bases: DataLoader
Data loader class to load datasets in the Gene-Disease Association Prediction task. More info: https://tdcommons.ai/multi_pred_tasks/gda/
Task Description: Regression. Given a disease description and the amino acid sequence of a gene, predict their association.
- Parameters:
name (str) – the dataset name
path (str, optional) – the directory in which to save the data file, defaults to ‘./data’
label_name (str, optional) – for a multi-label dataset, the name of the label to use, defaults to None
print_stats (bool, optional) – whether to print basic statistics of the dataset, defaults to False
tdc.multi_pred.mti module
- class tdc.multi_pred.mti.MTI(name, path='./data', label_name=None, print_stats=False)
Bases: DataLoader
Data loader class to load datasets in the MicroRNA-Target Interaction Prediction task. More info: https://tdcommons.ai/multi_pred_tasks/mti/
Task Description: Binary classification. Given the mature miRNA sequence and the amino acid sequence of a target, predict the likelihood of their interaction.
- Parameters:
name (str) – the dataset name
path (str, optional) – the directory in which to save the data file, defaults to ‘./data’
label_name (str, optional) – for a multi-label dataset, the name of the label to use, defaults to None
print_stats (bool, optional) – whether to print basic statistics of the dataset, defaults to False
tdc.multi_pred.peptidemhc module
- class tdc.multi_pred.peptidemhc.PeptideMHC(name, path='./data', label_name=None, print_stats=False)
Bases: DataLoader
Data loader class to load datasets in the Peptide-MHC Binding Prediction task. More info: https://tdcommons.ai/multi_pred_tasks/peptidemhc/
Task Description: Regression. Given the amino acid sequence of a peptide and the pseudo amino acid sequence of an MHC molecule, predict their binding affinity.
- Parameters:
name (str) – the dataset name
path (str, optional) – the directory in which to save the data file, defaults to ‘./data’
label_name (str, optional) – for a multi-label dataset, the name of the label to use, defaults to None
print_stats (bool, optional) – whether to print basic statistics of the dataset, defaults to False
tdc.multi_pred.ppi module
- class tdc.multi_pred.ppi.PPI(name, path='./data', label_name=None, print_stats=False)
Bases: DataLoader
Data loader class to load datasets in the Protein-Protein Interaction Prediction task. More info: https://tdcommons.ai/multi_pred_tasks/ppi/
Task Description: Binary classification. Given a pair of protein amino acid sequences, predict whether they interact.
- Parameters:
name (str) – the dataset name
path (str, optional) – the directory in which to save the data file, defaults to ‘./data’
label_name (str, optional) – for a multi-label dataset, the name of the label to use, defaults to None
print_stats (bool, optional) – whether to print basic statistics of the dataset, defaults to False