tdc.base_dataset
This file contains a base data loader object that specific data loaders can inherit from.
- class tdc.base_dataset.DataLoader
Bases: object
base data loader class that contains functions shared by almost all data loader classes.
- balanced(oversample=False, seed=42)
balance the ratio of negative to positive labels
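- Parameters:
oversample (bool, optional) – whether to oversample the minority class instead of subsampling the majority class
seed (int, optional) – random seed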
- Returns:
the updated dataframe with balanced dataset
- Return type:
pd.DataFrame
- Raises:
AttributeError – the labels are continuous; binarize the data first, because continuous values cannot be balanced
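To make the two balancing strategies concrete, here is a minimal standard-library sketch. It is not the TDC implementation: the function name `balance_labels` and the list-of-pairs input are hypothetical, chosen only to show subsampling the majority class versus oversampling the minority class with a fixed seed.

```python
import random


def balance_labels(rows, oversample=False, seed=42):
    """Return a class-balanced copy of `rows`, a list of (features, label) pairs.

    Hypothetical helper for illustration only; labels must already be 0 or 1.
    """
    labels = {label for _, label in rows}
    if not labels <= {0, 1}:
        # Mirrors the documented behaviour: continuous labels must be binarized first.
        raise AttributeError("binarize the labels before balancing")

    positives = [row for row in rows if row[1] == 1]
    negatives = [row for row in rows if row[1] == 0]
    minority, majority = sorted([positives, negatives], key=len)

    rng = random.Random(seed)
    if oversample:
        # Draw from the minority class with replacement until it matches the majority.
        minority = [rng.choice(minority) for _ in range(len(majority))]
    else:
        # Draw from the majority class without replacement, down to the minority size.
        majority = rng.sample(majority, len(minority))
    return minority + majority
```

With 3 positives and 9 negatives, the default call keeps 3 rows of each class, while `oversample=True` returns 9 rows of each class, with positives repeated.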
- binarize(threshold=None, order='descending')
binarize the labels
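- Parameters:
threshold (float, optional) – the threshold used to binarize the labels
order (str, optional) – which side of the threshold becomes the positive label, 'ascending' or 'descending'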
- Returns:
data loader class with updated label
- Return type:
DataLoader
- Raises:
AttributeError – no threshold specified for binarization
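The thresholding step can be sketched as below. This is an illustration under stated assumptions, not the library's code: the helper name `binarize_labels` is hypothetical, and the convention that 'descending' marks values below the threshold as positive (and 'ascending' marks values above it) is an assumption that should be checked against the TDC source.

```python
def binarize_labels(values, threshold=None, order="descending"):
    """Map continuous labels to 0/1 around `threshold` (hypothetical helper)."""
    if threshold is None:
        # Mirrors the documented error when no threshold is given.
        raise AttributeError("specify a threshold for binarization")
    if order == "descending":
        # Assumed convention: smaller values form the positive class.
        return [1 if value < threshold else 0 for value in values]
    if order == "ascending":
        # Assumed convention: larger values form the positive class.
        return [1 if value > threshold else 0 for value in values]
    raise ValueError("order must be 'ascending' or 'descending'")
```

In this sketch a value exactly equal to the threshold is labeled 0 in both directions.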
- convert_from_log(form='standard')
convert labels from log-scale
- Parameters:
form (str, optional) – standard log-transformation or binding nM <-> p transformation.
- convert_to_log(form='standard')
convert labels to log-scale
- Parameters:
form (str, optional) – standard log-transformation or binding nM <-> p transformation.
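The two transformations named by form can be illustrated with plain arithmetic. The sketch below assumes the usual convention for binding data, where the p-scale value is the negative base-10 logarithm of the molar concentration (as in pIC50 or pKd), and it assumes a base-10 logarithm for the standard form. The function names are hypothetical, and any handling of zero-valued labels that the library may apply is not shown.

```python
import math


def nm_to_p(value_nm):
    """Convert a concentration in nM to the p scale: -log10(concentration in M)."""
    return -math.log10(value_nm * 1e-9)


def p_to_nm(p_value):
    """Invert nm_to_p: convert a p-scale value back to a concentration in nM."""
    return 10 ** (-p_value) / 1e-9


def to_log(value):
    """Standard log transformation (base 10 is an assumption of this sketch)."""
    return math.log10(value)


def from_log(log_value):
    """Invert to_log."""
    return 10 ** log_value
```

Under this convention 1 nM corresponds to a p value of 9 and 1000 nM (1 µM) to a p value of 6, so stronger binders have larger p values.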
- get_data(format='df')
- Parameters:
format (str, optional) – the dataset format
- Returns:
the dataset as a pd.DataFrame, dict, or np.array when format is df, dict, or DeepPurpose, respectively
- Return type:
pd.DataFrame/dict/np.array
- Raises:
AttributeError – format not supported
- get_label_meaning(output_format='dict')
get the biomedical meaning of label
- Parameters:
output_format (str, optional) – dict/df/array for label
- Returns:
the label meanings as a dict, pd.DataFrame, or np.array when output_format is dict, df, or array, respectively
- Return type:
dict/pd.DataFrame/np.array
- get_split(method='random', seed=42, frac=[0.7, 0.1, 0.2])
split function, overridden by single_pred/multi_pred/generation for more specific splits
- Parameters:
method (str, optional) – splitting scheme
seed (int, optional) – random seed
frac (list, optional) – train/val/test split fractions
- Returns:
a dictionary of train/valid/test dataframes
- Return type:
dict
- Raises:
AttributeError – split method not supported
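A seeded random split with the default fractions can be sketched as follows. This is a standalone illustration, not the TDC implementation: `random_split` is a hypothetical helper that works on any sequence rather than a dataframe, and only the random scheme is shown.

```python
import random


def random_split(items, seed=42, frac=(0.7, 0.1, 0.2)):
    """Shuffle `items` with a fixed seed and cut them into train/valid/test parts."""
    if abs(sum(frac) - 1.0) > 1e-9:
        raise ValueError("split fractions must sum to 1")

    shuffled = list(items)
    # A dedicated Random instance keeps the split reproducible for a given seed.
    random.Random(seed).shuffle(shuffled)

    n_train = round(len(shuffled) * frac[0])
    n_valid = round(len(shuffled) * frac[1])
    return {
        "train": shuffled[:n_train],
        "valid": shuffled[n_train:n_train + n_valid],
        # The test part takes whatever remains, so no item is dropped.
        "test": shuffled[n_train + n_valid:],
    }
```

Calling `random_split(range(10))` yields parts of 7, 1, and 2 items, and repeating the call with the same seed returns the same partition.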