tdc.evaluator#

class tdc.evaluator.Evaluator(name)[source]#

Bases: object

Evaluator to evaluate predictions.

Parameters:

name (str) – the name of the evaluator function

assign_evaluator()[source]#

obtain evaluator function given the evaluator name
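
The name-to-function dispatch that `assign_evaluator` performs can be sketched as a simple registry lookup. This is a minimal illustration of the pattern, not TDC's actual implementation; the registry contents and the `rmse` helper here are hypothetical.

```python
# Minimal sketch of a name -> metric-function registry, mirroring the
# Evaluator(name) / assign_evaluator() pattern. Illustrative only.

def rmse(y_true, y_pred):
    # root-mean-square error between two equal-length sequences
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5

class Evaluator:
    _registry = {"rmse": rmse}  # hypothetical name -> function table

    def __init__(self, name):
        self.name = name
        self.assign_evaluator()

    def assign_evaluator(self):
        # obtain evaluator function given the evaluator name
        self.evaluator_func = self._registry[self.name]

    def __call__(self, y_true, y_pred):
        return self.evaluator_func(y_true, y_pred)

print(Evaluator("rmse")([0.0, 0.0], [3.0, 4.0]))  # sqrt((9 + 16) / 2) ≈ 3.536
```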

tdc.evaluator.avg_auc(y_true, y_pred)[source]#
tdc.evaluator.centroid(X)[source]#

Centroid is the mean position of all the points in all of the coordinate directions, from a vectorset X (see https://en.wikipedia.org/wiki/Centroid):

C = sum(X) / len(X)

Parameters:

X (array) – (N,D) matrix, where N is points and D is dimension.

Returns:

C – centroid

Return type:

float
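
As a quick check of the definition above, a NumPy sketch (independent of the tdc implementation):

```python
import numpy as np

# Centroid of an (N, D) point set: the mean along the point axis.
X = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]])
C = X.sum(axis=0) / len(X)  # same as X.mean(axis=0)
print(C)                    # [1. 1.]
```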

tdc.evaluator.kabsch(P, Q)[source]#

Using the Kabsch algorithm with two sets of paired points P and Q, centered around the centroid. Each vector set is represented as an NxD matrix, where D is the dimension of the space. The algorithm works in three steps:

  • a centroid translation of P and Q (assumed done before this function call)

  • the computation of a covariance matrix C

  • computation of the optimal rotation matrix U

For more info see http://en.wikipedia.org/wiki/Kabsch_algorithm

Parameters:

  • P (array) – (N,D) matrix, where N is points and D is dimension.

  • Q (array) – (N,D) matrix, where N is points and D is dimension.

Returns:

U – Rotation matrix (D,D)

Return type:

matrix
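
The three steps can be sketched with NumPy's SVD, assuming the inputs are already centered as the docstring requires. This is a generic textbook Kabsch, not necessarily byte-for-byte the tdc code:

```python
import numpy as np

def kabsch(P, Q):
    """Optimal rotation U (D, D) aligning centered P onto centered Q."""
    C = P.T @ Q                         # covariance matrix (post-translation)
    V, S, Wt = np.linalg.svd(C)         # decompose
    d = np.sign(np.linalg.det(V @ Wt))  # fix improper rotation (reflection)
    D = np.diag([1.0] * (len(S) - 1) + [d])
    return V @ D @ Wt                   # optimal rotation matrix

# A 90-degree in-plane rotation is recovered exactly:
P = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]])
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Q = P @ R
U = kabsch(P, Q)
print(np.allclose(P @ U, Q))  # True
```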

tdc.evaluator.kabsch_rmsd(P, Q, W=None, translate=False)[source]#

Rotate matrix P onto Q using the Kabsch algorithm and calculate the RMSD. An optional vector of weights W may be provided.

Parameters:

  • P (array) – (N,D) matrix, where N is points and D is dimension.

  • Q (array) – (N,D) matrix, where N is points and D is dimension.

  • W (array, optional) – (N) vector, where N is points.

  • translate (bool) – Use centroids to translate vectors P and Q onto each other.

Returns:

rmsd – root-mean squared deviation

Return type:

float
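
The rotate-then-score pipeline described above can be sketched as follows; `translate=True` corresponds to centering both point sets first. A NumPy sketch of the unweighted case, not the tdc implementation:

```python
import numpy as np

def kabsch_rmsd(P, Q, translate=False):
    """RMSD between Q and P after optimally rotating P onto Q."""
    if translate:
        P = P - P.mean(axis=0)  # centroid translation of both sets
        Q = Q - Q.mean(axis=0)
    C = P.T @ Q
    V, S, Wt = np.linalg.svd(C)
    d = np.sign(np.linalg.det(V @ Wt))
    D = np.diag([1.0] * (len(S) - 1) + [d])
    P_rot = P @ (V @ D @ Wt)    # rotate P onto Q
    return np.sqrt(((P_rot - Q) ** 2).sum() / len(P))

# A translated-and-rotated copy has (near-)zero deviation once aligned:
P = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]])
R = np.array([[0.0, -1.0], [1.0, 0.0]])   # 90-degree rotation
Q = P @ R + np.array([5.0, -3.0])         # rotate, then shift
print(round(kabsch_rmsd(P, Q, translate=True), 10))  # 0.0
```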

tdc.evaluator.kabsch_rotate(P, Q)[source]#

Rotate matrix P onto matrix Q using the Kabsch algorithm.

Parameters:

  • P (array) – (N,D) matrix, where N is points and D is dimension.

  • Q (array) – (N,D) matrix, where N is points and D is dimension.

Returns:

P – (N,D) matrix, where N is points and D is dimension, rotated

Return type:

array

tdc.evaluator.kabsch_weighted(P, Q, W=None)[source]#

Using the Kabsch algorithm with two sets of paired points P and Q. Each vector set is represented as an NxD matrix, where D is the dimension of the space. An optional vector of weights W may be provided. Note that this algorithm does not require that P and Q have already been overlaid by a centroid translation. The function returns the rotation matrix U, translation vector V, and RMS deviation between Q and P’, where P’ is:

P’ = P * U + V

For more info see http://en.wikipedia.org/wiki/Kabsch_algorithm

Parameters:

  • P (array) – (N,D) matrix, where N is points and D is dimension.

  • Q (array) – (N,D) matrix, where N is points and D is dimension.

  • W (array, optional) – (N) vector, where N is points.

Returns:

  • U (matrix) – Rotation matrix (D,D)

  • V (vector) – Translation vector (D)

  • RMSD (float) – Root mean squared deviation between P and Q
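
The U, V, RMSD triple described above can be sketched in NumPy: weighted centroids give the translation, a weighted covariance feeds the SVD, and P’ = P @ U + V. A sketch of the standard weighted Kabsch, not guaranteed to match tdc's exact code:

```python
import numpy as np

def kabsch_weighted(P, Q, W=None):
    """Return rotation U (D,D), translation V (D), and weighted RMSD."""
    N, D = P.shape
    if W is None:
        W = np.ones(N)
    w = W / W.sum()                          # normalized weights
    cP = (w[:, None] * P).sum(axis=0)        # weighted centroids
    cQ = (w[:, None] * Q).sum(axis=0)
    Pc, Qc = P - cP, Q - cQ
    C = (w[:, None] * Pc).T @ Qc             # weighted covariance matrix
    Vd, S, Wt = np.linalg.svd(C)
    d = np.sign(np.linalg.det(Vd @ Wt))      # reflection fix
    U = Vd @ np.diag([1.0] * (D - 1) + [d]) @ Wt
    V = cQ - cP @ U                          # translation so P' = P @ U + V
    Pp = P @ U + V
    rmsd = np.sqrt((w[:, None] * (Pp - Q) ** 2).sum())
    return U, V, rmsd

# An exact rigid transform is recovered with (near-)zero deviation:
P = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
R = np.array([[0.0, -1.0], [1.0, 0.0]])
Q = P @ R + np.array([1.0, 2.0])
U, V, r = kabsch_weighted(P, Q)
print(round(r, 10))  # 0.0
```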

tdc.evaluator.kabsch_weighted_rmsd(P, Q, W=None)[source]#

Calculate the RMSD between P and Q with optional weights W.

Parameters:

  • P (array) – (N,D) matrix, where N is points and D is dimension.

  • Q (array) – (N,D) matrix, where N is points and D is dimension.

  • W (array, optional) – (N) vector, where N is points.

Returns:

RMSD

Return type:

float

tdc.evaluator.pcc(y_true, y_pred)[source]#
tdc.evaluator.precision_at_recall_k(y_true, y_pred, threshold=0.9)[source]#
tdc.evaluator.range_logAUC(true_y, predicted_score, FPR_range=(0.001, 0.1))[source]#

Author: Yunchao “Lance” Liu (lanceknight26@gmail.com)

Calculate logAUC in a certain FPR range (default range: [0.001, 0.1]). This was used by previous methods [1], and the reason is that only a small percentage of samples can be selected for experimental tests in consideration of cost. This means only molecules with very high predicted scores can be worth testing, i.e., the decision threshold is high, and the high decision threshold corresponds to the left side of the ROC curve, i.e., those FPRs with small values. Also, because the threshold cannot be predetermined, the area under the curve is used to consolidate all possible thresholds within a certain FPR range. Finally, the logarithm is used to bias towards smaller FPRs. The higher the logAUC[0.001, 0.1], the better the performance.

A perfect classifier gets a logAUC[0.001, 0.1] of 1, while a random classifier gets a logAUC[0.001, 0.1] of around 0.0215 (see [2]).

References:

[1] Mysinger, M.M. and B.K. Shoichet, Rapid Context-Dependent Ligand Desolvation in Molecular Docking. Journal of Chemical Information and Modeling, 2010. 50(9): p. 1561-1573.

[2] Liu, Yunchao, et al. “Interpretable Chirality-Aware Graph Neural Network for Quantitative Structure Activity Relationship Modeling in Drug Discovery.” bioRxiv (2022).

Parameters:

  • true_y – numpy array of the ground truth. Values are either 0 (inactive) or 1 (active).

  • predicted_score – numpy array of the predicted scores (the scores do not have to be between 0 and 1).

  • FPR_range – the range for calculating the logAUC, formatted as (x, y) with x being the lower bound and y being the upper bound.

Returns:

a numpy array of logAUC of size [1,1]
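
The idea above can be sketched in NumPy: build the ROC curve, interpolate TPR on a log-spaced FPR grid inside the range, and integrate TPR against log10(FPR), normalized by the width of the log range. A simplified sketch of the concept, not the tdc implementation (which may differ in interpolation and edge handling):

```python
import numpy as np

def range_logauc(true_y, score, fpr_range=(0.001, 0.1)):
    """logAUC: area under TPR vs. log10(FPR) over fpr_range, normalized."""
    low, high = fpr_range
    # ROC curve by sweeping the decision threshold from high score to low.
    order = np.argsort(-score)
    y = true_y[order]
    tpr = np.cumsum(y) / y.sum()
    fpr = np.cumsum(1 - y) / (1 - y).sum()
    # TPR at log-spaced FPR points inside the range.
    grid = np.logspace(np.log10(low), np.log10(high), 1000)
    tpr_grid = np.interp(grid, fpr, tpr)
    # Trapezoidal area under TPR vs log10(FPR), normalized by the log width.
    x = np.log10(grid)
    area = np.sum((tpr_grid[1:] + tpr_grid[:-1]) / 2 * np.diff(x))
    return area / (np.log10(high) - np.log10(low))

# A perfectly ranked dataset scores close to 1:
rng = np.random.default_rng(0)
y = np.array([1] * 100 + [0] * 10000)
s = np.concatenate([rng.uniform(0.9, 1.0, 100), rng.uniform(0.0, 0.5, 10000)])
print(round(range_logauc(y, s), 3))  # 1.0
```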

tdc.evaluator.recall_at_precision_k(y_true, y_pred, threshold=0.9)[source]#
tdc.evaluator.rmsd(V, W)[source]#

Calculate the root-mean-square deviation from two sets of vectors V and W.

Parameters:

  • V (array) – (N,D) matrix, where N is points and D is dimension.

  • W (array) – (N,D) matrix, where N is points and D is dimension.

Returns:

rmsd – Root-mean-square deviation between the two vectors

Return type:

float
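
The formula is the square root of the mean (over points) of the squared coordinate deviations. A NumPy sketch, independent of the tdc implementation:

```python
import numpy as np

def rmsd(V, W):
    """Root-mean-square deviation between two (N, D) point sets."""
    return np.sqrt(((V - W) ** 2).sum() / len(V))

V = np.array([[0.0, 0.0], [0.0, 0.0]])
W = np.array([[3.0, 4.0], [0.0, 0.0]])
print(rmsd(V, W))  # sqrt((9 + 16) / 2) ≈ 3.536
```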

tdc.evaluator.rmse(y_true, y_pred)[source]#