The following Python snippets demonstrate how to use the core calibrators from our DistCal
library, showing a simplified end-to-end flow from a base model's predictions to calibrated outputs. For a more detailed walkthrough with actual dataset loading and comprehensive evaluation, please see the full demo notebook.
Installation
Clone the repository and create the conda environment:
git clone https://github.com/shachideshpande/DistCal.git
cd DistCal
conda env create -f env_nobuilds.yml
conda activate distcal
This makes the DistCal package and its dependencies available within the 'distcal' environment. If you do not install the package globally, make sure that torchuq (bundled as part of the DistCal repository) is on your PYTHONPATH.
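As an optional sanity check (a minimal sketch, assuming you run it from the repository root inside the activated 'distcal' environment), you can confirm that the package resolves correctly:
import torchuq
print(torchuq.__file__)  # should point inside the cloned DistCal repository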
1. Discrete Distribution Calibration (Classification)
This example shows how to train a simple base classifier (Logistic Regression on UCI Digits) and then use DiscreteDistCalibrator
to recalibrate its probabilistic outputs.
First, import the necessary libraries:
import torch
import numpy as np
from sklearn.linear_model import LogisticRegression
from torchuq.transform.distcal_discrete import DiscreteDistCalibrator
from torchuq.dataset.classification import get_classification_datasets
from torchuq.evaluate.distribution_cal import discrete_cal_score
1. Load UCI Digits dataset (train, calibration, and test splits):
dataset = get_classification_datasets('digits', val_fraction=0.2, test_fraction=0.2, split_seed=0, normalize=True, verbose=False)
# The three splits are (train, validation, test); the validation split serves as the calibration set.
X_train_d, y_train_d = dataset[0][:][0], dataset[0][:][1]
X_cal_d, y_cal_d = dataset[1][:][0], dataset[1][:][1]
X_test_d, y_test_d = dataset[2][:][0], dataset[2][:][1]
2. Train a simple base classification model (Logistic Regression):
base_model_d = LogisticRegression(max_iter=200, solver='lbfgs', random_state=0).fit(X_train_d, y_train_d)
3. Get uncalibrated probabilities from the base model for the calibration and test sets:
probs_cal_discrete = torch.tensor(base_model_d.predict_proba(X_cal_d), dtype=torch.float32)
probs_test_discrete = torch.tensor(base_model_d.predict_proba(X_test_d), dtype=torch.float32)
4. Initialize and train the DiscreteDistCalibrator on the calibration data:
discrete_calibrator = DiscreteDistCalibrator(verbose=False)
discrete_calibrator.train(probs_cal_discrete, y_cal_d.long())
5. Apply the trained calibrator to test probabilities and calculate calibration scores. You can print these scores to observe the improvement.
calibrated_probs_test = discrete_calibrator(probs_test_discrete)
score_before = discrete_cal_score(y_test_d, probs_test_discrete)
score_after = discrete_cal_score(y_test_d, calibrated_probs_test)
# To see the scores:
# print(f"Calibration Score Before: {score_before:.4f}")
# print(f"Calibration Score After (DistCal): {score_after:.4f}")
2. Continuous Distribution Calibration (Regression)
This example shows how to train a simple base regression model (Bayesian Ridge on California Housing), convert its output to quantiles, and then use DistCalibrator
to recalibrate these quantiles.
First, import the necessary libraries:
import torch
import numpy as np
from sklearn.linear_model import BayesianRidge
from torchuq.transform.distcal_continuous import DistCalibrator
from torchuq.transform.calibrate import convert_normal_to_quantiles
from torchuq.dataset.regression import get_regression_datasets
from torchuq.evaluate import quantile as q_eval
1. Load California Housing dataset (train, calibration, and test splits) and define the number of quantile buckets:
dataset_c = get_regression_datasets('cal_housing', val_fraction=0.2, test_fraction=0.2, split_seed=0, normalize=True, verbose=False)
# As before, the splits are (train, validation, test); the validation split serves as the calibration set.
X_train_c, y_train_c = dataset_c[0][:][0], dataset_c[0][:][1]
X_cal_c, y_cal_c = dataset_c[1][:][0], dataset_c[1][:][1]
X_test_c, y_test_c = dataset_c[2][:][0], dataset_c[2][:][1]
num_quantile_buckets = 20
2. Train a simple base regression model (Bayesian Ridge):
base_model_c = BayesianRidge().fit(X_train_c, y_train_c)
3. Get uncalibrated predictions (mean, std) and convert them to quantiles for both calibration and test sets:
mean_cal_c, std_cal_c = base_model_c.predict(X_cal_c.numpy(), return_std=True)
quantiles_cal_c = convert_normal_to_quantiles(
torch.tensor(mean_cal_c, dtype=torch.float32),
torch.tensor(std_cal_c, dtype=torch.float32).clamp(min=1e-3),
num_buckets=num_quantile_buckets
)
mean_test_c, std_test_c = base_model_c.predict(X_test_c.numpy(), return_std=True)
quantiles_test_c = convert_normal_to_quantiles(
torch.tensor(mean_test_c, dtype=torch.float32),
torch.tensor(std_test_c, dtype=torch.float32).clamp(min=1e-3),
num_buckets=num_quantile_buckets
)
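For intuition, converting a predictive normal N(mean, std) into quantiles amounts to evaluating its inverse CDF at a grid of probability levels, yielding one row of num_buckets quantiles per sample. The sketch below illustrates this idea under the assumption of evenly spaced levels; the library's convert_normal_to_quantiles may choose its levels differently.
def normal_to_quantiles_sketch(mean, std, num_buckets):
    # Evaluate the inverse CDF of N(mean, std) at evenly spaced levels
    # 1/(B+1), ..., B/(B+1), giving a (num_samples, num_buckets) tensor.
    levels = torch.arange(1, num_buckets + 1, dtype=torch.float32) / (num_buckets + 1)
    dist = torch.distributions.Normal(mean.unsqueeze(-1), std.unsqueeze(-1))
    return dist.icdf(levels)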
4. Initialize and train the continuous DistCalibrator on the calibration quantiles:
continuous_calibrator = DistCalibrator(num_buckets=num_quantile_buckets, quantile_input=True, verbose=False)
continuous_calibrator.train(quantiles_cal_c, y_cal_c.float(), num_epochs=10)
5. Apply the trained calibrator to test quantiles and calculate average check scores. You can print these scores to observe the improvement.
calibrated_quantiles_test = continuous_calibrator(quantiles_test_c)
quantiles_for_eval = torch.linspace(0.05, 0.95, 19) # Define quantile levels for evaluation
score_before = q_eval.check_score(quantiles_test_c, y_test_c.unsqueeze(-1), quantiles_for_eval).mean()
score_after = q_eval.check_score(calibrated_quantiles_test, y_test_c.unsqueeze(-1), quantiles_for_eval).mean()
# To see the scores:
# print(f"Avg. Check Score Before: {score_before:.4f}")
# print(f"Avg. Check Score After (DistCal): {score_after:.4f}")
As shown in our paper and the full demo notebook, applying these DistCal calibrators significantly improves standard calibration metrics (such as the discrete calibration score or the continuous check score), often without negatively impacting task-specific accuracy or error metrics.