Generate Error Analysis Report

The generate_error_analysis_report function provides a tabular error-analysis report that groups predictions by feature values and computes error metrics per group. It’s particularly useful for identifying specific feature ranges or categories where the model underperforms.

ds_utils.metrics.error_analysis.generate_error_analysis_report(X: DataFrame, y_true: ndarray, y_pred: ndarray, feature_columns: List[str] | None = None, bins: int = 10, threshold: float = 0.5, min_count: int = 1, sort_metric: str = 'error_rate', ascending: bool = False) → DataFrame[source]

Generate a tabular error-analysis report grouped by feature values.

The report groups predictions by feature values and computes error metrics per group. For numerical features, values are binned into equal-width bins. For categorical features, raw values are used as groups.

Parameters:

X – Feature DataFrame.
y_true – True labels.
y_pred – Predicted labels.
feature_columns – List of columns to analyze. If None, all columns in X are used.
bins – Number of bins for numerical features.
threshold – Threshold for probability-based error definitions. Validated but not used in the current implementation; reserved for future probability-based error definitions.
min_count – Minimum number of samples in a group to be included in the report.
sort_metric – Metric to sort the report by. Valid options are: ‘feature’, ‘group’, ‘count’, ‘error_count’, ‘error_rate’, ‘accuracy’.
ascending – Whether to sort in ascending order.

Returns:

DataFrame containing the error analysis report.

Raises:

ValueError – If bins < 1, threshold not in [0, 1], min_count < 1, or invalid sort_metric.
KeyError – If any column in feature_columns is missing from X.

Code Example

import pandas as pd
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from ds_utils.metrics.error_analysis import generate_error_analysis_report

# Load dataset and split
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
X["size_category"] = pd.cut(
    X["mean radius"], bins=3, labels=["small", "medium", "large"]
).astype(str)
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train a classifier
clf = DecisionTreeClassifier(random_state=42, max_depth=3)
clf.fit(X_train[["mean radius", "mean texture"]], y_train)

y_pred = clf.predict(X_test[["mean radius", "mean texture"]])

# Generate error analysis report for numerical and categorical features
report = generate_error_analysis_report(
    X_test, y_test, y_pred,
    feature_columns=["mean radius", "mean texture", "size_category"],
    bins=3,
    sort_metric="error_rate",
    ascending=False
)
print(report)

The output will be a pandas DataFrame similar to this:

feature	group	count	error_count	error_rate	accuracy
mean radius	(16.71, 24.933]	30	3	0.100000	0.900000
size_category	large	15	1	0.066667	0.933333
mean texture	(25.32, 33.81]	17	1	0.058824	0.941176
mean texture	(16.83, 25.32]	78	4	0.051282	0.948718
mean texture	(8.315, 16.83]	48	2	0.041667	0.958333
size_category	medium	25	1	0.040000	0.960000
mean radius	(8.471, 16.71]	113	4	0.035398	0.964602

(Note: The size_category rows use raw string values as groups, while numerical features are binned. Rows with equal error_rate may appear in any order. Exact values and bins may vary based on data distribution.)