Plot Metric Growth per Labeled Instances
- ds_utils.metrics.learning_curves.plot_metric_growth_per_labeled_instances(X_train: numpy.ndarray, y_train: numpy.ndarray, X_test: numpy.ndarray, y_test: numpy.ndarray, classifiers_dict: Dict[str, sklearn.base.ClassifierMixin], n_samples: List[int] | None = None, quantiles: List[float] | None = [0.05, 0.1, …, 0.95, 1.0], metric: Callable[[numpy.ndarray, numpy.ndarray], float] = accuracy_score, random_state: int | numpy.random.RandomState | None = None, n_jobs: int | None = None, verbose: int = 0, pre_dispatch: int | str | None = '2*n_jobs', *, ax: matplotlib.axes.Axes | None = None, **kwargs) → Axes
Plot learning curves showing metric performance vs. training set size.
Receives train and test sets, fits each classifier on progressively larger portions of the training set, and plots how the given metric changes as the number of training instances grows.
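The idea can be illustrated without the library. The following is a minimal sketch, not the ds_utils implementation: `NearestNeighbor`, `accuracy`, and `metric_growth` are hypothetical names introduced here, and the sketch assumes that training rows are shuffled once and each training size takes a prefix of that shuffled order.

```python
import random


class NearestNeighbor:
    """Toy 1-nearest-neighbour classifier with sklearn-style fit/predict."""

    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        def closest(row):
            # Squared Euclidean distance to every stored training row.
            distances = [sum((a - b) ** 2 for a, b in zip(seen, row)) for seen in self.X_]
            return self.y_[distances.index(min(distances))]
        return [closest(row) for row in X]


def accuracy(y_true, y_pred):
    """Metric with the (y_true, y_pred) -> float shape the function expects."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def metric_growth(X_train, y_train, X_test, y_test, classifier, sizes, metric, seed=0):
    """Hypothetical helper: score the classifier after training on growing subsets."""
    order = list(range(len(X_train)))
    random.Random(seed).shuffle(order)  # assumption: shuffle once, then take prefixes
    scores = []
    for size in sizes:
        subset = order[:size]
        classifier.fit([X_train[i] for i in subset], [y_train[i] for i in subset])
        scores.append(metric(y_test, classifier.predict(X_test)))
    return scores


X_train = [[0.0], [0.2], [0.4], [0.6], [0.8], [1.0]]
y_train = [0, 0, 0, 1, 1, 1]
X_test = [[0.1], [0.9]]
y_test = [0, 1]

scores = metric_growth(X_train, y_train, X_test, y_test,
                       NearestNeighbor(), sizes=[1, 3, 6], metric=accuracy)
print(scores)
```

Plotting `scores` against `sizes` gives one learning curve. The documented function draws one such curve per entry in classifiers_dict.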
- Parameters:
X_train – array-like or sparse matrix of shape (n_samples, n_features). The training input samples.
y_train – array-like of shape (n_samples,). The target values (class labels) as integers or strings.
X_test – array-like or sparse matrix of shape (n_samples, n_features). The test or evaluation input samples.
y_test – array-like of shape (n_samples,). The true labels for X_test.
classifiers_dict – mapping from classifier name to a classifier object.
n_samples – List of training-set sizes (numbers of samples) to train on, optional (default=None).
quantiles – List of fractions of the training set to train on, optional (default=[0.05, 0.1, …, 0.95, 1]). Used when n_samples=None.
metric – sklearn.metrics API function which receives y_true and y_pred and returns a float (default=accuracy_score).
random_state – int, RandomState instance or None, optional (default=None). Controls the shuffling applied to the data before applying the split.
n_jobs – int or None, optional (default=None). The number of jobs to run in parallel.
verbose – int, optional (default=0). Controls the verbosity when fitting and predicting.
pre_dispatch – int or string, optional (default='2*n_jobs'). Controls the number of jobs that get dispatched during parallel execution.
ax – matplotlib Axes object, optional. The axes to plot on.
kwargs – additional keyword arguments to be passed to the plot function.
- Returns:
The Axes object with the plot drawn onto it.
- Raises:
ValueError – If both n_samples and quantiles are None.
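The interaction between n_samples and quantiles can be sketched as follows. This is an illustration only: `resolve_training_sizes` is a hypothetical helper, not part of ds_utils, and the rounding rule used to turn a fraction into a row count is an assumption that may differ from the library's.

```python
def resolve_training_sizes(n_train, n_samples=None, quantiles=None):
    """Hypothetical helper: decide which training-set sizes to evaluate.

    n_samples takes precedence; quantiles are used only when n_samples is None.
    """
    if n_samples is not None:
        return list(n_samples)
    if quantiles is None:
        # Mirrors the documented error condition: both arguments are None.
        raise ValueError("either n_samples or quantiles must be provided")
    # Assumption: each fraction is scaled by the training-set size and rounded.
    return [max(1, int(round(q * n_train))) for q in quantiles]


# Fractions of a 100-row training set.
sizes_from_quantiles = resolve_training_sizes(100, quantiles=[0.25, 0.5, 1.0])

# Explicit sizes win over quantiles when both are given.
sizes_from_counts = resolve_training_sizes(100, n_samples=[10, 20], quantiles=[0.5])

print(sizes_from_quantiles, sizes_from_counts)
```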
Code Example
In this example, we’ll divide the data into train and test sets, decide on which classifiers we want to measure, and plot the results:
from matplotlib import pyplot as plt
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from ds_utils.metrics.learning_curves import plot_metric_growth_per_labeled_instances
# Load and prepare the data
IRIS = datasets.load_iris()
features = IRIS.data
labels = IRIS.target
X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=.3, random_state=0)
# Define classifiers to compare
classifiers = {
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
    "RandomForestClassifier": RandomForestClassifier(random_state=0, n_estimators=5),
}
# Plot metric growth for different amounts of training data
plot_metric_growth_per_labeled_instances(X_train, y_train, X_test, y_test, classifiers)
plt.show()
And the following image will be shown: