qxmt.datasets.raw_preprocess.sampling module
- qxmt.datasets.raw_preprocess.sampling.sample_by_count(X, y, n_samples, random_seed)
Randomly sample a fixed number of rows from the dataset.
This function samples n_samples rows from X and the corresponding labels from y without replacement. Class balance is not considered; use sample_n_per_class when you need the same number of samples from each class.
- Parameters:
X (np.ndarray) – Input feature array. The first dimension must match the length of y.
y (np.ndarray) – Label array corresponding to X.
n_samples (int) – Number of rows to sample from the whole dataset.
random_seed (int) – Random seed used for reproducible sampling.
- Returns:
Tuple of sampled features and labels.
- Return type:
RAW_DATASET_TYPE
- Raises:
ValueError – If n_samples is larger than the number of rows in X.
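The behavior described for sample_by_count can be approximated in plain NumPy. The following is a minimal sketch and not the library's implementation: the helper name sample_by_count_sketch is hypothetical, and the use of np.random.default_rng is an assumption, so the rows selected for a given seed will likely differ from what the real function returns.

```python
import numpy as np


def sample_by_count_sketch(X, y, n_samples, random_seed):
    """Hypothetical stand-in that mirrors the documented behavior."""
    if n_samples > len(X):
        raise ValueError(
            f"n_samples ({n_samples}) exceeds the number of rows in X ({len(X)})."
        )
    # Assumption: a seeded Generator; the real function may draw indices differently.
    rng = np.random.default_rng(random_seed)
    # replace=False ensures that no row is selected twice.
    idx = rng.choice(len(X), size=n_samples, replace=False)
    # The same indices are applied to X and y so rows and labels stay aligned.
    return X[idx], y[idx]


# Toy data: row i is [2*i, 2*i + 1] and its label is i % 2.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)
X_s, y_s = sample_by_count_sketch(X, y, n_samples=4, random_seed=42)
```

Because the indices are drawn from the whole dataset, the class proportions in y_s are left to chance, which is the limitation the description above points out.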
- qxmt.datasets.raw_preprocess.sampling.sample_n_per_class(X, y, n_samples, labels, random_seed)
Randomly sample a fixed number of rows from each specified class.
This function first shuffles the dataset with random_seed, then extracts n_samples rows for every label in labels. The total number of returned rows is therefore n_samples * len(labels). Labels in y are converted to int before comparison.
- Parameters:
X (np.ndarray) – Input feature array. The first dimension must match the length of y.
y (np.ndarray) – Label array corresponding to X.
n_samples (int) – Number of rows to sample from each label.
labels (list[int]) – Labels to include in the sampled dataset.
random_seed (int) – Random seed used for reproducible sampling.
- Returns:
Tuple of sampled features and labels.
- Return type:
RAW_DATASET_TYPE
- Raises:
ValueError – If any label in labels does not exist in y.
ValueError – If any requested label has fewer than n_samples rows.
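The shuffle-then-extract procedure described for sample_n_per_class can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: the name sample_n_per_class_sketch is hypothetical, the shuffle uses a seeded NumPy permutation, the returned rows are grouped by label in the order given in labels, and the returned labels keep their original dtype. None of these details are confirmed by the documentation above.

```python
import numpy as np


def sample_n_per_class_sketch(X, y, n_samples, labels, random_seed):
    """Hypothetical stand-in that mirrors the documented behavior."""
    # Assumption: shuffle with a seeded permutation; the real shuffle may differ.
    perm = np.random.default_rng(random_seed).permutation(len(X))
    X_shuf, y_shuf = X[perm], y[perm]
    # Labels are cast to int only for the comparison, as documented.
    y_int = y_shuf.astype(int)

    picked = []
    for label in labels:
        rows = np.flatnonzero(y_int == label)
        if rows.size == 0:
            raise ValueError(f"Label {label} does not exist in y.")
        if rows.size < n_samples:
            raise ValueError(
                f"Label {label} has {rows.size} rows, fewer than n_samples={n_samples}."
            )
        # Take the first n_samples rows of this class in shuffled order.
        picked.append(rows[:n_samples])

    idx = np.concatenate(picked)
    return X_shuf[idx], y_shuf[idx]


# Toy data: three classes (0, 1, 2) with four rows each, stored as floats.
X = np.arange(24).reshape(12, 2)
y = np.array([0, 1, 2] * 4, dtype=float)
X_s, y_s = sample_n_per_class_sketch(X, y, n_samples=3, labels=[0, 2], random_seed=7)
```

With two requested labels and n_samples=3, the result has n_samples * len(labels) = 6 rows, and class 1 is excluded because it is not listed in labels.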
- qxmt.datasets.raw_preprocess.sampling.sampling_by_each_class(X, y, n_samples, labels, random_seed)
Deprecated alias for sample_n_per_class.
- Parameters:
X (ndarray)
y (ndarray)
n_samples (int)
labels (list[int])
random_seed (int)
- Return type:
tuple[ndarray, ndarray]
- qxmt.datasets.raw_preprocess.sampling.sampling_by_num(X, y, n_samples, random_seed)
Deprecated alias for sample_by_count.
- Parameters:
X (ndarray)
y (ndarray)
n_samples (int)
random_seed (int)
- Return type:
tuple[ndarray, ndarray]
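Since sampling_by_num and sampling_by_each_class take the same arguments as the functions they alias, migrating should only require changing the function name. A deprecated alias of this kind is commonly written as a thin wrapper that warns and forwards its arguments. The sketch below shows that general pattern only: both function names are hypothetical stubs, and whether the real aliases emit a DeprecationWarning is an assumption not confirmed by this page.

```python
import warnings


def sample_by_count_stub(X, y, n_samples, random_seed):
    """Hypothetical placeholder for the new-name function (no real sampling)."""
    return X[:n_samples], y[:n_samples]


def sampling_by_num_stub(X, y, n_samples, random_seed):
    """Hypothetical deprecated alias that forwards to the new name."""
    # Assumption: the alias warns on use; the real library may behave differently.
    warnings.warn(
        "sampling_by_num is deprecated; use sample_by_count instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return sample_by_count_stub(X, y, n_samples, random_seed)


# Record warnings so the deprecation notice can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    X_s, y_s = sampling_by_num_stub([1, 2, 3], [0, 1, 0], n_samples=2, random_seed=0)
```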