qxmt.datasets.raw_preprocess.sampling module

qxmt.datasets.raw_preprocess.sampling.sample_by_count(X, y, n_samples, random_seed)

Randomly sample a fixed number of rows from the dataset.

This function samples n_samples rows from X and the corresponding labels from y without replacement. Class balance is not considered; use sample_n_per_class when you need the same number of samples from each class.

Parameters:
  • X (np.ndarray) – Input feature array. The first dimension must match the length of y.

  • y (np.ndarray) – Label array corresponding to X.

  • n_samples (int) – Number of rows to sample from the whole dataset.

  • random_seed (int) – Random seed used for reproducible sampling.

Returns:

Tuple of sampled features and labels.

Return type:

RAW_DATASET_TYPE

Raises:

ValueError – If n_samples is larger than the number of rows in X.
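The behaviour described above can be sketched in plain NumPy. This is an illustrative stand-in, not the qxmt source: the helper name sketch_sample_by_count and the use of np.random.default_rng are assumptions, so the exact rows selected for a given seed may differ from what the library returns.

```python
import numpy as np


def sketch_sample_by_count(X, y, n_samples, random_seed):
    """Hypothetical stand-in that mirrors the documented behaviour."""
    if n_samples > len(X):
        raise ValueError("n_samples is larger than the number of rows in X")
    # Assumption: a seeded Generator; the library may seed differently.
    rng = np.random.default_rng(random_seed)
    # Draw distinct row indices (sampling without replacement).
    idx = rng.choice(len(X), size=n_samples, replace=False)
    # Index X and y with the same indices so rows and labels stay paired.
    return X[idx], y[idx]


# Toy data: 10 rows, 2 features, alternating binary labels.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

X_sub, y_sub = sketch_sample_by_count(X, y, n_samples=4, random_seed=42)
print(X_sub.shape, y_sub.shape)
```

Because class balance is ignored, the four sampled labels may all belong to one class; the per-class function is the better fit when that matters.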

qxmt.datasets.raw_preprocess.sampling.sample_n_per_class(X, y, n_samples, labels, random_seed)

Randomly sample a fixed number of rows from each specified class.

This function first shuffles the dataset with random_seed, then extracts n_samples rows for every label in labels. The total number of returned rows is therefore n_samples * len(labels). Labels in y are converted to int before comparison, so a label stored as 1.0 matches the integer label 1.

Parameters:
  • X (np.ndarray) – Input feature array. The first dimension must match the length of y.

  • y (np.ndarray) – Label array corresponding to X.

  • n_samples (int) – Number of rows to sample from each label.

  • labels (list[int]) – Labels to include in the sampled dataset.

  • random_seed (int) – Random seed used for reproducible sampling.

Returns:

Tuple of sampled features and labels.

Return type:

RAW_DATASET_TYPE

Raises:
  • ValueError – If any label in labels does not exist in y.

  • ValueError – If any requested label has fewer than n_samples rows.
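The shuffle-then-extract procedure described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the library's implementation: the helper name sketch_sample_n_per_class is hypothetical, the output is grouped by label in the order given in labels, and the original (unconverted) label values are returned. The source does not specify either of the last two points.

```python
import numpy as np


def sketch_sample_n_per_class(X, y, n_samples, labels, random_seed):
    """Hypothetical stand-in that mirrors the documented behaviour."""
    # Shuffle rows and labels together with a seeded permutation.
    rng = np.random.default_rng(random_seed)
    perm = rng.permutation(len(X))
    X_shuf, y_shuf = X[perm], y[perm]
    # Compare on int-converted labels, as the documentation states.
    y_int = y_shuf.astype(int)

    picked = []
    for label in labels:
        rows = np.flatnonzero(y_int == label)
        if rows.size == 0:
            raise ValueError(f"label {label} does not exist in y")
        if rows.size < n_samples:
            raise ValueError(f"label {label} has fewer than {n_samples} rows")
        # Take the first n_samples shuffled rows carrying this label.
        picked.append(rows[:n_samples])

    idx = np.concatenate(picked)
    return X_shuf[idx], y_shuf[idx]


# Toy data: 12 rows, three classes stored as floats, 4 rows per class.
X = np.arange(24).reshape(12, 2)
y = np.array([0.0, 1.0, 2.0] * 4)

X_sub, y_sub = sketch_sample_n_per_class(
    X, y, n_samples=3, labels=[0, 2], random_seed=0
)
print(X_sub.shape, y_sub.shape)
```

With two requested labels and n_samples=3, the result holds 3 * 2 = 6 rows, and class 1 is excluded entirely because it is not listed in labels.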

qxmt.datasets.raw_preprocess.sampling.sampling_by_each_class(X, y, n_samples, labels, random_seed)

Deprecated alias for sample_n_per_class. It takes the same arguments; new code should call sample_n_per_class directly.

Parameters:
  • X (ndarray)

  • y (ndarray)

  • n_samples (int)

  • labels (list[int])

  • random_seed (int)

Return type:

tuple[ndarray, ndarray]

qxmt.datasets.raw_preprocess.sampling.sampling_by_num(X, y, n_samples, random_seed)

Deprecated alias for sample_by_count. It takes the same arguments; new code should call sample_by_count directly.

Parameters:
  • X (ndarray)

  • y (ndarray)

  • n_samples (int)

  • random_seed (int)

Return type:

tuple[ndarray, ndarray]