qxmt.datasets.file package

Submodules

Module contents

class qxmt.datasets.file.FileDataLoader(data_path, label_path=None, label_name=None)

Bases: object

This class loads data and labels from files in any of the supported formats. The loaded data and labels are returned as NumPy arrays (X and y).

The input data schema supports two patterns:
  1. Data and label are defined in separate files.
  2. Data and label are defined in a single file. In this case, the label name must be defined.

Supported file formats: .npy, .npz, .csv, .tsv

Examples

>>> loader = FileDataLoader(data_path="data.npy", label_path="label.npy")
>>> X, y = loader.load()
>>> loader = FileDataLoader(data_path="data.npz")
>>> X, y = loader.load()
>>> loader = FileDataLoader(data_path="data.csv", label_path="label.csv")
>>> X, y = loader.load()
>>> loader = FileDataLoader(data_path="data.csv", label_name="target")
>>> X, y = loader.load()
>>> loader = FileDataLoader(data_path="data.tsv", label_path="label.tsv")
>>> X, y = loader.load()
>>> loader = FileDataLoader(data_path="data.tsv", label_name="target")
>>> X, y = loader.load()
Parameters:
  • data_path (str | Path)

  • label_path (str | Path | None)

  • label_name (str | None)

__init__(data_path, label_path=None, label_name=None)

Initialize the FileDataLoader.

Parameters:
  • data_path (str | Path) – path to the data file.

  • label_path (Optional[str | Path], optional) – path to the label file. Defaults to None.

  • label_name (Optional[str], optional) – label name in the dataset. Defaults to None.

Return type:

None

load()

Load the data and label from the file path. The file format is determined by the extension of the file path.

Supported file formats:
  • numpy: .npy, .npz
  • pandas: .csv, .tsv
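The extension-based dispatch described above can be pictured with a minimal sketch. This is not the qxmt implementation: the function name detect_backend and the two constant sets are hypothetical, and whether the library treats suffixes case-sensitively is not stated in this page, so the sketch compares the suffix exactly as given.

```python
from pathlib import Path

# Hypothetical grouping for illustration only; it mirrors the list of
# supported formats above and is not taken from the qxmt source.
NUMPY_EXTENSIONS = {".npy", ".npz"}
PANDAS_EXTENSIONS = {".csv", ".tsv"}


def detect_backend(path):
    """Return the reader family ("numpy" or "pandas") for a file path."""
    suffix = Path(path).suffix  # e.g. ".npy" for "data.npy"
    if suffix in NUMPY_EXTENSIONS:
        return "numpy"
    if suffix in PANDAS_EXTENSIONS:
        return "pandas"
    # Assumption: mirrors the "unsupported file extension" ValueError.
    raise ValueError(f"unsupported file extension: {suffix!r}")
```

Under these assumptions, detect_backend("data.npz") maps to the numpy family, and a path such as "data.json" raises ValueError.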

Two input patterns exist:
  1. “data_path” and “label_path” are defined. “label_name” is not needed, because the label is loaded from the label file.
  2. “data_path” and “label_name” are defined. “label_path” is not needed, because the label data is included in the data file.
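To make the second pattern concrete, here is a minimal standard-library sketch of splitting a named label column out of a single delimited file. It only illustrates the idea; it is not how qxmt reads files (this page states that .csv and .tsv go through pandas). The helper split_features_and_label and the sample text are hypothetical, and the sketch returns plain lists rather than NumPy arrays.

```python
import csv
import io


def split_features_and_label(text, label_name, delimiter=","):
    """Split delimited text into feature rows (X) and a label column (y)."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    if reader.fieldnames is None or label_name not in reader.fieldnames:
        # Assumption: mirrors the "label name is not found" ValueError.
        raise ValueError(f"label name {label_name!r} is not found in the dataset")
    feature_names = [name for name in reader.fieldnames if name != label_name]
    X, y = [], []
    for row in reader:
        X.append([float(row[name]) for name in feature_names])
        y.append(row[label_name])  # labels are kept as strings in this sketch
    return X, y


# Hypothetical single-file dataset whose label column is named "target".
sample = "x1,x2,target\n0.1,0.2,0\n0.3,0.4,1\n"
X, y = split_features_and_label(sample, label_name="target")
```

For tab-separated input, the same helper can be called with delimiter="\t".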

Returns:

loaded data and label as numpy arrays.

Return type:

tuple[np.ndarray, np.ndarray]

Raises:
  • ValueError – Data and label file extensions do not match.

  • ValueError – Data or label key is not matched in the npz file.

  • ValueError – The label is already provided by the file given in “label_path”, so “label_name” does not need to be defined.

  • ValueError – Data and label are expected to be contained in the single file defined in “data_path”.

  • ValueError – Label name is not found in the dataset.

  • ValueError – Unsupported file extension.
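Because every failure listed above is reported as a ValueError, calling code can handle them uniformly. The sketch below shows one way to do that; try_load and FailingLoader are hypothetical names, and FailingLoader is only a stand-in object with a load() method so the example runs without qxmt installed.

```python
class FailingLoader:
    """Hypothetical stand-in for a loader whose load() raises ValueError."""

    def load(self):
        raise ValueError("unsupported file extension")


def try_load(loader):
    """Call loader.load() and return (X, y, error_message)."""
    try:
        X, y = loader.load()
    except ValueError as exc:
        # All documented failure modes surface as ValueError, so one
        # handler covers them; the message says which check failed.
        return None, None, str(exc)
    return X, y, None


X, y, error = try_load(FailingLoader())
```

Any object exposing load() with the documented return type, such as a FileDataLoader instance, could be passed to try_load in the same way.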