qxmt.experiment.experiment module

class qxmt.experiment.experiment.Experiment(name=None, desc=None, auto_gen_mode=False, root_experiment_dirc=PosixPath('/home/runner/work/qxmt/qxmt/experiments'), llm_model_path='microsoft/Phi-3-mini-128k-instruct', logger=<Logger qxmt.experiment.experiment (INFO)>)

Bases: object

Experiment class for managing the experiment and each run data. The Experiment class provides methods for initializing the experiment, running the experiment, saving the experiment data, and reproducing the model.

All experiment data is stored in the ExperimentDB instance. It is saved in a local directory as a JSON file (root_experiment_dirc/experiments/your_exp_name/experiment.json).

An experiment can be initialized and started from scratch by calling the init() method. Another way is to load existing experiment data from the JSON file (experiment.json) by calling the load() method.

The Experiment class can be used in two ways:

1. Provide config_source: This method accepts the path to a config file or a config instance. It is more flexible but requires a YAML-based config file. This method tracks the experiment settings and results, and can reproduce the model. Officially, we recommend the config file method.

2. Directly provide dataset and model instances: This method directly accepts dataset and model instances. It is easy to use but does “NOT” track the experiment settings. This method is useful for ad hoc experiments, quick testing, or debugging.

Examples

>>> import qxmt
>>> exp = qxmt.Experiment(
...     name="my_qsvm_algorithm",
...     desc="""This is an experiment for a new qsvm algorithm.
...     This experiment is applied and evaluated on multiple datasets.
...     """,
...     auto_gen_mode=True,
... ).init()
>>> config_path = "../configs/template.yaml"
>>> artifact, result = exp.run(
...     config_source=config_path)
>>> exp.runs_to_dataframe()
    run_id      accuracy        precision       recall  f1_score
0            1      0.45             0.53         0.66      0.59
Parameters:
  • name (str | None)

  • desc (str | None)

  • auto_gen_mode (bool)

  • root_experiment_dirc (str | Path)

  • llm_model_path (str)

  • logger (Logger)

__init__(name=None, desc=None, auto_gen_mode=False, root_experiment_dirc=PosixPath('/home/runner/work/qxmt/qxmt/experiments'), llm_model_path='microsoft/Phi-3-mini-128k-instruct', logger=<Logger qxmt.experiment.experiment (INFO)>)

Initialize the Experiment class. Set the experiment name, description, and other settings such as auto_gen_mode, root_experiment_dirc, and logger. auto_gen_mode controls whether to use the LLM-based DescriptionGenerator. To use it, set the environment variable “USE_LLM” to True. root_experiment_dirc is the root directory in which the experiment data is saved. Each artifact and result is stored in a subdirectory of the root directory.

Parameters:
  • name (Optional[str], optional) – experiment name. If None, generate by execution time. Defaults to None.

  • desc (Optional[str], optional) – description of the experiment. It is intended for searching and note-taking and is not used by the code. Defaults to None.

  • auto_gen_mode (bool, optional) – whether to use the DescriptionGenerator for generating the description of each run. Defaults to USE_LLM.

  • root_experiment_dirc (str | Path, optional) – root directory to save the experiment data. Defaults to DEFAULT_EXP_DIRC.

  • llm_model_path (str, optional) – path to the LLM model. Defaults to LLM_MODEL_PATH.

  • logger (Logger, optional) – logger instance for warning or error messages. Defaults to LOGGER.

Return type:

None

get_run_record(runs, run_id)

Get the run record of the target run_id.

Parameters:
  • run_id (int) – target run_id

  • runs (list[RunRecord])

Raises:

ValueError – if the run record does not exist

Returns:

target run record

Return type:

RunRecord
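
The lookup described above can be illustrated with a minimal sketch. The RunRecord class below is a hypothetical stand-in with only two fields (the real record carries more), and the function is an illustration of the documented behavior, not the library's implementation.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    """Hypothetical stand-in; the real RunRecord has more fields."""

    run_id: int
    desc: str = ""


def get_run_record(runs: list[RunRecord], run_id: int) -> RunRecord:
    """Return the record whose run_id matches, or raise ValueError."""
    for record in runs:
        if record.run_id == run_id:
            return record
    raise ValueError(f"Run record of run_id={run_id} does not exist.")


runs = [RunRecord(run_id=1, desc="baseline"), RunRecord(run_id=2, desc="tuned")]
found = get_run_record(runs, 2)
```

Asking for a run_id that is not in the list, such as get_run_record(runs, 99), raises ValueError, matching the Raises entry above.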

init()

Initialize the experiment directory and DB.

Returns:

initialized experiment

Return type:

Experiment

load(exp_dirc, exp_file_name=PosixPath('experiment.json'))

Load existing experiment data from a json file.

Parameters:
  • exp_dirc (str | Path) – path to the experiment directory

  • exp_file_name (str | Path)

Raises:
  • FileNotFoundError – if the experiment file does not exist

  • ExperimentSettingError – if the experiment directory does not exist

Returns:

loaded experiment

Return type:

Experiment
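
The two error conditions listed for load() can be sketched with the standard library alone. This is an assumption-laden illustration: the ExperimentSettingError class is a local stand-in for the library's exception, read_experiment_json is a hypothetical helper, and the JSON keys are invented for the demo.

```python
import json
import tempfile
from pathlib import Path


class ExperimentSettingError(Exception):
    """Local stand-in for the library's exception of the same name."""


def read_experiment_json(exp_dirc, exp_file_name="experiment.json"):
    """Hypothetical helper: validate the directory and file, then parse the JSON."""
    exp_dirc = Path(exp_dirc)
    if not exp_dirc.is_dir():
        raise ExperimentSettingError(f"{exp_dirc} does not exist.")
    exp_file = exp_dirc / exp_file_name
    if not exp_file.is_file():
        raise FileNotFoundError(f"{exp_file} does not exist.")
    with exp_file.open(encoding="utf-8") as f:
        return json.load(f)


# Demo in a throwaway directory; the keys "name" and "runs" are illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    exp_dirc = Path(tmp) / "my_exp"
    exp_dirc.mkdir()
    (exp_dirc / "experiment.json").write_text(
        json.dumps({"name": "my_exp", "runs": []}), encoding="utf-8"
    )
    data = read_experiment_json(exp_dirc)
```

Checking the directory before the file keeps the two failure modes distinct, which mirrors the separate exceptions documented above.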

reproduce(run_id, check_commit_id=False)

Reproduce the model of the target run_id from its config file. If the target run_id does not have a config file path, an error is raised. The reproduce method is not supported for runs executed from dataset and model instances.

Parameters:
  • run_id (int) – target run_id

  • check_commit_id (bool, optional) – whether to check the commit_id. Defaults to False.

Returns:

artifact and record of the reproduced run_id

Return type:

tuple[RunArtifact, RunRecord]

Raises:

ReproductionError – if the run_id does not have a config file path

run(task_type=None, dataset=None, model=None, config_source=None, default_metrics_name=None, custom_metrics=None, n_jobs=2, desc='', repo_path=None, add_results=True)

Start a new run for the experiment.

The run() method can be called in two ways:

1. Provide dataset and model instances: This method directly accepts dataset and model instances. It is easy to use but less flexible and does “NOT” track the experiment settings.

2. Provide config_source: This method accepts the path to a config file or a config instance. It is more flexible but requires a config file.

Parameters:
  • task_type (str, optional) – type of the task (classification or regression). Defaults to None.

  • dataset (Dataset) – the dataset object.

  • model (BaseMLModel) – the model object.

  • config_source (ExperimentConfig, str | Path, optional) – config source can be either an ExperimentConfig instance or the path to a config file. If a path is provided, it loads and creates an ExperimentConfig instance. Defaults to None.

  • default_metrics_name (list[str], optional) – list of default metrics names. Defaults to None.

  • custom_metrics (list[dict[str, Any]], optional) – list of user defined custom metric configurations. Defaults to None.

  • n_jobs (int, optional) – number of jobs for parallel processing. Defaults to DEFAULT_N_JOBS.

  • desc (str, optional) – description of the run. Defaults to “”.

  • repo_path (str, optional) – path to the git repository. Defaults to None.

  • add_results (bool, optional) – whether to add the run record to the experiment. Defaults to True.

Returns:

Returns a tuple containing the artifact and run record of the current run_id.

Return type:

tuple[RunArtifact, RunRecord]

Raises:

ExperimentNotInitializedError – Raised if the experiment is not initialized.
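
To make the two calling styles of run() concrete, here is a hedged sketch of how an entry point might decide between them. The function resolve_run_mode is hypothetical and not part of qxmt. This reference does not say which style wins when both are supplied, so the sketch rejects that case rather than guessing.

```python
from typing import Any, Optional


def resolve_run_mode(
    dataset: Optional[Any] = None,
    model: Optional[Any] = None,
    config_source: Optional[Any] = None,
) -> str:
    """Hypothetical helper: pick 'config' or 'instance' from the given arguments."""
    has_instances = dataset is not None and model is not None
    has_config = config_source is not None
    if has_config and has_instances:
        # Assumption of this sketch: mixing both styles is treated as an error.
        raise ValueError("Provide config_source or dataset/model, not both.")
    if has_config:
        return "config"
    if has_instances:
        return "instance"
    raise ValueError("Provide either config_source, or both dataset and model.")


mode_from_config = resolve_run_mode(config_source="../configs/template.yaml")
mode_from_instances = resolve_run_mode(dataset=object(), model=object())
```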

run_evaluation(task_type, actual, predicted, default_metrics_name, custom_metrics)

Run evaluation for the current run.

Parameters:
  • actual (np.ndarray) – array of actual values

  • predicted (np.ndarray) – array of predicted values

  • default_metrics_name (Optional[list[str]]) – list of default metric names

  • custom_metrics (Optional[list[dict[str, Any]]]) – list of user defined custom metric configurations

  • task_type (str)

Returns:

evaluation result

Return type:

dict
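
As an illustration of the kind of dictionary an evaluation step could return, the sketch below computes the four metrics that appear as columns in the Examples table (accuracy, precision, recall, f1_score) for binary labels. It is written in plain Python on sequences rather than np.ndarray, and evaluate_binary is a hypothetical name; it does not reproduce the library's metric handling or its custom_metrics format.

```python
from typing import Sequence


def evaluate_binary(actual: Sequence[int], predicted: Sequence[int]) -> dict[str, float]:
    """Hypothetical helper: binary-classification metrics with 1 as the positive label."""
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and of equal length.")
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    fp = sum(1 for a, p in pairs if a != 1 and p == 1)  # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p != 1)  # false negatives
    correct = sum(1 for a, p in pairs if a == p)
    # Fall back to 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1_score": f1,
    }


result = evaluate_binary(actual=[1, 0, 1, 1], predicted=[1, 1, 0, 1])
```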

runs_to_dataframe()

Convert the run data to a pandas DataFrame.

Returns:

DataFrame of run data

Return type:

pd.DataFrame

Raises:

ExperimentNotInitializedError – if the experiment is not initialized

save_experiment(exp_file=PosixPath('experiment.json'))

Save the experiment data to a json file.

Parameters:

exp_file (str | Path, optional) – name of the file to save the experiment data. Defaults to DEFAULT_EXP_DB_FILE.

Raises:

ExperimentNotInitializedError – if the experiment is not initialized

Return type:

None
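
Saving experiment data as a JSON file can be sketched with the standard library. This is a minimal illustration under stated assumptions: save_experiment_json is a hypothetical helper, the dictionary keys are invented for the demo, and the real method serializes the ExperimentDB instance rather than a plain dict.

```python
import json
import tempfile
from pathlib import Path


def save_experiment_json(exp_data: dict, exp_dirc, exp_file="experiment.json") -> Path:
    """Hypothetical helper: write exp_data to exp_dirc/exp_file and return the path."""
    exp_dirc = Path(exp_dirc)
    exp_dirc.mkdir(parents=True, exist_ok=True)  # create the directory if missing
    save_path = exp_dirc / exp_file
    with save_path.open("w", encoding="utf-8") as f:
        json.dump(exp_data, f, indent=4)
    return save_path


# Demo in a throwaway directory; the keys "name" and "runs" are illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    path = save_experiment_json(
        {"name": "my_exp", "runs": [{"run_id": 1}]}, Path(tmp) / "my_exp"
    )
    saved_name = path.name
    restored = json.loads(path.read_text(encoding="utf-8"))
```

Reading the file back right after writing it is a cheap way to confirm that the saved data round-trips.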