Runners

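Core evaluation execution logic. The runner module provides two entry points: the JSON runner, which reads an eval_set.json manifest and calls eval_set(), and the args runner, which runs a single task directly from CLI arguments.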

JSON Runner

Reads an eval_set.json manifest (emitted by the Dart CLI) and calls eval_set().

A thin shim: it reads the InspectEvalSet JSON, builds Tasks, and calls eval_set().

The JSON file maps roughly one-to-one onto eval_set() keyword arguments. The tasks key contains task definitions with inline datasets (InspectDataset objects holding InspectSample objects).
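The split between task definitions and the remaining eval_set() keyword arguments can be sketched in plain Python. This is a minimal illustration only: tasks is the one key documented above, while the log_dir and model keys, the task and sample field names, and the split_manifest helper are hypothetical and not taken from the real eval_set.json schema.

```python
import json

# Hypothetical manifest: only the "tasks" key is documented; "log_dir",
# "model", and the task/sample field names are illustrative assumptions.
MANIFEST_TEXT = """
{
  "tasks": [
    {
      "name": "example_task",
      "dataset": {
        "samples": [
          {"input": "What is 2 + 2?", "target": "4"}
        ]
      }
    }
  ],
  "log_dir": "logs/example",
  "model": "provider/model-name"
}
"""


def split_manifest(manifest: dict) -> tuple[list[dict], dict]:
    """Separate task definitions from the remaining eval_set() kwargs."""
    kwargs = dict(manifest)  # shallow copy so the caller's dict is untouched
    task_defs = kwargs.pop("tasks")
    return task_defs, kwargs


manifest = json.loads(MANIFEST_TEXT)
task_defs, eval_set_kwargs = split_manifest(manifest)
print(len(task_defs), sorted(eval_set_kwargs))  # 1 ['log_dir', 'model']
```

Because everything except tasks passes through unchanged, adding a new eval_set() option in such a scheme would only require the Dart CLI to emit one more top-level key.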

dash_evals.runner.json_runner.run_from_json(manifest_path)

Load an InspectEvalSet JSON, build Tasks, and call eval_set().

Parameters:

manifest_path (str | Path) – Path to eval_set.json emitted by the Dart CLI.

Return type:

bool

Returns:

True if any tasks failed, False if all succeeded.
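The shim's contract can be sketched with the eval_set() call injected as a parameter, so the example runs without inspect_ai installed. run_from_json_sketch and fake_eval_set are hypothetical names. The sketch assumes eval_set() returns a (success, logs) pair and that the runner reports failure by negating the success flag. It also passes task names through as stand-ins for real Task objects.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path
from typing import Any, Callable

# Stand-in type for eval_set(): assumed to return a (success, logs) pair.
EvalSetFn = Callable[..., tuple[bool, list[Any]]]


def run_from_json_sketch(manifest_path: str | Path, eval_set_fn: EvalSetFn) -> bool:
    """Load a manifest, call eval_set_fn, and return True if anything failed."""
    manifest = json.loads(Path(manifest_path).read_text())
    kwargs = dict(manifest)
    task_defs = kwargs.pop("tasks")
    # Hypothetical: the real runner builds inspect_ai Task objects here;
    # this sketch just passes each task's name through.
    tasks = [task_def["name"] for task_def in task_defs]
    success, _logs = eval_set_fn(tasks=tasks, **kwargs)
    # Invert: the documented return value is "True if any tasks failed".
    return not success


calls: list[dict] = []


def fake_eval_set(**kwargs: Any) -> tuple[bool, list[Any]]:
    """Record the call and report that every task succeeded."""
    calls.append(kwargs)
    return True, []


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "eval_set.json"
    path.write_text(json.dumps({"tasks": [{"name": "t1"}], "log_dir": tmp}))
    failed = run_from_json_sketch(path, fake_eval_set)

exit_code = 1 if failed else 0
print(failed, exit_code)  # False 0
```

Returning "failed" rather than "succeeded" lets a caller map the result straight onto a process exit code, where a non-zero value conventionally signals failure.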


Args Runner

Runs a single task directly from CLI arguments (--task, --model, --dataset).
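Parsing these three flags could look like the following argparse sketch. build_parser is a hypothetical name. Treating all three flags as required free-form strings is also an assumption; the real runner may validate or default them differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for the --task, --model, and --dataset flags."""
    parser = argparse.ArgumentParser(description="Run a single eval task.")
    # Assumption: every flag is required and takes a plain string value.
    parser.add_argument("--task", required=True, help="Task to run.")
    parser.add_argument("--model", required=True, help="Model to evaluate.")
    parser.add_argument("--dataset", required=True, help="Dataset to evaluate on.")
    return parser


args = build_parser().parse_args(
    ["--task", "example_task", "--model", "provider/model-name", "--dataset", "example_dataset"]
)
print(args.task, args.model, args.dataset)
```

With required=True, argparse prints a usage error and exits if any flag is missing, so the runner never starts a task with partial arguments.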