Runners

Core evaluation execution logic. The runner module provides two entry points:

JSON Runner

Reads an eval_set.json manifest (emitted by the Dart CLI) and calls eval_set().

Thin shim: read InspectEvalSet JSON, build Tasks, call eval_set().

The JSON file maps almost one-to-one onto eval_set() keyword arguments. The 'tasks' key contains the task definitions, each with an inline dataset (an InspectDataset holding InspectSample objects).
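The near one-to-one mapping described above can be pictured as follows. This is only a minimal sketch, not the module's actual implementation: the helper name `split_manifest` is hypothetical, and it assumes the manifest is a single JSON object whose 'tasks' key is handled separately while every other key is passed through unchanged.

```python
import json
from pathlib import Path


def split_manifest(manifest_path):
    """Split a manifest into task definitions and pass-through kwargs.

    Hypothetical helper for illustration only. It assumes the manifest is
    one JSON object in which 'tasks' needs extra processing and all other
    keys can be forwarded to eval_set() as keyword arguments.
    """
    manifest = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    # Task definitions carry inline datasets, so they are set aside for
    # conversion into Task objects (conversion not shown here).
    task_defs = manifest.get("tasks", [])
    # Everything else is treated as a keyword argument, untouched.
    kwargs = {key: value for key, value in manifest.items() if key != "tasks"}
    return task_defs, kwargs
```

Under these assumptions, the real runner would still have to turn each task definition into a Task before calling eval_set(); that step depends on the InspectEvalSet schema and is deliberately left out.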

dash_evals.runner.json_runner.run_from_json(manifest_path)

Load an InspectEvalSet JSON, build Tasks, and call eval_set().

Parameters: manifest_path (str | Path) – Path to eval_set.json emitted by the Dart CLI.
Return type: bool
Returns: True if any tasks failed, False if all succeeded.
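Because run_from_json() returns True on failure rather than on success, callers need to invert the usual reading when deriving a process exit code. The sketch below shows one way to do that. The wrapper names `exit_code` and `run_manifest` are hypothetical, and the optional `runner` argument exists only so the wrapper can be exercised without dash_evals installed.

```python
from pathlib import Path


def exit_code(any_failed: bool) -> int:
    """Map the runner's result to a shell exit code (True means failure)."""
    return 1 if any_failed else 0


def run_manifest(manifest_path, runner=None) -> int:
    """Run a manifest and return 0 on success, 1 if any task failed.

    Hypothetical wrapper: `runner` defaults to the documented
    run_from_json, imported lazily so this module loads without dash_evals.
    """
    if runner is None:
        from dash_evals.runner.json_runner import run_from_json as runner
    return exit_code(runner(Path(manifest_path)))
```

A script could then end with `sys.exit(run_manifest("eval_set.json"))`, where the file name is a placeholder for whatever path the Dart CLI wrote.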

Args Runner

Runs a single task directly from CLI arguments (--task, --model, --dataset).
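To make the three flags concrete, here is a rough argparse sketch of a command line that accepts them. It is not the args runner's real parser: only the flag names come from the description above, while the function name `build_parser`, the help strings, and the choice to mark every flag as required are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build an illustrative parser for --task, --model, and --dataset.

    Assumption: all three flags are required and take one string value.
    The actual runner may define defaults or further options.
    """
    parser = argparse.ArgumentParser(description="Run a single evaluation task.")
    parser.add_argument("--task", required=True, help="Task to run.")
    parser.add_argument("--model", required=True, help="Model to evaluate.")
    parser.add_argument("--dataset", required=True, help="Dataset to evaluate against.")
    return parser
```

Calling `build_parser().parse_args(["--task", "my_task", "--model", "my_model", "--dataset", "samples.json"])` yields a namespace with `task`, `model`, and `dataset` attributes; the values shown are placeholders.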