Runners
Core evaluation execution logic. The runner module provides two entry points:
JSON Runner
Reads an eval_set.json manifest (emitted by the Dart CLI) and calls eval_set().
It is a thin shim with three steps: read the InspectEvalSet JSON, build the Tasks, and call eval_set().
The JSON file maps roughly one-to-one onto the keyword arguments of eval_set(). The 'tasks' key contains the task definitions with inline datasets (an InspectDataset made up of InspectSample objects).
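The shim described above can be sketched as follows. This is an illustration under stated assumptions, not the module's actual code: the function names (split_manifest, run_from_json), the build_task callback, and the manifest keys other than 'tasks' are hypothetical. The eval_set function is passed in as a parameter so the sketch does not depend on any particular library signature.

```python
import json
from typing import Any, Callable


def split_manifest(manifest: dict[str, Any]) -> tuple[list[dict[str, Any]], dict[str, Any]]:
    """Separate the 'tasks' definitions from the remaining keyword arguments."""
    kwargs = dict(manifest)  # shallow copy so the caller's dict is not mutated
    task_defs = kwargs.pop("tasks", [])
    return task_defs, kwargs


def run_from_json(
    text: str,
    build_task: Callable[[dict[str, Any]], Any],  # hypothetical: turns one task definition into a Task
    eval_set_fn: Callable[..., Any],              # stand-in for eval_set()
) -> Any:
    """Read the manifest, build the tasks, and forward everything else as kwargs."""
    manifest = json.loads(text)
    task_defs, kwargs = split_manifest(manifest)
    tasks = [build_task(definition) for definition in task_defs]
    return eval_set_fn(tasks=tasks, **kwargs)
```

Because every key except 'tasks' is forwarded unchanged, a new eval_set() option would only need to appear in the manifest and would not require a change to the shim.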
Args Runner
Runs a single task directly from CLI arguments (--task, --model, --dataset).
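A minimal sketch of how the three flags might be parsed, using the standard argparse module. Only the flag names (--task, --model, --dataset) come from the description above. The function name parse_runner_args, the help strings, and the choice to make all three flags required are assumptions made for illustration.

```python
import argparse
from typing import Optional, Sequence


def parse_runner_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse the arguments for a single-task run (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Run a single task from CLI arguments.")
    # Marking the flags as required is an assumption; the real runner may supply defaults.
    parser.add_argument("--task", required=True, help="name of the task to run")
    parser.add_argument("--model", required=True, help="model identifier to evaluate")
    parser.add_argument("--dataset", required=True, help="dataset to run the task against")
    return parser.parse_args(argv)
```

Passing argv explicitly, for example parse_runner_args(["--task", "t", "--model", "m", "--dataset", "d"]), keeps the parser easy to exercise in tests without touching sys.argv.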