# devals_cli (`devals`)
A Dart CLI for managing evals: initialize datasets, create samples, run evaluations, and view results. It lives in `packages/devals_cli/`.
For setup instructions, see the Quick Start or Contributing Guide.
## Commands
| Command | Description |
|---|---|
| `devals init` | Initialize a new dataset in the current directory (creates …) |
| `devals doctor` | Check that prerequisites are installed (Dart, Python, dash_evals, Podman, Flutter, Serverpod, API keys) |
| `devals create sample` | Interactively add a new sample to an existing task |
| `devals create task` | Interactively create a new task file in … |
| `devals create job` | Interactively create a new job file |
| `devals create pipeline` | Guided flow to create a task and job together |
| `devals run` | Resolve config and run evaluations via the Python dash_evals runner |
| `devals publish` | Upload Inspect AI log files to Google Cloud Storage |
| `devals view` | Launch the Inspect AI viewer to browse evaluation results |
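As a rough illustration of the kind of check `devals doctor` performs, the sketch below reports which executables are missing from `PATH`. It is hypothetical and not how `devals doctor` is implemented: the function name `check_tools` and the executable names (`python3`, `podman`, `flutter`, `serverpod`) are assumptions. It does not check API keys, and it only looks for dash_evals through its `run-evals` entry point.

```shell
# Hypothetical pre-flight check, loosely modeled on `devals doctor`.
# Not the real implementation; the executable names are assumptions.

# Print each named executable that is not on PATH.
# Returns 0 only if every executable was found.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Example: check_tools dart python3 podman flutter serverpod run-evals
```

Because it returns non-zero when anything is missing, it can gate a script with `check_tools ... || exit 1`.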
## Usage

```shell
# Scaffold a new dataset
devals init

# Check your environment
devals doctor

# Create a new eval (task + job in one step)
devals create pipeline

# Run evaluations
devals run local_dev

# Preview without executing
devals run local_dev --dry-run

# Upload logs to GCS
devals publish logs/2026-01-07_17-11-47/

# View results
devals view
```
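The commands above can be chained into one script, for example in CI. This is a hypothetical sketch: the function name `eval_and_publish` is illustrative, and it assumes that `devals doctor` and `devals run` exit non-zero on failure and that each run writes a timestamped directory under `logs/`, like the one in the `publish` example.

```shell
# Hypothetical end-to-end wrapper around the devals commands shown above.
# Assumes non-zero exit codes on failure and timestamped dirs under logs/.
eval_and_publish() {
  local job="$1" latest
  devals doctor || return 1
  # Resolve and validate the config first without calling the Python runner.
  devals run "$job" --dry-run || return 1
  devals run "$job" || return 1
  # Names like 2026-01-07_17-11-47 sort lexicographically in time order,
  # so the last entry is the most recent run.
  latest=$(ls -d logs/*/ 2>/dev/null | sort | tail -n 1)
  if [ -z "$latest" ]; then
    echo "no log directory found under logs/" >&2
    return 1
  fi
  devals publish "$latest"
}

# Example: eval_and_publish local_dev
```

Running `devals doctor` and the dry run first means a broken environment or job config fails fast, before any evaluation time is spent.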
## How `devals run` Works

1. The CLI resolves the job YAML into `EvalSet` objects using the `dataset_config_dart` package (entirely in Dart).
2. `EvalSetWriter` writes the resolved config to a JSON file.
3. The CLI invokes `run-evals --manifest <path>` to hand off to the Python dash_evals runner.

With `--dry-run`, the CLI resolves and validates the config without calling the Python runner.
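The final hand-off and the `--dry-run` branch can be pictured with a small shell sketch. It is hypothetical: the real logic is Dart code in `run_command.dart`, the function name `hand_off` is illustrative, a non-empty-file check stands in for real validation, and it assumes for illustration that a manifest exists in both modes. Only `run-evals --manifest <path>` comes from the steps above.

```shell
# Hypothetical sketch of the last step of `devals run`; not the real
# implementation.
#   $1: path to the JSON manifest written by EvalSetWriter
#   $2: "true" to mimic --dry-run (defaults to false)
hand_off() {
  local manifest="$1" dry_run="${2:-false}"
  # Stand-in for validation: the manifest must exist and be non-empty.
  if [ ! -s "$manifest" ]; then
    echo "manifest not found or empty: $manifest" >&2
    return 1
  fi
  if [ "$dry_run" = "true" ]; then
    # Dry run: stop before the Python runner is called.
    echo "dry run: would invoke run-evals --manifest $manifest"
    return 0
  fi
  # Hand off to the Python dash_evals runner.
  run-evals --manifest "$manifest"
}
```

Keeping the manifest as the only interface between the Dart and Python sides is what lets a dry run stop cleanly before the Python process is ever started.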
## Source Layout

```text
bin/
└── devals.dart                  # Entry point
lib/
├── devals.dart                  # Library barrel file
└── src/
    ├── runner.dart              # DevalRunner (CommandRunner)
    ├── cli_exception.dart       # CLI-specific exceptions
    ├── commands/                # Command implementations
    │   ├── init_command.dart
    │   ├── doctor_command.dart
    │   ├── create_command.dart
    │   ├── create_sample_command.dart
    │   ├── create_task_command.dart
    │   ├── create_job_command.dart
    │   ├── create_pipeline_command.dart
    │   ├── run_command.dart
    │   ├── publish_command.dart
    │   └── view_command.dart
    ├── config/                  # Environment and .env helpers
    ├── dataset/                 # Dataset reading, writing, templates
    └── gcs/                     # Google Cloud Storage client
```
## Testing

```shell
cd packages/devals_cli
dart test
```