Use the CLI#

You’ve written tasks and jobs by hand. The devals CLI can generate most of that configuration for you — this page shows how, and what you’ll want to customize afterward.

Scaffolding commands#

`devals init`#

Initializes a fresh project for evals:

cd ~/my-project
devals init

What it creates:

my-project/
├── devals.yaml                        # marker file
└── evals/
    ├── tasks/
    │   └── get_started/
    │       └── task.yaml              # starter task
    └── jobs/
        └── local_dev.yaml             # ready-to-run job

What to customize:

- The starter task uses `func: analyze_codebase`, which is fine for a smoke test, but you’ll want to change `func` to match your eval type (`question_answer`, `bug_fix`, `code_gen`, etc.).
- The job defaults to `google/gemini-2.5-flash`. Update `models:` to the provider(s) you want to test.
- `files` points at `../../` (your project root). Update it if your workspace lives elsewhere.

`devals create pipeline`#

An interactive walkthrough that creates a sample, task, and job in one go. Great for first-timers:

devals create pipeline

It prompts you for:

- A sample ID and prompt
- Which task function to use
- A job name and model selection

The result is a fully wired-up set of YAML files, ready for `devals run`.

`devals create task`#

Creates a new task directory with a starter task.yaml:

devals create task

Prompts for:

- Task ID (becomes the directory name under `tasks/`)
- Task function (selected from the Python registry)
- Optional system message

What to customize after:

- Add your samples; the generated file is only a skeleton
- Add `files` and `setup` if your task needs a workspace
- Add `metadata` with `tags` for filtering

`devals create sample`#

Adds a new sample interactively:

devals create sample

Prompts for:

- Sample ID (snake_case)
- Difficulty level
- Whether a workspace is needed

What to customize after:

- Write a specific `input` prompt; the generated placeholder is generic
- Write grading criteria in `target`
- Add `metadata.tags` for filtering
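
The sample ID prompt expects snake_case. If you generate or rename sample IDs in bulk, a small check like the one below can catch bad IDs early. This is a minimal sketch under an assumed definition of snake_case (lowercase letters and digits, single underscores between words, starting with a letter); the CLI’s own validation rules are not documented here and may differ. The `is_snake_case` helper is hypothetical and not part of devals.

```python
import re

# Assumed convention, not taken from the devals CLI:
# starts with a lowercase letter, then lowercase letters/digits,
# with words separated by single underscores.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def is_snake_case(sample_id: str) -> bool:
    """Return True if sample_id matches the assumed snake_case pattern."""
    return SNAKE_CASE.fullmatch(sample_id) is not None


if __name__ == "__main__":
    # Hypothetical sample IDs for illustration.
    for candidate in ["fix_null_check", "FixNullCheck", "fix-null-check"]:
        print(candidate, is_snake_case(candidate))
```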

`devals create job`#

Creates a new job YAML file:

devals create job

Prompts for:

- Job name
- Which models, variants, and tasks to include

What to customize after:

- Add or refine variants; the generated file may include only `baseline: {}`
- Add `task_filters` or `sample_filters` if you want to target a subset
- Configure `inspect_eval_arguments` for retry, timeout, and limit settings

Running evals#

Basic run#

devals run <job_name>

The CLI:

1. Reads `devals.yaml` to find the `evals/` directory.
2. Resolves your YAML config into a JSON manifest.
3. Passes the manifest to `run-evals` (the Python `dash_evals` runner).
4. `dash_evals` calls Inspect AI’s `eval_set()`.
5. Logs are written to `logs/`.

Dry run#

Preview the resolved configuration without making API calls:

devals run <job_name> --dry-run

This prints every task × model × variant combination that would execute. Use it to verify your setup before spending API credits.
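
To get a feel for how quickly the run count grows, the task × model × variant expansion can be approximated as a Cartesian product. The sketch below is illustrative only: the task and variant names are hypothetical (apart from `baseline` and the default model mentioned earlier), and it does not account for any filters a real job might apply.

```python
from itertools import product

# Hypothetical values for illustration; real names come from your job YAML.
tasks = ["get_started", "bug_fix_basics"]
models = ["google/gemini-2.5-flash"]
variants = ["baseline", "with_docs"]


def expand_runs(tasks, models, variants):
    """Return every (task, model, variant) combination in a stable order."""
    return list(product(tasks, models, variants))


runs = expand_runs(tasks, models, variants)
for task, model, variant in runs:
    print(f"{task} | {model} | {variant}")
print(f"{len(runs)} runs total")
```

Adding one more model to this example would double the count, which is why a dry run is worth the few seconds it takes.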

[!TIP] Always dry-run after editing YAML config. It catches typos, missing files, and bad task references before they cost you money.

Viewing results#

devals view

Launches the Inspect AI log viewer, a local web UI. The CLI automatically finds your `logs/` directory from `devals.yaml`.

To view logs from a specific location:

devals view /path/to/logs

What to look for in the viewer:

| Section | What it shows |
| --- | --- |
| Runs | Each task × model × variant combination |
| Transcript | The full conversation, including every tool call |
| Score | Pass/fail, model-graded scores, test results |
| Metadata | Timing, token usage, cost |

Troubleshooting#

`devals doctor`#

Checks all prerequisites:

devals doctor

It verifies:

- Dart SDK: required for the CLI itself
- Python 3.13+: required for `dash_evals`
- `dash_evals`: the Python evaluation package
- Podman/Docker: container runtime for sandboxed tasks
- Flutter SDK: needed for Flutter-based eval tasks
- API Keys: checks for configured provider keys

Fix any errors before running evals. Warnings (like a missing Flutter SDK) are safe to ignore if your evals don’t need that tool.
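
If you want a similar check inside your own scripts or CI, some of these prerequisites can be approximated with the Python standard library. This is a rough sketch, not how `devals doctor` is implemented: the `check_prerequisites` helper is hypothetical, it assumes the tools are discoverable on `PATH` under the executable names `dart`, `podman`, `docker`, and `flutter`, and it checks the version of the interpreter running the script. The `dash_evals` and API key checks are left out because how the CLI detects them is not described here.

```python
import shutil
import sys


def check_prerequisites() -> dict[str, bool]:
    """Approximate a few of the checks listed above (not the CLI's real logic)."""
    return {
        # Assumes the Dart SDK exposes a `dart` executable on PATH.
        "dart": shutil.which("dart") is not None,
        # Checks the interpreter running this script, not every Python installed.
        "python>=3.13": sys.version_info >= (3, 13),
        # Either container runtime is treated as sufficient.
        "container runtime": any(
            shutil.which(cmd) is not None for cmd in ("podman", "docker")
        ),
        # Only needed for Flutter-based eval tasks.
        "flutter": shutil.which("flutter") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'ok' if ok else 'missing':<8}{name}")
```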

Quick reference#

| Command | What it does |
| --- | --- |
| `devals init` | Initialize a new dataset in the current directory |
| `devals doctor` | Check prerequisites |
| `devals create pipeline` | Interactive walkthrough: sample → task → job |
| `devals create task` | Create a new task directory |
| `devals create sample` | Create a new sample |
| `devals create job` | Create a new job file |
| `devals run <job>` | Run an evaluation |
| `devals run <job> --dry-run` | Preview without executing |
| `devals view [path]` | Launch the Inspect AI log viewer |

Next steps#

You now know the full CLI workflow. Part 5 looks under the hood at the dash_evals Python package — useful if you ever want to write custom task logic.
