Run Agents on ContextBench

ContextBench includes a unified runner for executing agents and collecting trajectories.

Runner entrypoint

Use the module entrypoint:

python -m contextbench.run --help

Common examples

# Run agentless on Verified
python -m contextbench.run --agent agentless --bench Verified

# Run MiniSWE on Pro, first 5 instances
python -m contextbench.run --agent miniswe --bench Pro --limit 5
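
To run several of these commands in sequence, a small wrapper script may help. The following is a minimal sketch, not part of ContextBench: the helper names build_command and run_all are hypothetical, and it uses only the --agent, --bench, and --limit flags shown above. It defaults to a dry run that prints each command without executing it.

```python
import subprocess
import sys
from typing import Iterable, List, Optional, Tuple


def build_command(agent: str, bench: str, limit: Optional[int] = None) -> List[str]:
    """Build the argv for one runner invocation (hypothetical helper)."""
    cmd = [sys.executable, "-m", "contextbench.run", "--agent", agent, "--bench", bench]
    if limit is not None:
        # --limit is only appended when a cap on instances is requested.
        cmd += ["--limit", str(limit)]
    return cmd


def run_all(jobs: Iterable[Tuple[str, str, Optional[int]]], dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it in order."""
    for agent, bench, limit in jobs:
        cmd = build_command(agent, bench, limit)
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the batch at the first failing run.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # The same two runs as the examples above; pass dry_run=False to execute.
    run_all([("agentless", "Verified", None), ("miniswe", "Pro", 5)])
```

Passing the arguments as a list avoids shell quoting issues, and sys.executable keeps the runner in the same Python environment as the script.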

Task lists

By default, the runner reads its task list from:

data/selected_500_instances.csv
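
Before starting a long run, it can be useful to inspect the task list. The sketch below assumes the file is a standard CSV with a header row; the column names are not documented here, so the code reports whatever columns it finds rather than relying on specific ones. The helper names load_task_list and summarize are hypothetical.

```python
import csv
from pathlib import Path
from typing import Dict, List


def load_task_list(path: str) -> List[Dict[str, str]]:
    """Read the CSV into a list of row dicts (assumes a header row)."""
    with Path(path).open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def summarize(rows: List[Dict[str, str]]) -> str:
    """Return a one-line summary: row count and the columns found."""
    columns = list(rows[0].keys()) if rows else []
    return f"{len(rows)} rows, columns: {columns}"


if __name__ == "__main__":
    # Default task list path from this page; run from the repository root.
    print(summarize(load_task_list("data/selected_500_instances.csv")))
```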

See also

The Markdown guide: docs/run_agent_on_contextbench.md