Agent Trajectory Extractors
ContextBench includes trajectory extractors for different coding agents, exposed via a unified API.
Supported Agents
1. MiniSWE-agent
Format: .traj.json files
Location: contextbench/agents/minisweagent/extract.py
Features:
Extracts file views from bash commands in messages
Supports cat, sed -n, head, grep, and nl | sed commands
Parses patch_context_data.patch_context for final context
Returns model patch from info.submission
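To illustrate the kind of command parsing listed above, here is a minimal sketch that maps a few bash view commands to a file path and a line range. It is not the code in extract.py: the helper name `parse_view_command`, the handled command shapes, and the choice to return line numbers exactly as written in the command are all assumptions made for illustration.

```python
import re
import shlex

# Matches the sed print-range form, e.g. "10,20p".
SED_RANGE = re.compile(r"^(\d+),(\d+)p$")


def parse_view_command(command: str):
    """Hypothetical helper: return (path, (start, end)), (path, None), or None.

    A span of None means the whole file was shown and the range is unknown.
    """
    try:
        tokens = shlex.split(command)
    except ValueError:
        return None  # unbalanced quotes: not a command we can interpret
    if not tokens:
        return None

    # sed -n '10,20p' path
    if tokens[0] == "sed" and len(tokens) == 4 and tokens[1] == "-n":
        match = SED_RANGE.match(tokens[2])
        if match:
            return tokens[3], (int(match.group(1)), int(match.group(2)))

    # head -n 50 path
    if tokens[0] == "head" and len(tokens) == 4 and tokens[1] == "-n" and tokens[2].isdigit():
        return tokens[3], (1, int(tokens[2]))

    # cat path
    if tokens[0] == "cat" and len(tokens) == 2:
        return tokens[1], None

    return None


print(parse_view_command("sed -n '10,20p' src/utils.py"))
```

Pipelines such as nl | sed and grep output would need extra handling that this sketch leaves out.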
2. SWE-agent
Format: .checkpoints.jsonl files
Location: contextbench/agents/sweagent/extract.py
Features:
Extracts from str_replace_editor view commands with --view_range
Only includes steps with explicit line ranges
Parses patch_context string format (File:/Lines:)
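The explicit-range rule described above can be sketched as follows. This is an illustration under assumptions, not the extractor's actual code: it assumes the command has the shape `str_replace_editor view <path> --view_range <start> <end>`, and the helper name `parse_view_range` is hypothetical.

```python
import shlex


def parse_view_range(command: str):
    """Hypothetical helper: return (path, (start, end)) or None if the step is skipped."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return None
    if len(tokens) < 3 or tokens[0] != "str_replace_editor" or tokens[1] != "view":
        return None
    path = tokens[2]
    if path.startswith("--"):
        return None  # no path argument directly after "view"
    if "--view_range" not in tokens:
        return None  # no explicit line range: skip this step
    flag_index = tokens.index("--view_range")
    bounds = tokens[flag_index + 1 : flag_index + 3]
    # Only non-negative integer bounds are accepted in this sketch.
    if len(bounds) != 2 or not all(b.isdigit() for b in bounds):
        return None
    return path, (int(bounds[0]), int(bounds[1]))


print(parse_view_range("str_replace_editor view /repo/src/utils.py --view_range 10 40"))
```

A view command without --view_range returns None here, which mirrors the "only includes steps with explicit line ranges" behavior.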
3. Agentless
Format: Custom JSON format
Location: contextbench/agents/agentless/extract.py
Features:
Extracts localization and repair steps
Parses file access from retrieval outputs
Supports multi-stage reasoning
4. OpenHands
Format: Trajectory logs
Location: contextbench/agents/openhands/extract.py
Features:
Extracts file operations from action logs
Supports browsing and editing actions
Handles multi-file contexts
5. Prometheus
Format: Agent-specific format
Location: contextbench/agents/prometheus/extract.py
Features:
Extracts context from reasoning traces
Supports iterative refinement steps
Unified Interface
All agent extractors use a unified interface:
from contextbench.agents import extract_trajectory
# Automatically detects format based on file extension
result = extract_trajectory("path/to/trajectory.traj.json")
result = extract_trajectory("path/to/trajectory.checkpoints.jsonl")
The extractor returns a unified structure:
{
    "pred_steps": [
        {"files": [...], "spans": {...}},
        ...
    ],
    "pred_files": [...],
    "pred_spans": {...},
    "pred_patch": "...",  # Optional: model-generated patch
}
Return Structure
pred_steps
List of per-step context:
"pred_steps": [
    {
        "files": ["src/utils.py", "src/main.py"],
        "spans": {
            "src/utils.py": [(0, 100), (200, 300)],
            "src/main.py": [(0, 500)]
        }
    },
    ...
]
pred_files
Cumulative set of all viewed files:
"pred_files": ["src/utils.py", "src/main.py", "tests/test.py"]
pred_spans
Cumulative union of all viewed spans:
"pred_spans": {
    "src/utils.py": [(0, 100), (200, 300)],
    "src/main.py": [(0, 500)],
    "tests/test.py": [(0, 1000)]
}
pred_patch
Optional: The model-generated patch (if available):
"pred_patch": "diff --git a/src/utils.py b/src/utils.py\n..."
Adding a New Agent
To add support for a new agent:
1. Create extractor module
Create contextbench/agents/myagent/extract.py:
import json

# parse_step, compute_union_files, and compute_union_spans are placeholders
# for agent-specific helpers that you supply.

def extract_trajectory(traj_path: str) -> dict:
    """
    Extract trajectory from MyAgent format.

    Args:
        traj_path: Path to trajectory file

    Returns:
        dict with keys: pred_steps, pred_files, pred_spans
    """
    # Parse trajectory file
    with open(traj_path) as f:
        traj = json.load(f)
    # Extract per-step context
    steps = []
    for step in traj["steps"]:
        files, spans = parse_step(step)
        steps.append({"files": files, "spans": spans})
    # Compute cumulative context
    all_files = compute_union_files(steps)
    all_spans = compute_union_spans(steps)
    return {
        "pred_steps": steps,
        "pred_files": all_files,
        "pred_spans": all_spans,
    }
2. Register in dispatcher
Update contextbench/agents/__init__.py:
from contextbench.agents.myagent.extract import extract_trajectory as extract_myagent

def extract_trajectory(traj_path: str) -> dict:
    if traj_path.endswith(".myagent.json"):
        return extract_myagent(traj_path)
    # ... other formats
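As the number of formats grows, suffix checks like the one above can be kept in a single table. The sketch below is one possible layout, not the actual contents of contextbench/agents/__init__.py. The two built-in suffixes come from this document; the table name, the agent labels, and the `detect_agent` helper are assumptions.

```python
# Suffixes named in this document, plus the hypothetical MyAgent suffix.
SUFFIX_TO_AGENT = {
    ".traj.json": "minisweagent",
    ".checkpoints.jsonl": "sweagent",
    ".myagent.json": "myagent",
}


def detect_agent(traj_path: str) -> str:
    """Hypothetical helper: pick an agent label from the file suffix."""
    # Check longer suffixes first so the most specific match wins.
    for suffix in sorted(SUFFIX_TO_AGENT, key=len, reverse=True):
        if traj_path.endswith(suffix):
            return SUFFIX_TO_AGENT[suffix]
    raise ValueError(f"Unrecognized trajectory format: {traj_path}")


print(detect_agent("runs/instance.checkpoints.jsonl"))
```

Raising on an unknown suffix, rather than returning None, makes a misnamed trajectory file fail loudly instead of producing an empty result.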
3. Add tests
Create tests/test_myagent_extractor.py:
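from contextbench.agents.myagent.extract import extract_trajectory

def test_myagent_extraction():
    # Call the MyAgent extractor directly so the test does not depend on suffix dispatch
    result = extract_trajectory("test_data/myagent.json")
    assert "pred_files" in result
    assert "pred_spans" in result
    assert len(result["pred_steps"]) > 0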
Testing Extractors
Test an extractor on a single trajectory:
python -m contextbench.evaluate \
--gold data/full.parquet \
--pred traj_verified-mini/instance/instance.traj.json \
--out results.jsonl
Check the extracted context:
from contextbench.agents import extract_trajectory
result = extract_trajectory("path/to/trajectory.traj.json")
print(f"Files: {result['pred_files']}")
print(f"Steps: {len(result['pred_steps'])}")
print(f"Spans: {result['pred_spans']}")
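Beyond printing, a small consistency check can catch malformed extractor output early. The sketch below is not part of ContextBench; `validate_result` is a hypothetical helper. It assumes that every file appearing in a step or in pred_spans should also appear in pred_files, and that each span is a (start, end) pair with start <= end.

```python
def validate_result(result: dict) -> list:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    for key in ("pred_steps", "pred_files", "pred_spans"):
        if key not in result:
            problems.append(f"missing key: {key}")
    if problems:
        return problems

    known_files = set(result["pred_files"])
    # Every file seen in a step should be part of the cumulative file list.
    for index, step in enumerate(result["pred_steps"]):
        for path in step.get("files", []):
            if path not in known_files:
                problems.append(f"step {index}: {path} not in pred_files")
    # Every cumulative span should belong to a known file and be well ordered.
    for path, spans in result["pred_spans"].items():
        if path not in known_files:
            problems.append(f"pred_spans: {path} not in pred_files")
        for start, end in spans:
            if start > end:
                problems.append(f"pred_spans: {path} has reversed span ({start}, {end})")
    return problems


example = {
    "pred_steps": [{"files": ["src/utils.py"], "spans": {"src/utils.py": [(0, 100)]}}],
    "pred_files": ["src/utils.py"],
    "pred_spans": {"src/utils.py": [(0, 100)]},
}
print(validate_result(example))
```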
Common Issues
Missing line ranges
Some trajectories don’t include explicit line ranges. In this case:
Extract full file content as spans
Or skip steps without line information
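The two options above can be combined into one fallback, sketched below. The helper name `resolve_span` and the `line_counts` mapping (file path to number of lines) are hypothetical. The whole-file span uses the (0, N) shape of the examples in this document; the exact span convention is an assumption.

```python
from typing import Dict, Optional, Tuple

Span = Tuple[int, int]


def resolve_span(path: str, line_range: Optional[Span], line_counts: Dict[str, int]) -> Optional[Span]:
    """Hypothetical helper: choose a span for one file view, or None to skip it."""
    if line_range is not None:
        return line_range  # explicit range: use it unchanged
    total = line_counts.get(path)
    if total is None:
        return None  # file length unknown: skip the step
    return (0, total)  # file length known: treat the view as the full file


counts = {"src/utils.py": 300}
print(resolve_span("src/utils.py", None, counts))
```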
Inconsistent file paths
Normalize paths to match gold annotations:
import os
file_path = os.path.normpath(file_path)  # Collapses ./, duplicate slashes, and inner ../ segments
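Trajectories often record absolute paths inside the agent's working directory, which normalization alone does not make repo-relative. The sketch below adds an optional prefix strip. The `normalize_path` helper and its `repo_root` parameter are hypothetical, as is the /testbed root in the usage line, and whether gold annotations use repo-relative POSIX paths is an assumption.

```python
import posixpath


def normalize_path(file_path: str, repo_root: str = "") -> str:
    """Hypothetical helper: normalize a path and make it relative to repo_root."""
    # Use forward slashes throughout so results do not depend on the host OS.
    path = posixpath.normpath(file_path.replace("\\", "/"))
    if repo_root:
        root = posixpath.normpath(repo_root.replace("\\", "/"))
        if path.startswith(root + "/"):
            path = path[len(root) + 1 :]
    return path


print(normalize_path("/testbed/src/../src/main.py", repo_root="/testbed"))
```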
Duplicate context
When computing cumulative context, use union operations:
from contextbench.core.intervals import union

all_spans = {}
for step in steps:
    for file, intervals in step["spans"].items():
        if file not in all_spans:
            all_spans[file] = []
        all_spans[file].extend(intervals)

# Union overlapping intervals
for file in all_spans:
    all_spans[file] = union(all_spans[file])
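For readers unfamiliar with interval union, the standard sort-and-merge technique is sketched below. This is a stand-in for illustration, not the implementation of `contextbench.core.intervals.union`, whose exact behavior (for example, whether adjacent intervals are merged) is not specified here. The name `merge_intervals` is hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) pairs into a sorted list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


print(merge_intervals([(200, 300), (0, 100), (50, 120)]))
```

In this sketch, intervals that share an endpoint, such as (0, 100) and (100, 200), are merged, while (0, 100) and (101, 200) stay separate.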
Next Steps
See Run Agents on ContextBench for batch evaluation
Understand the Evaluation Pipeline for how trajectories are processed
Explore Agents API for API reference