Agent Trajectory Extractors

ContextBench includes trajectory extractors for different coding agents, exposed via a unified API.

Supported Agents

1. MiniSWE-agent

Format: .traj.json files
Location: contextbench/agents/minisweagent/extract.py

Features:

Extracts file views from bash commands in messages
Supports cat, sed -n, head, grep, nl | sed commands
Parses patch_context_data.patch_context for final context
Returns model patch from info.submission
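To make the first two features concrete, here is a minimal sketch of turning a bash view command into a file and line range. It is not the code in contextbench/agents/minisweagent/extract.py: the helper name parse_view_command is hypothetical, only the `sed -n 'A,Bp' FILE` and `head -n N FILE` forms are handled, and it assumes 1-based inclusive line numbers as printed by sed.

```python
import re
import shlex

# Hypothetical helper for illustration; the real extractor's rules may differ.
_SED_RANGE = re.compile(r"^(\d+),(\d+)p$")


def parse_view_command(command: str):
    """Return (path, (start, end)) for a recognized view command, else None."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes or similar: treat as not a view command.
        return None
    if len(tokens) != 4:
        return None
    tool, flag, arg, path = tokens
    if tool == "sed" and flag == "-n":
        match = _SED_RANGE.match(arg)
        if match:
            return path, (int(match.group(1)), int(match.group(2)))
    if tool == "head" and flag == "-n" and arg.isdigit():
        # head -n N shows lines 1..N.
        return path, (1, int(arg))
    return None
```

Commands that do not match either pattern return None, so the caller can decide whether to ignore the step or fall back to a whole-file span.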

2. SWE-agent

Format: .checkpoints.jsonl files
Location: contextbench/agents/sweagent/extract.py

Features:

Extracts from str_replace_editor view commands with --view_range
Only includes steps with explicit line ranges
Parses patch_context string format (File:/Lines:)
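The "explicit line ranges only" rule can be sketched as follows. This assumes the action string has the shape `str_replace_editor view PATH --view_range START END`; the helper name parse_editor_view is hypothetical and the actual logic in contextbench/agents/sweagent/extract.py may differ.

```python
import shlex


def parse_editor_view(action: str):
    """Return (path, (start, end)) for a ranged view action, else None."""
    try:
        tokens = shlex.split(action)
    except ValueError:
        return None
    if len(tokens) < 3 or tokens[:2] != ["str_replace_editor", "view"]:
        return None
    if "--view_range" not in tokens:
        # No explicit range: the step is skipped, mirroring the rule above.
        return None
    flag_index = tokens.index("--view_range")
    try:
        start = int(tokens[flag_index + 1])
        end = int(tokens[flag_index + 2])
    except (IndexError, ValueError):
        return None
    return tokens[2], (start, end)
```

A view without `--view_range` yields None here, so it contributes nothing to pred_steps.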

3. Agentless

Format: Custom JSON format
Location: contextbench/agents/agentless/extract.py

Features:

Extracts localization and repair steps
Parses file access from retrieval outputs
Supports multi-stage reasoning

4. OpenHands

Format: Trajectory logs
Location: contextbench/agents/openhands/extract.py

Features:

Extracts file operations from action logs
Supports browsing and editing actions
Handles multi-file contexts

5. Prometheus

Format: Agent-specific format
Location: contextbench/agents/prometheus/extract.py

Features:

Extracts context from reasoning traces
Supports iterative refinement steps

Unified Interface

All agent extractors use a unified interface:

from contextbench.agents import extract_trajectory

# Automatically detects format based on file extension
result = extract_trajectory("path/to/trajectory.traj.json")
result = extract_trajectory("path/to/trajectory.checkpoints.jsonl")

The extractor returns a unified structure:

{
    "pred_steps": [
        {"files": [...], "spans": {...}},
        ...
    ],
    "pred_files": [...],
    "pred_spans": {...},
    "pred_patch": "...",  # Optional: model-generated patch
}
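When writing or debugging an extractor, it can help to check that a result has this shape. The following is a minimal sketch; check_result is a hypothetical helper, not part of ContextBench, and the consistency rules it enforces (every per-step file also appears in the cumulative fields) are an assumption drawn from the field descriptions on this page.

```python
def check_result(result: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [
        f"missing key: {key}"
        for key in ("pred_steps", "pred_files", "pred_spans")
        if key not in result
    ]
    if problems:
        return problems

    known_files = set(result["pred_files"])
    for index, step in enumerate(result["pred_steps"]):
        for path in step.get("files", []):
            if path not in known_files:
                problems.append(f"step {index}: {path} missing from pred_files")
        for path in step.get("spans", {}):
            if path not in result["pred_spans"]:
                problems.append(f"step {index}: {path} missing from pred_spans")
    return problems
```

pred_patch is not checked because it is optional.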

Return Structure

pred_steps

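List of per-step context, one entry per trajectory step, giving the files viewed in that step and the spans viewed in each file: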

"pred_steps": [
    {
        "files": ["src/utils.py", "src/main.py"],
        "spans": {
            "src/utils.py": [(0, 100), (200, 300)],
            "src/main.py": [(0, 500)]
        }
    },
    ...
]

pred_files

Cumulative set of all viewed files:

"pred_files": ["src/utils.py", "src/main.py", "tests/test.py"]

pred_spans

Cumulative union of all viewed spans:

"pred_spans": {
    "src/utils.py": [(0, 100), (200, 300)],
    "src/main.py": [(0, 500)],
    "tests/test.py": [(0, 1000)]
}

pred_patch

The model-generated patch, present only when the trajectory records one:

"pred_patch": "diff --git a/src/utils.py b/src/utils.py\n..."

Adding a New Agent

To add support for a new agent:

1. Create extractor module

Create contextbench/agents/myagent/extract.py:

import json

def extract_trajectory(traj_path: str) -> dict:
    """
    Extract trajectory from MyAgent format.

    Args:
        traj_path: Path to trajectory file

    Returns:
        dict with keys: pred_steps, pred_files, pred_spans
        (add pred_patch if the agent records a final patch)
    """
    # Parse trajectory file
    with open(traj_path) as f:
        traj = json.load(f)

    # Extract per-step context.
    # parse_step is a helper you write for your agent's step format.
    steps = []
    for step in traj["steps"]:
        files, spans = parse_step(step)
        steps.append({"files": files, "spans": spans})

    # Compute cumulative context.
    # compute_union_files and compute_union_spans are also helpers you provide.
    all_files = compute_union_files(steps)
    all_spans = compute_union_spans(steps)

    return {
        "pred_steps": steps,
        "pred_files": all_files,
        "pred_spans": all_spans,
    }

2. Register in dispatcher

Update contextbench/agents/__init__.py:

from contextbench.agents.myagent.extract import extract_trajectory as extract_myagent

def extract_trajectory(traj_path: str) -> dict:
    if traj_path.endswith(".myagent.json"):
        return extract_myagent(traj_path)
    # ... other formats
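As more formats are added, a chain of endswith checks becomes harder to maintain. One alternative is a suffix registry, sketched below. This is a design suggestion rather than how contextbench/agents/__init__.py is written; the names register_extractor and dispatch are hypothetical.

```python
from typing import Callable

# Maps a file name suffix to the extractor that handles it.
_EXTRACTORS: dict[str, Callable[[str], dict]] = {}


def register_extractor(suffix: str, extractor: Callable[[str], dict]) -> None:
    """Associate a file name suffix with an extractor function."""
    _EXTRACTORS[suffix] = extractor


def dispatch(traj_path: str) -> dict:
    """Call the extractor registered for the longest matching suffix."""
    # Longest suffix first, so ".traj.json" wins over a generic ".json".
    for suffix in sorted(_EXTRACTORS, key=len, reverse=True):
        if traj_path.endswith(suffix):
            return _EXTRACTORS[suffix](traj_path)
    raise ValueError(f"No extractor registered for {traj_path!r}")
```

Raising ValueError for an unknown suffix avoids silently returning None when no format matches.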

3. Add tests

Create tests/test_myagent_extractor.py:

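from contextbench.agents import extract_trajectory

def test_myagent_extraction():
    # The file name must end in ".myagent.json" so the dispatcher from step 2 picks the MyAgent extractor
    result = extract_trajectory("test_data/sample.myagent.json")
    assert "pred_files" in result
    assert "pred_spans" in result
    assert len(result["pred_steps"]) > 0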

Testing Extractors

Test an extractor on a single trajectory:

python -m contextbench.evaluate \
    --gold data/full.parquet \
    --pred traj_verified-mini/instance/instance.traj.json \
    --out results.jsonl

Check the extracted context:

from contextbench.agents import extract_trajectory

result = extract_trajectory("path/to/instance.traj.json")
print(f"Files: {result['pred_files']}")
print(f"Steps: {len(result['pred_steps'])}")
print(f"Spans: {result['pred_spans']}")

Common Issues

Missing line ranges

Some trajectories don’t include explicit line ranges. In this case, choose one fallback and apply it consistently:

Treat the whole file as a single span
Or skip steps without line information
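Both fallbacks can sit behind one switch. The sketch below assumes spans are 1-based inclusive line ranges and that the repository checkout is available on disk; spans_for_view and its parameters are hypothetical names, not ContextBench API.

```python
from pathlib import Path


def spans_for_view(repo_root, rel_path, line_range=None, skip_unranged=False):
    """Return the list of spans to record for one file view."""
    if line_range is not None:
        return [line_range]
    if skip_unranged:
        # Fallback 2: drop views that carry no line information.
        return []
    # Fallback 1: count lines and record the whole file as one span.
    file_path = Path(repo_root) / rel_path
    try:
        with file_path.open(encoding="utf-8", errors="replace") as handle:
            line_count = sum(1 for _ in handle)
    except OSError:
        # File missing or unreadable: nothing to record.
        return []
    return [(1, line_count)] if line_count else []
```

Whole-file spans tend to raise recall at the cost of precision, while skipping does the opposite, so the choice affects how scores compare across agents.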

Inconsistent file paths

Normalize paths to match gold annotations:

import os
file_path = os.path.normpath(file_path)  # Collapses "./", duplicate slashes, and "dir/../" segments
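Normalization alone does not help when the agent reports absolute paths inside its sandbox and the gold annotations use repository-relative paths. The following sketch strips a checkout prefix as well. The default root "/testbed" and the helper name normalize_repo_path are assumptions for illustration; substitute whatever root your agent's environment uses.

```python
import posixpath


def normalize_repo_path(file_path: str, repo_root: str = "/testbed") -> str:
    """Return a forward-slash path relative to repo_root where possible."""
    # Use posixpath so the result does not depend on the host OS.
    path = posixpath.normpath(file_path.replace("\\", "/"))
    root = posixpath.normpath(repo_root)
    if path == root:
        return "."
    if path.startswith(root + "/"):
        path = path[len(root) + 1:]
    return path
```

Using posixpath rather than os.path keeps separators as forward slashes on Windows, where os.path.normpath would convert them to backslashes.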

Duplicate context

When computing cumulative context, use union operations:

from contextbench.core.intervals import union

all_spans = {}
for step in steps:
    for file, intervals in step["spans"].items():
        if file not in all_spans:
            all_spans[file] = []
        all_spans[file].extend(intervals)

# Union overlapping intervals
for file in all_spans:
    all_spans[file] = union(all_spans[file])
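For readers without the package at hand, the merging step can be illustrated with a standalone function. This is a sketch of a standard interval merge, not the implementation of contextbench.core.intervals.union; in particular, treating touching intervals such as (0, 100) and (100, 200) as mergeable is an assumption.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) pairs into a sorted list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            previous_start, previous_end = merged[-1]
            merged[-1] = (previous_start, max(previous_end, end))
        else:
            merged.append((start, end))
    return merged
```

Sorting first means a single pass is enough, since each interval only needs to be compared with the last merged one.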

Next Steps

See Run Agents on ContextBench for batch evaluation
Understand the Evaluation Pipeline for how trajectories are processed
Explore Agents API for API reference