Agent Trajectory Extractors
============================

ContextBench includes trajectory extractors for different coding agents, exposed via a unified API.

Supported Agents
----------------

1. MiniSWE-agent
~~~~~~~~~~~~~~~~

- **Format**: ``.traj.json`` files
- **Location**: ``contextbench/agents/minisweagent/extract.py``

**Features**:

- Extracts file views from bash commands in messages
- Supports ``cat``, ``sed -n``, ``head``, ``grep``, ``nl | sed`` commands
- Parses ``patch_context_data.patch_context`` for final context
- Returns model patch from ``info.submission``
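
The command parsing described above can be sketched roughly as follows. This is a minimal illustration, not the code in ``extract.py``: the function name ``parse_view_command``, both regular expressions, and the 1-based inclusive ``(start, end)`` convention are assumptions, and only ``sed -n 'A,Bp' FILE`` and ``head -n N FILE`` are handled.

.. code-block:: python

   import re
   from typing import Optional, Tuple

   # Hypothetical patterns; a real extractor likely recognises more command shapes.
   _SED_RE = re.compile(r"sed\s+-n\s+['\"]?(\d+),(\d+)p['\"]?\s+(\S+)")
   _HEAD_RE = re.compile(r"head\s+-n\s*(\d+)\s+(\S+)")


   def parse_view_command(command: str) -> Optional[Tuple[str, int, int]]:
       """Return (path, start_line, end_line) for a recognised view command, else None."""
       match = _SED_RE.search(command)
       if match:
           start, end, path = match.groups()
           return path, int(start), int(end)
       match = _HEAD_RE.search(command)
       if match:
           count, path = match.groups()
           # head -n N shows the first N lines of the file
           return path, 1, int(count)
       return None

Commands that the sketch does not recognise return ``None``, so a caller can skip them rather than guess at a range.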

2. SWE-agent
~~~~~~~~~~~~

- **Format**: ``.checkpoints.jsonl`` files
- **Location**: ``contextbench/agents/sweagent/extract.py``

**Features**:

- Extracts from ``str_replace_editor view`` commands with ``--view_range``
- Only includes steps with explicit line ranges
- Parses ``patch_context`` string format (``File:/Lines:``)
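
As a rough illustration of the first two points, the sketch below pulls a path and line range out of a ``str_replace_editor view`` command and returns ``None`` when no ``--view_range`` is given. The function name ``parse_editor_view`` and the return shape are assumptions made for this example; the actual extractor may read structured action fields instead of a raw command string.

.. code-block:: python

   import shlex
   from typing import Optional, Tuple


   def parse_editor_view(command: str) -> Optional[Tuple[str, int, int]]:
       """Return (path, start, end) for a ranged view command, else None."""
       try:
           tokens = shlex.split(command)
       except ValueError:  # e.g. unbalanced quotes
           return None
       if len(tokens) < 3 or tokens[0] != "str_replace_editor" or tokens[1] != "view":
           return None
       if "--view_range" not in tokens:
           return None  # no explicit line range, so the step is skipped
       idx = tokens.index("--view_range")
       try:
           start, end = int(tokens[idx + 1]), int(tokens[idx + 2])
       except (IndexError, ValueError):
           return None
       return tokens[2], start, end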

3. Agentless
~~~~~~~~~~~~

- **Format**: Custom JSON format
- **Location**: ``contextbench/agents/agentless/extract.py``

**Features**:

- Extracts localization and repair steps
- Parses file access from retrieval outputs
- Supports multi-stage reasoning

4. OpenHands
~~~~~~~~~~~~

- **Format**: Trajectory logs
- **Location**: ``contextbench/agents/openhands/extract.py``

**Features**:

- Extracts file operations from action logs
- Supports browsing and editing actions
- Handles multi-file contexts

5. Prometheus
~~~~~~~~~~~~~

- **Format**: Agent-specific format
- **Location**: ``contextbench/agents/prometheus/extract.py``

**Features**:

- Extracts context from reasoning traces
- Supports iterative refinement steps

Unified Interface
-----------------

All agent extractors use a unified interface:

.. code-block:: python

   from contextbench.agents import extract_trajectory

   # Automatically detects format based on file extension
   result = extract_trajectory("path/to/trajectory.traj.json")
   result = extract_trajectory("path/to/trajectory.checkpoints.jsonl")

The extractor returns a unified structure:

.. code-block:: python

   {
       "pred_steps": [
           {"files": [...], "spans": {...}},
           ...
       ],
       "pred_files": [...],
       "pred_spans": {...},
       "pred_patch": "...",  # Optional: model-generated patch
   }

Return Structure
----------------

pred_steps
~~~~~~~~~~

List of per-step context:

.. code-block:: python

   "pred_steps": [
       {
           "files": ["src/utils.py", "src/main.py"],
           "spans": {
               "src/utils.py": [(0, 100), (200, 300)],
               "src/main.py": [(0, 500)]
           }
       },
       ...
   ]

pred_files
~~~~~~~~~~

Cumulative set of all files viewed across steps, returned as a list:

.. code-block:: python

   "pred_files": ["src/utils.py", "src/main.py", "tests/test.py"]

pred_spans
~~~~~~~~~~

Cumulative union of all viewed spans:

.. code-block:: python

   "pred_spans": {
       "src/utils.py": [(0, 100), (200, 300)],
       "src/main.py": [(0, 500)],
       "tests/test.py": [(0, 1000)]
   }

pred_patch
~~~~~~~~~~

Optional. The model-generated patch, present only when the trajectory provides one:

.. code-block:: python

   "pred_patch": "diff --git a/src/utils.py b/src/utils.py\n..."
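
A quick shape check can catch malformed extractor output before evaluation. The sketch below is not part of ContextBench; ``check_result`` and ``REQUIRED_KEYS`` are hypothetical names, and the checks only cover the keys and types documented in this section.

.. code-block:: python

   from typing import Any, Dict, List

   # Keys documented as always present; pred_patch is optional.
   REQUIRED_KEYS = ("pred_steps", "pred_files", "pred_spans")


   def check_result(result: Dict[str, Any]) -> List[str]:
       """Return a list of problems; an empty list means the shape looks right."""
       problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in result]
       if problems:
           return problems
       for i, step in enumerate(result["pred_steps"]):
           for key in ("files", "spans"):
               if key not in step:
                   problems.append(f"pred_steps[{i}] missing key: {key}")
       if not isinstance(result["pred_spans"], dict):
           problems.append("pred_spans should map file paths to span lists")
       if "pred_patch" in result and not isinstance(result["pred_patch"], str):
           problems.append("pred_patch should be a string when present")
       return problems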

Adding a New Agent
------------------

To add support for a new agent:

1. Create extractor module
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Create ``contextbench/agents/myagent/extract.py``:

.. code-block:: python

   import json

   def extract_trajectory(traj_path: str) -> dict:
       """
       Extract trajectory from MyAgent format.

       Args:
           traj_path: Path to trajectory file

       Returns:
           dict with keys: pred_steps, pred_files, pred_spans
       """
       # Parse trajectory file
       with open(traj_path) as f:
           traj = json.load(f)

       # Extract per-step context.
       # parse_step is an agent-specific helper you define for your format.
       steps = []
       for step in traj["steps"]:
           files, spans = parse_step(step)
           steps.append({"files": files, "spans": spans})

       # Compute cumulative context.
       # compute_union_files and compute_union_spans are helpers you define.
       all_files = compute_union_files(steps)
       all_spans = compute_union_spans(steps)

       return {
           "pred_steps": steps,
           "pred_files": all_files,
           "pred_spans": all_spans,
       }
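
The template calls ``parse_step``, ``compute_union_files``, and ``compute_union_spans`` without defining them. One possible shape for these helpers is sketched below. The step layout (a ``"views"`` list of ``path``/``start``/``end`` entries) is invented for illustration, and treating touching spans as mergeable is an assumption that should be checked against how the benchmark defines spans.

.. code-block:: python

   from typing import Dict, List, Tuple

   Span = Tuple[int, int]


   def parse_step(step: dict) -> Tuple[List[str], Dict[str, List[Span]]]:
       """Collect files and spans from one step (hypothetical 'views' layout)."""
       spans: Dict[str, List[Span]] = {}
       for view in step.get("views", []):
           spans.setdefault(view["path"], []).append((view["start"], view["end"]))
       return list(spans), spans


   def compute_union_files(steps: List[dict]) -> List[str]:
       """All files seen across steps, in first-seen order, without duplicates."""
       seen: Dict[str, None] = {}
       for step in steps:
           for path in step["files"]:
               seen.setdefault(path, None)
       return list(seen)


   def compute_union_spans(steps: List[dict]) -> Dict[str, List[Span]]:
       """Merge spans per file; overlapping or touching spans are combined."""
       collected: Dict[str, List[Span]] = {}
       for step in steps:
           for path, spans in step["spans"].items():
               collected.setdefault(path, []).extend(spans)
       merged: Dict[str, List[Span]] = {}
       for path, spans in collected.items():
           out: List[Span] = []
           for start, end in sorted(spans):
               if out and start <= out[-1][1]:
                   # Extend the previous span instead of adding a new one
                   out[-1] = (out[-1][0], max(out[-1][1], end))
               else:
                   out.append((start, end))
           merged[path] = out
       return merged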

2. Register in dispatcher
~~~~~~~~~~~~~~~~~~~~~~~~~

Update ``contextbench/agents/__init__.py``:

.. code-block:: python

   from contextbench.agents.myagent.extract import extract_trajectory as extract_myagent

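   def extract_trajectory(traj_path: str) -> dict:
       if traj_path.endswith(".myagent.json"):
           return extract_myagent(traj_path)
       # ... other formats
       # Fail loudly instead of silently returning None for unknown formats
       raise ValueError(f"Unsupported trajectory format: {traj_path}")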
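
As the number of formats grows, a suffix table can be easier to maintain than a chain of ``if`` statements. The sketch below shows one way to do that; ``pick_extractor`` and the registry argument are hypothetical and are not how ``contextbench/agents/__init__.py`` is known to be organised.

.. code-block:: python

   from typing import Callable, Dict

   Extractor = Callable[[str], dict]


   def pick_extractor(traj_path: str, registry: Dict[str, Extractor]) -> Extractor:
       """Return the extractor whose suffix matches, preferring the longest suffix."""
       for suffix in sorted(registry, key=len, reverse=True):
           if traj_path.endswith(suffix):
               return registry[suffix]
       raise ValueError(f"No extractor registered for: {traj_path}")

Checking longer suffixes first matters when one suffix is the tail of another: a generic ``.json`` entry would otherwise also match ``.traj.json`` and ``.myagent.json`` files.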

3. Add tests
~~~~~~~~~~~~

Create ``tests/test_myagent_extractor.py``:

.. code-block:: python

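   from contextbench.agents import extract_trajectory

   def test_myagent_extraction():
       # The file name must end in ".myagent.json" for the dispatcher to route it
       result = extract_trajectory("test_data/example.myagent.json")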
       assert "pred_files" in result
       assert "pred_spans" in result
       assert len(result["pred_steps"]) > 0

Testing Extractors
------------------

Test an extractor on a single trajectory:

.. code-block:: bash

   python -m contextbench.evaluate \
       --gold data/full.parquet \
       --pred traj_verified-mini/instance/instance.traj.json \
       --out results.jsonl

Check the extracted context:

.. code-block:: python

   from contextbench.agents import extract_trajectory
   
   result = extract_trajectory("path/to/trajectory.traj.json")
   print(f"Files: {result['pred_files']}")
   print(f"Steps: {len(result['pred_steps'])}")
   print(f"Spans: {result['pred_spans']}")

Common Issues
-------------

Missing line ranges
~~~~~~~~~~~~~~~~~~~

Some trajectories don't include explicit line ranges. In that case, choose one of the following and apply it consistently:

- Treat the whole file as a single span
- Skip steps that carry no line information
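
The two options can be combined in a small fallback helper. This is only a sketch: ``resolve_spans`` is a hypothetical name, and the whole-file span ``(0, line_count)`` assumes line-based spans that start at zero, which should be verified against the gold annotations before use.

.. code-block:: python

   from typing import List, Optional, Tuple

   Span = Tuple[int, int]


   def resolve_spans(explicit: Optional[List[Span]], file_text: Optional[str]) -> List[Span]:
       """Prefer explicit ranges, then a whole-file span, then nothing (skip)."""
       if explicit:
           return explicit
       if file_text is not None:
           # Whole file as one span; the (0, line_count) convention is an assumption
           return [(0, len(file_text.splitlines()))]
       return []  # no range and no content available: caller skips the step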

Inconsistent file paths
~~~~~~~~~~~~~~~~~~~~~~~~

Normalize paths to match gold annotations:

.. code-block:: python

   import os
   # Collapses "./", duplicate separators, and resolvable "../" segments
   file_path = os.path.normpath(file_path)
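
Agents often report absolute paths inside a container, while gold annotations are typically repository-relative. The sketch below strips a repository root and forces POSIX separators. The name ``normalize_repo_path`` and the default root ``/testbed`` are assumptions; substitute whatever root your trajectories actually use.

.. code-block:: python

   import posixpath


   def normalize_repo_path(path: str, repo_root: str = "/testbed") -> str:
       """Return a repo-relative POSIX path; repo_root is an assumed default."""
       # posixpath keeps forward slashes regardless of the host operating system
       path = posixpath.normpath(path.replace("\\", "/"))
       root = posixpath.normpath(repo_root)
       if path == root:
           return "."
       if path.startswith(root + "/"):
           path = path[len(root) + 1:]
       return path

Using ``posixpath`` rather than ``os.path`` avoids backslash-separated results when the extraction runs on Windows.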

Duplicate context
~~~~~~~~~~~~~~~~~

When computing cumulative context, merge the spans with a union so that lines viewed in several steps are counted only once:

.. code-block:: python

   from contextbench.core.intervals import union
   
   all_spans = {}
   for step in steps:
       for file, intervals in step["spans"].items():
           if file not in all_spans:
               all_spans[file] = []
           all_spans[file].extend(intervals)
   
   # Union overlapping intervals
   for file in all_spans:
       all_spans[file] = union(all_spans[file])

Next Steps
----------

- See :doc:`run_agent_on_contextbench` for batch evaluation
- Understand the :doc:`pipeline` for how trajectories are processed
- Explore :doc:`api/agents` for API reference