Parsers API

Parsers module.

class contextbench.parsers.Gold(data: dict)

Bases: object

Gold context for one instance.

__init__(data: dict)

files() → List[str]

Get merged file list from init+add.

byte_spans(repo_dir: str) → Dict[str, List[Tuple[int, int]]]

Get merged byte intervals per file from init+add.

byte_spans_init(repo_dir: str) → Dict[str, List[Tuple[int, int]]]

Get byte intervals from init_ctx only (for EditLoc gold).

line_spans_init() → Dict[str, List[Tuple[int, int]]]

Get line intervals from init_ctx only (for EditLoc gold based on lines).

Returns {file: [(start_line, end_line)]} where lines are inclusive.
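As a rough illustration of what "merged" intervals from init+add could look like, the sketch below unions several {file: [(start, end)]} maps and coalesces overlapping or touching ranges. The helper name merge_spans and the half-open [start, end) convention are assumptions made for this example; it is not ContextBench's implementation.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]


def merge_spans(*span_maps: Dict[str, List[Span]]) -> Dict[str, List[Span]]:
    """Union several {file: [(start, end)]} maps and coalesce overlaps.

    Hypothetical helper; assumes half-open byte ranges [start, end).
    """
    # Collect every span for each file across all input maps.
    combined: Dict[str, List[Span]] = {}
    for span_map in span_maps:
        for path, spans in span_map.items():
            combined.setdefault(path, []).extend(spans)

    merged: Dict[str, List[Span]] = {}
    for path, spans in combined.items():
        out: List[Span] = []
        for start, end in sorted(spans):
            if out and start <= out[-1][1]:
                # Overlaps or touches the previous span: extend it.
                out[-1] = (out[-1][0], max(out[-1][1], end))
            else:
                out.append((start, end))
        merged[path] = out
    return merged


# Illustrative inputs standing in for the init and add contexts.
init_spans = {"src/foo.py": [(0, 40), (100, 120)]}
add_spans = {"src/foo.py": [(30, 60)], "src/bar.py": [(5, 20)]}
merged_example = merge_spans(init_spans, add_spans)
```

With these inputs, the two overlapping ranges in src/foo.py collapse into one, while the disjoint range and the file that appears only in the second map pass through unchanged.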

class contextbench.parsers.GoldLoader(path: str)

Bases: object

Lazy loader for gold contexts.

__init__(path: str)

get(instance_id: str) → Gold | None

Get gold context by ID.

size() → int

Number of indexed IDs.
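One common way to build a lazy loader with a get/size interface like this is to index record offsets once and read a record only when it is requested. The sketch below shows that pattern under loud assumptions: the class name LazyJsonlIndex, the JSONL file layout, and the "instance_id" key are all hypothetical and may not match how GoldLoader actually stores or reads its data.

```python
import json
import os
import tempfile
from typing import Dict, Optional


class LazyJsonlIndex:
    """Hypothetical lazy loader: keep byte offsets, read records on demand."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._offsets: Dict[str, int] = {}
        with open(path, "rb") as fh:
            while True:
                offset = fh.tell()
                line = fh.readline()
                if not line:
                    break
                if line.strip():
                    # Assumed record key; the real gold file may differ.
                    self._offsets[json.loads(line)["instance_id"]] = offset

    def get(self, instance_id: str) -> Optional[dict]:
        """Return the record for instance_id, or None if it is not indexed."""
        offset = self._offsets.get(instance_id)
        if offset is None:
            return None
        with open(self.path, "rb") as fh:
            fh.seek(offset)
            return json.loads(fh.readline())

    def size(self) -> int:
        """Number of indexed IDs."""
        return len(self._offsets)


# Small demonstration on a temporary file with made-up records.
with tempfile.TemporaryDirectory() as tmp:
    gold_path = os.path.join(tmp, "gold.jsonl")
    with open(gold_path, "w", encoding="utf-8") as fh:
        fh.write(json.dumps({"instance_id": "owner__repo-1", "files": ["a.py"]}) + "\n")
        fh.write(json.dumps({"instance_id": "owner__repo-2", "files": ["b.py"]}) + "\n")
    index = LazyJsonlIndex(gold_path)
    demo_size = index.size()
    demo_hit = index.get("owner__repo-2")
    demo_miss = index.get("missing")
```

Only the offset table stays in memory after construction, and a lookup for an unknown ID returns None, mirroring the documented Gold | None return type.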

contextbench.parsers.parse_diff(diff_text: str, repo_dir: str) → Dict[str, List[Tuple[int, int]]]

Extract edited byte ranges per file from unified diff.
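To make the first half of that idea concrete, the sketch below reads the hunk headers of a unified diff and collects inclusive line ranges per file. It is a simplified stand-in, not parse_diff itself: the helper name hunk_line_ranges is hypothetical, it uses the new-file side of each hunk (an assumption), it reports whole hunks including context lines, and it stops at line numbers instead of converting them to byte offsets against repo_dir.

```python
import re
from typing import Dict, List, Optional, Tuple

# Matches headers such as "@@ -10,2 +11,2 @@" and captures the new-file side.
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@")


def hunk_line_ranges(diff_text: str) -> Dict[str, List[Tuple[int, int]]]:
    """Map each file to inclusive (start_line, end_line) hunk ranges.

    Hypothetical helper. Naive by design: an added line whose content
    starts with "++ " would be mistaken for a file header.
    """
    ranges: Dict[str, List[Tuple[int, int]]] = {}
    current: Optional[str] = None
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            path = line[4:].split("\t", 1)[0].strip()
            if path == "/dev/null":
                current = None  # File was deleted; no new-side lines.
            else:
                current = path[2:] if path.startswith("b/") else path
            continue
        match = HUNK_RE.match(line)
        if match and current is not None:
            start = int(match.group(1))
            # A missing count in a hunk header means exactly one line.
            count = int(match.group(2)) if match.group(2) is not None else 1
            if count > 0:
                ranges.setdefault(current, []).append((start, start + count - 1))
    return ranges


SAMPLE_DIFF = """\
--- a/src/foo.py
+++ b/src/foo.py
@@ -1,3 +1,4 @@
 import os
+import sys
 def main():
     pass
@@ -10,2 +11,2 @@
-    return 1
+    return 0
     # done
"""

sample_ranges = hunk_line_ranges(SAMPLE_DIFF)
```

For the sample diff this yields two ranges for src/foo.py, one per hunk. Turning such line ranges into byte ranges would additionally require reading the file contents from the repository checkout.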

contextbench.parsers.parse_trajectory(data: dict) → Tuple[List[Step], Step | None]

Parse trajectory from unified agent data format.

Parameters:

data – dict with 'traj_data' containing:

  • pred_steps: list of {'files': […], 'spans': {…}}

  • pred_files: final file list

  • pred_spans: final span dict

Returns:

(trajectory_steps, final_step)
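The input and output shapes described above can be sketched with a small stand-in. SketchStep and parse_trajectory_sketch are hypothetical names used only for illustration; in particular, returning None for the final step when neither pred_files nor pred_spans is present is an assumption about how the Step | None return value arises, not documented behavior.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class SketchStep:
    """Stand-in for contextbench.parsers.Step, for illustration only."""
    files: List[str] = field(default_factory=list)
    spans: Dict[str, List[dict]] = field(default_factory=dict)
    symbols: Dict[str, List[str]] = field(default_factory=dict)


def parse_trajectory_sketch(data: dict) -> Tuple[List[SketchStep], Optional[SketchStep]]:
    """Split unified agent data into per-step records and a final step."""
    traj = data.get("traj_data", {})
    steps = [
        SketchStep(
            files=list(raw.get("files", [])),
            spans=dict(raw.get("spans", {})),
            symbols=dict(raw.get("symbols", {})),
        )
        for raw in traj.get("pred_steps", [])
    ]
    final: Optional[SketchStep] = None
    # Assumption: a final step exists only if final files or spans are given.
    if "pred_files" in traj or "pred_spans" in traj:
        final = SketchStep(
            files=list(traj.get("pred_files", [])),
            spans=dict(traj.get("pred_spans", {})),
        )
    return steps, final


example_data = {
    "traj_data": {
        "pred_steps": [
            {"files": ["src/foo.py"], "spans": {"src/foo.py": [{"start": 1, "end": 10}]}},
        ],
        "pred_files": ["src/foo.py", "src/bar.py"],
        "pred_spans": {"src/foo.py": [{"start": 1, "end": 10}]},
    }
}
trajectory_steps, final_step = parse_trajectory_sketch(example_data)
```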

contextbench.parsers.load_pred(path: str) → List[dict]

Load prediction data from JSON/JSONL or trajectory files.
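For the JSON/JSONL part, a loader of this kind usually normalizes both formats to a list of dicts. The sketch below is one way to do that; the name load_records and the choice to detect JSONL by file extension are assumptions, and it does not cover the trajectory-file case that load_pred also handles.

```python
import json
import os
import tempfile
from typing import List


def load_records(path: str) -> List[dict]:
    """Hypothetical loader: JSONL by extension, else a JSON list or object."""
    with open(path, "r", encoding="utf-8") as fh:
        if path.endswith(".jsonl"):
            # One JSON object per non-empty line.
            return [json.loads(line) for line in fh if line.strip()]
        payload = json.load(fh)
    # Wrap a single top-level object so callers always get a list.
    return payload if isinstance(payload, list) else [payload]


# Demonstration with made-up prediction records in both formats.
with tempfile.TemporaryDirectory() as tmp:
    jsonl_path = os.path.join(tmp, "pred.jsonl")
    with open(jsonl_path, "w", encoding="utf-8") as fh:
        fh.write('{"instance_id": "owner__repo-1"}\n\n{"instance_id": "owner__repo-2"}\n')
    json_path = os.path.join(tmp, "pred.json")
    with open(json_path, "w", encoding="utf-8") as fh:
        json.dump({"instance_id": "owner__repo-3"}, fh)
    jsonl_records = load_records(jsonl_path)
    json_records = load_records(json_path)
```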

class contextbench.parsers.Step(files=None, spans=None, symbols=None)

Bases: object

One retrieval step.

__init__(files=None, spans=None, symbols=None)

contextbench.parsers.load_traj_file(traj_file: str) → dict

Load trajectory file using unified agent interface.

contextbench.parsers.parse_custom(path: str) → List[dict]

Parse custom trajectory format into ContextBench unified format.

Override this function when using --agent custom in contextbench.process_trajectories convert.

Parameters:

path – File or directory path containing your agent’s trajectory output. May be a single file, a directory of instance subdirs, or a JSONL file.

Returns:

List of dicts, each with:

  • instance_id (str): e.g. "owner__repo-12345"

  • traj_data (dict): Required. Must contain at least one of:
    • pred_steps: List[dict], each step has:
      • files: List[str] - file paths viewed at this step

      • spans: Dict[str, List[dict]] - {file_path: [{"start": int, "end": int}, …]}

      • symbols: Dict[str, List[str]] - optional, {file_path: [symbol_name, …]}

    • pred_files: List[str] - final context file list

    • pred_spans: Dict[str, List[dict]] - {file_path: [{"start": int, "end": int}, …]}

  • model_patch (str): Optional. Final patch for EditLoc metric.

Return type:

List[dict]

Example traj_data:
{
  "pred_steps": [
    {"files": ["src/foo.py"], "spans": {"src/foo.py": [{"start": 1, "end": 10}]}, "symbols": {}},
    …
  ],
  "pred_files": ["src/foo.py", "src/bar.py"],
  "pred_spans": {
    "src/foo.py": [{"start": 1, "end": 10}],
    "src/bar.py": [{"start": 5, "end": 20}]
  }
}

Raises:

NotImplementedError – Override this in your module.
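A possible override is sketched below for a made-up agent log format: a single JSONL file in which each record carries an "id", a list of "views" (each with "path", "start", "end"), and an optional "patch". That input format is entirely hypothetical; only the returned structure follows the unified format documented above. How the override is wired into your own module is not shown and depends on your setup.

```python
import json
import os
import tempfile
from typing import Dict, List


def parse_custom(path: str) -> List[dict]:
    """Convert a hypothetical JSONL agent log into the unified format."""
    results: List[dict] = []
    with open(path, "r", encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            record = json.loads(line)
            pred_steps: List[dict] = []
            pred_spans: Dict[str, List[dict]] = {}
            for view in record.get("views", []):
                span = {"start": view["start"], "end": view["end"]}
                # Each view becomes one retrieval step.
                pred_steps.append(
                    {"files": [view["path"]], "spans": {view["path"]: [span]}, "symbols": {}}
                )
                # The final context accumulates every viewed span per file.
                pred_spans.setdefault(view["path"], []).append(dict(span))
            item = {
                "instance_id": record["id"],
                "traj_data": {
                    "pred_steps": pred_steps,
                    "pred_files": list(pred_spans),  # Files in first-seen order.
                    "pred_spans": pred_spans,
                },
            }
            if record.get("patch"):
                item["model_patch"] = record["patch"]
            results.append(item)
    return results


# Demonstration on a temporary log file with one made-up record.
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "agent_log.jsonl")
    with open(log_path, "w", encoding="utf-8") as fh:
        fh.write(json.dumps({
            "id": "owner__repo-12345",
            "views": [
                {"path": "src/foo.py", "start": 1, "end": 10},
                {"path": "src/bar.py", "start": 5, "end": 20},
            ],
        }) + "\n")
    converted = parse_custom(log_path)
```

This sketch handles only the single-file JSONL case; a directory of instance subdirectories would need its own traversal logic.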