Parsers API

Parsers module.

class contextbench.parsers.Gold(data: dict)

Bases: object

Gold context for one instance.

byte_spans(repo_dir: str) → Dict[str, List[Tuple[int, int]]]

Get merged byte intervals per file from init+add.
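To illustrate what "merged byte intervals per file" could mean, here is a minimal sketch that combines two per-file span dicts and merges overlapping intervals. The helper names (merge_spans, merge_per_file) are hypothetical, and the treatment of touching intervals as mergeable is an assumption; the actual Gold.byte_spans implementation may differ.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]


def merge_spans(spans: List[Span]) -> List[Span]:
    """Merge overlapping or touching (start, end) intervals (hypothetical helper)."""
    merged: List[Span] = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def merge_per_file(
    init: Dict[str, List[Span]], add: Dict[str, List[Span]]
) -> Dict[str, List[Span]]:
    """Combine two {file: spans} dicts and merge the spans of each file."""
    combined: Dict[str, List[Span]] = {}
    for source in (init, add):
        for path, spans in source.items():
            combined.setdefault(path, []).extend(spans)
    return {path: merge_spans(spans) for path, spans in combined.items()}


# Assumed inputs: spans from the initial context and from the added context.
example = merge_per_file(
    {"src/foo.py": [(0, 120), (300, 420)]},
    {"src/foo.py": [(100, 200)], "src/bar.py": [(10, 50)]},
)
print(example)
# {'src/foo.py': [(0, 200), (300, 420)], 'src/bar.py': [(10, 50)]}
```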

byte_spans_init(repo_dir: str) → Dict[str, List[Tuple[int, int]]]

Get byte intervals from init_ctx only (for EditLoc gold).

line_spans_init() → Dict[str, List[Tuple[int, int]]]

Get line intervals from init_ctx only (for EditLoc gold based on lines).

Returns {file: [(start_line, end_line)]}, where start_line and end_line are both inclusive.
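Because the class exposes both line-based and byte-based spans, it can help to see how an inclusive line span relates to a byte span. The sketch below assumes 1-indexed line numbers and a half-open (start, end) byte span; neither convention is stated in this reference, and line_span_to_byte_span is a hypothetical helper, not part of contextbench.

```python
from typing import Tuple


def line_span_to_byte_span(text: bytes, start_line: int, end_line: int) -> Tuple[int, int]:
    """Map an inclusive, 1-indexed line span to a half-open byte span."""
    lines = text.splitlines(keepends=True)
    if not 1 <= start_line <= end_line <= len(lines):
        raise ValueError("line span is outside the file")
    # offsets[i] is the byte offset at which line i + 1 starts;
    # the final entry is the total length of the file.
    offsets = [0]
    for line in lines:
        offsets.append(offsets[-1] + len(line))
    return (offsets[start_line - 1], offsets[end_line])


source = b"a = 1\nb = 2\nc = 3\n"
span = line_span_to_byte_span(source, 2, 3)
print(span)                      # (6, 18)
print(source[span[0]:span[1]])   # b'b = 2\nc = 3\n'
```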

class contextbench.parsers.GoldLoader(path: str)

Bases: object

Lazy loader for gold contexts.

contextbench.parsers.parse_diff(diff_text: str, repo_dir: str) → Dict[str, List[Tuple[int, int]]]

Extract edited byte ranges per file from unified diff.
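As a rough illustration of the first half of this task, the sketch below reads the hunk headers of a unified diff and collects the line ranges they cover on the new-file side. It is not the parse_diff implementation: it returns line ranges rather than byte ranges, it does not consult repo_dir, it includes the context lines of each hunk, and which side of the diff the real function uses is not stated here. The name edited_line_ranges is hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# Matches hunk headers such as "@@ -10,2 +11,3 @@"; a missing count means 1.
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@")


def edited_line_ranges(diff_text: str) -> Dict[str, List[Tuple[int, int]]]:
    """Collect inclusive new-file line ranges per file from hunk headers."""
    ranges: Dict[str, List[Tuple[int, int]]] = {}
    current: Optional[str] = None
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            path = line[4:].strip()
            if path == "/dev/null":
                current = None  # file was deleted; no new-side lines
            else:
                # Drop the conventional "b/" prefix written by git.
                current = path[2:] if path.startswith("b/") else path
            continue
        match = HUNK_RE.match(line)
        if match and current is not None:
            start = int(match.group(1))
            count = int(match.group(2)) if match.group(2) is not None else 1
            if count > 0:
                ranges.setdefault(current, []).append((start, start + count - 1))
    return ranges


sample_diff = "\n".join([
    "diff --git a/src/foo.py b/src/foo.py",
    "--- a/src/foo.py",
    "+++ b/src/foo.py",
    "@@ -1,3 +1,4 @@",
    " import os",
    "+import sys",
    " ",
    " def main():",
    "@@ -10 +11,2 @@ def helper():",
    "-    return 1",
    "+    x = 1",
    "+    return x",
])
hunks = edited_line_ranges(sample_diff)
print(hunks)  # {'src/foo.py': [(1, 4), (11, 12)]}
```

A sketch like this treats any line beginning with "+++ " as a file header, so an added line whose content itself starts with "++ " would be misread; a production parser would track hunk lengths to avoid that.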

contextbench.parsers.parse_trajectory(data: dict) → Tuple[List[Step], Step | None]

Parse trajectory from unified agent data format.

Parameters: data – dict with 'traj_data' containing:

- pred_steps: list of {'files': […], 'spans': {…}}
- pred_files: final file list
- pred_spans: final span dict

Returns: (trajectory_steps, final_step)
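The input and output shapes described above can be sketched with a stand-in. SketchStep and sketch_parse_trajectory below are hypothetical names that mirror the documented signature; the rule that final_step is None when neither pred_files nor pred_spans is present is an assumption, not confirmed behaviour of parse_trajectory.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class SketchStep:
    """Hypothetical stand-in for contextbench.parsers.Step."""
    files: List[str] = field(default_factory=list)
    spans: Dict[str, list] = field(default_factory=dict)
    symbols: Dict[str, list] = field(default_factory=dict)


def sketch_parse_trajectory(data: dict) -> Tuple[List[SketchStep], Optional[SketchStep]]:
    """Split unified-format data into per-step records and a final record."""
    traj = data.get("traj_data", {})
    steps = [
        SketchStep(s.get("files", []), s.get("spans", {}), s.get("symbols", {}))
        for s in traj.get("pred_steps", [])
    ]
    final = None
    # Assumption: a final step exists only if a final prediction was recorded.
    if traj.get("pred_files") or traj.get("pred_spans"):
        final = SketchStep(traj.get("pred_files", []), traj.get("pred_spans", {}))
    return steps, final


data = {
    "traj_data": {
        "pred_steps": [
            {"files": ["src/foo.py"], "spans": {"src/foo.py": [{"start": 1, "end": 10}]}},
        ],
        "pred_files": ["src/foo.py"],
        "pred_spans": {"src/foo.py": [{"start": 1, "end": 10}]},
    }
}
steps, final_step = sketch_parse_trajectory(data)
print(len(steps), final_step.files)  # 1 ['src/foo.py']
```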

contextbench.parsers.load_pred(path: str) → List[dict]

Load prediction data from JSON/JSONL or trajectory files.

class contextbench.parsers.Step(files=None, spans=None, symbols=None)

Bases: object

One retrieval step.

contextbench.parsers.load_traj_file(traj_file: str) → dict

Load trajectory file using unified agent interface.

contextbench.parsers.parse_custom(path: str) → List[dict]

Parse custom trajectory format into ContextBench unified format.

Override this function when running the contextbench.process_trajectories convert command with --agent custom.

Parameters: path – File or directory path containing your agent's trajectory output. May be a single file, a directory of instance subdirs, or a JSONL file.

Returns: List of dicts, each with a traj_data entry.

Example traj_data:

    {
        "pred_steps": [
            {"files": ["src/foo.py"], "spans": {"src/foo.py": [{"start": 1, "end": 10}]}, "symbols": {}},
            …
        ],
        "pred_files": ["src/foo.py", "src/bar.py"],
        "pred_spans": {
            "src/foo.py": [{"start": 1, "end": 10}],
            "src/bar.py": [{"start": 5, "end": 20}]
        }
    }
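A custom override might look like the following sketch. Everything about the input is assumed for illustration: a JSONL log in which each record has an "id" and a list of "reads" with "path", "first", and "last" fields. The "instance_id" key in the output is also an assumption; only traj_data and its three fields come from the example above. How the override is wired into contextbench is not covered here.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def parse_custom(path: str) -> List[dict]:
    """Convert a hypothetical JSONL agent log into the unified format."""
    results = []
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        if not raw.strip():
            continue  # skip blank lines
        record = json.loads(raw)
        steps, final_spans = [], {}
        for read in record["reads"]:
            span = {"start": read["first"], "end": read["last"]}
            # One retrieval step per file read.
            steps.append({"files": [read["path"]],
                          "spans": {read["path"]: [dict(span)]},
                          "symbols": {}})
            final_spans.setdefault(read["path"], []).append(dict(span))
        results.append({
            "instance_id": record["id"],  # assumed key name
            "traj_data": {
                "pred_steps": steps,
                "pred_files": list(final_spans),
                "pred_spans": final_spans,
            },
        })
    return results


# Usage: write a one-record log to a temporary file and convert it.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "agent.jsonl"
    record = {"id": "demo-1", "reads": [
        {"path": "src/foo.py", "first": 1, "last": 10},
        {"path": "src/bar.py", "first": 5, "last": 20},
    ]}
    log.write_text(json.dumps(record) + "\n", encoding="utf-8")
    converted = parse_custom(str(log))

print(converted[0]["traj_data"]["pred_files"])  # ['src/foo.py', 'src/bar.py']
```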