
All modules for which code is available

  • contextbench.agents
  • contextbench.core.fileio
  • contextbench.core.intervals
  • contextbench.core.repo
  • contextbench.extractors.treesitter
  • contextbench.metrics.compute
  • contextbench.parsers.custom_parser
  • contextbench.parsers.diff
  • contextbench.parsers.gold
  • contextbench.parsers.trajectory

© Copyright 2026, ContextBench Research Group (Nanjing University & University College London).
