Benchmarks Module
The benchmarks module provides a framework for measuring inference engine
performance. Every benchmark implements the BaseBenchmark ABC and is
registered via BenchmarkRegistry. The BenchmarkSuite runner executes a
collection of benchmarks and aggregates their results, which can be
serialized to JSONL or reduced to a summary dict.
Abstract Base Class and Runner
BaseBenchmark

Bases: ABC

Base class for all benchmark implementations. Subclasses must be registered
via @BenchmarkRegistry.register("name") to become discoverable.
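To illustrate the pattern, here is a minimal self-contained sketch. BaseBenchmark and BenchmarkRegistry below are simplified stand-ins for the real classes in openjarvis.bench (whose internals are not shown in this reference), and LatencyBenchmark is a hypothetical benchmark invented for the example:

```python
from abc import ABC, abstractmethod

# Simplified stand-in for openjarvis.bench's BenchmarkRegistry, for
# illustration only: maps a name to a benchmark class via a decorator.
class BenchmarkRegistry:
    _benchmarks = {}

    @classmethod
    def register(cls, name):
        def decorator(benchmark_cls):
            cls._benchmarks[name] = benchmark_cls
            return benchmark_cls
        return decorator

    @classmethod
    def get(cls, name):
        return cls._benchmarks[name]

# Simplified stand-in mirroring the documented BaseBenchmark interface.
class BaseBenchmark(ABC):
    @property
    @abstractmethod
    def description(self):
        """Human-readable description of what this benchmark measures."""

    @abstractmethod
    def run(self, engine, model, *, num_samples=10):
        """Run the benchmark and return a result."""

# Hypothetical benchmark, registered under the name "latency".
@BenchmarkRegistry.register("latency")
class LatencyBenchmark(BaseBenchmark):
    @property
    def description(self):
        return "Measures per-request latency."

    def run(self, engine, model, *, num_samples=10):
        # A real implementation would time engine calls; here we just
        # return a placeholder payload.
        return {"benchmark": "latency", "samples": num_samples}

print(BenchmarkRegistry.get("latency")().description)
```

Registering at class-definition time keeps discovery declarative: importing the module that defines a benchmark is enough to make it visible to the registry.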
Attributes

description (abstract property)

Human-readable description of what this benchmark measures.
Functions

run (abstractmethod)

run(engine: InferenceEngine, model: str, *, num_samples: int = 10) -> BenchmarkResult
BenchmarkResult (dataclass)

BenchmarkResult(benchmark_name: str, model: str, engine: str, metrics: Dict[str, float] = dict(), metadata: Dict[str, Any] = dict(), samples: int = 0, errors: int = 0)
Result from running a single benchmark.
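The rendered signature shows `= dict()` defaults; in a real dataclass those correspond to field(default_factory=dict) (a literal dict() default would be a shared mutable). A minimal sketch of the documented fields, with hypothetical model, engine, and metric names:

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Sketch of the documented dataclass. The reference renders the defaults
# as "= dict()", which maps to field(default_factory=dict) here so each
# instance gets its own metrics/metadata dict.
@dataclass
class BenchmarkResult:
    benchmark_name: str
    model: str
    engine: str
    metrics: Dict[str, float] = field(default_factory=dict)
    metadata: Dict[str, Any] = field(default_factory=dict)
    samples: int = 0
    errors: int = 0

result = BenchmarkResult(
    benchmark_name="latency",
    model="llama-3-8b",        # hypothetical model name
    engine="vllm",             # hypothetical engine name
    metrics={"p50_ms": 41.2},  # hypothetical metric
    samples=10,
)
print(result.errors)  # defaults to 0
```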
BenchmarkSuite

BenchmarkSuite(benchmarks: Optional[List[BaseBenchmark]] = None)

Run a collection of benchmarks and aggregate results.
Source code in src/openjarvis/bench/_stubs.py
Functions

run_all

run_all(engine: InferenceEngine, model: str, *, num_samples: int = 10) -> List[BenchmarkResult]
Run all benchmarks and return a list of results.
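A self-contained sketch of how run_all might drive a set of benchmarks; BenchmarkSuite here is a simplified stand-in for the documented runner, and EchoBenchmark is a hypothetical benchmark that just echoes its arguments:

```python
from typing import List, Optional

# Simplified stand-in mirroring the documented BenchmarkSuite interface.
class BenchmarkSuite:
    def __init__(self, benchmarks: Optional[List] = None):
        self.benchmarks = benchmarks or []

    def run_all(self, engine, model, *, num_samples: int = 10) -> List:
        # Run each benchmark in order, collecting its result.
        return [
            b.run(engine, model, num_samples=num_samples)
            for b in self.benchmarks
        ]

# Hypothetical benchmark that returns a plain dict for demonstration.
class EchoBenchmark:
    def run(self, engine, model, *, num_samples=10):
        return {"engine": engine, "model": model, "samples": num_samples}

suite = BenchmarkSuite([EchoBenchmark()])
print(suite.run_all("vllm", "llama-3-8b", num_samples=5))
```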
to_jsonl

to_jsonl(results: List[BenchmarkResult]) -> str
Serialize results to JSONL format (one JSON object per line).
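One plausible way to implement the documented behavior (this is a sketch, not the library's actual source): serialize each result dataclass with dataclasses.asdict and emit one json.dumps line per result. BenchmarkResult below is a minimal stand-in:

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List

# Minimal stand-in for the documented BenchmarkResult dataclass.
@dataclass
class BenchmarkResult:
    benchmark_name: str
    model: str
    engine: str
    metrics: Dict[str, float] = field(default_factory=dict)
    metadata: Dict[str, Any] = field(default_factory=dict)
    samples: int = 0
    errors: int = 0

def to_jsonl(results: List[BenchmarkResult]) -> str:
    # One JSON object per line, in input order (the JSONL convention).
    return "\n".join(json.dumps(asdict(r)) for r in results)

line = to_jsonl([BenchmarkResult("latency", "m", "e", samples=3)])
print(line)
```

JSONL keeps each record independently parseable, so results can be streamed or appended to a file across runs without rewriting the whole document.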
summary

summary(results: List[BenchmarkResult]) -> Dict[str, Any]
Create a summary dict from benchmark results.
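The reference only promises a Dict[str, Any], so the exact keys are unspecified. The following is a hypothetical aggregation (total counts plus per-metric means, operating on plain dicts for simplicity) to show what such a summary could look like, not the library's actual output shape:

```python
from collections import defaultdict
from typing import Any, Dict, List

# Hypothetical summary aggregation: the keys "benchmarks", "total_samples",
# "total_errors", and "mean_metrics" are illustrative assumptions, not the
# documented contract.
def summary(results: List[Dict[str, Any]]) -> Dict[str, Any]:
    out: Dict[str, Any] = {
        "benchmarks": len(results),
        "total_samples": sum(r["samples"] for r in results),
        "total_errors": sum(r["errors"] for r in results),
    }
    per_metric = defaultdict(list)
    for r in results:
        for name, value in r["metrics"].items():
            per_metric[name].append(value)
    # Average each metric across the results that reported it.
    out["mean_metrics"] = {k: sum(v) / len(v) for k, v in per_metric.items()}
    return out

results = [
    {"samples": 10, "errors": 1, "metrics": {"p50_ms": 40.0}},
    {"samples": 10, "errors": 0, "metrics": {"p50_ms": 60.0}},
]
print(summary(results))
```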