PolicyBench

PolicyAsLanguage - PolicyBench v1.0 Results

133 of 147 policies (90.5%) fully verified; fixture-level pass rate 96.1%.

Tier: Tool | Model version: v1.0.3 | Evaluated: 2026-05-05 | Evaluator: 1.0 | OPA: 1.16.1 | Corpus: 147 entries

Headline

| Self-reported VALID | Compile OK | Fixture PASS (entry-level) | Fixture pass rate (fixture-level) | Self-report agreement |
| --- | --- | --- | --- | --- |
| 99.3% | 99.3% | 90.5% | 96.1% | 0.912 |
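
The headline reports two different fixture numbers, and the gap between them (90.5% versus 96.1%) is easier to see in code. The sketch below is illustrative only: it assumes an entry counts as passing when every one of its fixtures passes, which this page does not spell out, and the `EntryResult` type, its fields, and the sample data are hypothetical rather than part of the PolicyBench harness.

```python
from dataclasses import dataclass


@dataclass
class EntryResult:
    """Hypothetical per-entry record; not the harness's real schema."""
    entry_id: str
    fixtures_passed: int
    fixtures_total: int


def entry_level_pass_rate(results):
    # Assumption: an entry passes only if all of its fixtures passed.
    if not results:
        return 0.0
    passed = sum(
        1 for r in results
        if r.fixtures_total > 0 and r.fixtures_passed == r.fixtures_total
    )
    return passed / len(results)


def fixture_level_pass_rate(results):
    # Pool every fixture across all entries, then take the passing share.
    total = sum(r.fixtures_total for r in results)
    if total == 0:
        return 0.0
    return sum(r.fixtures_passed for r in results) / total


# Made-up sample: one entry fails a single fixture out of four.
results = [
    EntryResult("a", 4, 4),
    EntryResult("b", 3, 4),
    EntryResult("c", 2, 2),
]

print(f"entry-level:   {entry_level_pass_rate(results):.1%}")
print(f"fixture-level: {fixture_level_pass_rate(results):.1%}")
```

Under this reading, a single failing fixture removes the whole entry from the entry-level count but costs only one fixture in the pooled rate, so the fixture-level figure is always at least as high as the entry-level one.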

By category

| Category | Entries | Compile OK | Fixture PASS (entry) | Fixture pass rate |
| --- | --- | --- | --- | --- |
| application_authz | 3 | 100.0% | 100.0% | 100.0% |
| iac_scanning | 65 | 100.0% | 90.8% | 96.1% |
| kubernetes_admission | 79 | 98.7% | 89.9% | 96.0% |

Performance

Per-call latency from the runner's recorded latency_ms.

| Calls timed | p50 | mean | p95 | max | Total |
| --- | --- | --- | --- | --- | --- |
| 146 | 1.93 s | 1.98 s | 2.98 s | 4.75 s | 4.8 min |
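
As a rough illustration of how a summary like this could be derived from per-call `latency_ms` values, here is a minimal sketch. The percentile method is an assumption (nearest-rank is used here; the evaluator's actual method is not stated), and the function names and sample values are hypothetical. The reported figures are at least internally consistent: 146 calls at a 1.98 s mean come to about 289 s, or 4.8 min.

```python
import math
import statistics


def nearest_rank(sorted_values, pct):
    # Nearest-rank percentile: smallest value with at least pct% of the
    # data at or below it. Assumed method, not confirmed by the source.
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize_latency(latencies_ms):
    if not latencies_ms:
        raise ValueError("need at least one timed call")
    values = sorted(latencies_ms)
    return {
        "calls": len(values),
        "p50_s": nearest_rank(values, 50) / 1000,
        "mean_s": statistics.fmean(values) / 1000,
        "p95_s": nearest_rank(values, 95) / 1000,
        "max_s": values[-1] / 1000,
        "total_min": sum(values) / 60000,
    }


# Made-up latencies in milliseconds, for illustration only.
sample = [1200, 1800, 1900, 2100, 4700]
summary = summarize_latency(sample)
for name, value in summary.items():
    print(name, value)
```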

Quality flag distribution

No quality flags raised.

Notable disagreements

Entries where the runner's self-reported status disagrees with the harness verdict. These are the most informative entries for understanding model blind spots.

No notable self-report / harness disagreements.
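
For readers who want to run a similar check on their own results, the comparison described above might look like the following sketch. The record layout (`entry_id`, `self_reported`, `harness_verdict`) is hypothetical, and treating a self-reported `VALID` as a claim that the harness should return `PASS` is an assumption. The simple agreement fraction computed here is not necessarily the statistic behind the "Self-report agreement" figure in the headline, whose definition this page does not give.

```python
def find_disagreements(records):
    # An entry disagrees when the runner's claim and the harness verdict
    # point in opposite directions (assumed mapping: VALID <-> PASS).
    disagreeing = []
    for rec in records:
        claims_valid = rec["self_reported"] == "VALID"
        harness_pass = rec["harness_verdict"] == "PASS"
        if claims_valid != harness_pass:
            disagreeing.append(rec["entry_id"])
    return disagreeing


def agreement_fraction(records):
    # Share of entries where claim and verdict match.
    if not records:
        raise ValueError("need at least one record")
    return 1 - len(find_disagreements(records)) / len(records)


# Made-up records, for illustration only.
records = [
    {"entry_id": "e1", "self_reported": "VALID", "harness_verdict": "PASS"},
    {"entry_id": "e2", "self_reported": "VALID", "harness_verdict": "PASS"},
    {"entry_id": "e3", "self_reported": "VALID", "harness_verdict": "FAIL"},
    {"entry_id": "e4", "self_reported": "INVALID", "harness_verdict": "FAIL"},
]

print(find_disagreements(records))
print(agreement_fraction(records))
```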

Tool homepage: https://policyaslanguage.com

Badge

Embed this on your site to show your PolicyBench score:

HTML

<a href="https://policybench.dev/models/policyaslanguage.html">
  <img src="https://policybench.dev/badges/policyaslanguage.svg" alt="PolicyBench: 90.5%" />
</a>

Markdown

[![PolicyBench: 90.5%](https://policybench.dev/badges/policyaslanguage.svg)](https://policybench.dev/models/policyaslanguage.html)

Direct link: https://policybench.dev/badges/policyaslanguage.svg. The badge is regenerated whenever the underlying score changes.
