A two-stage scoring engine: an LLM classifies the work, deterministic algorithms compute the weight. Reproducible, file-by-file, no black box.
What ETV measures and why it exists.
Lines of code. Commit count. Story points. DORA. Each answers a different question. None answer the one that matters: did the output get more valuable, or just more numerous?
ETV (Engineering Throughput Value) is a unit of performance, produced by a measurement engine that reads code the way a senior engineer does — not just what changed, but what it meant, where it landed, and whether it touched the architecture.
The Engineering Throughput Value is computed per file, per merged commit, and combines five factors, described in the sections below.
Each file contributes to one of three buckets: Growth, Maintenance, or Fixes. The buckets stay separate. Two engineers with identical totals can be doing very different work, and the score shows it.
ETV is answerable from commit history alone. No PM tools, no surveys, no self‑report.
Growth, Maintenance, Fixes — additive within, never across.
Every file change is classified into one of three buckets. Scores are additive inside a bucket (you can sum Growth ETV across a quarter for an org) but deliberately not additive across buckets: two engineers with identical total ETV can be doing very different work, and the per-bucket breakdown shows it.
Growth
New functionality and net-new capabilities. Added endpoints, new modules, new product surface area.
Maintenance
Upkeep, refactors, cleanup, performance tuning, tests, dependency updates, docs, style, build, CI.
Fixes
Work that corrects previous output — bug fixes, regressions, hotfixes. Each fix is traced back to the commit that introduced it.
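As a sketch, the within-bucket additivity described above can be expressed in a few lines of Python. The bucket names come from this section; the data structure and values are illustrative only.

```python
from collections import defaultdict

def bucket_totals(file_scores):
    """Sum ETV within each bucket; totals are never combined across buckets."""
    totals = defaultdict(float)
    for bucket, score in file_scores:
        totals[bucket] += score
    return dict(totals)

# Hypothetical per-file (bucket, score) records for one quarter.
quarter = [("Growth", 4.0), ("Fixes", 1.5), ("Growth", 2.5), ("Maintenance", 3.0)]
assert bucket_totals(quarter) == {"Growth": 6.5, "Fixes": 1.5, "Maintenance": 3.0}
```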
Per-commit, per-file, deterministic structure with ML inside.
For each merged commit, the engine produces three sub-scores — one for Growth, one for Maintenance, one for Fixes. Each sub-score is assembled per file and summed across the commit. The structure of the score itself is deterministic; machine-learning components are used to tune thresholds and coefficients inside it, and a large-language-model classifier resolves ambiguous work classifications where pattern-based signals are insufficient.
Per-file sub-score
Each per-file sub-score begins with a context complexity signal derived from the structural properties of the change. That signal is then scaled by an engagement multiplier that captures the ratio of surrounding context complexity to the complexity of the change itself — targeted modifications in complex areas score higher than equivalent changes in trivial code. Several decay and amplification factors are then applied (see below).
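A minimal Python sketch of the shape described above, assuming hypothetical names and a toy functional form; the engine's real signals and coefficients are not published.

```python
def per_file_subscore(context_signal, surrounding_complexity,
                      change_complexity, factors=()):
    """Illustrative shape only: a base context-complexity signal, scaled by
    an engagement multiplier (surrounding complexity over change complexity),
    then multiplicative decay/amplification factors."""
    engagement = surrounding_complexity / max(change_complexity, 1e-9)
    score = context_signal * engagement
    for f in factors:          # e.g. similarity dampener, blame decay, copy decay
        score *= f
    return score

# A targeted change in a complex area outscores the same change in trivial code.
assert per_file_subscore(3.0, 10.0, 2.0) > per_file_subscore(3.0, 2.0, 2.0)
```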
Where ML enters — and where it doesn't
ML tunes thresholds and coefficients within the deterministic structure. An LLM classifies ambiguous changes into Growth / Maintenance / Fixes when pattern-based signals are insufficient, and traces bug-fix commits back to the commit that introduced the issue. The structure of the score itself is deterministic. The same diff produces the same score on every run.
A feature graph inferred from code, used to weight changes by where they land.
Before any decay or amplification factors are applied, Navigara builds a structural model of each repository — a feature graph. The graph is derived from code organization alone (no external metadata, no PM tooling) and informs how per-file scores are weighted.
Feature graph
An AI analysis discovers distinct named features (e.g. auth, billing, checkout) and assigns each to a vertical layer — frontend, backend, or data. Edges between feature nodes capture inter-feature dependencies.
Commit → feature mapping
Each commit is mapped to one or more features via weighted path scoring (exact path match > directory containment > filename affinity). Files are no longer treated atomically; they're located inside a product surface.
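The weighted path scoring might look like the following sketch. The weights and matching rules are illustrative assumptions that preserve the exact > directory > filename ordering; only that ordering comes from the document.

```python
import posixpath

# Illustrative weights; the real coefficients are not published.
W_EXACT, W_DIR, W_NAME = 3.0, 2.0, 1.0

def feature_affinity(file_path, feature_paths, feature_keyword):
    """Score how strongly one file maps to one feature."""
    score = 0.0
    for fp in feature_paths:
        if file_path == fp:
            score += W_EXACT                           # exact path match
        elif file_path.startswith(fp.rstrip("/") + "/"):
            score += W_DIR                             # directory containment
    if feature_keyword in posixpath.basename(file_path):
        score += W_NAME                                # filename affinity
    return score

billing = ["services/billing/", "web/billing.ts"]
assert feature_affinity("services/billing/invoice.go", billing, "billing") == 2.0
assert feature_affinity("web/billing.ts", billing, "billing") == 4.0
```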
Architecture multiplier
The structural complexity of the surrounding feature contributes a multiplier to each per-file sub-score. A change inside a deeply connected feature with many cross-feature dependencies carries more weight than the same change inside a peripheral one.
Inputs: code structure only. No connected ticketing system, no external metadata. When multiple repositories are connected, the graph extends across them via shared libraries and API contracts.
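One way the connectivity-based multiplier could look, as a toy sketch. The logarithmic form and coefficients are assumptions; the document only states that deeply connected features carry more weight than peripheral ones.

```python
import math

def architecture_multiplier(feature, edges, base=1.0, weight=0.25):
    """Toy multiplier that grows with a feature's cross-feature degree."""
    degree = sum(1 for a, b in edges if feature in (a, b))
    return base + weight * math.log1p(degree)

# Hypothetical feature graph: edges are inter-feature dependencies.
edges = {("auth", "billing"), ("billing", "checkout"), ("checkout", "frontend")}
# billing (degree 2) outweighs the more peripheral auth (degree 1).
assert architecture_multiplier("billing", edges) > architecture_multiplier("auth", edges)
```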
Where credit is reduced — and where it's amplified.
Several factors run before the per-file score is finalized. Three dampeners reduce credit when a change exists but doesn't represent genuine cognitive work; one multiplier amplifies fixes when the surrounding signals say the bug was costly.
Similarity dampener
Reduces credit for changes that are structurally similar to existing code — mechanical refactors and copy-paste patterns.
Blame decay
Discounts changes that overwrite very recent work by the same author. The signal fades over a short business-day window — rewriting your own code from yesterday is partial credit; revisiting it weeks later is scored normally.
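A toy version of this decay, assuming a linear ramp, a five-business-day window, and a credit floor; none of these are published values.

```python
def blame_decay(business_days_since_own_commit, window=5, floor=0.25):
    """Partial credit for overwriting your own recent work."""
    if business_days_since_own_commit >= window:
        return 1.0                  # revisited weeks later: scored normally
    return max(floor, business_days_since_own_commit / window)

assert blame_decay(0) == 0.25       # same-day rewrite: heavily discounted
assert blame_decay(2) == 0.4        # inside the window: partial credit
assert blame_decay(10) == 1.0       # well past the window: full credit
```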
Copy decay
Reduces credit when a high proportion of added lines are duplicated from elsewhere in the codebase.
Waste multiplier
For Fixes only. The score is amplified based on three signals: how long the original code existed, whether the fix targets another author's code, and how frequently the affected area has been modified recently. A trivial self-fix on code written the same day barely moves the score; a fix in a high-churn area on code the fixer has never touched before is amplified substantially.
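The three signals might combine as in this illustrative sketch; the coefficients, the cap, and the 90-day and churn scales are assumptions, not the engine's constants.

```python
def waste_multiplier(code_age_days, cross_author, recent_churn, cap=3.0):
    """Amplify a fix based on the three signals named above."""
    m = 1.0
    m += 0.5 * min(code_age_days / 90.0, 1.0)  # how long the original code existed
    m += 0.5 if cross_author else 0.0          # fixing another author's code
    m += 1.0 * min(recent_churn / 10.0, 1.0)   # recent churn in the affected area
    return min(m, cap)

# A trivial same-day self-fix barely moves the score...
assert waste_multiplier(0, False, 0) == 1.0
# ...while a cross-author fix in a high-churn area is amplified substantially.
assert waste_multiplier(180, True, 12) == 3.0
```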
Generated code, lockfiles, binaries — out before scoring.
The filter list runs before any scoring and is identical across organizations.
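A minimal sketch of such a pre-scoring filter, using the patterns named in this section; the matching logic itself is illustrative, not the engine's implementation.

```python
import fnmatch

# Patterns taken from the filter list in this section.
NAME_GLOBS = ["*_generated*", "*.gen.*", "go.sum", "package-lock.json",
              "yarn.lock", "Cargo.lock", "poetry.lock"]
DIR_NAMES = {"dist", "node_modules"}

def is_filtered(path):
    """True if the file is excluded before any scoring runs."""
    parts = path.split("/")
    if any(fnmatch.fnmatch(parts[-1], g) for g in NAME_GLOBS):
        return True
    return any(d in parts[:-1] for d in DIR_NAMES)

assert is_filtered("frontend/package-lock.json")
assert is_filtered("proto/user.gen.go")
assert is_filtered("web/dist/bundle.js")
assert not is_filtered("services/billing/invoice.go")
```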
Generated code: *_generated or *.gen.* patterns.
Lockfiles: go.sum, package-lock.json, yarn.lock, Cargo.lock, poetry.lock, etc.
Build outputs: dist/, node_modules/, hashed outputs, anything checked in by mistake.

Full structural analysis for 14 languages; partial for the rest.
Full analysis
Go · Java · JavaScript · TypeScript · C · C++ · C# · Kotlin · Python · PHP · Ruby · Rust · Scala · Swift
Function-scope context complexity, within-file engagement, cross-repo data-flow analysis.
Partial analysis
HTML · CSS · SQL · Terraform · shell · YAML · Markdown
Classification runs as normal; mechanical fidelity is reduced because structural parsing is shallower.
How per-commit scores become the headline number.
The scoring engine produces three sub-scores per commit. The report layer collapses them into a single scalar — Engineering Throughput Value (ETV) — and aggregates upward through three levels.
Per commit
ETV per commit = sum of the three sub-scores (Growth + Maintenance + Fixes) for that commit.
Per SWE, per quarter
Sum of ETV across all qualifying commits authored by that engineer in the quarter.
Per org, per quarter
Mean ETV across the organization's qualifying SWEs that quarter.
Cross-org aggregate
Developer-weighted mean. Every qualifying SWE contributes one observation per quarter, weighted equally regardless of organization size. A 30-person org and a 200-person org each get pooled engineer-by-engineer — not org-by-org. This avoids the headline being dominated by the largest org's mean.
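The four aggregation levels can be sketched as follows, with hypothetical data shapes: per-engineer commit scores keyed by engineer, orgs keyed by name.

```python
from statistics import mean

def swe_quarter_etv(commits):
    """Per SWE, per quarter: sum of ETV across qualifying commits."""
    return {eng: sum(scores) for eng, scores in commits.items()}

def org_quarter_etv(commits):
    """Per org, per quarter: mean ETV across qualifying SWEs."""
    return mean(swe_quarter_etv(commits).values())

def cross_org_etv(orgs):
    """Cross-org: developer-weighted mean, pooled engineer-by-engineer."""
    pooled = [etv for commits in orgs.values()
              for etv in swe_quarter_etv(commits).values()]
    return mean(pooled)

small = {"a": [10.0], "b": [30.0]}       # 2 SWEs, org mean 20
large = {c: [4.0] for c in "defg"}       # 4 SWEs, org mean 4
# Developer-weighted: (10 + 30 + 4*4) / 6 engineers, not (20 + 4) / 2 orgs.
assert abs(cross_org_etv({"small": small, "large": large}) - 56 / 6) < 1e-9
```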
Figures in the report label the scalar as "performance" for readability. The formal definition is ETV.
Who gets credit for a commit.
Primary credit goes to the git author of the merged commit, after email-alias resolution. Co-authors are tracked but do not receive score credit. Attribution uses the merge date, not the commit date — so out-of-order merges land in the quarter they actually shipped.
Automation accounts (dependabot, renovate, github-actions) are excluded by default. Organizations can flag additional bots or service accounts.
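Attribution and exclusion might be sketched as below; the alias map, the bot-matching rule, and the function names are hypothetical.

```python
import datetime

BOTS = {"dependabot", "renovate", "github-actions"}       # excluded by default
ALIASES = {"jane@old-corp.example": "jane@corp.example"}  # hypothetical alias map

def credit(author_email, author_login, merge_date):
    """Return (canonical author, quarter) or None for excluded accounts.

    The quarter comes from the merge date, not the commit date, so
    out-of-order merges land in the quarter they actually shipped.
    """
    if author_login in BOTS:
        return None
    author = ALIASES.get(author_email, author_email)
    quarter = (merge_date.month - 1) // 3 + 1
    return author, f"{merge_date.year}-Q{quarter}"

assert credit("x@example.com", "dependabot", datetime.date(2024, 5, 2)) is None
assert credit("jane@old-corp.example", "jane",
              datetime.date(2024, 5, 2)) == ("jane@corp.example", "2024-Q2")
```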
What ETV does not measure.
ETV is descriptive, not normative. It tells you what shipped, not whether the right thing shipped.
The three buckets, or KTLO + Growth.
Growth, Maintenance, and Fixes is the canonical view. For executive reporting some teams prefer a two-bucket view: Maintenance + Fixes combined into Keep The Lights On (KTLO), with Growth on its own. The underlying three-category data is unchanged — KTLO is a visualization choice, not a different metric.
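Since KTLO is a visualization choice over the same three-bucket data, the remap is trivial; as a sketch:

```python
def to_ktlo_view(buckets):
    """Collapse Maintenance + Fixes into KTLO; Growth stays on its own."""
    return {"Growth": buckets.get("Growth", 0.0),
            "KTLO": buckets.get("Maintenance", 0.0) + buckets.get("Fixes", 0.0)}

assert to_ktlo_view({"Growth": 6.5, "Maintenance": 3.0, "Fixes": 1.5}) == \
    {"Growth": 6.5, "KTLO": 4.5}
```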
Also covered: the sample floor, the fixed-panel sanity check, the 418-engineer constant-population result, OpenAI's four-quarter window, and the complete list of 66 repositories analyzed.