Methodology

Every number on Ref Geek traces to a calculation. This page documents the formulas, sample minimums, and interpretation notes for each stat we publish. Each section has a stable anchor link so you can cite or reference a specific formula directly.

Stats are computed by aggregate refresh jobs that read from event tables (penalties, game assignments, on-ice players). User-facing pages always read from the materialized aggregates; we never query raw event tables on the read path.

Attribution model

Every penalty in a game is attributed to both referees assigned to that game. A ref's season stats are computed across every game they worked, regardless of partner. This produces robust per-ref sample sizes (~70 games per active ref per season) and matches how hockey people reason about officials: by name, not by pair.

When the NHL API or a broadcast specifically identifies which referee made a call, we store that as metadata for incident-level review and AI-writer claims, but it does not change primary stat calculation. Pair-level analysis exists as a small-sample drill-down, never as a primary stat.

Core per-60 rates [spec §2]

Formula

Penalties Per 60 = total penalties called × 3600 / total game seconds officiated. The denominator is the actual elapsed game time across the games this referee worked, summed in seconds: 3600 for a regulation finish, 3600 + (300 − OT clock) for a regular-season OT goal, 3900 for a regular-season shootout, and 3600 + (completed OT periods × 1200) + (1200 − last OT clock) for playoff OT. Major / Minor / Misconduct / Game Misconduct breakdowns share the same denominator with a severity filter on the numerator. OT-rate uses overtime-only seconds in the denominator so a 60-minute multi-OT playoff game is honestly normalized.
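The denominator rules above can be sketched in a few lines of Python. This is an illustrative sketch, not the refresh job itself: the finish labels (`REGULATION`, `RS_OT_GOAL`, `SHOOTOUT`, `PLAYOFF_OT`) and function names are hypothetical, and it assumes "OT clock" means seconds remaining on the overtime clock when the goal was scored.

```python
from typing import Optional

REGULATION_SECONDS = 3600
RS_OT_SECONDS = 300        # regular-season overtime length
PLAYOFF_OT_SECONDS = 1200  # one full playoff overtime period


def game_seconds(finish: str, ot_clock: Optional[int] = None,
                 completed_ot_periods: int = 0) -> int:
    """Elapsed seconds for one game, covering the four finish cases."""
    if finish == "REGULATION":
        return REGULATION_SECONDS
    if finish == "SHOOTOUT":
        return REGULATION_SECONDS + RS_OT_SECONDS
    if finish not in ("RS_OT_GOAL", "PLAYOFF_OT"):
        raise ValueError(f"unknown finish type: {finish}")
    if ot_clock is None:
        raise ValueError("an overtime goal needs the OT clock reading")
    if finish == "RS_OT_GOAL":
        return REGULATION_SECONDS + (RS_OT_SECONDS - ot_clock)
    # Playoff OT: full completed OT periods plus the elapsed part of the last one.
    return (REGULATION_SECONDS
            + completed_ot_periods * PLAYOFF_OT_SECONDS
            + (PLAYOFF_OT_SECONDS - ot_clock))


def penalties_per_60(total_penalties: int, total_seconds: int) -> Optional[float]:
    """Penalties x 3600 / seconds officiated; None when no time was worked."""
    if total_seconds <= 0:
        return None
    return total_penalties * 3600 / total_seconds


# Three games: regulation, a regular-season OT goal with 3:00 left,
# and a playoff game decided halfway through the second OT period.
seconds = sum([
    game_seconds("REGULATION"),
    game_seconds("RS_OT_GOAL", ot_clock=180),
    game_seconds("PLAYOFF_OT", ot_clock=600, completed_ot_periods=1),
])
rate = penalties_per_60(21, seconds)
```

The severity breakdowns would reuse `seconds` unchanged and only filter the numerator.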

Period-rate (P1 / P2 / P3) bars are calls × 3 / games — true per-60 of period time given each regulation period is 20 minutes. Score-state buckets (tied / one-goal / blowout) remain per-game because we don't yet ingest time-in-state granularity.

Per-call-type rates [spec §2.5]

Formula

For each of the 17 canonical call types (HOOKING, TRIPPING, INTERFERENCE, CROSS_CHECKING, ROUGHING, SLASHING, HIGH_STICKING, HOLDING, BOARDING, CHARGING, ELBOWING, GOALIE_INTERFERENCE, TOO_MANY_MEN, DELAY_OF_GAME, UNSPORTSMANLIKE, INSTIGATOR, FIGHTING_MAJOR), we count occurrences and compute calls × 3600 / total game seconds officiated for the per-60 rate. NHL descKey values map to canonical codes via penalty_type_nhl_mappings; bench-coded variants like INTERFERENCE_BENCH collapse into the same canonical code as their on-ice counterpart but are preserved in penalty_events.metadata.descKey for downstream stats that need the distinction (Discipline, future PP-creation flag).

The “League pct” column is sample-filtered: refs with fewer than 3 calls of that type (or fewer than 10 games overall) don't contribute to the percentile distribution.

Stripe Score [spec §8.1]

Formula

Weighted composite of Consistency, Home Cooking, Leverage, Gap Rhythm, Discipline, Pace Impact, and PP Efficiency. 0–100 scale. Weights (sum to 1.00 at full coverage): consistency 0.25, home cooking 0.15, leverage 0.15, gap rhythm 0.15, discipline 0.15, pace impact 0.075, PP efficiency 0.075. Each component is z-scored against the league baseline before weighting; weights are renormalized when a component is null for the ref so partial coverage still produces a usable score.

Higher = more conventional / consistent. Tier bands map score ranges to labels (Strong, Stable, Volatile, Outlier) for at-a-glance reading.
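The weight renormalization for partial coverage can be illustrated as follows. This sketch covers only the weighting step: the component keys and function names are hypothetical, the inputs are assumed to be already z-scored, and the mapping from the weighted z-value onto the 0–100 scale is not specified on this page, so it is left out.

```python
from typing import Dict, Optional

# Weights as documented above; they sum to 1.00 at full coverage.
WEIGHTS: Dict[str, float] = {
    "consistency": 0.25,
    "home_cooking": 0.15,
    "leverage": 0.15,
    "gap_rhythm": 0.15,
    "discipline": 0.15,
    "pace_impact": 0.075,
    "pp_efficiency": 0.075,
}


def renormalized_weights(z_scores: Dict[str, Optional[float]]) -> Dict[str, float]:
    """Drop null components and rescale the remaining weights to sum to 1."""
    present = {k: w for k, w in WEIGHTS.items() if z_scores.get(k) is not None}
    total = sum(present.values())
    return {k: w / total for k, w in present.items()} if total else {}


def weighted_z(z_scores: Dict[str, Optional[float]]) -> Optional[float]:
    """Weighted average of the available z-scores; None with no components."""
    weights = renormalized_weights(z_scores)
    if not weights:
        return None
    return sum(w * z_scores[k] for k, w in weights.items())


# A ref missing the two 0.075 components: remaining weights total 0.85
# and are scaled up so they sum to 1 again.
partial = {
    "consistency": 1.0, "home_cooking": 0.0, "leverage": 0.0,
    "gap_rhythm": 0.0, "discipline": 0.0,
    "pace_impact": None, "pp_efficiency": None,
}
composite = weighted_z(partial)  # 0.25 / 0.85
```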

Consistency Index [spec §8.2]

Formula

Standard deviation of call rates across periods within individual games, averaged over the ref's games in scope. Lower = more consistent (calls penalties at a steady rate regardless of period).
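As a rough illustration of the per-game spread, the sketch below takes each game's call counts by regulation period and averages the standard deviations. It assumes population standard deviation (`statistics.pstdev`) and uses raw counts as a stand-in for rates, since every regulation period has the same length; the page does not state which variant the refresh job uses, and the function name is hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Optional


def consistency_index(period_calls_by_game: List[List[int]]) -> Optional[float]:
    """Average, over games, of the spread of calls across that game's periods."""
    spreads = [pstdev(periods) for periods in period_calls_by_game
               if len(periods) >= 2]
    return mean(spreads) if spreads else None


games = [
    [2, 2, 2],  # perfectly even game: spread 0
    [0, 3, 0],  # every call in P2: spread sqrt(2)
]
index = consistency_index(games)  # (0 + sqrt(2)) / 2
```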

Home Cooking Coefficient [spec §4.1 / §8.3]

Formula

Per-game differential of road-team penalties minus home-team penalties, averaged over the ref's games in scope. Positive = home team benefits (more calls go against the road team). League average sits near 0, so a ref at +1.7 calls about 1.7 more penalties per game on the road team than on the home team.

Leverage Grade [spec §4.9 / §8.4]

Formula

Call rate in high-leverage minutes (third period of one-goal games, overtime, playoffs) divided by the ref's overall rate. Ratio format: 1.0 = consistent, <1.0 = swallows the whistle in big moments, >1.0 = calls more in big moments. The displayed letter grade is a band over the z-score against the league distribution.

Gap Rhythm [spec §8.9]

Formula

Standard deviation (in seconds) of gaps between consecutive penalty calls within individual games, averaged over the ref's games. Low = metronomic; high = bursts and droughts. Paired with the average gap (see Gap Between Calls) so the mean and spread are visible together.

Discipline Rating [spec §8.6]

Formula

(misconducts + bench minors + failed coach's challenges) per 60 minutes worked. Misconducts come straight from penalty_severity. Bench minors are identified by NHL descKey match (BENCH or *_BENCH variants like INTERFERENCE_BENCH, UNSPORTSMANLIKE_CONDUCT_BENCH). Legacy rows without the descKey metadata fall back to a player-null heuristic.

Known limitation: the failed-coach-challenges term is currently stubbed at 0 until coach-challenge ingestion ships. The other two terms are live.

Coach Challenge Record (ref) [spec §6.3]

Formula

Per ref: coach_challenges_total = number of challenges in this ref's games (any type). coach_challenges_upheld + coach_challenges_overturned count only non-league-initiated challenges where outcome has been determined; the sum may be less than total when outcome is still NULL or when a challenge was league-initiated. Reads from coach_challenge_events joined to the ref's assigned games.

Outcome detection runs inline during ingestion via a score-state heuristic (UPHELD if a goal was removed; OVERTURNED if score state held; NULL when the signal isn't clear). Ratings should not over-interpret refs with low total counts; challenges are rare events.

Make-Up Window Rate [spec §1.3 / §8.8]

Formula

Percentage of the ref's games in which a penalty on Team A was followed by a penalty on Team B within 10 minutes (600 seconds). Computed via a LAG window function over penalty events partitioned by game; a game qualifies if any consecutive cross-team pair sits inside the window.

Pattern only — never describe as the ref "evening things up." The stat measures how often cross-team penalty pairs cluster, regardless of cause.
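The windowed pairing can be mimicked in Python by zipping each game's ordered penalties against themselves shifted by one, which plays the role of LAG. This is a sketch under assumptions: the tuple layout and function names are hypothetical, "within 10 minutes" is read as a gap of at most 600 seconds, and games with no penalties still count in the denominator.

```python
from typing import List, Optional, Tuple

Penalty = Tuple[int, str]  # (elapsed game seconds, penalized team)


def has_cross_team_pair(penalties: List[Penalty], window: int = 600) -> bool:
    """True if any two consecutive calls hit different teams inside the window."""
    ordered = sorted(penalties)
    return any(
        team_a != team_b and (t_b - t_a) <= window
        for (t_a, team_a), (t_b, team_b) in zip(ordered, ordered[1:])
    )


def make_up_window_rate(games: List[List[Penalty]]) -> Optional[float]:
    """Percentage of games containing at least one qualifying pair."""
    if not games:
        return None
    qualifying = sum(has_cross_team_pair(g) for g in games)
    return 100 * qualifying / len(games)


games = [
    [(100, "HOME"), (500, "AWAY")],                 # cross-team, 400s apart
    [(100, "HOME"), (900, "AWAY")],                 # cross-team, but 800s apart
    [(100, "HOME"), (300, "HOME"), (2000, "AWAY")], # close pair is same-team
    [],                                             # no penalties at all
]
rate = make_up_window_rate(games)
```

Only the first game qualifies, so one of four games counts toward the rate.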

First Penalty Timing [spec §1.1]

Formula

Average elapsed time within Period 1 before the first penalty is called, across the ref's games. Games with zero P1 penalties contribute nothing — not a zero — so the average isn't pulled down by quiet games. Displayed as M:SS.
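The "contributes nothing, not a zero" rule is easy to get wrong, so here is a small sketch of it. The input shape (one entry per game, `None` when Period 1 had no penalty) and the function name are assumptions for illustration; rounding to the nearest second before formatting is also an assumption.

```python
from typing import List, Optional


def first_penalty_timing(first_p1_seconds: List[Optional[int]]) -> Optional[str]:
    """Average time of the first P1 penalty as M:SS, skipping quiet games."""
    observed = [s for s in first_p1_seconds if s is not None]
    if not observed:
        return None
    average = round(sum(observed) / len(observed))
    minutes, seconds = divmod(average, 60)
    return f"{minutes}:{seconds:02d}"


# Three games; the second had no Period 1 penalty and is skipped entirely.
timing = first_penalty_timing([120, None, 306])  # average of 120 and 306
```

Treating the quiet game as a zero would instead average 426 seconds over three games and report a much earlier time.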

Gap Between Calls (avg) [spec §1.2]

Formula

Average elapsed seconds between consecutive penalty calls within games the ref worked. Shares the LAG-over-penalty-events scaffolding with Make-Up Window and Gap Rhythm (Gap Rhythm is the standard deviation of the same gap distribution).
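A minimal sketch of the shared gap scaffolding, with both the average gap and the Gap Rhythm spread built on the same per-game differences. Assumptions to note: the average pools gaps across all games, the spread uses population standard deviation per game and skips games with fewer than two gaps, and all function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Optional


def gaps(call_times: List[int]) -> List[int]:
    """Seconds between consecutive calls in one game (the LAG difference)."""
    ordered = sorted(call_times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def average_gap(games: List[List[int]]) -> Optional[float]:
    """Mean gap, pooled over every game."""
    all_gaps = [g for times in games for g in gaps(times)]
    return mean(all_gaps) if all_gaps else None


def gap_rhythm(games: List[List[int]]) -> Optional[float]:
    """Per-game spread of gaps, averaged over games with at least two gaps."""
    spreads = [pstdev(gaps(times)) for times in games if len(gaps(times)) >= 2]
    return mean(spreads) if spreads else None


games = [
    [100, 400, 1000],  # gaps of 300 and 600 seconds
    [200, 500],        # a single 300-second gap
]
avg = average_gap(games)
rhythm = gap_rhythm(games)
```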

Penalty Density by Period [spec §1.5]

Formula

Calls per game in each period (P1, P2, P3, OT). Each period rate is total calls in that period divided by games officiated. The bar chart on the ref page sizes each bar relative to the period with the most calls.

Late-Period Suppression [spec §1.6]

Formula

1 − (10 × calls in the final 2 minutes of P1–P3 ÷ total P1–P3 calls). Each regulation period is 20 minutes; a proportional rate would have 10% of P1–P3 calls in the last 2 minutes. 0 = proportional. 0.34 = 34% fewer than proportional ("swallowing the whistle"). Negative = calls more late.
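A worked version of the formula, with a hypothetical function name, shows how the three reading cases fall out. Returning `None` when there are no regulation calls is an assumption.

```python
from typing import Optional


def late_period_suppression(late_calls: int, total_calls: int) -> Optional[float]:
    """1 - 10 x (share of P1-P3 calls made in the final 2 minutes of a period)."""
    if total_calls == 0:
        return None
    return 1 - 10 * late_calls / total_calls


proportional = late_period_suppression(10, 100)   # exactly 10% of calls are late
suppressed = late_period_suppression(66, 1000)    # 6.6% late
late_heavy = late_period_suppression(15, 100)     # 15% late
```

Here `proportional` is 0, `suppressed` is about 0.34 (34% fewer late calls than proportional), and `late_heavy` is negative.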

Power Play Chain Rate [spec §1.11]

Formula

Of all PP-creating penalties the ref called, the fraction followed by another penalty in the same game before the first PP expires. PP-creating heuristic: severity in (MINOR, DOUBLE_MINOR, MAJOR) and no offsetting partner. The lookahead window for each penalty is its own penalty_duration_minutes × 60 seconds.

Known limitation: the heuristic infers PP-creation from severity + non-offsetting status. The penalty_events.power_play_resulted column is not yet populated by ingestion, so coincidental majors and similar edge cases may be miscounted. Tightens once power_play_resulted wires up.
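The heuristic and lookahead window can be sketched as below. The `Penalty` dataclass and its field names are hypothetical stand-ins for the penalty event columns, and two interpretive choices are assumptions: the follow-up may be any later penalty in the same game, and it must fall strictly after the original call and no later than the end of the window.

```python
from dataclasses import dataclass
from typing import List, Optional

PP_SEVERITIES = {"MINOR", "DOUBLE_MINOR", "MAJOR"}


@dataclass
class Penalty:
    game_id: int
    elapsed_seconds: int
    severity: str
    offsetting: bool
    duration_minutes: int


def pp_chain_rate(penalties: List[Penalty]) -> Optional[float]:
    """Share of PP-creating penalties followed by another call in-window."""
    creating = [p for p in penalties
                if p.severity in PP_SEVERITIES and not p.offsetting]
    if not creating:
        return None
    chained = 0
    for p in creating:
        window_end = p.elapsed_seconds + p.duration_minutes * 60
        if any(q.game_id == p.game_id
               and p.elapsed_seconds < q.elapsed_seconds <= window_end
               for q in penalties):
            chained += 1
    return chained / len(creating)


events = [
    Penalty(1, 100, "MINOR", False, 2),        # followed at 180s: chained
    Penalty(1, 180, "MINOR", False, 2),        # nothing before 300s
    Penalty(1, 1000, "MISCONDUCT", False, 10), # not PP-creating
    Penalty(2, 100, "MINOR", False, 2),        # only call in its game
]
chain_rate = pp_chain_rate(events)  # 1 of 3 PP-creating penalties
```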

First / Final 5-Minute Delta [spec §1.9]

Formula

Opening rate − closing rate, in calls per minute. Opening 5 = P1 with period_time_elapsed_seconds ≤ 300. Closing 5 = P3 with period_time_remaining_seconds ≤ 300. Each window is 5 minutes per game, so the denominator is 5 × games_total. Positive = starts hot, finishes soft. Negative = finishes hot.
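A short worked example of the delta, assuming the opening and closing counts have already been filtered by the two time conditions above. The function and argument names are hypothetical.

```python
from typing import Optional


def first_final_five_delta(opening_calls: int, closing_calls: int,
                           games_total: int) -> Optional[float]:
    """Opening-5 rate minus closing-5 rate, both in calls per minute."""
    if games_total == 0:
        return None
    window_minutes = 5 * games_total  # each window is 5 minutes per game
    return opening_calls / window_minutes - closing_calls / window_minutes


# 20 games: 30 calls in the first 5 minutes of P1, 20 in the last 5 of P3.
delta = first_final_five_delta(30, 20, 20)  # 0.30 - 0.20 calls per minute
```

A positive result such as this one reads as "starts hot, finishes soft."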

OT Call Rate [spec §1.8]

Formula

ot_calls / games_with_ot. Per-overtime-game rate, not per-total-game. The ref's overall games count includes every regulation finish, so dividing OT calls by all games understates the true frequency by an order of magnitude. games_with_ot is computed from games.went_to_overtime per ref-game pairing. NULL when the ref has worked no OT games in scope; the period bar chart on the ref profile shows the OT bar empty in that case.

Known limitation: the spec target is calls per 20 minutes of overtime. Regular-season OT is a 5-minute sudden-death period, so most OT periods end short of 5 minutes. Once shift-chart data flows through more reliably we'll switch the denominator to actual OT minutes worked ÷ 20, for true per-20-minute comparability across regular-season and playoff OT.

High-TOI vs Low-TOI Call Ratio [spec §7.1]

Formula

(high_toi_calls / low_toi_calls) × 0.455. Bucket each penalty by the penalized player's TOI tier classification: STAR + TOP_6 → high_toi_calls; DEPTH + FOURTH_LINE → low_toi_calls. The 0.455 factor is the approximate ratio of average ice time between the two tiers (~10 min/game vs ~22 min/game), so the output is normalized to 1.0 = proportional to ice time. <1.0 means the ref calls fewer penalties on top-tier forwards than expected (star leniency); >1.0 means the inverse. NULL when either bucket has zero calls.

Known limitation: the 0.455 TOI-exposure constant is a fixed approximation. True per-tier ice time varies game to game; a future tightening will use actual TOI distributions from player_toi_tier_classifications. MIDDLE_6 tier is excluded from both numerator and denominator to keep the contrast clean.
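The bucketing and normalization can be sketched as follows, using the tier labels named above. The input shape (one tier label per penalty) and the function name are assumptions for illustration.

```python
from typing import List, Optional

TOI_EXPOSURE = 0.455  # fixed approximation, roughly 10 min / 22 min
HIGH_TIERS = {"STAR", "TOP_6"}
LOW_TIERS = {"DEPTH", "FOURTH_LINE"}


def toi_call_ratio(penalized_tiers: List[str]) -> Optional[float]:
    """(high-TOI calls / low-TOI calls) x 0.455; MIDDLE_6 is ignored."""
    high = sum(tier in HIGH_TIERS for tier in penalized_tiers)
    low = sum(tier in LOW_TIERS for tier in penalized_tiers)
    if high == 0 or low == 0:
        return None
    return high / low * TOI_EXPOSURE


# 22 calls on high-TOI players, 10 on low-TOI players, 7 on MIDDLE_6.
tiers = ["STAR"] * 12 + ["TOP_6"] * 10 + ["DEPTH"] * 6 + ["FOURTH_LINE"] * 4
tiers += ["MIDDLE_6"] * 7
ratio = toi_call_ratio(tiers)  # 2.2 x 0.455, very close to 1.0
```

Calls split 22 to 10 match the assumed ice-time split, so the ratio lands near 1.0 (proportional).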

Score-State Calling (close vs blowout) [spec §4.6]

Formula

Three per-game call rates split by score margin at the moment of the call: tied = score_margin = 0, one-goal = |margin| = 1, blowout = |margin| ≥ 3. Counts come from penalty_events.score_margin (set during ingestion); rates are calls / games_officiated. Same bar pattern as Penalty Density by Period.

Known limitation: these are calls per game, not per time-in-state. A ref's tied-score rate is influenced by how often their games are tied at all. Once scoring-play timestamps land we'll switch to true rate-per-minute-in-state.

One-Goal Game Suppression [spec §1.7]

Formula

Per-ref share of P3 calls in one-goal score state, expressed as a deviation from the league baseline share. 1 − (ref_share / league_share) where ref_share = ref's P3 calls with |score_margin| ≤ 1 / ref's P3 calls and the league share is the same ratio computed across every ref in scope (sum-of-sums, so refs with more games weight correctly). Positive = ref calls fewer P3 close-game penalties than the league norm. NULL until both shares populate.

Known limitation: The spec target is a true rate-per-time comparison (P3 calls per 20 min in one-goal games vs overall P3 rate). That requires per-game time-in-state tracking we don't have yet. The share-based proxy is meaningful relative to other refs but doesn't reflect raw exposure ratios. Tightens once scoring-play timestamps land and league_baselines stores time-weighted shares.
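The share-based proxy, including the sum-of-sums league share, can be illustrated like this. The tuple layout and function names are hypothetical, and the sketch assumes the result is NULL (`None`) whenever either share cannot be computed or the league share is zero.

```python
from typing import List, Optional, Tuple

# (P3 calls with |score_margin| <= 1, all P3 calls) for one ref
RefCounts = Tuple[int, int]


def league_share(all_refs: List[RefCounts]) -> Optional[float]:
    """Sum-of-sums share, so refs with more calls carry more weight."""
    close_total = sum(close for close, _ in all_refs)
    p3_total = sum(p3 for _, p3 in all_refs)
    return close_total / p3_total if p3_total else None


def one_goal_suppression(ref: RefCounts, all_refs: List[RefCounts]) -> Optional[float]:
    """1 - (ref_share / league_share); None until both shares exist."""
    close, p3 = ref
    baseline = league_share(all_refs)
    if p3 == 0 or not baseline:
        return None
    return 1 - (close / p3) / baseline


refs = [(30, 100), (170, 400), (200, 500)]  # league: 400 of 1000 P3 calls
suppression = one_goal_suppression(refs[0], refs)  # 1 - 0.30 / 0.40
```

The first ref's close-game share (0.30) sits below the league share (0.40), giving a positive suppression of 0.25.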

Crunch-Time Discipline (player) [spec §7.4]

Formula

Player metric. Counts penalties this player took in late-game close situations and divides by games played in scope. Late + close is defined as: period ≥ 3 (3rd period or overtime), period_time_remaining_seconds ≤ 300 (last 5 minutes of the period), and |score_margin| ≤ 1 (one-goal game). Drawn penalties don't count: discipline is about avoiding the box, not earning calls. Stored in player_officiating_stats.crunch_time_discipline as a per-game rate (numeric(5,3)). Higher = worse discipline when the result's on the line.

Known limitation: Per-game (not per-60) because we don't track ice time spent in crunch-time situations. A player's rate is partially driven by how often his team plays close games at all. Once time-in-state ingestion lands the rate becomes per-60 with crunch-time TOI as denominator.

Penalty Differential vs Team [spec §3.1]

Formula

penalties_for_team − penalties_against_team, where "for" means penalties called on the team's opponent (giving them a power play) and "against" means penalties called on the team itself. Positive differential means the team gets more PPs than they give up under this ref. Per-game differential normalizes by games worked together.

PP Opportunities Differential [spec §3.2]

Formula

pp_opportunities_for_team − pp_opportunities_against_team. Strict subset of Penalty Differential: counts only penalties that actually created a power play (a penalty on the team's opponent for "for", on the team itself for "against"), filtered by the same heuristic as Power Play Chain Rate (severity in MINOR/DOUBLE_MINOR/MAJOR with no offsetting partner). The PP scoreboard view of the same relationship: a +6 PP Diff means the team got 6 more power plays than they gave up under this ref.

Often close to Penalty Differential since most penalties create PPs. The difference shows up most when there's a fight or coincidental call cluster, where penalties offset and don't move the PP scoreboard.

Most Favorable / Unfavorable Refs [spec §3.4 / §3.5]

Formula

For each team, refs sorted by signed Penalty Differential. Top 5 by largest positive and largest negative are surfaced. Filtered to ref/team pairings with at least 5 games in scope to avoid small-sample artifacts.

Framing per spec: "Largest positive / negative differential." Never "biased against," "targets," or "favors." The stat is a pattern; we do not attribute intent.

Home / Road Differential (team) [spec §3.6]

Formula

(penalties_against_road / road_games) − (penalties_against_home / home_games) for each team. Positive = team takes more penalties on the road than at home; negative = team takes more at home. Computed in the team-season-officiating refresh from the same per-game stats that populate penalties_against, split by is_home.

Surfaced as the “Home / Road” KPI on the team profile. NULL when the team hasn't played at least one home and one road game in scope.

Team Discipline Index [spec §3.8]

Formula

(misconducts + bench_minors + failed_coach_challenges) / games. Per-game composite of self-inflicted disciplinary penalties on this team. Misconducts come from penalty_severity IN (MISCONDUCT, GAME_MISCONDUCT). Bench minors are identified by canonical TOO_MANY_MEN OR descKey-based bench match (BENCH, *_BENCH variants); legacy rows fall back to a player-null heuristic on minor severity.

Known limitation: the failed-coach-challenges term is currently stubbed at 0 until the coach_challenge_events ingestion lands. The other two terms are live.

Team Call-Type Profile [spec §3.7]

Formula

For each (team, canonical_code) pair: taken_count (penalties on this team of this type), drawn_count (penalties on the opponent of this type while this team played), and per-game rates taken_per_game = taken_count / games, drawn_per_game = drawn_count / games. League baselines are the AVG of the per-team rates across all teams in scope (per canonical_code). The team profile shows the top types taken and top types drawn, with a vs-league delta column.

Sample sufficiency requires the team to have played at least 5 games AND total count (taken + drawn) of that type ≥ 3. Below threshold the row is still rendered but the league baseline isn't emphasized.

Sample minimums

Stats below the sample threshold are still shown but tagged with sample_sufficient = false. We exclude them from leaderboards and from league percentile distributions so a ref with a handful of games doesn't skew the league baseline.

Per-ref season stats: 10 games.
Per-(ref, team) pairings: 5 games.
Per-call-type rates: 3 calls AND ref ≥ 10 games.
Pair (ref + ref) stats: navigation only; no primary metrics.
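The thresholds in the list above translate directly into small predicates. This is a sketch only: the constant and function names are hypothetical, and the real refresh jobs may encode the flag differently when setting sample_sufficient.

```python
MIN_REF_GAMES = 10      # per-ref season stats
MIN_PAIRING_GAMES = 5   # per-(ref, team) pairings
MIN_CALLS_OF_TYPE = 3   # per-call-type rates


def ref_season_sufficient(games: int) -> bool:
    return games >= MIN_REF_GAMES


def pairing_sufficient(games_together: int) -> bool:
    return games_together >= MIN_PAIRING_GAMES


def call_type_sufficient(calls_of_type: int, ref_games: int) -> bool:
    # Both conditions must hold: enough calls AND enough games overall.
    return calls_of_type >= MIN_CALLS_OF_TYPE and ref_season_sufficient(ref_games)


# A ref with 12 games and 2 hooking calls: the season stats qualify,
# but the hooking rate stays out of the percentile distribution.
season_ok = ref_season_sufficient(12)
hooking_ok = call_type_sufficient(2, 12)
```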

Framing rules

Every stat description on the site follows three rules:

Describe a pattern, never imply intent. "Largest negative differential" instead of "biased against."
Tie controversial labels to a formula. "Phantom Call" only with reviewer confirmation.
Make small samples explicit. Numbers below threshold ship with a warning, not silently.

AI writer pipeline

Ref Geek's articles — pre-game scouting, post-game recap, weekly column, playoff series preview / review — are produced by a deterministic pipeline that drafts from grounded data, validates the output, and routes for human review at the right altitude. The pipeline below is the same shape across every content type.

Grounded. Every article is drafted from a structured Data Pack pulled from Ref Geek's aggregate tables: ref season stats, team season officiating, player TOI tier classifications, finalized penalty events. No free-form opinion, no facts not in the pack.
Validated. Every draft runs through five checks before a publishing decision is made: banned phrases (no “targets,” “biased,” “wants to,” “obviously”), word-count band per content type, tone (no exclamation points, no emoji, low all-caps tolerance), hallucination (every cited number must trace to a Data Pack value or a reasonable rounded variant), and name consistency (every proper noun must appear in the Data Pack's name roster). Validation failures route to human review regardless of importance score.
Routed by stakes. Pre-game and post-game articles in the AUTO band (importance 0–30) auto-publish. STAGED band (31–60) auto-publishes with a “not yet reviewed” banner that the future review queue can clear retroactively. MANDATORY band (61+), all weekly columns, and all playoff-series pieces require Senior Reviewer signoff before publication.
Auditable. The public accuracy page tracks every article: count published per content type, reviewer signoff rate, articles in queue, articles rejected, and any logged corrections. Reviewer-edits and corrections are stored on the article row so the audit trail is queryable.
Voice. Pattern language only. The article generator never asserts referee intent, never uses judgment language, never characterizes a referee as “biased” or “favoring” a side. The system prompt is verbatim identical across every content type so the same voice rules apply uniformly.
Honest about missing data. When a Data Pack field is null (for example, NHL play-by-play didn't attribute a specific call to a specific referee that week), the prompt is told to skip that section rather than fabricate a plausible-looking number.
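The routing rules described in "Routed by stakes" can be sketched as a single decision function. The content-type strings, the `HUMAN_REVIEW` label, and the function name are hypothetical; only the band names and importance cut-offs come from the description above.

```python
ALWAYS_MANDATORY = {"weekly_column", "playoff_series"}  # hypothetical type names


def route(content_type: str, importance: int, validation_passed: bool) -> str:
    """Pick the publishing path for one draft."""
    if not validation_passed:
        # Validation failures go to a human regardless of importance.
        return "HUMAN_REVIEW"
    if content_type in ALWAYS_MANDATORY:
        return "MANDATORY"
    if importance <= 30:
        return "AUTO"
    if importance <= 60:
        return "STAGED"
    return "MANDATORY"


decisions = [
    route("pre_game", 10, True),       # AUTO: publishes directly
    route("post_game", 45, True),      # STAGED: publishes with a banner
    route("pre_game", 75, True),       # MANDATORY: needs signoff
    route("weekly_column", 5, True),   # MANDATORY regardless of score
    route("pre_game", 10, False),      # failed validation
]
```

Checking validation first keeps the "regardless of importance score" rule from being bypassed by a low-stakes article.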

Corrections policy. Articles that ship with a factual error get logged in article_corrections with type (minor / significant / retraction), description, and reviewer attribution. Recent corrections appear on /accuracy.