Philosophy · ELO vs OPR: Why We Need Both
A Brief History of ELO
The ELO rating system was developed by physicist Arpad Elo in the 1960s for chess. Today it's used across competitive domains: FIFA world rankings, League of Legends matchmaking, FiveThirtyEight's NFL predictions, and professional esports. The system's power lies in its ability to predict outcomes and adapt based on results.
Why Not Just Use OPR?
If OPR estimates a team's scoring contribution, why can't we just add up OPRs to predict winners?
The Problem: Close Matches
Consider these two outcomes:
| Match | Red Score | Blue Score | Result |
|---|---|---|---|
| Match A | 200 | 198 | Red Wins |
| Match B | 50 | 48 | Red Wins |
OPR sees these as completely different matches (200 vs 50 points). But for winning, they're equally valuable - a 2-point victory either way. ELO captures this: both red alliances get similar rating boosts because both achieved the outcome that matters.
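To make this concrete, here is a minimal sketch of a standard Elo update (the K-factor of 32 and the 400-point logistic scale are conventional defaults, not necessarily the exact parameters used here). Match A and Match B produce the identical rating change, because only the win/loss outcome enters the update:

```python
def elo_update(rating_a, rating_b, a_won, k=32, d=400):
    """Standard Elo update: the change depends only on who won,
    not on the margin of victory."""
    expected_a = 1 / (1 + 10 ** ((rating_b - rating_a) / d))
    score_a = 1.0 if a_won else 0.0
    return rating_a + k * (score_a - expected_a)

# Whether the red alliance won 200-198 or 50-48, the update is the same:
print(elo_update(1500, 1500, a_won=True))  # 1516.0 with these defaults
```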
When Each Metric Shines
📊 Use OPR For:
- Predicting expected scores
- Evaluating robot hardware capability
- Alliance selection scouting
- Identifying high-scoring partners
🎯 Use ELO For:
- Predicting match winners
- Measuring competitive success rate
- Bracket placement and seeding
- Cross-regional ranking
💡 Key Insight: A team scoring 150 points per match (high OPR) but consistently losing 180-150 will have lower ELO than a team scoring 120 but winning 120-110. OPR says the first team has a better robot; ELO says the second team wins more often. Both are true - they measure different things.
Core Metric · Normalized cELO: The Best of Both Worlds
Normalized Cumulative ELO (cELO) combines competitive success with absolute performance, adjusted for regional strength and meta evolution. It's our most comprehensive single metric for ranking teams globally.
The Three-Level System
- Event ELO: Isolated rating from a single event's matches
- cELO (Cumulative ELO): Running total across all matches, exponentially weighted toward recent performance
- Normalized cELO: cELO adjusted for regional strength and blended with cOPR-based absolute performance
Recency Weighting
Teams improve throughout the season. To reflect current skill rather than historical averages, we apply exponential decay weighting to match importance. Recent matches contribute significantly more to your rating than matches from weeks or months ago.
$$ w(t) = e^{-\lambda \cdot \Delta t} $$
Where Δt is days since the match and λ is the decay parameter. This ensures ratings reflect a team's current skill level.
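A minimal sketch of this weighting, assuming an illustrative decay constant (the production value of λ may differ):

```python
import numpy as np

def recency_weights(days_since_match, decay=0.02):
    """Exponential decay weights w(t) = exp(-lambda * dt).
    The decay constant here is illustrative, not the production value."""
    return np.exp(-decay * np.asarray(days_since_match, dtype=float))

# A match played today counts fully; one from 60 days ago counts roughly 30% as much
print(recency_weights([0, 14, 60]))  # roughly [1.0, 0.76, 0.30]
```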
The Regional Normalization Challenge
Consider two teams with identical cELO ratings:
- Team A: Dominates a weaker region (15-0 record, avg opponent ELO: below average)
- Team B: Competes in an elite region (8-7 record, avg opponent ELO: well above average)
Which team is truly stronger? Raw ELO can't distinguish between "big fish in small pond" and "contender among elites."
Hybrid Normalization
Our normalization blends two components to create a globally fair rating: a regional-strength adjustment applied to cELO, and a cOPR-based measure of absolute scoring performance.
Evolution Scaling
To prevent artificial rating ceilings and account for meta evolution (teams collectively improving as the season progresses), the entire ELO scale adjusts proportionally to global scoring trends.
As teams collectively improve and raise the scoring ceiling, the ELO scale naturally inflates to match. A world-class team today might rate differently than a world-class team from an earlier season due to meta evolution.
Example: Cross-Regional Comparison
A team with a perfect record in a weak region but low scoring ability will be normalized down, while a team with a mediocre record in an elite region but high scoring ability will be normalized up. This enables meaningful cross-region comparisons.
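As a rough illustration only, here is one way such a blend could be structured. The blend weight, the additive regional adjustment, and the assumption that the cOPR component has already been rescaled onto the ELO scale are all assumptions of this sketch, not the production formula:

```python
def normalized_celo(celo, region_avg_elo, global_avg_elo, copr_component, alpha=0.7):
    """Illustrative only: shift cELO by how the team's region compares to the global
    field, then blend in a cOPR-based absolute-performance term.
    alpha, the additive adjustment, and the pre-scaling of copr_component
    onto the ELO scale are assumptions of this sketch."""
    region_adjusted = celo + (region_avg_elo - global_avg_elo)  # weak region pulls rating down
    return alpha * region_adjusted + (1 - alpha) * copr_component
```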
Use Cases
- Cross-regional team comparisons and world rankings
- Championship seeding and advancement predictions
- Identifying underrated teams from highly competitive regions
- Multi-season historical comparisons despite meta evolution
Performance · Cumulative Offensive Power Rating (cOPR)
While ELO measures ability to win, cOPR measures ability to score points. It isolates an individual team's contribution to alliance scores, with exponentially higher weight given to recent events.
The Alliance Score Problem
FTC matches are 2v2, but we only observe total alliance scores. If Red Alliance (Teams 123 + 456) scores 180 points, how much did each team contribute individually?
Linear System Solution
We model alliance scores as a linear system across many matches:
$$ \text{cOPR}_{\text{Team}_1} + \text{cOPR}_{\text{Team}_2} \approx \text{Alliance Score} $$
Over an event with N teams and M matches, this creates an overdetermined system \( Ax = b \), solved using Weighted Least Squares Regression.
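A minimal sketch of that solve using NumPy, with one row per alliance per match and per-row recency weights (described in the next subsection). It assumes every listed team appears in at least one match:

```python
import numpy as np

def weighted_opr(matches, team_ids, weights):
    """Solve the alliance-score linear system A x ≈ b by weighted least squares.
    matches: list of ((team1, team2), alliance_score), one entry per alliance per match.
    weights: one recency weight per row."""
    idx = {t: i for i, t in enumerate(team_ids)}
    A = np.zeros((len(matches), len(team_ids)))
    b = np.zeros(len(matches))
    for row, ((t1, t2), score) in enumerate(matches):
        A[row, idx[t1]] = 1
        A[row, idx[t2]] = 1
        b[row] = score
    w = np.sqrt(np.asarray(weights, dtype=float))  # sqrt-weights implement WLS via lstsq
    coprs, *_ = np.linalg.lstsq(A * w[:, None], b * w, rcond=None)
    return dict(zip(team_ids, coprs))
```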
Time-Weighted Recency
Teams improve throughout the season. To emphasize current performance, recent matches receive significantly higher weight than older ones using exponential decay:
- Most recent matches: Full weight
- Older matches: Progressively less influence (exponential decay)
This makes cOPR more predictive of current capability than a simple average across all events.
💡 Why Weighted? A team that scored poorly at their first event but now scores well should be rated closer to their current ability, not dragged down by early-season struggles.
Trend · Momentum
Momentum quantifies the rate of improvement over time. It answers: "Is this team getting better, staying stable, or declining?"
Methodology
We perform Weighted Least Squares regression on match scores over time, with higher weights on recent matches. The slope of the fitted line represents points-per-match improvement rate.
$$ \text{Score}(t) = \beta_0 + \beta_1 \cdot t + \epsilon $$
Where β₁ (the slope) indicates improvement direction:
- Positive slope: Improving performance
- Near-zero slope: Stable performance
- Negative slope: Declining performance
The raw slope is normalized to a 0-100 scale for interpretability, with 50 representing stable (no trend).
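A sketch of the slope fit using NumPy's weighted polynomial fit; the factor that maps the raw slope onto the 0-100 scale is an illustrative assumption:

```python
import numpy as np

def momentum(match_times, scores, weights):
    """Weighted least-squares line through (time, score); the slope is the
    points-per-match trend. The 0-100 rescaling factor below is an assumption."""
    slope, _intercept = np.polyfit(match_times, scores, deg=1, w=np.sqrt(weights))
    return float(np.clip(50 + 5 * slope, 0, 100))

# An improving team (scores trending upward) lands above the neutral 50
print(momentum([0, 1, 2, 3, 4], [40, 45, 52, 55, 61], [0.6, 0.7, 0.8, 0.9, 1.0]))
```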
Reliability · Consistency Index
Consistency measures how reliably a team performs near their average. High consistency means few "bad matches," while low consistency indicates volatility.
Mathematical Foundation
Based on the Coefficient of Variation (CV):
$$ CV = \frac{\sigma}{\mu} $$
Where σ is standard deviation and μ is mean score. We invert and scale this to 0-100, where CV = 0 (perfect consistency) maps to 100.
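A minimal sketch of one such mapping (the exact inversion used in production may differ):

```python
import numpy as np

def consistency_index(scores):
    """Coefficient of variation (sigma / mu), inverted onto a 0-100 scale.
    CV = 0 (identical scores every match) maps to 100."""
    scores = np.asarray(scores, dtype=float)
    cv = scores.std() / scores.mean()
    return float(np.clip(100 * (1 - cv), 0, 100))

print(consistency_index([82, 80, 85, 81, 83]))   # tight scores -> high index
print(consistency_index([40, 110, 55, 95, 60]))  # volatile scores -> lower index
```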
💡 Why It Matters: A team with high variance is riskier for eliminations than a team scoring slightly lower but with tight consistency. Alliance captains should consider this when picking!
Penalties · Foul cOPR
Foul cOPR estimates the average penalty points a team gives to opponents per match. Like scoring OPR, penalties are reported per alliance, so we use the same linear system approach to isolate individual responsibility.
Lower Foul cOPR is better. A team with a high Foul cOPR contributes significant penalty points to opponents per match on average - something to watch out for during alliance selection!
Time-Weighted Evolution
Foul cOPR uses the same recency weighting as scoring cOPR. Teams that clean up their driving or fix problematic mechanisms will see rapid improvement in this metric.
Context · Schedule Grade
Schedule Grade measures the strength of opposition a team has faced. It answers: "Did this team earn their record against tough opponents or easy ones?"
How It Works
We calculate the average ELO rating of all opposing alliances a team has faced. This is then converted to a letter grade (A+ through F) based on how it compares to the typical opponent strength across all teams.
Interpretation
- High Grade (A+/A): Team has faced tough opponents. Their record is "battle-tested."
- Mid Grade (B/C): Team has faced average competition.
- Low Grade (D/F): Team has faced weak opponents. Their record may be inflated.
💡 Note: A high schedule grade combined with a winning record is extremely valuable - it means the team has proven themselves against quality opponents.
Context · Strength of Schedule (SOS)
Strength of Schedule is the raw numerical value behind the Schedule Grade. It represents the average ELO rating of opposing alliances faced.
Calculation
For each match, we identify the opposing alliance and average their ELO ratings. The team's SOS is the average of all these opposing alliance ratings across all matches played.
$$ \text{SOS} = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{\text{Opp}_1 \text{ ELO} + \text{Opp}_2 \text{ ELO}}{2} \right)_i $$
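A sketch covering both SOS and the Schedule Grade mapping from the previous section; the percentile cutoffs for the letter grades are illustrative assumptions:

```python
import numpy as np

def strength_of_schedule(opponent_alliances):
    """Average the two opponents' ELOs per match, then average across all matches."""
    return float(np.mean([(elo1 + elo2) / 2 for elo1, elo2 in opponent_alliances]))

def schedule_grade(sos, all_team_sos):
    """Convert SOS to a letter grade by percentile among all teams' SOS values.
    The grade cutoffs here are illustrative, not the production thresholds."""
    pct = np.mean(np.asarray(all_team_sos) <= sos) * 100
    for cutoff, grade in [(90, "A+"), (75, "A"), (55, "B"), (35, "C"), (15, "D")]:
        if pct >= cutoff:
            return grade
    return "F"
```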
Use Cases
- Contextualizing win/loss records
- Comparing teams with similar records but different opposition quality
- Identifying teams that may be underrated due to tough schedules
Ranking Points · RP Reliability
In FTC Into The Deep (DECODE), Ranking Points determine tournament seeding. Beyond just winning, teams can earn bonus RPs for achieving specific game objectives. RP Reliability estimates the probability of earning each bonus RP type in the next match.
Bayesian Inference with Recency
We blend three statistical approaches:
- Historical Success Rate: Long-term track record
- Recency Weighting: Recent matches weighted exponentially higher
- Bayesian Smoothing: Prevents overfitting to small samples (e.g., a single success shouldn't mean 100% probability)
$$ P(\text{RP}) = \frac{\sum_{i} w_i \cdot \text{Success}_i + \text{Prior Successes}}{\sum_{i} w_i + \text{Prior Trials}} $$
Where wᵢ are recency weights. This produces robust probabilities that adapt quickly to new strategies (like a new autonomous path) without overreacting to outliers.
Special Case: Never Achieved
If a team has never earned a particular RP in any match, their probability for that RP is forced to 0% - we won't predict they'll suddenly achieve something they've never demonstrated.
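A minimal sketch of the estimator, including the never-achieved rule; the prior counts are illustrative assumptions:

```python
def rp_probability(successes, weights, prior_successes=1.0, prior_trials=4.0):
    """Recency-weighted success rate with Bayesian smoothing.
    successes: 1/0 per past match; weights: recency weights (most recent largest).
    The prior counts are illustrative; a never-achieved RP is forced to 0%."""
    if not any(successes):
        return 0.0  # never demonstrated -> forced to 0%
    num = sum(w * s for w, s in zip(weights, successes)) + prior_successes
    den = sum(weights) + prior_trials
    return num / den

# Recent matches count more: the two latest successes dominate the estimate
print(rp_probability([0, 0, 1, 1], weights=[0.3, 0.5, 0.8, 1.0]))
```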
Predictions · Match Win Probability
Given two alliances, what's the probability each alliance wins?
ELO-Based Probability
The probability Alliance A defeats Alliance B follows a logistic curve:
$$ P(A \text{ wins}) = \frac{1}{1 + 10^{(R_B - R_A) / D}} $$
Where R_A and R_B are alliance ratings (sum of both teams' Normalized cELOs) and D is a scaling constant.
Score Prediction Enhancement
We also estimate expected scores using cOPR and Foul cOPR:
$$ \text{Expected Score}_A = \sum \text{cOPR}_{A} + \sum \text{Foul cOPR}_{B} $$
Alliance A's expected score equals their teams' combined scoring ability plus penalties they'll draw from Alliance B.
💡 Two Models, One Prediction: If ELO predicts Red wins but score prediction favors Blue, we flag this as a high-uncertainty match requiring further analysis.
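A sketch combining the two models; the scaling constant D = 400 is the conventional Elo value and is assumed here, and the disagreement flag is the simple sign check described above:

```python
def win_probability(r_red, r_blue, d=400):
    """Logistic ELO win probability. d = 400 is the conventional Elo scale (assumed)."""
    return 1 / (1 + 10 ** ((r_blue - r_red) / d))

def predict_match(red_celos, blue_celos, red_coprs, blue_coprs,
                  red_foul_coprs, blue_foul_coprs):
    """Combine the ELO win model with the cOPR score model and flag disagreement."""
    p_red = win_probability(sum(red_celos), sum(blue_celos))
    exp_red = sum(red_coprs) + sum(blue_foul_coprs)   # penalties Red draws from Blue
    exp_blue = sum(blue_coprs) + sum(red_foul_coprs)  # penalties Blue draws from Red
    high_uncertainty = (p_red > 0.5) != (exp_red > exp_blue)
    return {"p_red": p_red, "expected_red": exp_red,
            "expected_blue": exp_blue, "high_uncertainty": high_uncertainty}
```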
Event Analysis · Event Difficulty
Event Difficulty quantifies how challenging an event is based on the strength of competing teams relative to the current global competition level. This dynamic system adapts throughout the season as teams improve.
Dynamic Percentile-Based Rating
Unlike absolute thresholds, difficulty ratings are calculated using global percentiles that update as the season progresses:
Why Relative Ratings Matter
An event with an average top-8 ELO of 1350 in Week 1 might be rated Elite (10/10) because teams are just starting out. That same 1350 ELO event in Week 20 might only rate Moderate (5/10) because the global competition has improved significantly.
Similarly, a team scoring 50 points in Week 1 could be considered highly competitive, while 60 points in Week 20 might be below average—it's all relative to the current meta.
Calculation Method
The system calculates difficulty by:
- Identifying Top Teams: Takes the average ELO of the top 8 teams at the event
- Computing Global Percentiles: Calculates current percentiles (p99, p90, p70, p50, etc.) from ALL teams in the season
- Mapping to Scale: Compares the event's top-8 average to global percentiles to assign a 1-10 score
Difficulty Scale
Ratings are assigned based on where the event falls in the global distribution:
- Elite (9-10): Top 1-10% of global competition—championship-caliber field
- High (7-8): Top 10-30%—very strong regional competition
- Moderate (4-6): Top 30-70%—typical competitive event
- Low (2-3): Bottom 10-30%—developing region or early-season event
- Beginner (1): Bottom 10%—entry-level competition
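A sketch tying the three calculation steps to the scale above; the exact percentile-to-score cutoffs are illustrative assumptions:

```python
import numpy as np

def event_difficulty(event_team_elos, all_season_elos):
    """Map an event's top-8 average ELO onto a 1-10 scale using current global
    percentiles. The percentile-to-score cutoffs are illustrative assumptions."""
    top8_avg = np.mean(sorted(event_team_elos, reverse=True)[:8])
    pct = np.mean(np.asarray(all_season_elos) <= top8_avg) * 100
    bands = [(99, 10), (90, 9), (80, 8), (70, 7),   # Elite / High
             (57, 6), (43, 5), (30, 4),             # Moderate
             (20, 3), (10, 2)]                      # Low
    for cutoff, score in bands:
        if pct >= cutoff:
            return score
    return 1  # Beginner: bottom 10%
```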
💡 Key Insight: The same teams at the same event could receive different difficulty ratings depending on when the event occurs. This ensures ratings always reflect current competitive context, not arbitrary fixed thresholds.
Applications
- Contextualize team performance relative to competition strength
- Compare events across different regions and time periods fairly
- Predict advancement probabilities for Championship events
- Help teams strategize for event selection and preparation
Analysis · Upset Detection
An upset occurs when the predicted loser wins a match. We track upsets to identify matches with unexpected outcomes.
How We Detect Upsets
Before each match, we calculate win probabilities based on team ratings. If the underdog (the alliance with the lower win probability) wins and the favorite had a confident prediction, we flag it as an upset.
Upset Magnitude
Not all upsets are equal. A 45% underdog winning is barely an upset, but a 15% underdog winning is shocking. The magnitude is calculated based on how confident the original prediction was.
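A minimal sketch of one such magnitude measure; the upset threshold and the linear scaling are illustrative assumptions:

```python
def upset_magnitude(favorite_win_prob, underdog_won, threshold=0.55):
    """Return a 0-100 upset magnitude. A result only counts as an upset when the
    underdog wins and the favorite's probability exceeded the (assumed) threshold."""
    if not underdog_won or favorite_win_prob < threshold:
        return 0.0
    # The more confident the original prediction, the bigger the upset.
    return round(100 * (favorite_win_prob - 0.5) / 0.5, 1)

print(upset_magnitude(0.55, True))  # mild upset (45% underdog wins) -> 10.0
print(upset_magnitude(0.85, True))  # shocking upset (15% underdog wins) -> 70.0
```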
💡 Why Track Upsets? They reveal which teams overperform under pressure, which matchups are volatile, and where our predictions need improvement.