Bias and Fairness in AI
Detect, measure, and mitigate bias in machine learning models
Why AI Fairness Matters
AI Bias and Fairness addresses how machine learning systems can discriminate against protected groups. Even well-intentioned models trained on historical data can perpetuate and amplify societal inequalities. Understanding bias sources and mitigation strategies is crucial for ethical AI.
Understanding the Origins of AI Bias
What is AI Bias?
AI bias occurs when machine learning systems systematically favor or discriminate against certain groups, individuals, or outcomes. Unlike random errors that affect predictions equally, bias creates systematic disparities that often disadvantage already marginalized groups.
The critical insight: AI doesn't create bias from nothing. It learns and often amplifies existing patterns in data, design choices, and societal structures. Even models trained on "objective" data can perpetuate historical discrimination.
🎯 Key Distinction: Bias vs Variance
Statistical Bias (model error): Difference between model's expected prediction and true value. This is technical, not necessarily unfair.
Bias = E[f̂(x)] - f(x)
Fairness Bias (social harm): Systematic discrimination against protected groups. This is what we address in fairness research.
Example: A hiring model with 5% error rate for men but 20% error rate for women has fairness bias even if statistical bias is low.
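The distinction can be made concrete with a quick per-group error check (a minimal sketch on made-up data; the helper function is ours, not a standard API):

```python
import numpy as np

def error_rate_by_group(y_true, y_pred, group):
    """Return the error rate of a binary classifier for each group."""
    rates = {}
    for g in np.unique(group):
        mask = group == g
        rates[str(g)] = float(np.mean(y_true[mask] != y_pred[mask]))
    return rates

# Synthetic illustration: the model is perfect for group "M"
# and almost always wrong for group "F"
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
group  = np.array(["M", "M", "M", "M", "M", "F", "F", "F", "F", "F"])

print(error_rate_by_group(y_true, y_pred, group))
# Group "M": 0% error, group "F": 80% error -- a fairness problem
# even though overall accuracy is a respectable-looking 60%.
```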
The Four Primary Sources of Bias
AI bias isn't monolithic—it emerges from multiple, often interconnected sources throughout the machine learning pipeline. Understanding each source is crucial for effective mitigation.
1. Data Bias: The Foundation Problem
Data bias occurs when training data fails to represent reality fairly. Since models learn patterns from data, biased data produces biased models—garbage in, garbage out.
Types of Data Bias:
📉 Representation Bias (Sampling Bias)
Training data underrepresents or misrepresents certain groups. The data doesn't reflect the true population distribution.
Example: ImageNet Dataset
Early image recognition datasets were predominantly Western-centric. A model trained on these would fail to recognize non-Western weddings, clothing, or food because those images were severely underrepresented (e.g., 45% US images vs 3% from India despite India having 4× population).
⏳ Historical Bias
Data accurately reflects historical reality, but that reality contains discrimination. The model learns to perpetuate past injustices.
Example: Hiring Data
Amazon's recruiting tool trained on 10 years of resumes (mostly male engineers) learned that male-associated terms correlated with "good hire." The data accurately reflected past hiring, but past hiring was discriminatory. Model penalized resumes containing "women's chess club" or graduates of women's colleges.
🏷️ Label Bias (Measurement Bias)
Labels/outcomes are measured differently or incorrectly across groups, encoding bias into the "ground truth."
Example: Criminal Risk Assessment
Using "arrest" or "conviction" as labels assumes criminal justice system is unbiased. In reality, Black defendants are arrested/convicted at higher rates for similar behavior due to policing patterns. Model learns: "Black → high risk" not because of actual recidivism but because of biased measurement.
🔍 Aggregation Bias
Using a single model for different populations when their relationships between features and outcomes differ.
Example: Medical Diagnostics
Heart attack symptoms differ by gender (men: chest pain; women: nausea, fatigue). A model trained on pooled data learns predominantly male patterns. Result: Lower accuracy for women (women are 50% more likely to be misdiagnosed).
2. Algorithm Bias: Design Choices Matter
Even with perfect data, algorithmic choices—objective functions, regularization, feature selection—can introduce or amplify bias.
🎯 Objective Function Bias
Optimizing overall accuracy often prioritizes majority group performance since they provide most training examples.
Mathematical Example:
Loss = (1/N) Σ loss(ŷᵢ, yᵢ)
Dataset: 900 Group A samples, 100 Group B samples. Achieve 95% accuracy on A, 60% accuracy on B.
Overall accuracy = (900×0.95 + 100×0.60)/1000 = 91.5%
Model is incentivized to improve Group A (larger contribution to loss) while neglecting Group B. 40% error rate on B is acceptable if it improves A!
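A sketch reproducing the arithmetic above, which also shows how little a gain on Group B moves the overall metric:

```python
# Group sizes and per-group accuracies from the example above
n_a, n_b = 900, 100
acc_a, acc_b = 0.95, 0.60

overall = (n_a * acc_a + n_b * acc_b) / (n_a + n_b)
print(f"Overall accuracy: {overall:.1%}")  # 91.5%

# A one-percentage-point accuracy gain on Group A moves the overall
# metric nine times as much as the same gain on Group B:
print((n_a * 0.01) / (n_a + n_b))  # +0.009 overall
print((n_b * 0.01) / (n_a + n_b))  # +0.001 overall
```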
🔧 Feature Selection Bias
Features that work well for majority may poorly represent minority experiences or encode stereotypes.
Example: Using "years of experience" in hiring models may disadvantage women who took career breaks for caregiving (societal pressure women face). Feature is technically accurate but encodes gendered societal structures.
3. Human Bias: Designer Choices
ML practitioners' implicit biases and cultural assumptions shape system design, from problem framing to feature engineering to evaluation criteria.
🧠 Cognitive Biases in ML Development:
- Automation Bias: Over-trusting model predictions, assuming AI is "objective" when it reflects training data's biases
- Confirmation Bias: Interpreting results in ways that confirm existing beliefs about group differences
- WEIRD Bias: Designing for Western, Educated, Industrialized, Rich, Democratic populations while claiming universality
Example: "Professionalism" Definition
A resume screening tool might penalize non-Western names, non-standard English, or employment gaps—reflecting designers' culturally-specific notions of "professional." These are subjective human judgments encoded as "objective" features.
4. Feedback Loop Bias: Self-Fulfilling Prophecies
The most insidious source: model predictions influence future data, creating self-reinforcing cycles that amplify initial biases exponentially.
📈 How Feedback Loops Work:
Initial Model trained on biased historical data (e.g., more arrests in Black neighborhoods)
Model Predicts higher crime risk in those neighborhoods
Police Deploy more officers to "high-risk" areas based on predictions
More Arrests occur in over-policed areas (not because more crime, but more scrutiny)
New Data shows even higher arrest rates in those neighborhoods
Model Retrains on this data, predictions become even more biased
⚠️ Result: Initial bias compounds with each iteration, creating a self-fulfilling prophecy that's extremely difficult to break.
Example: Content Recommendation
YouTube recommends videos → Users watch recommended content → Model learns these patterns → Recommends similar content → Creates "filter bubbles" and echo chambers → Users see increasingly extreme content → Model amplifies this pattern. Initial 10% bias toward sensational content becomes 80% dominance after iterations.
🔗 Intersecting Biases: The Compounding Effect
These sources don't operate in isolation—they interact and amplify each other:
- Historical data bias + Objective function that prioritizes accuracy = Model that optimizes for the majority group
- Human biases in feature selection + Feedback loops = Self-reinforcing stereotypes encoded as "objective" patterns
- Representation bias + Aggregation bias = Systematically poor performance on underrepresented groups
Mathematical Foundations of Fairness
The Challenge: Defining "Fair"
Unlike accuracy or precision (which have clear mathematical definitions), fairness is context-dependent and often contested. Different stakeholders have competing notions of what constitutes fair treatment. The machine learning community has formalized several mathematical definitions, each capturing different intuitions about fairness.
Critical insight: These fairness definitions are often mathematically incompatible: when groups have different base rates, satisfying one makes it impossible to satisfy the others except in trivial cases. This is known as the Impossibility Theorem of Fairness.
🎯 Notation
A ∈ {0,1}
Protected attribute (0=Group A, 1=Group B)
Example: A=0 (male), A=1 (female)
Y ∈ {0,1}
True outcome/label
Example: Y=1 (qualified), Y=0 (not qualified)
Ŷ ∈ {0,1}
Model prediction
Example: Ŷ=1 (predict hire), Ŷ=0 (predict reject)
S ∈ [0,1]
Model score/probability
Example: S=0.73 (73% probability of positive)
1. Demographic Parity (Statistical Parity)
Mathematical Definition:
P(Ŷ=1 | A=0) = P(Ŷ=1 | A=1)
Probability of positive prediction is independent of the protected attribute
Demographic parity requires that both groups receive positive outcomes at the same rate, regardless of ground truth qualifications. This is "fairness as equal representation in outcomes."
✓ Example: University Admissions
1000 applicants: 500 Group A, 500 Group B
Admit 200 total students (20% rate)
Demographic Parity requires:
Admit 100 from Group A (20%) AND 100 from Group B (20%)
P(admit | A) = 100/500 = 0.20
P(admit | B) = 100/500 = 0.20
✓ 0.20 = 0.20 → Parity satisfied
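Checking demographic parity on model predictions is straightforward (a sketch assuming exactly two groups; the function name is ours):

```python
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Absolute difference in positive-prediction rates between two groups."""
    rates = [float(np.mean(y_pred[group == g])) for g in np.unique(group)]
    return abs(rates[0] - rates[1])

# Admissions example from above: 100 admits out of 500 in each group
y_pred = np.array([1] * 100 + [0] * 400 + [1] * 100 + [0] * 400)
group = np.array(["A"] * 500 + ["B"] * 500)

gap = demographic_parity_gap(y_pred, group)
print(gap)  # 0.0 -> parity satisfied
```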
✓ When to Use:
- Equal opportunity policies (quotas)
- Outcomes should match population distribution
- Historical exclusion needs correction
- Ground truth labels may be biased
✗ Limitations:
- Ignores actual qualifications/merit
- May require unequal treatment
- Can violate individual fairness
- Assumes equal base rates across groups
⚠️ Controversial Issue: If groups have genuinely different qualification rates (e.g., different application pools), demographic parity requires selecting less qualified candidates from one group or more qualified from another. Critics call this "reverse discrimination."
2. Equal Opportunity (Equalized TPR)
Mathematical Definition:
P(Ŷ=1 | Y=1, A=0) = P(Ŷ=1 | Y=1, A=1)
True Positive Rate (TPR) / Recall / Sensitivity is equal across groups
Equal opportunity focuses on the qualified individuals. It requires that among people who should get positive outcomes (Y=1), both groups have equal probability of receiving them. This is "fairness as equal treatment of qualified candidates."
✓ Example: Job Hiring
Group A: 80 qualified (Y=1), 20 not qualified (Y=0)
Group B: 60 qualified (Y=1), 40 not qualified (Y=0)
Model predictions:
Group A: Hire 64 of 80 qualified → TPR = 64/80 = 0.80 (80%)
Group B: Hire 48 of 60 qualified → TPR = 48/60 = 0.80 (80%)
✓ Both groups: qualified individuals have 80% chance of being hired
Note: False positive rates can differ. Maybe Group A has 5 false positives (5/20=0.25) while Group B has 8 (8/40=0.20). Equal Opportunity doesn't constrain FPR.
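Measuring equal opportunity reduces to comparing TPRs (a sketch; for simplicity only the qualified individuals from the example are included, since unqualified rows don't affect TPR):

```python
import numpy as np

def tpr_by_group(y_true, y_pred, group):
    """True positive rate P(Yhat=1 | Y=1) for each group."""
    out = {}
    for g in np.unique(group):
        qualified = (group == g) & (y_true == 1)
        out[str(g)] = float(np.mean(y_pred[qualified]))
    return out

# Hiring example from above: 64 of 80 qualified hired in A, 48 of 60 in B
y_true = np.array([1] * 80 + [1] * 60)
y_pred = np.array([1] * 64 + [0] * 16 + [1] * 48 + [0] * 12)
group  = np.array(["A"] * 80 + ["B"] * 60)

print(tpr_by_group(y_true, y_pred, group))  # both groups: 0.8
```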
✓ When to Use:
- Merit-based decisions important
- False negatives more harmful than false positives
- "Don't deny qualified people" priority
- Hiring, admissions, opportunity allocation
✗ Limitations:
- Ignores false positive rates (can differ)
- Requires accurate ground truth labels
- Doesn't address unequal base rates
- May allow different treatment of unqualified
💡 Intuition: Equal opportunity says "qualified people should have equal chances regardless of group." It's less restrictive than demographic parity—allows different selection rates if groups have different qualification rates, but ensures qualified individuals aren't disadvantaged by their group membership.
3. Equalized Odds (Equalized TPR and FPR)
Mathematical Definition:
P(Ŷ=1 | Y=1, A=0) = P(Ŷ=1 | Y=1, A=1) AND P(Ŷ=1 | Y=0, A=0) = P(Ŷ=1 | Y=0, A=1)
Both TPR and FPR must be equal across groups
Equalized odds is the most restrictive: it requires equal treatment of both qualified (Y=1) and unqualified (Y=0) individuals. Groups must have same TPR (opportunity for qualified) and same FPR (protection for unqualified). This is "fairness as equal error rates."
✓ Example: Loan Approval
Group A: 70 will repay (Y=1), 30 will default (Y=0)
Group B: 60 will repay (Y=1), 40 will default (Y=0)
Equalized Odds requires:
TPR (approve those who'll repay):
Group A: 63/70 = 0.90 = Group B: 54/60 = 0.90 ✓
FPR (approve those who'll default):
Group A: 3/30 = 0.10 = Group B: 4/40 = 0.10 ✓
✓ Both error types equalized: 90% of good borrowers get loans, 10% of bad borrowers get loans (both groups)
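Equalized odds adds an FPR comparison on top of the TPR check (a sketch that reconstructs the loan example's confusion counts; the helper name is ours):

```python
import numpy as np

def odds_by_group(y_true, y_pred, group):
    """(TPR, FPR) per group -- equalized odds requires both to match."""
    out = {}
    for g in np.unique(group):
        m = group == g
        tpr = float(np.mean(y_pred[m & (y_true == 1)]))
        fpr = float(np.mean(y_pred[m & (y_true == 0)]))
        out[str(g)] = (tpr, fpr)
    return out

# Loan example: A approves 63/70 repayers and 3/30 defaulters,
#               B approves 54/60 repayers and 4/40 defaulters
y_true = np.array([1] * 70 + [0] * 30 + [1] * 60 + [0] * 40)
y_pred = np.array([1] * 63 + [0] * 7 + [1] * 3 + [0] * 27
                  + [1] * 54 + [0] * 6 + [1] * 4 + [0] * 36)
group = np.array(["A"] * 100 + ["B"] * 100)

print(odds_by_group(y_true, y_pred, group))
# {'A': (0.9, 0.1), 'B': (0.9, 0.1)} -> equalized odds satisfied
```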
✓ When to Use:
- Both types of errors matter
- False positives and negatives both harmful
- High-stakes decisions (lending, criminal justice)
- Want comprehensive fairness guarantee
✗ Limitations:
- Most restrictive (hardest to satisfy)
- Often reduces overall accuracy
- Requires very accurate labels
- May be impossible with different base rates
🎓 Key Insight: Equalized odds = Equal opportunity + Equal FPR. It's stricter because it protects both qualified people (via TPR) and unqualified people (via FPR) from group-based discrimination. Most "fair" in theory, but hardest to achieve in practice.
The Impossibility Theorem of Fairness
Proven mathematical result: Except in trivial cases (perfect prediction or equal base rates), you cannot simultaneously satisfy demographic parity, equal opportunity, and equalized odds. Fairness is trade-offs, not absolutes.
Why These Metrics Conflict:
Scenario: Medical School Admissions
Population:
Group A: 100 applicants, 80 qualified (80%)
Group B: 100 applicants, 40 qualified (40%)
Perfect classifier (100% accuracy):
Admits all 80 qualified from A, all 40 qualified from B
✓ Equal Opportunity: SATISFIED
TPR(A) = 80/80 = 1.0 = TPR(B) = 40/40 = 1.0
All qualified candidates admitted (both groups)
✓ Equalized Odds: SATISFIED
TPR(A)=TPR(B)=1.0 and FPR(A)=FPR(B)=0.0
Perfect predictions for both groups
✗ Demographic Parity: VIOLATED
P(Ŷ=1|A) = 80/100 = 0.80 ≠ P(Ŷ=1|B) = 40/100 = 0.40
Admission rates differ: 80% vs 40%
⚠️ The Dilemma: To satisfy demographic parity, must admit 60 from each group (equal rates). But Group B only has 40 qualified → must either admit 20 unqualified from B (violates merit) or reject 20 qualified from A (violates equal opportunity). No solution satisfies all constraints!
Mathematical Proof Sketch:
When base rates differ [P(Y=1|A=0) ≠ P(Y=1|A=1)], demographic parity forces equal positive prediction rates, but equal opportunity/equalized odds forces predictions to track true positives. You can't have equal overall rates AND equal conditional rates unless base rates are equal or predictor is perfect. QED.
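The admissions dilemma can be verified numerically; this sketch computes the perfect classifier's TPR and selection rate for each group:

```python
# Medical-school scenario: base rates 80% (A) vs 40% (B),
# and a perfect classifier that admits exactly the qualified applicants.
applicants = {"A": 100, "B": 100}
qualified = {"A": 80, "B": 40}
admitted = qualified  # perfect prediction

for g in ("A", "B"):
    tpr = admitted[g] / qualified[g]
    selection_rate = admitted[g] / applicants[g]
    print(g, "TPR:", tpr, "selection rate:", selection_rate)

# Equal opportunity holds (both TPRs are 1.0), but demographic parity
# fails (selection rates 0.8 vs 0.4). With unequal base rates, no
# classifier can satisfy both at once.
```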
🤔 How to Choose a Fairness Metric
Since you can't satisfy all metrics, choosing a fairness definition is a value judgment, not a technical decision. It depends on context, stakeholder input, and ethical considerations.
Choose Demographic Parity when:
- Historical exclusion needs active correction
- Outcomes should match demographics
- Ground truth labels may be biased
- Example: Diverse candidate slates
Choose Equal Opportunity when:
- Merit/qualifications matter
- False negatives more harmful
- "Equal chance if qualified" principle
- Example: Competitive admissions
Choose Equalized Odds when:
- Both error types matter equally
- High-stakes decisions
- Comprehensive fairness needed
- Example: Criminal sentencing
⚠️ Critical Warning: No metric is "correct." Each embeds different values. Involve stakeholders (especially affected communities) in choosing fairness criteria. Technical optimization alone cannot resolve ethical questions about how to treat people fairly.
The Dynamics of Bias Amplification
Feedback Loops: From Bad to Catastrophic
One of the most dangerous aspects of AI bias is amplification through feedback loops. Unlike static bias that remains constant, feedback loops create self-reinforcing cycles where model predictions influence future data collection, which influences future predictions, creating exponential growth in bias.
A model with just 10% initial bias can reach 80-90% bias after just a few feedback iterations. This isn't a bug—it's the natural consequence of deploying ML systems that interact with the world.
The Feedback Loop Mechanism
Feedback loops occur when model outputs become model inputs in the next iteration. This creates a circular dependency that can amplify small biases exponentially.
The Six-Step Cycle:
Initial Training Data (t=0)
Historical data contains existing bias (e.g., 60% arrests in Black neighborhoods, 40% in white neighborhoods due to historical policing patterns, not actual crime rates)
Model Learns Patterns
Algorithm learns: "Black neighborhood → high crime probability." Not because it's racist, but because it optimizes to fit the biased training data patterns
Deployment & Predictions
Model predicts higher crime risk in Black neighborhoods. System outputs: "Deploy 70% of police to Black neighborhoods, 30% to white neighborhoods"
Human Actions Based on Predictions
Police follow model recommendations. More officers patrol Black neighborhoods → More surveillance, more stops, more scrutiny
New Data Collection (t=1)
More policing → More arrests in Black neighborhoods (not because more crime occurs, but because more eyes are watching). New data shows 75% of arrests in Black neighborhoods
Model Retraining → Amplification
Model retrained on new data (75% vs 25%). Learns even stronger association. Next iteration predicts 80% policing in Black neighborhoods. Cycle continues, bias grows exponentially
🚨 The Catastrophic Result:
Each iteration amplifies the bias. After 3-5 iterations, the system creates a self-fulfilling prophecy: predictions appear "accurate" because the system itself creates the reality it predicts. The bias becomes institutionalized and nearly impossible to detect from accuracy metrics alone.
Mathematical Model of Amplification
We can model bias amplification mathematically to understand the exponential growth dynamics:
Basic Amplification Model:
B(t+1) = B(t) × (1 + α)
Where:
B(t) = bias at time t
α = amplification rate per iteration (typically 0.2-0.5)
Example: Content Recommendation Bias
Initial: 10% of recommended videos are extreme content
Amplification rate: α = 0.3 (30% increase per iteration)
Iteration 0: B(0) = 10%
Iteration 1: B(1) = 10% × 1.3 = 13%
Iteration 2: B(2) = 13% × 1.3 = 16.9%
Iteration 3: B(3) = 16.9% × 1.3 = 22%
Iteration 4: B(4) = 22% × 1.3 = 28.6%
Iteration 5: B(5) = 28.6% × 1.3 = 37.2%
Iteration 10: B(10) = 10% × (1.3)^10 = 137% → 100% (capped)
⚠️ After just 10 iterations (maybe 10 weeks), nearly all recommended content is extreme. Started at 10%, ended at saturation.
General Solution (Exponential Growth):
B(t) = B(0) × (1 + α)^t
This is exponential growth with base (1+α). Similar to compound interest, but for bias instead of money.
Doubling time: t = log(2) / log(1+α)
For α=0.3: doubles every ~2.6 iterations
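The amplification model can be simulated in a few lines (a sketch; the cap at 100% mirrors the saturation noted above):

```python
def amplify(b0, alpha, iterations):
    """Simulate B(t+1) = B(t) * (1 + alpha), capped at 100%."""
    b = b0
    history = [b]
    for _ in range(iterations):
        b = min(b * (1 + alpha), 1.0)
        history.append(b)
    return history

# 10% starting bias, 30% amplification per iteration (as in the example)
history = amplify(b0=0.10, alpha=0.3, iterations=10)
print([round(b, 3) for b in history])
# Reaches ~37% by iteration 5 and saturates at 100% within 10 iterations
```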
📊 Key Insight: The amplification rate α depends on how strongly the model's predictions influence future data. Higher influence → faster amplification. This is why deployed systems with direct feedback (recommendations, policing, lending) are especially dangerous.
Real-World Amplification Cases
🚨 Predictive Policing (PredPol)
Initial bias: Historical arrest data overrepresented minority neighborhoods (legacy of discriminatory policing)
Iteration 1: System predicts high crime in those areas → More police deployed
Iteration 2-5: More arrests in predicted areas (observation bias) → Model confidence increases → Even more policing
Result: After multiple iterations, system creates massively disproportionate policing. Studies found some neighborhoods had 10× policing intensity vs demographically similar areas, purely due to algorithmic feedback loop
📱 YouTube Radicalization Pipeline
Initial: User watches one conspiracy video (10% of watch history)
Amplification: Algorithm notices high engagement (controversial content gets clicks) → Recommends similar content → User watches more → Algorithm updates: "This user likes extreme content" → Recommends even more extreme videos
Outcome: Studies showed users can go from mainstream content to extremist content in 5-10 recommendations (hours/days). The algorithm doesn't "intend" radicalization—it optimizes for watch time, but creates radicalization pipeline as side effect
💳 Credit Scoring Spiral
t=0: Model denies loan to marginalized group at slightly higher rate due to historical bias
t=1: Denied applicants can't build credit history → Next model sees "no history = high risk" → Even higher denial rate
t=2-3: Compound effect: No loans → No assets → Lower income → Model predicts higher default risk → More denials
Generational impact: Creates self-perpetuating poverty cycles. Initial 5-10% disparity becomes 40-50% wealth gap over decades
Breaking Feedback Loops
Feedback loops are hard to break because the system appears to be working correctly (predictions match observations). Intervention requires understanding the causal structure:
✓ Effective Interventions:
- Randomized deployments: Randomly vary predictions to gather unbiased data (like A/B testing but for fairness)
- Causal modeling: Model counterfactuals ("what would have happened if...") instead of just correlations
- External data sources: Use data not influenced by model predictions
- Temporal discounting: Weight recent data less if it's influenced by the model
✗ Ineffective (Common Mistakes):
- Just retraining: Using biased new data amplifies rather than fixes bias
- Removing protected attributes: Doesn't stop feedback if proxies exist
- Accuracy monitoring alone: Feedback loops can increase accuracy while worsening fairness
- One-time correction: Feedback loops require continuous monitoring, not one-time fixes
🔬 Research Finding: Studies show that once a feedback loop is established, it requires 3-5× more effort to reverse than it would have taken to prevent initially. Prevention is far easier than cure—design systems to avoid feedback loops from the start.
⚠️ Critical Takeaway: Bias Amplification is the Default
Without active intervention, deployed ML systems with any feedback component will naturally amplify existing biases. This isn't a failure of engineering—it's a mathematical consequence of optimizing on biased feedback. Fairness requires constant vigilance, not just good initial training.
Protected Attributes
Protected attributes are characteristics (such as race, gender, or age) that should not influence decisions in fair systems.
⚠️ Note: Removing protected attributes from features doesn't guarantee fairness. Proxy variables (zip code for race, name for gender) can still encode bias.
Disparate Impact: Legal and Statistical Framework
From Legal Doctrine to ML Metric
Disparate impact is a legal concept that became a fundamental fairness metric in machine learning. Unlike disparate treatment (intentional discrimination), disparate impact focuses on outcomes: even neutral policies can be discriminatory if they disproportionately harm protected groups.
The 80% Rule (Four-Fifths Rule) is the most widely used quantitative standard for detecting discrimination in employment, lending, and increasingly, in AI systems. It provides a clear mathematical threshold for when outcome disparities constitute evidence of bias.
Legal Origins and Framework
The disparate impact doctrine originated from Griggs v. Duke Power Co. (1971), a landmark U.S. Supreme Court case. The court ruled that employment practices with discriminatory effects violate civil rights law, even without discriminatory intent.
Historical Case: Griggs v. Duke Power (1971)
Situation: Company required high school diploma for employment
Claimed Purpose: "Ensure quality workers" (race-neutral policy)
Impact: 34% of white applicants had diploma vs 12% of Black applicants
Disparate Impact Ratio: 12/34 = 0.35 = 35% (far below 80%)
Court Decision: Policy was discriminatory despite neutral language because (1) disparate impact existed and (2) diploma requirement wasn't related to job performance (couldn't be justified as business necessity).
Disparate Treatment
- Intentional discrimination
- Different rules for different groups
- Requires proof of intent
- Example: "No women allowed"
Disparate Impact
- Unintentional, outcomes-based
- Same rules, different results
- Proven by statistical evidence
- Example: Height requirements excluding women
The 80% Rule: Mathematical Definition
Formal Definition:
DI Ratio = min(SR_A, SR_B) / max(SR_A, SR_B) ≥ 0.80
Where SR = Selection Rate for each group
The 80% Rule states: the selection rate for the protected group should be at least 80% (four-fifths) of the selection rate for the group with the highest rate. Ratios below this threshold trigger legal scrutiny.
✓ Step-by-Step Calculation Example
Scenario: Loan Approval System
Group A (majority): 500 applicants, 400 approved
Group B (minority): 300 applicants, 180 approved
Step 1: Calculate Selection Rates
SR_A = 400/500 = 0.80 = 80%
SR_B = 180/300 = 0.60 = 60%
Step 2: Identify Min and Max
min(80%, 60%) = 60% (Group B)
max(80%, 60%) = 80% (Group A)
Step 3: Calculate Ratio
DI Ratio = 60% / 80% = 0.75 = 75%
Step 4: Interpret
75% < 80% → Disparate impact detected!
Group B's approval rate is only 75% of Group A's rate. This falls below the 80% threshold and constitutes prima facie evidence of discrimination requiring justification.
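The four-step calculation above reduces to a few lines (a sketch; the helper name is ours):

```python
def disparate_impact_ratio(selection_rates):
    """Four-fifths rule: min selection rate divided by max selection rate."""
    return min(selection_rates.values()) / max(selection_rates.values())

# Loan approval example: 400/500 approved in A, 180/300 in B
rates = {"A": 400 / 500, "B": 180 / 300}
ratio = disparate_impact_ratio(rates)
print(f"DI ratio: {ratio:.2f}")  # 0.75
print("Disparate impact detected" if ratio < 0.80 else "Passes 80% rule")
```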
💡 Why 80%? The threshold is somewhat arbitrary (established in 1978 Uniform Guidelines on Employee Selection), but has become the legal standard. It represents "substantial disparity"—a 20% difference is considered significant, while smaller differences might be due to chance or legitimate factors.
Relationship to Statistical Parity
The 80% Rule is closely related to statistical parity (demographic parity) but is less strict. Statistical parity requires exact equality (100% ratio), while the 80% Rule allows for some disparity (recognizing real-world variability).
Statistical Parity (Strict)
P(Ŷ=1|A=0) = P(Ŷ=1|A=1)
Requires: Exactly equal selection rates
Example: 80% = 80% (ratio: 100%)
✓ Ideal fairness standard
✗ May be unrealistic/impossible
80% Rule (Pragmatic)
min/max ≥ 0.80
Allows: Up to 20% disparity
Example: 72% vs 90% (ratio: 80%)
✓ Legally enforceable threshold
✓ Accounts for statistical noise
Interpretation Guide: a ratio of 100% means identical selection rates; between 80% and 100% is generally acceptable; below 80% constitutes prima facie evidence of disparate impact requiring justification.
Legal Burden and Defenses
Proving disparate impact shifts the burden of proof to the employer/system deployer. The three-stage legal framework:
Stage 1: Prima Facie Case (Plaintiff)
Show that 80% Rule is violated using statistical evidence. If ratio < 80%, establishes presumption of discrimination. Burden shifts to defendant.
Stage 2: Business Necessity Defense (Defendant)
Must prove the practice is job-related and consistent with business necessity. Not enough to say "our model is accurate"—must show discriminatory criteria are essential.
Example Defenses:
- ✓ Physical strength requirement for construction workers (genuinely necessary)
- ✗ College degree for janitor position (not necessary for the job)
- ✓ Credit check for financial officer (related to job duties)
- ✗ Credit check for retail cashier (not clearly necessary)
Stage 3: Less Discriminatory Alternative (Plaintiff)
If business necessity shown, plaintiff can still prevail by proving a less discriminatory alternative exists that achieves the same business goal. Forces consideration of fairness in design choices.
⚠️ For ML Systems: "Our algorithm is accurate" is not a valid defense for disparate impact. Must show (1) the features causing disparity are necessary for the stated purpose and (2) no alternative approach with less disparity exists. This is a high bar!
Applying the 80% Rule to ML Systems
The 80% Rule provides a clear, testable criterion for ML fairness. Unlike abstract fairness concepts, it gives developers and auditors a specific threshold to measure against.
Practical Implementation Steps:
1. Identify Protected Groups: Determine which attributes (race, gender, age, etc.) are protected by law in your jurisdiction
2. Measure Selection Rates: Calculate the positive prediction rate for each group (even if the protected attribute isn't used as a feature; test on outcomes!)
3. Compute DI Ratio: min/max of selection rates. Flag if < 80%
4. Document Justification: If the ratio is < 80%, document business necessity and explore alternatives
5. Monitor Continuously: The DI ratio can drift over time due to feedback loops or distribution shift
✓ Advantages:
- Clear numerical threshold (80%)
- Legally recognized standard
- Easy to compute and explain
- Focuses on outcomes, not process
- Courts understand it (decades of precedent)
✗ Limitations:
- Binary comparison (only 2 groups at once)
- Doesn't consider error types (FP vs FN)
- 80% threshold somewhat arbitrary
- Doesn't account for intersectionality
- May conflict with accuracy/merit
🔍 Best Practice: Use the 80% Rule as a minimum baseline, not the only fairness criterion. Systems passing the 80% Rule can still have unfair error distributions (unequal TPR/FPR). Combine with equal opportunity or equalized odds for comprehensive fairness.
🎯 Key Takeaway: Quantifying Discrimination
The 80% Rule bridges law and machine learning by providing a concrete, enforceable standard for disparate impact. ML practitioners should treat it as a mandatory baseline check, not an optional "nice-to-have." Violating the 80% Rule exposes organizations to legal liability and ethical scrutiny.
Bias Mitigation Strategies
Mitigation techniques fall into three families: pre-processing (rebalance or reweight the training data), in-processing (add fairness constraints or penalties during training), and post-processing (adjust predictions or decision thresholds after training). Standard training without fairness constraints leaves whatever bias is in the data untouched.
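As one concrete instance of the pre-processing family, instance reweighting assigns each (group, label) cell a weight that makes group and label look statistically independent under the weighted distribution. This is a sketch of the classic reweighing idea, not a production implementation (it assumes every cell is non-empty):

```python
import numpy as np

def reweigh(y, group):
    """Weight w(g, y) = P(G=g) * P(Y=y) / P(G=g, Y=y), so that under
    the weights the group and the label appear independent."""
    w = np.empty(len(y), dtype=float)
    for g in np.unique(group):
        for label in np.unique(y):
            cell = (group == g) & (y == label)
            p_g = np.mean(group == g)
            p_y = np.mean(y == label)
            p_gy = np.mean(cell)
            w[cell] = p_g * p_y / p_gy
    return w

# Toy data: positive labels are rare in group B
y = np.array([1, 1, 1, 0, 1, 0, 0, 0])
group = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])
w = reweigh(y, group)
print(w)  # underrepresented cells like (B, y=1) get weight > 1
```

These weights would then be passed as sample weights to any standard training procedure.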
Group-Specific Thresholds
🎯 Threshold Adjustment: One post-processing technique is to set different classification thresholds for different groups to achieve equal error rates or equal opportunity. With equal thresholds, both groups share the same decision boundary; group-specific thresholds shift those boundaries to equalize TPR or FPR.
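A sketch of applying group-specific thresholds as a post-processing step (the threshold values here are hypothetical; in practice they would be tuned on a validation set to equalize TPR or FPR):

```python
import numpy as np

def predict_with_group_thresholds(scores, group, thresholds):
    """Apply a per-group cutoff to model scores (post-processing)."""
    cutoffs = np.array([thresholds[g] for g in group])
    return (scores >= cutoffs).astype(int)

scores = np.array([0.55, 0.72, 0.40, 0.61])
group = np.array(["A", "B", "A", "B"])
# Hypothetical cutoffs: group B gets a lower threshold than group A
thresholds = {"A": 0.60, "B": 0.50}

print(predict_with_group_thresholds(scores, group, thresholds))
# [0 1 0 1]
```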
Fairness-Accuracy Tradeoff
Improving a fairness metric usually costs some accuracy. A balanced approach that weighs both objectives is often the best practical choice; where the balance lies depends on the stakes of the decision.
🎯 Key Takeaways
Bias Has Multiple Sources
Data, algorithms, human designers, and feedback loops all contribute to AI bias. Address all sources, not just one.
No Universal Fairness Metric
Demographic parity, equal opportunity, and equalized odds are incompatible. Choose based on context and stakeholder values.
Feedback Loops Amplify Bias
Initial small biases compound over time as model predictions influence future data. Monitor and intervene continuously.
Removing Features Isn't Enough
Proxy variables (zip code, name, school) can encode protected attributes. Audit for disparate impact even without explicit sensitive features.
Multiple Mitigation Approaches
Pre-processing (data), in-processing (training), and post-processing (predictions) each have tradeoffs. Often combine multiple techniques.
Fairness vs Accuracy Tradeoff
Some accuracy loss is often acceptable for fairness. The "right" balance depends on domain, risks, and ethical considerations. No technical solution alone.