January 15, 2025
RBI's AI Governance Expectations: What Banks Actually Need to Implement
A private sector bank deployed an AI-powered loan approval system. Six months in, RBI examiners asked three questions:
- “Show us the model’s decision logic for this rejected application.”
- “What monitoring do you have for model performance degradation?”
- “How do you ensure the model doesn’t discriminate based on geography or community?”
The bank had answers for question one (sort of). For questions two and three, they had nothing.
This is increasingly common. RBI has taken a thoughtful approach - rather than rushing a single AI regulation, it has built AI governance expectations into existing frameworks across multiple circulars covering IT governance, digital lending, and outsourcing. This integrated approach means banks need to understand the full regulatory landscape. Examiners are asking pointed questions, and banks that aren't prepared face uncomfortable conversations.
The Regulatory Landscape
RBI’s AI-related expectations emerge from several sources:
Master Direction on IT Governance (2023)
Key requirements relevant to AI:
- Risk-based approach: IT risks (including AI/ML model risks) must be identified, assessed, and managed
- Board oversight: Board must understand and oversee technology risks
- Third-party risk: Vendors and cloud providers must meet security standards
- Audit trails: Complete logging of system activities
Credit Information Companies Regulations
For AI in credit decisioning:
- Explainability: Customers have a right to understand why credit was denied
- Dispute resolution: Process for customers to challenge AI-assisted decisions
- Data accuracy: Obligation to ensure data used in AI models is accurate
Digital Lending Guidelines (2022)
Specific to AI in lending:
- Algorithm audit: Lending service providers must ensure algorithms are auditable
- Fair practices: AI must not result in discriminatory lending
- Disclosure: Key facts about automated decision-making must be disclosed
Outsourcing Guidelines
When using third-party AI/ML services:
- Due diligence: Thorough assessment of AI vendors
- Data protection: Customer data must be protected
- Business continuity: AI services must not create single points of failure
What This Means Technically
Let’s translate regulatory language into technical requirements:
Requirement 1: Model Inventory and Risk Classification
You need to know what AI/ML models you’re running and their risk levels.
```python
class ModelInventory:
    """
    RBI expects banks to maintain a complete inventory of AI/ML models
    """
    def __init__(self):
        self.models = {}

    def register_model(self, model_id: str, metadata: dict):
        risk_level = self.assess_risk(metadata)
        self.models[model_id] = {
            'name': metadata['name'],
            'purpose': metadata['purpose'],
            'owner': metadata['owner'],
            'deployment_date': metadata['deployment_date'],

            # Risk classification
            'risk_level': risk_level,  # 'critical', 'high', 'medium', 'low'
            'customer_impacting': metadata.get('customer_impacting', False),
            'financial_impact': metadata.get('financial_impact', 'low'),

            # Governance metadata
            'last_validation_date': None,
            'next_review_date': None,
            'validation_reports': [],
            'change_history': [],
        }
        return self.models[model_id]

    def assess_risk(self, metadata: dict) -> str:
        """
        Risk classification based on RBI expectations
        """
        # Critical: Direct financial decisions affecting customers
        if metadata.get('purpose') in ['credit_decisioning', 'fraud_block', 'aml_alert']:
            if metadata.get('autonomous', False):
                return 'critical'
            return 'high'

        # High: Customer-facing with indirect financial impact
        if metadata.get('customer_impacting', False):
            return 'high'

        # Medium: Internal operations with potential customer effect
        if metadata.get('purpose') in ['customer_segmentation', 'risk_scoring']:
            return 'medium'

        return 'low'
```
A sample model inventory for a typical bank:
| Model | Purpose | Risk Level | Validation Frequency |
|---|---|---|---|
| Credit Scorecard v3 | Loan approval | Critical | Quarterly |
| Fraud Detection | Transaction blocking | Critical | Monthly |
| AML Alerting | Suspicious activity | Critical | Monthly |
| Customer Churn | Retention targeting | Medium | Semi-annual |
| Cross-sell Propensity | Marketing | Low | Annual |
| Chatbot NLU | Customer service | Medium | Quarterly |
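To make the inventory concrete, here's a minimal sketch of registering the first model from that table with the class above. The field values are illustrative, not a prescribed schema:

```python
inventory = ModelInventory()

# Hypothetical registration of the credit scorecard from the table above
record = inventory.register_model('credit_scorecard_v3', {
    'name': 'Credit Scorecard v3',
    'purpose': 'credit_decisioning',
    'owner': 'Retail Credit Risk',
    'deployment_date': '2024-06-01',
    'autonomous': True,           # decisions made without human review
    'customer_impacting': True,
    'financial_impact': 'high',
})

print(record['risk_level'])  # 'critical' under the classification logic above
```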
Requirement 2: Model Validation Framework
RBI expects independent validation of AI models, especially those in the critical/high risk categories.
```mermaid
flowchart TB
    subgraph Development["Model Development"]
        A[Data Scientists Build Model] --> B[Unit Testing]
        B --> C[Development Validation]
    end

    subgraph Validation["Independent Validation"]
        D[Model Risk Team Review]
        E[Conceptual Soundness]
        F[Data Quality Assessment]
        G[Performance Testing]
        H[Bias & Fairness Testing]
        I[Stress Testing]
    end

    subgraph Approval["Governance"]
        J[Model Risk Committee]
        K[Documentation Review]
        L[Approval / Rejection]
    end

    subgraph Production["Production"]
        M[Deployment]
        N[Ongoing Monitoring]
        O[Periodic Revalidation]
    end

    C --> D
    D --> E --> F --> G --> H --> I
    I --> J --> K --> L
    L -->|Approved| M --> N
    N --> O
    O -->|Trigger| D
    L -->|Rejected| A
```
Validation must cover:
```python
class ModelValidationFramework:
    """
    Comprehensive validation as per RBI expectations
    """
    def validate(self, model, validation_data) -> ValidationReport:
        report = ValidationReport(model_id=model.id)

        # 1. Conceptual Soundness
        report.conceptual = self.validate_conceptual_soundness(model)
        # - Is the modeling approach appropriate for the problem?
        # - Are assumptions reasonable and documented?
        # - Is the model specification correct?

        # 2. Data Quality
        report.data_quality = self.validate_data_quality(model, validation_data)
        # - Is training data representative?
        # - Are there data quality issues?
        # - Is data lineage documented?

        # 3. Discriminatory Performance
        report.performance = self.validate_performance(model, validation_data)
        # - Does it meet accuracy thresholds?
        # - How does it perform across segments?
        # - What are the error patterns?

        # 4. Fairness and Bias
        report.fairness = self.validate_fairness(model, validation_data)
        # - Demographic parity across protected groups
        # - Equal opportunity metrics
        # - Disparate impact analysis

        # 5. Stability and Robustness
        report.stability = self.validate_stability(model, validation_data)
        # - Sensitivity to input perturbations
        # - Performance under stress scenarios
        # - Behavior at decision boundaries

        # 6. Implementation Verification
        report.implementation = self.validate_implementation(model)
        # - Code review
        # - Production vs development parity
        # - Integration testing

        return report
```
Requirement 3: Explainability Infrastructure
For customer-impacting decisions, you need to explain why. Not “the model said so” - actual reasons.
```python
import hashlib
from datetime import datetime


class ExplainabilityEngine:
    """
    Generate explanations that satisfy RBI requirements
    """
    def explain_decision(self, model, input_data, prediction) -> Explanation:
        # Get feature contributions
        shap_values = self.compute_shap_values(model, input_data)

        # Identify top factors
        top_positive = self.get_top_factors(shap_values, direction='positive', n=3)
        top_negative = self.get_top_factors(shap_values, direction='negative', n=3)

        # Generate human-readable explanation
        explanation = Explanation(
            decision=prediction,
            confidence=model.predict_proba(input_data).max(),

            # For internal use / audit
            technical_factors={
                'positive_contributors': top_positive,
                'negative_contributors': top_negative,
                'shap_values': shap_values,
            },

            # For customer communication
            customer_explanation=self.generate_customer_explanation(
                prediction, top_positive, top_negative
            ),

            # For regulatory queries
            audit_trail={
                'model_version': model.version,
                # Stable content hash (built-in hash() is salted per process, so unsuitable for audit)
                'input_hash': hashlib.sha256(str(input_data).encode()).hexdigest(),
                'timestamp': datetime.now(),
                'explanation_method': 'SHAP',
            }
        )
        return explanation

    def generate_customer_explanation(self, prediction, positive, negative) -> str:
        """
        Generate explanation suitable for customer communication
        RBI requires this to be understandable by the customer
        """
        if prediction == 'rejected':
            factors = [self.humanize_factor(f) for f in negative[:3]]
            return f"Your application was not approved. Key factors: {', '.join(factors)}. " \
                   f"You may request a detailed explanation or dispute this decision."
        else:
            return "Your application was approved."

    def humanize_factor(self, technical_factor: str) -> str:
        """
        Convert technical feature names to customer-friendly language
        """
        mapping = {
            'credit_utilization_ratio': 'credit card usage relative to limit',
            'delinquency_count_12m': 'recent payment delays',
            'time_at_current_address': 'length of time at current address',
            'income_to_debt_ratio': 'income relative to existing debt',
            'enquiry_count_6m': 'recent credit applications',
        }
        return mapping.get(technical_factor, technical_factor)
```
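To get a feel for the customer-facing output, here's a hypothetical rejection driven by three illustrative negative factors (the factor names are assumptions that happen to match the mapping above):

```python
engine = ExplainabilityEngine()

# Hypothetical top negative contributors for a rejected application
negative = ['credit_utilization_ratio', 'delinquency_count_12m', 'enquiry_count_6m']

print(engine.generate_customer_explanation('rejected', positive=[], negative=negative))
# Your application was not approved. Key factors: credit card usage relative to limit,
# recent payment delays, recent credit applications. You may request a detailed
# explanation or dispute this decision.
```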
Requirement 4: Continuous Monitoring
RBI expects ongoing monitoring, not just point-in-time validation.
```mermaid
flowchart LR
    subgraph Monitoring["Continuous Monitoring"]
        A[Model Performance] --> E[Alert Engine]
        B[Data Drift] --> E
        C[Prediction Distribution] --> E
        D[Fairness Metrics] --> E
    end

    subgraph Response["Response Actions"]
        E --> F{Severity}
        F -->|Critical| G[Immediate Review]
        F -->|High| H[Accelerated Validation]
        F -->|Medium| I[Scheduled Review]
        F -->|Low| J[Log & Track]
    end

    subgraph Governance["Governance"]
        G --> K[Model Risk Committee]
        H --> K
        K --> L{Decision}
        L -->|Retrain| M[Model Update]
        L -->|Retire| N[Fallback to Rules]
        L -->|Continue| O[Enhanced Monitoring]
    end
```
Key metrics to monitor:
```python
import numpy as np


class RBICompliantMonitoring:
    """
    Monitoring framework aligned with RBI expectations
    """
    def __init__(self, model, baseline_metrics):
        self.model = model
        self.baseline = baseline_metrics
        self.alert_thresholds = self.get_thresholds_by_risk_level()

    def monitor(self, production_data, predictions, outcomes) -> MonitoringReport:
        report = MonitoringReport()

        # 1. Performance Degradation
        current_performance = self.compute_performance(predictions, outcomes)
        report.performance_drift = self.compare_to_baseline(
            current_performance,
            self.baseline['performance']
        )

        # 2. Population Stability Index (PSI)
        # Measures if the population you're scoring has changed
        report.psi = self.compute_psi(
            self.baseline['score_distribution'],
            self.get_score_distribution(predictions)
        )

        # 3. Characteristic Stability Index (CSI)
        # Measures if input feature distributions have changed
        report.csi = {}
        for feature in self.model.features:
            report.csi[feature] = self.compute_csi(
                self.baseline['feature_distributions'][feature],
                production_data[feature]
            )

        # 4. Fairness Drift
        # Has model fairness degraded for any group?
        report.fairness_drift = self.compute_fairness_drift(
            production_data, predictions, outcomes
        )

        # 5. Decision Distribution
        # Is the model approving/rejecting at expected rates?
        report.decision_drift = self.compute_decision_drift(predictions)

        # Generate alerts based on thresholds
        report.alerts = self.evaluate_alerts(report)

        return report

    def compute_psi(self, expected_dist, actual_dist) -> float:
        """
        Population Stability Index
        PSI < 0.1: No significant change
        PSI 0.1-0.25: Moderate change, investigate
        PSI > 0.25: Significant change, action required
        """
        psi = 0
        for i in range(len(expected_dist)):
            if actual_dist[i] > 0 and expected_dist[i] > 0:
                psi += (actual_dist[i] - expected_dist[i]) * \
                       np.log(actual_dist[i] / expected_dist[i])
        return psi
```
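As a quick, self-contained illustration of the PSI calculation and how to read it against those thresholds (the bin proportions below are made up):

```python
import numpy as np

# Illustrative score distributions across five bins (proportions sum to 1)
baseline_dist   = np.array([0.10, 0.25, 0.30, 0.25, 0.10])  # at validation time
production_dist = np.array([0.05, 0.20, 0.30, 0.30, 0.15])  # observed in production

psi = np.sum((production_dist - baseline_dist) * np.log(production_dist / baseline_dist))
print(round(psi, 3))  # ~0.075: below the 0.1 threshold, so no significant population shift
```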
Requirement 5: Fairness and Non-Discrimination
Fairness is drawing growing supervisory attention: RBI expects banks to ensure their AI doesn't discriminate.
```python
class FairnessAuditor:
    """
    Audit model for discriminatory outcomes
    RBI expects this for customer-impacting models
    """
    def audit(self, model, data, predictions, outcomes) -> FairnessReport:
        report = FairnessReport()

        # Protected attributes to check
        # Note: Banks may not have these directly, but can infer from proxies
        protected_attributes = ['gender', 'region', 'urban_rural']

        for attr in protected_attributes:
            if attr in data.columns:
                report.add_attribute_analysis(
                    attribute=attr,
                    demographic_parity=self.compute_demographic_parity(
                        data[attr], predictions
                    ),
                    equal_opportunity=self.compute_equal_opportunity(
                        data[attr], predictions, outcomes
                    ),
                    disparate_impact_ratio=self.compute_disparate_impact(
                        data[attr], predictions
                    )
                )

        # Proxy analysis - features that might encode protected attributes
        report.proxy_analysis = self.detect_proxy_discrimination(
            model, data, predictions
        )

        # Geographic analysis - required given India's diversity
        report.geographic_analysis = self.analyze_geographic_fairness(
            data, predictions, outcomes
        )

        return report

    def compute_disparate_impact(self, protected_attr, predictions) -> float:
        """
        Disparate Impact Ratio
        < 0.8 is typically considered evidence of discrimination
        RBI hasn't specified a threshold, but 0.8 is industry standard
        """
        groups = protected_attr.unique()
        approval_rates = {}

        for group in groups:
            mask = protected_attr == group
            approval_rates[group] = (predictions[mask] == 1).mean()

        max_rate = max(approval_rates.values())
        min_rate = min(approval_rates.values())

        return min_rate / max_rate if max_rate > 0 else 0
```
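A toy example of the disparate impact calculation, assuming pandas Series inputs (the data is illustrative):

```python
import pandas as pd

auditor = FairnessAuditor()

# Toy example: approval decisions (1 = approved) across two regions
region      = pd.Series(['urban', 'urban', 'urban', 'urban', 'rural', 'rural', 'rural', 'rural'])
predictions = pd.Series([1, 1, 1, 0, 1, 0, 0, 0])

ratio = auditor.compute_disparate_impact(region, predictions)
print(round(ratio, 2))  # 0.25 / 0.75 = 0.33, well below the 0.8 benchmark
```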
Requirement 6: Audit Trail and Documentation
Every model decision should be traceable and documentable.
```python
from datetime import datetime
from typing import Any
from uuid import uuid4


class AuditTrailLogger:
    """
    Complete audit trail for regulatory compliance
    """
    def log_decision(self,
                     model_id: str,
                     input_data: dict,
                     prediction: Any,
                     explanation: dict,
                     context: dict) -> str:
        audit_record = {
            'audit_id': str(uuid4()),
            'timestamp': datetime.now().isoformat(),

            # Model identification
            'model_id': model_id,
            'model_version': self.get_model_version(model_id),

            # Input (hashed for privacy, full data in secure storage)
            'input_hash': self.hash_input(input_data),
            'input_storage_ref': self.store_securely(input_data),

            # Output
            'prediction': prediction,
            'confidence': context.get('confidence'),

            # Explanation
            'explanation_method': explanation.get('method'),
            'top_factors': explanation.get('top_factors'),
            'explanation_storage_ref': self.store_securely(explanation),

            # Context
            'user_id': context.get('user_id'),
            'session_id': context.get('session_id'),
            'channel': context.get('channel'),

            # Governance
            'human_override': context.get('human_override', False),
            'override_reason': context.get('override_reason'),
        }

        self.write_to_audit_log(audit_record)
        return audit_record['audit_id']

    def retrieve_for_regulatory_query(self,
                                      customer_id: str = None,
                                      date_range: tuple = None,
                                      model_id: str = None) -> list:
        """
        Retrieve audit records for regulatory examination
        """
        # This needs to be fast - examiners don't wait
        return self.query_audit_log(
            customer_id=customer_id,
            date_range=date_range,
            model_id=model_id
        )
```
Implementation Roadmap
For banks starting their AI governance journey:
```mermaid
flowchart TB
    subgraph Phase1["Phase 1: Foundation (Months 1-3)"]
        A[Model Inventory] --> B[Risk Classification]
        B --> C[Basic Documentation]
    end

    subgraph Phase2["Phase 2: Validation (Months 4-6)"]
        D[Validation Framework] --> E[Critical Model Validation]
        E --> F[Explainability for Credit]
    end

    subgraph Phase3["Phase 3: Monitoring (Months 7-9)"]
        G[Monitoring Infrastructure] --> H[Alert Framework]
        H --> I[Fairness Monitoring]
    end

    subgraph Phase4["Phase 4: Maturity (Months 10-12)"]
        J[Governance Integration] --> K[Automated Reporting]
        K --> L[Continuous Improvement]
    end

    Phase1 --> Phase2 --> Phase3 --> Phase4
```
Common Gaps We See
Based on our work with Indian banks:
| Gap | Risk | Fix |
|---|---|---|
| No model inventory | Can’t demonstrate control | Start cataloging immediately |
| Validation by developers | Independence violation | Establish separate model risk function |
| No fairness testing | Discrimination risk | Add to validation framework |
| Incomplete audit trails | Can’t respond to queries | Implement logging before next model |
| Monitoring dashboard but no alerts | Issues go unnoticed | Define thresholds and alert paths |
| Documentation in notebooks | Not auditable | Standardize documentation templates |
How Rotavision Helps
We’ve built Guardian specifically for regulated financial services:
- Model inventory management with automatic risk classification
- Continuous monitoring with RBI-aligned metrics (PSI, CSI, fairness)
- Explainability engine for customer communications and regulatory queries
- Audit trail that meets retention and retrieval requirements
- Fairness monitoring calibrated for Indian demographic dimensions
We also offer AI governance advisory for banks building their frameworks - helping translate regulatory expectations into technical requirements and organizational processes.
Our team includes former bank risk managers who’ve been on the receiving end of RBI examinations. We know what examiners ask.
The Examiner is Coming
RBI’s focus on AI governance will only intensify. The banks that build robust frameworks now will:
- Handle examiner queries with confidence
- Deploy AI faster (governance enables, not blocks)
- Avoid the reputational damage of AI-related incidents
- Build genuine competitive advantage in AI
The banks that wait will be scrambling to build frameworks under regulatory pressure - always a worse position.
If you’re a bank thinking about AI governance, the time to start was yesterday. The second-best time is today.
Contact us to discuss your AI governance requirements.