Developing a Loss-Given-Default (LGD) model
March 26, 2024
Definition of LGD:
Definition of Default:
variable | data type | description |
---|---|---|
customer | categorical | type of customer (private or corporate) |
loan_amount | numerical | loan granted to customer in CHF |
real_estate_type | categorical | type of mortgage (single family house, apartment, office building) |
mortgage_collateral_mv | numerical | market value of mortgage collateral at time of origination |
additional_collateral_type | categorical | type of additional collateral (retirement account, cash account, none) |
additional_collateral_mv | numerical | market value of additional collateral at time of origination |
lgd | numerical | (non-economic) loss given default in percent of loan granted |
lgd
: (non-economic) Loss Given Default

The loss given default data was modeled as follows:
\[ \text{lgd} = 1-\frac{\text{real estate coll. MV}}{\text{loan amount}} * \text{e}^{-a-bX} - \frac{\text{other coll. MV}}{\text{loan amount}} * y \]
whereby
\[ X \sim \text{N}(0,1)\quad\text{and}\quad y \sim \text{Ber}(p) \]
Parameters \(a\) and \(b\) are fixed but different for client types and real estate types (private apartment, private house, corporate office)
Parameter \(p\) can take one of three different values, depending on the customer type and collateral type (private retirement, private cash, corporate cash)
\[ \color{grey}{\text{lgd} = 1-\frac{\text{real estate coll. MV}}{\text{loan amount}}} *\mathbf{\color{black}{\text{e}^{-a-bX}}} \color{grey}{- \frac{\text{other coll. MV}}{\text{loan amount}} * y } \]
\[ \color{grey}{\text{lgd} = 1-\frac{\text{real estate coll. MV}}{\text{loan amount}} *\text{e}^{-a-bX} - \frac{\text{other coll. MV}}{\text{loan amount}}} \mathbf{* y} \]
\[ y \sim \text{Ber}(p): \quad y = \begin{cases} 1 & \text{with probability } p\\ 0 & \text{with probability } 1-p \end{cases} \Rightarrow y = \begin{cases} 1 & \text{if the (whole) additional collateral can be liquidated}\\ 0 & \text{if none of the additional collateral can be liquidated} \end{cases} \]
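To make the data-generating process concrete, the following is a minimal sketch of how LGD values could be drawn from it. The function name `simulate_lgd`, the value ranges, and the parameter values for `a`, `b` and `p` are illustrative assumptions, not the values behind the actual dataset.

```r
# Draw LGD values from the data-generating process described above.
# NOTE: a, b, p and all value ranges are assumptions for illustration only.
simulate_lgd <- function(loan_amount, mortgage_mv, additional_mv, a, b, p) {
  n <- length(loan_amount)
  x <- rnorm(n)                       # X ~ N(0, 1)
  y <- rbinom(n, size = 1, prob = p)  # y ~ Ber(p)
  1 - mortgage_mv / loan_amount * exp(-a - b * x) -
    additional_mv / loan_amount * y
}

set.seed(1)
n <- 1000
loan_amount   <- runif(n, 5e5, 1e6)
mortgage_mv   <- loan_amount * runif(n, 0.8, 1.2)
additional_mv <- loan_amount * runif(n, 0, 0.2)

lgd_raw <- simulate_lgd(loan_amount, mortgage_mv, additional_mv,
                        a = 0.25, b = 0.1, p = 0.8)
summary(lgd_raw)
```

Values below 0 are possible in this sketch because no cap is applied to the raw formula.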
Scenario: A customer defaults
Question: How much do we lose due to this default?
\[ loss\,given\,default = \max\biggl(0,\;\frac{loan\,amount - Revenue\,from\,selling\,collateral}{loan\,amount}\biggr) \]
Problem: We are forced to sell (potentially) under market value
\[ Revenue\,from\,selling\,collateral = (1-haircut) * market\,value \]
whereby \(haircut\in[0,1]\) is unknown in advance
Let’s define
\[ \beta_1 = (1-haircut_{mortgage}) \quad\text{and}\quad \beta_2 = (1-haircut_{additional\,collateral}) \]
Then, loss given default in our setting is given by
\[ nominal\,lgd_i = loan\,amount_i-\beta_1*mv\,mortgage_i-\beta_2*mv\,additional_i \]
resp.
\[ lgd_i = 1 - \beta_1 * \frac{mv\,mortgage_i}{loan\,amount_i} - \beta_2 * \frac{mv\,additional_i}{loan\,amount_i} \]
The haircuts might depend on the type of collateral
example: an apartment might sell more easily, and therefore at a smaller haircut, than an office building (risk driver: liquidity in the corresponding market)
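A small worked example may help to read the formula. The helper `lgd_from_haircuts` and all numbers below are made up for illustration.

```r
# LGD in percent of the loan for given haircuts (all numbers are made up).
lgd_from_haircuts <- function(loan_amount, mv_mortgage, mv_additional,
                              haircut_mortgage, haircut_additional) {
  beta_1 <- 1 - haircut_mortgage
  beta_2 <- 1 - haircut_additional
  1 - beta_1 * mv_mortgage / loan_amount - beta_2 * mv_additional / loan_amount
}

# Loan of CHF 800,000, property worth CHF 900,000,
# additional collateral worth CHF 50,000
example_lgd <- lgd_from_haircuts(800000, 900000, 50000,
                                 haircut_mortgage = 0.25,
                                 haircut_additional = 0.20)
example_lgd
# 0.10625
```

With a 25% haircut on the property and a 20% haircut on the additional collateral, the bank would lose about 10.6% of the loan in this example.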
Step 1:
Only consider loans of certain mortgage type without additional collateral. Estimate \(\beta_1\):
\[ lgd_i = 1 - \beta_1*\frac{mv\,mortgage_i}{loan\,amount_i} + \epsilon_i \]
Step 2:
Now consider loans of same mortgage type but with additional collateral. Estimate \(\beta_2\):
\[ 1 - lgd_i - \hat{\beta}_1*\frac{mv\,mortgage_i}{loan\,amount_i} = \beta_2*\frac{mv\,additional_i}{loan\,amount_i} + \epsilon_i \]
assumption: \(\beta_1\frac{mv\,mortgage_i}{loan\,amount_i}\) and \(\beta_2\frac{mv\,additional_i}{loan\,amount_i}\) are independent
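The two steps can be sketched in R on synthetic data as follows. The variable names, the sample size, and the true recovery rates of 0.75 and 0.85 are assumptions for this sketch and are unrelated to the actual dataset.

```r
set.seed(42)
n <- 500
# Synthetic data; the true recovery rates 0.75 and 0.85 are assumptions.
x1 <- runif(n, 0.6, 1.1)                              # mv mortgage / loan amount
has_additional <- rep(c(FALSE, TRUE), each = n / 2)
x2 <- ifelse(has_additional, runif(n, 0.05, 0.2), 0)  # mv additional / loan amount
lgd <- 1 - 0.75 * x1 - 0.85 * x2 + rnorm(n, sd = 0.03)

# Step 1: loans without additional collateral, regression through the origin
step1 <- lm(I(1 - lgd) ~ 0 + x1, subset = !has_additional)
beta_1_hat <- unname(coef(step1)["x1"])

# Step 2: loans with additional collateral; regress the part of the
# recovery that beta_1_hat leaves unexplained on the additional collateral
step2 <- lm(I(1 - lgd - beta_1_hat * x1) ~ 0 + x2, subset = has_additional)
beta_2_hat <- unname(coef(step2)["x2"])

c(beta_1 = beta_1_hat, beta_2 = beta_2_hat)
```

Because step 2 reuses the estimate from step 1, any estimation error in the first coefficient carries over into the second one.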
\[ 1-lgd_i = \beta_1 * \frac{mortgage\,collateral\,mv_i}{loan\,amount_i} \]
Call:
lm(formula = y ~ 0 + x)
Residuals:
Min 1Q Median 3Q Max
-0.140193 -0.023305 0.006439 0.028488 0.094793
Coefficients:
Estimate Std. Error t value Pr(>|t|)
x 0.767774 0.002265 339 <0.0000000000000002 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.04399 on 226 degrees of freedom
Multiple R-squared: 0.998, Adjusted R-squared: 0.998
F-statistic: 1.149e+05 on 1 and 226 DF, p-value: < 0.00000000000000022
\[ 1 - lgd_i-\hat{\beta}_1*\frac{mortgage\,collateral\,mv_i}{loan\,amount_i} = \beta_2*\frac{additional\,collateral\,mv_i}{loan\,amount_i} \]
Call:
lm(formula = y ~ 0 + x)
Residuals:
Min 1Q Median 3Q Max
-0.23179 -0.01373 0.01717 0.04128 0.09469
Coefficients:
Estimate Std. Error t value Pr(>|t|)
x 0.81690 0.02313 35.31 <0.0000000000000002 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.04532 on 395 degrees of freedom
Multiple R-squared: 0.7594, Adjusted R-squared: 0.7588
F-statistic: 1247 on 1 and 395 DF, p-value: < 0.00000000000000022
percentage from MV | private apartments | private houses | corporate offices |
---|---|---|---|
mortgage | 0.77 (apartment) | 0.73 (single family house) | 0.66 (office building) |
additional collateral | 0.82 (retirement account) | 0.87 (retirement account) | 0.94 (cash account) |
Interpretation: on average 77% of the market value of the apartment measured at time of origination of the loan can be recovered if the apartment must be sold due to a default
Differences between the real estate types could be explained by liquidity in corresponding markets (assumption)
Differences between the additional collateral types could be explained by characteristics of the account used and characteristics of the corresponding obligor
\[ \widetilde{lgd}_i = 1 - \hat{\beta}_1*\frac{mortgage\,collateral\,mv_i}{loan\,amount_i} - \hat{\beta}_2 * \frac{additional\,collateral\,mv_i}{loan\,amount_i} \]
\[ \widehat{lgd}_i = \min(\max(\widetilde{lgd}_i, 0), 1) \]
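The capping step maps directly to vectorised base R functions. The helper name `cap_lgd` is hypothetical.

```r
# Cap raw predictions to the interval [0, 1]; cap_lgd is a hypothetical name.
cap_lgd <- function(lgd_raw) pmin(pmax(lgd_raw, 0), 1)

capped <- cap_lgd(c(-0.12, 0.00, 0.35, 1.08))
capped
# 0.00 0.00 0.35 1.00
```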
The realised LGD for our portfolio was CHF 1174.87 million, while our model would have estimated an LGD of CHF 1187.62 million for the portfolio; hence we overestimated the LGD by CHF 12.75 million.
As a bank we might decide that we prefer to overestimate rather than underestimate the LGD. To check whether we over- or underestimate LGD on average, let’s define \(x\) as the difference between the observed and estimated LGD for a given real estate type:
\[ \mu(x) \begin{cases} > 0 & \text{if we underestimate LGD on average}\\ < 0 & \text{if we overestimate LGD on average}\\ = 0 & \text{if we neither under- nor overestimate LGD on average} \end{cases} \]
Therefore, let’s test:
\[ H_0: \mu(x) \geq 0 \quad vs \quad H_A: \mu(x)<0 \]
With a p-value of 0.00 we can reject \(H_0\) for apartments. With p-values of 0.06 for single family houses and 0.21 for office buildings, we cannot reject \(H_0\) for these two segments.
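The presentation does not state which test was used. As a sketch, assuming a one-sided one-sample t-test, the hypothesis above could be checked in R as shown below. The data are simulated for illustration only.

```r
set.seed(7)
# x = observed LGD - estimated LGD; simulated here with a slightly
# negative mean (assumption for illustration)
x <- rnorm(200, mean = -0.01, sd = 0.04)

# H0: mu(x) >= 0  vs  HA: mu(x) < 0
test <- t.test(x, mu = 0, alternative = "less")
test$p.value
```

A small p-value speaks against \(H_0\), that is, in favour of the model overestimating LGD on average.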
ad hoc modification: decrease the coefficients for single family house and office building by one percentage point each:
percentage from MV | private apartments | private houses | corporate offices |
---|---|---|---|
mortgage | 0.77 (apartment) | 0.72 (single family house) | 0.65 (office building) |
additional collateral | 0.82 (retirement account) | 0.86 (retirement account) | 0.93 (cash account) |
After decreasing the coefficients for single family houses and office buildings and their corresponding additional collateral by one percentage point each, we can reject \(H_0\) for each segment with a p-value of 0.00.
The realised LGD for our portfolio was CHF 1174.87 million, while our modified model would have estimated an LGD of CHF 1293.19 million for the portfolio; hence we overestimated the LGD by CHF 118.32 million.
\[ mv\,collateral_{ij} = \begin{cases} mv\,collateral_{ij} & \text{if loan } i \text{ is secured by collateral of type } j\\ 0 & \text{otherwise} \end{cases} \]
\[ lgd_i = 1 - \sum_j\beta_j*\frac{mv\,collateral_{ij}}{loan\,amount_i} + \epsilon_i \]
- Idea: we get all coefficients of interest in one regression
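A minimal sketch of this setup builds one zero-filled column per collateral type and fits a single regression through the origin. The column names, the synthetic data, and the true recovery rates of 0.78 and 0.74 are assumptions for this sketch.

```r
set.seed(3)
n <- 600
df <- data.frame(
  real_estate_type = sample(c("apartment", "house"), n, replace = TRUE),
  ratio_mortgage   = runif(n, 0.6, 1.1)  # mv mortgage / loan amount
)

# One column per collateral type, zero where the type does not apply
df$apartment_collateral <- ifelse(df$real_estate_type == "apartment",
                                  df$ratio_mortgage, 0)
df$house_collateral     <- ifelse(df$real_estate_type == "house",
                                  df$ratio_mortgage, 0)

# Assumed true recovery rates: 0.78 (apartment), 0.74 (house)
df$lgd <- 1 - 0.78 * df$apartment_collateral - 0.74 * df$house_collateral +
  rnorm(n, sd = 0.03)

fit <- lm(I(lgd - 1) ~ 0 + apartment_collateral + house_collateral, data = df)
-coef(fit)  # sign flipped so the values read as recovery rates
```

Regressing `lgd - 1` yields negative coefficients, so the sign is flipped before reading them as shares of market value recovered.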
Call:
lm(formula = I(lgd - 1) ~ 0 + house_collateral + appartment_collateral +
office_collateral + retirement_collateral + cash_collateral,
data = df)
Residuals:
Min 1Q Median 3Q Max
-0.20987 -0.06016 -0.01270 0.04014 0.48781
Coefficients:
Estimate Std. Error t value Pr(>|t|)
house_collateral -0.742875 0.006215 -119.54 <0.0000000000000002 ***
appartment_collateral -0.775577 0.004157 -186.57 <0.0000000000000002 ***
office_collateral -0.665729 0.004440 -149.93 <0.0000000000000002 ***
retirement_collateral -0.751765 0.059018 -12.74 <0.0000000000000002 ***
cash_collateral -0.883656 0.067523 -13.09 <0.0000000000000002 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1053 on 1448 degrees of freedom
Multiple R-squared: 0.9874, Adjusted R-squared: 0.9873
F-statistic: 2.268e+04 on 5 and 1448 DF, p-value: < 0.00000000000000022
single step regression:
percentage from MV | private apartments | private houses | corporate offices |
---|---|---|---|
mortgage | 0.78 | 0.74 | 0.67 |
additional collateral | 0.75 | 0.75 | 0.88 |
two step regression:
percentage from MV | private apartments | private houses | corporate offices |
---|---|---|---|
mortgage | 0.77 | 0.72 | 0.65 |
additional collateral | 0.82 | 0.86 | 0.93 |
\[ loss\,given\,default = \frac{loan - revenue\,from\,selling\,collateral}{loan} \]
technically, if the revenue from selling collateral exceeds loan amount (which can be the case), loss given default would be negative
however, the bank won’t receive more than the loan in this scenario (remember: non-economic LGD), hence:
\[ loss\,given\,default = \max\biggl(0, \frac{loan - revenue\,from\,selling\,collateral}{loan}\biggr) \]
result: LGD is censored with a lower limit of 0 and an upper limit of 1
Let’s define \(y\) as the loss given default without the censoring process:
\[ lgd_i = \min(\max(0, y_i), 1) \]
Call:
AER::tobit(formula = lgd ~ appartment_collateral + house_collateral +
retirement_collateral, left = 0, right = 1, dist = "logistic",
data = df[df$customer == "private", ])
Observations:
Total Left-censored Uncensored Right-censored
842 617 225 0
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.93431 0.14793 6.316 0.00000000026894 ***
appartment_collateral -0.81429 0.11954 -6.812 0.00000000000962 ***
house_collateral -0.72907 0.11746 -6.207 0.00000000053971 ***
retirement_collateral -0.78708 0.14184 -5.549 0.00000002873278 ***
Log(scale) -2.74016 0.06004 -45.639 < 0.0000000000000002 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Scale: 0.06456
Logistic distribution
Number of Newton-Raphson Iterations: 4
Log-likelihood: -85.26 on 5 Df
Wald-statistic: 100.8 on 3 Df, p-value: < 0.000000000000000222
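The effect of censoring on a plain regression can be illustrated with synthetic data. The output above comes from `AER::tobit` with a logistic distribution; the sketch below swaps in a hand-written Tobit likelihood with normal errors, fitted with base R's `optim`, to stay free of extra packages. All data and parameter values are assumptions for illustration.

```r
set.seed(11)
n <- 2000
x <- runif(n, 0.6, 1.4)                        # collateral value / loan amount
y_latent <- 1 - 0.8 * x + rnorm(n, sd = 0.08)  # assumed latent (uncensored) LGD
lgd <- pmin(pmax(y_latent, 0), 1)              # observed, censored LGD

# Negative log-likelihood of a normal Tobit model censored at 0 and 1
neg_loglik <- function(par) {
  mu <- par[1] + par[2] * x
  sigma <- exp(par[3])  # log-parametrised to keep sigma positive
  ll <- ifelse(lgd <= 0, pnorm(0, mu, sigma, log.p = TRUE),
        ifelse(lgd >= 1, pnorm(1, mu, sigma, lower.tail = FALSE, log.p = TRUE),
               dnorm(lgd, mu, sigma, log = TRUE)))
  -sum(ll)
}

ols <- lm(lgd ~ x)  # ignores the censoring
fit <- optim(c(coef(ols), log(sd(resid(ols)))), neg_loglik, method = "BFGS")

rbind(ols = coef(ols), tobit = fit$par[1:2])
```

In this setup the OLS slope should come out flatter than the assumed slope of -0.8, because the censored observations at 0 pull the fitted line up, while the Tobit estimate should land close to it.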
Note: on LHS of vertical line are differences for apartments, on RHS for single family houses
if we have a strong belief about the true DGP (or, as in this project, even know it), we might use maximum likelihood estimation (MLE)
remember the true DGP:
\[ \text{lgd} = 1-\frac{\text{real estate coll. MV}}{\text{loan amount}} * \text{e}^{-a-bX} - \frac{\text{other coll. MV}}{\text{loan amount}} * y \\ \phantom{a} \\ \text{whereby } X \sim \text{N}(0,1)\quad\text{and}\quad y \sim \text{Ber}(p) \]
using MLE, we would estimate the parameters \(a\), \(b\) and \(p\) of the DGP given the observed data
on the synthetic data, MLE should deliver the best estimates and hence would be a great benchmark
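For the simplest case, the MLE has a closed form. The sketch below assumes loans without additional collateral and no cap on LGD, so that \(\log\bigl((1-\text{lgd})/\text{ratio}\bigr) = -a - bX \sim \text{N}(-a, b^2)\). The parameter values and data are assumptions for illustration.

```r
set.seed(5)
n <- 1000
a_true <- 0.25
b_true <- 0.10               # assumed values, for illustration only
ratio <- runif(n, 0.8, 1.2)  # real estate coll. MV / loan amount
lgd <- 1 - ratio * exp(-a_true - b_true * rnorm(n))

# 1 - lgd = ratio * exp(-a - bX)  =>  log((1 - lgd) / ratio) ~ N(-a, b^2)
z <- log((1 - lgd) / ratio)
a_hat <- -mean(z)
b_hat <- sqrt(mean((z - mean(z))^2))  # ML estimate uses divisor n, not n - 1

c(a = a_hat, b = b_hat)
```

Only the absolute value of \(b\) is identified, since \(X\) is symmetric around zero. Estimating \(p\) and accounting for capped observations would require a more involved likelihood and is not covered by this sketch.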
Source: H. Scheule, D. Rösch, and B. Baesens (2016). Credit Risk Analytics (Measurement Techniques, Applications, and Examples in SAS). Chapter 10.
Deployed version accessible via:
https://betiko.shinyapps.io/acrm/
Source code available on GitHub:
Estimation of LGD on single loan level
potential use case: loan officers can estimate the LGD for a given loan and might only grant the loan if the LGD is below a certain threshold (or might request more additional collateral from the client so that the LGD falls below the threshold)
Estimation of LGD on simulated portfolio
potential use case: managers can get a sense of how high the expected LGD resp. expected loss might be for \(n\) new loans, to decide how many loans of each type they want to approve
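The portfolio use case could be sketched as follows for apartment loans without additional collateral. The value ranges of the simulated loans are assumptions, and the code is not taken from the app; only the coefficient 0.77 comes from the two-step results for apartments.

```r
set.seed(9)
n <- 100
# Hypothetical new apartment loans; the value ranges are assumptions
loan_amount <- runif(n, 4e5, 1.2e6)
mortgage_mv <- loan_amount * runif(n, 0.9, 1.4)

beta_1 <- 0.77  # apartment coefficient from the two-step results
lgd <- pmin(pmax(1 - beta_1 * mortgage_mv / loan_amount, 0), 1)

portfolio_lgd_chf <- sum(lgd * loan_amount)
# Share of the portfolio lost if every loan defaulted
portfolio_lgd_chf / sum(loan_amount)
```

Multiplying each loan's LGD by an assumed probability of default before summing would turn this into an expected loss figure.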
.
├── R
│ ├── datahandling.R
│ ├── input_validation.R
│ ├── models.R
│ ├── simulation.R
│ └── utils.R
├── README.md
├── acrm.Rproj
├── calibrated_models
│ ├── corporate_office_building.rds
│ ├── linear_regression.rds
│ ├── private_appartment.rds
│ └── private_single_family_house.rds
├── data
│ └── lgd_dataset.csv
├── server.R
├── test
│ └── test_workflow.R
└── ui.R
server <- function(input, output) {
  observeEvent(input$estimate, {
    # ...
    validation_result <- validate_input(model_input)
    if (any(lapply(validation_result, function(x) x[["type"]]) == "warning")) {
      # ... display warnings in box
    }
    if (any(lapply(validation_result, function(x) x[["type"]]) == "error")) {
      # ... display errors in box
    }
    # ...
  })
}
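The server code above expects `validate_input()` to return a list whose elements each carry a `type`. The actual implementation lives in `R/input_validation.R` and is not shown here; the following is a hypothetical sketch of what such a function could look like, with made-up rules and messages.

```r
# Hypothetical validate_input(): returns a list of findings,
# each with a type ("warning" or "error") and a message.
validate_input <- function(model_input) {
  findings <- list()
  if (!is.numeric(model_input$loan_amount) || model_input$loan_amount <= 0) {
    findings <- c(findings, list(list(
      type = "error", message = "Loan amount must be positive."
    )))
  }
  if (isTRUE(model_input$mortgage_collateral_mv < model_input$loan_amount)) {
    findings <- c(findings, list(list(
      type = "warning", message = "Collateral is worth less than the loan."
    )))
  }
  findings
}

result <- validate_input(list(loan_amount = -1, mortgage_collateral_mv = 500000))
types <- vapply(result, function(x) x[["type"]], character(1))
any(types == "error")
```

Returning all findings at once, rather than stopping at the first problem, lets the app show warnings and errors together.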
server <- function(input, output) {
  observeEvent(input$estimate, {
    # ...
    if (any(lapply(validation_result, function(x) x[["type"]]) == "error")) {
      output$lgd_estimation <- renderInfoBox({
        valueBox(
          "Calculation not possible",
          "please see error message",
          icon = icon("bug"),
          color = "red"
        )
      })
    } else {
      # ... predict LGD
    }
  })
}
server <- function(input, output) {
  observeEvent(input$estimate, {
    # ...
    if (any(lapply(validation_result, function(x) x[["type"]]) == "error")) {
      # ... handle error
    } else {
      estimated_lgd <- two_step_estimation_estimate(input) |>
        cap_prediction()
      estimated_lgd_nom <- estimated_lgd * model_input$loan_amount
      output$lgd_estimation <- renderValueBox({
        # ... display predicted LGD
      })
    }
  })
}
Limitations:
Possible extensions:
Thank you for your attention!