Loss Given Default

Developing a Loss-Given-Default (LGD) model

Bela Koch

March 26, 2024

Introduction

Theory

Definition of LGD:

Loss Given Default (LGD): loss that arises in default event
LGD commonly expressed as ratio of loss on exposure to the amount outstanding at default
Basel Capital Accord: LGD should measure economic loss, i.e. include all potential costs and benefits of default event
- example of costs: administrative costs, legal costs, time delays, …
- example of benefits: interest/penalties for delay, commissions, …
- all cash flows are discounted to the time of the default event
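The economic view above can be sketched in a few lines of R. This is only an illustration: the helper name `economic_lgd`, the cash flow amounts, the timing and the discount rate are assumptions and are not taken from the dataset used later.

```r
# Hypothetical helper: economic LGD from recovery and cost cash flows.
# ead        exposure at default
# recoveries recovered amounts (benefits) per cash flow date
# costs      workout costs per cash flow date
# times      time in years between default and each cash flow
# rate       annual discount rate
economic_lgd <- function(ead, recoveries, costs, times, rate) {
  # Discount each net cash flow back to the time of the default event
  discount <- (1 + rate)^(-times)
  pv_net_recovery <- sum((recoveries - costs) * discount)
  1 - pv_net_recovery / ead
}

# Illustrative numbers only
lgd_example <- economic_lgd(
  ead = 100000,
  recoveries = c(0, 80000),
  costs = c(2000, 3000),
  times = c(0.5, 1.5),
  rate = 0.05
)
lgd_example
```

The dataset in this project contains a non-economic LGD, so costs and discounting play no role in the models that follow.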

Theory (cont.)

Definition of Default:

use real default for modeling LGD: default due to financial problems or insolvency of obligor
important: use same default definition for modeling LGD and PD
after default event, various actions can take place:
- cure: defaulter pays back all outstanding debt
- restructuring: recovery plan, e.g. extension of loan maturity to reduce monthly payments
- liquidation: bank takes possession of collateral assets and sells them

The Dataset

synthetic data
mortgage portfolio of one bank
1453 observations in total of defaulted customers
each row in data set is one independent loan

variable	data type	description
customer	categorical	type of customer (private or corporate)
loan_amount	numerical	loan granted to customer in CHF
real_estate_type	categorical	type of mortgage (single family house, apartment, office building)
mortgage_collateral_mv	numerical	market value of mortgage collateral at time of origination
additional_collateral_type	categorical	type of additional collateral (retirement account, cash account, none)
additional_collateral_mv	numerical	market value of additional collateral at time of origination
lgd	numerical	(non-economic) loss given default in percent of loan granted

target variable

lgd: (non-economic) Loss Given Default
continuous variable from 0 to 0.69 (theoretically up to 1)
57.67% of defaulted positions did not result in a loss

categorical predictors

continuous predictors

compare segments: loss given default

compare segments: recovered value of mortgage collateral

compare segments: recovered value of additional collateral

The true DGP

The loss given default data was modeled as follows:

\[ \text{lgd} = 1-\frac{\text{real estate coll. MV}}{\text{loan amount}} * \text{e}^{-a-bX} - \frac{\text{other coll. MV}}{\text{loan amount}} * y \]

whereby

\[ X \sim \text{N}(0,1)\quad\text{and}\quad y \sim \text{Ber}(p) \]

Parameters \(a\) and \(b\) are fixed but different for client types and real estate types (private apartment, private house, corporate office)

Parameter \(p\) can take one of three different values, depending on the customer type and collateral type (private retirement, private cash, corporate cash)
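A minimal sketch of how data could be drawn from this DGP for one segment. The parameter values `a`, `b`, `p` and the ranges of the loan and collateral values are assumptions for illustration; the values used to generate the actual dataset are not stated here. Flooring at 0 and capping at 1 is also an assumption, suggested by the large share of zero-loss observations.

```r
set.seed(1)
n <- 1000

# Assumed illustrative inputs for one segment
loan_amount   <- runif(n, 2e5, 1e6)
mortgage_mv   <- loan_amount * runif(n, 0.9, 1.5)
additional_mv <- loan_amount * runif(n, 0, 0.2)
a <- 0.25
b <- 0.05
p <- 0.8

x <- rnorm(n)                        # X ~ N(0, 1)
y <- rbinom(n, size = 1, prob = p)   # y ~ Ber(p)

# DGP as stated above
lgd_raw <- 1 - mortgage_mv / loan_amount * exp(-a - b * x) -
  additional_mv / loan_amount * y

# Assumption: observed LGD is restricted to [0, 1]
lgd <- pmin(pmax(lgd_raw, 0), 1)
mean(lgd == 0)   # share of defaults without a loss in this simulation
```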

The true DGP: intuition

\[ \color{grey}{\text{lgd} = 1-\frac{\text{real estate coll. MV}}{\text{loan amount}}} *\mathbf{\color{black}{\text{e}^{-a-bX}}} \color{grey}{- \frac{\text{other coll. MV}}{\text{loan amount}} * y } \]

\(e^{-a-bX} > 0\) is the share of the market value that is obtained when selling the collateral; it lies in \((0, 1]\) whenever \(a + bX \geq 0\)
\(b\) in combination with \(X\sim N(0,1)\) introduces variability in the haircuts between different loans

The true DGP: intuition (cont.)

\[ \color{grey}{\text{lgd} = 1-\frac{\text{real estate coll. MV}}{\text{loan amount}} *\text{e}^{-a-bX} - \frac{\text{other coll. MV}}{\text{loan amount}}} \mathbf{* y} \]

\[ \text{Ber}(p) = y = \begin{cases} 1 & \text{with probability } p\\ 0 & \text{with probability } 1-p \end{cases} \Rightarrow y = \begin{cases} 1 & \text{if (whole) additional collateral can be liquidated}\\ 0 & \text{if none of additional collateral can be liquidated} \end{cases} \]

Modelling

What are we trying to model?

Scenario: A customer defaults

Question: How much do we lose due to this default?

\[ loss\,given\,default = \max\biggl(0,\;\frac{loan - Revenue\,from\,selling\,collateral}{loan\,amount}\biggr) \]

Problem: We are forced to sell (potentially) under market value

\[ Revenue\,from\,selling\,collateral = (1-haircut) * market\,value \]

whereby \(haircut\in[0,1]\) is unknown in advance
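The two formulas above can be combined in a small worked example. The helper name `lgd_from_haircut` and all numbers are hypothetical.

```r
# Hypothetical helper combining the two formulas above
lgd_from_haircut <- function(loan_amount, market_value, haircut) {
  stopifnot(haircut >= 0, haircut <= 1)
  revenue <- (1 - haircut) * market_value
  # Floor at 0: the bank does not keep more than the outstanding loan
  pmax(0, (loan_amount - revenue) / loan_amount)
}

lgd_from_haircut(500000, 600000, 0.25)  # revenue 450000, loss share 0.1
lgd_from_haircut(500000, 800000, 0.25)  # revenue 600000 exceeds the loan, so 0
```

In practice the haircut is not an input but the quantity to be estimated, which is what the models in the next sections do.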

What are we trying to model? (cont.)

Let’s define

\[ \beta_1 = (1-haircut_{mortgage}) \quad\text{and}\quad \beta_2 = (1-haircut_{additional\,collateral}) \]

Then, loss given default in our setting is given by

\[ nominal\,lgd_i = loan\,amount_i-\beta_1*mv\,mortgage_i-\beta_2*mv\,additional_i \]

resp.

\[ lgd_i = 1 - \beta_1 * \frac{mv\,mortgage_i}{loan\,amount_i} - \beta_2 * \frac{mv\,additional_i}{loan\,amount_i} \]

The haircuts might depend on the type of collateral

example: an apartment might be sold more easily, and therefore with a smaller haircut, than an office building (risk driver: liquidity in the corresponding market)

The Champion Model

Two Step Simple Regression Model

Step 1:

Only consider loans of certain mortgage type without additional collateral. Estimate \(\beta_1\):

\[ lgd_i = 1 - \beta_1*\frac{mv\,mortgage_i}{loan\,amount_i} + \epsilon_i \]

Step 2:

Now consider loans of same mortgage type but with additional collateral. Estimate \(\beta_2\):

\[ 1 - lgd_i - \hat{\beta}_1*\frac{mv\,mortgage_i}{loan\,amount_i} = \beta_2*\frac{mv\,additional_i}{loan\,amount_i} + \epsilon_i \]

assumption: \(\beta_1\frac{mv\,mortgage_i}{loan\,amount_i}\) and \(\beta_2\frac{mv\,additional_i}{loan\,amount_i}\) independent
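The two steps can be sketched on simulated data as follows. The data frame, its column names and the "true" values 0.77 and 0.82 are assumptions for illustration only; this is not the project's actual estimation code.

```r
set.seed(42)
n <- 400

# Simulated segment: first half of the loans has no additional collateral
df <- data.frame(
  ratio_mortgage   = runif(n, 0.6, 1.0),
  ratio_additional = c(rep(0, n / 2), runif(n / 2, 0.05, 0.2))
)
df$lgd <- 1 - 0.77 * df$ratio_mortgage - 0.82 * df$ratio_additional +
  rnorm(n, sd = 0.02)
df$has_additional <- df$ratio_additional > 0

# Step 1: loans without additional collateral, regression through the origin
step_1 <- lm(I(1 - lgd) ~ 0 + ratio_mortgage, data = df,
             subset = !has_additional)
beta_1_hat <- unname(coef(step_1))

# Step 2: regress the part not explained by step 1 on the additional collateral
df$residual_recovery <- 1 - df$lgd - beta_1_hat * df$ratio_mortgage
step_2 <- lm(residual_recovery ~ 0 + ratio_additional, data = df,
             subset = has_additional)
beta_2_hat <- unname(coef(step_2))

c(beta_1 = beta_1_hat, beta_2 = beta_2_hat)
```

Because step 2 reuses the estimate from step 1, any error in \(\hat{\beta}_1\) carries over into \(\hat{\beta}_2\).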

Step 1: estimate \(\beta_1\) (example for apartment)

\[ 1-lgd_i = \beta_1 * \frac{mortgage\,collateral\,mv_i}{loan\,amount_i} \]


Call:
lm(formula = y ~ 0 + x)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.140193 -0.023305  0.006439  0.028488  0.094793 

Coefficients:
  Estimate Std. Error t value            Pr(>|t|)    
x 0.767774   0.002265     339 <0.0000000000000002 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.04399 on 226 degrees of freedom
Multiple R-squared:  0.998, Adjusted R-squared:  0.998 
F-statistic: 1.149e+05 on 1 and 226 DF,  p-value: < 0.00000000000000022

Step 2: estimate \(\beta_2\) (example for retirement account)

\[ 1 - lgd_i-\hat{\beta}_1*\frac{mortgage\,collateral\,mv_i}{loan\,amount_i} = \beta_2*\frac{additional\,collateral\,mv_i}{loan\,amount_i} \]


Call:
lm(formula = y ~ 0 + x)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.23179 -0.01373  0.01717  0.04128  0.09469 

Coefficients:
  Estimate Std. Error t value            Pr(>|t|)    
x  0.81690    0.02313   35.31 <0.0000000000000002 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.04532 on 395 degrees of freedom
Multiple R-squared:  0.7594,    Adjusted R-squared:  0.7588 
F-statistic:  1247 on 1 and 395 DF,  p-value: < 0.00000000000000022

Results

percentage from MV	private apartments	private houses	corporate offices
mortgage	0.77 (apartment)	0.73 (single family house)	0.66 (office building)
additional collateral	0.82 (retirement account)	0.87 (retirement account)	0.94 (cash account)

Interpretation: on average 77% of the market value of the apartment measured at time of origination of the loan can be recovered if the apartment must be sold due to a default
Differences between the real estate types could be explained by liquidity in corresponding markets (assumption)
Differences between the additional collateral types could be explained by characteristics of the account used and characteristics of the corresponding obligor

Predict

predict LGD using \(\hat{\beta}_1\) and \(\hat{\beta}_2\):

\[ \widetilde{lgd}_i = 1 - \hat{\beta}_1*\frac{mortgage\,collateral\,mv_i}{loan\,amount_i} - \hat{\beta}_2 * \frac{additional\,collateral\,mv_i}{loan\,amount_i} \]

to ensure that the predicted LGD is between 0 and 1, the following transformation is applied after prediction:

\[ \widehat{lgd}_i = \min(\max(\widetilde{lgd}_i, 0), 1) \]
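Prediction and capping can be sketched as a single function. The name `predict_lgd` is hypothetical; the default coefficients are the apartment and retirement account estimates from the results table.

```r
# Hypothetical helper: predict LGD and restrict the result to [0, 1]
predict_lgd <- function(loan_amount, mortgage_mv, additional_mv,
                        beta_1 = 0.77, beta_2 = 0.82) {
  lgd_raw <- 1 - beta_1 * mortgage_mv / loan_amount -
    beta_2 * additional_mv / loan_amount
  pmin(pmax(lgd_raw, 0), 1)
}

predict_lgd(500000, 550000, 0)       # 1 - 0.77 * 1.1 = 0.153
predict_lgd(500000, 700000, 50000)   # raw value is negative, so 0
```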

Evaluation: Portfolio Level

Evaluation: per real estate type

The realised LGD for our portfolio was 1174.87 million CHF, while our model would have estimated an LGD of 1187.62 million CHF; hence we overestimated LGD by 12.75 million CHF.

Tweak model coefficients

As a bank we might decide that we prefer to overestimate rather than underestimate the LGD. To check whether we over- or underestimate LGD on average, let’s define \(x\) as the difference between the observed and estimated LGD for a given real estate type:

\[ \mu(x) \begin{cases} > 0 & \text{if we underestimate LGD on average}\\ < 0 & \text{if we overestimate LGD on average}\\ = 0 & \text{if we neither under- nor overestimate LGD on average} \end{cases} \]

Therefore, let’s test:

\[ H_0: \mu(x) \geq 0 \quad vs \quad H_A: \mu(x)<0 \]

With a p-value of 0.00, we can reject \(H_0\) for apartments. With p-values of 0.06 for single family houses and 0.21 for office buildings, we cannot reject \(H_0\) for these two segments.
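One way to run such a one-sided test in R is a t-test with `alternative = "less"`. Whether a t-test is the test behind the p-values above is an assumption, and the differences below are simulated rather than taken from the portfolio.

```r
set.seed(7)

# Simulated differences (observed minus estimated LGD); purely illustrative
x <- rnorm(200, mean = -0.01, sd = 0.04)

# H_0: mu(x) >= 0  vs  H_A: mu(x) < 0
test_result <- t.test(x, mu = 0, alternative = "less")
test_result$p.value
```

A small p-value speaks against \(H_0\), i.e. in favour of overestimating LGD on average in that segment.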

Tweak model coefficients (cont.)

ad hoc modification: decrease the coefficients for single family house and office building by one percentage point each:

percentage from MV	private apartments	private houses	corporate offices
mortgage	0.77 (apartment)	0.72 (single family house)	0.65 (office building)
additional collateral	0.82 (retirement account)	0.86 (retirement account)	0.93 (cash account)

After decreasing the coefficients for single family houses and office buildings and their corresponding additional collateral by one percentage point each, we can reject \(H_0\) for each segment with a p-value of 0.00.

Evaluation: ad hoc coefficients

The realised LGD for our portfolio was 1174.87 million CHF, while our model with the ad hoc coefficients would have estimated an LGD of 1293.19 million CHF; hence we overestimated LGD by 118.32 million CHF.

Alternative Models

Single Step Simple Regression Model

First, create a single variable for each collateral type \(j\) with the property

\[ mv\,collateral_{j,i} = \begin{cases} mv\,collateral_i & \text{if collateral of type } j \text{ is used for loan } i\\ 0 & \text{otherwise} \end{cases} \]

Then, run the following regression:

\[ lgd_i = 1 - \sum_j\beta_j*\frac{mv\,collateral_{j,i}}{loan\,amount_i} + \epsilon_i \]

Idea: we get all coefficients of interest in one regression
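A sketch of how the per-type regressors could be built. The toy rows and the category labels (e.g. `"single family house"`) are assumptions about how the data is encoded; the regressor names follow the regression output of the single step model, and the division by the loan amount follows the regression equation above.

```r
# Toy data using the column names from the dataset description
df <- data.frame(
  real_estate_type = c("apartment", "single family house", "office building"),
  additional_collateral_type = c("retirement account", "none", "cash account"),
  loan_amount = c(500000, 800000, 2000000),
  mortgage_collateral_mv = c(550000, 900000, 2400000),
  additional_collateral_mv = c(40000, 0, 150000)
)

mortgage_ratio   <- df$mortgage_collateral_mv / df$loan_amount
additional_ratio <- df$additional_collateral_mv / df$loan_amount

# One regressor per collateral type: the ratio if that type backs the loan, else 0
df$appartment_collateral <- mortgage_ratio * (df$real_estate_type == "apartment")
df$house_collateral      <- mortgage_ratio * (df$real_estate_type == "single family house")
df$office_collateral     <- mortgage_ratio * (df$real_estate_type == "office building")
df$retirement_collateral <- additional_ratio *
  (df$additional_collateral_type == "retirement account")
df$cash_collateral       <- additional_ratio *
  (df$additional_collateral_type == "cash account")
```

With the full dataset, these five columns are the regressors of a regression of `lgd - 1` through the origin.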

Single Step Simple Regression Model (cont.)


Call:
lm(formula = I(lgd - 1) ~ 0 + house_collateral + appartment_collateral + 
    office_collateral + retirement_collateral + cash_collateral, 
    data = df)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.20987 -0.06016 -0.01270  0.04014  0.48781 

Coefficients:
                       Estimate Std. Error t value            Pr(>|t|)    
house_collateral      -0.742875   0.006215 -119.54 <0.0000000000000002 ***
appartment_collateral -0.775577   0.004157 -186.57 <0.0000000000000002 ***
office_collateral     -0.665729   0.004440 -149.93 <0.0000000000000002 ***
retirement_collateral -0.751765   0.059018  -12.74 <0.0000000000000002 ***
cash_collateral       -0.883656   0.067523  -13.09 <0.0000000000000002 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1053 on 1448 degrees of freedom
Multiple R-squared:  0.9874,    Adjusted R-squared:  0.9873 
F-statistic: 2.268e+04 on 5 and 1448 DF,  p-value: < 0.00000000000000022

Single Step Simple Regression Model (cont.)

single step regression:

percentage from MV	private apartments	private houses	corporate offices
mortgage	0.78	0.74	0.67
additional collateral	0.75	0.75	0.88

two step regression:

percentage from MV	private apartments	private houses	corporate offices
mortgage	0.77	0.72	0.65
additional collateral	0.82	0.86	0.93

running the linear model on different subsamples or with feature engineering (for example by adding polynomials, using (log of) nominal values, dummies, new features such as \(\frac{\text{loan amount}}{\sum\text{collateral mv}}\), …) did not really change the results

Tobit Regression

assumption: there is an underlying linear model:

\[ loss\,given\,default = \frac{loan - revenue\,from\,selling\,collateral}{loan} \]

technically, if the revenue from selling collateral exceeds loan amount (which can be the case), loss given default would be negative
however, the bank won’t receive more than the loan in this scenario (remember: non-economic LGD), hence:

\[ loss\,given\,default = \max\biggl(0, \frac{loan - revenue\,from\,selling\,collateral}{loan}\biggr) \]

additionally, the bank cannot lose more than it lent to the customer (remember: non-economic LGD)

Tobit Regression (cont.)

result: LGD is censored with a lower limit of 0 and an upper limit of 1
Let’s define \(y\) as the loss given default without the censoring process

\[ lgd_i = \min(\max(0, y_i), 1) \]

the tobit regression model accounts for this censoring
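To make the censoring mechanism concrete, here is a base R sketch of a two-limit tobit likelihood, maximised with `optim`. It assumes normal errors and simulated data; the function name `tobit_loglik` and all parameter values are hypothetical. It is not the `AER::tobit` call used for the results, which uses a logistic distribution.

```r
# Log-likelihood of a two-limit tobit model with normal errors
tobit_loglik <- function(par, y, X, lower = 0, upper = 1) {
  k <- ncol(X)
  mu <- drop(X %*% par[seq_len(k)])
  sigma <- exp(par[k + 1])            # log-scale keeps sigma positive
  at_lower <- y <= lower
  at_upper <- y >= upper
  inside <- !at_lower & !at_upper
  ll <- numeric(length(y))
  # Censored observations contribute the probability of being beyond the limit
  ll[at_lower] <- pnorm((lower - mu[at_lower]) / sigma, log.p = TRUE)
  ll[at_upper] <- pnorm((upper - mu[at_upper]) / sigma,
                        lower.tail = FALSE, log.p = TRUE)
  # Uncensored observations contribute the usual normal density
  ll[inside] <- dnorm(y[inside], mean = mu[inside], sd = sigma, log = TRUE)
  sum(ll)
}

set.seed(3)
n <- 2000
ratio <- runif(n, 0.8, 1.6)                        # collateral MV / loan amount
latent <- 1 - 0.77 * ratio + rnorm(n, sd = 0.05)   # assumed latent model
lgd <- pmin(pmax(latent, 0), 1)                    # censoring at 0 and 1
X <- cbind(1, ratio)

# Start from OLS on the censored data, then maximise the tobit likelihood
ols <- lm(lgd ~ ratio)
start <- unname(c(coef(ols), log(summary(ols)$sigma)))
fit <- optim(par = start, fn = tobit_loglik, y = lgd, X = X,
             method = "BFGS", control = list(fnscale = -1))
fit$par[1:2]   # intercept and slope of the latent model
```

Plain OLS on the censored `lgd` would pull the slope towards zero, because all loans with a negative latent loss are recorded as exactly 0.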

Tobit Regression: Results


Call:
AER::tobit(formula = lgd ~ appartment_collateral + house_collateral + 
    retirement_collateral, left = 0, right = 1, dist = "logistic", 
    data = df[df$customer == "private", ])

Observations:
         Total  Left-censored     Uncensored Right-censored 
           842            617            225              0 

Coefficients:
                      Estimate Std. Error z value             Pr(>|z|)    
(Intercept)            0.93431    0.14793   6.316     0.00000000026894 ***
appartment_collateral -0.81429    0.11954  -6.812     0.00000000000962 ***
house_collateral      -0.72907    0.11746  -6.207     0.00000000053971 ***
retirement_collateral -0.78708    0.14184  -5.549     0.00000002873278 ***
Log(scale)            -2.74016    0.06004 -45.639 < 0.0000000000000002 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Scale: 0.06456 

Logistic distribution
Number of Newton-Raphson Iterations: 4 
Log-likelihood: -85.26 on 5 Df
Wald-statistic: 100.8 on 3 Df, p-value: < 0.000000000000000222

Tobit Regression: Evaluation

Note: on LHS of vertical line are differences for apartments, on RHS for single family houses

Maximum Likelihood Estimation

if we have a strong belief about the true DGP (or, as in this project, even know it), we might use maximum likelihood estimation (MLE)
remember the true DGP:

\[ \text{lgd} = 1-\frac{\text{real estate coll. MV}}{\text{loan amount}} * \text{e}^{-a-bX} - \frac{\text{other coll. MV}}{\text{loan amount}} * y \\ \phantom{a} \\ \text{whereby } X \sim \text{N}(0,1)\quad\text{and}\quad y \sim \text{Ber}(p) \]

using MLE, we would estimate the parameters \(a\), \(b\) and \(p\) of the DGP given the observed data
with the synthetic data, MLE should deliver the best estimates and would hence be a great benchmark
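For loans without additional collateral and without flooring at 0, the DGP can be rewritten as \(\log\bigl((1-\text{lgd}) / \text{ratio}\bigr) = -a - bX \sim N(-a, b^2)\), where ratio is the real estate collateral MV divided by the loan amount. In this special case the MLE has a closed form. The sketch below uses simulated data with assumed values for \(a\) and \(b\); estimating \(p\) and handling the zero-loss observations would need the full likelihood.

```r
set.seed(11)
n <- 5000
a <- 0.25
b <- 0.05                                   # assumed values for illustration
ratio <- runif(n, 0.9, 1.2)                 # real estate coll. MV / loan amount
lgd <- 1 - ratio * exp(-a - b * rnorm(n))   # no additional collateral, no flooring

z <- log((1 - lgd) / ratio)                 # equals -a - b * X
a_hat <- -mean(z)
b_hat <- sqrt(mean((z - mean(z))^2))        # MLE uses the 1/n variance

c(a = a_hat, b = b_hat)
```

Only the magnitude of \(b\) is identified here, since \(X\) is symmetric around zero.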

overview of possible models

Source: H. Scheule, D. Rösch, and B. Baesens (2016). Credit Risk Analytics (Measurement Techniques, Applications, and Examples in SAS). Chapter 10.

The Application

Demo

Deployed version accessible via:

https://betiko.shinyapps.io/acrm/

Source code available on GitHub:

https://github.com/bt-koch/acrm

Functionalities

Estimation of LGD on single loan level

potential use case: loan officers can estimate the LGD for a given loan and might only grant the loan if the LGD is below a certain threshold (and might request more additional collateral from the client so that the LGD falls below the threshold)

Estimation of LGD on simulated portfolio

potential use case: managers can get a feeling for how high the expected LGD resp. expected loss might be for \(n\) new loans, to decide how many loans of each type they want to approve

Implementation

.
├── R
│   ├── datahandling.R
│   ├── input_validation.R
│   ├── models.R
│   ├── simulation.R
│   └── utils.R
├── README.md
├── acrm.Rproj
├── calibrated_models
│   ├── corporate_office_building.rds
│   ├── linear_regression.rds
│   ├── private_appartment.rds
│   └── private_single_family_house.rds
├── data
│   └── lgd_dataset.csv
├── server.R
├── test
│   └── test_workflow.R
└── ui.R

Server Logic

server <- function(input, output) {
  
  observeEvent(input$estimate, {
    ...
  })
}

Server Logic: get user input

server <- function(input, output) {
  observeEvent(input$estimate, {
    model_input <- input |>
      read_input() |>
      preprocess_data()
    ...
  })
}

Server Logic: validate user input

server <- function(input, output) {
  observeEvent(input$estimate, {
    ...
    validation_result <- validate_input(model_input)
    
    if (any(lapply(validation_result, function(x) x[["type"]]) == "warning")) {
      ... display warnings in box
    }
    
    if (any(lapply(validation_result, function(x) x[["type"]]) == "error")) {
      ... display errors in box
    }
    ...
  })
}
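The checks above rely on the validation result being a list whose elements each carry a `type` field. A minimal sketch of a validator with that shape follows; the function names, the concrete checks and the messages are hypothetical and not taken from the app's `input_validation.R`.

```r
# Hypothetical validator returning a list of findings with a "type" field
validate_input_sketch <- function(model_input) {
  result <- list()
  if (model_input$loan_amount <= 0) {
    result[[length(result) + 1]] <- list(
      type = "error", message = "loan amount must be positive"
    )
  }
  if (model_input$mortgage_collateral_mv < model_input$loan_amount) {
    result[[length(result) + 1]] <- list(
      type = "warning", message = "collateral is worth less than the loan"
    )
  }
  result
}

# Hypothetical helper: is there at least one finding of the given type?
has_type <- function(validation_result, type) {
  any(vapply(validation_result,
             function(x) identical(x[["type"]], type),
             logical(1)))
}

checks <- validate_input_sketch(
  list(loan_amount = 500000, mortgage_collateral_mv = 400000)
)
has_type(checks, "warning")
has_type(checks, "error")
```

Using `vapply` with `logical(1)` guarantees a logical vector even when the list of findings is empty.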

Server Logic: handle error

server <- function(input, output) {
  observeEvent(input$estimate, {
    ...
    if (any(lapply(validation_result, function(x) x[["type"]]) == "error")) {
      
      output$lgd_estimation <- renderInfoBox({
        valueBox(
          "Calculation not possible",
          "please see error message",
          icon = icon("bug"),
          color = "red"
        )
      })
      
    } else {
      ... predict LGD
    }
  })
}

Server Logic: predict LGD

server <- function(input, output) {
  observeEvent(input$estimate, {
    ...
    if (any(lapply(validation_result, function(x) x[["type"]]) == "error")) {
      ... handle error
    } else {
      estimated_lgd <- two_step_estimation_estimate(input) |> 
        cap_prediction()
      estimated_lgd_nom <- estimated_lgd * model_input$loan_amount
      
      output$lgd_estimation <- renderValueBox({
        ... display predicted LGD
      })
    }
  })
}

Conclusion

Critical Review

Limitations:

very simple model (but therefore easy to interpret)
for some loans, we underestimate the LGD drastically (by up to 0.5)
tweaking the coefficients could have been optimized (minimize the shrinking of coefficients needed to fall below the desired p-value)
only liquidation event considered, cures or recoveries not included in modeling process
other sources obligor might use to cover the obligations in default event are ignored
only non-economic LGD is modeled
synthetic data (probably) does not reflect all real world dynamics

Critical Review (cont.)

Possible extensions:

with more information about the mortgaged property (e.g. region, size, year of construction, etc.), a more precise estimate of its haircut might be obtained
with more information about the counterparty (e.g. age of private client, business model of corporate client, etc.), a more precise estimate of the haircut on the retirement resp. cash account might be obtained

Questions?

Thank you for your attention!