Use R to explore a real-life data set, then preprocess it so that it is in the appropriate format before applying the credit risk models. First, I examined the dataset
loan_data, which is discussed in the video and used throughout the exercises in DataCamp.
- Goal: understand the number and percentage of defaults.
- To learn more about variable structures and spot unexpected tendencies in the data
- Examine the relationship between loan_status and factor variables such as grade.
- Default information is stored in the response variable loan_status, where 1 represents a default and 0 represents a non-default.
For example, you would expect that the proportion of defaults in the group of customers with
grade G (worst credit rating score) is substantially higher than the proportion of defaults in the
grade A group (best credit rating score).
- EL = PD * EAD * LGD
Components of expected loss (EL): probability of default (PD), exposure at default (EAD), loss given default (LGD).
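As a rough illustration of the formula above, the sketch below computes the expected loss for a single loan. The values for PD, EAD and LGD are made-up numbers chosen for illustration; they are not taken from loan_data.

```r
# Hypothetical inputs for one loan (illustrative values, not from loan_data)
pd  <- 0.05    # probability of default: 5%
ead <- 10000   # exposure at default: amount outstanding if the borrower defaults
lgd <- 0.40    # loss given default: share of the exposure that is lost

# Expected loss: EL = PD * EAD * LGD
el <- pd * ead * lgd
print(el)
```

With these assumed inputs the expected loss works out to 200, in the same currency unit as EAD.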
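# Load the gmodels package, which provides CrossTable()
library(gmodels)
# Call CrossTable() on grade and loan_status, showing row proportions only
CrossTable(loan_data$grade, loan_data$loan_status, prop.r = TRUE,
           prop.c = FALSE, prop.t = FALSE, prop.chisq = FALSE)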
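For readers without the gmodels package, a roughly comparable view of the counts and row proportions can be sketched in base R with table() and prop.table(). The toy_loans data frame below is a hypothetical stand-in for loan_data with made-up values, used only so the example runs on its own.

```r
# Hypothetical stand-in for loan_data (made-up values, for illustration only)
toy_loans <- data.frame(
  grade       = factor(c("A", "A", "A", "A", "G", "G", "G", "G")),
  loan_status = c(0, 0, 0, 1, 1, 1, 0, 1)
)

# Number and overall proportion of defaults (loan_status == 1)
n_defaults   <- sum(toy_loans$loan_status == 1)
default_rate <- mean(toy_loans$loan_status == 1)

# Counts of loan_status within each grade
counts <- table(toy_loans$grade, toy_loans$loan_status)

# Row proportions (margin = 1), comparable to prop.r = TRUE in CrossTable()
row_props <- prop.table(counts, margin = 1)
print(row_props)
```

In this toy data the share of defaults in grade G (0.75) is higher than in grade A (0.25), which is the kind of pattern one would look for in the real loan_data.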
- Use hist() to create a histogram with only one argument: loan_data$loan_amnt. Assign the result to a new object called hist_1.
- Use $breaks along with the object hist_1 to get more information on the histogram breaks. Knowing the location of the breaks is important because if they are poorly chosen, the histogram may be misleading.
- Change the number of breaks in hist_1 to 200 by specifying the breaks argument. Additionally, name the x-axis "Loan amount" using the xlab argument and title it "Histogram of the loan amount" using the