The evolution of credit data and the role of machine learning in modern credit scoring
Synopsis
Establishing a stable and healthy financial system is vital for governments, banks, and other financial institutions. Credit risk is one of the issues banks face: without proper management, it arises from bad commercial loans. Instead of imposing high interest rates on all applicants to compensate for potential bad loans, banks can benefit from gathering informative data about prospective applicants and classifying their creditworthiness based on that data. High-speed computers and the growing availability of large databases have made Machine Learning (ML) methods an efficient option for credit scoring and evaluation. These methods can automate credit scoring, but they are often deemed too complex for the rationale behind their output to be understood. This raises the question of how far the model and the scoring can be simplified, and the trade-offs involved can be challenging. Credit scoring classification is considered in three respects: (1) the algorithms and techniques used, (2) how to assess and compare the performance of the classification models, and (3) the practical levels at which this evaluation can be done. The majority of ML methods, such as Neural Networks (NN), Extreme Learning Machines (ELM), Support Vector Machines (SVM), and Gradient Boosting Trees (GBT), are regarded as black-box methods and are difficult to interpret. Decision Trees (DT) are evaluated with integrated techniques such as Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP), alongside K-Nearest Neighbors (KNN), Naïve Bayes models, and linear logit models in which nonlinear transformations can be sorted out. Non-linear methods and transforms are regarded as more opaque because they rely on combinations of large numbers of predetermined linear and non-linear factors, and GVFs are still used [2].
The performance of credit scoring classification methods is evaluated primarily with binary classification measures such as True Positive Rate (TPR), True Negative Rate (TNR), False Positive Rate (FPR), and False Negative Rate (FNR), in addition to domain-specific and related metrics compared against European Union (EU) policy-driven thresholds. The cost-benefit ratio and its effect on the ranking of classifiers are also covered, with regard to decreasing data and increased bank profitability.
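
As a minimal sketch of the measures named above, the following Python computes TPR, TNR, FPR, and FNR for two hypothetical classifiers and compares them under a simple cost weighting. The label convention (1 = bad loan, 0 = good loan), the toy data, the function names, and the cost values are all illustrative assumptions and are not taken from the study; the point is only that ranking by accuracy and ranking by cost can disagree.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    tpr: float  # bad loans correctly flagged
    tnr: float  # good loans correctly accepted
    fpr: float  # good loans wrongly flagged
    fnr: float  # bad loans wrongly accepted


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn), assuming 1 = bad loan (positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def confusion_rates(y_true, y_pred):
    """Compute the four binary classification rates."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    positives, negatives = tp + fn, tn + fp
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present in y_true")
    return Rates(tp / positives, tn / negatives, fp / negatives, fn / positives)


def accuracy(y_true, y_pred):
    tp, tn, _, _ = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def expected_cost(y_true, y_pred, cost_fn=5.0, cost_fp=1.0):
    """Average misclassification cost per applicant.

    cost_fn and cost_fp are hypothetical: accepting a bad loan is assumed
    to cost five times as much as rejecting a good one.
    """
    _, _, fp, fn = confusion_counts(y_true, y_pred)
    return (cost_fn * fn + cost_fp * fp) / len(y_true)


# Toy data: three bad loans and five good loans (illustrative only).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
clf_a = [1, 1, 0, 0, 0, 0, 0, 1]  # cautious about flagging applicants
clf_b = [1, 1, 1, 0, 0, 1, 1, 1]  # flags aggressively

for name, pred in (("A", clf_a), ("B", clf_b)):
    print(name, confusion_rates(y_true, pred),
          "accuracy:", accuracy(y_true, pred),
          "cost:", expected_cost(y_true, pred))
```

On this toy data, classifier A has the higher accuracy while classifier B has the lower expected cost, so the preferred model depends on which criterion is used for ranking.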