Risk and compliance analytics using machine learning
Synopsis
Risk is an essential concept in all walks of life. Every decision carries an inherent level of risk. One can choose to be exposed to and exploit risk, or to avoid risk and thus (presumably) accept lower expected returns. However, both the risk-averse and the risk-seeking need clear risk analytics in order to know quantitatively where they stand. In the absence of such knowledge, even those who exploit risk may be left exposed to far greater risk than they anticipated; similarly, the risk-averse should understand where their exposure starts and stops. The subject of risk analytics spans a wide horizon, from estimating the risk of exotic derivatives, such as path-dependent options in finance, to forecasting the propagation of risk through a complex network of manufacturing machines. All these pursuits are ultimately based on data; whether it is market data, sensor data, or even expert knowledge, that knowledge is distilled and put into play by way of data. Data-driven modelling and analytics provide a digital representation of the real world used to quantify, manage, and analyse risks. Risk projections are then computed under the chosen risk model. Like any data-driven effort, risk analytics faces data challenges at each stage of the analytics workflow; these are outlined here as data difficulties and addressed with unit operations and performance measures. Frameworks for auditing data during risk analytics are outlined, and network metrics are derived for this audit process. The solutions provided are meant as high-level guidelines, opening a conduit for academia to provide even greater granularity.
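To make the notion of data-driven risk quantification concrete, a minimal sketch of historical-simulation Value-at-Risk (VaR) is shown below. The return series and confidence level are illustrative assumptions, not figures from the text; only the method itself (an empirical loss quantile) is standard.

```python
# Minimal sketch of data-driven risk quantification: historical-simulation
# Value-at-Risk (VaR). The return series is hypothetical.

def historical_var(returns, confidence=0.95):
    """Loss threshold exceeded with probability (1 - confidence),
    estimated as an empirical quantile of observed losses."""
    losses = sorted(-r for r in returns)      # losses as positive numbers
    index = int(confidence * len(losses))     # empirical quantile index
    index = min(index, len(losses) - 1)
    return losses[index]

# Hypothetical daily portfolio returns
daily_returns = [0.01, -0.02, 0.003, -0.015, 0.007, -0.03, 0.012, -0.001]
var_95 = historical_var(daily_returns, confidence=0.95)
```

With only eight observations the 95% quantile coincides with the worst observed loss (0.03); in practice far longer return histories would be used.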
All classification models were built on ensemble learning, using an original credit-scoring architecture based on xgboost. Hyperparameter tuning was conducted manually to maximise accuracy. Following global testing via extended cross-validation with stratified K-fold partitioning of the dataset, all models used in the feedback-bucketing process were model-agnostic, so novel methods could be integrated into every solution stream. These capabilities included several dimensionality-reduction approaches, optimised restriction of dataset size and feature ranges, and two sampling methods that address the class-imbalance problem typical of banking analysis datasets. Finally, a unified module was built for cumulative-interest benefit calculations, simulations, and visualisation of the models' performance metrics.