Part I: Types and Complexity of Models, and Unobservable or Omitted Variables or Relationships
Since the financial crisis, articles about the “failure of models” have become common. A recent piece in Scientific American, for example, critiqued financial model “calibration,” proclaiming in its title, Why Economic Models are Always Wrong. In the mortgage business, it is important to understand where models have continued to work as well as where they failed, and what this means for the future of your servicing and origination business.
I also see a loose understanding of best practices for handling the shortcomings of models that do work, and of the comparative strengths and weaknesses of alternative judgmental decision processes. Statistical business models driven by extensive and growing data offer automation efficiencies, consistency, valuable added insights, and testability for reliability and robustness; they remain all around us today, and their use continues to expand. So regardless of your views on the value and uses of models, it is important to have a clear view of, and sound strategies for, model usage.
A Categorization: Ten Types of Models
Business models used by financial institutions can be placed in more than ten categories, of course, but here are ten prominent general types of models:
1. Statistical credit scoring models (typically for default)
2. Consumer- or borrower-response models
3. Consumer- or borrower-characteristic prediction models
4. Loss given default (LGD) and exposure at default (EAD) models
5. Optimization tools (these are not models, per se, but mathematical algorithms that often use inputs from models)
6. Loss forecasting and simulation models and value-at-risk (VaR) models
7. Valuation, option pricing, and risk-based pricing models
8. Profitability forecasting and enterprise-cash-flow projection models
9. Macroeconomic forecasting models
10. Financial-risk models that model complex financial instruments and interactions
Types 8, 9, and 10, for example, are often built up from multiple component models; for this reason and others, these model categories are not mutually exclusive. Types 1 through 3 can also be built from either individual-level data (typical) or group-level data. No categorization of models is perfect, and this list is not intended to be exhaustive.
The Strain of Complexity (or Model Ambition)
The principle of Occam’s razor in model building, roughly translated, parallels the business dictum to “keep it simple, stupid.” Indeed, the general ordering of model types 1 through 10 above (you can quibble on the details) tends to correspond to growing complexity, or growing model ambition.
Model types 1 and 2 typically forecast a rank-ordering rather than also forecasting a level. Credit scoring, for example, typically seeks to rank-order consumers by their likelihood of default, loss, or other outcomes, without attempting to project the actual level of default rates across the score distribution. Scoring models that also predict levels add a layer of complexity.
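The distinction between rank-ordering and level can be made concrete with a small sketch. The score bands, loan counts, and default counts below are invented for illustration, and the pairwise concordance measure (AUC) is just one common way to quantify rank-ordering; it is not taken from any particular scoring vendor. The point is that default levels can double while the rank-ordering measure barely moves.

```python
from itertools import product

def auc(scores, defaulted):
    """Share of (defaulter, non-defaulter) pairs in which the defaulter
    has the lower score; ties count one half. 0.5 means no separation."""
    bad = [s for s, d in zip(scores, defaulted) if d]
    good = [s for s, d in zip(scores, defaulted) if not d]
    wins = sum(1.0 if b < g else 0.5 if b == g else 0.0
               for b, g in product(bad, good))
    return wins / (len(bad) * len(good))

def build_portfolio(bands, defaults_per_band):
    """Expand (score, loan count) bands into loan-level lists."""
    scores, defaulted = [], []
    for (score, n_loans), n_bad in zip(bands, defaults_per_band):
        scores += [score] * n_loans
        defaulted += [True] * n_bad + [False] * (n_loans - n_bad)
    return scores, defaulted

# Hypothetical score bands: (score, number of loans).
bands = [(580, 100), (640, 100), (700, 100), (760, 100)]

# Hypothetical default counts per band; the stressed case doubles every band.
benign = build_portfolio(bands, [16, 8, 4, 2])
stressed = build_portfolio(bands, [32, 16, 8, 4])

auc_benign = auc(*benign)
auc_stressed = auc(*stressed)
rate_benign = sum(benign[1]) / len(benign[1])
rate_stressed = sum(stressed[1]) / len(stressed[1])

print(f"benign:   default rate {rate_benign:.3f}, AUC {auc_benign:.3f}")
print(f"stressed: default rate {rate_stressed:.3f}, AUC {auc_stressed:.3f}")
```

A model judged only on rank-ordering would look nearly unchanged between the two cases, while a model judged on level would be off by a factor of two.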
In addition, model types 1 through 3 are generally unconditional predictors. They make no attempt to add the dimension of predicting the time path of the dependent variable. Predicting not just a consumer’s relative likelihood of an event over a future time period as a whole, for example, but also the event’s frequency level and time path of this level each year, quarter, or month, is a more complex and ambitious modeling endeavor. (This problem is generally approached through continuous or discrete hazard models.)
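As a minimal sketch of what the added time dimension involves, the discrete-time hazard arithmetic below converts per-period conditional default rates into a time path of defaults. The hazard values are hypothetical, and this is only the bookkeeping step, not an estimated hazard model. The two illustrative borrowers have the same default probability over the window as a whole but very different timing, which a whole-window predictor cannot distinguish.

```python
def default_time_path(hazards):
    """hazards[t] is the probability of default in period t, given
    survival to the start of period t. Returns the unconditional
    per-period default probabilities and their running total."""
    surviving = 1.0
    marginal, cumulative = [], []
    total = 0.0
    for h in hazards:
        p = surviving * h          # default in this period, seen from time 0
        marginal.append(p)
        total += p
        cumulative.append(total)
        surviving *= 1.0 - h       # share still performing afterwards
    return marginal, cumulative

# Hypothetical annual hazards: same values, opposite ordering.
front_loaded = [0.05, 0.03, 0.01]
back_loaded = [0.01, 0.03, 0.05]

marg_front, cum_front = default_time_path(front_loaded)
marg_back, cum_back = default_time_path(back_loaded)

print("front-loaded path:", [round(p, 4) for p in marg_front])
print("back-loaded path: ", [round(p, 4) for p in marg_back])
print("three-year totals:", round(cum_front[-1], 6), round(cum_back[-1], 6))
```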
While generalizations can be hazardous (exceptions can typically be found), it is generally true that, in the events leading up to and surrounding the financial crisis, greater model complexity and ambition were correlated with greater model failure. For example, at what is perhaps an extreme, Coval, Jurek, and Stafford (2009) have demonstrated how, for model type 10, even slight unexpected changes in default probabilities and correlations had a substantial impact on the expected payoffs and ratings of typical collateralized debt obligations (CDOs) with subprime residential mortgage-backed securities as their underlying assets. Nonlinear relationships in complex systems can generate extreme unreliability of system predictions.
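The flavor of this nonlinearity can be sketched with a textbook approximation. The code below is not the calculation in the cited paper; it uses the standard one-factor Gaussian copula formula for a large homogeneous pool (often called the Vasicek approximation), assumes 100% loss given default, and uses made-up values for the attachment point, default probability, and correlation.

```python
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution

def prob_pool_loss_exceeds(attachment, pd, rho):
    """Probability that the default fraction of a large homogeneous pool
    exceeds `attachment`, under a one-factor Gaussian copula with default
    probability `pd` and asset correlation `rho` (0 < rho < 1).
    Assumes 100% loss given default, so default fraction equals loss."""
    z = ((1.0 - rho) ** 0.5 * _N.inv_cdf(attachment) - _N.inv_cdf(pd)) / rho ** 0.5
    return 1.0 - _N.cdf(z)

# Hypothetical tranche that starts taking losses once 20% of the pool defaults.
attachment = 0.20

# Hypothetical "expected" inputs versus modestly worse ones.
base = prob_pool_loss_exceeds(attachment, pd=0.05, rho=0.10)
stressed = prob_pool_loss_exceeds(attachment, pd=0.08, rho=0.20)

print(f"P(tranche is hit), base inputs:     {base:.4f}")
print(f"P(tranche is hit), stressed inputs: {stressed:.4f}")
print(f"ratio: {stressed / base:.1f}x")
```

Under these illustrative inputs, a three-point rise in default probability combined with a higher correlation raises the chance of the tranche being hit by more than an order of magnitude, which is the kind of sensitivity the paragraph above describes.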
To a lesser but still significant degree, the mortgage- or housing-related models included or embedded in types 6 through 10 were heavily dependent on home-price projections and risk simulation, which caused significant “expected”-model failures after 2006. Home-price declines in 2007-2009 reached what had previously only been simulated as extreme and very unlikely stress paths. Despite this clear problem, given the inescapably large impact of home prices on any mortgage model or decision system (of any kind), it is generally acceptable to separate the failure of the home-price projection from any failure of the relative default and other model relationships built around the possible home-price paths. In other words, if a model of type 8, for example, predicted the actual profitability and enterprise cash flow quite well given the actual extreme path of home prices, then this model can be reasonably regarded as not having failed as a model per se, despite the clear but inescapable reliance of the model’s level projections on the uncertain home-price outcomes.
Models of type 1, statistical credit scoring models, generally continued to work well or reasonably well both in the years preceding and during the home-price meltdown and financial crisis. This is very largely due to these models’ relatively modest objective of simply rank-ordering risks. To be sure, scoring models in mortgage, and more generally, were strongly impacted by the home-price declines and unusual events of the bubble and subsequent recession, with deteriorated strength in risk separation. This can be seen, for example, in the recent VantageScore stress-test study, VantageScore 2.0 Stress Testing, which shows the lowest risk-separation ability in the states with the worst home-price and unemployment outcomes (CA, AZ, FL, NV, MI). But these significant yet comparatively modest magnitudes of deterioration were neither debilitating nor permanent for these models. In short, even in mortgage, scoring models generally held up pretty well through the crisis: not perfectly, but comparatively better than the more complex level-, system-, and path-prediction models. (See footnote 1.)
Scoring models have also relied more exclusively on microeconomic behavioral stabilities, rather than including macroeconomic risk modeling. Fortunately, the microeconomic behavioral patterns have generally been much more stable. Weak-credit borrowers, for example, have long tended to default at significantly higher rates than strong-credit borrowers. They did so preceding, and right through, the financial crisis, even as overall default levels changed dramatically, and they continue to do so today, in both strong and weak housing markets. (See footnote 2.)
As a general rule, the more complex and ambitious the model, the more numerous and complex the questions that must be asked about what could go wrong. But relative complexity is certainly not the only source of model risk. Sometimes relative simplicity, otherwise typically desirable, can go in a wrong direction.
Unobservable or Omitted Variables or Relationships
No model can be perfect, for many reasons. Important determining variables may be unmeasured or unknown. Similarly, important parameters and relationships may differ significantly across different types of populations, and different time periods. How many models have been routinely “stress tested” on their robustness in handling different types of borrower populations (where unobserved variables tend to lurk) or different shifts in the mix of borrower sub-populations? This issue is more or less relevant depending on the business and statistical problem at hand, but overall, modeling practice has tended more often than not to neglect robustness testing (i.e., tests of validity and model power beyond validation samples).
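One simple form of such a robustness test is to measure risk separation within each borrower sub-population rather than only on the pooled validation sample. The sketch below uses the Kolmogorov-Smirnov (KS) statistic as the separation measure. The segment names, scores, and outcomes are invented for illustration, and the code assumes every segment contains both defaulters and non-defaulters.

```python
def ks_statistic(scores, defaulted):
    """Largest gap between the cumulative score distributions of
    defaulters and non-defaulters. 0 means no separation."""
    bad = [s for s, d in zip(scores, defaulted) if d]
    good = [s for s, d in zip(scores, defaulted) if not d]
    best = 0.0
    for cut in sorted(set(scores)):
        bad_share = sum(1 for s in bad if s <= cut) / len(bad)
        good_share = sum(1 for s in good if s <= cut) / len(good)
        best = max(best, abs(bad_share - good_share))
    return best

def ks_by_segment(records):
    """records: iterable of (segment, score, defaulted) tuples."""
    grouped = {}
    for segment, score, d in records:
        scores, flags = grouped.setdefault(segment, ([], []))
        scores.append(score)
        flags.append(d)
    return {seg: ks_statistic(s, f) for seg, (s, f) in grouped.items()}

def make_records(segment, score, n_loans, n_bad):
    """Expand a hypothetical score band into loan-level records."""
    return [(segment, score, i < n_bad) for i in range(n_loans)]

# Hypothetical data: the score separates risk for refinance borrowers
# but not for a newly added purchase-money population.
records = (
    make_records("refinance", 600, 10, 4) + make_records("refinance", 700, 10, 1)
    + make_records("purchase", 600, 10, 3) + make_records("purchase", 700, 10, 3)
)

result = ks_by_segment(records)
for segment, ks in result.items():
    print(f"{segment}: KS = {ks:.2f}")
```

A pooled KS on these records would still show some separation, which is how a weak segment can hide inside an acceptable overall validation result.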
Several related examples from the last decade appeared in models that were used to help evaluate subprime loans. These models used generic credit scores together with LTV, and perhaps a few other variables (or not), to predict subprime mortgage default risks in the years preceding the market meltdown. This was a hazardous extension of relatively simple model structures that worked better for prime mortgages (but had also previously been extended there). Because, for example, the large majority of subprime borrowers had weak credit records, generic credit scores did not help nearly as much to separate risk. Detailed credit attributes, for example, were needed to help better predict the default risks in subprime. Many pre-crisis subprime models of this kind were thus simplified but overly so, as they began with important omitted variables.
This was not the only omitted-variables problem in this case, and not the only problem. Other observable mortgage risk factors were oddly absent in some models. Unobserved credit risk factors also tend to be correlated with observed risk factors, creating greater volatility and unexplained levels of higher risk in observed higher-credit-risk populations. Traditional subprime mortgages also focused mainly on poor-credit borrowers who needed cashout refinancing for debt consolidation or some other purpose. Such borrowers, in shaky financial condition, were more vulnerable to economic shocks, but a debt-consolidating cashout mortgage could put them in a better position, with lower total monthly debt payments that were tax deductible. So far, so good. But an omitted capacity-risk variable was the number of previous cashout refinancings done (which loan brokers were incented to “churn”). The housing bubble allowed weak-capacity borrowers to sustain themselves through more extracted home equity, until the music stopped. Rate and fee structures of many subprime loans further heightened capacity risks. A significant population shift also occurred when subprime mortgage lenders significantly raised their allowed LTVs and added many more shaky purchase-money borrowers last decade; previously targeted affordable-housing programs from the banks and conforming-loan space had instead generally required stronger credit histories and capacity. Significant shifts like this in any modeled population require very extensive model robustness testing and scrutiny. But instead, projected subprime-pool losses from the major purchasers of subprime loans, and the ratings agencies, went down in the years just prior to the home-price meltdown, not up (to levels well below those seen in widely available private-label subprime pool losses from 1990s loans).
Rules and Tradition in Lieu of Sound Modeling
Interestingly, however, these errant subprime models were not models that came into use in lender underwriting and automated underwriting systems for subprime—the front-end suppliers of new loans for private-label subprime mortgage-backed securities. Unlike the conforming-loan space, where automated underwriting using statistical mortgage credit scoring models grew dramatically in the 1990s, underwriting in subprime, including automated underwriting, remained largely based on traditional rules.
These rules were not bad at rank-ordering the default risks, as traditional classifications of subprime A-, B, C and D loans showed. However, the rules did not adapt well to changing borrower populations and growing home-price risks either. Generic credit scores improved for most subprime borrowers last decade as they were buoyed by the general housing boom and economic growth. As a result, subprime-lender-rated C and D loans largely disappeared and the A- risk classifications grew substantially.
Moreover, in those few cases where statistical credit scoring models were estimated on subprime loans, they identified and separated the risks within subprime much better than the traditional underwriting rules. (I authored an invited article in the Journal of Housing Research early last decade, which included a graph, p. 222, that demonstrated this.) But statistical credit scoring models were scarcely or never used in most subprime mortgage lending.
In Part II, I’ll discuss where models are most needed now in mortgages.
Footnote 1: While credit scoring models performed better than most others, modelers can certainly do more to improve and learn from the performance declines at the height of the home-price meltdown. Various approaches have been undertaken to seek such improvements.
Footnote 2: Even strategic mortgage defaults, while comprising a relatively larger share of strong-credit borrower defaults, have not significantly changed the traditional rank-ordering, as strategic defaults occur across the credit spectrum (weaker credit histories include borrowers with high income and assets).