Sharad Chaudhary

An Introduction to MachineSP

The Problem

The US has one of the largest and most developed mortgage markets in the world, with nearly two-thirds of all households holding a mortgage. Daily transaction volumes in securities created by pooling these mortgages rank second only to US Treasuries. Together, these two factors create significant demand for tracking the repayment rates of mortgagors, for both micro reasons (analyzing individual investments) and macro reasons (assessing the financial health of consumers and of institutions holding mortgage-backed securities), across a range of market participants (mutual funds, hedge funds, commercial banks, regulators, etc.).

Over time, three of the key government entities responsible for guaranteeing mortgage securitizations (the “Agencies”), Fannie Mae, Freddie Mac, and Ginnie Mae, have responded to this demand by releasing more data on mortgage-backed securities (MBS) and the underlying loans that collateralize them. Most of the growth in publicly available loan-level mortgage data has come in the last ten years, and the total volume of data currently available is on the order of 10 billion records.

However, we believe that the market has barely scratched the surface in using this data for activities such as inferring housing market conditions, predicting borrower repayment rates, and/or trading MBS. Most large financial institutions are still wedded to traditional IT systems that are inadequate for datasets of this magnitude. Occupied with the fallout from the Financial Crisis, they have also been slow to adopt the enormous recent advances in cloud computing, distributed processing, and machine learning. Smaller specialized technology vendors have addressed some of these gaps, but they have: (a) tended to focus on the data alone instead of treating data exploration and model building as tightly coupled activities, (b) created platforms targeted at users who have the bandwidth and background to learn specialized querying languages, (c) not fully exploited the data because they lack in-house domain expertise, and/or (d) built systems that suffer from middling performance and out-of-date design. Finally, the pricing of these vendor systems has not fully adjusted to the fact that the total cost of technology ownership has dropped dramatically over the past few years due to the virtualization of software and hardware.

The Solution

Today, technologies appropriate to the scale of mortgage data are available. Their potential benefits include: 1) significant improvement in the performance and efficiency of traditional tools, 2) flexibility in accommodating a wide range of users and use cases, 3) dramatic cost reduction, and 4) new, truly loan-level analytics. On the other hand, these technologies, which have emerged over the last few years, also require specialized expertise, along with substantial research and experimentation to adapt them to domain needs.

MachineSP’s data and analytics platform, StoryBook, leverages these technological advancements:

  • Both data and hardware are available on demand on the cloud, with no need for users to maintain expensive servers or software licenses.

  • Agency MBS loan- and pool-level data from eMBS is available in normalized form and is integrated with relevant external data, such as state-level house price indices and mortgage rates (for estimating refinancing incentives).

  • Standard MBS trading and research workflows are supported through a flexible and intuitive user interface. Examples include associating any pool with its prepayment benchmark, sorting a heterogeneous collection of pools into homogeneous groups and analyzing their prepayment behavior, visualizing prepayment S-curves, and tracking the evolution of the prepayment S-curve over time.

  • Sub-second query response time for most queries.

There are numerous database and machine learning platforms that offer generalized industry solutions. StoryBook, by contrast, works specifically within the mortgage domain and incorporates extensive research and experimentation with different tools and technologies to optimize them for mortgage market use cases.
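
For readers unfamiliar with the prepayment S-curve mentioned in the workflows above, the following is a minimal Python sketch of how one could be tabulated from pool-level data. It is an illustration only and is not StoryBook's implementation or API. The function names, the tuple layout (WAC, balance, SMM), the 0.5-point bucket width, and the sample numbers are all assumptions made for this example. The sketch takes the refinancing incentive to be the pool's weighted-average coupon minus the current mortgage rate, groups pools into incentive buckets, and annualizes the balance-weighted single monthly mortality (SMM) into a CPR using the standard relation CPR = 1 - (1 - SMM)^12.

```python
import math
from collections import defaultdict


def smm_to_cpr(smm):
    """Annualize a single monthly mortality (SMM) fraction into a CPR fraction."""
    return 1.0 - (1.0 - smm) ** 12


def s_curve(pools, current_rate, bucket_width=0.5):
    """Tabulate balance-weighted CPR by refinancing-incentive bucket.

    pools: iterable of (wac, balance, smm) tuples (hypothetical layout);
           wac and current_rate are in percent, smm is a fraction.
    Returns a dict mapping each bucket's lower edge (in rate points) to its
    CPR, ordered from lowest to highest incentive.
    """
    prepaid = defaultdict(float)   # balance prepaid this month, per bucket
    balance = defaultdict(float)   # total balance, per bucket
    for wac, bal, smm in pools:
        incentive = wac - current_rate  # positive means "in the money"
        bucket = math.floor(incentive / bucket_width) * bucket_width
        prepaid[bucket] += bal * smm
        balance[bucket] += bal
    return {b: smm_to_cpr(prepaid[b] / balance[b]) for b in sorted(balance)}


if __name__ == "__main__":
    # Made-up pools for illustration: (WAC %, balance, SMM)
    sample = [
        (3.0, 100.0, 0.005),
        (3.2, 300.0, 0.005),
        (5.0, 200.0, 0.030),
    ]
    for bucket, cpr in s_curve(sample, current_rate=4.0).items():
        print(f"incentive >= {bucket:+.1f}: CPR {cpr:.1%}")
```

Plotting the resulting CPR against incentive gives the characteristic S shape: pools with negative incentive prepay slowly, and speeds rise as the incentive turns positive. Repeating the calculation for successive months is one simple way to track how the curve evolves over time.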
