This post was guest authored by AWS Advanced Consulting Partner Quantiphi.
The mortgage industry is highly complex and depends largely on documents for the information required at each stage of its business value chain. Day-to-day operations for mortgage underwriting, property appraisal, and mortgage insurance underwriting rely heavily on the comprehension of different types of documents. When documents move slowly between the business units of an organization, the overall approval process slows down, leading to a poor customer experience.
The mortgage loan approval process usually takes multiple weeks because a multitude of user-submitted documents are scrutinized at each stage to assess the underlying risk. Organizations need the right information at the right time to increase operational efficiency and improve document management.
In the wake of COVID-19, the mortgage industry is reeling under pressure to undergo a digital transformation to provide a better customer experience. Large companies are cutting down capital and operational expenditure to sustain operations. The need for operational efficiency is higher than ever.
This post analyzes the role of machine learning (ML) solutions in document extraction in the mortgage industry to enhance business operations.
We highlight the key aspects of Quantiphi’s document processing solution built on AWS, and unveil how it helped a US-based mortgage insurance company address document management challenges through artificial intelligence (AI) and ML techniques.
Quantiphi is an AWS Partner Network (APN) Advanced Consulting Partner with AWS competencies in Machine Learning, Financial Services, Data & Analytics, and DevOps. Quantiphi also has multiple AWS Service Delivery designations, recognizing its expertise in leveraging specific AWS services.
Lenders usually have to manually sieve through large volumes of loan packages containing structured and unstructured information to classify documents and identify key information. The identified information is further used for risk assessment. Most of this key information is usually contained in paragraphs, key-value pairs, and tables.
These lenders usually receive loan packages in bulk, containing different types of documents such as W2 forms, tax statements, 1008 forms, and so on. Currently, people have to classify these documents manually before extracting the relevant information. Mortgage firms are therefore looking for meaningful ways to incorporate cognitive capabilities into their existing mortgage processing pipelines. The goals are to automate the identification of key information, simplify risk scoring, and reduce manual effort.
Quantiphi’s cognitive document processing solution combines state-of-the-art AI and ML services from AWS with Quantiphi’s custom document processing models to digitize a wide variety of mortgage documents. Quantiphi’s solution leverages services like Amazon Textract, Amazon SageMaker, Amazon Comprehend, Amazon Kendra, and Amazon Augmented AI (Amazon A2I) to help mortgage firms extract information from structured and unstructured documents, classify them into document types, and further address needs around risk assessment through ML.
Mortgage underwriting is done to assess the underlying risk for each application by analyzing the multitudes of user-submitted documents, such as W2 or I9 forms, tax returns, loan application (1003) forms, underwriting transmittal (1008) forms, demographic addendum, credit reports, bank account statements, and paycheck stubs. For example, the underwriting transmittal (1008) form contains the summary of the key information used during the risk assessment such as monthly income, qualifying rate, property details, and occupancy status. Paycheck stubs are another example of such documents, used to understand a borrower’s income in order to be sure that the borrower is able to repay the loan.
Similarly, property appraisal documents such as chain of title documents and deed documents (assignment, trust, quitclaim), along with the property appraisal report, are used to complement the property valuation process. Deed documents are processed to establish ownership and legal rights to a property. For example, if the lender sells a mortgage loan to another lender, they need to issue an assignment of deed of trust to give the new lender the same legal rights to the property.
Based on the inherent structure of the different types of mortgage documents, we have defined three broad segments to classify these documents: structured, semi-structured, and unstructured.
Examples of structured documents include the loan application (1003) form, underwriting transmittal summary (1008) form, verification of employment (1005) form, and W2 form.
Consider the underwriting transmittal summary (1008) form. Quantiphi’s solution uses the standardized 1008 document as a reference for training, which is then used for extraction (see the following screenshot).
Key information that can be extracted from the 1008 form includes borrower and co-borrower names, property address, SSN, sales price, and appraised value.
Semi-structured documents include pay stubs, bank statements, credit reports, and loan estimates. Here, Quantiphi’s solution uses a generic key-value pair and table detection model to extract the relevant features. Searching for certain common keywords results in a more efficient extraction.
The following screenshot shows data extraction from a pay stub.
Key information that can be extracted from a pay stub includes paid period, deductions, net pay, 401K summary, and more.
Unstructured documents include deed documents, appraisal reports, and more. Consider the assignment of deed of trust. Quantiphi’s solution uses custom NLP techniques like entity recognition and syntax analysis to extract information (see the following screenshot).
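To make the key-value and keyword approach for semi-structured documents more concrete, the following is a minimal sketch, not Quantiphi’s implementation. It assumes the form data arrives in the block structure that Amazon Textract returns for form analysis (`Blocks` with `KEY_VALUE_SET`, `WORD`, and `Relationships` entries). The helper names `key_value_pairs` and `find_by_keywords`, and the hand-built `SAMPLE_RESPONSE`, are hypothetical and used only for illustration. Selection elements such as checkboxes are ignored.

```python
from typing import Dict, List


def _text_of(block: dict, blocks_by_id: Dict[str, dict]) -> str:
    """Join the WORD children of a block into one string."""
    words: List[str] = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = blocks_by_id[child_id]
            if child["BlockType"] == "WORD":
                words.append(child["Text"])
    return " ".join(words)


def key_value_pairs(response: dict) -> Dict[str, str]:
    """Map each detected form key to the text of its linked value block."""
    blocks_by_id = {b["Id"]: b for b in response["Blocks"]}
    pairs: Dict[str, str] = {}
    for block in response["Blocks"]:
        is_key = (block["BlockType"] == "KEY_VALUE_SET"
                  and "KEY" in block.get("EntityTypes", []))
        if not is_key:
            continue
        value_parts: List[str] = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                for value_id in rel["Ids"]:
                    value_parts.append(_text_of(blocks_by_id[value_id], blocks_by_id))
        pairs[_text_of(block, blocks_by_id)] = " ".join(value_parts)
    return pairs


def find_by_keywords(pairs: Dict[str, str], keywords: List[str]) -> Dict[str, str]:
    """Keep only pairs whose key contains one of the keywords (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return {k: v for k, v in pairs.items() if any(w in k.lower() for w in lowered)}


# Hand-built sample that mimics the response shape; the values are invented.
SAMPLE_RESPONSE = {
    "Blocks": [
        {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
         "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                           {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
        {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
         "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Net"},
        {"Id": "w2", "BlockType": "WORD", "Text": "Pay:"},
        {"Id": "w3", "BlockType": "WORD", "Text": "2,150.00"},
        {"Id": "k2", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
         "Relationships": [{"Type": "VALUE", "Ids": ["v2"]},
                           {"Type": "CHILD", "Ids": ["w4", "w5"]}]},
        {"Id": "v2", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
         "Relationships": [{"Type": "CHILD", "Ids": ["w6"]}]},
        {"Id": "w4", "BlockType": "WORD", "Text": "Pay"},
        {"Id": "w5", "BlockType": "WORD", "Text": "Period"},
        {"Id": "w6", "BlockType": "WORD", "Text": "01/01-01/15"},
    ]
}

pairs = key_value_pairs(SAMPLE_RESPONSE)
net_pay = find_by_keywords(pairs, ["net pay", "deductions"])
```

Filtering on a short keyword list keeps downstream logic independent of the exact label wording on each employer’s pay stub, which is one plausible reading of why keyword search makes extraction more efficient.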
Key information that can be extracted from the assignment of deed of trust includes the date of assignment, assignor, assignee, executor name, principal sum, and more.
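As a rough illustration of pulling such fields out of free text, the sketch below uses plain regular expressions as a much simpler stand-in for the entity recognition and syntax analysis described above. It is not Quantiphi’s model. The sample sentence, the field names, and the patterns are all assumptions made for this example, and real deeds vary far more in wording than these patterns allow.

```python
import re
from typing import Dict, Optional

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")


def _party_pattern(role: str):
    """Match a capitalized name that is immediately followed by ("Role")."""
    return re.compile(r'([A-Z][A-Za-z&. ]*?)\s*\("' + re.escape(role) + r'"\)')


# Hypothetical patterns; each exposes the wanted text as group 1.
PATTERNS = {
    "date_of_assignment": re.compile(r"\b((?:" + MONTHS + r") \d{1,2}, \d{4})\b"),
    "assignor": _party_pattern("Assignor"),
    "assignee": _party_pattern("Assignee"),
    "principal_sum": re.compile(r"principal sum of \$([\d,]+(?:\.\d{2})?)"),
}


def extract_deed_fields(text: str) -> Dict[str, Optional[str]]:
    """Return the first match for each field, or None when nothing matches."""
    fields: Dict[str, Optional[str]] = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


# Invented sample text, written only to exercise the patterns.
SAMPLE_DEED = (
    'This Assignment of Deed of Trust is made on March 3, 2015. '
    'For value received, First Example Bank ("Assignor") hereby assigns to '
    'Sample Mortgage LLC ("Assignee") all beneficial interest under the Deed '
    'of Trust securing the principal sum of $250,000.00.'
)

deed_fields = extract_deed_fields(SAMPLE_DEED)
```

Returning `None` for a missing field, rather than raising, makes it easy to send incomplete extractions to a human reviewer.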
Quantiphi’s cognitive document processing solution works across all types of structured and unstructured mortgage documents. Some key aspects of Quantiphi’s solution are as follows:
The extracted information can be further fed into a risk assessment module to enable risk scoring of submissions in which low-risk applications are auto-approved and high-risk applications are marked for human review.
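The routing step can be sketched as a simple threshold rule. This is an illustrative assumption, not the scoring logic of the actual solution: the post does not describe how the risk score is computed, so the sketch takes a score in the range 0 to 1 as given, and the `threshold` value of 0.3 is arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    application_id: str
    risk_score: float
    route: str  # "auto_approve" or "human_review"


def route_application(application_id: str, risk_score: float,
                      threshold: float = 0.3) -> Decision:
    """Auto-approve scores below the threshold; flag everything else for review."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be between 0 and 1")
    route = "auto_approve" if risk_score < threshold else "human_review"
    return Decision(application_id, risk_score, route)


low = route_application("APP-001", 0.12)
high = route_application("APP-002", 0.81)
```

Treating a score exactly at the threshold as high risk is a deliberately conservative choice: a borderline application costs a reviewer some time, whereas a wrongly auto-approved one carries credit risk.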
Quantiphi’s cognitive document processing solution is capable of achieving over 90% accuracy, provides substantial cost reductions, and facilitates better visibility of the mortgage processing workflow while assuring faster processing.
Let’s look at how Quantiphi built this solution by using a combination of AI and ML services provided by AWS.
Components used in the architecture ensure that the complete solution remains robust and scalable while providing high performance and reliability to process the incoming workload of documents in a cost-effective manner.
The following diagram illustrates the architecture of Quantiphi’s solution.
The architecture consists of the following elements:
For this post, we present a use case in which the client is a leading US-based mortgage insurance company with a suite of mortgage, risk, real estate, and title services.
The client had millions of scanned pages of mortgage documents containing both handwritten and typed content, which were manually parsed to extract information and classify them accordingly. Processing of new mortgage loans was extremely time-consuming due to manual handling of over 400 different document types.
Quantiphi developed an AI virtual assistant that takes user-uploaded documents and automatically classifies the documents and pages contained in them into different categories, such as bank statements, credit reports, tax returns, and property tax bills and statements.
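Once each page has a predicted category, one way to recover document boundaries inside an uploaded package is to merge runs of consecutive pages that share a label. The sketch below assumes the page-level labels already exist (the classifier itself is out of scope here), and the function name `group_pages` and the label strings are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple


def group_pages(page_labels: List[str]) -> List[Tuple[str, int, int]]:
    """Collapse consecutive same-label pages into (label, first_page, last_page).

    Page numbers are 1-indexed and inclusive.
    """
    segments: List[Tuple[str, int, int]] = []
    page = 1
    for label, run in groupby(page_labels):
        length = len(list(run))
        segments.append((label, page, page + length - 1))
        page += length
    return segments


# Hypothetical per-page predictions for a six-page upload.
labels = ["bank_statement", "bank_statement", "credit_report",
          "tax_return", "tax_return", "tax_return"]
segments = group_pages(labels)
```

A known limitation of this heuristic is that two adjacent documents of the same type are merged into one segment, so a production pipeline would need an additional first-page signal to split them.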
To make the extracted information easier to consume, the solution highlights the key entities present in each document with bounding boxes. The user can review and edit the extraction results via a custom reviewer UI tool, and the corrected results are then used for accuracy benchmarking and re-training.
Amazon QuickSight was used to build an interactive dashboard for presenting accuracy metrics and reconciliation.
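As an illustration of the kind of metric such a dashboard might present, the following sketch computes per-field extraction accuracy by comparing model output with reviewer-corrected values. The field names, the sample records, and the case-insensitive exact-match rule are assumptions for this example; the post does not specify how accuracy was measured.

```python
from typing import Dict, List


def field_accuracy(predicted: List[Dict[str, str]],
                   reviewed: List[Dict[str, str]]) -> Dict[str, float]:
    """Share of documents, per field, where the extracted value matches the
    reviewer-approved value (ignoring case and surrounding whitespace)."""
    correct: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for pred, truth in zip(predicted, reviewed):
        for field, expected in truth.items():
            total[field] = total.get(field, 0) + 1
            if pred.get(field, "").strip().lower() == expected.strip().lower():
                correct[field] = correct.get(field, 0) + 1
    return {field: correct.get(field, 0) / total[field] for field in total}


# Invented records: the second borrower name was corrected by a reviewer.
predicted = [
    {"borrower_name": "Jane Doe", "loan_amount": "250000"},
    {"borrower_name": "John Roe", "loan_amount": "180000"},
]
reviewed = [
    {"borrower_name": "Jane Doe", "loan_amount": "250000"},
    {"borrower_name": "John Rowe", "loan_amount": "180000"},
]
accuracy = field_accuracy(predicted, reviewed)
```

Reporting accuracy per field rather than as a single number shows which data points need more training examples.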
The solution successfully digitized the processing of more than 50 million pages yearly, with an accuracy of over 97% in classification and 90% in the extraction of more than 40 different data points like borrower’s name, loan amount, and so on.
Quantiphi succeeded in expanding the customer’s profit margin by lowering its document processing costs. Their processing efficiency was enhanced through quick and accurate extraction and detection of data while eliminating manual efforts to greatly reduce the loan processing time.
Traditional methods of mortgage loan processing are manual in nature and highly time-consuming. Customers are often asked to provide a large number of documents that lenders have to manually go through for assessment.
Quantiphi’s cognitive document processing solution expedites the process by automating information extraction from the documents. Mortgage companies can use Quantiphi’s solution to increase their operational efficiency and significantly reduce their mortgage processing time.
The content and opinions in this post are those of the third-party author and AWS is not responsible for the content or accuracy of this post.
Arnav Gupta is AWS Practice Head at Quantiphi.
Bhaskar Kalita is FSI Head at Quantiphi.