
A leading specialty insurance provider automates data extraction from its loss run reports with ML

Some industries, such as construction, healthcare, environmental services, and energy, cannot be adequately covered by standard insurance policies. Specialty insurance is drafted with these industries' high-risk and unique nature in mind. The client is a leader in this space and provides specialty insurance to both businesses and individuals in the US.

For its commercial automobile insurance operations, the client receives loss run reports from many different carrier agencies. These reports are generated by the carriers' proprietary systems and show the claims made against each insurance policy. Because so many carriers are involved, the reports arrive in diverse file formats and structures, with no standardization.

To analyze risk and support future underwriting, the client had to manually identify, extract, and store the values of key fields (entities) from these reports. Staff opened each file, searched for the key field values, and entered them into a spreadsheet. This manual processing of unstructured reports was extremely time-consuming and tedious, and it often led to delays and inaccuracies in the claims data.

Persistent, drawing on its digital consultancy expertise, built a modernized, ML-powered solution to automate this process. The solution first used Google Cloud Vision API, an ML-powered service, to convert all the files into a single standard text format. An entity extraction model was then trained on AutoML to identify key fields such as carrier name, type, and amount. While training the model, the Persistent team discovered that even newer file formats follow some level of consistency, so entities can be extracted from them too. The model achieved a precision of 98.72%, a recall of 98.81%, and an accuracy of 98.76%.
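To make the precision and recall figures more concrete, here is a small sketch of how they are typically computed for entity extraction. The case study does not say how Persistent scored the model, so this assumes exact-match, span-level scoring: a prediction counts as correct only if its label and character offsets both match a labeled span. The labels, offsets, and the span_precision_recall helper are hypothetical. Accuracy is left out because it requires a count of true negatives, which is not well defined for span extraction without further assumptions.

```python
from typing import Set, Tuple

# A labeled span: (entity label, start offset, end offset) in the report text.
# Labels and offsets here are hypothetical, for illustration only.
Span = Tuple[str, int, int]


def span_precision_recall(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float]:
    """Exact-match, span-level precision and recall (assumed scoring scheme)."""
    # A true positive is a predicted span identical to a labeled (gold) span.
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical annotations for one loss run report (not data from the project).
gold = {("carrier_name", 0, 12), ("amount", 40, 48), ("type", 60, 70)}
predicted = {
    ("carrier_name", 0, 12),  # exact match
    ("amount", 40, 48),       # exact match
    ("type", 58, 70),         # wrong start offset, so not counted
    ("amount", 80, 86),       # spurious prediction (false positive)
}

precision, recall = span_precision_recall(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")  # prints: precision=0.50 recall=0.67
```

Precision penalizes spurious predictions and recall penalizes missed fields, so reporting both, as the case study does, guards against a model that looks good on only one of them.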

A suite of Google Cloud products that power the ML solution

Document AI Solution is the backbone of this project. Its suite of APIs turns document sources such as PDFs into usable data. Google Cloud Vision API was chosen for text extraction from the PDFs, and AutoML was used to quickly train the model from labeled data. All the reports are stored in Cloud Storage. Compute Engine runs the conversion of reports from PDF to TXT to JSONL, and App Engine Flex is used to update the entity extraction workflow.
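The PDF-to-TXT-to-JSONL step can be sketched as follows, under stated assumptions. Vision API's asynchronous PDF OCR writes JSON output files whose "responses" list holds one entry per page, each with a "fullTextAnnotation" containing the page text; the sketch joins those pages into plain text and wraps it as one JSONL record. The {"text_snippet": {"content": ...}} layout assumes the legacy AutoML Natural Language import format, and the helper names and sample content are hypothetical; check the current import schema before relying on it.

```python
import json
from typing import Any, Dict, List


def ocr_output_to_text(ocr_output: Dict[str, Any]) -> str:
    """Join page-level text from one Vision API async PDF OCR output file (assumed layout)."""
    pages: List[str] = []
    for response in ocr_output.get("responses", []):
        # Pages where no text was detected may lack a fullTextAnnotation.
        annotation = response.get("fullTextAnnotation", {})
        pages.append(annotation.get("text", ""))
    return "\n".join(pages)


def text_to_jsonl_line(text: str) -> str:
    """Wrap plain text as one JSONL record (assumed legacy AutoML import schema)."""
    return json.dumps({"text_snippet": {"content": text}})


# Hypothetical OCR output for a two-page loss run report.
sample = {
    "responses": [
        {"fullTextAnnotation": {"text": "Carrier: Example Mutual\n"}},
        {"fullTextAnnotation": {"text": "Claim No: 0001\n"}},
    ]
}

text = ocr_output_to_text(sample)   # the intermediate "txt" stage
print(text_to_jsonl_line(text))     # one line of the JSONL file
```

Keeping the intermediate plain-text stage separate makes it easy to inspect OCR quality before the JSONL files are handed to model training.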

A more robust approach with efficiency and productivity gains

For the client, this method has proved more robust than an ETL-based approach. Document AI has helped the client reduce error rates, increase processing volume, shorten processing turnaround time, boost productivity, and gain additional insights.

A large set of entities is now extracted automatically, including account name, claim number, feature type, payment status, reserve, and recoveries. The processed reports are available for easy search, query, and consumption, and are used to derive actionable insights. The client can now quickly assign risk scores to its policies.
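Structured records are what make the extracted entities searchable. As a hedged sketch, assuming the extraction output has already been grouped into (label, value) pairs per claim, the following flattens each claim into a row, runs a simple query, and writes CSV, the same spreadsheet-style layout the manual process produced. The column names, file name, values, and helper functions are hypothetical; a real deployment might instead load the rows into a queryable data store.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple

# Entity fields named in the case study; these column names are hypothetical.
FIELDS = ["account_name", "claim_number", "feature_type",
          "payment_status", "reserve", "recoveries"]


def entities_to_row(source_file: str, entities: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Collapse (label, value) pairs for one claim into a flat record."""
    row = {"source_file": source_file, **{field: "" for field in FIELDS}}
    for label, value in entities:
        # Ignore unknown labels and keep the first value seen for each field.
        if label in FIELDS and not row[label]:
            row[label] = value
    return row


def rows_to_csv(rows: List[Dict[str, str]]) -> str:
    """Serialize the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["source_file", *FIELDS])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical extractions for two claims from one report.
rows = [
    entities_to_row("report-001.pdf", [("claim_number", "0001"),
                                       ("payment_status", "Open"),
                                       ("reserve", "5000")]),
    entities_to_row("report-001.pdf", [("claim_number", "0002"),
                                       ("payment_status", "Closed")]),
]

# A simple query over the structured data: claims that are still open.
open_claims = [row["claim_number"] for row in rows if row["payment_status"] == "Open"]
print(open_claims)  # prints: ['0001']
print(rows_to_csv(rows))
```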

Asheesh Sharma, Google Business Unit, Persistent
