Intelligent Entity Extraction & Recognition

In many industries, including banking and finance, a large number of documents such as forms and invoices still exist only on paper. Identifying and importing information from these paper documents or their scans takes considerable manual effort. Although plenty of OCR systems can digitize paper documents and convert them to machine-readable text, identifying the text that corresponds to relevant entities, such as the names of banks, people, and cities, remains a challenge.

Persistent Systems has developed a solution that can be easily integrated with different OCR systems to digitize scanned document pages and identify relevant entities in them. Built from multiple machine learning components, the solution can be tuned for different kinds of entity extraction and regularly retrained with new feedback, which improves its accuracy over time. This intelligent entity extraction solution has been deployed successfully with multiple customers, providing more than 90% coverage and accuracy for many different types of entity extraction.
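
As a rough illustration of the step that follows OCR, the sketch below pulls candidate entities out of digitized text with regular expressions. It is a minimal example under stated assumptions, not the deployed solution: the `Candidate` type, the `PATTERNS` table, and the two patterns themselves are hypothetical and cover only one date layout and US-style dollar amounts.

```python
import re
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A candidate entity found in OCR output (hypothetical structure)."""
    entity_type: str
    text: str
    start: int
    end: int


# Hypothetical patterns, for illustration only. Real documents would need
# many more layouts per country and document version.
PATTERNS = {
    "date": re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b"),
    "amount": re.compile(r"\$\s?\d+(?:,\d{3})*(?:\.\d{2})?"),
}


def extract_candidates(ocr_text: str) -> List[Candidate]:
    """Return every pattern match in the text, ordered by position."""
    found = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(ocr_text):
            found.append(
                Candidate(entity_type, match.group(), match.start(), match.end())
            )
    return sorted(found, key=lambda c: c.start)


sample = "Invoice dated 12/03/2018 for $1,250.00 payable to First Bank."
candidates = extract_candidates(sample)
for candidate in candidates:
    print(candidate.entity_type, candidate.text)
```

Pattern matching of this kind only proposes candidates. Entities such as the names of banks, people, and cities do not follow a fixed layout, which is where trained models become necessary.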

Customer Success Snapshot

A leading service provider in the banking and financial domain, operating across the USA, Australia, and New Zealand.

Challenges
  • Extracting specific named entities such as numbers, dates, and names from free-form text was time-consuming and prone to human error.
  • Because of the customer's global presence, the system had to adapt to different versions and structures of documents across multiple countries and states.
  • Entities had to be identified, labelled, and mapped to specific database columns in order to train the machine learning algorithm.
  • The customer needed a robust machine learning system that could continuously improve with feedback from the deployed model.
Persistent Solution
  • A machine learning component was built to learn continuously and to distinguish correctly identified entities from incorrectly identified ones.
  • The solution was developed in Python, using Python-based packages such as NLTK, NumPy, pandas, scikit-learn, and TensorFlow, and could scale to analyze millions of document pages every day.
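
The feedback idea in the first point above can be sketched as a small online classifier that scores each extracted entity and updates itself whenever a reviewer marks an extraction as correct or incorrect. This is a minimal sketch under stated assumptions, not the model Persistent Systems built: it uses a plain-Python logistic regression as a stand-in, and the class name `FeedbackClassifier`, the two features (OCR confidence and a format-check flag), and the sample feedback are all hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class FeedbackClassifier:
    """Logistic regression trained one feedback item at a time (illustrative)."""

    def __init__(self, n_features: int, learning_rate: float = 0.5):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, features):
        """Estimated probability that the extracted entity is correct."""
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return sigmoid(z)

    def update(self, features, was_correct: bool) -> None:
        """One stochastic gradient step on a single reviewed extraction."""
        error = (1.0 if was_correct else 0.0) - self.predict_proba(features)
        self.bias += self.learning_rate * error
        self.weights = [
            w + self.learning_rate * error * x
            for w, x in zip(self.weights, features)
        ]

    def fit(self, feedback, epochs: int = 2000) -> None:
        """Replay the stored feedback several times."""
        for _ in range(epochs):
            for features, was_correct in feedback:
                self.update(features, was_correct)


# Hypothetical reviewer feedback: ([ocr_confidence, passes_format_check], correct?)
feedback = [
    ([0.95, 1.0], True), ([0.90, 1.0], True),
    ([0.85, 1.0], True), ([0.92, 1.0], True),
    ([0.40, 0.0], False), ([0.55, 0.0], False),
    ([0.30, 1.0], False), ([0.60, 0.0], False),
]

model = FeedbackClassifier(n_features=2)
model.fit(feedback)

likely_correct = model.predict_proba([0.97, 1.0])
likely_wrong = model.predict_proba([0.35, 0.0])
print(round(likely_correct, 3), round(likely_wrong, 3))
```

Because `update` accepts one labelled example at a time, new reviewer feedback can be folded in without retraining from scratch. In a production setting, a library estimator with incremental training, such as scikit-learn's `SGDClassifier.partial_fit`, would be a more natural choice than hand-written gradient steps.
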
Result
  • More than 90% accuracy was observed for entities such as numbers and dates.
  • The solution helped the customer reduce the human effort for manual conversion by 80%.
  • Productivity and scalability improved, with consistently accurate results as human errors were greatly reduced.
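
The source does not define how coverage and accuracy were measured. One plausible reading, shown below purely as an assumption, treats coverage as the share of expected entities for which the system produced any value, and accuracy as the share of produced values that match the reviewed ground truth. The function name, record layout, and sample values are hypothetical.

```python
def coverage_and_accuracy(records):
    """Compute (coverage, accuracy) under the assumed definitions.

    Each record is a dict with 'predicted' (None when nothing was
    extracted) and 'actual' (the reviewer-confirmed value).
    """
    extracted = [r for r in records if r["predicted"] is not None]
    correct = [r for r in extracted if r["predicted"] == r["actual"]]
    coverage = len(extracted) / len(records) if records else 0.0
    accuracy = len(correct) / len(extracted) if extracted else 0.0
    return coverage, accuracy


# Hypothetical reviewed sample: 5 expected entities, 4 extracted, 3 correct.
reviewed = [
    {"predicted": "12/03/2018", "actual": "12/03/2018"},
    {"predicted": "$1,250.00", "actual": "$1,250.00"},
    {"predicted": "First Bank", "actual": "First Bank"},
    {"predicted": "Aukland", "actual": "Auckland"},
    {"predicted": None, "actual": "John Smith"},
]

coverage, accuracy = coverage_and_accuracy(reviewed)
print(f"coverage={coverage:.0%} accuracy={accuracy:.0%}")
```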
