Named Entity Recognition – Intelligent Entity Extraction
In many industries, including banking and finance, large numbers of documents such as forms and invoices are still on paper. Identifying and importing information from these paper documents or their scans takes a lot of manual effort. Although plenty of OCR systems are available to digitize paper documents and convert them to machine-readable text, identifying the text that corresponds to relevant entities, such as the names of banks, people, and cities, remains a challenge.
Persistent Systems has developed a solution that integrates easily with different OCR systems to digitize scanned document pages and identify relevant entities in them. Because it has multiple machine learning components, the solution can be tuned for specific kinds of entity extraction and regularly re-trained with new feedback, improving the accuracy of its machine learning models over time. This intelligent entity extraction solution has been deployed successfully with multiple customers, achieving more than 90% coverage and accuracy for many different types of entities.
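The case study does not describe how individual entities are located in the OCR output. As a hypothetical illustration only, a rule-based first pass for two entity types mentioned here, dates and currency amounts, could look like this. The `PATTERNS`, `Entity`, and `extract_entities` names are assumptions, not part of the described product, and a production system would combine rules like these with trained models.

```python
import re
from dataclasses import dataclass

# Hypothetical rule set: one regular expression per entity type.
# Real documents would need many more formats (and trained models for names).
PATTERNS = {
    # Numeric dates such as 03/15/2023; day/month order is NOT resolved here.
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    # Dollar amounts with or without thousands separators, e.g. $1,250.00 or $980.
    "AMOUNT": re.compile(r"\$(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d{2})?\b"),
}


@dataclass(frozen=True)
class Entity:
    label: str  # entity type, e.g. "DATE"
    text: str   # matched text as it appears in the OCR output
    start: int  # character offset where the match begins
    end: int    # character offset just past the end of the match


def extract_entities(text: str) -> list[Entity]:
    """Return every pattern match in `text`, ordered by position."""
    found = [
        Entity(label, m.group(), m.start(), m.end())
        for label, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(found, key=lambda e: e.start)


ocr_text = "Invoice dated 03/15/2023, total due $1,250.00 by 04/01/2023."
for entity in extract_entities(ocr_text):
    print(entity.label, entity.text)
# DATE 03/15/2023
# AMOUNT $1,250.00
# DATE 04/01/2023
```

A pattern like the `DATE` one finds a date but cannot tell whether `03/04/2023` means 4 March or 3 April: US documents usually put the month first, while Australian and New Zealand documents put the day first. That is the kind of per-country variation a deployed extractor would have to handle.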
Customer Success Snapshot
A leading banking and financial services provider operating across the USA, Australia, and New Zealand.
Challenge
- Manually extracting specific entities such as numbers, dates, and names from free-form text was time-consuming and prone to human error.
- Because the customer operates globally, the solution had to adapt to different versions and layouts of documents across multiple countries and states.
- Entities had to be identified, labelled, and mapped to specific database columns to train the machine learning algorithm.
- The customer needed a robust machine learning system that could continuously improve using feedback from the deployed model.
Solution
- A machine learning component was built to learn continuously and distinguish between correctly and incorrectly identified entities.
- The solution was developed in Python using packages such as NLTK, NumPy, pandas, scikit-learn, and TensorFlow, and scaled to analyze millions of document pages every day.
Benefits
- Accuracy above 90% was observed for entities such as numbers and dates.
- The solution helped the customer reduce manual conversion effort by 80%.
- Productivity and scalability improved, with consistently accurate results as human errors were greatly reduced.
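The case study says a machine learning component learns continuously to distinguish correctly from incorrectly identified entities, but does not say how. A minimal sketch, assuming that component is a binary classifier updated with reviewer verdicts, is an online logistic regression, written here in plain Python to stay self-contained (the described solution used libraries such as scikit-learn and TensorFlow instead). The features, the OCR confidence score, the feedback data, and the names `entity_features` and `FeedbackModel` are all hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def entity_features(label: str, text: str, ocr_confidence: float) -> dict[str, float]:
    """Hypothetical features describing one extracted entity."""
    numeric_type = label in {"DATE", "AMOUNT"}
    return {
        f"label={label}": 1.0,             # one-hot entity type
        "ocr_confidence": ocr_confidence,  # assumed to be reported by the OCR engine
        # Letters inside a numeric entity often signal OCR misreads such as "O" for "0".
        "alpha_in_numeric": float(numeric_type and any(c.isalpha() for c in text)),
    }


class FeedbackModel:
    """Online logistic regression: P(entity was extracted correctly | features)."""

    def __init__(self, learning_rate: float = 0.5) -> None:
        self.learning_rate = learning_rate
        self.weights: dict[str, float] = {}
        self.bias = 0.0

    def predict_proba(self, features: dict[str, float]) -> float:
        z = self.bias + sum(self.weights.get(k, 0.0) * v for k, v in features.items())
        return sigmoid(z)

    def update(self, features: dict[str, float], is_correct: bool) -> None:
        """One stochastic-gradient step on the log loss for a single reviewer verdict."""
        error = float(is_correct) - self.predict_proba(features)
        self.bias += self.learning_rate * error
        for k, v in features.items():
            self.weights[k] = self.weights.get(k, 0.0) + self.learning_rate * error * v


# Hypothetical reviewer feedback: (features, was the extraction correct?)
feedback = [
    (entity_features("AMOUNT", "$1,250.00", 0.98), True),
    (entity_features("AMOUNT", "$1,2S0.00", 0.61), False),
    (entity_features("DATE", "03/15/2023", 0.95), True),
    (entity_features("DATE", "O3/15/2023", 0.58), False),
]

model = FeedbackModel()
for _ in range(200):  # several passes over the feedback collected so far
    for features, verdict in feedback:
        model.update(features, verdict)

clean = model.predict_proba(entity_features("AMOUNT", "$980.00", 0.97))
garbled = model.predict_proba(entity_features("AMOUNT", "$9B0.00", 0.55))
print(f"clean: {clean:.2f}, garbled: {garbled:.2f}")
```

Updating one example at a time lets each reviewer correction be folded in as it arrives, which fits the emphasis on continuous improvement; periodically re-training a batch model on all accumulated feedback would be an equally plausible reading of the case study.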