Data Processing – Case study


BUSINESS PROBLEM

Our client's data pipeline was processing about 1 task per minute. The main bottleneck was that processing required custom business logic, and some tasks additionally required OCR. As a result, processing a large volume of data usually took several days.


SOLUTION 

The DE team initially came up with a few potential solutions and discussed them with the customer to ensure we picked the most suitable one.
– We decided to use parallel processing via message queues.
– Instead of using a single computer for all the data processing, we split the data management and processing parts across separate worker nodes (different computers).
– This approach completely decouples the data-processing and data-orchestration applications.
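The pattern above can be sketched in a few lines. This is a minimal illustration only, assuming a generic `process_task` step in place of the real business logic and OCR: an orchestrator pushes tasks onto a queue, and independent worker processes (stand-ins for separate worker nodes) pull and process them in parallel.

```python
import multiprocessing as mp


def process_task(task_id: int) -> int:
    # Placeholder for the custom business logic / OCR step.
    return task_id * 2


def worker(task_queue: mp.Queue, result_queue: mp.Queue) -> None:
    # Each worker pulls tasks until it receives the None sentinel,
    # so workers know nothing about how tasks are produced.
    while True:
        task = task_queue.get()
        if task is None:
            break
        result_queue.put(process_task(task))


def run_pipeline(tasks, n_workers: int = 4):
    task_queue: mp.Queue = mp.Queue()
    result_queue: mp.Queue = mp.Queue()
    workers = [
        mp.Process(target=worker, args=(task_queue, result_queue))
        for _ in range(n_workers)
    ]
    for w in workers:
        w.start()
    # The orchestrator only enqueues work; it never touches the
    # processing logic, which is the decoupling described above.
    tasks = list(tasks)
    for t in tasks:
        task_queue.put(t)
    for _ in workers:
        task_queue.put(None)  # one stop sentinel per worker
    results = [result_queue.get() for _ in tasks]
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    print(sorted(run_pipeline(range(10))))
```

In the real deployment a message broker would replace the in-process queues, letting the worker pool scale across machines without changing the orchestrator.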


IMPACT 

Using this approach, we were able to process over 50 tasks per minute.
– The cost increase was minimal, as most of the solution was architected to run with limited external dependencies.
– The software is also very flexible and can adapt to larger data volumes.
