80% performance improvement in data pipelines

Built scalable, high-performing data pipelines on an Azure data platform to ingest large volumes of data from major retailers

Business Challenge

The client is a retail data vendor that captures, collates, and analyzes POS and sales data to generate reports for CPG manufacturers. Their existing system had performance, scalability, stability, and cost issues, which made it difficult to onboard data from large retailers onto the platform.

Sigmoid Solution

Sigmoid revamped the ETL architecture after a comprehensive analysis and benchmarking of its components and business rules. The ETL data pipeline and data reconciliation codebase were rewritten, with changes made incrementally so that the impact of each could be tested quickly. The Spark transformation code was tuned for data size, cluster size, and memory footprint, and autoscaling was enabled on the Spark cluster.
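
As an illustration of what tuning for data size and cluster size can look like, here is a minimal Python sketch. It is not Sigmoid's actual implementation: the helper names (`suggest_shuffle_partitions`, `spark_conf_for`), the 128 MiB target partition size, and the use of Spark dynamic allocation as the autoscaling mechanism are all assumptions made for the example. The configuration keys themselves are standard Spark properties.

```python
import math

# Assumed target size per shuffle partition (128 MiB); a common rule of thumb,
# not a value taken from the case study.
TARGET_PARTITION_BYTES = 128 * 1024**2


def suggest_shuffle_partitions(input_bytes, total_cores,
                               target_partition_bytes=TARGET_PARTITION_BYTES):
    """Hypothetical heuristic: size partitions by data volume, then round up
    to a multiple of the core count so every wave of tasks fills the cluster."""
    if input_bytes <= 0 or total_cores <= 0:
        raise ValueError("input_bytes and total_cores must be positive")
    by_size = math.ceil(input_bytes / target_partition_bytes)
    waves = max(1, math.ceil(by_size / total_cores))
    return waves * total_cores


def spark_conf_for(input_bytes, max_executors, cores_per_executor):
    """Build a dict of Spark settings for a job of the given size.
    Dynamic allocation stands in for cluster autoscaling here (an assumption)."""
    partitions = suggest_shuffle_partitions(
        input_bytes, max_executors * cores_per_executor
    )
    return {
        "spark.sql.shuffle.partitions": str(partitions),
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.minExecutors": "1",
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
    }


if __name__ == "__main__":
    # Example: a 50 GiB input on up to 10 executors with 4 cores each.
    for key, value in spark_conf_for(50 * 1024**3, 10, 4).items():
        print(f"{key}={value}")
```

The resulting dictionary could be passed to a Spark session builder or to `spark-submit --conf` options. In practice the right values depend on the workload and should be confirmed by benchmarking, as the engagement described above did.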

Business Impact

Sigmoid built more stable data pipelines and enabled faster data processing, cutting the execution time of Spark transformations from over 90 minutes to under 15 minutes, a reduction of more than 80%.
