80% performance improvements in data pipelines
Built scalable, high-performing data pipelines on an Azure data platform to ingest large volumes of data from major retailers
The client is a retail data vendor that captures, collates, and analyzes POS and sales data to generate insightful reports for CPG manufacturers. Their existing system had performance, scalability, stability, and cost issues, which made it difficult to onboard data from large retailers onto the platform.
Sigmoid revamped the ETL architecture after a comprehensive analysis and benchmarking of its components and business rules. The ETL data pipeline and data reconciliation codebase were rewritten, with changes made incrementally so that the impact of each could be tested quickly. The Spark transformation code was tuned to the data size, cluster size, and memory footprint, and autoscaling was enabled on the Spark cluster.
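As an illustration of tuning Spark settings to data size, cluster size, and memory footprint, the sketch below derives a shuffle partition count and executor bounds from a few inputs. It is a minimal example under stated assumptions, not the configuration used in this engagement: the helper name `suggest_spark_conf`, the 128 MB target partition size, and the choice of Spark dynamic allocation as the autoscaling mechanism are all hypothetical. Only the configuration keys themselves are standard Spark properties.

```python
import math


def suggest_spark_conf(input_gb, executor_cores, min_executors, max_executors,
                       executor_memory_gb, target_partition_mb=128):
    """Return Spark settings sized from data volume and cluster shape.

    Hypothetical helper: the sizing heuristic and the default target
    partition size are illustrative assumptions, not values from the source.
    """
    # Data size: aim for roughly one shuffle partition per target_partition_mb.
    partitions_by_size = math.ceil(input_gb * 1024 / target_partition_mb)

    # Cluster size: keep at least one task per core at full scale-out.
    total_cores = executor_cores * max_executors
    shuffle_partitions = max(partitions_by_size, total_cores)

    return {
        "spark.sql.shuffle.partitions": str(shuffle_partitions),
        "spark.executor.cores": str(executor_cores),
        # Memory footprint: per-executor heap size.
        "spark.executor.memory": f"{executor_memory_gb}g",
        # Autoscaling (assumed here to be Spark dynamic allocation).
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
    }


conf = suggest_spark_conf(input_gb=50, executor_cores=4, min_executors=2,
                          max_executors=20, executor_memory_gb=16)
```

Each key and value in the returned dictionary could then be passed to `SparkSession.builder.config(key, value)` when the session is created. In practice such settings are adjusted iteratively against benchmark runs, which is in line with the incremental testing described above.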
The result was more stable data pipelines and faster data processing: the execution time of Spark transformations fell from 90+ minutes to under 15 minutes.