The client operates the world’s largest advertising exchange, connecting thousands of brands with consumers globally. It wanted to improve profitability by cutting the operational cost of maintaining its existing on-premises infrastructure. With complex, interconnected systems, it faced the following challenges:
- Scaling across many different data sources and ETL pipelines
- Long query processing times
- Inefficiency due to manual data analysis
To speed up data processing, improve data accuracy, predict ETL bottlenecks, and reduce maintenance costs, Sigmoid streamlined the existing databases and routed data from the ad exchanges into a centralized data warehouse, Google BigQuery. To do this, Sigmoid developed an Apache Spark-based ETL framework, hosted on Google Cloud, that sets up new pipelines within minutes and manages feeds running in real time.
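One common way to make new pipelines quick to set up is to describe each feed declaratively and let a shared runner turn that description into read, transform, and write steps. The case study does not say how Sigmoid's framework is built, so the following is only a minimal sketch of that general idea in plain Python. Every name in it (`FeedConfig`, `PipelineRegistry`, the bucket path, the table name, and the transform names) is a hypothetical placeholder, and the Spark and BigQuery calls a real runner would make are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedConfig:
    """Declarative description of one feed (all fields are hypothetical)."""
    name: str
    source_uri: str          # where the raw ad-exchange data lands
    target_table: str        # destination, written as "dataset.table"
    mode: str = "batch"      # "batch" or "streaming"
    transforms: tuple = ()   # ordered names of transformation steps


class PipelineRegistry:
    """Stores feed configs and rejects invalid or duplicate definitions."""

    VALID_MODES = {"batch", "streaming"}

    def __init__(self):
        self._feeds = {}

    def register(self, cfg: FeedConfig) -> None:
        if cfg.mode not in self.VALID_MODES:
            raise ValueError(f"unknown mode {cfg.mode!r} for feed {cfg.name!r}")
        if cfg.target_table.count(".") != 1:
            raise ValueError(
                f"target_table must look like 'dataset.table': {cfg.target_table!r}"
            )
        if cfg.name in self._feeds:
            raise ValueError(f"feed {cfg.name!r} is already registered")
        self._feeds[cfg.name] = cfg

    def plan(self, name: str) -> list:
        """Return the ordered steps a runner would execute for one feed."""
        cfg = self._feeds[name]
        steps = [f"read:{cfg.source_uri}"]
        steps += [f"transform:{t}" for t in cfg.transforms]
        steps.append(f"write:{cfg.target_table}")
        return steps


# Adding a feed is a matter of registering one more config object.
registry = PipelineRegistry()
registry.register(FeedConfig(
    name="impressions",
    source_uri="gs://example-bucket/impressions/",   # placeholder path
    target_table="ads.impressions",                  # placeholder table
    mode="streaming",
    transforms=("dedupe", "normalize_currency"),
))
print(registry.plan("impressions"))
```

Under this design, onboarding a feed means writing a short config rather than new pipeline code, and validation happens before any job is launched.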
Sigmoid improved data accuracy nearly 2.4-fold and ingested more than 150 billion rows of data daily from 120+ feeds. The solution delivered $2.5 million in annual cost savings and improved customer satisfaction through 15x faster data processing. The system also requires minimal manual intervention, is easier to maintain, and offers better fault tolerance.
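To put the ingestion volume in perspective, the daily figure can be converted into an average rate. The short calculation below is a rough sketch that assumes exactly 150 billion rows, exactly 120 feeds, and load spread evenly across the day and across feeds. None of these assumptions comes from the case study, which reports "over" 150 billion rows and "120+" feeds, so the results are order-of-magnitude averages only.

```python
ROWS_PER_DAY = 150_000_000_000   # reported as "over 150 billion"; treated as exact here
FEEDS = 120                      # reported as "120+"; treated as exact here
SECONDS_PER_DAY = 24 * 60 * 60

# Average sustained rate if the load were perfectly uniform over the day.
rows_per_second = ROWS_PER_DAY / SECONDS_PER_DAY

# Average daily volume per feed if every feed carried an equal share.
rows_per_feed_per_day = ROWS_PER_DAY / FEEDS

print(f"{rows_per_second:,.0f} rows per second on average")      # 1,736,111
print(f"{rows_per_feed_per_day:,.0f} rows per feed per day")     # 1,250,000,000
```

Real traffic is unlikely to be uniform, so peak rates would be higher than this average.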