Automate data ingestion to enable near real-time access to insights

Data Ingestion to Enable Near Real-time Access to Insights

CPG companies across the globe want more insight into the latest sales trends from their retailers so they can make informed decisions across supply chain and marketing. However, the lack of granular data from retailers makes it hard to derive meaningful insights.

COVID-19 has further fueled the need for insights from retailer data to track the tremendous shift in consumer buying behavior, especially for CPG brands. Forbes reports that US CPG sales increased by 10.3% last year, which is 5 times the normal year-on-year rate of change.

While the growth of CPG companies was unprecedented, they faced challenges such as supply chain disruptions, extreme volatility in demand, reduced supplies, and a lack of clarity about future demand. Collecting and effectively analyzing different types of retailer data, including PoS, shopping cart, and other sales data, helps mitigate these challenges by generating the fast, granular insights needed to make timely business decisions. Companies are also looking to build automated databases, data ingestion pipelines, and automated analysis on top of the data generated.

In this post, we describe how a leading FMCG firm generated near real-time insights from the latest sales data from retailers and e-commerce websites to improve its reporting and forecasting. It used quality data and near real-time analytics to help business users across the organization make timely, informed decisions on a daily basis.

The challenge

The client is a leading American FMCG company operating in multiple categories, including household and personal care products. Its products are stocked by all leading retailers, both online and in brick-and-mortar stores.

As the company tried to gain near real-time access to sales trends, it faced the following challenges:

  • Lack of customer-level data from many retailers
  • The older data pipelines were monolithic, and teams across geographies downloaded the data manually before exporting it to BigQuery to observe trends
  • Several data elements were missing for events like holidays and weekends
  • The data was available only to a select few members of the organization, making it difficult for people on the ground to understand how their business was performing
  • Most importantly, report generation was manual, which made generating insights slow. It typically took a week to access particular data from the retailers

To address these challenges, the company was looking for ways to generate fast and granular insights, build a data ingestion pipeline, and automate reports from the retailers' dashboards. The first step was to create a data lake.

There were challenges on the retailers' side too. For instance, some retailer dashboards did not provide key metrics such as ‘out of stock’ and ‘lost buy boxes’.

Creating a Data Lake for the CPG Company

Sigmoid created a data foundation for the client by building a data lake of the sales and forecast data from the retail vendors’ dashboards. We built data lineage and processes to ensure data quality before pushing the data to BigQuery for further analysis.

The older pipelines were monolithic and required manual intervention at various steps, such as currency conversion and aggregation. To replace them, Sigmoid harmonized the data between external sources and the client's internal products and hierarchy. A binary search was implemented to identify data discrepancies rather than performing a full scan.
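The post does not describe how the binary search was applied, so the following is only one plausible reading, sketched under stated assumptions: compare the total for a date range in the source against the total in the warehouse, and split the range in half only when the totals disagree. The function names and the sample figures are hypothetical, and `range_total` uses prefix sums as a stand-in for what would more likely be a SUM query over a date range.

```python
from typing import Callable, List, Sequence


def find_mismatched_days(
    source_total: Callable[[int, int], float],
    target_total: Callable[[int, int], float],
    lo: int,
    hi: int,
    tol: float = 1e-9,
) -> List[int]:
    """Return day indexes in [lo, hi) whose totals differ between two systems."""
    # An empty range, or a range whose totals agree, is skipped without
    # looking at any individual day inside it.
    if lo >= hi or abs(source_total(lo, hi) - target_total(lo, hi)) <= tol:
        return []
    if hi - lo == 1:
        return [lo]
    mid = (lo + hi) // 2
    return (
        find_mismatched_days(source_total, target_total, lo, mid, tol)
        + find_mismatched_days(source_total, target_total, mid, hi, tol)
    )


def range_total(values: Sequence[float]) -> Callable[[int, int], float]:
    """Build a range-sum function from prefix sums (stand-in for a SUM query)."""
    prefix = [0.0]
    for value in values:
        prefix.append(prefix[-1] + value)
    return lambda lo, hi: prefix[hi] - prefix[lo]


# Hypothetical daily sales: day 5 was loaded as 0.0 in the warehouse.
dashboard = [120.0, 95.0, 130.0, 110.0, 105.0, 99.0, 140.0, 101.0]
warehouse = [120.0, 95.0, 130.0, 110.0, 105.0, 0.0, 140.0, 101.0]

mismatched = find_mismatched_days(
    range_total(dashboard), range_total(warehouse), 0, len(dashboard)
)
print(mismatched)
```

One limitation of comparing totals is that two errors of opposite sign inside the same range cancel out and go unnoticed, so a real check would probably compare a row count or a hash alongside the sum.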

Additionally, our approach produced several generic steps, such as drop rows, drop columns, and remove characters, which can be reused across different sources with minimal development effort. We also automated the download of reports from the vendors' dashboards and monitored the flow of data from the dashboards to BigQuery.
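As an illustration of how such generic steps might be reused, the sketch below drives the three steps named above from a per-source configuration list. It is not Sigmoid's implementation: the step registry, the configuration format, and the sample rows are all assumptions made for this example.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def drop_columns(rows: List[Row], columns: List[str]) -> List[Row]:
    """Remove the named columns from every row."""
    return [{k: v for k, v in row.items() if k not in columns} for row in rows]


def drop_rows(rows: List[Row], column: str, values: List[Any]) -> List[Row]:
    """Remove rows whose value in `column` is one of `values`."""
    return [row for row in rows if row.get(column) not in values]


def remove_characters(rows: List[Row], column: str, characters: str) -> List[Row]:
    """Strip each listed character from the string value in `column`."""
    table = str.maketrans("", "", characters)
    return [
        {**row, column: str(row[column]).translate(table)} if column in row else row
        for row in rows
    ]


# Hypothetical registry mapping a step name in the config to its function.
STEPS: Dict[str, Callable[..., List[Row]]] = {
    "drop_columns": drop_columns,
    "drop_rows": drop_rows,
    "remove_characters": remove_characters,
}


def run_steps(rows: List[Row], config: List[Dict[str, Any]]) -> List[Row]:
    """Apply the configured steps in order."""
    for step in config:
        params = dict(step)  # copy so the config itself is not mutated
        name = params.pop("step")
        rows = STEPS[name](rows, **params)
    return rows


# Hypothetical configuration for one retailer's report.
retailer_config = [
    {"step": "drop_rows", "column": "sku", "values": ["TOTAL"]},
    {"step": "drop_columns", "columns": ["notes"]},
    {"step": "remove_characters", "column": "revenue", "characters": "$,"},
]
raw = [
    {"sku": "A1", "revenue": "$1,200.50", "notes": "promo"},
    {"sku": "TOTAL", "revenue": "$1,200.50", "notes": ""},
]

cleaned = run_steps(raw, retailer_config)
print(cleaned)
```

With this shape, onboarding a new source means writing a new configuration list rather than new cleaning code, which is the kind of reuse the paragraph above describes.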

The solution architecture consisted of the following elements:

  • Granular ingestion with a generic implementation
  • Scripts for anomaly detection and correction
  • Internal FX data for USD conversion
  • Automation on Google Cloud Platform
  • Dedicated processes to request and fulfill data backfills
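The post does not say how anomalies were detected or how backfills were requested. As a minimal sketch of one simple check, the code below finds the dates with no loaded data inside a window and groups them into contiguous ranges that could each become a backfill request. The function name and the sample dates are hypothetical.

```python
from datetime import date, timedelta
from typing import Iterable, List, Optional, Tuple


def missing_date_ranges(
    loaded: Iterable[date], start: date, end: date
) -> List[Tuple[date, date]]:
    """Return inclusive (first, last) ranges of dates in [start, end] with no data."""
    have = set(loaded)
    ranges: List[Tuple[date, date]] = []
    run_start: Optional[date] = None
    one_day = timedelta(days=1)
    day = start
    while day <= end:
        if day not in have:
            if run_start is None:
                run_start = day  # a new gap begins here
        elif run_start is not None:
            ranges.append((run_start, day - one_day))  # the gap ended yesterday
            run_start = None
        day += one_day
    if run_start is not None:
        ranges.append((run_start, end))  # the gap runs to the end of the window
    return ranges


# Hypothetical load history: a weekend (Jan 2-3) and two weekdays are missing.
loaded_days = [date(2021, 1, 1), date(2021, 1, 4), date(2021, 1, 5), date(2021, 1, 8)]

gaps = missing_date_ranges(loaded_days, date(2021, 1, 1), date(2021, 1, 8))
for first, last in gaps:
    print(f"backfill needed: {first.isoformat()} to {last.isoformat()}")
```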

Sigmoid data ingestion

With the new data foundation in place and manual processes eliminated, the CPG company now has faster access to data. Sigmoid was also able to accomplish tasks such as:

  • Converting currencies using the Forex data before sending data from the Amazon dashboard to BigQuery
  • Verifying year-old data and reinstating it for the mismatched dates
  • Enabling data ingestion at all granularities (day, week, month, year)
  • Designing and monitoring the ETL process
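To make the currency conversion and granularity items concrete, here is a small sketch that converts each sale to USD with a dated rate table and then rolls the result up by day, week, month, or year. The rates, the table layout, and the function names are invented for illustration; the client's internal FX data and the actual BigQuery schema are not described in this post.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal
from typing import Dict, Iterable, Tuple

# Hypothetical FX table: (currency, date) -> USD per unit of that currency.
FX_TO_USD: Dict[Tuple[str, date], Decimal] = {
    ("EUR", date(2021, 3, 1)): Decimal("1.20"),
    ("EUR", date(2021, 3, 2)): Decimal("1.21"),
    ("GBP", date(2021, 3, 1)): Decimal("1.39"),
}


def to_usd(amount: Decimal, currency: str, day: date) -> Decimal:
    """Convert an amount to USD using that day's rate."""
    if currency == "USD":
        return amount
    # A missing rate raises KeyError so the gap is noticed instead of guessed.
    return amount * FX_TO_USD[(currency, day)]


def period_key(day: date, granularity: str) -> str:
    """Map a date to the label of the period it falls in."""
    if granularity == "day":
        return day.isoformat()
    if granularity == "week":
        iso = day.isocalendar()  # (ISO year, ISO week, ISO weekday)
        return f"{iso[0]}-W{iso[1]:02d}"
    if granularity == "month":
        return f"{day.year}-{day.month:02d}"
    if granularity == "year":
        return str(day.year)
    raise ValueError(f"unknown granularity: {granularity}")


def aggregate_usd(
    sales: Iterable[Tuple[date, str, Decimal]], granularity: str
) -> Dict[str, Decimal]:
    """Sum sales in USD per period at the requested granularity."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)
    for day, currency, amount in sales:
        totals[period_key(day, granularity)] += to_usd(amount, currency, day)
    return dict(totals)


# Hypothetical sales rows: (date, currency, amount in local currency).
sales = [
    (date(2021, 3, 1), "EUR", Decimal("100")),
    (date(2021, 3, 2), "EUR", Decimal("100")),
    (date(2021, 3, 1), "GBP", Decimal("50")),
    (date(2021, 3, 2), "USD", Decimal("10")),
]

daily = aggregate_usd(sales, "day")
monthly = aggregate_usd(sales, "month")
print(daily)
print(monthly)
```

`Decimal` is used instead of `float` so that monetary totals do not pick up binary rounding error, and converting before aggregating means each sale uses the rate for its own date.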


Using Sigmoid’s approach to near real-time data analytics, the CPG company can now download up to 300 different types of datasets every day, spread across 14 countries and 22 companies. In just 12 weeks, Sigmoid automated data ingestion for one of America’s largest retailers, which has the highest share of total sales in the country.

Sigmoid was able to provide the following benefits to the CPG company:

  • Automation of reports across 10+ retailers
  • Faster access to data, with the time to access it cut from 7 days to 2 hours, thereby reducing the time to generate actionable insights
  • Enhanced reporting across retailers and geographies
  • Cost savings of $1.5 million from reduced dependence on third-party reports
  • Avoidance of stockout and lost Buy Box scenarios
  • Access to the same quality of data for hundreds of employees, where it was previously available only to leadership teams

About the author

Srishti is a Content Marketing Manager at Sigmoid with a background in tech journalism. She has extensively covered the data science and AI space in the past and is passionate about the technologies defining it.