The customer is a leading semiconductor manufacturer that produces chips for markets ranging from mobility to computing and industrial IoT. Once chips were manufactured and ready to ship, attributes such as chip description, cost, and chip size (a mix of categorical and numerical data) were collated into the product master data, which various processes and functions relied on. The sales team used the data for cost estimation, whereas the finance team used it for revenue forecasting. However, gaps in data management left the table with many missing values and data quality issues, resulting in lost sales opportunities and revenue forecasting errors.
We adopted a classification-based approach and used Datawig, Amazon’s neural network-based deep learning library, to build prediction models that impute missing values. The library predicted missing string values (categorical and textual data) with high accuracy, something typical classification methods fail to accomplish. It also created the vectors and columns from the string data for us, without the need to write code for that step. We productionized the classification-based ML models using SageMaker. An approval mechanism approved or rejected each model based on its model score and key performance metrics.
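The idea of classification-based imputation with an approval gate can be illustrated with a minimal sketch. This is not Datawig's API and not the production pipeline: a tiny word-level naive Bayes classifier stands in for the deep learning model, and the column names (`description`, `segment`), the sample rows, the holdout set, and the `min_accuracy` threshold are all hypothetical values chosen for illustration.

```python
import math
from collections import Counter, defaultdict


class WordNaiveBayes:
    """Tiny multinomial naive Bayes over lowercase word tokens (stand-in model)."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            words = text.lower().split()
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # Log prior plus Laplace-smoothed log likelihood of each word.
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in text.lower().split():
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def approve(model, holdout, min_accuracy=0.9):
    """Hypothetical approval gate: accept the model only above an accuracy threshold."""
    correct = sum(model.predict(r["description"]) == r["segment"] for r in holdout)
    accuracy = correct / len(holdout)
    return accuracy >= min_accuracy, accuracy


def impute_missing(rows, model):
    """Return new rows where a missing 'segment' is filled with the model's prediction."""
    return [
        {**r, "segment": model.predict(r["description"])} if r["segment"] is None else r
        for r in rows
    ]


# Hypothetical product master rows; None marks a missing categorical value.
rows = [
    {"description": "5G mobile modem chipset", "segment": "mobility"},
    {"description": "mobile application processor", "segment": "mobility"},
    {"description": "laptop compute processor", "segment": "computing"},
    {"description": "server compute accelerator", "segment": "computing"},
    {"description": "industrial IoT sensor controller", "segment": "industrial_iot"},
    {"description": "IoT gateway microcontroller", "segment": "industrial_iot"},
    {"description": "mobile modem for handsets", "segment": None},
    {"description": "IoT sensor microcontroller", "segment": None},
]
# Hypothetical labeled holdout rows used only for the approval decision.
holdout = [
    {"description": "mobile chipset", "segment": "mobility"},
    {"description": "server processor", "segment": "computing"},
    {"description": "industrial sensor", "segment": "industrial_iot"},
]

train = [r for r in rows if r["segment"] is not None]
model = WordNaiveBayes().fit(
    [r["description"] for r in train], [r["segment"] for r in train]
)

approved, accuracy = approve(model, holdout, min_accuracy=0.9)
# Only an approved model is allowed to fill the gaps; otherwise rows stay as-is.
filled = impute_missing(rows, model) if approved else rows
print(approved, accuracy)
print([r["segment"] for r in filled[-2:]])
```

In this sketch the known rows train the classifier, the holdout rows decide whether the model is approved, and only an approved model writes predictions into the missing cells. The original rows are left untouched so a rejected model never alters the master data.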
Sigmoid’s solution saved time and cost, reduced manual effort, and brought a significant correction in revenue forecasting. The new solution was 90% more efficient in predicting categorical data and recorded a standard deviation of less than 0.5 for the numeric columns.