Building robust data pipelines for a leading investment bank to make quality datasets ready for ML use cases
Built and automated data pipelines that transform data and load it into a central repository for further analysis by various teams
A leading global investment bank wanted to collect data from multiple financial data providers across the globe, each using different file formats, data fields, and data types, and feed it into a single data repository. This required mapping and transforming the data into a common file format and creating data pipelines to load it continuously at regular intervals. The customer also wanted the pipelines to be maintained regularly and any failures to be resolved.
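The mapping step described above can be illustrated with a minimal sketch. It assumes each provider gets a small table that maps a common field to a source field and a converter. The provider names, field names, and the `to_common` helper are hypothetical and are not taken from the customer's system.

```python
from datetime import date, datetime
from decimal import Decimal


def parse_us_date(text):
    """Parse a MM/DD/YYYY string into a date (assumed format for provider_b)."""
    return datetime.strptime(text, "%m/%d/%Y").date()


# Hypothetical mappings: common field -> (source field, converter).
PROVIDER_MAPPINGS = {
    "provider_a": {
        "ticker": ("Symbol", str.upper),
        "price": ("ClosePx", Decimal),
        "as_of": ("Date", date.fromisoformat),
    },
    "provider_b": {
        "ticker": ("ticker_id", str.upper),
        "price": ("last_price", Decimal),
        "as_of": ("trade_dt", parse_us_date),
    },
}


def to_common(provider, record):
    """Rename and convert one provider record into the common schema."""
    mapping = PROVIDER_MAPPINGS[provider]
    return {
        field: convert(record[source])
        for field, (source, convert) in mapping.items()
    }


row_a = to_common("provider_a", {"Symbol": "abc", "ClosePx": "101.50", "Date": "2024-01-31"})
row_b = to_common("provider_b", {"ticker_id": "abc", "last_price": "101.50", "trade_dt": "01/31/2024"})
```

Both records end up with the same field names and data types, so downstream loading code does not need to know which provider a row came from. Onboarding a new provider in this sketch means adding one mapping entry rather than writing new transformation logic.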
We worked with the customer’s custom-built Integrated Development Environment (IDE), languages, and in-house system for running pipelines. We acquired and loaded data sources, wrote models in the IDE, created model code for specific datasets, and applied the transformations defined in that code. The pipelines developed for the customer handled the entire flow, from ingestion to pushing the data into the database, in an automated way and ran on a schedule at regular intervals. We also ensured continuous maintenance and monitoring of the data pipelines.
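The flow from ingestion to the database can be sketched as three stages wrapped in a runner that records which stage failed. This is only an illustration: the customer's IDE, languages, and run system are proprietary, so Python's standard `csv` and `sqlite3` modules stand in for them, and the `prices` table, the column names, and `run_pipeline` are assumptions. Scheduling is left to an external scheduler.

```python
import csv
import io
import sqlite3


def ingest(raw_text):
    """Parse a provider's CSV payload into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Normalize tickers and convert prices to numbers."""
    return [(row["ticker"].upper(), float(row["price"])) for row in rows]


def load(conn, rows):
    """Insert rows in one transaction; rolled back if an insert fails."""
    with conn:
        conn.executemany("INSERT INTO prices (ticker, price) VALUES (?, ?)", rows)


def run_pipeline(name, raw_text, conn):
    """Run ingest -> transform -> load and report the outcome for monitoring."""
    stage = "ingest"
    try:
        rows = ingest(raw_text)
        stage = "transform"
        clean = transform(rows)
        stage = "load"
        load(conn, clean)
    except Exception as exc:
        # The failing stage is recorded so the issue can be triaged and fixed.
        return {"pipeline": name, "status": "failed", "stage": stage, "error": repr(exc)}
    return {"pipeline": name, "status": "ok", "rows": len(clean)}


conn = sqlite3.connect(":memory:")  # stand-in for the central repository
conn.execute("CREATE TABLE prices (ticker TEXT, price REAL)")

good = run_pipeline("vendor_x_daily", "ticker,price\nabc,101.5\nxyz,7.25\n", conn)
bad = run_pipeline("vendor_y_daily", "ticker,price\nabc,n/a\n", conn)
```

The result dictionaries give a monitoring process something concrete to act on: a failed run names the pipeline and the stage where it stopped, and nothing from that run reaches the database.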
We created over 400 data pipelines that make it easy to incorporate new datasets from over 100 financial data providers across the globe and make the data readily available on a daily basis. We also helped define a process to identify and fix issues encountered during pipeline runs and to deploy the fixes.
Vendors’ data can be onboarded on a daily basis
Over 400 pipelines created and maintained
Efficiency in identifying and fixing issues in pipeline runs