Data Streams & Ingestion Pipeline
In a high-stakes project for a GOV.UK public sector organisation, my team and I were tasked with migrating critical data from legacy source systems,
such as Oracle and Microsoft Access, to modern, scalable targets such as PostgreSQL/Amazon Aurora and Amazon Neptune.
The objective was to ensure a seamless transition with minimal downtime while maintaining data integrity and data quality and meeting GOV.UK compliance standards.
An additional business requirement was to process and transform data in real time to support the organisation's analytical and operational needs at the UK borders.
Key Challenges Solved
- Volume of data (in-country and external)
- Migrating to the Skyscape or AWS cloud
- Data loss prevention mechanism
- Data replay mechanism (see the replay sketch after this list)
- Data quality checks & data exception handling
- Environment management and provisioning
- Real-time data processing and transformation
- Test coverage of end-to-end data pipeline
- PII data compliance
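To make the data-loss and replay challenges concrete, below is a minimal replay sketch using Kafka consumer offsets in Python (kafka-python). The broker address, topic name, and timestamp are hypothetical placeholders rather than the production values; it simply illustrates the seek-by-timestamp pattern for reprocessing a window of messages.

```python
# Minimal replay sketch using kafka-python (pip install kafka-python).
# Broker, topic, and timestamp below are illustrative placeholders.
from kafka import KafkaConsumer, TopicPartition

BROKER = "localhost:9092"            # hypothetical broker address
TOPIC = "ingest.records"             # hypothetical topic name
REPLAY_FROM_MS = 1_700_000_000_000   # epoch millis to replay from (example value)

consumer = KafkaConsumer(
    bootstrap_servers=BROKER,
    enable_auto_commit=False,  # manual offset control so a crash never skips records
    value_deserializer=lambda b: b.decode("utf-8"),
)

# Assign all partitions explicitly, then rewind each one to the first
# offset at or after the chosen timestamp.
partitions = [TopicPartition(TOPIC, p) for p in consumer.partitions_for_topic(TOPIC)]
consumer.assign(partitions)
offsets = consumer.offsets_for_times({tp: REPLAY_FROM_MS for tp in partitions})
for tp, offset_ts in offsets.items():
    if offset_ts is not None:        # None means no messages after the timestamp
        consumer.seek(tp, offset_ts.offset)

for message in consumer:
    # Re-process each replayed record downstream (writes must be idempotent).
    print(message.offset, message.value)
```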
My Key Contributions
To address the critical business requirements, I collaborated with a multidisciplinary team of architects, DevOps engineers, BAs and SDETs to deliver an initial MVP data pipeline.
As we refined the pipeline, we tackled additional challenges and requirements, ultimately deploying a reliable product to production.
- Leveraged Kafka for real-time data streaming, orchestrating a continuous data flow from Oracle and MS Access into PostgreSQL and on to Neptune (a minimal sketch of one stage follows this list).
- Employed comprehensive unit and end-to-end integration testing strategies to anticipate and mitigate potential data and functional issues (a pytest sketch also follows this list).
- Employed comprehensive NFR testing strategies to anticipate and mitigate potential load and reliability issues, ensuring smooth operation.
- Created Cookiecutter developer and tester templates for Python multi-module projects to minimize human error and save expensive build hours.
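As a flavour of the streaming and data-quality work above, here is a minimal sketch of one Kafka-to-PostgreSQL stage with a dead-letter topic for rejected records. The topic names, table, connection string, and validation rule are hypothetical illustrations under simplified assumptions, not the production design.

```python
# Sketch of one Kafka -> PostgreSQL stage with a dead-letter topic for
# records that fail quality checks. Requires kafka-python and psycopg2.
# Topic names, table, and schema below are hypothetical placeholders.
import json

import psycopg2
from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer(
    "ingest.records",                      # hypothetical source topic
    bootstrap_servers="localhost:9092",
    group_id="pg-sink",
    enable_auto_commit=False,              # commit only after a successful write
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda d: json.dumps(d).encode("utf-8"),
)
conn = psycopg2.connect("dbname=target user=etl")   # hypothetical DSN

REQUIRED_FIELDS = {"record_id", "payload"}           # hypothetical quality rule

def is_valid(record: dict) -> bool:
    """Basic data-quality check: required fields present and non-empty."""
    return REQUIRED_FIELDS <= record.keys() and all(record[f] for f in REQUIRED_FIELDS)

for message in consumer:
    record = message.value
    if is_valid(record):
        with conn, conn.cursor() as cur:   # one transaction per record (simplified)
            cur.execute(
                "INSERT INTO records (record_id, payload) VALUES (%s, %s) "
                "ON CONFLICT (record_id) DO NOTHING",  # idempotent, replay-safe
                (record["record_id"], json.dumps(record["payload"])),
            )
    else:
        # Exception handling: route bad records to a dead-letter topic
        # so they can be inspected, fixed, and replayed later.
        producer.send("ingest.records.dlq", record)
    consumer.commit()                      # at-least-once delivery
```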
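And for the unit-level testing strategy, a small pytest sketch pinning down the same hypothetical is_valid() rule; end-to-end runs then exercised the full Kafka-to-Neptune flow against real environments.

```python
# pytest sketch for the hypothetical is_valid() quality check above.
import pytest

from pipeline import is_valid  # hypothetical module name

@pytest.mark.parametrize(
    "record, expected",
    [
        ({"record_id": "r1", "payload": {"k": "v"}}, True),   # well-formed
        ({"record_id": "", "payload": {"k": "v"}}, False),    # empty required field
        ({"payload": {"k": "v"}}, False),                     # missing record_id
        ({}, False),                                          # empty record
    ],
)
def test_is_valid(record, expected):
    assert is_valid(record) is expected
```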
Tech & Operational Documentation
- Created Confluence run-books and work-instruction pages for the Operations team
- Created Confluence live-support instruction pages for the Operations and Tech-support teams
- Created API and service contracts for external dependencies, aimed at less technical teams
- Created Business Spec versioning guidelines for external dependencies, aimed at less technical teams
- Created a Swagger/OpenAPI documentation versioning scheme for developers and testers
Tech Stack Used
+ DATA STREAMING:
Kafka
+ CONTAINERISATION:
Docker, K8s
+ DATABASES:
PostgreSQL, Oracle, Neptune, Aurora
+ LANGUAGES:
Python, Java
+ INFRA:
AWS
+ CI CD:
Jenkins
+ SCM:
Git, Bitbucket
+ SCRIPTS:
Terraform, CloudFormation, Bash
+ MONITORING & VISUALIZATION:
Grafana, Prometheus