Data Engineer
Current
- Built, maintained, and provisioned ETL pipelines in Python and Spark on AWS, giving stakeholders efficient access to and management of their data.
- Reduced data transfer time by 20% by resolving ETL pipeline bottlenecks through Python scripting, Spark optimizations, and code refactoring.
- Orchestrated the migration of data from transactional databases to the data warehouse with zero data loss, reducing data retrieval time by 15%.
- Safeguarded the production environment by running 100+ Python unit tests before deployment, catching issues before they reached production.
- Technology stack: Python (PySpark, pandas, awsglue, Boto3, unittest, pytest), Spark, Terraform, GitLab, AWS (Glue, Lambda, Step Functions, EventBridge, RDS, S3, ECS, Fargate), GCP (Cloud Storage, Dataproc), Airflow.