Data Engineer
Current
- Develop data pipelines that transform streaming data from Kafka Streams and load it into AWS S3 buckets and the Snowflake data warehouse.
- Designed and developed a near-real-time Spark Streaming solution that processes CDC events from Oracle GoldenGate, ingests them into Amazon S3 for long-term storage, and streams them into Snowflake for use in Tableau.
- Define and code Airflow orchestration jobs that launch batch workloads on AWS EMR.
- Built a PySpark framework to detect and process data that failed data quality checks during migration to AWS S3. It resolved over 5,400 data rejection errors across about 1,500 datasets by reprocessing the rejected data and preventing data loss.