Sr Data Engineer
Current
Oklahoma City, Oklahoma, US
- Designed and developed a streaming framework that ingests data from sources such as Kafka and S3 into AWS S3, Kafka, and Greenplum.
- Developed a pipeline to ingest data from MySQL into AWS S3 using Spark, and deployed it to a Kubernetes cluster using Helm charts.
- Developed a Helm chart for Spark jobs that deploys multiple jobs to a Kubernetes cluster.
- Designed and developed a PySpark data optimization pipeline that deletes data from a given Delta Lake dataset on AWS S3 and then runs OPTIMIZE and VACUUM on it.
- Worked on Kubernetes deployments of Spark jobs using Helm charts.
- Implemented jobs to ingest streaming data from Kafka into a Greenplum database using GPSS (Greenplum Streaming Server).