Data Engineer II
Current
- Developed and maintained scalable big data pipelines using PySpark/Python and AWS services (S3, Athena, EMR/EC2, Glue, SQS/SNS, etc.), processing millions of rows of health data efficiently and accurately
- Built structured and unstructured data models in relational stores (Redshift, Aurora), NoSQL databases, and Elasticsearch; refined schemas and tuned queries for fast, efficient access by data applications
- Designed and implemented scalable, maintainable data pipelines; improved performance through Spark, cluster, and database tuning
- Automated and monitored jobs through Airflow, reducing weekly manual workload by 30% and ensuring timely data delivery
- Proficient in SQL across several relational database dialects (MySQL, PostgreSQL, SQL Server) for data governance and database management