Data Engineer
- Accessing Azure Data Lake Storage from Databricks with an Azure AD service principal (app registration).
- Developing PySpark code to read, transform, and write data for batch processing.
- Building an ETL script with Spark Structured Streaming that achieves end-to-end exactly-once processing through checkpointing and an idempotent sink.
- Designing a Spark cluster for a streaming ETL process.
- Configuring autoscaling and spot instances for Spark clusters.
- Understanding when to use inferSchema and when to supply an explicit schema.
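To access ADLS Gen2 from Databricks as a service principal, you set a few `fs.azure.*` Spark properties for the OAuth client-credentials flow. The sketch below only builds those key/value pairs. The storage account name, application (client) ID, directory (tenant) ID, and secret are placeholders, and the helper name `service_principal_conf` is hypothetical.

```python
from __future__ import annotations


def service_principal_conf(storage_account: str,
                           application_id: str,
                           directory_id: str,
                           client_secret: str) -> dict:
    """Return the fs.azure.* settings for one ADLS Gen2 storage account.

    Hypothetical helper: all arguments are placeholders supplied by the caller.
    """
    suffix = f"{storage_account}.dfs.core.windows.net"
    return {
        # Use OAuth 2.0 instead of an account key.
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        # Client-credentials flow: authenticate as the app registration.
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": application_id,
        f"fs.azure.account.oauth2.client.secret.{suffix}": client_secret,
        # Azure AD token endpoint for the tenant (directory).
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{directory_id}/oauth2/token",
    }
```

In a notebook, you would typically read the secret with `dbutils.secrets.get(scope=..., key=...)`. Next, apply each pair with `spark.conf.set(key, value)` and read `abfss://<container>@<account>.dfs.core.windows.net/...` paths. This keeps the secret out of the notebook source.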
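Structured Streaming records source offsets in a checkpoint. After a failure, it may re-run the last micro-batch with the same batch ID. Exactly-once results therefore also depend on the sink ignoring such replays. The toy model below is pure Python, not Spark code, and shows only that idempotency idea. `IdempotentSink` and `process_stream` are hypothetical names. `write_batch(rows, batch_id)` loosely mirrors the `(batch_df, batch_id)` arguments that `foreachBatch` passes to its function.

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentSink:
    """Toy in-memory sink that applies each batch_id at most once."""
    rows: list = field(default_factory=list)
    committed_batches: set = field(default_factory=set)

    def write_batch(self, batch_rows: list, batch_id: int) -> bool:
        """Append rows for a new batch_id; return False for a replayed one."""
        if batch_id in self.committed_batches:
            return False  # committed before the failure: skip the replay
        self.rows.extend(batch_rows)
        # A real sink must commit the data and the batch_id atomically.
        self.committed_batches.add(batch_id)
        return True


def process_stream(sink: IdempotentSink, batches: list) -> list:
    """Feed (batch_id, rows) pairs, replays included, into the sink."""
    for batch_id, rows in batches:
        sink.write_batch(rows, batch_id)
    return sink.rows


# Batch 1 is delivered twice, as after a restart; its rows appear once.
print(process_stream(IdempotentSink(), [(0, ["a", "b"]), (1, ["c"]), (1, ["c"]), (2, ["d"])]))
```

In PySpark, the same logic would sit inside the function passed to `writeStream.foreachBatch(...)`, with `checkpointLocation` set. For Delta tables, the Delta Lake documentation describes the `txnAppId` and `txnVersion` write options for this purpose. With those options, Delta itself discards a replayed batch.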
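Autoscaling and spot instances are both set in the Databricks cluster specification. The sketch below builds a Clusters API-style payload for Azure using the `autoscale` and `azure_attributes` fields. The runtime version, VM size, cluster name, and the helper `cluster_spec` are placeholders or assumptions, and the min/max check is this sketch's own. For a long-running stream, a narrow autoscale range is often the safer starting point, because scaling down mid-stream can be disruptive.

```python
import json


def cluster_spec(name: str,
                 min_workers: int,
                 max_workers: int,
                 spark_version: str = "13.3.x-scala2.12",  # placeholder runtime
                 node_type_id: str = "Standard_DS3_v2",    # placeholder VM size
                 on_demand_nodes: int = 1) -> dict:
    """Hypothetical helper returning a cluster spec with autoscaling and spot VMs."""
    if min_workers < 1 or max_workers < min_workers:
        raise ValueError("expected 1 <= min_workers <= max_workers")
    return {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        # Databricks adds or removes workers within these bounds.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        "azure_attributes": {
            # The first N nodes (the driver first) stay on-demand, so a spot
            # eviction cannot take down the driver.
            "first_on_demand": on_demand_nodes,
            # Remaining workers use spot VMs and fall back to on-demand VMs
            # when spot capacity cannot be acquired.
            "availability": "SPOT_WITH_FALLBACK_AZURE",
            # -1: pay up to the on-demand price, so VMs are not evicted on price.
            "spot_bid_max_price": -1,
        },
    }


print(json.dumps(cluster_spec("streaming-etl", 2, 6), indent=2))
```

You could send the printed JSON as the body of a cluster-create request, or mirror it in the cluster UI.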
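For CSV sources, Spark reads every column as a string unless `inferSchema` is enabled. With it enabled, Spark makes one extra pass over the data to pick column types. The toy function below is pure Python and much simpler than Spark's real rules. It shows why inference costs a full scan: a column's type can only be widened after seeing every value. `infer_schema` is a hypothetical name, and treating empty cells as nulls is an assumption of this sketch.

```python
import csv
import io

_RANK = {"int": 0, "double": 1, "string": 2}  # narrow -> wide


def _cell_type(value: str) -> str:
    """Classify one non-empty CSV cell as 'int', 'double', or 'string'."""
    try:
        int(value)
        return "int"
    except ValueError:
        pass
    try:
        float(value)
        return "double"
    except ValueError:
        return "string"


def infer_schema(csv_text: str) -> dict:
    """Toy inferSchema: scan all rows, widening each column's type as needed."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    types = {name: None for name in header}
    for row in reader:  # the extra full pass over the data
        for name, value in zip(header, row):
            if value == "":
                continue  # empty cell treated as null: no type evidence
            t = _cell_type(value)
            if types[name] is None or _RANK[t] > _RANK[types[name]]:
                types[name] = t  # widen, never narrow
    # Columns with no non-empty values fall back to string.
    return {name: t or "string" for name, t in types.items()}


print(infer_schema("id,price,city\n1,9.5,Oslo\n2,10,Bergen\n3,,\n"))
```

In PySpark, you can skip that extra pass by declaring the schema up front with `StructType`/`StructField` and passing it via `spark.read.schema(...)`. This also avoids surprises when a sample of the data does not represent the rest.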