Data Engineer and Analytics Engineer
Current
• Extracted data from the REST APIs of media and ads platforms such as Facebook, Google, and YouTube by developing a Scala application with Cats and functional programming, writing the results to AWS S3 in gzip-compressed format.
• Migrated legacy data warehouses to Amazon Redshift using AWS Database Migration Service and Redshift Spectrum, enabling efficient querying across both historical and current datasets and enhancing data analytics capabilities.
• Built ETL pipelines with Spark, SQL, and Hive that load raw data into Delta Lake.
• Used Jenkins to automate building, testing, and deploying Dockerized applications, supporting the CI/CD process.
• Deployed and managed Kafka clusters on Kubernetes using Helm charts, ensuring high availability and scalability.
• Developed scalable data processing pipelines using Node.js, enhancing ETL workflows and reducing data processing time by 30%.
• Developed Airflow DAGs in Python to trigger pods in a Kubernetes cluster and run Spark jobs on a Databricks cluster.
• Leveraged JSON for data serialization and deserialization in RESTful APIs, improving data retrieval speeds and API response times.
• Utilized jQuery for DOM manipulation and event handling to improve the interactivity and user experience of data-heavy web applications.
• Implemented and optimized ML algorithms for low-latency environments using Apache Kafka for real-time data streaming, Apache Flink for stream processing, and TensorRT for high-performance inference.
• Designed and maintained distributed systems leveraging Apache Spark for large-scale data processing, Kubernetes for container orchestration, and Consul for service discovery and configuration management.
• Ported legacy ETL from Hive to Databricks Spark, boosting performance by up to 1000% and cutting AWS EC2 costs by 50%.
• Utilized Google BigQuery and Dataflow/Data Fusion for data processing and analytics.