Associate Product Architect
Current
• Developed PySpark scripts to onboard CSV files and convert them to Parquet format for optimized storage and processing.
• Implemented data quality checks and validation procedures to ensure the accuracy and reliability of the processed data.
• Orchestrated end-to-end data pipelines with Apache Airflow, ensuring smooth execution and timely processing of tasks.
• Ran PySpark scripts at scale on Amazon EMR (Elastic MapReduce), using distributed computing to handle large datasets.
• Collaborated with the operations team to diagnose and resolve errors and issues through extensive log analysis and debugging.
• Worked closely with the operations team to identify bottlenecks, enhance system performance, and optimize data processing workflows.
• Created tables in Amazon Athena for efficient, interactive querying of the processed data.
• Used Amazon S3 to store and manage the processed data, optimizing storage costs and accessibility.