Data Science Intern
Developed a scalable PySpark pipeline, filtering over 1 million IP addresses across diverse data sources, increasing data processing efficiency by 30%.
Automated the extraction of campaign data from Parquet files and calculated daily data quality metrics, reducing manual effort by 20%.
Conducted multidimensional analysis of campaign data across tables and.