As a Data Engineer at PayPal, I enable data-driven decision making by transforming raw data into strategic insights. I have a bachelor's degree from PES University, where I learned the fundamentals of software development and distributed systems. I have over three years of experience in data engineering, working with various cloud platforms, big data tools, and SQL-based reporting tools. At PayPal, I gather business requirements, design data models, analyze existing systems, and propose process improvements. I also develop processes for loading and transforming data, and build data sets for data mining and data modeling. Previously, I migrated ETL jobs to PySpark scripts, created pipelines in Azure Data Factory, implemented a CDC process using SCD Type 2 conversion, and used AWS Kinesis for real-time data streaming and processing. I am passionate about data engineering and always eager to learn new technologies and best practices. I value teamwork, innovation, and customer satisfaction, and I believe I can bring diverse perspectives and experiences to the team.
Data Engineer
Hyatt Regency
Sep 2020 - Apr 2022
United States
• Ingested data to one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
• Migrated ETL jobs to PySpark scripts to perform transformations, joins, and pre-aggregations before storing the data in HDFS.
• Created pipelines in Azure Data Factory (ADF) using Linked Services, Datasets, and Pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob Storage, Azure SQL DW, and write-back tools.
• Implemented a CDC process using SCD Type 2 conversion and performed CDC on one year's worth of historical data.
• Leveraged Cloudera's ecosystem, including Apache Spark, Apache Hive, and Apache Kafka, to implement end-to-end data processing pipelines for batch and real-time data analytics.
• Created Application Interface Documents for downstream teams to create new interfaces for transferring and receiving files through Azure Data Share.
• Used MPP engines such as Azure Synapse to process data, using Azure SQL pools to scale up and down.
• Designed and implemented an integration framework using Apache Spark to replace the Informatica BDE tool, applying transformations and quality checks on data.
• Used Python for Spark programming, leveraging data structures such as dictionaries and lists for small data sets.
• Troubleshot various Spark issues such as data skew, memory issues, and join issues, and tuned configurations.
• Designed and implemented data platforms on Cloudera and CDP for data analysis and reporting, integrating with cloud services such as Amazon S3 and Redshift for data integration and storage.
• Designed and deployed data pipelines using Data Lake, Databricks, and Airflow.

Data Engineer
E*Trade From Morgan Stanley
Oct 2018 - Aug 2020
• Collaborated with architects and subject matter experts to review business requirements and build source-to-target data mapping documents and pipeline design documents.
• Worked in an Agile environment with weekly sprints and daily Scrum meetings.
• Used AWS Data Pipeline for data extraction, transformation, and loading from heterogeneous data sources.
• Experience with AWS Kinesis for real-time data streaming and processing. Familiarity with Kinesis Data Streams, Kinesis Data Firehose, and Kinesis Data Analytics. Knowledge of using Kinesis with other AWS services such as S3, Lambda, and DynamoDB.
• Familiarity with SQL-based data warehousing and reporting tools such as Tableau and Power BI.
• Wrote to the AWS Glue metadata catalog, which in turn enabled querying the refined data from Athena.
• Good understanding of other AWS services such as S3, EC2, IAM, and RDS. Experience with orchestration and data pipeline services such as AWS Step Functions, Data Pipeline, and Glue.
• Wrote PySpark jobs in AWS Glue to merge data from multiple tables and used crawlers to populate the AWS Glue Data Catalog with metadata table definitions.
• Designed and developed Spark workflows in Scala to pull data from AWS S3 buckets and Snowflake and apply transformations to it.
• Implemented Spark RDD transformations to map business analysis and applied actions on top of the transformations.
Skills: Python (Programming Language) • Tableau • MySQL • Hadoop • Scala • Amazon Web Services (AWS) • SQL • Microsoft SQL Server • Data Analysis • Extract, Transform, Load (ETL) • Data Warehousing

Data Engineer
Foray Software Private Limited
Aug 2016 - Sep 2018
India
Identified analytical solutions to extract business insights by analyzing large amounts of business data, with an emphasis on identifying trends and determining root causes to drive effective business decisions.
• Used AWS services such as EC2 and S3 for processing and storing small data sets. Experienced in maintaining the Hadoop cluster on AWS EMR.
• Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
• Developed automated regression scripts in Python to validate the ETL process between multiple databases such as AWS Redshift and SQL Server.
• Worked on ETL migration services by developing and deploying AWS Lambda functions to build a serverless data pipeline whose output is written to the Glue Catalog and can be queried from Athena.
• Experience in using and tuning relational databases (e.g. Microsoft SQL Server, Oracle, MySQL) and columnar databases (e.g. Amazon Redshift).
• Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats.
• Connected Redshift to Tableau and created company dashboards using KPIs that helped communicate product metrics, trends, and other key indicators to leadership.
• Implemented AWS Data Pipeline to trigger scheduled EMR jobs in the AWS big data stack and to send alerts on the success and failure of the Data Pipeline jobs.
Frequently Asked Questions about M P
What is M P's role at the current company?
M P's current role is Data Engineer at PayPal.