Hello, my name is Karthik. I am a seasoned Senior Data Engineer with 11+ years of experience in development, design, integration, and presentation using Java, with extensive expertise in the Big Data and Hadoop ecosystems. I am proficient in tools such as Hive, Pig, Flume, Sqoop, ZooKeeper, Spark, Kafka, Snowflake, Python, Hudi, CDC, and AWS, and I have successfully implemented numerous big data projects on platforms such as Cloudera, Hortonworks, and AWS. I have a proven track record of managing and optimizing Hadoop clusters, implementing GCP DLP policies, and developing ETL pipelines with Spark and Scala. My skills extend to cloud computing on GCP and AWS and to NoSQL databases such as HBase, Cassandra, and MongoDB. I am skilled in implementing Hudi for efficient data ingestion and real-time processing, ensuring data consistency and integrity across large-scale data environments, and in designing and optimizing Hudi workflows that handle incremental updates and upsert operations seamlessly within Hadoop clusters. I excel in collaborative environments, leveraging extensive technical knowledge to deliver robust and scalable data solutions.
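To make the Hudi upsert workflow described above concrete, here is a minimal PySpark sketch of the pattern; the table name, record key, precombine field, and S3 path are hypothetical placeholders rather than details from any actual project, and the Hudi Spark bundle is assumed to be on the classpath.

```python
from pyspark.sql import SparkSession

# Minimal sketch of a Hudi upsert; assumes the hudi-spark bundle jar is available.
spark = (SparkSession.builder
         .appName("hudi-upsert-sketch")
         .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
         .getOrCreate())

# Hypothetical incremental batch; in practice this would come from a CDC feed.
updates = spark.createDataFrame(
    [(1, "shipped", "2024-01-02 10:00:00"), (2, "delivered", "2024-01-02 11:30:00")],
    ["order_id", "status", "updated_at"],
)

hudi_options = {
    "hoodie.table.name": "orders",                             # hypothetical table name
    "hoodie.datasource.write.recordkey.field": "order_id",     # dedupe/merge key
    "hoodie.datasource.write.precombine.field": "updated_at",  # latest record wins
    "hoodie.datasource.write.operation": "upsert",             # insert-or-update
}

# Upsert merges the batch into the existing table files at the target path.
(updates.write.format("hudi")
    .options(**hudi_options)
    .mode("append")
    .save("s3://example-bucket/hudi/orders"))  # hypothetical path
```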
Senior Big Data Engineer, UPS, United States
Senior Data Engineer, UPS | May 2022 - Present | Maryland, United States
- Designed and implemented scalable big data pipelines using AWS services such as S3, Redshift, EMR, and Athena for real-time logistics analytics.
- Integrated Palantir with AWS Data Lake solutions to enable advanced analytics and real-time insights for decision-making.
- Designed and developed workflows in Palantir Foundry to optimize data integration and transformation processes for enterprise-wide reporting.
- Developed real-time data pipelines in Azure Databricks for Workday and PeopleSoft integration, transforming HR and payroll data into analytics-ready formats.
- Designed and implemented an Enterprise Data Lake for diverse analytics, processing, storage, and reporting needs, handling large, dynamic datasets.
- Ensured high-quality reference data through cleaning, transformation, and integrity operations in collaboration with stakeholders and solution architects.
- Ingested CDC data using Hudi, efficiently managing inserts, updates, and deletes, and used EMR to transform and move large datasets.
- Automated data cataloging and ETL jobs, improving efficiency and reliability.
- Integrated Talend with Hadoop, Hive, Spark, PySpark, and MySQL for seamless data processing.
- Leveraged Spark SQL for ETL processes in Scala and Python, and conducted unit, integration, and web application testing with Pytest (see the sketch after this list).
- Developed reusable ETL frameworks for RDBMS-to-Data-Lake transitions.
- Migrated and maintained databases, converting Oracle and MS SQL Server databases to PostgreSQL and MySQL.
- Built a multi-terabyte Data Warehouse infrastructure, monitored performance, set up alerts for system outages, and developed ETL job schedules with the Matillion ETL package.
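As a sketch of the Pytest-based unit testing of Spark transformations mentioned above, the following tests a small PySpark transform with a local session; the function, column names, and test data are hypothetical.

```python
import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

def normalize_status(df):
    # Hypothetical transform under test: trim and lower-case a status column.
    return df.withColumn("status", F.lower(F.trim(F.col("status"))))

@pytest.fixture(scope="module")
def spark():
    # Local SparkSession so the test runs without a cluster.
    session = SparkSession.builder.master("local[1]").appName("etl-tests").getOrCreate()
    yield session
    session.stop()

def test_normalize_status(spark):
    df = spark.createDataFrame([(" Delivered ",), ("IN TRANSIT",)], ["status"])
    result = [row.status for row in normalize_status(df).collect()]
    assert result == ["delivered", "in transit"]
```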
Senior Data Engineer, American Express | Jul 2019 - Apr 2022 | Columbus, Ohio Metropolitan Area
- Designed and built a multi-terabyte Data Warehouse infrastructure for large-scale data handling, managing millions of records daily.
- Built comprehensive dashboards in Palantir to track key metrics and KPIs for financial data analysis and reporting.
- Built multi-terabyte Data Warehouse infrastructure on Redshift, incorporating data from Workday and PeopleSoft systems.
- Established and managed Snowflake architecture, including databases, schemas, and warehouses, to support diverse data requirements (a provisioning sketch follows this list).
- Utilized data cataloging tools for efficient data retrieval and executed SQL queries for analysis.
- Scheduled, tested, and debugged ETL components using DataStage, and wrote reusable mapplets and Oracle PL/SQL stored procedures.
- Developed and managed ETL jobs to enhance data warehousing capabilities.
- Managed ETL pipelines and CDC processes to capture and process real-time data changes, ensuring timely updates.
- Implemented solutions for automated operational processes and developed SOAP and REST web services.
- Applied data warehousing concepts in staging tables using advanced ETL tools.
- Integrated Hudi for efficient processing and incremental data updates, ensuring data consistency and accuracy, and used CDC mechanisms to keep data fresh by processing updates and inserts promptly.
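A minimal sketch of provisioning Snowflake databases, schemas, and warehouses from Python, assuming the snowflake-connector-python package; the object names and connection parameters are hypothetical and would normally come from a secrets manager.

```python
import snowflake.connector

# Hypothetical connection parameters; never hard-code real credentials.
conn = snowflake.connector.connect(
    account="example_account",
    user="etl_user",
    password="***",
    role="SYSADMIN",
)

# Idempotent DDL: safe to re-run as part of an automated setup job.
ddl_statements = [
    "CREATE WAREHOUSE IF NOT EXISTS ANALYTICS_WH WITH WAREHOUSE_SIZE = 'SMALL' AUTO_SUSPEND = 300",
    "CREATE DATABASE IF NOT EXISTS FINANCE_DB",
    "CREATE SCHEMA IF NOT EXISTS FINANCE_DB.REPORTING",
]

cur = conn.cursor()
try:
    for stmt in ddl_statements:
        cur.execute(stmt)
finally:
    cur.close()
    conn.close()
```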
Data Engineer, Fragma Data Systems | Dec 2017 - Jun 2018 | Hyderabad, Telangana, India
- Designed and developed Hadoop-based Big Data analytics solutions and engaged clients in technical discussions.
- Implemented real-time analytics to derive insights from streaming data using Azure Stream Analytics, based on CDC feeds.
- Worked on multiple Azure platforms, including Azure Data Factory, Azure Synapse, Azure Data Lake, Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, and HDInsight.
- Created and implemented custom Hadoop applications in the Azure environment.
- Created ADF pipelines to load data from on-prem sources into an Azure SQL Server database and Azure Data Lake storage.
- Developed complex Hive queries to extract data from various sources (Data Lake) and store it in HDFS.
- Used Azure Data Lake Analytics and HDInsight/Databricks to generate ad hoc analyses.
- Developed custom ETL solutions, batch processing, and real-time data ingestion pipelines to move data in and out of Hadoop using PySpark and shell scripting.
- Ingested data into Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed it in Azure Databricks.
- Worked on all aspects of data mining: data collection, data cleaning, model development, data validation, and data visualization.
- Managed Azure Data Lake Storage (ADLS) and Databricks Delta Lake, including integration with other Azure services (a CDC-merge sketch follows this list).
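As a sketch of applying a CDC feed to Databricks Delta Lake on ADLS, the following uses Delta's MERGE API; the storage paths, key column, and payload format are assumptions, not details from the role.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

# In a Databricks notebook `spark` is preconfigured; shown here for completeness.
spark = SparkSession.builder.getOrCreate()

# Hypothetical CDC batch landed in ADLS as JSON; path and schema are placeholders.
changes = spark.read.format("json").load(
    "abfss://raw@examplelake.dfs.core.windows.net/cdc/customers/"
)

target = DeltaTable.forPath(
    spark, "abfss://curated@examplelake.dfs.core.windows.net/delta/customers"
)

# Apply inserts and updates from the CDC feed in a single atomic MERGE.
(target.alias("t")
    .merge(changes.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())
```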
AWS Python Developer, Kgtiger | Mar 2016 - Nov 2017 | Hyderabad, Telangana, India
- Orchestrated end-to-end deployment of web applications on AWS, optimizing efficiency and leveraging S3 buckets.
- Implemented AWS CLI Auto Scaling and CloudWatch monitoring, enhancing system performance.
- Implemented CDC to capture changes in customer behavior and market trends in real time, integrating CDC with Apache Kafka for continuous data ingestion and processing (see the consumer sketch after this list).
- Used the AWS Glue Data Catalog to organize metadata, making data discoverable and queryable for analytics and reporting.
- Used AWS Glue to automate the extraction, transformation, and loading of data from various sources into AWS data lakes and data warehouses such as Amazon Redshift.
- Automated continuous integration with Git, Jenkins, and custom Python and Bash tools.
- Developed server-side modules deployed on AWS Elastic Compute Cloud (EC2), using languages such as Java, PHP, Node.js, and Python.
- Utilized AWS Lambda for DynamoDB auto scaling and implemented a robust data access layer.
- Integrated Hudi with AWS services such as S3 for storage, EMR for processing, and Redshift for warehousing, ensuring data consistency and accuracy; Hudi's real-time and near-real-time processing supported applications requiring timely data updates and analytics.
- Automated nightly builds with Python, reducing effort spent on pipeline failures by 70%, and employed AWS SNS for automated email notifications after nightly runs.
- Developed tools for AWS server provisioning, application deployment, and basic failover among regions.
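A minimal sketch of consuming CDC events from Kafka in Python, assuming the kafka-python package and a Debezium-style change payload; the topic name and broker address are hypothetical.

```python
import json
from kafka import KafkaConsumer  # kafka-python package

# Hypothetical CDC topic and broker; payload format follows Debezium conventions.
consumer = KafkaConsumer(
    "customers.cdc",
    bootstrap_servers=["localhost:9092"],
    auto_offset_reset="earliest",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)

for message in consumer:
    event = message.value
    op = event.get("op")  # 'c' = create, 'u' = update, 'd' = delete in Debezium payloads
    if op in ("c", "u"):
        row = event.get("after", {})
        # A downstream writer (e.g., an S3/Hudi sink) would upsert this row.
        print(f"upsert: {row}")
    elif op == "d":
        print(f"delete: {event.get('before', {})}")
```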
Big Data Engineer, EPAM Systems | Jun 2013 - Feb 2016 | Hyderabad, Telangana, India
- Provided recommendations for transitioning to Hadoop with MapReduce, Hive, Sqoop, Flume, and Pig Latin.
- Developed Spark applications for data validation, cleansing, and custom aggregations, importing data into Spark RDDs for processing (a small RDD sketch follows this list).
- Managed cluster operations, including node commissioning/decommissioning and high availability.
- Imported and exported data using Flume and analyzed it with Hive and Pig.
- Set up and benchmarked Hadoop/HBase clusters, including on Amazon EC2.
- Developed applications across various Hadoop technologies and integrated Hive with HBase and Sqoop.
- Moved relational-database data into HDFS and HBase tables with Sqoop.
- Integrated Talend and SSIS with Hadoop for ETL operations and installed Hadoop ecosystem components such as Hive, Pig, Flume, Sqoop, and Oozie.
- Utilized Flume for log data collection and aggregation.
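As a small illustration of RDD-based validation, cleansing, and custom aggregation, here is a self-contained PySpark sketch; the record format and parsing rules are hypothetical.

```python
from pyspark import SparkContext

sc = SparkContext("local[1]", "rdd-cleansing-sketch")

# Hypothetical raw lines of "user_id,amount"; malformed rows are dropped.
raw = sc.parallelize(["u1,10.5", "u2,not_a_number", "u1,4.5", ",3.0"])

def parse(line):
    user_id, _, amount = line.partition(",")
    try:
        return (user_id, float(amount)) if user_id else None
    except ValueError:
        return None  # cleansing: discard unparseable amounts

totals = (raw.map(parse)
             .filter(lambda rec: rec is not None)  # validation step
             .reduceByKey(lambda a, b: a + b))     # custom aggregation per user

print(totals.collect())  # [('u1', 15.0)]
sc.stop()
```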
Frequently Asked Questions about Karthik P
What company does Karthik P work for?
Karthik P works for UPS.
What is Karthik P's role at the current company?
Karthik P's current role is Senior Big Data Engineer.
Who are Karthik P's colleagues?
Karthik P's colleagues are Jenni Timonsson, Mikko Mankin, Reginald Portis, Zhao Melody, Donna Cummings, Scott Wicker, Lynn Khambong.