Harsha R Email and Phone Number

Cupertino, California, United States

Harsha R's Location

United States

About Harsha R

As a Senior Data Engineer with more than 9 years of experience in both analyst and engineer roles, I am passionate about leveraging data to drive meaningful insights. My proficiency in SQL, Python, and data visualization tools allows me to uncover trends, patterns, and correlations, empowering data-driven decision-making. In both roles, I am committed to data accuracy, quality, and security. With a strong blend of analytical and technical skills, I am poised to make a significant impact in any data-driven organization. I am excited to contribute my experience in both data analysis and data engineering to unlock the full potential of data and drive success.

Harsha R's Current Company Details

Apple

Cupertino, California, United States

Website: apple.com
Employees: 218112

Harsha R Work Experience Details

Sr. Data Engineer

Apple Oct 2021 - Present

Austin, Texas, United States

Participated in several client meetings to understand migration and other requirements. Used Spark Streaming APIs to perform transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time and persists it into Cassandra. Developed Spark scripts using Scala shell commands as required. Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data to and from sources such as Azure SQL, Blob Storage, Azure SQL Data Warehouse, and a write-back tool. Developed Azure Databricks notebooks to apply business transformations and perform data cleansing. Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive. Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs. Created logical and physical data models of existing and new data solutions so that developers and other users could understand the data structure. Developed complex data pipelines using Azure Databricks and Azure Data Factory (ADF) to create a consolidated and connected data lake environment. Developed Databricks Python notebooks to join, filter, pre-aggregate, and process files stored in Azure Data Lake Storage. Performed advanced procedures such as text analytics and processing using Spark's in-memory computing capabilities. Developed Python code to gather data from HBase and designed the solution for implementation in PySpark. Developed an automated process in Azure that ingests data daily from a web service and loads it into Azure SQL DB.

Data Engineer

Kroger Oct 2019 - Apr 2021

Atlanta, Georgia, United States

Responsible for the requirement analysis, design, coding, and implementation phases of the project. Consumed data from Kafka using Apache Spark. Loaded the data into Spark RDDs and performed in-memory computation to generate the output response. Implemented Spark in Scala, using DataFrames and the Spark SQL API for faster data processing. Developed streaming pipelines using Azure Event Hubs and Stream Analytics to analyze data. Developed Scala scripts and UDFs using DataFrames, SQL, Datasets, and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into the OLTP system through Sqoop. Set up and maintained Azure SQL Database, Azure Analysis Services, Azure SQL Data Warehouse, and Azure Data Factory. Worked on AWS Data Pipeline to configure data loads from S3 to Redshift. Extracted, transformed, and loaded data between various heterogeneous sources and destinations using AWS Redshift. Migrated data from on-premises systems to AWS storage buckets. Developed Python scripts to call REST APIs and extract data to AWS S3. Worked with the Apache Spark ecosystem, including Spark, Spark Streaming, Spark RDDs, and Spark SQL, using Scala and Python. Exported selected data to CSV files, stored them in AWS S3 using AWS EC2, and then structured and stored the data in AWS Redshift. Responsible for data cleansing and data wrangling using Python with Pandas and NumPy. Developed metadata-based scalable frameworks in Azure Databricks to minimize architectural complexity. Responsible for creating Hive tables and for loading and analyzing data using Hive queries.

Data Engineer

Comcast Sep 2017 - Oct 2019

Philadelphia, Pennsylvania, United States

Extensively used Agile methodology, the organization standard, to implement the data models. Worked on Big Data infrastructure for batch processing as well as real-time processing. Responsible for building scalable distributed data solutions using Hadoop. Worked on Hadoop components such as HDFS, YARN, ResourceManager, NodeManager, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce. Used Impala to read, write, and query data in HDFS. Designed and developed a security framework to provide fine-grained access to objects in AWS S3 using AWS Lambda and DynamoDB. Performed an end-to-end architecture and implementation assessment of various AWS services such as Amazon EMR, Redshift, and S3. Used AWS EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB. Implemented a data quality framework using AWS Athena, Snowflake, Airflow, and Python. Wrote automated HBase test cases for data quality checks using HBase command-line tools. Managed data coming from different sources, and was involved in HDFS maintenance and the loading of structured and unstructured data. Automated complex workflows using Apache Airflow. Responsible for bulk-loading data into HBase using MapReduce by directly creating HFiles and loading them. Involved in HDFS maintenance and administration through Hadoop-Python.

Big Data Engineer

Elevance Health Jan 2016 - Sep 2017

Virginia, United States

Designed and developed various modules in the Hadoop Big Data platform and processed data using MapReduce, Hive, Sqoop, Kafka, and Oozie. Developed real-time data processing applications using Scala and Python. Analyzed the Hadoop cluster using big data analytics tools including Pig, Hive, and MapReduce. Developed MapReduce programs to parse the raw data, populate staging tables, and store the refined data in partitioned tables in the EDW. Developed Python scripts to update content in the database and manipulate files. Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 and ORC/Parquet/text files into AWS Redshift. Responsible for data extraction, aggregation, and consolidation of Adobe data within AWS Glue using PySpark. Analyzed, designed, and built modern data solutions using AWS IA as a service to support the visualization of data. Worked on AWS services such as EC2, IAM, Elastic MapReduce (EMR), and EBS, and on accessing instance metadata. Imported data from AWS S3 to HDFS and performed transformations and actions using Spark to get the desired output. Implemented AWS Lambda functions to drive real-time monitoring dashboards from system logs. Worked on Azure web applications, App Services, Azure Storage, Azure SQL Database, virtual machines, Fabric Controller, Azure AD, Azure Search, and Notification Hubs. Created HBase tables and column families to store user event data. Wrote complex MapReduce programs. Developed Hive scripts in HiveQL to denormalize and aggregate the data.

Hadoop Developer

First Source Mar 2013 - Aug 2015

Kolkata, West Bengal, India

Participated in requirements analysis and designed an object-oriented domain model. Worked extensively with Sqoop to import metadata from Oracle. Developed Hive queries to process the data and generate data cubes for visualization. Implemented schema extraction for Parquet and Avro file formats in Hive. Developed scripts and batch jobs to schedule a bundle (a group of coordinators) consisting of various Hadoop programs using Oozie. Developed MapReduce programs to clean and aggregate the data. Optimized Hive queries and joins to handle different data sets. Involved in ETL, data integration, and migration by writing Pig scripts. Integrated Hadoop with Solr and implemented search algorithms. Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs. Developed shell and Python scripts to automate and provide control flow for Pig scripts. Created HBase tables to load large sets of semi-structured data coming from various sources. Designed and implemented HBase and an associated RESTful web service. Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig. Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files. Worked on Apache Hadoop clusters for application development and with Hadoop tools such as Hive, Pig, HBase, ZooKeeper, and Sqoop.

Harsha R Education Details

Krishna University, Machhlipattanam

Bachelors

Frequently Asked Questions about Harsha R

What company does Harsha R work for?

Harsha R works for Apple.

What is Harsha R's role at the current company?

Harsha R's role at Apple is Sr. Data Engineer.

What schools did Harsha R attend?

Harsha R attended Krishna University, Machhlipattanam.

Who are Harsha R's colleagues?

Harsha R's colleagues are Matthew East, Nurul Anggi, Ivan Filinskyy, John O’Connor, Agnese R., Pankaj Deshpande, and 肖景钊.
