Harshith R

Harshith R Email and Phone Number

Data Engineer at Humana | Actively Seeking C2C/C2H Positions | Data Engineer | Big Data | Python | SQL | AWS | Hadoop | Azure | PySpark | Kafka | Yarn HDFS | Scala | ETL | Informatica | Tableau | PowerBI | Airflow @ Capital One
Plano, Texas, United States
Harshith R's Location
Dallas, Texas, United States
About Harshith R

Data Engineer with 10+ years of experience and a demonstrated history of working in the automotive industry. Skilled in Apache Beam, Big Data, Amazon Web Services (AWS), Azure, Python, and data engineering. Strong information technology professional with a Bachelor's degree in Computer Science from Sreenidhi Institute of Science and Technology.

Harshith R's Current Company Details
Capital One

Data Engineer at Humana | Actively Seeking C2C/C2H Positions | Data Engineer | Big Data | Python | SQL | AWS | Hadoop | Azure | PySpark | Kafka | Yarn HDFS | Scala | ETL | Informatica | Tableau | PowerBI | Airflow
Plano, Texas, United States
Website:
capitalone.com
Employees:
63917
Harshith R Work Experience Details
  • Capital One
    Capital One
    Plano, Texas, United States
  • Humana
    Data Engineer
    Humana Sep 2023 - Present
    Louisville, Kentucky, US
    • Day-to-day responsibilities involve analyzing existing data structures, formats, and dependencies in the Hadoop environment; designed and implemented a comprehensive migration strategy for transferring data to Databricks on Azure.
    • Bulk loaded and unloaded data to and from external stages (Azure Data Lake) using the Snowflake COPY command and Snowflake file formats.
    • Created many PySpark and Spark SQL scripts in Synapse notebooks to perform data transformations per business requirements.
    • Designed and developed data-ingestion scripts using PySpark and Spark SQL in Azure Databricks and orchestrated them with Data Factory pipelines.
    • Developed an Azure Function in Python to automate the download of external source data from a third-party API and store it in Azure Blob Storage, significantly reducing manual effort and increasing data accuracy.
    • Migrated on-premises data (SQL Database/DB2/MongoDB) to Azure Data Lake Storage (ADLS) using Azure Data Factory (ADF V1/V2).
    • Integrated Jenkins pipelines into Azure Pipelines to drive all microservice builds out to the Docker registry, then deployed to Kubernetes, created Pods, and managed them using Azure Kubernetes Service (AKS).
    • Created Databricks notebooks with Delta-format tables and implemented a lakehouse architecture; also created generic notebooks for common sets of activities to reduce code redundancy and improve reusability.
    • Worked on the design and development of Databricks pipelines, including Delta Shares and a full Unity Catalog implementation (managing environments, access, and security).
    • Used Power BI DirectQuery to compare legacy data with current data, generated reports, and published them to dashboards.
    Environment: Python, Spark, PySpark, SQL, NoSQL, Snowflake, Databricks, Hadoop, Azure Data Lake, Data Factory, Azure Synapse, Power BI
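For illustration, a minimal sketch of the Snowflake bulk-load step described above: loading staged files from an Azure Data Lake external stage with COPY INTO via the snowflake-connector-python package. The account, stage, table, and file-format names are hypothetical.

import snowflake.connector

# Hypothetical connection details; credentials belong in a secrets manager.
conn = snowflake.connector.connect(
    account="my_account",
    user="etl_user",
    password="...",
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="STAGING",
)

# COPY INTO pulls files already present in the external (ADLS) stage into a table.
copy_sql = """
    COPY INTO STAGING.CLAIMS_RAW
    FROM @AZURE_ADLS_STAGE/claims/
    FILE_FORMAT = (FORMAT_NAME = 'PARQUET_FMT')
    ON_ERROR = 'ABORT_STATEMENT'
"""

cur = conn.cursor()
try:
    cur.execute(copy_sql)
    print(cur.fetchall())  # per-file load status returned by COPY INTO
finally:
    cur.close()
    conn.close()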
  • Capital One
    Data Engineer
    Capital One Jun 2022 - Jun 2023
    McLean, VA, US
    • Day-to-day responsibility includes migrating data from legacy locations to the cloud and performing transformations per data-model changes between databases.
    • Collaborated with cross-functional teams to design, develop, test, implement, and support technical solutions using programming languages such as Java, Scala, and Python, open RDBMS and NoSQL databases, and cloud-based data warehousing services such as Redshift and Snowflake.
    • Developed ETL pipelines into and out of the data warehouse using a combination of Python and Snowflake SnowSQL, writing SQL queries against Snowflake.
    • Developed a migration tool in Python with the Selenium framework to migrate large enterprise datasets; later extended the migration app by integrating APIs and AWS Lambda functions and leveraged Step Functions for end-to-end automation.
    • Used Lambda functions and Step Functions to trigger Glue jobs and orchestrate the data pipeline.
    • Wrote AWS Lambda functions in Python that invoke Python scripts to perform various transformations and analytics on large datasets in EMR clusters.
    • Created AWS Lambda functions and API Gateway endpoints so that data submitted via API Gateway is accessible to the Lambda functions.
    • Built event-driven and scheduled AWS Lambda functions to trigger various AWS resources.
    • Developed APIs to convert various batch files into streaming datasets (Kafka).
    • Analyzed the system for new enhancements/functionality and performed impact analysis of the application when implementing ETL changes.
    • Created reports with complex calculations using Tableau and QuickSight dashboards for data analysis and data validation.
    • Created data-ingestion modules using AWS Glue to load data into various layers in S3, with reporting in QuickSight.
    Environment: Python, Java, Scala, Spark, Shell scripting, Kafka, SQL, NoSQL, Snowflake, AWS EMR, EC2, AWS S3, AWS Redshift, Redshift Spectrum, RDS, Lambda, AWS Glue
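A minimal, hypothetical sketch of the Lambda/Glue orchestration mentioned above: an event-driven Lambda handler that starts a Glue job when a new object lands in S3. The job name and argument names are assumptions, the event shape is a standard S3 put notification, and boto3 ships with the Lambda runtime.

import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    # Pull the bucket and key out of the S3 put event that triggered this run.
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]

    # Start the (hypothetical) Glue job, handing it the new file as an argument.
    response = glue.start_job_run(
        JobName="curated-load-job",
        Arguments={"--source_path": f"s3://{bucket}/{key}"},
    )
    return {"JobRunId": response["JobRunId"]}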
  • Cummins Inc.
    Data Engineer
    Cummins Inc. May 2020 - May 2022
    Columbus, Indiana, US
    • Day-to-day responsibility includes developing ETL pipelines into and out of the data warehouse and building major regulatory and financial reports using advanced SQL queries in Snowflake.
    • Implemented a one-time migration of multi-state data from SQL Server to Snowflake using Python and SnowSQL.
    • Created master UNIX shell scripts to call DataStage jobs, move files with FTP and various Unix commands, and validate record counts in incoming external files.
    • Converted Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
    • Designed and created several DAGs (Directed Acyclic Graphs) in Airflow to automate ETL pipelines.
    • Implemented Spark/Kafka streaming to pick up data from Kafka and send it into the Spark pipeline.
    • Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed it in Azure Databricks.
    • Architected and implemented medium- to large-scale BI solutions on Azure using Azure Data Platform services.
    • Migrated on-premises data (SQL Database/DB2/MongoDB) to Azure Data Lake Storage (ADLS) using Azure Data Factory (ADF V1/V2).
    • Created pipelines in ADF using Linked Services/Datasets/Pipelines to extract, transform, and load data between sources such as Azure SQL, Blob Storage, Azure SQL Data Warehouse, and the write-back tool, and back again.
    • Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
    • Developed a Python-based RESTful web service (API) to track revenue and perform revenue analysis.
    Environment: Cloudera, Hadoop, Pig, Hive, Informatica, HBase, MapReduce, Azure, HDFS, Sqoop, Impala, SQL, Azure Data Lake, Data Factory, Tableau, Python, SAS, Flume, Oozie, Linux.
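A minimal sketch of the kind of Airflow DAG described above, chaining a Python extract task into a Spark submit. The DAG id, schedule, file path, and connection id are hypothetical; it assumes Airflow 2.x with the Apache Spark provider installed.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator


def extract_from_source(**_):
    # Placeholder: pull the day's files from the source system.
    print("extracting source files")


with DAG(
    dag_id="daily_sales_etl",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_from_source)
    transform = SparkSubmitOperator(
        task_id="transform",
        application="/opt/jobs/transform_sales.py",  # hypothetical PySpark job
        conn_id="spark_default",
    )
    extract >> transform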
  • Quotient
    Data Engineer
    Quotient Mar 2018 - Apr 2020
    Columbia, MD, US
    • Analyzed, designed, and built modern data solutions using Azure PaaS services to support data visualization; assessed the current production state of the application and the impact of new implementations on existing business processes.
    • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics); ingested data into Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed it in Azure Databricks.
    • Implemented proofs of concept for SOAP and REST APIs to retrieve analytics data from different data feeds.
    • Created pipelines in ADF using Linked Services/Datasets/Pipelines to extract, transform, and load data between sources such as Azure SQL, Blob Storage, Azure SQL Data Warehouse, and the write-back tool, and back again.
    • Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
    • Created Databricks notebooks using Python (PySpark), Scala, and Spark SQL to transform data stored in Azure Data Lake Storage Gen2 from the Raw zone to the Stage and Curated zones.
    • Built numerous technology demonstrators on the Confidential Edison Arduino shield using Azure Event Hub and Stream Analytics, integrated with Power BI and Azure ML, to demonstrate the capabilities of Azure Stream Analytics.
    • Implemented Synapse integration with Azure Databricks notebooks, reducing development work by 80% and achieving a 90% performance improvement on Synapse.
    • Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process data using the SQL Activity.
    Environment: Python, PySpark, Scala, Azure Databricks, Azure Data Lake, Azure SQL DB, Azure SQL DW, Azure Synapse, Sqoop, Impala, SQL, Power BI, SAS, Linux.
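A minimal sketch of the Raw-to-Curated Databricks notebook pattern described above: read raw files from ADLS Gen2, apply a simple cleanup, and write a curated Delta table. The container, path, and column names are hypothetical, and spark is the SparkSession a Databricks notebook provides automatically.

from pyspark.sql import functions as F

raw_path = "abfss://raw@mydatalake.dfs.core.windows.net/orders/"          # hypothetical
curated_path = "abfss://curated@mydatalake.dfs.core.windows.net/orders/"  # hypothetical

# Read the raw zone, de-duplicate on a business key, and stamp the load date.
orders_raw = spark.read.format("json").load(raw_path)
orders_curated = (
    orders_raw.dropDuplicates(["order_id"])
    .filter(F.col("order_status").isNotNull())
    .withColumn("ingest_date", F.current_date())
)

# Persist the curated zone as a Delta table.
orders_curated.write.format("delta").mode("overwrite").save(curated_path)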
  • Albertson
    Data Engineer
    Albertson May 2016 - Feb 2018
    London, GB
    • Employed Amazon Kinesis to stream, analyze, and process real-time logs from the Apache application server, and Amazon Kinesis Firehose to store the processed log files in an Amazon S3 bucket.
    • Implemented data ingestion and handled clusters for real-time processing using Kafka.
    • Worked with Flume and NiFi to load log files into Hadoop.
    • Migrated existing traditional ETL jobs to Spark and Hive jobs on the new cloud data lake.
    • Wrote complex Spark applications to de-normalize datasets and create a unified data analytics layer for downstream teams.
    • As part of the Data Lake team, ingested 207 source systems, including databases (DB2, MySQL, Oracle), flat files, mainframe files, and XML files, into the Hadoop data lake, later explored by reporting tools.
    • Primarily responsible for fine-tuning long-running Spark applications, writing custom Spark UDFs, and troubleshooting failures.
    • Built a real-time pipeline using Kafka and Spark Streaming to deliver event messages from an external REST-based application to a downstream application team.
    • Created Hive scripts for ad-hoc data analysis required by business teams.
    • Created Databricks notebooks using SQL and Python and automated the notebooks using jobs.
    • Created Spark clusters and configured high-concurrency clusters in Databricks to speed up the preparation of high-quality data.
    • Designed, developed, and tested dimensional data models using star and snowflake schema methodologies under the Kimball method.
    • Used broadcast variables in Spark, effective and efficient joins, caching, and other capabilities for data processing.
    • Involved in continuous integration of the application using Jenkins.
    Environment: AWS EMR, Spark, HiveQL, HDFS, Sqoop, Kafka, Impala, Oozie, HBase, PySpark, Scala, Databricks, Flume, NiFi, Snowflake
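A minimal sketch of the Kafka-to-Spark real-time pipeline described above, using Spark Structured Streaming. Broker addresses, topic, schema, and sink paths are hypothetical, and the spark-sql-kafka package is assumed to be on the classpath.

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka-events-stream").getOrCreate()

event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_time", TimestampType()),
])

# Consume JSON event messages from Kafka and unpack them into columns.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "app-events")
    .load()
    .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

# Land the parsed events for downstream teams, with a checkpoint for recovery.
query = (
    events.writeStream.format("parquet")
    .option("path", "s3a://data-lake/events/")
    .option("checkpointLocation", "s3a://data-lake/checkpoints/events/")
    .start()
)
query.awaitTermination()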
  • Ceequence Technologies Pvt Ltd
    Spark Developer
    Ceequence Technologies Pvt Ltd Apr 2014 - Jan 2016
    Guindy, Tamil Nadu, IN
    • Imported required modules such as Keras and NumPy into the Spark session and created directories for data and output.
    • Read the train and test data into the data directory as well as into Spark variables for easy access, then trained on the data following a sample submission.
    • Stored all images as NumPy arrays (the form in which they are displayed) for easier data manipulation.
    • Created a validation set using Keras2DML to test whether the trained model was working as intended.
    • Defined multiple helper functions used while running the neural network in a session; also defined placeholders and the number of neurons in each layer.
    • Created the neural network's computational graph after defining weights and biases.
    • Created a TensorFlow session used to run the neural network and validate the model's accuracy on the validation set.
    • After executing the program and achieving acceptable validation accuracy, created a submission stored in the submission directory.
    • Executed multiple Spark SQL queries after forming the database to gather specific data corresponding to an image.
    Environment: Scala, Python, PySpark, Spark, Spark MLlib, Spark SQL, TensorFlow, NumPy, Keras, PowerBI.
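A minimal sketch of the train-and-validate workflow described above, using standard Keras with a validation split rather than Keras2DML. The data here is random NumPy arrays standing in for the real images (assumed 28x28 grayscale, 10 classes), so only the mechanics are illustrated.

import numpy as np
from tensorflow import keras

# Stand-in data: flattened 28x28 "images" and integer class labels.
x_train = np.random.rand(1000, 784).astype("float32")
y_train = np.random.randint(0, 10, size=1000)

model = keras.Sequential([
    keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Hold out 20% of the training set to check the model is learning as intended.
model.fit(x_train, y_train, epochs=5, batch_size=32, validation_split=0.2)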
  • Brio Technologies Private Limited
    Data Analyst
    Brio Technologies Private Limited Oct 2013 - Mar 2014
    Hyderabad, Telangana, IN
    • Designed physical and logical data models using the ERwin data modeling tool.
    • Designed the relational data model for the operational data store and staging areas; designed dimension and fact tables for data marts.
    • Extensively used ERwin Data Modeler to design logical/physical data models and relational database designs.
    • Created stored procedures, database triggers, functions, and packages to manipulate the database and apply business logic according to user specifications.
    • Created triggers, views, synonyms, and roles to maintain the integrity plan and database security.
    • Created database links to connect to other servers and access the required information.
    • Planned and created integrity constraints, database triggers, and indexes to maintain data integrity and facilitate better performance.
    • Used Advanced Queuing for exchanging messages and communicating between different modules.
    • Performed system analysis and design for enhancements; tested forms, reports, and user interaction.
    Environment: Oracle 9i, SQL*Plus, PL/SQL, ERwin, TOAD, Stored Procedures.

Harshith R Education Details

  • Sreenidhi Institute Of Science And Technology
    Sreenidhi Institute Of Science And Technology
    Computer Science

Frequently Asked Questions about Harshith R

What company does Harshith R work for?

Harshith R works for Capital One

What is Harshith R's role at the current company?

Harshith R's current role is Data Engineer at Humana | Actively Seeking C2C/C2H Positions | Data Engineer | Big Data | Python | SQL | AWS | Hadoop | Azure | PySpark | Kafka | Yarn HDFS | Scala | ETL | Informatica | Tableau | PowerBI | Airflow.

What schools did Harshith R attend?

Harshith R attended Sreenidhi Institute Of Science And Technology.

Who are Harshith R's colleagues?

Harshith R's colleagues are Cynthia Pierce, Alexis Mayfield, Fernando Olvera Carreño, Madeleine Payne-Heneghan, Jonathan Sowers, Alan Beam, Brandon Gomez.
