Tharun Kumar Email and Phone Number
Tharun Kumar is an Azure Data Engineer (Data Engineer | Databricks) at SEI.
Azure Data Engineer
SEI, Mar 2021 - Present
Oaks, Pennsylvania, United States
• Performed all phases of software engineering, including requirements analysis, application design, code development, and testing.
• Developed and maintained end-to-end operations of ETL data pipelines and worked with large data sets in Azure Data Factory.
• Increased the efficiency of data fetching through query optimization and indexing.
• Wrote SQL using DDL and DML statements, along with indexes, triggers, views, stored procedures, functions, and packages.
• Used Azure Data Factory to integrate data from both on-premises (MySQL, Cassandra) and cloud (Blob Storage, Azure SQL DB) sources and applied transformations before loading it into Snowflake.
• Deployed Data Factory to create data pipelines that orchestrate data into SQL databases.
• Developed custom activities using Azure Functions, Azure Databricks, and PowerShell scripts to perform data transformations, data cleaning, and data validation.
• Worked on Snowflake modeling using data warehousing techniques, data cleansing, Slowly Changing Dimensions, surrogate key assignment, and change data capture.
• Applied an analytical approach to problem-solving, using Azure Data Factory, Data Lake, and Azure Synapse to solve business problems.
• Developed ELT/ETL pipelines to move data to and from the Snowflake data store using a combination of Python and Snowflake SnowSQL.
• Developed ETL transformations and validation using Spark SQL and Spark DataFrames with Azure Databricks and Azure Data Factory.
• Developed and optimized code for Azure Functions to extract, transform, and load data from various sources, such as databases, APIs, and file systems.
• Designed, built, and maintained data integration programs in Hadoop and RDBMS environments.
• Processed HDFS data, created external tables using Hive, and developed scripts to ingest and repair tables that can be reused across the project.
• Collaborated with DevOps engineers to develop automated CI/CD and test-driven development pipelines on Azure per client requirements.
Azure Data Engineer
Stryker, Oct 2018 - Feb 2021
New Jersey, United States
• Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
• Worked on Microsoft Azure services such as HDInsight clusters, Blob Storage, Data Factory, and Logic Apps, and completed a proof of concept on Azure Databricks.
• Performed ETL using Azure Databricks and migrated on-premises Oracle ETL processes to Azure Synapse Analytics.
• Worked on migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse.
• Controlled and granted database access and migrated on-premises databases to Azure Data Lake Store using Azure Data Factory.
• Deployed and optimized Python web applications through Azure DevOps CI/CD to keep the focus on development.
• Developed an enterprise-level solution using batch processing and a streaming framework (Spark Streaming, Apache Kafka).
• Processed schema-oriented and non-schema-oriented data using Scala and Spark.
• Created partitions and buckets based on state for further processing with bucket-based Hive joins.
• Created Hive generic UDFs to process business logic that varies based on policy.
• Imported data using Sqoop to load data from MySQL to HDFS on a regular basis.
• Processed HDFS data, created external tables using Hive, and developed scripts to ingest and repair tables that can be reused across the project.
• Worked with data lakes and big data ecosystems (Hadoop, Spark, Hortonworks, Cloudera).
• Loaded and transformed large sets of structured, semi-structured, and unstructured data.
• Wrote Hive queries for data analysis to meet business requirements.
Skills: Amazon Web Services (AWS) · Data Pipelines · R (Programming Language) · Data Analytics · SQL Server Reporting Services (SSRS) · NoSQL · SQL Server Integration Services (SSIS) · SQL Server Analysis Services (SSAS) · Microsoft Power BI · Data Quality · Data Models · Linux · PySpark · Data Analysis · Relational Databases · Big Data · Python (Programming Language)
Big Data Engineer
Bosch USA, May 2016 - Sep 2018
Allenhurst, New Jersey, United States
• Imported data using Sqoop to load data from MySQL to HDFS on a regular basis.
• Performed aggregations on large amounts of data using Apache Spark and Scala, landing the data in the Hive warehouse for further analysis.
• Worked with data lakes and big data ecosystems (Hadoop, Spark, Hortonworks, Cloudera).
• Loaded and transformed large sets of structured, semi-structured, and unstructured data.
• Wrote Hive queries for data analysis to meet business requirements.
• Built HBase tables by leveraging HBase integration with Hive in the Analytics Zone.
• Gained hands-on experience using Kafka and Spark Streaming to process streaming data in specific use cases.
• Developed data pipelines using Flume and Sqoop to ingest customer behavioral data histories into HDFS for analysis.
• Analyzed the Hadoop cluster using big data analytics tools, including Hive and MapReduce.
• Developed a data pipeline using Kafka, Spark, and Hive to ingest, transform, and analyze data.
• Wrote Hive queries for data analysis to meet specified business requirements by creating Hive tables and working on them with HiveQL to simulate MapReduce functionality.
• Migrated existing data from an RDBMS (Oracle) to Hadoop using Sqoop for processing.
• Implemented CI/CD pipelines to build and deploy projects in the Hadoop environment.
• Used JIRA to manage issues and project workflow.
• Worked on Spark using Python (PySpark) and Spark SQL for faster testing and processing of data.
• Developed automated testing scripts using Informatica Data Validation Option to ensure data accuracy and consistency across different systems.
• Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
• Used ZooKeeper to coordinate, synchronize, and serialize the servers within the clusters.
• Worked on the Oozie workflow engine for job scheduling.
Hadoop Developer
Lancaster Technologies Private Limited, Aug 2014 - Apr 2016
India
• Used Git to maintain source code in Git and GitHub repositories.
• Prepared an ETL framework with Sqoop and Hive to frequently bring in data from the source and make it available for consumption.
• Developed ETL jobs using Spark with Scala to migrate data from Oracle to new MySQL tables.
• Made extensive use of Spark with Scala (RDDs, DataFrames, Spark SQL) and the Spark Cassandra Connector APIs for various tasks (data migration, business report generation, etc.).
• Developed a Spark Streaming application for real-time sales analytics.
• Analyzed the source data and handled it efficiently by modifying data types. Used Excel sheets, flat files, and CSV files to generate ad hoc Power BI reports.
• Analyzed SQL scripts and designed the solution to implement them using PySpark.
• Extracted data from other data sources into HDFS using Sqoop.
• Handled importing of data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
• Developed custom scripts and tools using Oracle's PL/SQL language to automate data validation, cleansing, and transformation processes.
• Extracted data from MySQL into HDFS using Sqoop.
• Implemented deployment automation using YAML scripts for massive builds and releases.
• Worked with Apache Hive, Apache Pig, HBase, Apache Spark, ZooKeeper, Flume, Kafka, and Sqoop.
• Implemented data classification algorithms using MapReduce design patterns.
• Worked extensively on creating combiners, partitioning, and distributed cache to improve the performance of MapReduce jobs.
Frequently Asked Questions about Tharun Kumar
What company does Tharun Kumar work for?
Tharun Kumar works for SEI.
What is Tharun Kumar's role at the current company?
Tharun Kumar's current role is Azure Data Engineer at SEI | Data Engineer | Databricks.
Who are Tharun Kumar's colleagues?
Tharun Kumar's colleagues are Holm Paz, Shane Mccarthy, Steven Praplaski, Leo Amorim, Devon Stein, Debbie Gibson, Will Keylor.