Eswar Kumar is a Sr. Data Engineer at MSA.
Sr. Data Engineer, MSA
May 2022 - Present | Medina, SA
• Developed Shell and Python scripts to automate tasks in the big data environment.
• Analyzed large, critical datasets using Cloudera, HDFS, MapReduce, Hive, Hive UDFs, Pig, Sqoop, and Spark.
• Developed Spark applications in Scala and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
• Developed a preprocessing job using Spark DataFrames to flatten JSON documents into flat files (see the sketch below).
• Developed dashboard reports in Tableau.
• Moved data from Teradata to a Hadoop cluster using TDCH/FastExport and Apache NiFi.
• Implemented a batch process for high-volume data loading with an Apache NiFi dataflow, working in an Agile development methodology.
• Built automation regression scripts in Python to validate the ETL process across multiple data stores, including Oracle, SQL Server, Hive, ZooKeeper, and MongoDB.
• Worked with AWS cloud services such as EC2, S3, EBS, and RDS.
• Migrated an existing on-premises application to AWS, using EC2 and S3 for small-dataset processing and storage, and maintained the Hadoop cluster on AWS EMR.
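As an illustration of the JSON-flattening bullet above, here is a minimal PySpark sketch; the input/output paths and the order/customer schema are hypothetical:

    # Flatten nested JSON documents into a flat CSV file with Spark DataFrames.
    # All paths and field names below are illustrative assumptions.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, explode

    spark = SparkSession.builder.appName("flatten-json").getOrCreate()

    raw = spark.read.json("hdfs:///data/input/orders.json")

    # Promote nested struct fields to top-level columns and explode the array.
    flat = (
        raw.withColumn("item", explode(col("order.items")))
           .select(
               col("order.id").alias("order_id"),
               col("customer.name").alias("customer_name"),
               col("item.sku").alias("sku"),
               col("item.qty").alias("qty"),
           )
    )

    flat.write.mode("overwrite").option("header", True).csv("hdfs:///data/output/orders_flat/")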
Sr. Data Engineer, MetLife
Mar 2021 - Apr 2022 | New York, NY, US
• Designed and developed ETL processes in AWS Glue to migrate accident data from external sources such as S3 and text files into AWS Redshift.
• Built and architected multiple data pipelines, covering end-to-end ETL and ELT processes for data ingestion and transformation.
• Created service accounts with Terraform, assigning the roles needed to support the services deployed across the GCP tech stack.
• Developed Python scripts to parse XML and JSON files and load the data into a Snowflake data warehouse on AWS.
• Migrated an entire Oracle database to BigQuery and used Power BI for reporting.
• Built data pipelines in Airflow on GCP for ETL jobs using a variety of Airflow operators (a minimal DAG sketch follows this entry).
• Worked with GCP Dataproc, GCS, Cloud Functions, and BigQuery.
• Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 and Parquet/text files into AWS Redshift.
• Implemented PaaS, IaaS, and SaaS delivery models both inside the enterprise data center and in public clouds, using AWS, Google Cloud, Apache Spark, and Kubernetes.
• Designed, developed, validated, and deployed Talend ETL processes for the data warehouse team using Pig and Hive.
• Applied the required transformations in AWS Glue and loaded the data back into Redshift and S3.
• Wrote SQL queries to extract JSON data through REST API calls (using API, admin, and query keys) and loaded it into the data warehouse.
• Designed and implemented ETL pipelines from various relational databases to the data warehouse using Apache Airflow.
• Developed automation regression scripts in Python to validate the ETL process across multiple databases, including AWS Redshift, Oracle, MongoDB, T-SQL, and SQL Server.
• Worked on extraction, aggregation, and consolidation of Adobe data within AWS Glue using PySpark.
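Below is a minimal Airflow DAG sketch for a GCS-to-BigQuery load of the kind the pipeline bullets describe, using the Google provider's GCSToBigQueryOperator; the bucket, dataset, and table names are assumptions:

    # Daily load of staged Parquet files from GCS into BigQuery.
    # Bucket, object prefix, and destination table are hypothetical.
    from datetime import datetime
    from airflow import DAG
    from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

    with DAG(
        dag_id="gcs_to_bq_daily",
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        load = GCSToBigQueryOperator(
            task_id="load_accidents",
            bucket="example-bucket",
            source_objects=["staging/accidents/*.parquet"],
            source_format="PARQUET",
            destination_project_dataset_table="example-project.analytics.accidents",
            write_disposition="WRITE_TRUNCATE",
        )

In practice a task like this would sit alongside extraction and validation tasks chained with >> dependencies.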
Sr. Data Engineer, PayPal
Jan 2020 - Feb 2021 | San Jose, CA, US
• Implemented AWS solutions using EC2, S3, RDS, EBS, Elastic Load Balancers, and Auto Scaling groups, and optimized volumes and EC2 instances.
• Wrote Terraform and CloudFormation templates for AWS infrastructure as code to build staging and production environments, and set up builds and automation in Jenkins.
• Configured Elastic Load Balancers (ELB) with EC2 Auto Scaling groups.
• Created an Amazon VPC with a public-facing subnet for web servers with internet access, and backend databases and application servers in a private subnet with no internet access.
• Created AWS launch configurations based on customized AMIs and used them to configure Auto Scaling groups.
• Used Puppet for configuration management of hosted instances within AWS, and configured networking for the Virtual Private Cloud (VPC).
• Used S3 buckets and Glacier for storage and backup on AWS (see the lifecycle sketch below).
• Created groups and permissions with AWS Identity and Access Management (IAM) so users could work collaboratively.
• Implemented and set up a continuous build and deployment delivery process using Subversion, Git, Jenkins, IIS, and Tomcat.
• Connected the continuous integration system to the Git version control repository so builds run continually as developer check-ins arrive.
• Worked with the Ant and Maven build tools, writing build.xml and pom.xml files respectively.
• Authored pom.xml files, performed releases with the Maven release plugin, and managed Maven repositories; implemented Maven builds to automate JAR and WAR packaging.
• Designed and built deployments using Ant and shell scripting, automating the overall process with Git and Maven.
• Implemented a continuous delivery framework using Jenkins, Ansible/Puppet, Maven, and Nexus in a Linux environment.
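A short boto3 sketch of the S3-plus-Glacier backup pattern mentioned above, attaching a lifecycle rule that archives older objects to Glacier; the bucket name and prefix are hypothetical:

    # Transition objects under backups/ to Glacier after 90 days.
    # Bucket name and prefix are illustrative assumptions.
    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_lifecycle_configuration(
        Bucket="example-backup-bucket",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "archive-old-backups",
                    "Status": "Enabled",
                    "Filter": {"Prefix": "backups/"},
                    "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                }
            ]
        },
    )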
Data Engineer, Abbott
May 2017 - Oct 2019 | Abbott Park, IL, US
• Extracted, transformed, and loaded data from source systems into Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and Azure Data Lake Analytics; ingested data into Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed it in Azure Databricks.
• Designed and implemented cloud-based solutions in Azure: created Azure SQL databases, set up elastic pool jobs, and designed tabular models in Azure Analysis Services; built pipeline jobs and scheduled triggers with Azure Data Factory.
• Developed modern data solutions by analyzing, designing, and constructing them with Azure PaaS services.
• Used GCP components including Dataflow (Python SDK), Dataproc, BigQuery, Composer (Airflow), G Suite service-account impersonation, Cloud IAM, Cloud Pub/Sub, Cloud Functions for function-as-a-service requests, Cloud Data Fusion, Cloud Storage (GCS), and Data Catalog.
• Implemented a data lake in Google BigQuery and Google Cloud Storage, with SQL scripts to load data into BigQuery and Composer to run the Talend and query scripts.
• Improved the performance of existing Hadoop algorithms with Spark, using SparkContext, Spark SQL, the DataFrame API, Spark Streaming, and MLlib, working primarily in PySpark.
• Estimated cluster size and monitored and troubleshot the Spark Databricks cluster.
• Developed Spark scripts in Python on Azure HDInsight for data aggregation and validation, and verified their performance against MR jobs (a minimal sketch follows this entry).
• Installed and configured Hadoop MapReduce, Hive, HDFS, Pig, Sqoop, Flume, and Oozie on the Hadoop cluster.
• Worked with Hadoop and developed Oozie workflows for automation and storage optimization.
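A minimal PySpark aggregation-and-validation sketch in the spirit of the HDInsight bullet above; the storage paths, table, and column names are assumptions:

    # Aggregate claims by day and fail fast if the result looks wrong.
    # The abfss:// paths and the claims schema are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("daily-aggregation").getOrCreate()

    claims = spark.read.parquet("abfss://data@exampleaccount.dfs.core.windows.net/claims/")

    daily = (
        claims.groupBy("claim_date")
              .agg(F.count("*").alias("claim_count"),
                   F.sum("amount").alias("total_amount"))
    )

    # Simple validation step: no day should aggregate to a negative total.
    assert daily.filter(F.col("total_amount") < 0).count() == 0, "negative totals found"

    daily.write.mode("overwrite").parquet("abfss://data@exampleaccount.dfs.core.windows.net/agg/daily_claims/")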
Data Engineer, GeBBS Healthcare Solutions
Jul 2015 - Apr 2017 | Los Angeles, CA, US
• Developed data processing tasks in PySpark: reading data from external sources, merging and enriching data, and loading it into target destinations; used Pandas, NumPy, and Spark in Python to build data pipelines and to perform data cleaning, feature scaling, and feature engineering.
• Gained hands-on experience with Snowflake utilities, SnowSQL, and Snowpipe, applied big data modeling techniques in Python and Java, and handled data integrity checks with Hive queries, Hadoop, and Spark.
• Developed ETL pipelines in and out of the data warehouse and built major reports using advanced SQL queries in Snowflake.
• Built ETL pipelines in and out of the data warehouse using a combination of Python and Snowflake's SnowSQL, writing SQL queries against Snowflake (see the connector sketch below).
• Worked collaboratively to manage build-outs of large data clusters and real-time streaming with Spark.
• Developed batch scripts to fetch data from AWS S3 storage and performed the required transformations in Scala using the Spark framework.
• Replaced existing MapReduce jobs and Hive scripts with Spark SQL and Spark data transformations for more efficient data processing.
• Developed Oozie workflows to automate loading data into NiFi and preprocessing with Pig.
• Worked with Apache NiFi to decompress JSON files and move them from local storage to HDFS.
• Worked on the Hortonworks HDP distribution.
• Worked hands-on with AWS EMR, running Spark jobs on EMR clusters, and developed Spark code for AWS Glue jobs.
• Supported continuous storage in AWS using Elastic Block Store, S3, and Glacier; created volumes and configured snapshots for EC2 instances.
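A hedged sketch of a Python-driven Snowflake load step like the ones described above, using the snowflake-connector-python package; the account, credentials, stage, and table names are all hypothetical:

    # Run a COPY INTO from an internal stage, then sanity-check the row count.
    # Connection parameters below are placeholders, not real credentials.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="example_account",
        user="etl_user",
        password="<from a secrets manager>",
        warehouse="ETL_WH",
        database="ANALYTICS",
        schema="STAGING",
    )

    cur = conn.cursor()
    cur.execute(
        "COPY INTO claims FROM @claims_stage "
        "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)"
    )
    cur.execute("SELECT COUNT(*) FROM claims")
    print("rows loaded:", cur.fetchone()[0])
    cur.close()
    conn.close()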
ETL Developer, MSN Laboratories
Sep 2013 - Jun 2015 | Hyderabad, Telangana, IN
• Performed data analysis, migration, cleansing, transformation, integration, import, and export.
• Wrote PL/SQL stored procedures, functions, triggers, views, and packages; used indexing, aggregation, and materialized views to optimize query performance.
• Designed the schema and created DDL scripts for ETL metadata tables that store the runtime metrics of DataStage jobs.
• Used Informatica PowerCenter to load data from flat files to DB2, XML files to DB2, and flat files to Oracle.
• Implemented Hive partitioning and bucketing on the collected data in HDFS (see the sketch below).
• Worked with SparkContext, Spark SQL, DataFrames, Datasets, and Spark on YARN.
• Used Spark SQL to load data into Hive tables and wrote queries to fetch data from those tables.
• Designed ETL processes in Informatica to extract, transform, and load data from multiple input sources such as Oracle and SQL Server into a target Oracle DB.
• Wrote Pig scripts to analyze and process large datasets and ran them on the Hadoop cluster.
• Converted Hive/SQL queries into Spark transformations using Spark RDDs.
• Wrote Hive queries for data analysis to meet business requirements.
• Ran Hadoop streaming jobs to process large volumes of data in different formats.
• Created Tableau dashboards/reports for data visualization, reporting, and analysis, and presented them to the business.
• Designed and developed a Spark job in Scala to implement an end-to-end data pipeline for batch processing.
• Created data connections and published them on Tableau Server for use with operational and monitoring dashboards.
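A minimal Spark SQL sketch of the Hive partitioning bullet above (bucketing would be declared in the Hive DDL with CLUSTERED BY ... INTO n BUCKETS); the tables and columns are hypothetical:

    # Create a partitioned Hive table and load it with a dynamic-partition insert.
    # Table and column names are illustrative assumptions.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-partitioned-load")
             .enableHiveSupport()
             .getOrCreate())

    spark.sql("""
        CREATE TABLE IF NOT EXISTS sales_by_region (
            order_id BIGINT, amount DOUBLE
        )
        PARTITIONED BY (region STRING)
        STORED AS ORC
    """)

    # A tiny stand-in for the staging data that would normally come from HDFS.
    spark.createDataFrame(
        [(1, 10.0, "south"), (2, 20.0, "north")],
        ["order_id", "amount", "region"],
    ).createOrReplaceTempView("staging_sales")

    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
        INSERT OVERWRITE TABLE sales_by_region PARTITION (region)
        SELECT order_id, amount, region FROM staging_sales
    """)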
Eswar Kumar's Education
Kakatiya Institute of Technology & Science, Yerragattu Hillocks, Bheemaram, Hasanparthy, Warangal
Computer Science
Frequently Asked Questions about Eswar Kumar
What company does Eswar Kumar work for?
Eswar Kumar works for MSA.
What is Eswar Kumar's role at the current company?
Eswar Kumar's current role is Sr. Data Engineer.
What schools did Eswar Kumar attend?
Eswar Kumar attended Kakatiya Institute of Technology & Science, Yerragattu Hillocks, Bheemaram, Hasanparthy, Warangal.