Sumanth R Email and Phone Number

Sr. GCP Data Engineer and Big Data Engineer at Mr. Cooper

Frisco, TX, US

Sumanth R's Location

United States

About Sumanth R

Sumanth R is a Sr. GCP Data Engineer and Big Data Engineer at Mr. Cooper.

Sumanth R's Current Company Details

Mr. Cooper


Sr. GCP Data Engineer and Big Data Engineer

Frisco, TX, US

Sumanth R Work Experience Details

Sr. GCP Data Engineer and Big Data Engineer

Mr. Cooper

Frisco, TX, US

Senior Data Engineer

Fifth Third Bank Mar 2023 - Present

Cincinnati, Ohio, US

• Developed and implemented a robust ETL pipeline using PySpark to process large volumes of banking transaction data from multiple sources, ensuring data quality, integrity, and regulatory compliance.
• Designed and optimized data models in Snowflake to accommodate complex banking data structures, enabling efficient storage, retrieval, and analysis of customer transactions, balances, and financial products.
• Used PySpark to perform advanced analytics and predictive modeling on banking datasets, providing insight into customer behavior, risk assessment, and fraud detection in support of data-driven decision-making.
• Developed custom PySpark functions and libraries for feature engineering, anomaly detection, and pattern recognition in banking data, improving the accuracy and effectiveness of predictive models and analytical workflows.
• Collaborated with stakeholders, including business analysts, data scientists, and regulatory experts, to understand banking requirements, translate them into technical specifications, and deliver scalable and compliant data solutions.
• Designed and implemented disaster recovery and backup strategies for banking data stored in Snowflake, ensuring business continuity and compliance with industry regulations and internal policies.

Senior Data Engineer

Mr. Cooper Feb 2022 - Feb 2023

Dallas, TX, US

• Responsible for ingesting large volumes of marketing data and Adobe Analytics clickstream data from different channels directly into a GCS-backed data lake.
• Responsible for ingesting user profile information and account details from the internal data warehouse into the GCS data lake.
• Designed robust, reusable, and scalable data-driven solutions and data pipelines using Dataflow frameworks to automate the ingestion, processing, and delivery of both structured and semi-structured data, in batch and real-time streaming modes.
• Deployed a Dataproc cluster to support Spark framework functionality.
• Developed Spark applications for data cleansing, event enrichment, data aggregation, de-normalization, and data preparation for consumption by machine learning and reporting teams.
• Used PySpark to ingest large volumes of customer profile data and clickstream data from various data sources.
• Worked with Kafka and Google Pub/Sub to collect data into GCS and process it through Dataproc clusters.
• Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files; worked on Dataproc.
• Troubleshot Spark applications to make them more error tolerant.
• Built a Dataflow pipeline to ingest data from source systems and apply Scala and PySpark transformations using Dataproc clusters.
• Fine-tuned Spark applications to improve the overall processing time of the pipelines.
• Wrote Kafka producers to stream data from external REST APIs to Kafka topics.
• Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to BigQuery.
• Experienced in handling large datasets using Spark's in-memory capabilities, broadcast variables, efficient joins, transformations, and other capabilities.
• Developed Spark applications using DataFrames and Spark SQL for data extraction, transformation, and aggregation from multiple file formats.

Senior Data Engineer

Merck Nov 2019 - Jan 2022

Rahway, New Jersey, US

• Worked on building a centralized data lake on AWS using primary services such as S3, EMR, Redshift, and Athena.
• Worked on migrating datasets and ETL workloads from on-premises systems to AWS cloud services.
• Built a series of Spark applications and Hive scripts to produce various analytical datasets needed by digital marketing teams.
• Worked extensively on building and automating data ingestion pipelines and moving terabytes of data from existing data warehouses to the cloud.
• Implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
• Worked extensively on fine-tuning Python, Scala, and Spark applications and providing production support for various pipelines.
• Imported metadata into Hive and migrated existing tables and applications to work on Hive and the AWS cloud.
• Developed interactive shell and Python scripts to schedule data cleansing and data loading processes.
• Performed data validation on ingested data using MapReduce by building a custom model to filter out invalid data and cleanse the data.
• Optimized EC2 performance for data-intensive workloads, including tuning networking, storage, and compute resources.
• Set up Snowflake connections through AWS PrivateLink from AWS EC2 and AWS EMR to secure data transfers between application and database.
• Built data pipelines using AWS Lambda to process and transform large volumes of data in real time.
• Configured AWS RDS/Redshift to use the Hadoop ecosystem on AWS infrastructure.
• Worked closely with business teams and data science teams to ensure all requirements were translated accurately into the data pipelines.
• Built automated deployment pipelines for AWS Lambda functions using tools such as AWS CloudFormation.
• Worked on the full spectrum of data engineering pipelines: data ingestion, data transformation, and data analysis/consumption.

Data Engineer

Molina Healthcare Jun 2017 - Oct 2019

Long Beach, California, US

• Analyzed system failures, identified root causes, and recommended courses of action.
• Wrote Python and shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions.
• Managed and scheduled jobs on a Hadoop cluster and designed a data warehouse using Hive.
• Created an ETL guidelines document covering coding standards and naming conventions for development, along with production support log and root cause analysis documents for troubleshooting DataStage jobs.
• Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
• Developed Pig UDFs to pre-process the data for analysis.
• Developed Hive queries for the analysts.
• Developed workflows in Oozie to automate loading data into HDFS and pre-processing it with Pig.
• Handled cluster coordination services through ZooKeeper.
• Collected log data from web servers and integrated it into HDFS using Flume.
• Implemented the Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
• Managed data coming from different sources and loaded data from the UNIX file system into HDFS.
• Comfortable implementing Spark jobs end to end with both the PySpark API and the Spark Scala API.
• Used Apache Kafka for large-scale data processing, real-time analytics, and real-time data streaming.
• Used Kafka for publish-subscribe messaging as a distributed commit log, gaining experience with its speed, scalability, and durability.

Software Developer

Avon Technologies (I) Private Ltd. Apr 2014 - Mar 2017

Hyderabad, Andhra Pradesh, IN

• Developed and maintained robust web applications ensuring the secure and efficient handling of pharmaceutical data.
• Utilized Visual Studio for integrated development, debugging, and testing, ensuring the timely delivery of high-quality software solutions.
• Designed visually appealing user interfaces using MS Expressions 2008 TS, enhancing the user experience for pharmaceutical professionals.
• Applied expertise to architect and develop innovative solutions tailored to the unique needs of the pharmaceutical industry.
• Implemented both MVC (Model-View-Controller) and Web Forms design patterns, aligning development with project requirements and industry standards.
• Enhanced web application interactivity by implementing asynchronous request handling using AJAX, improving the efficiency of pharmaceutical data processing.
• Developed and maintained XML-based solutions, incorporating XML and XSLT for effective data transformation and presentation within the pharmaceutical domain.


Frequently Asked Questions about Sumanth R

What company does Sumanth R work for?

Sumanth R works for Mr. Cooper.

What is Sumanth R's role at the current company?

Sumanth R's current role is Sr. GCP Data Engineer and Big Data Engineer.
