Suprith Son Dubba


Senior Data Engineer | Python | Cloud Specialist (AWS, Azure, GCP) | ETL & Data Pipeline Development | Data Warehousing & Analytics Enthusiast | Power BI & Tableau Reporting @ Truist
Charlotte, North Carolina, United States
Suprith Son Dubba's Location
Phoenix, Arizona, United States
About Suprith Son Dubba

Senior Data Engineer focused on cloud migrations (AWS, Azure, GCP) and building scalable data solutions. I specialize in designing and implementing data pipelines, ETL workflows, and data warehousing solutions using tools like Hadoop, Spark, Kafka, and Hive. My work spans cloud platforms (AWS, Azure, GCP), where I have migrated on-prem systems, optimized data processing, and delivered business insights through Power BI, Tableau, and SQL. I am skilled in data modeling, analytics, and cloud integration, using Agile and Scrum methodologies to lead complex projects. Passionate about leveraging data to drive efficiency and business value, I focus on delivering high-performance, scalable solutions for enterprise needs.

Suprith Son Dubba's Current Company Details
Truist

Senior Data Engineer | Python | Cloud Specialist (AWS, Azure, GCP) | ETL & Data Pipeline Development | Data Warehousing & Analytics Enthusiast | Power BI & Tableau Reporting
Charlotte, North Carolina, United States
Website:
truist.com
Employees:
8972
Suprith Son Dubba Work Experience Details
  • Truist
    Senior Data Engineer
    Truist Jan 2022 - Present
    Charlotte, North Carolina, United States
    - Migrated an existing on-premises application to AWS, using services such as EC2 and S3 for processing and storage of small data sets, and maintained the Hadoop cluster on AWS EMR.
    - Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them; developed Spark applications in Scala.
    - Implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources, improving performance and optimizing existing Hadoop algorithms with Spark Context, Spark SQL, Spark MLlib, DataFrames, pair RDDs, and Spark on YARN.
    - Used the Spark Streaming APIs to perform transformations and actions on the fly, building a common learner data model that consumes data from Kafka in near real time and persists it to Cassandra.
    - Created, altered, and deleted Kafka topics as required, and tuned performance through partitioning and bucketing of Impala tables.
    - Moved files between HDFS and AWS S3, worked extensively with S3 buckets, and created data partitions and DDL on partitioned data for large data sets in S3.
    - Worked with Elastic MapReduce, set up Hadoop environments on AWS EC2 instances, and converted all Hadoop jobs to run on EMR by sizing the cluster to the data.
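The S3 partitioning described above (Hive-style date partitions over large data sets, with DDL on the partitioned layout) follows a simple key-layout convention. A minimal stdlib Python sketch of that convention, with a hypothetical `events` table and record schema standing in for the actual bucket layout:

```python
# Minimal sketch of Hive-style dt= partitioning, as used when laying out
# large data sets in S3. Table and field names are hypothetical.
from collections import defaultdict
from datetime import date

def partition_key(table: str, event_date: date, part: int) -> str:
    """Build an S3-style object key with a dt= partition column."""
    return f"{table}/dt={event_date.isoformat()}/part-{part:04d}.json"

def partition_records(table, records):
    """Group records by event date, mirroring a partitioned write."""
    parts = defaultdict(list)
    for rec in records:
        parts[partition_key(table, rec["event_date"], 0)].append(rec)
    return dict(parts)

records = [
    {"event_date": date(2022, 1, 15), "value": 1},
    {"event_date": date(2022, 1, 16), "value": 2},
    {"event_date": date(2022, 1, 15), "value": 3},
]
layout = partition_records("events", records)
```

Because the date is encoded in the object key, a query engine can prune whole partitions without listing their contents, which is what makes `dt=` layouts worthwhile at scale.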
  • Cardinal Health
    Senior Data Engineer
    Cardinal Health Jul 2019 - Dec 2021
    Dublin, Ohio, United States
    - Developed Spark scripts in Python on Azure HDInsight for aggregation, validation, and performance verification of vehicle telemetry data, outperforming traditional MapReduce jobs, and built pipelines to move hashed and un-hashed EV performance data from Azure Blob Storage to Data Lake.
    - Used Azure HDInsight to monitor and manage Hadoop clusters, performed advanced procedures such as text analytics and in-memory computing on EV sensor data with Spark and Python, and created pipelines to transfer on-premises vehicle diagnostic data to Azure Data Lake.
    - Ingested EV telematics and battery performance data into Azure services including Azure Data Lake, Azure Storage, Azure SQL, and Azure Data Warehouse; processed data in Azure Databricks and built complex ETL jobs to transform vehicle diagnostics and energy-usage data visually with Azure Data Factory, Databricks, and Azure SQL Database.
    - Analyzed SQL scripts and designed PySpark implementations; enhanced and optimized Spark scripts for aggregating vehicle-efficiency data, grouping fleet performance metrics, and mining insights from EV telemetry; loaded vehicle performance data into Spark RDDs for in-memory evaluation of real-time driving patterns.
    - Converted Hive/SQL queries into Spark transformations using Spark RDDs and PySpark to analyze EV battery lifecycle metrics; optimized Hadoop algorithms using Spark Context, Spark SQL, DataFrames, and pair RDDs; performed analytics on EV data with the Spark API over Hadoop YARN.
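The Hive/SQL-to-Spark conversion mentioned above largely comes down to re-expressing GROUP BY aggregations as transformations over records. A stdlib stand-in for that pattern (the telemetry schema here, with `fleet` and `kwh_per_100km` fields, is invented for illustration, not taken from the actual pipelines):

```python
# Stdlib stand-in for a Spark groupBy/agg over telemetry records,
# equivalent to: SELECT fleet, AVG(kwh_per_100km) FROM t GROUP BY fleet.
# The schema (vehicle_id, fleet, kwh_per_100km) is hypothetical.
from collections import defaultdict

def avg_efficiency_by_fleet(rows):
    """Aggregate average energy use per fleet, as a GROUP BY would."""
    totals = defaultdict(lambda: [0.0, 0])  # fleet -> [sum, count]
    for row in rows:
        acc = totals[row["fleet"]]
        acc[0] += row["kwh_per_100km"]
        acc[1] += 1
    return {fleet: total / n for fleet, (total, n) in totals.items()}

rows = [
    {"vehicle_id": "v1", "fleet": "east", "kwh_per_100km": 16.0},
    {"vehicle_id": "v2", "fleet": "east", "kwh_per_100km": 18.0},
    {"vehicle_id": "v3", "fleet": "west", "kwh_per_100km": 20.0},
]
result = avg_efficiency_by_fleet(rows)
```

In PySpark the same shape would be a `groupBy("fleet").avg("kwh_per_100km")` over a DataFrame; the stdlib version just makes the sum/count accumulation explicit.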
  • T-Mobile
    Data Engineer
    T-Mobile Nov 2017 - Jun 2019
    Overland Park, Kansas, United States
    - Migrated an existing on-premises application to AWS, using services such as EC2 and S3 for processing and storage of small data sets, and maintained the Hadoop cluster on AWS EMR.
    - Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them; developed Spark applications in Scala.
    - Implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources, improving performance and optimizing existing Hadoop algorithms with Spark Context, Spark SQL, Spark MLlib, DataFrames, pair RDDs, and Spark on YARN.
    - Used the Spark Streaming APIs to perform transformations and actions on the fly, building a common learner data model that consumes data from Kafka in near real time and persists it to Cassandra.
    - Created, altered, and deleted Kafka topics as required, and tuned performance through partitioning and bucketing of Impala tables.
    - Moved files between HDFS and AWS S3, worked extensively with S3 buckets, and created data partitions and DDL on partitioned data for large data sets in S3.
    - Worked with Elastic MapReduce, set up Hadoop environments on AWS EC2 instances, and converted all Hadoop jobs to run on EMR by sizing the cluster to the data.
    - Monitored and troubleshot Hadoop jobs using the YARN ResourceManager and EMR job logs via Genie and Kibana.
    - Created a YAML manifest to push the application to Pivotal Cloud Foundry, and deployed the Spark application and Java web services there.
    - Implemented rapid provisioning and lifecycle management using Amazon EC2, Chef, and custom Ruby/Bash scripts.
    - Created RDDs in Spark and extracted data from the data warehouse onto them.
    - Configured property files such as core-site.xml, hdfs-site.xml, and mapred-site.xml according to job requirements and the multi-node cluster environment.
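The Kafka-to-Cassandra streaming pattern above (consume messages, transform on the fly, persist by primary key) can be simulated with the standard library. This is a toy sketch of the control flow only; the message shape, the doubling transform, and the dict standing in for a keyed Cassandra table are all illustrative, not the actual pipeline:

```python
# Toy stdlib simulation of the Kafka -> transform -> Cassandra pattern.
# Topic contents and the upsert-by-key store are illustrative only.
import json
from queue import Queue

def consume_and_persist(topic: Queue, store: dict) -> int:
    """Drain a message queue, transform each record, upsert into a keyed store."""
    written = 0
    while not topic.empty():
        msg = json.loads(topic.get())
        # Transformation step: derive a normalized record from the raw event.
        record = {"user": msg["user"].lower(), "score": msg["score"] * 2}
        # Upsert by key, like a write against a Cassandra primary key.
        store[record["user"]] = record
        written += 1
    return written

topic = Queue()
topic.put(json.dumps({"user": "Alice", "score": 3}))
topic.put(json.dumps({"user": "Bob", "score": 5}))
store = {}
n = consume_and_persist(topic, store)
```

A real Spark Streaming job would do the same transform per micro-batch and rely on the idempotent upsert so that replayed Kafka messages do not create duplicate rows.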
  • Sopan Technologies
    Data Engineer
    Sopan Technologies Jul 2015 - Aug 2017
    Hyderabad, Telangana, India
    As a Big Data Engineer at Sopan Technologies, I worked on large-scale data processing and analytics projects using Hadoop and Spark, focusing on processing raw data at scale and building efficient ETL pipelines for real-time and batch workloads.
    - Processed raw data at scale on the Hadoop Big Data platform, loading disparate datasets from various environments into HDFS.
    - Developed ETL data flows using Hadoop ecosystem components such as Spark, Spark Streaming, and Spark SQL with Scala, ensuring efficient data processing.
    - Led the development of large-scale, high-speed, low-latency data solutions for real-time reporting, data warehousing, and long-term data storage.
    - Improved performance and optimized existing algorithms using Spark Context, Spark SQL, and the DataFrame APIs.
  • Keypoint Technologies
    Data Engineer
    Keypoint Technologies May 2013 - Jun 2015
    Gachibowli, Hyderabad, India
    As a Hadoop Engineer at KeyPoint Technologies, I developed scalable data solutions using the Hadoop ecosystem and Snowflake for large-scale data processing and storage.
    - Worked in the Snowflake Shared Technology Environment to provide stable infrastructure, secure environments, reusable frameworks, and automated utilities such as secured database connections, code review, build, and deployment processes (SCBD).
    - Migrated data from an Amazon Redshift data warehouse to Snowflake, using ETL processes to transfer data from sources to targets.
    - Built a dimensional data vault architecture on Snowflake for scalable, optimized data storage and retrieval.
    - Deployed a scalable Hadoop cluster running Hortonworks Data Platform (HDP 2.6) for large-scale data processing.
    - Developed Spark code in Scala and Spark SQL for faster processing, optimizing performance with Spark Context, pair RDDs, and Spark SQL.
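The data vault architecture mentioned above typically identifies each hub record by a hash of its normalized business key, so the same entity arriving from different sources collapses to one row. A small sketch under that assumption (the MD5 choice and the `customer_id` column are illustrative, not necessarily what was used on Snowflake):

```python
# Sketch of data-vault-style hub keys: a hub row is identified by a
# hash of its normalized business key. Column names are hypothetical.
import hashlib

def hub_hash_key(business_key: str) -> str:
    """Deterministic surrogate key: hash the trimmed, upper-cased key."""
    return hashlib.md5(business_key.strip().upper().encode("utf-8")).hexdigest()

def build_hub(rows, key_field):
    """Deduplicate source rows into hub records keyed by hash."""
    hub = {}
    for row in rows:
        hub[hub_hash_key(row[key_field])] = row[key_field]
    return hub

# " c-001 " and "c-001" normalize to the same hub key.
rows = [{"customer_id": "c-001"}, {"customer_id": " c-001 "}, {"customer_id": "c-002"}]
hub = build_hub(rows, "customer_id")
```

Normalizing before hashing is the important step: without the trim/upper-case pass, trivially different spellings of one business key would fan out into separate hub rows.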

Suprith Son Dubba Education Details
  • MVSR Engineering College

Frequently Asked Questions about Suprith Son Dubba

What company does Suprith Son Dubba work for?

Suprith Son Dubba works for Truist.

What is Suprith Son Dubba's role at the current company?

Suprith Son Dubba's current role is Senior Data Engineer | Python | Cloud Specialist (AWS, Azure, GCP) | ETL & Data Pipeline Development | Data Warehousing & Analytics Enthusiast | Power BI & Tableau Reporting.

What schools did Suprith Son Dubba attend?

Suprith Son Dubba attended MVSR Engineering College.

Who are Suprith Son Dubba's colleagues?

Suprith Son Dubba's colleagues are Kerrie Mcgarrigle, Kristin Lineberry, Robert Stephens, Scott Edmondson, Jhon Carvajal, Christina Zutty, Jacob Herrin.
