Rakesh B Email and Phone Number
Overall 12+ years of IT industry experience with extensive experience in application design, data analysis, data modeling, and implementation of enterprise-class systems spanning Big Data, Data Integration, Object-Oriented programming, and Advanced Analytics across the Banking, Health Care, and Telecommunications domains.
- Developed and optimized ETL pipelines using PySpark to process large-scale datasets efficiently and perform data transformations (a minimal sketch follows below).
- Built data APIs with Python for seamless data access and integration.
- Hands-on experience analyzing data using Python, SQL, Microsoft Excel, PySpark, and Spark SQL for data mining, data cleansing, and machine learning.
- Implemented DevOps CI/CD pipelines to automate deployment and testing of data engineering applications.
- Wrote Python scripts for data extraction, transformation, and loading tasks.
- Prepared and loaded data into Power BI for visualization and reporting.
- Designed and deployed scalable database solutions using Amazon Aurora, ensuring high availability and performance.
- Integrated NoSQL databases with other systems for real-time data access and processing.
- Integrated Python with big data tools such as PySpark for scalable processing.
- Created ETL jobs with AWS Glue to handle data extraction, transformation, and loading.
- Proficient in designing, scheduling, and monitoring workflows using Airflow's Directed Acyclic Graph (DAG) structure.
- Proficient in Data Warehouse and Data Analytics design, Azure Data Lakes, and Business Intelligence tools; advanced SQL, PL/SQL, and ETL expertise, along with skills in Databricks and Power BI.
- Designed and implemented scalable streaming data architectures using Kafka within AWS, ensuring high-throughput data ingestion and processing.
- Built automated ETL pipelines in Python to handle data extraction, transformation, and loading.
- Ensured Medallion architecture compliance with enterprise data governance standards, implementing robust security measures for data encryption and access control.
- Experience developing ETL processes and frameworks for large-scale, complex datasets.
- Experience with application development on Linux, Python, RDBMS, NoSQL, and ETL solutions.
- Experience operating very large Data Warehouses.
- Experienced in writing Spark applications in Scala.
- Worked with AWS-based data ingestion and transformations, setting up data in AWS using S3 buckets and configuring instance backups to S3.
- Experience working with the Azure cloud platform (Data Lake, Databricks, Blob Storage, Data Factory).
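A minimal PySpark ETL sketch along the lines described above, assuming illustrative S3 paths and column names (claim_id, claim_amount, and member_id are placeholders, not from the original profile):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("claims_etl").getOrCreate()

# Extract: read raw CSV files from a landing bucket (path is illustrative)
raw = spark.read.option("header", True).csv("s3a://raw-bucket/claims/")

# Transform: deduplicate, cast types, and drop unusable rows
cleaned = (
    raw.dropDuplicates(["claim_id"])
       .withColumn("claim_amount", F.col("claim_amount").cast("double"))
       .filter(F.col("claim_amount").isNotNull())
)

# Aggregate to a curated summary per member
summary = cleaned.groupBy("member_id").agg(
    F.sum("claim_amount").alias("total_claims")
)

# Load: write the curated output as Parquet to a separate bucket
summary.write.mode("overwrite").parquet("s3a://curated-bucket/claims_summary/")
```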
Senior Data Engineer, PNC | Jan 2023 - Present | Salt Lake City, Utah, United States
- Designed and set up an Enterprise Data Lake on AWS to support storing, processing, analytics, and reporting of large and dynamic datasets using services such as S3, EC2, ECS, AWS Glue, SNS, SQS, DMS, and Kinesis.
- Chose suitable ETL frameworks such as Apache Spark or AWS Glue based on project needs and data volume.
- Configured and maintained PostgreSQL databases for reliable data storage.
- Developed and managed scalable ETL pipelines on Databricks to process large datasets efficiently.
- Developed and managed data pipelines using Databricks on AWS to handle large-scale data processing and analytics.
- Developed and maintained REST APIs to facilitate seamless communication between microservices and external systems.
- Developed data processing pipelines using PySpark to handle large-scale datasets efficiently.
- Designed and implemented Kafka clusters to handle large-scale, real-time data streaming and processing.
- Developed Python scripts for data transformation and integration, ensuring high performance and reliability across various data sources.
- Used AWS Databricks to build and manage scalable data pipelines and analytics workflows in the cloud.
- Used ELT to handle large volumes of data by leveraging the target system's processing power for transformations.
- Leveraged the Kafka API to build scalable event-driven architectures, supporting high-throughput data pipelines.
- Managed infrastructure as code with Terraform, enabling version control and collaboration on infrastructure changes.
- Developed and managed workflows using Airflow to automate and schedule data pipelines (see the DAG sketch below).
- Developed interactive dashboards in Power BI to visualize complex data trends and insights, improving stakeholder engagement and data accessibility.
- Optimized REST API performance by implementing caching, load balancing, and efficient data serialization techniques.
- Integrated PySpark with various data sources, including HDFS, S3, and Hive, to streamline data ingestion and transformation.
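A minimal Airflow DAG sketch illustrating the kind of scheduled pipeline referenced above; the DAG id, schedule, and task callables are hypothetical placeholders rather than the actual PNC workflows:

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Placeholder: pull source data (e.g., from S3 or an API)
    pass


def transform():
    # Placeholder: run a PySpark or Glue transformation job
    pass


def load():
    # Placeholder: load curated data into the warehouse
    pass


with DAG(
    dag_id="daily_sales_etl",          # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Declare task dependencies: extract -> transform -> load
    t_extract >> t_transform >> t_load
```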
Senior Data Engineer, Farmers Insurance Group | Mar 2019 - Dec 2022 | West Jordan, Utah, United States
- Designed and configured Azure Cloud relational servers and databases, optimizing infrastructure based on business requirements.
- Developed CI/CD processes using Azure DevOps to streamline software delivery and deployment.
- Ensured data accuracy and consistency across different sources by implementing data validation techniques during the ETL process.
- Integrated Delta Live Tables with Databricks notebooks to automate data workflows and enhance analytics capabilities (see the sketch below).
- Collaborated on designing and optimizing data pipelines with Azure Data Factory, enabling seamless integration across different data sources.
- Analyzed and processed data with the Databricks Lakehouse to gain insights.
- Developed scalable data pipelines and ETL processes using Azure Databricks for big data processing.
- Worked with Cloud Security and DevOps teams to integrate security features into Azure-based data solutions, ensuring compliance and reliability.
- Developed and deployed robust data pipelines on Azure Databricks, utilizing the lakehouse architecture to enhance data accessibility and analytics.
- Developed applications and data processing scripts using Scala to leverage its functional programming capabilities.
- Developed and optimized data pipelines using Azure Databricks to process large-scale data efficiently.
- Integrated ADF with other Azure services, such as Data Lake and SQL Database, for seamless data processing.
- Applied advanced analytical methods using Python to identify trends and patterns in large datasets, supporting strategic planning initiatives.
- Integrated Azure Cosmos DB with other Azure services to support real-time data processing and analytics.
- Implemented data visualization using Python tools like Matplotlib and Seaborn to present insights effectively.
- Designed cloud-native data solutions on Azure, integrating various services like Azure Data Lake and Azure SQL Database for comprehensive data management.
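A minimal Delta Live Tables sketch of the Databricks-notebook integration mentioned above; it only runs inside a Databricks DLT pipeline, and the ADLS path, table names, and columns are illustrative assumptions:

```python
import dlt
from pyspark.sql import functions as F


@dlt.table(comment="Raw policy events landed from ADLS (path is illustrative)")
def raw_policy_events():
    # `spark` is provided by the Databricks runtime inside a DLT pipeline
    return spark.read.format("json").load(
        "abfss://landing@storageaccount.dfs.core.windows.net/policy_events/"
    )


@dlt.table(comment="Cleansed policy events for downstream analytics")
def clean_policy_events():
    return (
        dlt.read("raw_policy_events")
           .dropDuplicates(["event_id"])
           .withColumn("event_ts", F.to_timestamp("event_ts"))
    )
```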
Big Data Engineer, Novartis | May 2017 - Feb 2019 | Tempe, Arizona, United States
- Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats.
- Extensive experience working with the AWS cloud platform (EC2, S3, EMR).
- Built data pipelines (ELT/ETL scripts), extracting data from different sources (MySQL, AWS S3 files), transforming it, and loading it into the data warehouse (AWS Redshift).
- Working knowledge of Spark RDD, DataFrame API, Dataset API, Data Source API, Spark SQL, and Spark Streaming.
- Created SQL scripts for daily extracts, ad-hoc requests, and reporting, analyzing large datasets from S3 using AWS Athena, Hive, and Spark SQL.
- Created ETL and built data pipelines using Spark SQL, PySpark, AWS Athena, and AWS Glue.
- Developed Spark applications using Python and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
- Worked with Spark to improve performance and optimize existing algorithms in Hadoop, using Spark Context, Spark SQL, Spark MLlib, DataFrame, Pair RDD, and Spark on YARN.
- Built a learner data model that gets data from Kafka in real time and persists it to Cassandra.
- Performed API calls using Python scripting; performed reads and writes to S3 using the Boto3 library.
- Developed a Kafka consumer API in Python for consuming data from Kafka topics (a minimal sketch follows below).
- Consumed Extensible Markup Language (XML) messages using Kafka and processed the XML files using Spark Streaming to capture User Interface (UI) updates.
- Performed raw data ingestion into S3 from Kinesis Firehose, which would trigger a Lambda function, put refined data into another S3 bucket, and write to an SQS queue as Aurora topics.
- Experienced in writing live real-time processing and core jobs using Spark Streaming with Kafka as a data pipeline system.
- Developed custom AWS Step Functions state machines using AWS Lambda functions, allowing for greater flexibility and customization in workflow design.
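A minimal sketch of a Python Kafka consumer that lands messages in S3, combining the Kafka consumer API and Boto3 usage described above; it assumes the kafka-python and boto3 libraries, and the topic, broker, and bucket names are placeholders:

```python
import json

import boto3
from kafka import KafkaConsumer

# Consume JSON messages from a Kafka topic (topic and broker are illustrative)
consumer = KafkaConsumer(
    "claims-events",
    bootstrap_servers=["broker1:9092"],
    group_id="s3-landing-consumers",
    auto_offset_reset="earliest",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)

s3 = boto3.client("s3")
batch, batch_size = [], 500

for message in consumer:
    batch.append(message.value)
    if len(batch) >= batch_size:
        # Flush each batch as newline-delimited JSON to the raw S3 bucket
        key = f"landing/claims/offset={message.offset}.json"
        s3.put_object(
            Bucket="raw-events-bucket",
            Key=key,
            Body="\n".join(json.dumps(r) for r in batch).encode("utf-8"),
        )
        batch = []
```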
Data Engineer, Cable & Wireless Communications | Nov 2014 - Apr 2017 | Dallas, Texas, United States
- Researched and recommended a suitable technology stack for Hadoop migration, considering the current enterprise architecture.
- Expertise in writing Hadoop jobs for analyzing structured and unstructured data using HDFS, Hive, HBase, Pig, Spark, Kafka, Scala, Oozie, and Talend ETL.
- Extensively used the Spark stack to develop preprocessing jobs, including the RDD, Dataset, and DataFrame APIs, to transform data for upstream consumption.
- Built and maintained scalable data pipelines using the Hadoop ecosystem and other open-source components such as Hive and HBase.
- Architected and managed data orchestration workflows using Databricks Jobs and Apache Airflow, ensuring timely execution and monitoring of ETL processes across different data sources.
- Designed and implemented the ETL process using Talend Enterprise Big Data Edition to load data from source to target database.
- Implemented best practices for performance optimization in Databricks, such as optimizing partitioning and caching, leading to a 40% improvement in query and job execution times.
- Involved in implementing and integrating NoSQL databases like HBase and Cassandra.
- Collected data using Spark Streaming from an AWS S3 bucket in near real time and performed the necessary transformations and aggregations to build the data model and persist the data in HDFS (see the streaming sketch below).
- Worked on a scalable distributed data system using the Hadoop ecosystem on AWS EMR and the MapR distribution.
- Migrated an existing on-premises application to AWS; used AWS services like EC2 and S3 for small data sets.
- Created ETL processes for cleaning and enriching incoming data, improving data quality by 25% and reducing data-related incidents by 20%.
- Implemented data warehousing solutions on AWS Redshift, ensuring scalability and optimizing query performance.
- Streamlined data integration with Apache Kafka, enabling real-time data updates.
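A minimal Spark Structured Streaming sketch of the near-real-time S3-to-HDFS flow described above; the paths, schema fields, and trigger interval are illustrative, and the cluster is assumed to already have S3 credentials configured:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import DoubleType, StringType, StructType, TimestampType

spark = SparkSession.builder.appName("s3_stream_ingest").getOrCreate()

# Expected layout of the incoming JSON files (fields are placeholders)
schema = (
    StructType()
    .add("device_id", StringType())
    .add("reading", DoubleType())
    .add("event_time", TimestampType())
)

# Treat newly arriving files in the S3 landing prefix as a stream
stream = spark.readStream.schema(schema).json("s3a://landing-bucket/sensor-events/")

# Light transformation before persisting
enriched = stream.withColumn("ingest_date", F.to_date("event_time"))

# Persist the stream to HDFS as Parquet with checkpointing for recovery
query = (
    enriched.writeStream
            .format("parquet")
            .option("path", "hdfs:///data/curated/sensor_events/")
            .option("checkpointLocation", "hdfs:///checkpoints/sensor_events/")
            .outputMode("append")
            .trigger(processingTime="1 minute")
            .start()
)
query.awaitTermination()
```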
Data Engineer, Lululemon | Jun 2012 - Oct 2014 | Houston, Texas, United States
- Played a role in gathering requirements, analyzing the entire system, and providing estimates on development and testing efforts.
- Developed ETLs using PySpark, using both the DataFrame API and the Spark SQL API (a side-by-side sketch follows below).
- Using Spark, performed various transformations and actions; the final result data was saved back to HDFS and from there loaded into the target database, Snowflake.
- Migrated an existing on-premises application to AWS; used AWS services like EC2 and S3 for small-data-set processing and storage; experienced in maintaining Hadoop clusters on AWS EMR.
- Strong experience and knowledge of real-time data analytics using Spark Streaming, Kafka, and Flume.
- Configured Spark Streaming to get ongoing information from Kafka and store the stream data in HDFS.
- Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 and ORC/Parquet/Text files into AWS Redshift.
- Used Jira for ticketing and issue tracking, and Jenkins for continuous integration and continuous deployment.
- Expertise in creating, debugging, scheduling, and monitoring jobs using Airflow for ETL batch processing to load into Snowflake for analytical processes.
- Worked on building ETL pipelines for data ingestion, transformation, and validation on AWS, working with data stewards under data compliance requirements.
- Used PySpark for extracting, filtering, and transforming data in data pipelines.
- Skilled in monitoring servers using Nagios and CloudWatch, and using the ELK Stack (Elasticsearch, Kibana).
- Used dbt (Data Build Tool) for transformations in the ETL process, along with AWS Lambda and AWS SQS.
- Worked on scheduling all jobs using Airflow scripts written in Python, adding different tasks to DAGs and dependencies between the tasks.
- Experience developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
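A short sketch showing the same transformation written with both the DataFrame API and the Spark SQL API, as referenced above; table, column, and path names are placeholders:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_etl").getOrCreate()

# Illustrative source path and columns
orders = spark.read.parquet("hdfs:///data/raw/orders/")

# DataFrame API version of the aggregation
df_result = (
    orders.filter(F.col("status") == "SHIPPED")
          .groupBy("store_id")
          .agg(F.sum("order_total").alias("shipped_revenue"))
)

# Equivalent Spark SQL version against a temporary view
orders.createOrReplaceTempView("orders")
sql_result = spark.sql("""
    SELECT store_id, SUM(order_total) AS shipped_revenue
    FROM orders
    WHERE status = 'SHIPPED'
    GROUP BY store_id
""")

# Either result can be written back to HDFS as a staging area for Snowflake
df_result.write.mode("overwrite").parquet("hdfs:///data/curated/shipped_revenue/")
```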
Big Data Developer, HCLTech | Jul 2010 - Mar 2012
- Applied SQL-based transformations based on business logic outlined in mapping sheets, enhancing data integration and alignment with business requirements.
- Implemented big data solutions to manage and process large volumes of data efficiently using distributed computing frameworks.
- Created technical and design documents for ETL processes, providing comprehensive guidelines and documentation for each module.
- Utilized big data tools like Hadoop and Spark to perform complex data processing tasks and analytics on massive datasets.
- Initiated collaboration with cross-functional teams to address technical challenges, offering creative solutions that led to improved project outcomes.
- Created detailed technical documentation for ETL processes, ensuring clarity and consistency for future development efforts.
- Used Apache Spark with Hadoop for real-time data processing and analytics, improving performance for large datasets.
- Designed and implemented data processing solutions using Hadoop and Spark, enabling scalable analytics across large datasets.
- Implemented Sqoop-based data synchronization solutions to maintain consistency between Hadoop and external databases, enhancing data integrity and reliability.
- Identified and articulated technical issues, providing insights on their impact to prioritize resolution efforts effectively.
- Developed comprehensive documentation of data ingestion, processing, and presentation phases to facilitate knowledge sharing and adherence to data governance standards.
- Engineered a MongoDB sharding strategy that improved read/write performance by 15% and supported scalable data growth, ensuring robust data management capabilities (see the sketch below).
- Integrated cloud services for data processing, leveraging AWS tools like Glue and EMR to enhance data management capabilities.
- Integrated Hive tables with other big data technologies such as Hadoop and HBase, facilitating comprehensive data processing and analysis workflows.
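A minimal PyMongo sketch of one way to set up a hashed shard key, in the spirit of the sharding strategy mentioned above; the database, collection, and shard-key field are hypothetical, and the commands must be run against the mongos router of an already-configured sharded cluster:

```python
from pymongo import MongoClient

# Connect to the mongos router of the sharded cluster (host is illustrative)
client = MongoClient("mongodb://mongos-host:27017")

# Enable sharding for the database, then shard the collection on a hashed key
client.admin.command("enableSharding", "analytics")
client.admin.command(
    "shardCollection",
    "analytics.events",
    key={"customer_id": "hashed"},
)

# A hashed shard key spreads writes evenly across shards, which is the kind of
# distribution that drives the read/write throughput improvement cited above.
```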
Frequently Asked Questions about Rakesh B
What company does Rakesh B work for?
Rakesh B works for PNC.
What is Rakesh B's role at the current company?
Rakesh B's current role is Senior Data Engineer | 12+ Years in Big Data, ETL, DevOps & Cloud Solutions (AWS, Azure) | Expert in PySpark, Kafka, and Data Integration for Banking, Healthcare, & Telecom.
Who are Rakesh B's colleagues?
Rakesh B's colleagues are Amanda Macko, Nicolas Scarpa, Kerry Howard, Ashley Burello, Jared Allen, Carol Brocker, Jack Viadero.