Y Samatha
Data professional with 5+ years of experience in data-driven projects, covering nearly all phases of the data flow across domains such as retail, health care, banking, and insurance. Experienced in all phases of data warehousing: requirements gathering, data modelling, data profiling, ETL (ingestion, extraction, transformation, and loading), and data visualization. Hands-on with big data components in the Hadoop ecosystem (HDFS, Hive, Sqoop, Spark SQL) and with Azure cloud services, including ADLS (Azure Data Lake Storage), Azure Data Factory, Azure Databricks, and Synapse.
Pentair
- Website: pentair.com
- Employees: 6,450
Azure Data Engineer, Pentair
Sep 2023 - Present | Minneapolis, Minnesota, United States
• Designed, developed, and deployed high-performance ETL pipelines to extract and transform data from ERP systems (SAP HANA), CSV files, and APIs in the Azure cloud using Azure Data Factory, Azure Databricks, and Azure Synapse SQL pools.
• Implemented full and incremental data-loading strategies and a medallion architecture to optimize data processing, reduce query time, and minimize resource consumption.
• Used PolyBase to load data from file-based sources into Azure Synapse data warehouse tables, and designed the warehouse to host fact and dimension tables in star and snowflake schema models.
• Implemented Hive partitioning, indexing, and caching strategies, along with Scala and PySpark performance-tuning techniques, to improve query performance and reduce processing time.
• Used Python, PySpark, and SQL for data validation, cleansing, and transformation in Databricks notebooks to ensure data quality and integrity.
• Involved in redesigning the existing architecture, estimating cluster size, and monitoring the Databricks Spark cluster.
• Used Auto Loader and Delta Live Tables to process and analyze real-time data streams.
• Built streaming applications in Azure Databricks notebooks using Kafka, Event Hubs, and Spark Structured Streaming for real-time data ingestion.
• Improved the performance of Hive and Spark jobs.
• Wrote Python and PySpark scripts to parse XML, CSV, and JSON documents and load the data into Azure Data Lake Storage Gen2.
• Handled errors, troubleshot failures, and provided production support to resolve high-priority incidents and coding issues within SLA.
• Used Azure DevOps to track work items, manage code repositories, and automate CI/CD pipelines with GitHub.
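The incremental-loading strategy mentioned in this role can be illustrated with a minimal pure-Python sketch: keep a high-watermark timestamp from the previous run, pull only newer records, and persist the new watermark. The record shape and field names here are hypothetical; in production this pattern would run in ADF/Databricks against Delta tables.

```python
from datetime import datetime

def incremental_filter(records, watermark):
    """Return records modified after the stored watermark,
    plus the new watermark to persist for the next run."""
    fresh = [r for r in records if r["modified"] > watermark]
    new_watermark = max((r["modified"] for r in fresh), default=watermark)
    return fresh, new_watermark

# Hypothetical source rows; only rows 2 and 3 are newer than the watermark.
records = [
    {"id": 1, "modified": datetime(2024, 1, 1)},
    {"id": 2, "modified": datetime(2024, 2, 1)},
    {"id": 3, "modified": datetime(2024, 3, 1)},
]
fresh, wm = incremental_filter(records, datetime(2024, 1, 15))
```

A full load is simply the same pipeline run with the watermark reset to a minimum value.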
Azure Data Engineer, Visa
Aug 2022 - Aug 2023
• Designed and implemented data pipelines to extract data from various sources, transform it, and load it into Snowflake using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).
• Wrote Databricks notebooks (Python, PySpark, and Scala) for handling large volumes of data, transformations, and computations in Azure Databricks.
• Optimized ETL processes with performance-tuning techniques such as partitioning, indexing, and caching, and used advanced SQL (stored procedures, window functions, and Common Table Expressions) to achieve a 35% improvement in data quality and performance.
• Developed and executed complex data transformation, machine learning, and advanced analytics tasks in Databricks notebooks using Scala and PySpark.
• Designed and managed data pipelines within Azure Synapse Analytics to automate ETL processes, streamlining extraction, transformation, and loading for downstream analysis.
• Leveraged Snowflake's Time Travel and Fail-safe features to recover data to specific points in time and protect against data loss.
• Used Delta Live Tables for streaming data and Delta Lake's ACID transactional capabilities to maintain data integrity and consistency in complex pipelines.
• Designed and managed tasks and streams within Snowflake, wrote complex SQL queries using SnowSQL, and used Snowpipe to automate ingestion of streaming data into Snowflake tables for real-time analytics and data availability.
• Developed Logic Apps for email notifications and built custom business transformations.
• Worked in an Agile framework with 2-week sprints; participated in document preparation, knowledge-sharing sessions, daily scrums, and review meetings to track progress and ensure effective project execution.
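The window-function and CTE work described in this role can be sketched with a small self-contained example. SQLite stands in here for the Snowflake/T-SQL engines actually used, and the table and query are invented for illustration: a CTE ranks each customer's orders with ROW_NUMBER(), then the outer query keeps only the top order per customer.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (customer TEXT, amount INTEGER);
INSERT INTO orders VALUES ('a', 10), ('a', 30), ('b', 20);
""")

query = """
WITH ranked AS (
    SELECT customer, amount,
           ROW_NUMBER() OVER (PARTITION BY customer
                              ORDER BY amount DESC) AS rn
    FROM orders
)
SELECT customer, amount FROM ranked WHERE rn = 1 ORDER BY customer;
"""
top_orders = con.execute(query).fetchall()
# top_orders -> [('a', 30), ('b', 20)]
```

The same WITH/OVER(PARTITION BY ...) syntax carries over to Snowflake and T-SQL with minimal changes.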
Data Engineer, TCS India
Apr 2019 - Sep 2021 | Hyderabad, Telangana, India
• Developed ETL data pipelines using SSIS and Azure services such as Azure Data Factory, Azure Databricks, and Azure Data Lake Storage; created mappings, workflows, and data migration flows.
• Used SQL, Python, Scala, and Spark SQL scripts in Databricks to extract, transform, and load data into an Azure Synapse Analytics dedicated SQL pool.
• Optimized and fine-tuned Spark jobs in Azure Synapse Spark pools to maximize performance and resource utilization.
• Wrote Scala, PySpark, and Python notebooks for Azure Databricks transformation tasks.
• Used the Pandas and NumPy packages in Python for cleansing and validating source data.
• Designed and developed an ETL pipeline in the Azure cloud that pulls customer data from an API and processes it into Azure SQL Database.
• Orchestrated all data pipelines using Azure Data Factory and built a custom alerting platform for monitoring.
• Created Databricks job workflows that extract data from SQL Server and upload files to SFTP using PySpark and Python.
• Enhanced Azure Functions code to efficiently extract, transform, and load data from sources such as databases, APIs, and file systems.
• Built a common SFTP download/upload framework using Azure Data Factory and Databricks; maintained and supported the Teradata environment for EDW applications.
• Implemented logical and physical modeling using the ERwin data modeler; involved in physical database design, data sourcing, data loading, data transformation, and SQL Server performance tuning and monitoring.
• Wrote SQL queries, including DDL, DML, and diverse database objects (indexes, triggers, CTEs, views, stored procedures, functions, and packages) for data manipulation and retrieval.
• Administered and monitored SQL Server, performed database backups and upgrades across production and non-production environments using native tools such as SQL Server Management Studio (SSMS) and SQL Agent.
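The source-data cleansing and validation work in this role (done with Pandas/NumPy in practice) can be illustrated with a stdlib-only sketch: read a CSV feed, strip whitespace, and quarantine rows whose numeric field is missing or malformed. The sample data and field names are invented.

```python
import csv
import io

# Hypothetical raw feed: row 1 is valid but padded, rows 2-3 are bad.
RAW = "id,amount\n1, 42 \n2,\n3,seven\n"

def cleanse(text):
    """Split rows into clean records and quarantined row ids."""
    good, bad = [], []
    for row in csv.DictReader(io.StringIO(text)):
        value = (row["amount"] or "").strip()
        if value.isdigit():
            good.append({"id": int(row["id"]), "amount": int(value)})
        else:
            bad.append(row["id"])  # keep the id for a rejects report
    return good, bad

good, bad = cleanse(RAW)
# good -> [{'id': 1, 'amount': 42}]; bad -> ['2', '3']
```

With Pandas the same idea is typically expressed as `pd.to_numeric(..., errors="coerce")` followed by a null-mask split into valid and reject frames.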
Big Data Engineer, Chetana Technology Solutions
Apr 2018 - May 2019 | Hyderabad, Telangana, India
• Built end-to-end data pipelines on Hadoop data platforms and Azure cloud services such as Azure Data Factory and Azure Databricks for extracting, loading, and transforming large sets of structured, semi-structured, and unstructured data.
• In-depth knowledge of Hadoop architecture and components such as HDFS, ApplicationMaster, NodeManager, ResourceManager, NameNode, DataNode, and MapReduce concepts.
• Migrated existing MapReduce jobs to Spark transformations and actions using Spark RDDs, DataFrames, the Spark SQL API, and Scala in Databricks notebooks.
• Wrote Spark RDD transformations and actions, DataFrames, persistence (caching), accumulators, broadcast variables, and case classes for the required input data, and performed transformations using Spark Core in Databricks.
• Performed incremental loads from RDBMS sources into HDFS in Parquet format using Sqoop, then applied further Spark transformations.
• Loaded data into Hive tables and created Hive internal and external tables to perform ETL on the data.
• Applied normalization and de-normalization techniques for optimum performance in relational and dimensional database environments such as SQL Server.
• Used Hive, Impala, and Sqoop utilities and Oozie workflows for data extraction and loading.
• Created HBase tables to store various formats of data coming from different sources.
• Imported log files from various sources into HDFS using Flume.
• Converted HiveQL into Spark transformations using Spark RDDs and Scala programming.
• Created user-defined functions (UDFs) and user-defined aggregate functions (UDAFs) in Pig and Hive.
• Supported the cluster and topics in Kafka Manager.
• Performed CloudFormation scripting, security, and resource automation.
• Played a significant role in establishing the CI/CD pipeline using Jenkins and GitHub.
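The MapReduce-to-Spark migration described above boils down to a map/reduce-by-key shape: flat-map input lines into (key, 1) pairs, then fold the pairs by key. A plain-Python sketch of that shape (the sample lines are invented; in Spark this would be `flatMap` + `reduceByKey` on an RDD):

```python
from functools import reduce
from collections import Counter

lines = ["spark hive spark", "hive hdfs"]

# "Map" phase: emit a (word, 1) pair per token, as a mapper would.
pairs = [(word, 1) for line in lines for word in line.split()]

# "Reduce" phase: fold the pairs by key into running counts.
def reduce_by_key(acc, pair):
    word, n = pair
    acc[word] += n
    return acc

counts = reduce(reduce_by_key, pairs, Counter())
# counts -> Counter({'spark': 2, 'hive': 2, 'hdfs': 1})
```

The Spark version replaces the list comprehensions with lazy, partitioned RDD operations, which is what makes the rewrite worthwhile at scale.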
Y Samatha Education Details
- Electrical, Electronics and Communications Engineering
- Data Science
Frequently Asked Questions about Y Samatha
What company does Y Samatha work for?
Y Samatha works for Pentair
What is Y Samatha's role at the current company?
Y Samatha's current headline is "Looking for contract roles | Certified Data Engineer".
What schools did Y Samatha attend?
Y Samatha attended Jawaharlal Nehru Technological University, Anantapur, and the University of North Texas.
Who are Y Samatha's colleagues?
Y Samatha's colleagues are Patrick Niles, Crystal Cline, Arlene Davis, Djamel Djamel, Dave Thomas, Jenny Scobie, Ken Clack.