Sandeep M

Sandeep M Email and Phone Number

Senior Data Engineer @ MasterControl
Peachtree Corners, GA, US
Sandeep M's Location
Peachtree Corners, Georgia, United States
About Sandeep M

I am a big data engineer with more than nine years of experience designing, developing, and deploying data-driven solutions using big data and cloud technologies. I currently work as an Azure Data Engineer at MasterControl, where I am responsible for migrating the client data warehouse architecture from on-premises to the Azure cloud and implementing data movement with Azure Data Factory and Databricks. My core competencies include building scalable, reliable data pipelines, data models, and data structures with big data tools and frameworks such as Hadoop, Spark, Hive, Kafka, and NoSQL databases. I also have experience with cloud platforms including Microsoft Azure, AWS, and GCP, leveraging their services and capabilities for data engineering and analytics. I am passionate about driving data-driven solutions that deliver value and insights for my clients.

Sandeep M's Current Company Details
MasterControl

Senior Data Engineer
Peachtree Corners, GA, US
Employees: 796
Sandeep M Work Experience Details
  • MasterControl
    Senior Data Engineer
    Peachtree Corners, GA, US
  • MasterControl Global Limited
    Azure Data Engineer
    MasterControl Global Limited Jan 2022 - Present
    Salt Lake City, Utah, United States
     • Worked on end-to-end ETL processes using ADF, ensuring seamless data integration and quality.
     • Hands-on experience creating Spark clusters and parameters in Databricks notebooks.
     • Implemented Delta Lake concepts in Databricks, including SCD1 and SCD2 with Delta Lake, data validations with PySpark, and scheduling of Databricks notebooks via Databricks Jobs.
     • Hands-on knowledge of dbutils commands and of processing file formats such as CSV, Parquet, and JSON.
     • Hands-on knowledge of Databricks connectors, including mount points to Data Lake Storage and JDBC connections from Databricks to Azure Synapse (see the ADLS mount sketch after this list).
     • Strong knowledge of analytic window functions such as rank and dense_rank, and of integrating Databricks with Azure Key Vault.
     • Developed PySpark code for data cleansing (column trimming, duplicate and key-duplicate checks) and SCD Type 1 using merge functionality (see the Delta Lake merge sketch after this list).
     • Hands-on experience implementing different types of joins in PySpark and SQL.
     • Experience implementing Azure Data Factory (v2) pipeline components such as linked services, datasets, and activities.
     • Implemented multi-table full loads and incremental loads from on-premises to cloud, and implemented auditing in Azure Data Factory using stored procedures.
     • Implemented scheduling in Azure Data Factory using triggers, plus dynamic data loading driven by a config table, using Lookup, ForEach, and Copy Data activities for the tables registered in that table; experience with Key Vault.
     • Scheduled ADF pipelines using both scheduled and event-based triggers.
     • Built ADF pipelines from linked services, datasets, and activities to extract and load data from sources such as Azure SQL, ADLS, Blob Storage, and Azure SQL Data Warehouse.
     • Involved in bug fixes, code debugging, and job monitoring in the production environment.
  • Guardian Life Insurance Limited
    AWS Data Engineer
    Guardian Life Insurance Limited Feb 2021 - Dec 2021
    Bethlehem, Pennsylvania, United States
     • Established and developed serverless applications on AWS using the Serverless Framework and Python's boto3 module.
     • Built serverless applications with AWS Lambda, API Gateway, and DynamoDB, reducing infrastructure costs and improving scalability (see the Lambda handler sketch after this list).
     • Designed and developed end-to-end data pipelines using AWS Glue, Apache Airflow, and Apache Spark to extract, transform, and load data from various sources into data warehouses.
     • Integrated data quality checks and data governance mechanisms to ensure data accuracy and consistency throughout the organization.
     • Created and designed ETL processes in AWS Glue to import various kinds of data from outside sources into Amazon Redshift.
     • Monitored and maintained the health and performance of Amazon Redshift clusters using monitoring tools and dashboards.
     • Managed backups and disaster recovery procedures to ensure data integrity and business continuity.
     • Implemented data encryption and security measures to protect sensitive data in compliance with industry standards.
     • Designed and optimized data models in Snowflake, including defining schemas and setting up indexes, to enhance data storage and retrieval efficiency.
     • Developed and maintained data pipelines using Apache Airflow and custom scripts to extract, transform, and load (ETL) data from diverse sources into the Redshift data warehouse.
     • Set up proactive monitoring and alerting to detect and address issues with data loads, query performance, and system health.
     • Utilized Apache Spark for distributed computing tasks, improving processing speeds.
  • Standard Chartered
    Azure Data Engineer
    Standard Chartered Nov 2019 - Jan 2021
    New York, NY
     • Responsible for creating a data lake on the Azure cloud platform to improve business teams' use of Azure Synapse SQL for data analysis.
     • Utilized Azure SQL as an external Hive metastore for Databricks clusters so that metadata persists across multiple clusters.
     • Employed Azure Data Lake Storage as the data lake and ensured that Spark and Hive tasks wrote all processed data directly to ADLS.
     • Strong experience working with Azure Databricks runtimes, using the Databricks API to automate launching and terminating clusters (see the Clusters API sketch after this list).
     • Experience integrating Snowflake with Azure Blob Storage and SQL Data Warehouse using Snowpipe.
     • Employed SQL Server Integration Services, Azure Data Factory, and other ETL tools to identify the route for transferring data from SAS reports into Azure Data Factory.
     • Developed PowerShell scripts for automation and configuration management.
     • Implemented data processing workflows using PySpark, leveraging the power of Python for Spark.
     • Developed and optimized Spark pipelines for efficient large-scale data processing.
     • Designed and deployed infrastructure using Azure Resource Manager (ARM) templates.
     • Implemented and managed SQL Data Warehouse (SQL DW) solutions for analytical processing.
     • Maintained Azure Data Factory as an orchestration tool to consume data from several source systems and transfer it from upstream to downstream systems.
     • Developed pipelines in Azure Data Factory (ADF) using linked services, datasets, and activities to extract, transform, and load data from a variety of sources, including Azure SQL, Blob Storage, Azure SQL Data Warehouse, and write-back tools.
     • Implemented and optimized data storage solutions using Azure Cosmos DB.
     • Utilized Azure Analysis Services for multidimensional data analysis and reporting.
  • Extarc Software Solutions Pvt. Ltd.
    Hadoop Developer
    Extarc Software Solutions Pvt. Ltd. May 2017 - Oct 2019
    Hyderabad, Telangana, India
     • Developed ETL jobs in Spark/Scala to migrate data from Oracle to new MySQL tables.
     • Used Spark/Scala (RDDs, DataFrames, Spark SQL) and the Spark-Cassandra Connector APIs extensively for tasks such as data migration and business report generation.
     • Developed a Spark Streaming application for real-time sales analytics.
     • Prepared an ETL framework with Sqoop, Pig, and Hive to regularly ingest data from the source and make it available for consumption (a comparable PySpark JDBC ingest sketch appears after this list).
     • Processed HDFS data, created external tables using Hive, and developed reusable scripts to ingest and repair tables across the project.
     • Engineered complex data pipelines using tools such as Databricks, processing terabytes of data to drive decision-making.
     • Analyzed the source data and handled it efficiently by adjusting data types.
     • Worked with Excel sheets, flat files, and CSV files to generate Power BI ad hoc reports.
     • Analyzed SQL scripts and designed solutions for implementation in PySpark.
     • Extracted data from other data sources into HDFS using Sqoop, including data from MySQL.
     • Handled imports from various data sources, performed transformations using Hive and MapReduce, and loaded the data into HDFS.
     • Automated deployments using YAML scripts for large builds and releases.
     • Worked with Apache Hive, Apache Pig, HBase, Apache Spark, ZooKeeper, Flume, Kafka, and Sqoop.
     • Implemented data classification algorithms using MapReduce design patterns.
     • Worked extensively with combiners, partitioning, and the distributed cache to improve MapReduce job performance.
     • Used Git and GitHub repositories to maintain source code.
  • Triniti Advanced Software Labs Private Limited
    Data Warehouse Consultant
    Triniti Advanced Software Labs Private Limited May 2014 - Apr 2017
    Hyderabad, Telangana, India
     • Worked with stakeholders on business requirements, functional specifications, and enhancements; created technical design and functional specification documents based on business needs.
     • Profiled the source files and developed data models and mappings for smaller requirements.
     • Implemented CDC, SCD2, and SCD1 delta loads; snapshot and transactional fact tables; headers and footers for flat files; and file lists.
     • Participated actively in weekly calls with the data modeling and analyst teams to understand and work on new requirements.
     • Analyzed data from source systems to design solutions for business requirements.
     • Developed complex mappings in Informatica using PowerCenter transformations (Source Qualifier, Joiner, Lookup, Filter, Router, Aggregator, Expression, XML, Update, and Sequence Generator), mapping parameters/variables, parameter files, SQL overrides, and the transformation language.
     • Developed Unix scripts for SFTP file transfers and target table truncate operations.
     • Implemented database-level partitioning for better performance.
     • Implemented Pushdown Optimization (PDO) for better performance when source data volumes are large.
     • Supported the QA team through the various testing phases of ETL development.
     • Scheduled workflows to run daily and weekly using the Control-M scheduling tool.
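
The SCD Type 1 merge pattern mentioned in the MasterControl Global Limited entry can be illustrated with a short PySpark sketch. This is a minimal, hypothetical example rather than code from the profile: the paths, table layout, and the customer_id business key are assumptions.

```python
# Minimal SCD Type 1 sketch using Delta Lake MERGE in PySpark.
# Paths, column names, and the customer_id key are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from delta.tables import DeltaTable

spark = SparkSession.builder.appName("scd1-merge-sketch").getOrCreate()

# Basic cleansing, as described above: trim string columns and
# drop duplicate business keys before merging.
updates = (spark.read.parquet("/mnt/raw/customers")
           .withColumn("name", F.trim(F.col("name")))
           .dropDuplicates(["customer_id"]))

target = DeltaTable.forPath(spark, "/mnt/silver/customers")

# SCD Type 1: overwrite matched rows in place and insert new keys;
# no history is retained (keeping history would be SCD Type 2).
(target.alias("t")
 .merge(updates.alias("s"), "t.customer_id = s.customer_id")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())
```

An SCD Type 2 variant would instead end-date the matched row and insert a new current version rather than updating in place.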
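
The same entry mentions creating mount points to Data Lake Storage and integrating Databricks with Azure Key Vault. Below is a hedged sketch of the documented ADLS Gen2 OAuth mount pattern; the secret scope, storage account, container, and tenant ID are placeholder assumptions.

```python
# Sketch of mounting ADLS Gen2 in a Databricks notebook via an OAuth service
# principal, with credentials read from a Key Vault-backed secret scope.
# `dbutils` is available implicitly in Databricks notebooks; all names below
# (kv-scope, mystorageacct, raw, <tenant-id>) are illustrative placeholders.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id":
        dbutils.secrets.get(scope="kv-scope", key="sp-client-id"),
    "fs.azure.account.oauth2.client.secret":
        dbutils.secrets.get(scope="kv-scope", key="sp-client-secret"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://raw@mystorageacct.dfs.core.windows.net/",
    mount_point="/mnt/raw",
    extra_configs=configs,
)
```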
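
For the Guardian Life entry, the Lambda + API Gateway + DynamoDB pattern looks roughly like the handler below. The table name, key schema, and request shape are illustrative assumptions, not details from the profile.

```python
# Hedged sketch of an API Gateway-backed Lambda writing to DynamoDB via boto3.
# The "items" table and its "id" partition key are hypothetical.
import json
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("items")

def handler(event, context):
    """Persist the JSON body of an API Gateway proxy request."""
    body = json.loads(event.get("body") or "{}")
    table.put_item(Item={"id": body["id"], "payload": body})
    return {"statusCode": 200, "body": json.dumps({"ok": True})}
```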
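
The Standard Chartered entry mentions automating cluster launch and termination through the Databricks API. A minimal sketch against the Clusters REST API (2.0) might look like this; the workspace URL, token, and cluster spec are placeholders.

```python
# Sketch of launching and terminating Databricks clusters via the REST API.
# HOST, TOKEN, and the cluster spec are illustrative placeholders.
import requests

HOST = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "dapi-..."  # personal access token (placeholder)
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

def create_cluster() -> str:
    """Launch a small ephemeral cluster and return its cluster_id."""
    resp = requests.post(f"{HOST}/api/2.0/clusters/create", headers=HEADERS,
                         json={"cluster_name": "etl-ephemeral",
                               "spark_version": "13.3.x-scala2.12",
                               "node_type_id": "Standard_DS3_v2",
                               "num_workers": 2})
    resp.raise_for_status()
    return resp.json()["cluster_id"]

def terminate_cluster(cluster_id: str) -> None:
    """Terminate the cluster once the workload finishes."""
    resp = requests.post(f"{HOST}/api/2.0/clusters/delete", headers=HEADERS,
                         json={"cluster_id": cluster_id})
    resp.raise_for_status()
```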
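
The Extarc entry describes ingesting MySQL data into HDFS with Sqoop, a command-line tool. As a comparable sketch in PySpark (a deliberate substitution, not the original approach), a JDBC read landing Parquet in HDFS could look like this; host, database, credentials, and paths are placeholders.

```python
# PySpark JDBC ingest sketch, standing in for the Sqoop import described
# above. Connection details and paths are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("mysql-ingest-sketch")
         .getOrCreate())

orders = (spark.read.format("jdbc")
          .option("url", "jdbc:mysql://db-host:3306/sales")
          .option("dbtable", "orders")
          .option("user", "etl_user")
          .option("password", "change-me")
          .option("driver", "com.mysql.cj.jdbc.Driver")
          .load())

# Land the snapshot in HDFS as Parquet, ready to back a Hive external table.
orders.write.mode("overwrite").parquet("hdfs:///data/raw/orders")
```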

Sandeep M Education Details

  • Harvard High School

Frequently Asked Questions about Sandeep M

What company does Sandeep M work for?

Sandeep M works for MasterControl.

What is Sandeep M's role at the current company?

Sandeep M's current role is Senior Data Engineer.

What schools did Sandeep M attend?

Sandeep M attended Harvard High School.

Who are Sandeep M's colleagues?

Sandeep M's colleagues are Barbara Stromness, Scott Hyland, Dave Adams, Jacob Russo, Robert Harris, Louise Cliche, PMP, and Adriana Chandler.
