Sushma S Email and Phone Number
Sushma S's Location
Washington DC-Baltimore Area, United States
About Sushma S
Sushma S is a Big Data Engineer at The Exchange, looking for C2C or C2H positions only.
Sushma S Work Experience Details
Senior Big Data Engineer, The Exchange | Jan 2021 - Present | Dallas, TX, US
• Led Business Intelligence report development efforts by working closely with the MicroStrategy, Teradata, and ETL teams
• Applied senior-level SQL query skills (Oracle and T-SQL) in analyzing and validating SSIS ETL data warehouse processes
• Used Spark Streaming APIs to perform transformations and actions on the fly for building the common learner data model, which receives data from Kafka in near real time and persists it into Cassandra (a sketch of this pattern follows this list)
• Migrated existing MapReduce programs to Spark using Scala and Python
• Performed data ingestion using Sqoop, Apache Kafka, Spark Streaming, and Flume
• Created HBase tables to store various data formats of personally identifiable information coming from different portfolios
• Implemented data access jobs through Pig, Hive, Tez, Solr, Accumulo, HBase, and Storm
• Implemented a log producer in Scala that watches application logs, transforms incremental logs, and sends them to a Kafka- and Zookeeper-based log collection platform
• Assisted with FATCA testing using internal software to ensure that proper controls were in place for the new regulation
• Involved in importing real-time Jet.com data to HDFS from Kafka; implemented and scheduled hourly runs using Automic/UC4
• Developed simple and complex MapReduce jobs using Hive and Pig
• Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them
• Developed and designed POCs using Scala, Spark SQL, and MLlib, then deployed them on the YARN cluster
• Involved in converting MapReduce programs into Spark transformations using Spark RDDs with Scala and Python
• Implemented a big data solution using Hadoop, Hive, and Informatica to pull/load data into HDFS
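The Kafka-to-Cassandra bullet above describes a standard Spark Structured Streaming pattern. Below is a minimal PySpark sketch of that pattern; the topic, schema, keyspace, and table names are illustrative assumptions, not details from the resume, and it assumes the Kafka and DataStax Cassandra connector packages are on the Spark classpath.

    # Hedged sketch: Kafka -> Spark Structured Streaming -> Cassandra.
    # All names (topic, keyspace, columns) are assumptions for illustration.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StructField, StringType, TimestampType

    spark = (SparkSession.builder
             .appName("learner-model-stream")
             # Assumes --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.x
             # and com.datastax.spark:spark-cassandra-connector_2.12:3.x
             .getOrCreate())

    event_schema = StructType([
        StructField("learner_id", StringType()),
        StructField("event_type", StringType()),
        StructField("event_time", TimestampType()),
    ])

    # Read events from Kafka in near real time (topic name is illustrative).
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")
              .option("subscribe", "learner-events")
              .load()
              .select(from_json(col("value").cast("string"), event_schema).alias("e"))
              .select("e.*"))

    def write_to_cassandra(batch_df, batch_id):
        # Persist each micro-batch into Cassandra via the DataStax connector.
        (batch_df.write
         .format("org.apache.spark.sql.cassandra")
         .options(keyspace="learner", table="events")  # assumed keyspace/table
         .mode("append")
         .save())

    query = events.writeStream.foreachBatch(write_to_cassandra).start()
    query.awaitTermination()

Writing through foreachBatch keeps the Cassandra write in batch semantics, which is a common choice when a native streaming sink is unavailable or undesirable.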
Big Data Engineer, CNHI | Jul 2019 - Dec 2020 | Montgomery, AL, US
• Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data from sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, and to write back
• Extracted files from Hadoop and dropped them into S3 on a daily/hourly basis; worked with data governance and data quality teams to design various models and processes
• Responsible for working with various teams on a project to develop an analytics-based solution to specifically target customer subscribers
• Developed, deployed, and troubleshot ETL workflows using Hive, Pig, and Sqoop
• Developed Python, Shell/Perl, and PowerShell scripts for automation, and performed component unit testing using the Azure Emulator
• Used Zookeeper to provide coordination services to the cluster; experienced in managing and reviewing Hadoop log files
• Involved in cluster coordination services through Zookeeper and in adding new nodes to an existing cluster
• Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and Python for a broad variety of machine learning methods, including classification, regression, and dimensionality reduction
• Experienced with performing CRUD operations in HBase (a minimal sketch follows this list)
• Hands-on experience writing Pig Latin scripts, working with the Grunt shell, and scheduling jobs with Oozie
• Worked with Zookeeper, Oozie, and Data Pipeline operational services for coordinating the cluster and scheduling workflows
• Excellent knowledge of Confidential Azure services, Amazon Web Services, and their management
• Involved in importing real-time data to Hadoop using Kafka and implemented the zombie runner job for daily imports
• Built a new CI pipeline with testing and deployment automation using Docker, Swarm, Jenkins, and Puppet; utilized continuous integration and automated deployments with Jenkins and Docker
• Designed and developed many real-time applications in Talend with Spark and Kafka
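As a minimal sketch of the HBase CRUD work mentioned above, the following uses the happybase Python client; the host, table, row keys, and column family are assumptions for illustration, and an HBase Thrift server is assumed to be reachable.

    # Hedged sketch of CRUD operations in HBase via happybase.
    # Host, table, and column family names are assumptions.
    import happybase

    connection = happybase.Connection("hbase-host")  # Thrift server, default port
    table = connection.table("subscribers")

    # Create / update: HBase puts are upserts.
    table.put(b"row-001", {b"cf:name": b"Alice", b"cf:plan": b"premium"})

    # Read: fetch one row, then scan rows by prefix.
    row = table.row(b"row-001")
    print(row[b"cf:plan"])
    for key, data in table.scan(row_prefix=b"row-"):
        print(key, data)

    # Delete: remove one column, then the whole row.
    table.delete(b"row-001", columns=[b"cf:plan"])
    table.delete(b"row-001")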
Big Data Engineer, DaVita Healthcare Partners Inc | Feb 2017 - Jul 2019
• Created data pipelines for different events to load data from DynamoDB to an AWS S3 bucket and then into an HDFS location (a sketch of the DynamoDB-to-S3 hop follows this list)
• Used AWS utilities such as EMR, S3, and CloudWatch to run and monitor Hadoop and Spark jobs on Amazon Web Services (AWS)
• Created HBase tables to store variable data formats of PII data coming from different portfolios
• Designed and developed real-time data ingestion frameworks to fetch data from Kafka to Hadoop
• Troubleshot user analysis bugs (JIRA and IRIS tickets)
• Worked with the SCRUM team to deliver agreed user stories on time for every sprint
• Worked on analyzing and resolving production job failures in several scenarios
• Migrated an existing on-premises application to AWS; used AWS services such as EC2 and S3 for data set processing and storage
• Developed data pipelines using Hive, Pig, and MapReduce
• Designed data models to industry standards up to 3NF (OLTP) and denormalized (OLAP) data marts with star and snowflake schemas
• Executed Hive queries on Parquet tables stored in Hive to perform data analysis and meet business requirements
• Assisted the operations support team with transactional data loads by developing SQL*Loader and Unix scripts
• Implemented a Python script to call the Cassandra REST API, performed transformations, and loaded the data into Hive
• Extensively worked on Python and built a custom ingest framework
• Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts in Spark, and effective and efficient joins and transformations during the ingestion process itself
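The first bullet above describes a DynamoDB-to-S3 landing step. A hedged boto3 sketch of that hop is below; the table, bucket, and object key names are illustrative assumptions.

    # Hedged sketch: scan a DynamoDB table and land the items in S3 as JSON.
    # Table, bucket, and key names are assumptions for illustration.
    import json
    import boto3

    dynamodb = boto3.resource("dynamodb")
    s3 = boto3.client("s3")

    table = dynamodb.Table("events")  # assumed table name
    items, scan_kwargs = [], {}

    # Paginate through the table; a single Scan call returns at most 1 MB.
    while True:
        response = table.scan(**scan_kwargs)
        items.extend(response["Items"])
        if "LastEvaluatedKey" not in response:
            break
        scan_kwargs["ExclusiveStartKey"] = response["LastEvaluatedKey"]

    s3.put_object(
        Bucket="event-landing-zone",          # assumed bucket
        Key="dynamodb/events/export.json",    # assumed key
        Body=json.dumps(items, default=str),  # default=str handles Decimal values
    )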
Data Engineer, JMR Infotech | Oct 2015 - Nov 2016 | Bangalore, Karnataka, IN
• Responsibilities included gathering business requirements, developing a strategy for data cleansing and data migration, writing functional and technical specifications, creating source-to-target mappings, designing data profiling and data validation jobs in Informatica, and creating ETL jobs in Informatica
• Implemented bucketing and partitioning using Hive to assist users with data analysis
• Used Oozie scripts for deployment of the application and Perforce as the secure versioning software
• Imported legacy data from SQL Server and Teradata into Amazon S3
• Exported data into Snowflake by creating staging tables to load data from different files in Amazon S3
• Created metric tables and end-user views in Snowflake to feed data for Tableau refreshes
• Implemented partitioning, dynamic partitions, and buckets in Hive
• Developed database management systems for easy access, storage, and retrieval of data
• Expert in creating Hive UDFs using Java to analyze data efficiently
• Responsible for loading data from the BDW Oracle database and Teradata into HDFS using Sqoop
• Developed a script that copies Avro-formatted data from HDFS to external tables in the raw layer
• Created PySpark code that uses Spark SQL to generate dataframes from the Avro-formatted raw layer and writes them to data service layer internal tables in ORC format (a sketch of this conversion follows this list)
• In charge of PySpark code, creating dataframes from tables in the data service layer and writing them to a Hive data warehouse
• Worked with the SCRUM team to deliver agreed user stories on time for every sprint
• Worked on analyzing and resolving production job failures in several scenarios
• Implemented UNIX scripts to define the use-case workflow, process the data files, and automate the jobs
• Knowledge of implementing JILs to automate jobs in the production cluster
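The Avro-to-ORC bullet above maps to a common PySpark pattern. A minimal sketch follows; the HDFS path, database, table, and column names are assumptions, and reading Avro assumes the spark-avro package (e.g. --packages org.apache.spark:spark-avro_2.12:3.x) is available.

    # Hedged sketch: read Avro from the raw layer, transform with Spark SQL,
    # write ORC to a managed Hive table. All names are assumptions.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("raw-to-service-layer")
             .enableHiveSupport()   # needed to write managed Hive tables
             .getOrCreate())

    # Generate a dataframe from the Avro-formatted raw layer in HDFS.
    raw_df = spark.read.format("avro").load("hdfs:///data/raw/customers")

    # Light Spark SQL transformation before landing in the service layer.
    raw_df.createOrReplaceTempView("raw_customers")
    clean_df = spark.sql("""
        SELECT customer_id, trim(name) AS name, load_date
        FROM raw_customers
        WHERE customer_id IS NOT NULL
    """)

    # Write to a data service layer internal (managed) table as ORC.
    (clean_df.write
     .format("orc")
     .mode("overwrite")
     .saveAsTable("service_layer.customers"))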
Data Engineer, Dimension Data | Jun 2013 - Aug 2015 | Bryanston, Johannesburg, ZA
• Generated custom SQL to verify dependencies for the daily, weekly, and monthly jobs
• Using Nebula Metadata, registered business and technical datasets for the corresponding SQL scripts
• Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on different formats such as text and CSV files
• Created consumption views on top of metrics to reduce the running time of complex queries
• Worked hands-on with the ETL process; handled importing data from various data sources and performed transformations
• Used Spark SQL to create structured data using dataframes, querying other data sources over JDBC and Hive (a sketch follows this list)
• Used Apache NiFi to automate data movement between different Hadoop systems
• Compared data at the leaf level across various databases whenever data transformation or loading took place, analyzing data quality after such loads to check for any data loss or corruption
• Developed various mappings with the collection of all sources, targets, and transformations using Informatica Designer
• Developed mappings using transformations such as Expression, Filter, Joiner, and Lookup for better data massaging and to migrate clean, consistent data
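As a sketch of the Spark SQL bullet above, the following reads from an external database over JDBC, joins with a Hive table, and registers a consumption view; the JDBC URL, credentials, and table names are assumptions, and the JDBC driver jar is assumed to be on the classpath.

    # Hedged sketch: Spark SQL over JDBC plus a Hive table, exposed as a view.
    # URL, credentials, and table names are assumptions for illustration.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("jdbc-consumption-views")
             .enableHiveSupport()
             .getOrCreate())

    # Structured data from an external RDBMS via JDBC.
    orders = (spark.read.format("jdbc")
              .option("url", "jdbc:postgresql://db-host:5432/sales")  # assumed URL
              .option("dbtable", "public.orders")
              .option("user", "etl_user")
              .option("password", "****")
              .load())

    # Join with a Hive table and register a consumption view for reuse,
    # so downstream queries avoid recomputing the aggregation logic.
    customers = spark.table("warehouse.customers")  # assumed Hive table
    metrics = (orders.join(customers, "customer_id")
               .groupBy("region")
               .agg({"amount": "sum"}))
    metrics.createOrReplaceTempView("regional_order_totals")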
Frequently Asked Questions about Sushma S
What company does Sushma S work for?
Sushma S works for The Exchange
What is Sushma S's role at the current company?
Sushma S's current role is Senior Big Data Engineer at The Exchange, looking for C2C or C2H positions only.