Pratik S

Data Engineer @ Samsung Electronics
Frisco, TX, US
Pratik S's Location
Frisco, Texas, United States
About Pratik S

Pratik S is a Data Engineer at Samsung Electronics.

Pratik S's Current Company Details
Samsung Electronics

Data Engineer
Frisco, TX, US
Pratik S Work Experience Details
  • Samsung Electronics
    Data Engineer
    Frisco, TX, US
  • Samsung Electronics
    Data Engineer
    Dec 2018 - Present
    Suwon-si, Gyeonggi-do, KR
    - Responsible for building an Enterprise Data Lake to bring ML ecosystem capabilities to production and make them readily consumable for data scientists and business users.
    - Processing and transforming data using AWS EMR to assist the Data Science team per business requirements.
    - Developing Spark applications for cleansing and validating data ingested into the AWS cloud.
    - Fine-tuning Spark applications to improve overall pipeline processing time.
    - Building data pipelines and performing analytics using the AWS stack (S3, Glue, EMR, Redshift, Lambda).
    - Loading data into S3 buckets, then filtering and loading it into Redshift following columnar database design considerations.
    - Performing ETL operations on terabytes of data using Python, Spark SQL, S3, and Redshift to obtain insights.
    - Developing a custom REST API, exposed through AWS API Gateway, to support real-time customer analytics for data scientists.
    - Developing AWS Lambda functions in Python and Java to perform event-driven processing.
    - Developing Spark applications in PySpark to perform various enrichments of clickstream data.
    - Importing and exporting data between HDFS and RDBMS using Sqoop to populate tables in Hive.
    - Using Apache Airflow to automate workflows by building Directed Acyclic Graphs (DAGs) in Python.
    - Using reporting tools such as Tableau, connected to Athena, to generate daily data reports.
    - Applying security policies with AWS IAM (Identity and Access Management) to control access to data in the cloud.
    - Configuring S3 buckets with lifecycle policies to archive infrequently accessed data to storage classes based on requirements.
    - Using Jenkins for builds and continuous integration; promoting code to the Development, Test, and Production environments on schedule.
    Environment: AWS, Spark, HDFS, Hive, EMR, S3, Redshift, API Gateway, Lambda, Athena, Python, Airflow, Sqoop, HBase, Oracle
  • GEICO
    Data Engineer
    Mar 2017 - Dec 2018
    Chevy Chase, MD, US
    - Responsible for migrating terabytes of on-premises enterprise data to AWS S3.
    - Ingested large volumes of credit data from multiple provider data sources into AWS S3.
    - Implemented data warehouse solutions in AWS Redshift by migrating data from S3 to Redshift.
    - Automated jobs and data pipelines using AWS Step Functions and AWS Lambda, and configured performance metrics in Amazon CloudWatch.
    - Integrated API Gateway with Lambda functions to expose REST APIs.
    - Developed Spark applications and data pipelines to extract and process batch and streaming data from different sources.
    - Developed ETL pipelines using Spark and Hive to perform various business-specific transformations.
    - Created Hive tables and loaded and analyzed data with Hive scripts, implementing partitioning and bucketing in Hive.
    - Worked with the data science team on preprocessing and feature engineering, and assisted in running a machine learning algorithm in production.
    - Validated, manipulated, and performed exploratory data analysis using Pandas, NumPy, scikit-learn, and PySpark to interpret and extract insights from datasets of millions of records.
    - Developed Spark code in Scala using Spark SQL and Spark Streaming to perform real-time analysis and processing of data.
    - Loaded data into Spark RDDs and performed in-memory computations to generate output.
    - Worked on ETL processes consisting of data transformation, mapping, and loading using Informatica.
    - Designed a Cassandra data model and migrated data from external sources to Cassandra using Spark.
    - Used Power BI to build interactive dashboards and reports.
    - Containerized machine learning models developed in different languages using Docker.
  • HSBC
    Data Engineer
    Oct 2015 - May 2016
    London, GB
    - Responsible for building scalable, distributed data solutions using Cloudera CDH.
    - Migrated existing MapReduce jobs to Spark jobs using transformations and actions on Spark RDDs, DataFrames, and the Spark SQL API in Python.
    - Used Spark Streaming in Python to divide streaming data into batches as input to the Spark engine for batch processing.
    - Migrated large amounts of data from an on-premises Cloudera cluster to EC2 instances deployed on an Elastic MapReduce (EMR) cluster.
    - Developed an ETL pipeline to extract archived logs from disparate sources and store them in an S3 data lake.
    - Analyzed and optimized pertinent data stored in Snowflake using PySpark and Spark SQL.
    - Performed end-to-end performance tuning of Hadoop clusters and MapReduce routines against large datasets.
    - Collected and aggregated large amounts of log data using Apache Flume, staging the data in HDFS for further analysis.
    - Used custom SerDes (Regex SerDe, JSON SerDe, CSV SerDe, etc.) in Hive to handle multiple formats.
    - Designed and developed Hive external and managed tables to store staging and historical data.
    - Used Snappy compression to store ORC file data in Hive tables efficiently.
    - Performed ETL operations with Informatica PowerCenter for data extraction, staging, transformation, and loading into target data stores.
    - Parsed complex files using Informatica data transformations (Normalizer, Lookup, Source Qualifier, Expression, Aggregator, Sorter, Rank, and Joiner) and loaded them into databases.
    - Created complex SQL queries and scripts to extract, aggregate, and validate data from MS SQL, Oracle, and flat files using Informatica, loading it into a single data warehouse repository.
    - Created dashboards in Tableau to provide meaningful metrics for decision making.
    - Developed Oozie workflows for scheduling and orchestrating the pipeline.
    - Resolved tickets generated by issues arising in production pipelines.
  • Screative
    Data Scientist/Analyst
    Jun 2012 - Aug 2015
    - Responsible for clarifying business objectives, data cleaning, data preprocessing, exploratory data analysis, feature scaling, machine learning modeling, model tuning, and model testing.
    - Worked closely with internal stakeholders such as business teams, product managers, engineering teams, and partner teams.
    - Used R and Python to analyze data, plot visualizations, and implement ML algorithms for large-dataset analysis, and Tableau to construct dashboards.
    - Selected features and built and optimized classifiers using machine learning techniques.
    - Created various types of visualizations using Python, R, and Tableau.
    - Performed data wrangling to clean, transform, and reshape data using the NumPy and Pandas libraries.
    - Applied various machine learning algorithms and statistical modeling techniques, including decision trees, text analytics, sentiment analysis, Naive Bayes, and logistic regression, in Python, and determined the accuracy of each model.
    - Worked with a marketing team to implement targeted marketing for small businesses and developed a machine learning model to classify high-value customers.
    - Used SQL to extract data from SQL Server and MongoDB to prepare it for analysis.
    - Analyzed data using Python and Spark, and used Spark Streaming to analyze tweets.
    - Performed data cleaning and preprocessing to remove noise and transform data into an understandable format.
    - Used NLP for sentiment analysis and to help the business team analyze market trends and gain market intelligence.
    - Performed data integrity checks, data cleaning, exploratory data analysis, feature engineering, and optimization using Python.
    - Used techniques such as histograms, bar plots, scatter plots, pair plots, box plots, and violin plots to assess the condition of the data.
    - Measured model performance using metrics such as accuracy, confusion matrix, precision and recall, F1 score, and log loss.
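The lifecycle-policy work described in the Samsung role above can be sketched as a small configuration fragment. This is a minimal illustration, not the actual policy: the rule ID, key prefix, day thresholds, and bucket name are all hypothetical.

```python
# Hypothetical S3 lifecycle configuration: transition objects to cheaper
# storage classes as they age. All names and thresholds are illustrative.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-cold-data",      # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": "raw/"},   # hypothetical key prefix
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},  # infrequent access after 30 days
                {"Days": 365, "StorageClass": "GLACIER"},     # archive after a year
            ],
        }
    ]
}

# With boto3, such a configuration would be applied roughly as:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-data-lake",  # hypothetical bucket
#       LifecycleConfiguration=lifecycle_configuration,
#   )
```

Objects matching the prefix move to STANDARD_IA once they are 30 days old and to GLACIER after a year, which matches the stated goal of archiving infrequently accessed data by storage class.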
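The evaluation metrics named in the Screative role (accuracy, confusion matrix, precision, recall, F1) can be illustrated with a short pure-Python sketch. The labels below are made up for illustration only.

```python
from collections import Counter

def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 from binary (0/1) labels."""
    counts = Counter(zip(y_true, y_pred))
    tp = counts[(1, 1)]  # true positives
    tn = counts[(0, 0)]  # true negatives
    fp = counts[(0, 1)]  # false positives
    fn = counts[(1, 0)]  # false negatives
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical ground-truth and predicted labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
m = binary_metrics(y_true, y_pred)
# Here tp=3, tn=3, fp=1, fn=1, so every metric works out to 0.75.
```

In practice these would typically come from scikit-learn's `accuracy_score`, `precision_score`, `recall_score`, and `f1_score`, but the hand computation shows what the confusion-matrix counts feed into.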

Pratik S Education Details

  • The University Of Texas At Dallas
    Information Technology and Management
  • University Of Mumbai
    Computer Science

Frequently Asked Questions about Pratik S

What company does Pratik S work for?

Pratik S works for Samsung Electronics.

What is Pratik S's role at the current company?

Pratik S's current role is Data Engineer.

What schools did Pratik S attend?

Pratik S attended The University Of Texas At Dallas, University Of Mumbai.
