Celeste S

Celeste S Email and Phone Number

Seeking positions as Big Data Developer & Data Engineer | AWS Certification Holder | Hadoop | Spark @ Information Builders
Celeste S's Location
Washington DC-Baltimore Area, United States
About Celeste S

Actively looking for C2C Data Engineer | Big Data Developer in US
Languages: Python, SQL, Scala, R, Java
Spark: RDD, DataFrame, DataSet, SparkSQL, PySpark, ML, Spark Streaming
Hadoop: HDFS, YARN, MapReduce, Tez, ZooKeeper, Hue, Hive, Pig, Sqoop, Impala, Kafka, HBase
AWS: EC2, EMR, Kinesis, Lambda, Step Functions, Athena, Glue, Data Pipeline, S3, RDS, Redshift, DynamoDB, CloudWatch

Celeste S's Current Company Details
Information Builders
Celeste S Work Experience Details
  • Information Builders
    Data Engineer
    Information Builders Jun 2020 - Present
    Fort Lauderdale, Florida, US
    • Migrated data and applications to the AWS platform, implementing services such as RDS, EMR, Kinesis, Glue, Redshift, Data Pipeline, and S3 for cloud computation, storage, and transfer.
    • Performed data ingestion and ETL in AWS Glue from datastores such as RDS and S3 and from third-party industry data providers (IRI and Numerator); modified ETL scripts in PySpark or Scala and triggered the process with Lambda functions.
    • Crawled and collected real-time data from multiple sources, for example app data from mobile devices, website user data, and third-party/partner (Adobe Analytics) web analytics data, then pushed it into an AWS Kinesis stream.
    • Analyzed data from the Kinesis stream in Kinesis Analytics using SQL; some data was passed directly to EMR Hadoop and Spark clusters for further processing.
    • Implemented Kinesis Firehose to transform data (e.g., to ORC and Parquet) and sink it into Redshift, AWS RDS, and S3.
    • Programmed against different Spark APIs (Scala, RDD, PySpark, and SQL) to transform data in preparation for generating predictive analyses.
    • Collaborated with the application development team to build customized dashboards connected to data from S3, Kinesis, DynamoDB, and data warehouses such as Redshift.
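As a rough illustration of the Glue/Lambda ETL step above, the kind of per-record normalization a PySpark script might apply before writing Parquet can be sketched in plain Python (every field name here is hypothetical, not taken from the actual pipeline):

```python
from datetime import datetime, timezone

def normalize_event(record: dict) -> dict:
    """Normalize one raw ingestion record before it is written out
    (e.g., as Parquet).  Field names are illustrative only."""
    return {
        "user_id": str(record["user_id"]).strip(),
        # unify event names for downstream grouping
        "event": record.get("event", "unknown").lower(),
        # epoch seconds -> ISO-8601 UTC timestamp
        "ts": datetime.fromtimestamp(int(record["ts"]), tz=timezone.utc).isoformat(),
        "value": float(record.get("value", 0.0)),
    }
```

In an actual Glue job this function would be mapped over a DynamicFrame or DataFrame; here it is kept framework-free so the transformation itself is visible.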
  • University System Of Maryland
    Data Engineer
    University System Of Maryland Jan 2020 - May 2020
    Baltimore, Maryland, US
    • Built and configured a Hadoop cluster on the university cluster with modules and tools such as HDFS, MapReduce, YARN, ZooKeeper, Sqoop, Hive, Pig, Spark, and Kafka.
    • Gathered, imported, and connected clients' data from multiple sources such as the web, Google Analytics accounts, Snowflake, and RDBMSs, using tools such as Sqoop and Kafka with source and sink connectors provided by Confluent.
    • Based on the hypotheses planned for testing, designed data mining and processing in Hive, Pig, and Spark (e.g., the PySpark and SQL APIs) to conduct ETL as preparation for further statistical and modeling analyses.
    • Tracked real-time data via Google Analytics from multiple clients' accounts, then pushed the data into a Kafka cluster with the Confluent REST Proxy.
    • Implemented Spark Streaming to filter and process data using the PySpark, SQL, and Scala APIs; connected with Tableau via the Spark SQL database to generate an integrated dashboard that visualizes the data and guides marketing and operational decisions.
    • Conducted A/B tests and hypothesis tests; trained and tuned predictive models (e.g., stepwise regression, random forest, boosting, XGBoost); summarized and demonstrated findings using Python (pandas, numpy, scikit-learn, matplotlib, seaborn) and R (tidyr, glmnet, randomForest, xgboost).
    • Used data from the real-time dashboard and Google Analytics to guide Google Ads campaigns on bidding strategy, budget strategy, and the conversion funnel to create value for clients.
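The Spark Streaming aggregation step described above can be approximated, for illustration only, by a pure-Python tumbling-window count over simulated clickstream events (the window size and event shape are assumptions, not details from the actual project):

```python
from collections import Counter, defaultdict

def tumbling_window_counts(events, window_s=60):
    """Count events per (window_start, event_type), roughly what a
    windowed groupBy in Spark Streaming computes.  `events` is an
    iterable of (epoch_seconds, event_type) pairs."""
    windows = defaultdict(Counter)
    for ts, kind in events:
        # floor the timestamp to the start of its window
        windows[ts // window_s * window_s][kind] += 1
    return dict(windows)
```

A real streaming job would maintain this state incrementally per micro-batch; the batch version above just makes the windowing logic explicit.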
  • Roofstock
    Data Science Engineer
    Roofstock Mar 2019 - Dec 2019
    Oakland, California, US
    • Utilized Amazon EMR to create and manage fully configured, elastic clusters of EC2 instances running Hadoop and other applications in the Hadoop ecosystem (e.g., Hive, Pig, Spark), with the EMRFS connector allowing Hadoop to use S3 as the storage layer.
    • Imported structured data from Snowflake into S3, then processed it using Spark (Python, Scala, RDD, and SQL APIs) and HiveQL on EC2 instances.
    • Developed a data pipeline using AWS Data Pipeline to transfer and process data across Redshift, S3, and EMR.
    • Merged and cleaned several datasets in PySpark to derive the target variable "time-to-sell" and performed feature engineering as preparation for descriptive and predictive analyses.
    • Developed assumptions around the concept of a "reasonable offer" and selected key features (e.g., year_built, bedroom_bathroom_ratio, cap_rate) for EDA, visualized in Tableau connected to an Amazon Redshift database.
    • Built models for both the numerical problem (how long until a good offer arrives) and the categorical problem (whether a good offer occurs within 7 days) using machine learning methods such as linear regression, logistic regression, LASSO, bagging, random forest, and gradient-boosted trees in Spark ML.
    • Trained and tuned the models; both the best numerical model (RMSE: 28) and the best categorical model (accuracy: 84%) beat the baseline (RMSE: 43.67, accuracy: 53%).
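The baseline comparison in the last bullet rests on RMSE; a minimal sketch, with made-up numbers standing in for the real time-to-sell data:

```python
from math import sqrt
from statistics import mean

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))

# Toy time-to-sell values in days (illustrative, not the real dataset).
actual = [30, 45, 10, 60, 25]
# The usual baseline for a regression task: always predict the mean.
baseline = [mean(actual)] * len(actual)
```

A model "beats the baseline" when `rmse(actual, model_preds) < rmse(actual, baseline)`, which is exactly the comparison behind the RMSE 28 vs. 43.67 figures.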
  • Google App Store (GAS)
    Data Analyst
    Google App Store (GAS) Aug 2018 - Feb 2019
    • Crawled data from GAS into a Python environment using BeautifulSoup.
    • Performed data cleansing, processing, imputation, transformation, reshaping, and visualization (EDA) in Python using packages such as numpy, pandas, matplotlib, seaborn, re, and nltk.
    • Implemented statistical methods, for example correlation analysis to examine relationships among variables, and one-way ANOVA to verify whether there are statistically significant differences between aggregated values (e.g., means) across app categories.
    • Applied pivot-table, merge, and group-by methods to aggregate variables and surface valuable, meaningful findings.
    • Performed text mining on consumer reviews to explore what target consumers complain about or appreciate most, and conducted sentiment analysis by creating word clouds and applying NLP.
    • Built and tuned models for the "rating" and "pricing" variables using machine learning methods including linear regression, forward selection, stepwise selection, PCA, decision trees, and random forest.
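A text-mining pass over reviews typically starts with a word-frequency count before any word cloud or sentiment model is built; a crude stdlib-only sketch (the stopword list is illustrative, not what nltk ships):

```python
import re
from collections import Counter

# Tiny illustrative stopword list; real work would use nltk's corpus.
STOPWORDS = {"the", "a", "is", "it", "and", "to", "this"}

def top_terms(reviews, n=3):
    """Return the n most frequent non-stopword terms across reviews."""
    words = re.findall(r"[a-z']+", " ".join(reviews).lower())
    return Counter(w for w in words if w not in STOPWORDS).most_common(n)
```

The resulting counts feed directly into a word cloud (term -> weight) or serve as features for a simple sentiment classifier.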
  • Bluefocus
    Data Analyst
    Bluefocus Aug 2015 - Jul 2018
    Chaoyang District, Beijing, China
    C'estbon (Yibao), a beverage brand affiliated with CRC, had the second-largest share of China's mineral-water market, with an annual turnover of over 12.6 billion RMB. The project aimed to promote its new sports drink, Mulene (Moli), introduced in 2017, across China using digital marketing and data analysis technologies. By scanning the QR code on the bottle, consumers could interact with the brand in various forms, such as lotteries, games, and contests. The project not only promoted the product and stimulated consumption, but also collected meaningful consumer data for analyzing consumer behavior and guiding future advertising strategies and the rest of the marketing mix.
    • Monitored activity operations, observed real-time data, and delivered issues and feedback to the software development team to resolve technical problems.
    • Collected KPI data such as Average Screens Per Visit, Daily Active Users (DAU), Social Shares, New User Rate, and User Growth Rate; imported it into RStudio, Power BI, and Excel; and delivered reports to the client.
    • Cleaned and processed data in an R environment; conducted EDA by grouping, merging, transforming, and visualizing the data.
    • To deliver the data analysis in a more digestible way, performed some analyses in Excel using pivot tables, VLOOKUP functions, etc.
    • Demonstrated and reported findings to technical and non-technical audiences using Power BI and PowerPoint.
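KPIs like DAU and New User Rate reduce to simple set arithmetic over event logs; sketched here in Python for consistency with the earlier examples (the original work used R, and the data shapes below are assumptions):

```python
def daily_active_users(events):
    """events: iterable of (date_str, user_id) pairs.
    Returns {date: DAU} by counting distinct users per day."""
    seen = {}
    for day, user in events:
        seen.setdefault(day, set()).add(user)
    return {day: len(users) for day, users in seen.items()}

def new_user_rate(active_today, seen_before):
    """Share of today's active users never seen before (both are sets)."""
    if not active_today:
        return 0.0
    return len(active_today - seen_before) / len(active_today)
```

User Growth Rate follows the same pattern: compare distinct-user counts between consecutive periods.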

Celeste S Skills

Functional Programming, R, AWS Glue, Git, Amazon Elastic MapReduce, Amazon Web Services, Linux, Apache Pig, Sqoop, Load, Jira, Scala, Apache Kafka, AWS Data Pipeline, Advertising, Tableau, Time Series Analysis, Amazon EC2, Hadoop, NoSQL, Amazon Redshift, Online Transaction Processing, Java, Apache Spark, PySpark, Marketing, Batch Processing, Hive, AWS Lambda, A/B Testing, Extract, SQL, Python, Amazon Relational Database Service, Microsoft Power BI, OLAP, Elasticsearch, Machine Learning, Stream Processing, SparkSQL, Transform, Amazon S3, Data Mining, MapReduce, Amazon Kinesis, Bash, GitHub

Frequently Asked Questions about Celeste S

What company does Celeste S work for?

Celeste S works for Information Builders.

What is Celeste S's role at the current company?

Celeste S's current title is "Seeking positions as Big Data Developer & Data Engineer | AWS Certification Holder | Hadoop | Spark".

What skills is Celeste S known for?

Celeste S has skills like Functional Programming, R, AWS Glue, Git, Amazon Elastic MapReduce, Amazon Web Services, Linux, Apache Pig, Sqoop, Load, Jira, and Scala.

Who are Celeste S's colleagues?

Celeste S's colleagues are Ivan Rubio Cuesta, Michael Laimann, Goutham Reddy, Sweta Jain, Greg Weinheimer, Pedro Cano, Bob Bean.
