Yuan Huang Email and Phone Number
10+ years of experience leading and developing data pipelines, integrated data and software systems, and data science solutions. Solid hands-on experience in statistical analysis, machine learning, and deep learning algorithms, combined with strong programming, data engineering, DevOps, and MLOps skills, with certifications as an AWS Machine Learning Specialty holder, Airflow DAG developer, Kubernetes Administrator, and Jenkins developer. Excellent record of stakeholder management: building strong relationships with senior management and cross-functional teams and translating business processes into data science projects.
Sion Power Corporation
Website: sionpower.com
Employees: 59
Machine Learning Architect
Sion Power Corporation, Apr 2024 - Present
Tucson, Arizona, United States

Analytics Lead | Business Insights & Analytics (BIA), Zoetis Technology & Data
Zoetis, May 2022 - Dec 2023
Boston, Massachusetts, United States
● Led R&D machine learning, AI, and data analytics projects with a team of 7 data scientists and software developers, delivering high-quality data science products and solutions using agile project management methodology
● Translated business and scientific questions into machine learning, AI, and data analytics projects; evaluated feasibility and KPIs/metrics; and designed technical roadmaps, drawing on experience in NLP/LLM, time series, computer vision, supervised and unsupervised learning, and deep learning, as well as data engineering, software engineering, DevOps, and MLOps
● Orchestrated the entire product lifecycle, from ideation, feasibility exploration, and Proof of Concept (POC) through launch, release, and adoption, applying technical standards for Python coding, unit tests, GitHub version control, and CI/CD across 8 projects within 1.5 years
Highlighted achievements:
○ Developed production-level computer vision/segmentation deep learning models (PyTorch/PyTorch Lightning, OpenCV, and U-Net for data processing and modeling; OOD for codebase design; NumPy, Pytest, GitHub Actions workflows for CI/CD, ML model package development, and FastAPI) to identify lesion regions on animal tissue images, saving over 200 FTE hours annually and yielding more consistent evaluation of vaccine efficiency
○ Completed and transferred a Python predictive pipeline for dairy cow diseases, covering data cleaning and data dictionary definition through model deployment (scikit-learn, Pandas, NumPy, Seaborn, Matplotlib, Random Forest, XGBoost, and Pipeline), to the Precision Animal Health (PAH) department
○ Launched an R Shiny web app/dashboard on Posit for patient query, analysis, and visualization, using PySpark for big data queries, which accelerated the clinical recruitment process by at least one month

Research Fellow, Comp Chem Software Lead | Modeling and Informatics
Vertex Pharmaceuticals, Jul 2021 - May 2022
Boston, Massachusetts, United States
● Designed and developed the MolProperty serverless AWS cloud computing system (AWS CDK, API Gateway, Lambda, Data API, Aurora, Docker, OpenEye, JChem, Pytest) for high-performance molecular property calculation, storage, and query
● Developed a Python client package for computational chemists to query the MolProperty computing system (Asyncio, Docopt, Pandas, Boto3, and Python package development)
● Led a cross-functional AWS modeling service project (including an external MLOps team) covering CI/CD and model deployment for small molecule temporal predictive model training and inference pipelines (JIRA, MLflow, Pytest, Docker, Python package development, Pandas, scikit-learn, Random Forest, AWS Lambda, Aurora, API Gateway, CDK, CodeBuild, CodePipeline, ECR, ECS, SageMaker SDK, CloudWatch)
● Led the DCS package Proof-of-Concept project in collaboration with the Software Engineering team
● Hosted code reviews on design patterns, Python technology, and best coding practices

Principal Scientist/Senior Manager, Data Engineering
Bristol Myers Squibb, Jan 2019 - Jul 2021
Greater Boston
● Led the development of an antisense oligonucleotide bioinformatics calculation tool with the computational biology and medicinal chemistry groups, reducing overnight processing to just 15 minutes using R and Bioconductor packages
● Created and deployed data ingestion tools on AWS to query, download, transform, and store public RNA-Seq data from the NCBI GEO data repository (Docker, Docker CLI, shell script, Pandas, AWS EC2, ECR, ECS, and S3)
● Built and launched a data lake system as the central data catalog that integrated and managed a variety of data repositories (DynamoDB, Aurora MySQL, Redshift, S3 for Athena, data modeling for SQL/NoSQL and data warehouse). Developed the ETL data pipeline (Apache Airflow, AWS Step Functions, Lambda, API Gateway, Glue, SageMaker/PySpark, SAM, and Data Migration Service)
● Designed and implemented the front-end and back-end architectures of the Target Profiler web application, which integrates and visualizes genomic and omics data to harmonize genomic data for 6 immunology diseases (AWS API Gateway, Lambda, Athena, Aurora, Data API, JavaScript/D3/crossfilter, HTML/CSS/Bootstrap/Sass)
● Managed IBD clinical data from the Crohn's and Colitis Foundation and provided data mining support for EMR data
● Led the development of a web application for fast download of OpenTargets data via customized Presto queries (AWS API Gateway, Lambda, Athena, and R Shiny app). Initiated and led the POC test with Varada to further optimize query performance on big data

Staff Engineer, Data
TiVo, Aug 2018 - Jan 2019
Greater Boston Area
Data Science Group, Analytics and Advertising Engineering, Data Analytics R&D, TiVo
• Performed large-volume queries, data processing, and wrangling for viewership data on AWS S3 using big data techniques (Athena, Presto queries, PyMySQL, Qubole, PySpark, and Pandas) (see my GitHub for a code demo)
• Developed Python scripts and Jupyter notebooks for time-based and program-based viewership data quality checks using average audience (AA), rating, and data visualization (Presto queries, Pandas, and Matplotlib). The scripts automatically transfer and store the generated plots and dataframes to AWS S3

Data Scientist/Scientist | Bioinformatics
Andover Innovative Medicines Institute (AIM), Eisai, Apr 2012 - Aug 2018
Andover, MA 01810
Scientist, Bioinformatics/Data Scientist | Human Biology and Data Science Engine
• Performed regression and classification analysis (linear regression, random forest, and XGBoost), clustering analysis (hierarchical clustering and principal component analysis), association studies, and hypothesis tests for RNA sequencing data using R (ggplot2, dplyr) and Python scientific packages (pandas, scipy.stats, statsmodels, seaborn, and matplotlib)
• Performed cleaning, alignment, and transformation of RNA-Seq data on the DNAnexus cloud computing platform (Linux shell, DNAnexus toolkit for cloud computing)
• Implemented interactive R notebooks (tidyr, ggplot2, dplyr, plotly, DT) and Shiny apps for data summary and visualization
• Modified and maintained ETL data pipelines and a database for high-throughput DMPK data loading and storage (Pipeline Pilot, Oracle database, Toad, and SQL)
• Created and built a Quality by Design (QbD) framework and software for fast liquid chromatography method development based on Design of Experiments (DOE) and computational simulation, and published the work as a journal paper
• Released VBA applications to the DMPK department for automatic processing and analysis of the high-throughput microsomal stability screening assay, reducing processing time more than tenfold
• Created and executed scripts (Pipeline Pilot and SQL) for extracting, transferring, updating, and integrating pharmacokinetics data from the Japan database (PostgreSQL) to the U.S. site (D360 and Oracle), which reduced data delay from months to 24 hours
• Designed and developed a cost calculation system to calculate and visualize the costs of synthetic routes (Java, MySQL, and JavaScript) for the process chemistry group (Bio-IT World Conference 2015)
• Served as a core technical team member of the Allotrope Foundation; led the Proof-of-Concept team at Eisai for the installation and testing of Allotrope Proof-of-Concept software applications and Allotrope Data Format (ADF) converters

Research Associate
University of Minnesota, Nov 2008 - Mar 2012
Greater Minneapolis-St. Paul Area
• Optimized 2D-LC instrument operational conditions using computer simulations, including Monte Carlo simulation
• Developed a MATLAB program for processing and visualizing on-line 2D-LC experimental data
Yuan Huang Education Details
Analytical Chemistry
Computer Science
Environmental Science
Analytical Chemistry (Concentration in Chemometrics)
Frequently Asked Questions about Yuan Huang
What company does Yuan Huang work for?
Yuan Huang works for Sion Power Corporation.
What is Yuan Huang's role at the current company?
Yuan Huang's current role is Machine Learning Architect | Analytics Lead | Software Lead | Principal Scientist, Data Engineer | Data Scientist.
What schools did Yuan Huang attend?
Yuan Huang attended the University of Arizona, The University of Texas at El Paso, the Chinese Academy of Sciences, and Nanjing University of Technology.
Who are Yuan Huang's colleagues?
Yuan Huang's colleagues are Lawrence Weinstein, Hector Mendoza, Leonor Vargas, Max Jimenez, Fairy Girl, Apolo J.r., Greg Lowe.