Data Scientist
Current
- Identified new data sources for model improvement and created ETL pipelines in Apache Airflow. These pipelines downloaded, cleaned, transformed, and loaded 20+ million data points into an AWS RDS database every month.
- Used the new data sources to build XGBoost models that improved on baseline claim prediction models by over 10%.
- Developed automated report generation for customer POCs. The reports analyzed over 1,000,000 data points, drew relevant charts, calculated summary statistics, and provided the key metrics customers needed to use our product data.
- Developed quality control (QC) processes that verify schemas for 40+ GB of geospatial data in under a second, reducing QC time by over 40%.
- Maintained a Casualty Insurance API generating $4 million in revenue, using Python, Jenkins, SQL, and a Postgres database. Improved API performance by 20% via bug fixes in Python, Docker images, and AWS SageMaker endpoints.
- Developed an automated data testing framework using AWS SageMaker. The framework processed new data points, then automatically trained machine learning models to test whether the data points improved performance.