Data Science Intern
Current
- Conduct exploratory data analysis (EDA) on a database of over 1 million data points, using SQL queries to perform data cleaning and identify valuable metrics used to inform state decision-making on development for over.
- Maintain a data pipeline using natural language processing (NLP) techniques to process and aggregate seismic sensor data, integrating it with a custom GUI to offer streamlined, user-friendly visualization capabilities.
- Design and publish Tableau dashboards showcasing key data insights gleaned from hundreds of government data sources, driving team efficiency in project prioritization by clearly communicating performance indicators.
- Developed an AI application in Python, combining LangChain PDF parsing with a locally fine-tuned large language model (LLM) to automatically analyze reports and compile key measurements, reducing team workload by.
- Used scikit-learn on a dataset of 100,000+ entries to train a robust linear regression model to predict total project income, leveraging ElasticNet regularization and cross-validation to achieve 95%.