Senior Machine Learning Engineer
Current
- Created sophisticated data pipelines using Databricks and PySpark as the engineer responsible for dataset preparation in a major LLM training contract. The resulting models would be the main ones used by Chegg for.
- Pioneered research into multimodal question answering for diagrams, creating experimental vision LLMs with HuggingFace and PyTorch toolchains.
- Developed a powerful LLM testing harness for experimenting with techniques to improve LLM reasoning, and created a toolset for text and structured data processing that became the standard used across teams.
- Ensured quality by designing and reviewing informative and reliable data annotation tasks; performing ad hoc data analyses with scientific Python; and contributing to resilient container orchestration.
- Engineered features for diverse machine learning toolsets and large language models, from AutoGluon to the advanced mathematical tools of SciPy.
- Promoted agile methodology in research, using JIRA for documentation and planning and CI/CD tools for automated testing and code reviews.