Software Engineer (Machine Learning, AI Platform)
- Spearheaded the orchestration and automation of AI agent training (each agent being an ensemble of PyTorch models acting as a mixture of experts) in an MLOps pipeline backed by a self-hosted in-cluster duo of Prefect.
- Rapidly prototyped a working MVP demonstrating how training runs could scale via the Prefect-Ray integration on either an in-cluster or an Anyscale-hosted Ray cluster, and presented SkyPilot as a way to abstract over Ray and the underlying cloud provider.
- Modernized the developer experience for the AI Platform team by introducing Tilt to watch for changes to the Kubernetes manifests, trigger full Docker builds and pushes, and then update pods without a reload for fast.