Machine Learning Engineer
Current
- Optimized open-source models to deliver 10x as many inferences per minute on the same computational infrastructure.
- Reduced the VRAM usage of open-source models in production by 40% using quantization and other techniques.
- Managed GPU compute infrastructure both in the cloud (AWS, GCP) and on local servers.