Software Engineer (Backend, AI)
Current: Developing an image engine, Triton deployment, and model optimization
- Fine-tuned a 12B-parameter model using LoRA. Reduced inference latency by 11x through post-training quantization and quantization-aware training. Fusion of multiple LoRAs to improve.