AI Engineer
Current
- Designed and prepared test cases in the Saudi dialect to evaluate Speech-to-Text models.
- Created test cases in the Saudi dialect to evaluate Large Language Models (LLMs) such as Jais, Llama 3.1, Qwen 2.5, Mistral, and Mixtral.
- Crafted specialized prompts to improve LLM performance, then evaluated and compared outputs across models such as Jais, Llama 3.1, Qwen 2.5, Mistral, and Mixtral.
- Assessed the accuracy of model outputs using tools such as BGE-M3, Paraphrase, and Ragas.
- Developed a general evaluation framework that assesses LLM outputs against custom criteria, using the Llama 3.1 70B model and prompt engineering techniques such as Chain-of-Thought prompting.
- Created prompts that harden LLMs against attacks and guide them toward secure responses.