Aisha Khatun

SDE @ Amazon | Masters (Thesis) @ University of Waterloo | AI, NLP researcher | Data Scientist @ Amazon Web Services (AWS)
Seattle, Washington, United States
Aisha Khatun's Location
Toronto, Ontario, Canada
About Aisha Khatun

Hi! I am Aisha Khatun. I work with AI, ML, and everything Data! I have industry experience building end-to-end Machine Learning pipelines, Model Monitoring, and Data Analytics. As a graduate NLP researcher, I analyzed the capabilities and limitations of Large Language Models (LLMs) in answering questions about sensitive topics and in following instructions. I am passionate about AI and have experience solving complex problems by applying ML techniques and extracting valuable information through data analysis. Let's connect and discuss exciting opportunities in the field of AI and computer science!

Skills: NLP, Generative models, Research and applied ML, Data Science and Analytics
Languages: Python, Java, Scala, JavaScript, SQL, SPARQL
Tools: PyTorch, Keras, Fastai, TensorFlow 2, Spark, Hadoop, Airflow, Plotly, Tableau, Power BI
Socials:
- twitter.com/a2khatun
- tanny411.github.io
- github.com/tanny411

Aisha Khatun's Current Company Details
Amazon Web Services (AWS)

Employees: 72973
Aisha Khatun Work Experience Details
  • Amazon Web Services (AWS)
    Software Development Engineer
    Amazon Web Services (AWS) Dec 2024 - Present
    Toronto, Ontario, Canada
    Distributed Systems @ AWS
  • University of Waterloo
    Graduate Researcher
    University of Waterloo Sep 2022 - Aug 2024
    Waterloo, Ontario, Canada
    Graduate Research Student with Professor Dan Brown at the University of Waterloo. My work encompassed analyzing the capabilities and limitations of LLMs (Large Language Models), especially open-source models, in instruction following and answering questions about sensitive topics.
    Traditional Natural Language Processing (NLP) benchmarks often overlook nuances in LLM behavior and reliability. My thesis addresses this gap by curating a dataset across six categories: Fact, Conspiracy, Controversy, Misconception, Stereotype, and Fiction. We rigorously define LLMs' factual accuracy, consistency, and robustness to prompt variations using diverse response formats and question variations, and evaluate these on 37 models. Our findings reveal LLMs' volatility and unreliability, particularly in the Controversy and Misconception categories, where conflicting training data impedes performance. Additionally, we explore LLMs' ability to generate coherent fictional narratives, probing their ability to retain and effectively utilize factual information, a critical requirement for creative tasks like story generation. While LLMs offer versatile applications, their reliability hinges on addressing challenges in prompt understanding and response consistency.
    Thesis:
    * https://uwspace.uwaterloo.ca/items/e01e11a6-e033-4f6a-85c6-849fba74e039
    Dataset:
    * https://borealisdata.ca/dataset.xhtml?persistentId=doi:10.5683/SP3/5MZWBV
    GitHub:
    * https://github.com/tanny411/llm-reliability-and-consistency-evaluation
    Media Coverage:
    * https://ediscoverytoday.com/2024/08/30/uncovering-the-reliability-and-consistency-of-ai-language-models-artificial-intelligence-trends/
    Publications:
    * Reliability Check: An Analysis of GPT-3’s Response to Sensitive Topics and Prompt Wording (https://aclanthology.org/2023.trustnlp-1.8/)
    * A Study on Large Language Models' Limitations in Multiple-Choice Question Answering (https://arxiv.org/abs/2401.07955)
  • University of Waterloo
    Teaching Assistant
    University of Waterloo Sep 2022 - Aug 2024
    Waterloo, Ontario, Canada
    - Teaching Assistant, CS135 Designing Functional Programs: student.cs.uwaterloo.ca/~cs135
    - Teaching Assistant (twice), CS240 Data Structures and Data Management: student.cs.uwaterloo.ca/~cs240/w23
    - Instructional Apprentice, CS105 Introduction to Computer Programming 1: https://student.cs.uwaterloo.ca/~cs105
    - Teaching Assistant, CS230 Introduction to Computers and Computer Systems: https://student.cs.uwaterloo.ca/~cs230/s24/index.shtml
  • Wikimedia Foundation
    Research Data Scientist (NLP)
    Wikimedia Foundation Feb 2023 - Jun 2024
    - Worked on improving the Wikipedia link recommendation system in all 300+ Wikipedia languages by creating a small suite of language-agnostic models to handle all languages with high precision. This helps address the deployment and testing bottlenecks caused by the large number of models and should improve link recommendation in small wikis.
      * https://meta.wikimedia.org/wiki/Research:Improving_multilingual_support_for_link_recommendation_model_for_add-a-link_task
    - Worked with the Research Team to develop Copyediting as a structured task. To raise and maintain the standard of Wikipedia articles, it is important to ensure articles don't have typos, misspellings, or grammatical errors. While there are ongoing efforts to automatically detect "commonly misspelled" words in English Wikipedia, most other languages are left behind. I built a pipeline to automatically curate and detect commonly misspelled words in 100+ languages in Wikipedia using the entirety of Wiktionary and Wikipedia.
      * https://meta.wikimedia.org/wiki/Research:Copyediting_as_a_structured_task
  • Wikimedia Foundation
    Data Analyst
    Wikimedia Foundation Apr 2021 - Aug 2022
    Remote
    As a contract data analyst, I analyzed Wikidata Query Service (https://query.wikidata.org/) queries and Wikidata dumps to help find ways to scale the service. My analysis included:
    - Understanding Wikidata's structure, what it consists of, and how diverse it is
    - Finding subgraphs within Wikidata, automating the subgraph detection workflow, and generating various subgraph metrics
    - Finding ways to identify SPARQL queries that access certain subgraphs and generating subgraph query metrics
    - Productionizing the analysis work on Wikidata subgraph metrics and subgraph query metrics
    See my work here: https://wikitech.wikimedia.org/w/index.php?title=User:AKhatun
  • Wikimedia Foundation
    Data Science Intern (Outreachy)
    Wikimedia Foundation Dec 2020 - Mar 2021
    Worked on the Abstract Wikipedia Data Science project to find important modules and show modules similar to each other, in order to merge or refactor modules towards a language-independent Wikipedia.
    - Extensively used SQL and the MediaWiki API to collect, group, and analyze Lua modules across all 300+ language Wikipedias.
    - Used various Unsupervised Machine Learning algorithms to cluster the collected modules, identify similar modules, and isolate unique modules.
    - Created a tool using Vue.js to display the similar and unique modules along with similarity scores and several filtering mechanisms.
    Web interface: abstract-wiki-ds.toolforge.org
    See our work on GitHub @ github.com/wikimedia/abstract-wikipedia-data-science and Phabricator @ phabricator.wikimedia.org/T263678
  • Therap Services
    Machine Learning Engineer
    Therap Services Mar 2020 - Sep 2020
    Dhaka, Bangladesh
    - Used computer vision for accurate face detection in images and video footage for in-office use.
    - Researched and applied smaller yet accurate pre-trained Computer Vision models suitable for IoT devices.
    - Leveraged OCR to extract measurements from pulse oximeter images for swift COVID-19 detection.
    - Implemented an ML-based fall detection system using inertial sensor readings from smartwatches.
  • Shahjalal University of Science and Technology
    Research Assistant
    Shahjalal University of Science and Technology Nov 2018 - Jun 2020
    Sylhet, Bangladesh
    Projects:
    * Language Agnostic Source Code Authorship Attribution
    * Character Level Authorship Attribution in Bengali Literature
    * Transfer-Learning-based approach to Attribution in Bengali Literature.
      * Collected long-text Authorship Attribution Dataset.
      * Collected multiple large Bengali corpora.
      * Pre-trained and fine-tuned ULMFiT and mBERT from scratch.
      * Assessed the effectiveness of pre-training datasets and tokenizations on downstream Authorship Attribution task.
    Publications:
    * Authorship Attribution in Bangla Literature (AABL) via Transfer Learning using ULMFiT (https://dl.acm.org/doi/abs/10.1145/3530691)
    * Authorship Attribution in Bangla literature using Character-level CNN (https://ieeexplore.ieee.org/abstract/document/9038560)
    * A Subword Level Language Model for Bangla Language (https://link.springer.com/chapter/10.1007/978-981-15-3607-6_31)

Aisha Khatun Skills

Python, MySQL, Keras, Java, Machine Learning, Competitive Programming, Deep Learning, JavaScript, C++, PHP, C, Cascading Style Sheets, PyTorch, HTML, SQL

Aisha Khatun Education Details

Frequently Asked Questions about Aisha Khatun

What company does Aisha Khatun work for?

Aisha Khatun works for Amazon Web Services (AWS).

What is Aisha Khatun's role at the current company?

Aisha Khatun's current role is SDE @ Amazon | Masters (Thesis) @ University of Waterloo | AI, NLP researcher | Data Scientist.

What is Aisha Khatun's email address?

Aisha Khatun's email address is ai****@****rloo.ca

What schools did Aisha Khatun attend?

Aisha Khatun attended University of Waterloo, Shahjalal University of Science and Technology, and Udacity.

What skills is Aisha Khatun known for?

Aisha Khatun has skills like Python, MySQL, Keras, Java, Machine Learning, Competitive Programming, Deep Learning, JavaScript, C++, PHP, C, and Cascading Style Sheets.
