As a Data Scientist with over four years of industry experience, I specialize in developing advanced machine learning and natural language processing (NLP) solutions to drive data-driven insights and solve complex problems. My expertise spans cutting-edge technologies such as Large Language Models (LLMs), computer vision, and explainable AI, all supported by a strong foundation in statistical analysis, hypothesis testing, and data engineering.

I am a detail-oriented problem solver with a strong analytical mindset that helps me identify patterns and trends in data, ensuring high-quality analysis that leads to better and more profitable business decisions.

I have a proven track record of designing and deploying end-to-end data pipelines, building explainable AI models, and developing innovative solutions. Whether it’s creating scalable pipelines, developing advanced ML and NLP models, or leading a team on high-impact business problems, I focus on delivering solutions that optimize performance and streamline workflows.

I am adept at prioritizing tasks, meeting deadlines, and delivering results in fast-paced environments.

I am passionate about leveraging my technical skills to uncover meaningful insights from complex datasets, automate processes, and drive innovative solutions across domains. I am seeking opportunities to apply my expertise in data science, machine learning, and AI to solve real-world challenges, collaborate with dynamic teams, and push the boundaries of what data-driven technologies can achieve.

Head Teaching Assistant
Data Science at Vanderbilt University
Aug 2024 - Dec 2024
Nashville, Tennessee, United States

Data Science Intern
AllianceBernstein
Jan 2024 - Dec 2024
Nashville, Tennessee, United States
- Led a team of 3 using Agile practices to engineer a chatbot, incorporating advanced question-answering methods and mitigating hallucinations in LLMs through external knowledge grounding using Python, ChromaDB, DSPy, and few-shot learning.
- Designed and developed a self-correction pipeline leveraging fine-tuned LLMs and DSPy for structured and unstructured query generation (Text-to-SQL), achieving 90% execution accuracy and 48% query accuracy and enabling query parsing for hybrid datasets.
- Developed a sentiment analysis pipeline using Python, FinBERT, and PyTorch to process and analyze 200K articles and predict stock price movements, enhancing decision-making and generating investment signals.
- Formulated data labeling strategies using semi-supervised learning techniques and OpenAI, ensuring high-quality ground-truth training data and improving the prediction accuracy of FinBERT and a GNN by 20 percentage points, from 65% to 85%.
- Fine-tuned single-task and multi-task BERT models and a Llama model to predict financial events discussed in articles and determine their causality and sentiment effects on stock prices, achieving an accuracy of 85% and generating a portfolio with a 10% annual return.
Keywords: Python, PyTorch, BERT Models, Single-task and Multi-task BERT, Generative AI, Large Language Models (LLM), Retrieval-augmented Generation, OpenAI, AI Chatbot, Chatbot Development, Semantic Parsing, SQL Query Generation, Unstructured Query Language, Free-text Operations, Sentiment Analysis, Stock Price Prediction, Causality Detection, Investment Signals, Portfolio Management, Insights Extraction, Financial Data Parsing, Data Integration, Knowledge Grounding, LLM Hallucination Prevention, Semi-supervised Learning, Data Labeling Strategy, AI for Finance

Research Intern
Institute for Software Integrated Systems, Vanderbilt University
May 2024 - Jul 2024
Nashville, Tennessee, United States
- Developed a Docker-containerized application to host cyber-security competitions using TypeScript, React, Express, and SQL.
- Designed an optimized database to improve data management and querying, enabling a seamless shift from MongoDB to SQL and resulting in a 30% increase in query efficiency and a 25% reduction in data retrieval times.
- Employed regular expressions to prepare training data for identifying personally identifiable information (PII), and implemented supervised learning models such as Random Forest, spaCy, and BERT, achieving 90% accuracy in automated PII detection and masking and making data readily available for training cyber-security models.
- Explored the applicability of large language models to the prevention of cyber-physical attacks, identifying a potential 15% improvement in threat mitigation accuracy.
Keywords: Docker, spaCy, BERT, TypeScript, React, Express, SQL, MongoDB to SQL Migration, Database Optimization, Query Efficiency, Data Management, Data Retrieval, Regular Expressions, Large Language Models, Machine Learning, Supervised Learning, Random Forest, Data Masking

Data Science Intern
Nissan Motor Corporation
Jan 2024 - Apr 2024
Nashville, Tennessee, United States
- Conceptualized a comprehensive project workflow for the development of a chatbot while working in a team of 6 to transform and analyze Likert-scaled survey data from over 60K respondents, improving data accessibility and hypothesis testing efficiency for Nissan’s customer research group.
- Implemented t-SNE and XGBoost for customer segmentation and feature selection to generate customer personas, improving the relevance of insights, and thus the chatbot's answering capability, by 25%.
- Spearheaded the engineering of a retrieval-augmented generation-based chatbot built on customer personas, utilizing ChromaDB, Python, and generative AI models to enhance conversational dynamics and insights extraction.
Keywords: Python, t-SNE, XGBoost, ChromaDB, PyTorch, Machine Learning, Natural Language Processing, Generative AI, Retrieval Augmented Generation (RAG), Data Transformation, Data Analysis, Customer Segmentation, Feature Selection, Data Accessibility, Conversational AI Chatbot Development, Project Workflow Design, Team Collaboration

Student Data Analyst
Nashville Soccer Club
Jan 2024 - Apr 2024
- Collaborated with a cross-functional team to analyze fan engagement and ticket sales dynamics using data-driven insights.
- Designed an evaluation framework to assess multiple ticketing factors, including attendance, resale, and purchase behavior metrics, driving strategic decisions for revenue optimization.
- Conducted advanced statistical analyses to determine correlations between fan engagement scores and match-day attendance, supporting marketing and engagement strategies.
- Leveraged SQL and Python for data extraction, transformation, and analysis, streamlining reporting processes and enhancing data accessibility.
- Delivered actionable insights to guide marketing and operations teams.
This experience is part of my master’s degree coursework.
Keywords: SQL, Python, Statistical Analysis, Data Analysis, Reporting Automation, Sports Analytics, Fan Engagement Strategies, Ticket Sales Dynamics, Revenue Optimization

Research Assistant
Department of Special Education, Vanderbilt University, Peabody College
Aug 2023 - Apr 2024
Nashville, Tennessee, United States
- Designed a natural language processing pipeline in Python using Transformer models (BLEURT, MPNet), PyTorch, and SQL to automate student assessment evaluation, achieving 89% accuracy and reducing manual effort by 80%.
- Implemented an end-to-end automated data analysis pipeline using Python and SQL, performing statistical tests and generating visualizations to create post-deployment reports for productionized iTELL volumes, reducing report generation time by 99%.
- Researched and curated a dataset for training the automated scoring system on short-answer responses, leading to a 30% improvement in scoring consistency and reliability.
- Designed a RAG-based chatbot application in Supabase, TypeScript, and React to enhance the student experience by providing real-time access to supplementary subject material and text references.
- Developed a video analytics tool in TypeScript to analyze user activity on videos embedded in iTELL and identify students' strong and weak areas, leading to a 20% increase in targeted question effectiveness.
Keywords: Python, SQL, PyTorch, Transformer Models, BLEURT, MPNet, Data Analysis Pipeline, Statistical Analysis, Dataset Curation, Data Visualization, Reporting Automation, Post-deployment Reporting, React, TypeScript, Chatbot Development, User Activity Analysis, Video Analytics, Targeted Learning, Learning Analytics, AI and NLP in Education

Data Scientist II
Deloitte
Jun 2023 - Jul 2023
New Delhi, Delhi, India
- Developed an advanced NLP-based clustering algorithm using Transformers, spaCy, and HDBSCAN to automate document review, reducing manual effort by 70% and speeding up the document collection process by 75%.
Keywords: Transformers, spaCy, HDBSCAN, Clustering Algorithm, Machine Learning, Data Science, Text Processing, Unsupervised Learning, Algorithm Development, Advanced NLP Models, Text Analytics, Document Automation, Document Review Automation, Document Clustering, Workflow Optimization, AI-Powered Solutions

Data Scientist I
Deloitte
Aug 2020 - May 2023
New Delhi, Delhi, India
- Architected and developed a Named-Entity-Recognition (NER)-based recommender engine utilizing novel algorithms for acronym expansion, co-reference resolution (CR), and relation extraction in Python using spaCy, PyTorch, and BERT models, achieving a processing time of under 1 second for 5,000 tokens.
  - Fine-tuned BERT models to achieve a 95% F1-score on NER for six audit-related entities.
  - Developed a CR algorithm that is 10x faster and 3x more accurate than AllenNLP’s baseline CR algorithm.
- Designed and engineered a Python-based web application to orchestrate the development of the recommender system, train models, showcase data and model metrics, and provide a platform for preparing training data, reducing model development time by 40%.
Keywords: Python, spaCy, PyTorch, AllenNLP, BERT Models, Transformer Models, Recommendation Engine, Named Entity Recognition (NER), Acronym Expansion, Co-reference Resolution, Relation Extraction, Algorithm Optimization, Model Training and Finetuning, Data Preparation, Python Web Application Development, Machine Learning, Model Orchestration, Workflow Automation, AI-driven Solutions, Natural Language Processing

Data Engineer Intern
Senvion India
Jan 2020 - Jun 2020
Bengaluru, Karnataka, India
- Crafted an end-to-end automated reporting pipeline using Cassandra, Python, and Apache Airflow to generate periodic performance reports that facilitate effective asset management. The pipeline processed data from the database, created visualizations, and generated Word documents, reducing report generation time by 70%.
- Designed and established a robust Python-based pipeline to manage the Extract, Transform, Load (ETL) process, incorporating quality assurance protocols to safeguard data integrity and reliability.
- Developed automated tests using PyTest to verify data quality and consistency, reducing data errors by 20%.
Keywords: Cassandra, Python, Apache Airflow, PyTest, Automated Testing, ETL Processes, Data Processing, Data Transformation, Data Quality, Data Integrity, Data Reliability, Database Integration, Automated Performance Reporting Pipeline in Python, Data Visualization, Workflow Automation, Quality Assurance

Computer Vision Engineer Intern
Jiovio Healthcare
Jun 2019 - Jul 2019
Bengaluru, Karnataka, India
- Developed a JavaScript application integrated with AWS Rekognition and AWS S3 to identify faces in over 10K images and 500 hours of video streaming from CCTV cameras.
Keywords: JavaScript Development, AWS Rekognition, AWS S3, AWS SDK, AWS Cloud Services, Cloud Integration, Cloud-native Applications, Machine Learning in Cloud, Streaming Data, Face Recognition, Computer Vision, Real-time Image Processing, Image and Video Processing, CCTV Video Analysis, Video Surveillance Analytics

Vaibhav Gupta Education Details
Frequently Asked Questions about Vaibhav Gupta
What is Vaibhav Gupta's role at the current company?
Vaibhav Gupta's current role is Data Science Master's at Vanderbilt University.
What schools did Vaibhav Gupta attend?
Vaibhav Gupta attended Data Science at Vanderbilt University and Thapar Institute of Engineering and Technology.