Natural Language Processing Engineer
Current
- Extract, clean, and organize job description and resume data from Indeed.com for training and analysis
- Develop a GUI tool for NER (Named Entity Recognition) tagging tasks, which greatly improves annotation efficiency
- Customize pipelines in spaCy and train word-embedding-based NER models for parsed job descriptions and resumes
- Optimize the word2vec model in the gensim package by implementing directional skip-gram for word embeddings, which explicitly distinguishes the direction of context words and improves word prediction on resume corpora
- Implement a tokenization tool that integrates functionalities such as TF-IDF (term frequency-inverse document frequency) retrieval, concordance, and term-index lookups, phrase merges and removals, and document.