Principal Data Scientist
Current
Developed a robust document analysis framework integrating OCR (Tesseract), image processing (PyMuPDF, OpenCV), and NLP (spaCy), with comprehensive logging for traceability and debugging. Overcame non-standard PDF encodings, improving data extraction reliability and significantly increasing operational efficiency.