SDE 1 - Data
Current
- Supported the computation of derived attributes from raw fields in Elasticsearch, as well as the construction of country-specific data ingestion pipelines.
- Wrote and optimized Elasticsearch and MongoDB queries to speed up data retrieval and search.
- Proposed and developed a script that searches ES data with fuzzy matching, then automated it with Airflow DAGs to build company profiles and ingest them into MongoDB for multiple countries.
- Built ES search indexes to power features such as HS Code and Company Name autosuggest.
- Reduced the execution time of numerous Python functions through multithreading and parallelization with pandarallel and ThreadPoolExecutor.
- Tools/Technologies Used: Jupyter Notebook, Python, Pandas, NumPy, Elasticsearch, MongoDB, PyMongo, Apache Airflow, pandarallel, Multithreading, ThreadPoolExecutor.