• A highly analytical and productive professional with over 10 years of experience in data science. Skilled at developing engagement plans, using technical expertise and business acumen to translate business requirements and project objectives into successful system solutions.
• Expertise in designing and developing big data analytics platforms for the retail, logistics, healthcare, and banking industries using Big Data, Spark, real-time streaming, Kafka, data science, machine learning, NLP, and cloud technologies.
• Hands-on experience in end-to-end data development projects, from data ingestion, data quality, data governance, and data management through data loading, reporting, and analysis.
Senior Data Scientist, Capital One, Charlotte, NC, US
Lead Data Scientist, Market America, Sep 2022 - Present
• Spearheaded an NLP project for e-commerce categorization, leveraging LLM and GPT-4 models, enhancing product classification and search relevance and driving revenue growth.
• Led development of a vector search system using advanced NLP, including GPT-4, boosting product discovery and conversion rates.
• Collaborated across teams to optimize machine learning algorithms using quantization methods and perturbation techniques, aligning with business goals for revenue generation.
• Developed personalized recommendation models using LLMs and NLP, incorporating customer data to improve satisfaction and drive repeat purchases.
• Deployed ML models, including LLMs, on AWS SageMaker for scalability and real-time inference, integrating seamlessly with existing infrastructure.
• Optimized ML models through continuous experimentation, including perturbation techniques, staying current in NLP and recommendation systems.
• Provided expertise in NLP and ML, fostering innovation and knowledge sharing within internal teams and enhancing AI capabilities.
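Vector search of the kind described above rests on ranking items by embedding similarity. Below is a minimal cosine-similarity sketch in plain Python; the catalog, field names, and toy two-dimensional "embeddings" are invented for illustration and are not GPT-4 output or production code:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def search(query_vec, catalog, top_k=2):
    """Rank catalog items by cosine similarity to the query embedding."""
    ranked = sorted(catalog, key=lambda item: cosine(query_vec, item["vec"]),
                    reverse=True)
    return [item["name"] for item in ranked[:top_k]]

# Invented toy catalog; real embeddings would come from a language model.
catalog = [
    {"name": "red shoe",  "vec": [1.0, 0.0]},
    {"name": "blue shoe", "vec": [0.9, 0.1]},
    {"name": "toaster",   "vec": [0.0, 1.0]},
]
print(search([1.0, 0.05], catalog))  # ['red shoe', 'blue shoe']
```

A production system would replace the linear scan with an approximate-nearest-neighbor index, but the ranking principle is the same.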
Senior Data Scientist, Ford Motor Company, Jul 2021 - Aug 2022, Dearborn, Michigan, US
• Customer segmentation: built a customer segmentation model using regression models in Python to segment millions of customers, gain insight into behaviors, measure marketing effectiveness, and better allocate future marketing spend.
• Customer churn: built and maintained logistic regression and random forest models that predicted a customer's likelihood to renew vs. close the account with 85-90% accuracy ($20M in balances saved), based on customer usage patterns and characteristics.
• Led an initiative to build statistical models using historical and consumer data to identify potential consumers for cross-selling financial products.
• Constructed and fit statistical, machine learning, and optimization models enabling estimation of retail establishment survey decision-making across a range of complex environments and applications.
• Used pandas, NumPy, seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python for developing various machine learning algorithms; expertise in R, MATLAB, Python, and their respective libraries.
• Performed K-means clustering, regression, and decision trees in Python; worked on data cleaning and reshaping and generated segmented subsets using NumPy and pandas.
• Identified and removed outliers in the data using statistical methods such as the standard deviation method and the interquartile range (IQR) method.
• Tackled a highly imbalanced fraud dataset using undersampling, oversampling with SMOTE, and cost-sensitive algorithms with Python scikit-learn.
• Hands-on experience implementing Naive Bayes; skilled in random forests, decision trees, linear and logistic regression, and clustering.
• Power BI development and administration.
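The IQR outlier-removal step mentioned above (Tukey's rule) can be sketched in plain Python with the standard library; this is an illustration with invented sample data, not the actual project code:

```python
import statistics

def remove_iqr_outliers(values, k=1.5):
    """Drop points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lo <= v <= hi]

data = [10, 12, 11, 13, 12, 11, 95]  # 95 is an obvious outlier
print(remove_iqr_outliers(data))     # [10, 12, 11, 13, 12, 11]
```

The same cut-off logic is what `pandas` users typically express with `Series.quantile(0.25)` and `Series.quantile(0.75)` on real data frames.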
Lead Data Scientist, Capital One, Jul 2020 - Mar 2021, McLean, VA, US
• Wrote complex Spark SQL queries for data analysis to meet business requirements.
• Performed feature engineering such as feature-intersection generation, feature normalization, and label encoding with scikit-learn preprocessing.
• Improved fraud prediction performance by using random forest and gradient boosting for feature selection with Python scikit-learn.
• Used big data tools in Spark (PySpark, Spark SQL, MLlib) to conduct real-time analysis of loan defaults on AWS.
• Conducted data blending and data preparation using Alteryx and SQL for Tableau consumption, publishing data sources to Tableau Server.
• Delivered ML projects end to end: understanding the business need, aggregating and exploring data, building and validating predictive models, and deploying completed models with concept-drift monitoring and retraining to deliver business impact to the organization.
• Used AWS AI services (e.g., Personalize), ML platforms (SageMaker), and frameworks (e.g., TensorFlow, PyTorch, SparkML, scikit-learn) to help customers build ML models.
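The feature normalization and label encoding mentioned above can be sketched in plain Python as a stand-in for scikit-learn's `MinMaxScaler` and `LabelEncoder`; the function names and sample data here are invented for illustration:

```python
def min_max_scale(values):
    """Rescale numeric features to [0, 1], like scikit-learn's MinMaxScaler."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def label_encode(labels):
    """Map each distinct category to an integer, like LabelEncoder."""
    mapping = {c: i for i, c in enumerate(sorted(set(labels)))}
    return [mapping[c] for c in labels], mapping

scaled = min_max_scale([10.0, 20.0, 30.0])           # [0.0, 0.5, 1.0]
codes, mapping = label_encode(["card", "loan", "card"])
print(scaled, codes, mapping)  # ... [0, 1, 0] {'card': 0, 'loan': 1}
```

In practice the scikit-learn versions are preferred because they remember the fitted parameters so the identical transform can be replayed on new data at inference time.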
Senior Data Scientist, Nielsen, Apr 2016 - Dec 2019, New York, NY, US
• Used NLP to extract key information from medical reports, reducing processing time for standard cases and enabling underwriters to focus on the most difficult or complex ones.
• Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data between sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, including write-back.
• Using NLP techniques (with the NLTK and gensim libraries), prototyped automatic extraction of structured listings data from free-form text descriptions.
• Used AWS AI services (e.g., Personalize), ML platforms (SageMaker), and frameworks (e.g., MXNet, TensorFlow, PyTorch, SparkML, scikit-learn) to help customers build ML models.
• Applied computer-based mathematical and statistical techniques using software; led and participated in statistical projects and studies in survey sampling (design and estimation), modeling, and statistical research.
• Applied knowledge of programming languages (SQL, SAS, SPSS, RStudio) to develop scripts and applications.
• Researched and implemented novel ML approaches, including hardware optimizations on platforms such as AWS Inferentia.
• Collaborated with the product team to define KPIs and assess progress against them; proposed and executed product analytics projects such as user segmentation.
• Identified and removed outliers in the data using statistical methods such as the standard deviation method and the interquartile range (IQR) method.
• Handled highly imbalanced datasets, including fraud data, using resampling methods such as the Synthetic Minority Oversampling Technique (SMOTE), random undersampling, and cost-sensitive algorithms with Python scikit-learn.
• Wrote complex Spark SQL queries for data analysis to meet business requirements.
• Developed MapReduce/Spark Python modules for predictive analytics and machine learning in Hadoop on AWS.
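Of the resampling methods listed above, random undersampling is simple enough to sketch in plain Python (SMOTE additionally requires a nearest-neighbor search to synthesize minority points, so only the undersampling half is shown; the label values and data are invented):

```python
import random

def random_undersample(rows, label_key="label", seed=0):
    """Downsample every class to the size of the rarest class."""
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    n_min = min(len(v) for v in by_class.values())
    rng = random.Random(seed)  # fixed seed for reproducibility
    balanced = []
    for cls in sorted(by_class):
        balanced.extend(rng.sample(by_class[cls], n_min))
    return balanced

rows = [{"label": "fraud"}] * 2 + [{"label": "ok"}] * 8
print(len(random_undersample(rows)))  # 4: two per class
```

The `imbalanced-learn` package implements both halves (`RandomUnderSampler`, `SMOTE`) for real scikit-learn pipelines.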
Data Scientist, Smart Tricks, Apr 2015 - Apr 2016
• Led an initiative to build statistical models using historical data to predict FMCG sales in several economic markets, focusing on analyzing the factors affecting sales in the SENAP region.
• Constructed and fit statistical, machine learning, and optimization models enabling estimation of retail establishment survey decision-making across a range of complex environments and applications.
• Used pandas, NumPy, seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python for developing various machine learning algorithms; expertise in R, MATLAB, Python, and their respective libraries.
• Researched reinforcement learning and control (TensorFlow, Torch) and machine learning models (scikit-learn).
• Hands-on experience implementing Naive Bayes; skilled in random forests, decision trees, linear and logistic regression, SVM, clustering, and principal component analysis.
• Performed K-means clustering, regression, and decision trees in R; worked on data cleaning and reshaping and generated segmented subsets using NumPy and pandas in Python.
• Implemented various statistical techniques to manipulate the data, such as missing-data imputation, principal component analysis, and sampling.
• Worked with R packages to interface with the Caffe deep learning framework; performed validation on machine learning output from R.
• Applied dimensionality reduction techniques such as principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) to the feature matrix.
• Performed univariate and multivariate analysis on the data to identify underlying patterns and associations between variables.
• Responsible for the design and development of Python programs and scripts to prepare, transform, and harmonize data sets in preparation for modeling.
• Worked with market mix modeling to strategize advertisement investments and better balance the ROI on advertisements.
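K-means clustering, referenced in several of the entries above, alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its cluster. A minimal pure-Python version of Lloyd's algorithm on invented 1-D toy data (real work would use scikit-learn's `KMeans` or R's `kmeans`):

```python
def kmeans_1d(points, centroids, iters=10):
    """Lloyd's algorithm on 1-D data with fixed initial centroids."""
    for _ in range(iters):
        # Assignment step: nearest centroid for each point.
        clusters = [[] for _ in centroids]
        for p in points:
            i = min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
            clusters[i].append(p)
        # Update step: mean of each cluster (keep old centroid if empty).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

# Two well-separated groups converge to their group means.
print(kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], centroids=[0.0, 5.0]))
```

Fixed initial centroids make the run deterministic; library implementations add smarter initialization (k-means++) and a convergence check.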
Data Analyst, Medrcedu Tech, May 2013 - Apr 2015
• Installed and configured Apache Hadoop to test the maintenance of log files in the Hadoop cluster.
• Involved in requirement gathering and business analysis; translated business requirements into technical designs in Hadoop and big data.
• Involved in Sqoop implementation, which helps in loading data from various RDBMS sources to Hadoop systems and vice versa.
• Developed Python scripts to extract data from web server output files to load into HDFS.
• Involved in HBase setup and storing data into HBase for further analysis.
• Worked on the CloudHealth tool to generate AWS reports and dashboards for cost analysis.
• Wrote a Python script that automates launching the EMR cluster and configuring the Hadoop applications.
• Worked extensively with Avro and Parquet files; parsed semi-structured JSON data and converted it to Parquet using DataFrames in PySpark.
• Experienced in analyzing and optimizing RDDs by controlling partitions for the given data.
• Experienced in writing live real-time processing using Spark Streaming with Kafka.
• Used HiveQL to analyze partitioned and bucketed data and compute various metrics for reporting.
• Experienced in querying data using Spark SQL on top of the Spark engine.
• Involved in managing and monitoring the Hadoop cluster using Cloudera Manager.
• Used Python and shell scripting to build pipelines.
• Developed a data pipeline using Sqoop, HQL, Spark, and Kafka to ingest enterprise message delivery data into HDFS.
• Planned, developed, coordinated, and participated in various marketing research activities to identify customer preferences and attitudes and to enhance products and services.
• Created data partitions on large data sets in S3 and DDL on partitioned data.
• Worked with the BI analytics team to conduct A/B testing, data extraction, and exploratory analysis.
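Parsing semi-structured JSON into flat, columnar records (the step above before writing Parquet) can be sketched in plain Python as a stand-in for the PySpark DataFrame version; the record shape and field names are invented:

```python
import json

def flatten(obj, prefix=""):
    """Flatten nested JSON objects into dot-separated column names."""
    flat = {}
    for key, val in obj.items():
        name = f"{prefix}{key}"
        if isinstance(val, dict):
            flat.update(flatten(val, prefix=name + "."))
        else:
            flat[name] = val
    return flat

record = json.loads('{"id": 1, "user": {"name": "a", "geo": {"city": "x"}}}')
print(flatten(record))  # {'id': 1, 'user.name': 'a', 'user.geo.city': 'x'}
```

In Spark the equivalent is selecting nested fields with dotted column paths before `df.write.parquet(...)`.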
Data Analyst, My Home Industries Private Limited, Jun 2008 - Apr 2010, Hyderabad, Telangana, IN
• Researched and recommended a suitable technology stack for Hadoop migration considering the current enterprise architecture.
• Responsible for building scalable distributed data solutions using Hadoop.
• Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
• Developed Spark jobs and Hive jobs to summarize and transform data.
• Built on-premises data pipelines using Kafka and Spark for real-time data analysis.
• Created reports in Tableau for visualization of the data sets created, and tested Spark SQL connectors.
• Implemented complex Hive UDFs to execute business logic with Hive queries.
• Developed different kinds of custom filters and handled pre-defined filters on HBase data using the API.
• Implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster processing of data.
• Handled importing data from different data sources into HDFS using Sqoop, performing transformations using Hive, and then loading the data into HDFS.
• Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
• Collected and aggregated large amounts of log data, staging the data in HDFS for further analysis.
• Experienced in managing and reviewing Hadoop log files.
• Experienced in querying data using Spark SQL on top of the Spark engine.
• Involved in managing and monitoring the Hadoop cluster using Cloudera Manager.
• Used Python and shell scripting to build pipelines.
• Developed a data pipeline using Sqoop, HQL, Spark, and Kafka to ingest enterprise message delivery data into HDFS.
• Used Sqoop to channel data between HDFS and various RDBMS sources.
• Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats.
• Created partitioned Hive tables and worked on them using HiveQL.
• Loaded data into HBase using bulk load and non-bulk load.
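The partitioned Hive tables mentioned above boil down to routing each record into a `key=value` directory so queries can prune partitions. A minimal sketch of that path construction in plain Python (the base path and field names are invented, not from the original project):

```python
def partition_path(record, partition_keys, base="warehouse/events"):
    """Build a Hive-style partition path like base/dt=.../country=...."""
    parts = [f"{k}={record[k]}" for k in partition_keys]
    return "/".join([base] + parts)

row = {"dt": "2020-01-01", "country": "US", "amount": 42}
print(partition_path(row, ["dt", "country"]))
# warehouse/events/dt=2020-01-01/country=US
```

Hive and Spark generate exactly this layout from `PARTITIONED BY (dt, country)` or `df.write.partitionBy("dt", "country")`, which is what lets a `WHERE dt = ...` filter skip whole directories.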
G D. Skills
Frequently Asked Questions about G D.
What company does G D. work for?
G D. works for Capital One
What is G D.'s role at the current company?
G D.'s current role is Senior Data Scientist.
What is G D.'s email address?
G D.'s email address is gi****@****ail.com
What are some of G D.'s interests?
G D. has interests in IT integration with the manufacturing sector, economic empowerment, education, science and technology, business analytics and intelligence, project and product management, business process optimization, arts and culture, and new technologies in the manufacturing sector.
What skills is G D. known for?
G D. has skills such as Microsoft Office, strategic planning, business development, Microsoft Excel, CRM, business strategy, pre-sales, business analysis, business planning, business process, business process improvement, and SAS/Base.
Who are G D.'s colleagues?
G D.'s colleagues are Edward Patterson, Danny Novoa, Monique Taylor, Danielle Martin, Shawntee Neie, Xin Sun, Lois Shapiro.