Pooja B Email and Phone Number
As a Senior Data Engineer at Freddie Mac, I have over nine years of experience developing scalable, reliable data pipelines with modern technologies. My mission is to enable the data-driven decisions and insights that support Freddie Mac's vision of making home possible for millions of families and individuals.

I use SQL, Python, Azure Databricks, and other tools to process data in many formats and from many sources, including XLS, JSON, TXT, RDBMS, and Swift Object Store. I also use Snowpark, PySpark, MapReduce, NiFi, Kafka, Spark, Hive, and Scala to ingest, transform, and analyze data in HDFS, AWS S3, Elasticsearch, Redshift, and Snowflake. I bring diverse perspectives and experience to the team, having worked across the banking, retail, and mortgage domains, and I hold a master's degree in computer science.

Tech Stack:
- Core: Python, SQL, Scala, Spark, Snowflake, Hadoop, Hive, ADF (Azure Data Factory), Azure, AWS, Power BI, GCP
- Databases: PostgreSQL, Oracle, Teradata, SQL Server
- AWS: EMR, Glue, Lambda, CloudFormation, DynamoDB, Athena, EC2
- Project Management: Agile, JIRA, Confluence
Freddie Mac
- Website: freddiemac.com
- Employees: 8,833
Senior Big Data Engineer, Freddie Mac (Aug 2022 - Present)
- Developed Python Spark streaming scripts to load raw files.
- Used Snowpark to extract data from source systems and load it into the enterprise Snowflake instance.
- Implemented PySpark logic to transform and process data in formats such as XLS, JSON, and TXT.
- Developed Python scripts to fetch S3 files using the Boto3 module (see the sketch after this list).
- Built scripts to load PySpark-processed files into Redshift, applying a range of PySpark transformations.
- Developed MapReduce programs to cleanse data in HDFS obtained from heterogeneous sources.
- Processed metadata files into AWS S3 and an Elasticsearch cluster.
- Ingested data into Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed it in Azure Databricks.
- Used Azure Data Factory to create and schedule data-driven workflows (pipelines) that ingest data from disparate data stores.
- Created Hive generic UDFs to handle business logic that varies by policy.
- Moved relational database data into Hive dynamic-partition tables using Sqoop and staging tables.
- Monitored the cluster using Cloudera Manager.
- Migrated existing applications and developed new applications on AWS cloud services.
- Developed Python scripts to get the most recent S3 keys from Elasticsearch.
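The S3 fetch above was done with Boto3; a minimal sketch of what such a script might look like is below. The bucket name, prefix, and 24-hour window are hypothetical placeholders, not details from the role.

```python
# Minimal sketch of a Boto3-based S3 fetch; bucket and prefix are hypothetical.
from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client("s3")

def recent_keys(bucket: str, prefix: str, hours: int = 24) -> list[str]:
    """Return keys under `prefix` modified within the last `hours`."""
    cutoff = datetime.now(timezone.utc) - timedelta(hours=hours)
    keys = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if obj["LastModified"] >= cutoff:
                keys.append(obj["Key"])
    return keys

# Download each recent raw file to a local staging directory.
for key in recent_keys("raw-landing-bucket", "incoming/"):
    s3.download_file("raw-landing-bucket", key, f"/tmp/{key.rsplit('/', 1)[-1]}")
```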
Senior Big Data Engineer / Hadoop Developer, Bank of America (Jun 2020 - Jul 2022)
- Involved in the complete implementation lifecycle, specializing in writing custom MapReduce, Pig, and Hive code.
- Used NiFi to transfer data from source to destination; responsible for handling both batch and real-time Spark jobs through NiFi.
- Developed microservices using Python scripts with the Spark DataFrame API for the semantic layer.
- Developed Spark scripts in Scala as required.
- Managed data coming from different sources; involved in HDFS maintenance and loading structured and unstructured data.
- Developed a data pipeline using Kafka, Spark, and Hive to ingest, transform, and analyze data (see the sketch after this list).
- Used Scala to convert Hive/SQL queries into RDD transformations in Apache Spark.
- Built the complete data ingestion pipeline in NiFi, which POSTs flow files through the InvokeHTTP processor to microservices hosted in Docker containers.
- Created Hive tables, loaded data, and wrote Hive queries that run internally as MapReduce.
- Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
- Processed input from multiple data sources in the same reducer using GenericWritable and multiple input formats.
- Performed data profiling and transformation on raw data using Pig and Python.
- Visualized HDFS data for customers in a BI tool via the Hive ODBC driver.
- Created Hive generic UDFs to handle business logic that varies by policy.
- Moved relational database data into Hive dynamic-partition tables using Sqoop and staging tables.
- Monitored the cluster using Cloudera Manager.
- Developed predictive analytics using the Apache Spark Scala API.
- Implemented MapReduce counters to gather metrics on good and bad records.
- Built data governance processes, procedures, and controls for the data platform using NiFi.
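A hedged sketch of the Kafka-to-Spark ingestion pattern described above, using PySpark Structured Streaming. The broker address, topic name, event schema, and output paths are assumptions for illustration; the original pipeline also fed Hive, which an external table over the output path could cover.

```python
# Illustrative Kafka -> Spark ingestion; topic, broker, and schema are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = (SparkSession.builder
         .appName("kafka-ingest")
         .enableHiveSupport()
         .getOrCreate())

# Assumed shape of the JSON events on the topic.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("payload", StringType()),
    StructField("event_ts", TimestampType()),
])

# Read the raw Kafka stream and parse the JSON value column.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "raw-events")
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Land parsed events as Parquet, checkpointing for exactly-once semantics.
query = (events.writeStream
         .format("parquet")
         .option("path", "/warehouse/events")
         .option("checkpointLocation", "/tmp/chk/events")
         .start())
```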
Sr. Big Data Engineer / Data Engineer, Walmart (Jan 2019 - May 2020), Bentonville, Arkansas, United States
- Worked on a Python API to extract data from RDBMS sources, create DataFrames, save them as tables, and submit them in Spark jobs (see the sketch after this list).
- Worked on Sqoop import and export across Azure, SQL Server, Oracle, DB2, and Teradata, configured to run in parallel via shell scripting.
- Extensive working knowledge of core Java concepts, collections, and web-based client/server applications.
- Loaded large sets of structured and semi-structured data from Swift Object Store to HDFS on an edge node with the DistCp command and created staging tables.
- Worked in Power BI to create reporting graphs and cards for analysis across years and months against various KPI requirements.
- Used the TDCH connector to export tables to Teradata, configuring the job in shell scripting.
- Created HQL and Python scripts to submit PySpark jobs with optimized resources.
- Created shell scripts to transfer files between Windows servers and the file server.
- Implemented error handling in both shell scripting and Python, driven by downstream job requirements.
- Wrote extensive shell scripts to configure Spark jobs and Sqoop jobs against databases.
- Handled huge datasets in Spark jobs, submitting them in parallel as small subsets of work with the help of partitioning.
- Performed transformations across many Hive tables to create final feeds.
- Explored Spark to improve performance and optimize existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Wrote SQL queries for data analysis to meet business requirements and performed validations on the results.
- Hands-on experience troubleshooting Spark errors related to executors and Java heap memory.
- Converted datasets to Hive ORC format, partitioning the data by week number.
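A minimal sketch of the RDBMS-to-DataFrame extract described in the first bullet above. The JDBC URL, credentials, table names, and partition bounds are hypothetical, and the SQL Server JDBC driver is assumed to be on the Spark classpath.

```python
# Hypothetical JDBC extract: RDBMS table -> DataFrame -> Hive ORC table.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("rdbms-extract")
         .enableHiveSupport()
         .getOrCreate())

# Read the source table in parallel by partitioning on a numeric key.
orders = (spark.read.format("jdbc")
          .option("url", "jdbc:sqlserver://dbhost:1433;databaseName=sales")
          .option("dbtable", "dbo.orders")
          .option("user", "etl_user")
          .option("password", "***")
          .option("numPartitions", 8)
          .option("partitionColumn", "order_id")
          .option("lowerBound", 1)
          .option("upperBound", 10_000_000)
          .load())

# Persist as an ORC-backed Hive table for downstream jobs.
orders.write.mode("overwrite").format("orc").saveAsTable("staging.orders")
```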
Hadoop Developer, Capital One (Apr 2018 - Dec 2018), Plano, Texas, United States
- Involved in requirement gathering in coordination with business analysts.
- Worked closely with BAs and the client to create technical documents such as high-level and low-level design specifications.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Imported data using Sqoop to load data from MySQL to HDFS on a regular basis.
- Developed RDDs to schedule various Hadoop programs.
- Wrote Spark SQL queries for data analysis to meet business requirements.
- Experienced in defining job flows.
- Handled cluster coordination services through Kafka and ZooKeeper.
- Serialized JSON data and stored it in tables using Spark SQL (see the sketch after this list).
- Wrote shell scripts to automate the process flow.
- Stored extracted data in HDFS using Flume.
- Worked with multiple file formats, including XML, JSON, CSV, and other compressed formats.
- Wrote Spark SQL queries in Scala.
- Communicated all issues and participated in weekly strategy meetings.
- Collaborated with the infrastructure, network, database, and application teams to ensure data quality and availability.
- Provided daily production support to monitor and troubleshoot Hadoop/Hive jobs.
- Supported and troubleshot Hive programs running on the cluster and fixed issues arising from duration testing.
- Prepared daily and weekly project status reports and shared them with the client.
- Implemented data integrity and data quality checks in Hadoop using Hive and Linux scripts.
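A short sketch of the "serialize JSON and store into tables with Spark SQL" step above. The HDFS path, field names, and table name are illustrative assumptions.

```python
# Illustrative JSON -> Spark SQL table flow; paths and names are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("json-to-table")
         .enableHiveSupport()
         .getOrCreate())

# Read newline-delimited JSON landed in HDFS (e.g. by Flume).
raw = spark.read.json("hdfs:///landing/events/*.json")

# Query via Spark SQL and persist the result as a managed table.
raw.createOrReplaceTempView("raw_events")
daily = spark.sql("""
    SELECT to_date(event_ts) AS event_date, COUNT(*) AS events
    FROM raw_events
    GROUP BY to_date(event_ts)
""")
daily.write.mode("overwrite").saveAsTable("analytics.daily_event_counts")
```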
Hadoop and Spark Developer, Rolta India Limited (Dec 2015 - Jul 2016), India
- Involved in requirement gathering in coordination with business analysts.
- Worked closely with BAs and the client to create technical documents such as high-level and low-level design specifications.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Imported data using Sqoop to load data from MySQL to HDFS on a regular basis.
- Developed RDDs to schedule various Hadoop programs.
- Wrote Spark SQL queries for data analysis to meet business requirements.
- Experienced in defining job flows.
- Handled cluster coordination services through Kafka and ZooKeeper.
- Serialized JSON data and stored it in tables using Spark SQL.
- Wrote shell scripts to automate the process flow.
- Stored extracted data in HDFS using Flume.
- Worked with multiple file formats, including XML, JSON, CSV, and other compressed formats.
- Wrote Spark SQL queries in Scala.
- Communicated all issues and participated in weekly strategy meetings.
- Collaborated with the infrastructure, network, database, and application teams to ensure data quality and availability.
- Provided daily production support to monitor and troubleshoot Hadoop/Hive jobs.
- Supported and troubleshot Hive programs running on the cluster and fixed issues arising from duration testing.
Software Specialist, Eclinicalworks Software Pvt. Ltd. (Oct 2013 - Dec 2014), India
- Managed over 20 critical applications single-handedly.
- Configured Data Guard for OLTP databases.
- Upgraded databases to 12c.
- Worked with the application team on performance-related issues.
- Rebuilt indexes for better performance and maintained Oracle databases.
- Generated performance reports and performed daily database health checks using utilities such as AWR and Statspack to gather performance statistics (a monitoring sketch follows this list).
- Identified and tuned poor SQL statements using EXPLAIN PLAN, SQL TRACE, and TKPROF; analyzed tables and indexes to improve query performance.
- Troubleshot issues such as database connectivity for users and privilege problems.
- Created users and allocated appropriate tablespace quotas with the necessary privileges and roles across all databases.
- Wrote database monitoring scripts in shell and PL/SQL or SQL, including procedures, functions, and packages.
- Created and cloned Oracle instances and databases on ASM.
- Performed database cloning and relocation activities.
- Managed tablespaces, data files, redo logs, tables, and their segments.
- Maintained data integrity; managed profiles, resources, and password security; managed users, privileges, and roles.
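The monitoring scripts described above were written in shell and PL/SQL; purely as an illustration, here is an equivalent health check sketched in Python with the python-oracledb driver. The DSN and credentials are placeholders, and the query assumes access to the Oracle DBA views.

```python
# Hedged Python rendering of a tablespace health check; DSN is hypothetical.
import oracledb

with oracledb.connect(user="monitor", password="***", dsn="dbhost/ORCLPDB1") as conn:
    with conn.cursor() as cur:
        # dba_tablespace_usage_metrics reports usage per tablespace.
        cur.execute("""
            SELECT tablespace_name,
                   ROUND(used_percent, 1)
            FROM   dba_tablespace_usage_metrics
            ORDER  BY used_percent DESC
        """)
        for name, pct in cur:
            flag = "ALERT" if pct > 90 else "ok"
            print(f"{flag:5} {name:<30} {pct}% used")
```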
Pooja B Education Details
- Computer Science, Monroe College
- Computer Science, Alamuri Ratnamala Institute Of Engineering And Technology
Frequently Asked Questions about Pooja B
What company does Pooja B work for?
Pooja B works for Freddie Mac.
What is Pooja B's role at the current company?
Pooja B's current role is AWS Certified Solutions Architect-Associate | Google Cloud Certified-Associate Cloud Engineer | Senior Data Engineer @ Freddie Mac.
What schools did Pooja B attend?
Pooja B attended Monroe College, Alamuri Ratnamala Institute Of Engineering And Technology.
Who are Pooja B's colleagues?
Pooja B's colleagues are Elle-Rose Lagdameo, Nannette Mitchell (MBA, PMP), Eileen Dejoras, Chris O Kwon, Nida Peerzada, Prasad Gorantla, and Tiffany Balogun.