Pavan Kumar is a Senior Data Engineer at TD.
-
Senior Data Engineer, TD
Feb 2023 - Present, Boston, Massachusetts, United States
- Analyzed business requirements and prepared detailed specifications that follow the project guidelines required for development.
- Designed and implemented a configurable data delivery pipeline, built in Python, for scheduled updates to customer-facing data stores (see the sketch after this list).
- Used PySpark for DataFrames, ETL, data mapping, transformation, and loading in a complex, high-volume environment.
- Implemented medium- to large-scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).
- Developed interactive dashboards and reports using Microsoft Fabric's analytics tools, enabling stakeholders to gain actionable insights and make data-driven decisions.
- Measured the efficiency of the Hadoop/Hive environment, ensuring SLAs were met.
- Implemented Copy activity and custom Azure Data Factory pipeline activities.
- Implemented ad-hoc analysis solutions using Azure Data Lake Analytics/Store and HDInsight.
- Analyzed the system for new enhancements and functionality, and performed impact analysis of the application when implementing ETL changes.
- Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics); ingested data into Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed it in Azure Databricks.
- Developed ETL workflows in Scala to process data in HDFS and HBase, orchestrated with Oozie.
- Designed several DAGs (Directed Acyclic Graphs) to automate ETL pipelines.
- Aggregated daily sales-team updates into a report for executives and organized jobs running on Spark clusters.
- Optimized a TensorFlow model for efficiency.
- Experienced in ETL concepts, building ETL solutions, and data modeling; architected the ETL transformation layers and wrote Spark jobs to do the processing.
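As a rough illustration of the PySpark delivery-pipeline pattern this role describes, the sketch below reads raw Parquet from an Azure Data Lake path, applies a simple mapping and filter, and writes a partitioned curated layer. The storage account, container paths, and column names are hypothetical placeholders, not the actual TD pipeline.

```python
# Minimal PySpark ETL sketch; paths, container names, and columns are
# hypothetical placeholders, not the actual production pipeline.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("scheduled-delivery").getOrCreate()

# Read raw events from an ADLS Gen2 location (abfss://...).
raw = spark.read.parquet(
    "abfss://raw@examplelake.dfs.core.windows.net/events/"
)

# Map and transform: normalize column names, derive a load date,
# and drop non-positive amounts.
curated = (
    raw.withColumnRenamed("cust_id", "customer_id")
       .withColumn("load_date", F.current_date())
       .filter(F.col("amount") > 0)
)

# Load: write partitioned Parquet for the customer-facing store.
(curated.write
        .mode("overwrite")
        .partitionBy("load_date")
        .parquet("abfss://curated@examplelake.dfs.core.windows.net/sales/"))
```
-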
Big Data Engineer, Maximus IT
Mar 2020 - Nov 2022, India
- Worked with AWS services including Lambda, Athena, DynamoDB, Step Functions, SNS, SQS, S3, and IAM.
- Developed PySpark code for AWS Glue jobs and for EMR.
- Created data pipelines for ingestion, aggregation, and loading of consumer response data from AWS S3 buckets into Hive external tables in HDFS, serving as the feed for Tableau dashboards.
- Applied knowledge of AWS concepts such as EMR and EC2; loaded files to HDFS from SQL Server, Teradata, and Netezza using Sqoop.
- Implemented data ingestion and cluster handling for real-time processing using Kafka.
- Migrated data from east-region to west-region Snowflake accounts.
- Utilized Python libraries such as Boto3 and NumPy for AWS work.
- Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift.
- Configured the settings that allow Airflow to communicate with its PostgreSQL database.
- Created mobile dashboards embedded in the Salesforce app using Tableau Sparkler with SSO, giving sales users a one-stop-shop experience.
- Utilized the Spark SQL API in PySpark to extract and load data and run SQL queries.
- Developed a PySpark script that encrypts raw data using hashing algorithms on client-specified columns (see the sketch after this list).
- Integrated Bitbucket, AWS CodePipeline, and AWS Elastic Beanstalk to create a deployment pipeline.
- Created S3 buckets to store files, some of which serve static content for a web application.
- Created Tableau scorecards and dashboards using stacked bars, bar graphs, scatter plots, geographical maps, heat maps, bullet charts, and Gantt charts to surface key information for decision making.
- Gained working experience with data streaming using Kafka, Apache Spark, and Hive.
- Responsible for database design, development, and testing; developed stored procedures, views, and triggers.
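A minimal sketch of the column-hashing step mentioned above, assuming SHA-256 as the hashing algorithm; the S3 paths and column names are placeholders, and the real job's algorithm and columns were client-specified and are not given in the source.

```python
# Hedged sketch of column-level hashing in PySpark; paths and column
# names are hypothetical, and SHA-256 stands in for whichever hashing
# algorithm the client actually specified.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("column-hashing").getOrCreate()

df = spark.read.parquet("s3://example-bucket/raw/customers/")

# Placeholder list of client-specified sensitive columns.
sensitive_cols = ["ssn", "email", "phone"]

for col_name in sensitive_cols:
    # sha2(col, 256) returns a hex-encoded SHA-256 digest of the column.
    df = df.withColumn(col_name, F.sha2(F.col(col_name).cast("string"), 256))

df.write.mode("overwrite").parquet("s3://example-bucket/hashed/customers/")
```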
-
Big Data Engineer, Synergy Technologies
Nov 2017 - Feb 2020, India
- Migrated an existing on-premises application to AWS, using services such as EC2 and S3 for data set processing and storage.
- Set up and configured AWS EMR clusters and used Amazon IAM to grant users fine-grained access to AWS resources.
- Created reports in Tableau to visualize the data sets produced, and tested Spark SQL connectors.
- Implemented a continuous delivery pipeline with Docker, GitHub, and AWS.
- Exported the analyzed data to relational databases using Sqoop for visualization and for generating BI-team reports in Tableau.
- Designed logical and physical data models for various data sources on Confidential Redshift.
- Used IAM to create roles, users, and groups, and implemented MFA for additional account and resource security; used AWS ECS and EKS for Docker image storage and deployment.
- Performed data pre-processing and cleaning for feature engineering, applying data imputation techniques for missing values using Python (see the sketch after this list).
- Developed real-time pipelines using Kafka Connect, Kafka Streams, StreamSets, and other real-time processing components.
- Used Oozie operational services for batch processing and dynamically scheduling workflows.
- Migrated an in-house database to the AWS cloud; designed, built, and deployed multiple applications on the AWS stack (including EC2 and RDS) with a focus on high availability and auto-scaling.
- Performed data manipulation on extracted data using Python pandas.
- Loaded data into S3 buckets using AWS Glue and PySpark; filtered data stored in S3 buckets using Elasticsearch and loaded it into Hive external tables.
- Created data quality scripts in SQL and Hive to validate successful data loads and the quality of the data.
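A small pandas sketch of the kind of missing-value imputation described above; the file name, column names, and the median/mode strategies are assumptions chosen for illustration, not details from the source.

```python
# Illustrative pandas imputation sketch; file name, columns, and the
# chosen imputation strategies are hypothetical.
import pandas as pd

df = pd.read_csv("example_dataset.csv")

# Impute numeric gaps with the column median, which is robust to outliers.
df["age"] = df["age"].fillna(df["age"].median())

# Impute categorical gaps with the most frequent value (mode).
df["segment"] = df["segment"].fillna(df["segment"].mode()[0])

# Simple derived feature once the gaps are filled.
df["age_bucket"] = pd.cut(df["age"], bins=[0, 25, 45, 65, 120])
```
-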
Hadoop Developer, D&B Technologies
Oct 2016 - Aug 2017, India
- Transformed data using AWS Glue dynamic frames with PySpark; cataloged the transformed data with crawlers and scheduled the job and crawler using the Glue workflow feature (see the sketch after this list).
- Developed data pipeline programs with the Spark Scala APIs, data aggregations with Hive, and JSON data formatting for visualization.
- Developed ETL processes (Data Stage Open Studio) to load data from multiple sources into HDFS using Flume and Sqoop, and performed structural modifications using MapReduce and Hive.
- Wrote reports using SQL Server Reporting Services (SSRS), creating drill-down, parameterized, cascading, conditional, table, matrix, chart, and sub-reports.
- Used the DataStax Spark connector to store data into, and retrieve data from, a Cassandra database.
- Delivered super-user trainings in basic and advanced Tableau Desktop.
- Developed Pig UDFs to manipulate data according to business requirements and built custom Pig loaders.
- Wrote Oozie scripts and set up workflows with the Apache Oozie workflow engine to manage and schedule Hadoop jobs.
- Implemented a log producer in Scala that watches application logs, transforms incremental logs, and sends them to a Kafka and ZooKeeper based log collection platform.
- Used Hive to analyze data ingested into HBase via Hive-HBase integration and computed metrics for the reporting dashboard.
- Built scalable, distributed data solutions using the Hadoop ecosystem.
- Worked with the Hive data warehouse infrastructure: creating tables, distributing data with partitioning and bucketing, and writing and optimizing HQL queries.
- Connected multiple data sources in Tableau to implement working reports.
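A hedged sketch of an AWS Glue dynamic-frame transform like the one described in the first bullet; the catalog database, table name, field mappings, and S3 output path are placeholders, not the actual D&B catalog entries.

```python
# AWS Glue job sketch using dynamic frames; database, table, and S3
# path names are hypothetical placeholders.
from awsglue.context import GlueContext
from awsglue.transforms import ApplyMapping
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Read a cataloged source table as a DynamicFrame.
source = glue_context.create_dynamic_frame.from_catalog(
    database="example_db", table_name="raw_events"
)

# Rename and retype fields during the transform.
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[("cust_id", "string", "customer_id", "string"),
              ("amt", "double", "amount", "double")],
)

# Write the result back to S3 as Parquet for a crawler to re-catalog.
glue_context.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/curated/events/"},
    format="parquet",
)
```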
-
Data Engineer, Maximus IT
Dec 2014 - Oct 2016, Hyderabad, Telangana, India
- Created Hive tables, loaded them with data, and wrote Hive queries over the data in HDFS.
- Tuned the performance of Pig queries and developed Pig scripts for processing data.
- Wrote Hive queries to transform the data into tabular form and processed the results using Hive Query Language.
- Loaded real-time unstructured data, such as XML data and log files, into HDFS using Apache Flume.
- Processed large amounts of both structured and unstructured data using the MapReduce framework.
- Loaded data from MySQL Server into the Hadoop clusters using the data ingestion tool Sqoop.
- Used Tableau to visualize the outcomes of the ML algorithms.
- Designed solutions for ETL tasks such as data acquisition, data transformation, data cleaning, and efficient data storage on HDFS.
- Developed Spark code using Scala and Spark Streaming for faster testing and processing of data.
- Stored the resulting processed data back into the Hadoop Distributed File System.
- Applied machine learning algorithms (k-nearest neighbors, random forest) using Spark MLlib on top of HDFS data and compared the accuracy of the models (see the sketch after this list).
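A rough Spark MLlib sketch of the random-forest half of the model comparison above; the HDFS path, feature columns, and label column are hypothetical. Spark MLlib has no built-in k-NN classifier, so only the random forest is shown.

```python
# Spark MLlib random-forest sketch; data path, feature names, and the
# label column are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

spark = SparkSession.builder.appName("rf-example").getOrCreate()
df = spark.read.parquet("hdfs:///data/curated/labeled_events/")

# Assemble feature columns into the single vector column MLlib expects.
assembler = VectorAssembler(inputCols=["f1", "f2", "f3"],
                            outputCol="features")
train, test = assembler.transform(df).randomSplit([0.8, 0.2], seed=42)

rf = RandomForestClassifier(labelCol="label", featuresCol="features",
                            numTrees=50)
model = rf.fit(train)

# Compare accuracy on the held-out split.
accuracy = MulticlassClassificationEvaluator(
    labelCol="label", metricName="accuracy"
).evaluate(model.transform(test))
print(f"Random forest accuracy: {accuracy:.3f}")
```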
Pavan Kumar Education Details
-
Kits, Bachelor's Degree
Frequently Asked Questions about Pavan Kumar
What company does Pavan Kumar work for?
Pavan Kumar works for TD.
What is Pavan Kumar's role at the current company?
Pavan Kumar's current role is Senior Data Engineer.
What schools did Pavan Kumar attend?
Pavan Kumar attended Kits.
Who are Pavan Kumar's colleagues?
Pavan Kumar's colleagues are Nancy Lantin, Sagar Khurana, Paul Donohoe, Warren Schneider, Liz Weber, Cams, Cgss, Wendy Tucci, Bindiya Bansal.