Pavan K Email and Phone Number
Pavan K is a Senior Data Engineer (actively looking for new positions) at American Equity. Skills: Python, PySpark, Scala, MySQL, PL/SQL, MongoDB, Tableau, Agile, Hadoop, HDFS, Airflow, Hive, Pig, Kafka, AWS, Azure, Snowflake, Oozie.
American Equity
- Website: american-equity.com
- Employees: 511
Senior Data Engineer, American Equity (Oct 2023 - Present), Des Moines, IA, United States
- Advanced working SQL knowledge and experience with relational databases (PostgreSQL) and SQL query authoring.
- Extensive experience leveraging Python to build data transformation (ETL) pipelines, using libraries such as pandas and NumPy.
- Used AWS Glue and EMR to construct data pipelines.
- Built batch processing with Java / Spark SQL.
- Used Airflow and related tools to meet client requirements.
- Analyzed, transformed, and validated streaming files with Java 8 / Spark.
- Developed RESTful web services using Java and Spring Boot.
- Developed Atoms/Molecules on Dev/Test/Prod servers; monitored the Jobs Process Reporting screen and the Real-Time & Account Dashboard screen for errors and daily transaction counts.
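The ETL pipeline work described above (extract, transform, load in Python) can be sketched with a minimal, standard-library-only example; the field names and the transformation rule here are hypothetical, and a real pipeline would typically use pandas or Spark as the resume notes.

```python
import csv
import io

def transform_row(row):
    # Hypothetical business rule: normalize name casing and cast amount to float
    return {"name": row["name"].strip().title(), "amount": float(row["amount"])}

def run_etl(source_csv):
    # Extract: parse CSV text; Transform: apply the row rule; Load: return cleaned records
    reader = csv.DictReader(io.StringIO(source_csv))
    return [transform_row(r) for r in reader]

records = run_etl("name,amount\n alice ,10.5\n BOB ,2\n")
```

The same extract/transform/load shape carries over directly to a pandas or PySpark implementation; only the per-row rule changes per dataset.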
Senior Data Engineer, CrowdStrike (Dec 2022 - Oct 2023), Sunnyvale, California, United States
- Used Spark SQL to load JSON data, create schema RDDs, load them into Hive tables, and handle structured data.
- Developed Python-based automation scripts.
- Developed Spark code using Scala and Spark SQL/Streaming for faster data processing.
- Developed Spark programs in Python, applying functional programming principles to process complex structured data sets.
- Worked with Hadoop infrastructure to store data in HDFS and used Spark / Hive SQL to migrate the underlying SQL codebase to AWS.
- Developed reusable PL/SQL program units and libraries, database procedures, functions, and triggers used by the team to satisfy business rules.
- Created Unix shell scripts to run Informatica PowerCenter workflows and control ETL flow.
- Designed and developed the architecture for a data services ecosystem spanning relational, NoSQL, and big data technologies.
- Extracted large datasets from Amazon Redshift and Elasticsearch using SQL queries to create reports.
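Loading JSON records and treating them as structured rows with an inferred column set, as Spark SQL's JSON loader does, can be sketched in plain Python (a toy stand-in for `spark.read.json`; the sample records are hypothetical):

```python
import json

def load_json_rows(json_lines):
    # Parse newline-delimited JSON into rows and infer the column set (the "schema")
    rows = [json.loads(line) for line in json_lines]
    schema = sorted({key for row in rows for key in row})
    return schema, rows

schema, rows = load_json_rows(['{"id": 1, "host": "a"}', '{"id": 2, "host": "b"}'])
```

In Spark the inferred schema would also carry types; this sketch captures only the column names.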
Data Engineer, Morgan Stanley (May 2021 - Nov 2022), New York, United States
- Worked extensively with the open-source languages Python, Scala, and Java.
- Created self-service reporting in Azure Data Lake Storage Gen2 using an ELT approach.
- Performance-tuned Phoenix/HBase, Hive queries, and Spark jobs.
- Installed Kafka to gather data from disparate sources and store it for consumption.
- Wrote PySpark and Spark SQL transformations in Azure Databricks to implement complex business rules.
- Defined and deployed monitoring, metrics, and logging systems on AWS.
- Connected Tableau to Amazon Redshift to extract live data for real-time analysis.
- Extended Hive and Pig core functionality by writing custom UDFs, UDTFs, and UDAFs.
- Configured ZooKeeper to coordinate servers in clusters and maintain the data consistency needed for decision making.
- Scheduled Snowflake jobs using NiFi.
- Developed ETL jobs in Spark/Scala to migrate data from Oracle to new Hive tables.
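A business-rule transformation of the kind written in PySpark above can be sketched in plain Python so it runs standalone (the column names, the "active accounts only" filter, and the risk-band rule are all hypothetical; in PySpark this would be a `filter` plus `withColumn`):

```python
def apply_business_rule(rows):
    # Hypothetical rule: keep active accounts and derive a risk band from balance
    def risk_band(balance):
        return "high" if balance > 10_000 else "low"
    return [
        {**row, "risk_band": risk_band(row["balance"])}
        for row in rows
        if row["status"] == "active"
    ]

rows = [
    {"id": 1, "status": "active", "balance": 25_000},
    {"id": 2, "status": "closed", "balance": 500},
]
result = apply_business_rule(rows)
```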
Data Engineer, State of Oregon (Jan 2019 - Apr 2021), Salem, Oregon, United States
- Created HBase tables to load large sets of structured data.
- Managed and reviewed Hadoop log files.
- Used AWS Glue for data transformation, validation, and cleansing.
- Used Sqoop extensively to import data from various source systems (such as MySQL) into HDFS.
- Created Hive UDFs to supply functionality missing from Hive for analytics.
- Developed scripts and batch jobs to schedule Oozie bundles (groups of coordinators).
- Used file formats such as text files, SequenceFiles, and Avro.
- Developed Oozie workflows for scheduling and orchestrating the ETL process.
- Wrote Python scripts to automate weblog extraction using Airflow DAGs.
- Developed a Python ETL pipeline to collect data from the Redshift data warehouse.
- Wrote Pig scripts to generate MapReduce jobs and performed ETL procedures on data in HDFS.
- Created AWS Lambda functions in Python and assigned roles to run them, used Java-based Lambdas for event-driven processing, and created Lambda jobs and configured roles using the AWS CLI.
- Developed custom multi-threaded Java ingestion jobs and Sqoop jobs for ingesting from FTP servers and data warehouses.
- Leveraged AWS services such as EC2, Auto Scaling, and VPC to build secure, highly scalable, flexible systems that handled expected and unexpected load bursts.
- Implemented Sqoop for large dataset transfers between Hadoop and RDBMSs.
- Created MapReduce jobs to convert periodic XML messages into partitioned Avro data.
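The event-driven AWS Lambda processing mentioned above follows a standard handler shape, sketched here with a hypothetical event payload (the `Records` key mirrors the structure AWS uses for many event sources, but the processing logic is illustrative only):

```python
import json

def handler(event, context):
    # Hypothetical Lambda handler: count the records in the event and report status
    records = event.get("Records", [])
    return {"statusCode": 200, "body": json.dumps({"processed": len(records)})}

# Invoking locally with a sample event (context is unused here, so None suffices)
response = handler({"Records": [{"id": 1}, {"id": 2}]}, None)
```

In a deployed function, AWS invokes `handler` automatically whenever the configured event source (S3, Kinesis, SQS, etc.) delivers a batch.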
Hadoop Developer, Max Healthcare (Jun 2014 - Sep 2018), Hyderabad, Telangana, India
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Developed MapReduce programs for data analysis and data cleaning.
- Designed ETL solutions for data acquisition, transformation, cleaning, and efficient storage on HDFS.
- Designed, implemented, and deployed a series of custom parallel algorithms for customer-defined metrics and unsupervised learning models within a customer's existing Hadoop / Cassandra cluster.
- Installed and configured Hive, Pig, Sqoop, Flume, and Oozie on the Hadoop cluster.
- Worked extensively with PySpark / Spark SQL for data cleansing and for generating DataFrames and RDDs.
- Created Hive tables, loaded them with data, and wrote Hive queries over data in HDFS.
- Developed Spark code using Scala and Spark Streaming for faster testing and processing of data.
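The MapReduce-style analysis and cleaning jobs described above can be sketched as a toy, in-process simulation of the map and reduce phases in plain Python (the input lines are hypothetical; a real job would distribute these phases across the cluster):

```python
from collections import defaultdict

def map_phase(lines):
    # Map: lowercase and tokenize each line (simple cleaning), emit (word, 1) pairs
    for line in lines:
        for word in line.strip().lower().split():
            yield word, 1

def reduce_phase(pairs):
    # Reduce: sum the counts per key, as a MapReduce reducer would after the shuffle
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

counts = reduce_phase(map_phase(["Hive Pig Hive", "pig oozie"]))
```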
Pavan K Education Details
- Jntuh College Of Engineering Hyderabad, Computer Science
Frequently Asked Questions about Pavan K
What company does Pavan K work for?
Pavan K works for American Equity.
What is Pavan K's role at the current company?
Pavan K's current role is Senior Data Engineer (actively looking for new positions).
What schools did Pavan K attend?
Pavan K attended Jntuh College Of Engineering Hyderabad.
Who are Pavan K's colleagues?
Pavan K's colleagues are Alex Steimel, CFA, Autumn Grismore, Sara Drake, Libby Lundgren, Jen Vangilder, Christopher Klokowski, and Chris Kinzig.