• Over 6 years of experience in application analysis, design, maintenance, development, and testing, including over 3 years of experience with Big Data and Hadoop components such as HDFS, MapReduce, YARN, HBase, Hive, Pig, Sqoop, Python, and PySpark.
• Expertise in Hadoop architecture and components such as DataNode, NameNode, Secondary NameNode, JobTracker, and TaskTracker.
• Experience working with Hadoop clusters on major Hadoop distributions such as Cloudera.
• Expertise in writing Hadoop jobs for analyzing data using Python, MapReduce, Hive, and Pig.
• Developed shell scripts to coordinate execution of other scripts and move data files within and outside of HDFS.
• Experienced in preparing and executing unit test plans and unit test cases after software development.
• Good knowledge of partitioning and bucketing concepts in Hive.
• Experienced in using external and managed tables in Hive to optimize performance.
• Experience creating Hive tables, loading them with data, and writing Hive queries that invoke MapReduce jobs in the backend.
• Experienced in using Elastic MapReduce (EMR) on Amazon Web Services (AWS) to support data analysis projects, including importing and exporting data to S3.
• Experienced in Python programming and well versed in Python IDEs such as PyCharm, Anaconda, and Eclipse.
• Well versed in creating database objects such as tables, stored procedures, functions, and triggers using SQL and PL/SQL.
• Good knowledge of data visualization, creating dashboards using Tableau.
• Experienced working in CI/CD environments.
• Proficient in implementing complex business rules through Informatica transformations, workflows/worklets, and mappings/mapplets.
• Proficient in development methodologies such as Agile and Waterfall.
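The Hive partitioning and bucketing points above rest on two simple mechanics: partitioned data is laid out in `column=value` directories, and a row's bucket is its key hashed modulo the bucket count. A minimal pure-Python sketch of both ideas (the table and column names are hypothetical; for integer keys Hive's hash is the value itself, while other types hash differently):

```python
# Sketch of Hive bucket assignment: hash(key) % num_buckets.
# For integer keys Hive's hash is the value itself; other types differ.
def hive_bucket_for(key: int, num_buckets: int) -> int:
    """Return the bucket index a row with this integer key lands in."""
    return key % num_buckets

# Sketch of the partition directory layout Hive writes on HDFS
# (warehouse path, table, and dt column are illustrative only).
def partition_path(table: str, dt: str) -> str:
    return f"/warehouse/{table}/dt={dt}"

print(hive_bucket_for(1234, 8))                  # a bucket index in [0, 8)
print(partition_path("orders", "2016-01-01"))    # /warehouse/orders/dt=2016-01-01
```

Because a partition prunes whole directories while a bucket is a fixed file within one, partitioning helps filters on the partition column and bucketing helps joins and sampling on the bucketed key.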
Sr Data Engineer, Paypal, Jun 2021 - Present
-
Data Engineer, Anthem, Aug 2019 - Jun 2021
• Involved in requirement gathering with business stakeholders; designed and developed data engineering pipelines based on the business requirements.
• Performed functional and technical analysis based on business meetings with different stakeholders and created design documents.
• Performed UAT for different applications, validating data.
• Developed Scala scripts using both DataFrames and RDDs in Spark for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
• Implemented Spark UDFs to process data that cannot be handled with built-in Spark actions and transformations.
• Optimized Hive tables using techniques such as partitioning and bucketing to improve HiveQL query performance.
• Involved in analyzing, coding, testing, production implementation, and system support for Hadoop applications.
• Extensive expertise using the core Spark APIs and processing data on an EMR cluster.
• Worked on ETL migration services by developing and deploying AWS Lambda functions to build a serverless data pipeline that writes to the Glue Catalog and can be queried from Athena.
• Programmed in Hive, Spark SQL, Java, C#, and Python to streamline incoming data, build data pipelines for useful insights, and orchestrate pipelines.
• Performed development, QA, and DevOps roles as needed to ensure end-to-end responsibility for solutions.
• Automated deployments using CI/CD.
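The Spark UDF bullet above follows a standard pattern: write the custom logic as a plain Python function, then wrap it with `pyspark.sql.functions.udf` to apply it column-wise. A minimal sketch using a hypothetical cleansing function (the function and column names are invented; the registration step needs a live SparkSession, so it is shown in comments):

```python
# Hypothetical cleansing logic of the kind one might wrap as a Spark UDF:
# normalize an id column by trimming whitespace and leading zeros, None-safe.
def normalize_member_id(raw):
    """Return the id with whitespace/leading zeros stripped, or None."""
    if raw is None:
        return None
    s = str(raw).strip().lstrip("0")
    return s or None

# With a live SparkSession, the function would be attached like this:
#   from pyspark.sql.functions import udf
#   from pyspark.sql.types import StringType
#   normalize_udf = udf(normalize_member_id, StringType())
#   df = df.withColumn("member_id", normalize_udf(df["member_id"]))

print(normalize_member_id(" 00123 "))  # 123
```

Keeping the logic in a plain function, separate from the UDF wrapper, makes it unit-testable without a cluster.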
-
Data Engineer, Charles Schwab, Aug 2018 - Aug 2019, Westlake, Texas, US
• Interacted with customers to understand their requirements and designed workflows to pull data from various OLTP systems into Hadoop and cloud storage platforms.
• Imported and exported data using Sqoop to load data between Teradata and S3 on a regular basis.
• Created Hive tables and implemented partitioning, dynamic partitioning, and bucketing in Hive for efficient data access.
• Implemented Spark UDFs to process data that cannot be handled with built-in Spark actions and transformations.
• Copied files into the Hadoop system at regular intervals for testing purposes.
• Processed large amounts of structured and semi-structured data using MapReduce programs.
• Developed Spark code in a Spark SQL environment for faster testing and processing of data, loading the data into Spark RDDs.
• Hands-on experience with Elastic MapReduce and S3 in AWS, storing transformed data in S3.
• Involved in configuration and development of a Spark environment (Databricks) with AWS services such as EC2 and EMR.
• Working knowledge of Amazon Redshift.
• Loaded data into Spark RDDs and performed in-memory computation to generate output.
• Involved in converting Hive queries into Spark transformations using Spark RDDs.
• Extensively worked with different file formats such as Parquet, Avro, and ORC.
• Developed and maintained system documentation and runbooks.
• Converted existing SQL queries into HiveQL queries.
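Converting a Hive query into Spark RDD transformations, as the bullets above describe, typically means rewriting a GROUP BY as a map/reduceByKey chain. A sketch under invented table and column names (the RDD chain needs a SparkContext, so it is shown in comments, with the equivalent logic in plain Python below):

```python
# A Hive aggregation such as:
#   SELECT region, COUNT(*) FROM accounts GROUP BY region
# maps onto the classic RDD chain (hypothetical names, needs a SparkContext):
#   sc.parallelize(rows).map(lambda r: (r["region"], 1)) \
#                       .reduceByKey(lambda a, b: a + b)
# The same logic in plain Python, for illustration:
def count_by_region(rows):
    """Count rows per 'region' key, mimicking map + reduceByKey."""
    counts = {}
    for r in rows:
        key = r["region"]
        counts[key] = counts.get(key, 0) + 1
    return counts

rows = [{"region": "TX"}, {"region": "TX"}, {"region": "CA"}]
print(count_by_region(rows))  # {'TX': 2, 'CA': 1}
```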
Data Engineer, Artha Solutions, May 2017 - May 2018, Scottsdale, AZ, US
• Migrated existing data from Teradata and SQL Server to Hadoop and performed ETL operations on it.
• Designed and implemented Sqoop incremental imports, including delta imports on tables without primary keys or dates, from Teradata, appending directly into the Hive warehouse.
• Interacted with business users to gather requirements and ensured changes were implemented correctly.
• Worked with Avro and Parquet file formats and used various compression techniques to leverage HDFS storage.
• Implemented partitioning, dynamic partitioning, and bucketing in Hive.
• Worked on on-call production issues: data scrubbing, resolving Hive query issues, and providing workarounds for defects within SLA durations.
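The Sqoop incremental imports above rely on watermark bookkeeping: each run pulls only rows whose check column exceeds the last recorded value, then advances that value (what Sqoop's `--incremental` mode with `--check-column` and `--last-value` does). A pure-Python sketch of the bookkeeping, with hypothetical column names:

```python
# Sketch of Sqoop-style incremental-import bookkeeping:
# pull rows with check_column > last_value, then advance last_value.
def incremental_batch(rows, last_value):
    """Return (new_rows, new_last_value) using an 'updated_at' check column."""
    new_rows = [r for r in rows if r["updated_at"] > last_value]
    new_last = max((r["updated_at"] for r in new_rows), default=last_value)
    return new_rows, new_last

rows = [
    {"id": 1, "updated_at": "2017-06-01"},
    {"id": 2, "updated_at": "2017-06-03"},
    {"id": 3, "updated_at": "2017-06-05"},
]
batch, watermark = incremental_batch(rows, "2017-06-02")
print(len(batch), watermark)  # 2 2017-06-05
```

For tables without a usable key or date, a full re-import with a diff against the previous snapshot is the usual fallback.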
ETL Developer, Novartis, Jul 2013 - Jun 2016, Basel, Basel-Stadt, CH
• Worked with structured and semi-structured data.
• Created Hive tables to store data in HDFS, loading data and writing Hive queries that run internally as MapReduce jobs.
• Developed Hive queries to meet business requirements.
• Good understanding of partitioning and bucketing concepts; developed both external and managed tables in Hive to optimize performance.
• Rectified performance issues in Hive scripts by incorporating groupings, joins, and aggregation functions.
• Created Hive staging tables to load RDBMS source data in Parquet format.
• Expertise in importing data from conventional RDBMS into HDFS with Sqoop.
• Familiar with monitoring and managing a Hadoop cluster using Cloudera Manager.
• Created HBase tables to store large sets of semi-structured data coming from various sources.
• Worked with Spark SQL to read external data and processed the data using the Scala framework.
• Migrated complex MapReduce programs and Hive scripts into RDD transformations and actions.
• Parsed JSON files through Spark Core to extract schemas for production data using Spark SQL.
• Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python.
• Involved in analyzing, coding, testing, production implementation, and system support for an enterprise data warehouse application.
• Worked on implementing Spark with Spark SQL for faster testing and processing of data; responsible for managing data from different sources.
Environment: CDH4, Hadoop, HDFS, Hive, HBase, Sqoop, YARN, Spark, Python, SQL, Shell Scripting, PySpark, PyCharm
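Extracting a schema from JSON files, as described above, amounts to schema inference: sampling records and deriving field names and types, which Spark SQL does automatically via `spark.read.json`. A simplified single-record version of the same idea in plain Python (the sample fields are invented):

```python
import json

# Simplified schema inference of the kind spark.read.json performs:
# map each top-level field of a sample record to a Python type name.
def infer_schema(json_line: str) -> dict:
    record = json.loads(json_line)
    return {field: type(value).__name__ for field, value in record.items()}

sample = '{"patient_id": 42, "name": "abc", "active": true}'
print(infer_schema(sample))  # {'patient_id': 'int', 'name': 'str', 'active': 'bool'}
```

A production inferrer would merge schemas across many records and handle nulls and nested objects; Spark resolves those cases by widening types and building nested StructTypes.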
Manoj E
Education
-
Osmania University, Computer Science