Big Data Engineer
Current
Project Description: The Federal Deposit Insurance Corporation (FDIC) created a new recordkeeping requirement (Part 370) for the largest insured depository institutions (IDIs), with compliance required by April 1, 2020. The rule requires large IDIs to calculate insured and uninsured balances and to uniquely identify fiduciary owners and beneficiaries, with reporting within 24 hours. While the ultimate goal is full compliance, the key focus for a large domestic bank is to prioritize remediation of data and system gaps based on their impact on the ability to make a deposit insurance calculation.
Responsibilities:
• Loaded DStream data from Amazon S3 (Simple Storage Service) into Spark RDDs and performed in-memory computation to generate output through transformations and actions.
• Worked with AWS cloud services including EMR, Step Functions, EC2, S3, RDS, CloudWatch, Lambda, Athena, and Glue, and set up IAM roles.
• Worked extensively with Elastic MapReduce (EMR), set up a Hadoop environment on AWS EC2 instances, and handled EMR cluster troubleshooting.
• Worked on PySpark by modifying existing Python scripts that read input data, apply Spark transformations, and write the results to a file.
• Used the Presto query engine to run ad hoc queries against created views by onboarding table metadata to AWS Glue. Used the Redash UI to execute queries on datasets stored in Amazon S3.
• Set up Spark testing from scratch using Holden Karau's spark-testing-base library to test data transformation logic in QA before promoting code to production.