• Over 13 years of experience as a Big Data Developer, specializing in the Hadoop ecosystem: HDFS, Hive, Sqoop, Spark, Oozie, Control-M, shell scripting, and Java.
• Over 9 years of hands-on expertise in Hadoop and Spark engineering, with a focus on distributed systems and parallel processing architectures.
• Deep understanding of the MapReduce and Spark execution frameworks, with expertise in writing end-to-end data processing jobs for comprehensive data analysis.
• Proficient in working with structured data using HiveQL: implementing join operations, writing custom UDFs, and optimizing Hive queries.
• Developed scalable Spark applications in Python and Scala for ETL workloads.
• Migrated an on-premises application to AWS and designed, configured, and deployed applications on the AWS stack (EC2, Glue, Lambda, SNS, S3, RDS, CloudWatch, SQS, IAM).
• Built serverless services with AWS Lambda and created ETL processes in AWS Glue for seamless data migration to AWS Redshift.
• Developed machine learning models using Python libraries (Pandas, NumPy, Seaborn, Matplotlib, scikit-learn).
• Extensive experience maintaining Hadoop clusters on AWS EMR.
• Implemented batch and real-time analytics on large datasets using PySpark.
• Proficient with Cloudera Manager for Hadoop operations, and with Apache Flume and Kafka for data collection and aggregation.
• Strong background in writing Pig scripts for data transformation and in migrating data to Snowflake.
• Designed and deployed various applications on the AWS stack for high availability, fault tolerance, and auto-scaling.
• Experienced in importing/exporting data between RDBMS and the Hadoop ecosystem using Apache Sqoop.
• Developed AWS CloudFormation templates for creating EMR clusters.
• Knowledgeable in Azure services and Google Cloud Platform for data migration and transformation.
• Solid understanding of NoSQL databases, with hands-on experience using Cassandra with PySpark and Scala for analytics.
• Proficient in querying data from Cassandra for searching, grouping, and sorting.
• Proficient in Core Java, multithreading, version control (GitHub), and build tools (Maven).
• Skilled in developing PySpark applications for large-scale datasets.
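The serverless work above can be illustrated with a minimal sketch. The event shape mirrors an S3 notification payload, and the handler is a simplified, hypothetical stand-in for a real ETL-triggering Lambda; all names here are illustrative, not from an actual deployment.

```python
import json

def lambda_handler(event, context):
    """Minimal AWS Lambda handler sketch: collects S3 object keys from an
    S3-notification-style payload and returns them for downstream ETL.
    The event structure and keys are illustrative assumptions."""
    keys = [
        record["s3"]["object"]["key"]
        for record in event.get("Records", [])
    ]
    return {
        "statusCode": 200,
        "body": json.dumps({"keys_received": keys}),
    }
```

Locally, the handler can be exercised by passing it a fake event dict, which is how such functions are typically unit-tested before deployment.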
-
Senior Azure Data Engineer
Cloud Front Group, India
-
Sr. Big Data Engineer
Morgan Stanley | Aug 2021 - Mar 2024 | New York, United States
• Collaborated with managers and stakeholders to understand core business requirements; implemented a generic, highly available ETL framework for bringing related data into Hadoop and Cassandra from various sources using Spark.
• Used Platfora, a Hadoop-specific data visualization tool, and created various Lens and Viz boards for real-time visualization from Hive tables.
• Queried and analyzed data from Cassandra for quick searching, sorting, and grouping through CQL.
• Implemented various data modeling techniques for Cassandra.
• Joined various Cassandra tables using Spark and Scala and ran analytics on top of them.
• Participated in various upgrade and troubleshooting activities across the enterprise.
• Knowledgeable in performance troubleshooting and tuning of Hadoop clusters.
• Applied advanced Spark techniques such as text analytics and processing using in-memory processing.
• Implemented Apache Drill on Hadoop to join data from SQL and NoSQL databases and store it in Hadoop.
• Created an architecture stack blueprint for data access with the NoSQL database Cassandra.
• Brought data from various sources into Hadoop and Cassandra using Kafka.
• Used the Tidal Enterprise Scheduler and Oozie Operational Services for coordinating the cluster and scheduling workflows.
• Applied Spark Streaming for real-time data transformation.
• Created multiple dashboards in Tableau for multiple business needs.
Environment: MapR 5.0.1, MapReduce, HDFS, Hive, Pig, Impala, Cassandra 5.04, Spark, Scala, Solr, Java, SQL, Tableau, Zookeeper, Sqoop, Teradata, CentOS, Pentaho.
-
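The generic ETL framework mentioned above can be sketched in plain Python. The stage names and in-memory callables are assumptions for illustration; a production version would delegate each stage to Spark jobs and add retry logic for high availability.

```python
class EtlPipeline:
    """Tiny generic ETL skeleton: each stage is a callable, and run()
    threads the data through extract -> transform -> load. This is a
    single-process stand-in for a Spark-backed framework."""

    def __init__(self, extract, transform, load):
        self.extract = extract
        self.transform = transform
        self.load = load

    def run(self):
        data = self.extract()
        data = self.transform(data)
        return self.load(data)

# Illustrative stages: pull rows, keep valid ones, "load" into a list sink.
sink = []
pipeline = EtlPipeline(
    extract=lambda: [{"id": 1, "ok": True}, {"id": 2, "ok": False}],
    transform=lambda rows: [r for r in rows if r["ok"]],
    load=lambda rows: sink.extend(rows) or len(rows),
)
```

Keeping each stage a plain callable is what makes the framework "generic": new sources and sinks plug in without changing the pipeline driver.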
Hadoop Consultant
JPMorgan Chase & Co. | Aug 2020 - Jul 2021 | New York, United States
• Worked independently to understand the business needs and goals of the organization.
• Developed data integrity and data quality components, such as DB data quality checks, file data integrity checks, and balance comparison checks, for incoming binary files as well as certain control points in the ETL.
• Developed a MapReduce program that generates a unique key for every incoming new record (Universal Key Generator).
• Developed UDFs for Hive and Pig to support extra functionality provided by Teradata.
• Worked with Avro and Parquet file formats with Snappy compression.
• Worked on a POC for Apache Spark and Crunch.
• Used Autosys for scheduling Oozie workflows.
• Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java and Scala for data cleaning and preprocessing.
• Experienced in installing, configuring, and using Hadoop ecosystem components.
• Imported and exported data into HDFS and Hive using Sqoop.
• Participated in the development and implementation of a Cloudera Hadoop environment.
• Ran queries using Impala and used BI tools to run ad-hoc queries directly on Hadoop.
• Integrated Cassandra as a distributed persistent metadata store to provide metadata resolution for network entities.
• Worked with various NoSQL databases, such as HBase and Cassandra, on implementation and integration.
• Installed and configured Hive, wrote Hive UDFs, and used MapReduce and JUnit for unit testing.
• Used DataStax Cassandra along with Pentaho for reporting.
-
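The Universal Key Generator idea above can be mimicked in a few lines. The field list, delimiter, and key length are hypothetical; the real job ran as MapReduce across the cluster rather than on one record at a time.

```python
import hashlib

def universal_key(record, fields, delimiter="|"):
    """Derive a deterministic surrogate key for an incoming record by
    hashing its identifying fields. The same record always yields the
    same key, so reprocessed files do not mint duplicate identities."""
    raw = delimiter.join(str(record[f]) for f in fields)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]
```

Because the key depends only on the chosen identifying fields, a change to non-key attributes (like an amount) leaves the key stable, which is the property a universal key needs.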
Hadoop Consultant
Marlin Capital Solutions | Dec 2018 - Jul 2020
• Developed and maintained data pipelines using Sqoop, Flume, and Kafka to ingest, transform, and process customer behavioral data for analysis.
• Performed data aggregation and analysis on large-scale datasets using Apache Spark, Scala, and Hive, resulting in improved business insights.
• Utilized the Hadoop, Spark, and Cloudera ecosystems to handle structured, semi-structured, and unstructured data loads and transformations.
• Integrated HBase with Hive on the Analytics Zone, optimizing HBase tables for faster and more efficient data querying.
• Leveraged Hive queries and Spark SQL for data analysis and processing, meeting specific business requirements and emulating MapReduce functionality.
• Implemented deployment automation using YAML scripts for faster and more efficient builds and releases.
• Migrated data from Oracle RDBMS to Hadoop using Sqoop, enhancing data management and processing capabilities.
-
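The behavioral-data aggregation above can be illustrated without a cluster. The event schema is a hypothetical clickstream shape; a Spark version would express the same logic as a `groupBy("user").count()`.

```python
from collections import Counter

def events_per_user(events):
    """Count behavioral events per user -- the single-node analogue of a
    Spark groupBy/count over ingested clickstream records."""
    return Counter(e["user"] for e in events)
```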
Hadoop Developer
Vanguard | Aug 2017 - Nov 2018 | Malvern, Pennsylvania, United States
• Worked with business analysts, business stakeholders, and SMEs to analyze business requirements.
• Migrated data from multiple source systems to the Hadoop distributed file system for data analysis.
• Built pipelines for preprocessing and data cleaning using Oozie.
• Created UDFs in Python to run on Spark.
• Analyzed client behavior using the Spark DataFrame API.
• Improved the performance and optimization of existing algorithms in Hadoop using Spark SQL and the DataFrame API.
• Developed Spark jobs in Python to perform operations such as aggregation, data processing, and data analysis.
• Built a custom Hadoop InputFormat to read fixed-length ASCII files.
• Managed and scheduled jobs on the Hadoop cluster using Oozie.
• Hands-on experience with file formats including Text, Avro, Parquet, and ORC.
• Converted SAS scripts to Spark SQL.
• Worked on Spark-HBase integration.
• Enhanced and optimized product Spark code using the Spark framework to aggregate, group, and run data mining tasks.
-
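The custom InputFormat for fixed-length ASCII files can be sketched as a record parser. The field names and widths below are hypothetical; in the Hadoop version, the InputFormat split files on record boundaries before handing bytes to logic like this.

```python
def parse_fixed_width(line, layout):
    """Slice one fixed-length ASCII record into named fields.
    `layout` is a list of (name, width) pairs; values are stripped of the
    space padding that fixed-width formats use."""
    record, pos = {}, 0
    for name, width in layout:
        record[name] = line[pos:pos + width].strip()
        pos += width
    return record
```

This kind of layout-driven parser keeps the column definitions in data rather than code, so new fixed-width feeds only require a new layout table.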
Hadoop Consultant
Quicken Loans | Nov 2015 - Jul 2017
• Created a shell script to generate staging and landing tables with the same schema as the source and to generate properties for Oozie jobs.
• Developed Oozie workflows for executing Sqoop and Hive actions, including working with NoSQL databases such as HBase to load large sets of semi-structured data from various sources.
• Performed performance optimizations on Spark and Python, resolving performance issues in Spark.
• Led the roadmap for migrating enterprise data from multiple sources (SQL Server, provider databases) to Amazon S3 as a centralized data hub.
• Loaded and transformed structured and semi-structured data from various downstream systems.
• Developed Spark and Hive ETL pipelines for business-specific transformations.
• Built applications and automated Spark pipelines for bulk and incremental loads of different datasets.
• Developed scripts for running Oozie workflows, capturing job logs, and creating a metadata table of job execution times.
• Converted existing MapReduce applications to PySpark applications as part of streamlining legacy jobs and creating a new framework.
-
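The staging-table generation script above can be sketched in Python rather than shell. The `stg_` naming convention and the column type strings are assumptions; the real script read the source schema and emitted Hive DDL plus Oozie job properties.

```python
def staging_ddl(table, columns, prefix="stg_"):
    """Emit a CREATE TABLE statement for a staging table mirroring the
    source schema -- the same idea as a script that generates one
    staging/landing table per source table."""
    cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns)
    return f"CREATE TABLE {prefix}{table} (\n  {cols}\n);"
```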
Hadoop Consultant
The Janssen Pharmaceutical Companies of Johnson & Johnson | Nov 2013 - Oct 2015 | New Jersey, United States
• Used AWS components (EC2, S3, SWS, SQS, RDS, Redshift, EMR) in day-to-day activities.
• Developed custom MapReduce jobs in Java for preprocessing and data cleaning.
• Hands-on experience with Python for streaming MapReduce programs.
• Managed and reviewed Hadoop log files.
• Loaded and transformed large sets of structured and semi-structured data.
• Analyzed large data sets to determine the optimal way to aggregate and report on them.
• Participated in daily Scrum meetings to discuss the development progress of sprints.
• Developed Java MapReduce programs for the analysis of sample log files stored in the cluster.
• Developed simple to complex MapReduce jobs using Hive and Pig.
• Developed MapReduce programs for data analysis and data cleaning.
• Developed Pig Latin scripts for the analysis of semi-structured data.
-
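The Python streaming MapReduce work above can be sketched as a mapper/reducer pair. In Hadoop Streaming these would read stdin and write tab-separated stdout; here they are plain functions over lines, with `sorted()` standing in for the shuffle. The log lines are illustrative.

```python
from itertools import groupby

def mapper(lines):
    """Emit (word, 1) pairs for each log line -- the map side of a
    word-count-style log analysis."""
    for line in lines:
        for word in line.split():
            yield word, 1

def reducer(pairs):
    """Sum counts per key. Hadoop's shuffle delivers pairs grouped and
    sorted by key; sorted() simulates that here."""
    for key, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield key, sum(v for _, v in group)
```

Chaining the two functions locally reproduces the job's logic end to end, which is how streaming jobs are commonly sanity-checked before cluster submission.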
Java Developer
American Water | Dec 2011 - Oct 2013
• Collaborated with cross-functional teams and stakeholders to gather business requirements.
• Involved in requirement analysis and played a key role in project planning.
• Completed the architecture, detailed design, and development of modules.
• Interacted with end users to gather, analyze, and implement requirements.
• Designed and developed web components and business modules across all tiers, from presentation to persistence.
• Used Hibernate for mapping Java classes to database tables.
• Developed Action classes and ActionForm classes, created JSPs using Struts tag libraries, and configured them in struts-config.xml and web.xml.
• Developed the UI layout using Dreamweaver.
• Developed Java beans to interact with the UI and database.
• Created the end-user business interfaces.
• Interacted frequently with the client and delivered solutions for their business needs.
• Developed Ant scripts for building and packaging J2EE components.
• Wrote PL/SQL queries and stored procedures for data retrieval.
• Created and modified DB2 schema objects such as tables and indexes.
• Created test plans, test cases, and scripts for UI testing.
Atiya Rehman
Education Details
- Computer Engineering
- Computer Science
Schools attended: Jawaharlal Nehru Technological University; University of Houston-Clear Lake.