Passionate Big Data Engineer and Data Analyst with around 4 years of hands-on experience designing, developing, and deploying data-driven solutions for enterprise applications. Proven expertise in leveraging major cloud platforms, including AWS and Azure, for big data products and tasks, alongside a solid background in implementing real-time data analytics using Spark Streaming, Kafka, and Flume. Skilled in Python, PySpark, and Scala for data pipeline creation, and proficient in multiple relational databases such as Oracle, MySQL, and SQL Server. Demonstrated proficiency in using Hadoop, Spark, Hive, and other big data technologies to handle large-scale data. Currently seeking new opportunities in Big Data Engineering where I can leverage my skills and experience to drive business insights and value from data. Open to connecting with professionals and opportunities in this domain.

TECHNICAL SKILLS
Big Data Technologies: Hadoop (HDFS, MapReduce), Spark (Scala), PySpark, Hive, Kafka, Storm, Pig, Sqoop, Oozie, Cassandra, Apache Druid, Snowflake
Big Data Distributions: Cloudera, Hortonworks, Amazon EMR
Programming Languages: Java, Python, Scala, Unix shell scripting, Spark SQL, HiveQL, C, C++
Databases: MySQL, MS SQL Server, NoSQL (HBase, MongoDB)
Java Technologies: JSP, Servlets, JDBC, JUnit
Web Technologies: HTML, XML, JavaScript, jQuery, CSS
Operating Systems: Windows, Linux (Ubuntu)
Cloud Services: AWS (EC2, VPC, EBS, S3, AMI, SQS, SNS, RDS, CloudWatch, DynamoDB, IAM), GCP (BigQuery, Composer, Cloud Dataproc)
Reporting/ETL Tools: Tableau, QlikView, Informatica, Pentaho
Servers: Apache Tomcat, WebSphere, JBoss
EXPERIENCE
Senior Data Engineer
Manulife
Kitchener, ON, Canada
Data Engineer
TD
Nov 2021 - Present, Toronto, Ontario, Canada
• Involved in requirements gathering, analysis, design, development, change management, and deployment.
• Developed real-time streaming applications using PySpark, Apache Flink, Kafka, and Hive on a distributed Hadoop cluster.
• Utilized Apache Spark with Python to develop and execute big data analytics and machine learning applications; executed machine learning use cases with Spark ML and MLlib.
• Extracted data from heterogeneous sources and applied complex business logic to network data, normalizing raw data so BI teams could detect anomalies.
• Designed and developed Flink pipelines to consume streaming data from Kafka, applying business logic to massage, transform, and serialize raw data.
• Developed a common Flink module for serializing and deserializing Avro data by applying a schema.
• Developed a Spark Streaming pipeline to batch real-time data, detect anomalies by applying business logic, and write the anomalies to an HBase table.
• Implemented a layered architecture for Hadoop to modularize the design; developed framework scripts to enable quick development; designed reusable shell scripts for Hive, Sqoop, Flink, and Pig jobs; standardized error handling, logging, and metadata management processes.
• Indexed processed data and created dashboards and alerts in Splunk to be used and actioned by support teams.
• Responsible for operations and support of the big data analytics platform, Splunk, and Tableau visualizations.
• Managed, designed, and developed a dashboard control panel for customers and administrators using Tableau, PostgreSQL, and REST API calls.
Skills: Apache Spark · PySpark · Scala · Extract, Transform, Load (ETL) · HDFS · Hadoop · Continuous Integration and Continuous Delivery (CI/CD) · Google BigQuery · Microsoft Azure
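The anomaly-detection step described above can be sketched in plain Python. This is a minimal, illustrative sketch only: the Kafka/Spark Streaming plumbing and the HBase write are omitted, and the field names and latency threshold are hypothetical, not the production business logic.

```python
# Per-record anomaly logic of the kind applied inside each streaming batch.
# Field names ("latency_ms", "errors") and the threshold are illustrative.

def flag_anomalies(records, latency_threshold_ms=500):
    """Return records that exceed the latency threshold or report errors."""
    anomalies = []
    for rec in records:
        if rec.get("latency_ms", 0) > latency_threshold_ms or rec.get("errors", 0) > 0:
            anomalies.append({**rec, "anomaly": True})
    return anomalies

batch = [
    {"device": "gw-1", "latency_ms": 120, "errors": 0},
    {"device": "gw-2", "latency_ms": 900, "errors": 0},
    {"device": "gw-3", "latency_ms": 80, "errors": 3},
]
flagged = flag_anomalies(batch)
```

In a real Spark Streaming job, each micro-batch would be passed through logic like this and the flagged rows written out to the HBase table.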
Scala Developer
Cubic Corporation
Jul 2020 - Aug 2021, Hyderabad, Telangana, India
• Designed a batch audit process in batch/shell script to monitor each ETL job and report its status, including table name, start and finish time, number of rows loaded, and outcome.
• Developed Python scripts to automate the data ingestion pipeline for multiple data sources and deployed Apache NiFi on AWS.
• Designed and developed Tableau visualizations, preparing dashboards using calculations, parameters, calculated fields, groups, sets, and hierarchies.
• Implemented new statistical algorithms and operators on Hadoop and SQL platforms, utilizing optimization techniques, linear regression, K-means clustering, Naive Bayes, and other approaches.
• Developed a Spark batch job to automate creation and metadata updates of external Hive tables built on top of datasets residing in HDFS.
• Developed a data serialization Spark common module for converting complex objects into sequences of bits using Avro, Parquet, JSON, and CSV formats.
• Worked on ER modeling, dimensional modeling (star schema, snowflake schema), data warehousing, and OLAP tools.
• Populated HDFS and PostgreSQL with large volumes of data using Apache Kafka.
• Designed and developed a REST API (Commerce API) providing functionality to connect to PostgreSQL through Java services.
• Developed Spark jobs in PySpark to perform ETL from SQL Server to Hadoop.
• Responsible for continuous monitoring and management of the Elastic MapReduce (EMR) cluster through the AWS console.
Skills: PySpark · Scala · Azure Databricks · Bash · SQL · Hive · Amazon Web Services (AWS) · Python (Programming Language) · JSON · MapReduce
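The batch audit process above tracks, per ETL job, the table name, start and finish times, rows loaded, and status. A minimal sketch of such an audit record in Python, with illustrative names only (the actual process was a batch/shell script):

```python
# Sketch of the audit record produced per ETL job: table, start/finish
# timestamps, row count, and a derived status. Names are hypothetical.
from datetime import datetime, timezone

def audit_record(table, started_at, rows_loaded, ok):
    finished_at = datetime.now(timezone.utc)
    return {
        "table": table,
        "start_time": started_at.isoformat(),
        "finish_time": finished_at.isoformat(),
        "rows_loaded": rows_loaded,
        "status": "SUCCESS" if ok and rows_loaded > 0 else "FAILED",
    }

start = datetime.now(timezone.utc)
rec = audit_record("sales_fact", start, rows_loaded=10432, ok=True)
```

Records like this would be appended to an audit table or log for each job run, giving operators a single place to check load outcomes.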
ETL Developer
ICE
Jun 2019 - Jun 2020, Hyderabad, Telangana, India
• Responsible for requirements analysis and for designing a generic, standard ETL process to load data from different source systems.
• Performed unit and component testing of developed objects and prepared test case documents for mappings, sessions, and workflows.
• Participated in daily status meetings and interacted with the onshore team via email and calls to follow up on modules and resolve data and code issues.
• Handled the classification system part of the project, which involved loading data based on preconditions.
• Understood the existing business model and customer requirements.
• Developed and documented the ETL (extract, transform, and load) strategy to populate the data warehouse from various source systems.
• Involved in data extraction, staging, target transformation, and loading.
• Performed testing at the database end and reviewed Informatica mappings against the business logic.
• Listed issues that did not conform to business requirements; developed some mappings and made changes to others.
• Wrote several test cases, identifying issues that could occur and understanding the data merge and match process.
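The extract → stage → transform → load flow described above can be sketched in a few lines of Python. This is a toy illustration under assumed structures and rules, not the Informatica implementation:

```python
# Toy staged ETL flow: extract source rows to a staging copy, transform
# (normalize and filter on a precondition), then load into a target.
# Row shape and the precondition (non-null id) are hypothetical.

def extract(source_rows):
    return list(source_rows)  # staging copy, isolated from the source

def transform(staged):
    # Example rule: normalize names, drop rows failing the precondition.
    return [
        {"id": r["id"], "name": r["name"].strip().upper()}
        for r in staged
        if r.get("id") is not None
    ]

def load(target, rows):
    target.extend(rows)
    return len(rows)  # rows loaded, for the audit trail

target_table = []
staged = extract([{"id": 1, "name": " alice "}, {"id": None, "name": "bad"}])
loaded = load(target_table, transform(staged))
```

Keeping staging separate from the target, as here, is what lets rejected rows be inspected and reloaded without touching the warehouse.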
EDUCATION
Jawaharlal Nehru Technological University
Nalla Malla Reddy Engineering College
Imran Bin Yaba, Musheerabad, Hyderabad