Junaid M.

Experienced Data Engineer with 5 years of designing and developing robust, scalable data infrastructure, passionate about enabling analytics for stakeholders and businesses while using DevOps practices to ensure faster software development cycles.

Skills
● Developed reliable and scalable data pipelines using both ETL and ELT models
● Experienced in cloud technologies, AWS and Azure, for data development
● Worked with a range of relational databases, data warehouses and lakehouses such as SQL Server, Postgres, MySQL, Snowflake, Redshift, S3 and Azure Storage
● Developed databases and warehouses and automated processes using stored procedures and triggers
● Optimized existing SQL databases and tables using indexing, partitioning and SQL query optimization
● Strong Python programming and framework (Pandas, Polars, PySpark) experience for data pipeline and ETL development
● Strong SQL (T-SQL and PL/SQL) scripting and development capability
● Orchestrated ETL jobs using Apache Airflow
● Utilized big data technologies such as Databricks, Apache Spark, Kafka and Hadoop for large-volume data processing
● Used DBT to test and manage data transformations
● Designed, developed and integrated data warehouse and data lakehouse solutions in the cloud
● Experienced in dimensional database modeling and architecting data infrastructure to enable fast data analytics
● Strong visualization dashboarding (Tableau, Power BI) and data analysis capability
Experience
Senior Data Analyst
RBC | Canada
Data Engineer II
RBC | Apr 2023 - Present
● Designed and developed full end-to-end ETL data pipelines in the cloud using Python/Spark, Azure Data Factory, Azure Data Lake and Azure SQL for robust data ingestion while keeping processing times low and scalability high
● Debugged and updated Databricks notebooks as needed to reduce infrastructure usage and Azure cloud computing costs
● Created a template for Git branching strategies and pull requests for seamless collaboration among Data Engineers and Analysts
● Orchestrated data pipeline ETL/ELT jobs using Apache Airflow with various operators
● Frequently performed data ingestion with Python/Spark through REST APIs, ingesting and transforming JSON data per business requirements (see the sketch after this list)
● Utilized Azure Key Vault to ensure data security, protect sensitive information and prevent potential breaches
● Worked with Azure Data Lake and Databricks for large data processing, utilizing Spark SQL and PySpark for both data lakehouse and data warehouse development, and performed governance activities such as data quality checks
● Optimized SQL databases and data warehouses via performance tuning, indexing, partitioning and SQL query optimization, cutting query times by roughly 60-70% and enabling faster data analysis, data-driven insights and lightweight Tableau visualization dashboards
● Updated the SQL data warehouse in Teradata by expanding existing warehouse data models to include fields required by business stakeholders for accurate data reporting
● Utilized Apache Kafka to create a streaming ETL data pipeline
● Used DBT for table creation and updates, testing, CI/CD and overall database development in SQL Server and Teradata
● Created stored procedures for SQL database and data warehouse development, ensuring data integrity and automating tasks to reduce manual workload
● Created a logging dashboard for the data engineering team to monitor server health, average job times and related metrics
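As an illustration of the REST ingestion pattern above, here is a minimal sketch of pulling JSON from an API with Python and landing it in the lake with PySpark. The endpoint, the transaction_id column and the storage path are hypothetical placeholders, not details of the actual pipeline.

```python
import json

import requests
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("api_ingestion_sketch").getOrCreate()

# Pull raw JSON records from the (hypothetical) source API.
resp = requests.get("https://api.example.com/v1/transactions", timeout=30)
resp.raise_for_status()
records = resp.json()  # expected shape: a list of JSON objects

# Parallelize the payload and let Spark infer the JSON structure.
df = spark.read.json(
    spark.sparkContext.parallelize([json.dumps(r) for r in records])
)

# Basic quality gate before the lake write: drop rows missing the key,
# then stamp an ingestion date for partitioning.
clean = (
    df.dropna(subset=["transaction_id"])
    .withColumn("ingest_date", F.current_date())
)

# Land the data in the lake partitioned by ingestion date (path illustrative).
clean.write.mode("append").partitionBy("ingest_date").parquet(
    "abfss://bronze@examplelake.dfs.core.windows.net/transactions/"
)
```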
Data Engineer
Yataghan Networks Inc. | Jul 2021 - Apr 2023 | Toronto, Ontario, Canada
● Implemented an ETL data pipeline orchestration solution by running Airflow in a Docker container, optimizing script scheduling and execution (a minimal DAG sketch follows this list)
● Spearheaded Git practices, significantly reducing development times by implementing efficient Git branching strategies
● Utilized DBT for data testing, version-controlled SQL scripting and documentation to ensure data quality
● Displayed high proficiency in SQL by performing complex querying (INTERSECTs, UNIONs, complex joins and aggregations) and by developing stored procedures and triggers on existing data infrastructure
● Automated data transformations for reporting through stored procedures and triggers in databases such as SQL Server, MySQL and Postgres, reducing manual workload by ~70%
● Aided coworkers in refactoring SQL scripts to reduce query times
● Developed data solutions in cloud platforms (Azure and AWS) while minimizing cloud costs, using tools such as Redshift, EMR, Glue and S3 in AWS and Azure Storage, Azure Data Factory and Databricks in Azure, alongside Python programming and frameworks (Pandas, PySpark) and SQL scripting
● Recommended indexing practices to reduce query times in clients' marketing databases and speed up visualization dashboarding
● Designed ETL data pipelines from sources such as APIs and OLTP databases, working with structured and semi-structured data such as JSON, Parquet, CSV and SQL tables, using Python, Apache Spark and SQL
● Assisted the Data Architect in designing a Data Lakehouse using the Medallion Architecture (Bronze, Silver and Gold layers) to enable seamless big data analytics, and implemented that infrastructure using Python, SQL and Spark
● Developed KPIs and forecasted quarterly and yearly metrics, providing data-driven insights and ROI reporting for digital and email marketing campaigns
● Utilized CI/CD pipelines for faster data development in Python
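A minimal sketch of the kind of Airflow DAG described in the first bullet, runnable inside a containerized Airflow 2.x deployment. The dag_id and task callables are hypothetical stand-ins for the real extract/transform/load scripts.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("pull from source API / OLTP database")


def transform():
    print("clean and reshape the extracted data")


def load():
    print("write final tables to the warehouse")


# Daily extract -> transform -> load chain with no backfill on deploy.
with DAG(
    dag_id="daily_etl_sketch",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load
```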
Data Engineer
Element Fleet Management | Dec 2020 - Jul 2021 | Mississauga, Ontario, Canada
● Designed and developed end-to-end automated, scalable ETL/ELT data pipelines on AWS, utilizing services such as EMR, S3 and Glue with Python, SQL and PySpark, expanding existing data infrastructure and analytics avenues while keeping data security in mind
● Utilized big data technologies such as Spark and Hadoop for high-volume data processing, applying data quality checks to ensure data integrity and usability
● Displayed Python and Apache Spark proficiency (PySpark, Spark SQL) by creating data pipelines from both structured and semi-structured data such as CSV, JSON and SQL database tables
● Used Python scripting to perform data ingestion from API calls, applying functional programming patterns for readability
● Performed data transformations using Python (Pandas/Polars/PySpark) and SQL and loaded final tables into the Snowflake data warehouse (see the sketch after this list)
● Expanded the existing Snowflake data warehouse with new dimension tables, using appropriate database design techniques for data integration that enabled new angles of data analysis and business intelligence for Analysts and Scientists
● Reduced manual workload by scheduling and orchestrating Python and Spark scripts through Apache Airflow
● Worked with Data Scientists to create a predictive model to determine the best sites for activity and future endeavors
● Assisted in database/data warehouse architecture expansion in Postgres and Snowflake, using existing OLTP databases as sources to enable faster data analytics and efficient visualization dashboarding in Power BI while reducing costs
● Liaised with Project Managers and Data Analysts about analytics requirements to maximize business utility
● Followed Git branching best practices to keep the code repository well structured
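To illustrate the pandas-to-Snowflake load pattern above, a minimal sketch assuming the snowflake-connector-python package. The API source, credentials, and the DIM_SITE table are hypothetical placeholders; write_pandas appends to an existing table by default.

```python
import pandas as pd
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas

# Transform step: derive a small dimension table from raw API records
# (hypothetical endpoint returning a list of JSON objects).
raw = pd.read_json("https://api.example.com/v1/sites")
dim_site = (
    raw[["site_id", "site_name", "region"]]
    .drop_duplicates(subset=["site_id"])
    .reset_index(drop=True)
)

# Load step: push the frame into an existing Snowflake table.
conn = snowflake.connector.connect(
    account="example_account",
    user="example_user",
    password="***",  # in practice, pulled from a secrets manager
    warehouse="ANALYTICS_WH",
    database="ANALYTICS",
    schema="PUBLIC",
)
write_pandas(conn, dim_site, table_name="DIM_SITE")
conn.close()
```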
Data Engineer
RBC | Jan 2020 - Dec 2020
● Engineered automated, scalable ETL data pipelines using Python, SQL and Apache Spark (PySpark, Spark SQL) in Azure Databricks, streamlining ingestion into Azure Data Lake storage and the subsequent output into the data warehouse to enable data analysis and business intelligence
● Performed governance via data quality checks in Python/Spark scripts and SQL queries using test cases
● Developed data pipelines using Pandas and pure Python for smaller ETL tasks that did not require a Spark cluster, reducing overall costs for the data platform
● Orchestrated and scheduled Python and PySpark scripts using Apache Airflow to handle both simple and complex orchestration requirements
● Performance-tuned and optimized SQL Server databases and scripts using practices such as indexing, sharding and partitioning, as well as script changes, reducing query times by 70%
● Utilized Git for maintaining the code repository, following best practices for branching and pull requests
● Designed a STAR schema-based data warehouse model, enhancing SQL query efficiency and reproducibility and facilitating lightweight, self-service visualization dashboards in Tableau (a star-schema load sketch follows this list)
● Applied database design methodologies to assist in architecting a data lakehouse in Azure, expanding data infrastructure and capability for handling big data
● Created script and documentation templates for Python and SQL, increasing readability and consistency
● Worked with cross-functional teams in various departments to gather requirements, set expectations and delegate tasks as needed
● Monitored data platform server performance, using logs to measure infrastructure usage and CPU performance history and keep data systems running
● Followed data security best practices by using HashiCorp Vault to secure critical credentials
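A minimal sketch of a STAR schema load of the sort described above: a date dimension and a fact table built from a cleansed staging table with PySpark. The staging and warehouse table names and columns are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("star_schema_sketch").getOrCreate()

# Hypothetical cleansed input produced by an upstream ingestion job.
staging = spark.table("staging.transactions")

# Dimension: one row per calendar date, keyed by a yyyyMMdd surrogate key.
dim_date = (
    staging.select(F.to_date("txn_ts").alias("date"))
    .distinct()
    .withColumn("date_key", F.date_format("date", "yyyyMMdd").cast("int"))
)
dim_date.write.mode("overwrite").saveAsTable("warehouse.dim_date")

# Fact: measures plus the foreign key into the date dimension.
fact_txn = staging.select(
    F.date_format(F.to_date("txn_ts"), "yyyyMMdd").cast("int").alias("date_key"),
    "customer_id",
    "amount",
)
fact_txn.write.mode("overwrite").saveAsTable("warehouse.fact_transactions")
```

Keying the dimension on a yyyyMMdd integer keeps fact-to-dimension joins cheap and makes the model straightforward to consume from self-service Tableau dashboards.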
Data Analyst
RBC | Jun 2019 - Dec 2019 | Ontario, Canada
● Created visualization dashboards in Tableau to automate reporting, reducing reporting time by several hours per week while delivering key insights
● Developed, maintained and optimized data infrastructure in SQL Server, reducing query time by over 80% through effective indexing and query optimization practices
● Applied data modeling techniques when creating database designs for seamless data integration with new sources
● Utilized tools such as Python, Pandas and SQL to perform transformations, exploratory data analysis and other data analytics, creating actionable, data-driven insights from high volumes of data
● Performed data ingestion into Python from sources such as CSVs, databases and APIs to wrangle data and create final products via ETL data pipelines (see the sketch after this list)
● Followed Python best practices, such as functional programming paradigms, to keep ETL scripts readable
● Developed both simple and complex SQL queries per stakeholder requirements, often involving complex joins, recursion and aggregations
● Developed unique data solutions for stakeholder reports and created automated email reporting
● Provided regular reporting to internal stakeholders with key KPIs to measure impact and success
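A minimal sketch of the pandas-based ETL and reporting flow described above: ingest a CSV extract, aggregate KPIs, and write a tidy output a dashboard can consume. File paths and column names are hypothetical.

```python
import pandas as pd

# Ingest: raw extract from an upstream system (hypothetical file).
orders = pd.read_csv("orders_extract.csv", parse_dates=["order_date"])

# Transform: monthly revenue and order-count KPIs per region.
kpis = (
    orders.assign(month=orders["order_date"].dt.to_period("M").astype(str))
    .groupby(["region", "month"], as_index=False)
    .agg(revenue=("amount", "sum"), orders=("order_id", "count"))
)

# Load: a flat file that a Tableau workbook can read directly.
kpis.to_csv("monthly_kpis.csv", index=False)
```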
Education Details
Management Economics And Finance
Lang School Of Business And Economics - University Of Guelph