• Senior Data Engineer with 5+ years of industry experience across a variety of data platforms, tools, and technologies, with a strong sense of personal accountability in both individual and team settings.
• Strong experience in the Software Development Life Cycle, including requirements analysis, design specification, and testing, in both Waterfall and Agile methodologies.
• Experience in machine learning with large structured and unstructured data sets: data acquisition, data validation, predictive modelling, and data visualisation.
• Experience with structured (MySQL, Oracle SQL, PostgreSQL) and unstructured (NoSQL) databases. Strong understanding of relational databases. Familiar with cross-platform ETL using Python/Java SQL connectors.
• In-depth knowledge of building Spark applications in Python in cluster/client mode, and of monitoring and health-checking applications through the Spark UI. Good understanding of and hands-on work with Spark SQL, RDDs, DataFrames, and Datasets.
• Experience with big data tools (Hadoop, Spark, Databricks). Hands-on experience with MapReduce, HDFS, Pig, and Hive.
• Good working knowledge of the Amazon Web Services (AWS) cloud platform, including EC2, S3, VPC, ELB, IAM, DynamoDB, CloudFront, CloudWatch, Route 53, Elastic Beanstalk, Auto Scaling, Security Groups, EC2 Container Service (ECS), CodeCommit, CodePipeline, CodeBuild, CodeDeploy, Redshift, CloudFormation, CloudTrail, OpsWorks, Kinesis, SQS, SNS, and SES.
• Excellent knowledge of ETL/ELT for data pipelining and of creating data flows between applications using Airflow (a minimal DAG sketch follows this summary). Experience writing complex SQL code, functions, and procedures. Experience building Kafka data pipelines that fetch data from OLTP systems and store it in the warehouse.
• Involved in continuous integration and deployment (CI/CD) using DevOps tools such as Looper and Concord.
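To illustrate the Airflow-based data flows mentioned above, here is a minimal Airflow 2-style sketch of a two-task extract/load DAG. The DAG id, schedule, and task bodies are placeholders for illustration, not taken from any actual project:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Placeholder: pull rows from an OLTP source and stage them.
    return [{"id": 1, "value": "sample"}]


def load(**context):
    # Placeholder: read the staged rows via XCom and write them to the warehouse.
    rows = context["ti"].xcom_pull(task_ids="extract")
    print(f"loading {len(rows)} rows")


with DAG(
    dag_id="oltp_to_warehouse",  # hypothetical DAG id
    schedule_interval="@hourly",
    start_date=datetime(2021, 1, 1),
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task
```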
Senior Data Engineer, Verizon Wireless Inc. | Dec 2019 - Present
• Serving as data engineer and database engineer for Verizon's application databases. The role includes building data pipelines from application databases to the data lake and warehouse, and creating both stream and batch data processing pipelines.
• Maintaining Cloudera Hadoop production and dev clusters of more than 16 PB across 300 nodes. Performing daily health checks, working on alerts, and handling other related tasks.
• Developed Spark applications using Spark SQL and RDDs. Excellent knowledge of Spark Streaming; currently learning Spark ML.
• Designed and developed a streaming ETL application, auto-scheduled (cron-independent) and auto-scaled (able to add data extraction jobs at run time), in Python, Bash, and PL/SQL, that extracts data from OLTP application databases and sends it to Kafka brokers (a simplified sketch of this pattern follows this section).
• Installed, configured, and maintaining a multi-node standalone Kafka-ZooKeeper cluster for streaming data pipelines that carry database health information from 200+ Oracle OLTP databases to the warehouse database.
• Developed a PySpark application for processing database health information and storing it in an Oracle database. In-depth knowledge of monitoring Spark applications through the Spark UI and good at debugging failed Spark jobs. Maintaining existing Spark jobs and making changes as business needs require.
• Created Hive databases, tables, and queries for data analytics as needed, integrating them with data processing jobs and dropping them as part of cleanup. Also maintaining an HBase database used by other applications.
• Created a Python Flask application to deploy PL/SQL code automatically onto the production database. Created an adaptive database alerting system in Python, SQL, and Bash that generates database alerts in near real time based on thresholds that can be changed on the fly.
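The production ETL described above is proprietary; the sketch below shows only the general database-to-Kafka pattern, assuming the kafka-python client and the python-oracledb driver. The connection details, query, table, and topic name are all hypothetical:

```python
import json

import oracledb                  # python-oracledb driver (assumed dependency)
from kafka import KafkaProducer  # kafka-python client (assumed dependency)

# Hypothetical broker list and serializer; real deployment details differ.
producer = KafkaProducer(
    bootstrap_servers=["broker1:9092", "broker2:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Hypothetical OLTP connection and health-metrics query.
conn = oracledb.connect(user="monitor", password="secret", dsn="oltp-db:1521/ORCLPDB1")
cursor = conn.cursor()
cursor.execute("SELECT metric_name, metric_value, sampled_at FROM db_health")

# Stream each health row to a hypothetical "db-health" topic.
for name, value, sampled_at in cursor:
    producer.send("db-health", {"metric": name, "value": value, "ts": str(sampled_at)})

producer.flush()
cursor.close()
conn.close()
```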
Senior Data Engineer, Confidential | Feb 2019 - Dec 2019 | Albany, US
• Analysed large and critical datasets using Cloudera, HDFS, MapReduce, Hive, Hive UDFs, Pig, Sqoop, and Spark.
• Developed Spark applications in Scala and implemented an Apache Spark data processing project handling data from various RDBMS and streaming sources.
• Worked with Spark on improving performance and optimisation of existing algorithms in Hadoop using SparkContext, Spark SQL, Spark MLlib, DataFrames, pair RDDs, and Spark on YARN.
• Consumed XML messages using Kafka and processed the XML files with Spark Streaming to capture UI updates.
• Developed dashboard reports in Tableau.
• Worked with and learned a great deal from AWS cloud services such as EC2, S3, EBS, and RDS.
• Wrote multiple MapReduce programs for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed formats.
• Developed automated processes for flattening the upstream data from Cassandra, which is in JSON format, using Hive UDFs (a rough PySpark analogue is sketched after this section).
• Optimised MapReduce jobs to use HDFS efficiently via various compression mechanisms.
• Queried data using Spark SQL on top of the Spark engine and implemented Spark RDDs in Scala.
• Wrote Scala code using higher-order functions for iterative algorithms in Spark with performance in mind.
• Managed and reviewed Hadoop log files.
• Created and maintained Teradata tables, views, macros, triggers, and stored procedures.
• Created and maintained tables and views in Snowflake.
• Provided ad-hoc queries and data metrics to business users using Hive and Pig.
• Moved files between HDFS and AWS S3 and worked extensively with S3 buckets.
• Created data pipelines for ingestion, aggregation, and loading of consumer response data from an AWS S3 bucket into Hive external tables in HDFS, serving as feeds for Tableau dashboards.
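The flattening work above was done with Hive UDFs; as a rough PySpark analogue (not the original implementation), the sketch below flattens a nested JSON feed. The input path, schema, and field names are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode

spark = SparkSession.builder.appName("flatten-json").getOrCreate()

# Hypothetical path to the upstream JSON export from Cassandra.
raw = spark.read.json("hdfs:///data/upstream/events.json")

# Flatten one level of nesting: promote struct fields to columns
# and explode a hypothetical array of items into one row per item.
flat = (
    raw.select(
        col("event_id"),
        col("payload.user_id").alias("user_id"),
        explode(col("payload.items")).alias("item"),
    )
    .select("event_id", "user_id", col("item.sku"), col("item.qty"))
)

flat.write.mode("overwrite").parquet("hdfs:///data/flat/events")
```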
Data Engineer, TCS | Oct 2017 - Jan 2018 | Mumbai, Maharashtra, India
• Implemented a continuous integration and continuous delivery process using GitLab, along with Python and shell scripts to automate routine jobs, including synchronising installers, configuration modules, packages, and requirements for the applications.
• Hands-on with the Django framework using PyCharm, and with Airflow workflow management.
• Wrote AWS Lambda code in Python to convert, compare, and sort nested JSON files (a minimal handler is sketched after this section).
• Constructed AWS data pipelines using VPC, EC2, S3, Auto Scaling Groups (ASG), EBS, Snowflake, IAM, CloudFormation, Route 53, CloudWatch, CloudFront, and CloudTrail.
• Created and configured elastic load balancers and auto scaling groups to distribute traffic and provide a cost-efficient, fault-tolerant, and highly available environment.
• Managed metadata alongside the data for visibility into where data came from and its lineage, so data for customer projects could be found quickly and efficiently, using an AWS data lake together with AWS Lambda and AWS Glue.
• Hands-on with the Redshift database (ETL data pipelines from AWS Aurora, MySQL engine, to Redshift).
• Developed and implemented complex databases and data marts for the current production state, for both traditional RDBMS and the Hadoop ecosystem, alongside legacy applications, designing solutions to incorporate new processes and implementations into the existing environment.
• Built an end-to-end ETL pipeline from AWS S3 to the key-value store DynamoDB and to the Snowflake data warehouse for analytical queries, specifically for cloud data.
• Converted data into different formats per user/business requirements by streaming data pipelines from various sources, including Snowflake, unstructured data, and DynamoDB.
• Documented modifications and enhancements made to applications, systems, and databases as required by the project.
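As a minimal sketch of the Lambda-based JSON handling described above, the handler below reads a nested JSON object from S3 on an S3 put event, flattens it, and sorts the records. The bucket contents, record layout, and "id" sort key are assumptions for illustration:

```python
import json

import boto3

s3 = boto3.client("s3")


def flatten(obj, prefix=""):
    """Recursively flatten nested dicts into dotted keys."""
    out = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(flatten(value, name + "."))
        else:
            out[name] = value
    return out


def lambda_handler(event, context):
    # Triggered by an S3 put event; bucket and key come from the event payload.
    record = event["Records"][0]["s3"]
    body = s3.get_object(Bucket=record["bucket"]["name"], Key=record["object"]["key"])
    data = json.loads(body["Body"].read())

    # Flatten each nested record and sort by a hypothetical "id" field.
    rows = sorted((flatten(r) for r in data["records"]), key=lambda r: r.get("id", 0))
    return {"count": len(rows)}
```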
Data Engineer, Spyry Technologies | Jan 2016 - Oct 2017
• Migrated existing data from Teradata/SQL Server to Hadoop and performed ETL operations on it.
• Responsible for loading structured, unstructured, and semi-structured data into Hadoop by creating static and dynamic partitions.
• Worked with data formats such as JSON and ran machine learning algorithms in Python.
• Created a task scheduling application to run in an EC2 environment on multiple servers.
• Strong knowledge of various data warehousing methodologies and data modelling concepts.
• Created Hive partitioned tables using Parquet and Avro formats to improve query performance and space utilisation (a minimal PySpark sketch follows this section).
• Exported the aggregated data into RDBMS using Sqoop for creating dashboards in Tableau and developed trend analysis using statistical features.
• Responsibilities included database design and creation of the user database.
• Created containers in Docker.
• Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analysing and transforming the data to uncover insights into customer usage patterns.
• Implemented a CI/CD pipeline using Jenkins and Airflow for containers from Docker and Kubernetes.
• Used SSIS, NiFi, Python scripts, and Spark applications for ETL operations, creating data flow pipelines and transforming data from legacy tables to Hive, HBase tables, and S3 buckets for handoff to the business and data scientists to build analytics over the data.
• Supported current and new services that leverage AWS cloud computing architecture, including EC2, S3, and other managed service offerings.
• Used advanced SQL methods to code, test, debug, and document complex database queries.
• Designed and developed Scala workflows to pull data from cloud-based systems and apply transformations to it.
• Developed reliable, maintainable, efficient code in SQL, Linux shell, and Python.
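To illustrate the partitioned Parquet tables mentioned above, here is a minimal PySpark sketch that writes a date-partitioned Hive table. The source path, database, table, and partition column are hypothetical, not the original code:

```python
from pyspark.sql import SparkSession

# enableHiveSupport lets Spark register the table in the Hive metastore.
spark = (
    SparkSession.builder.appName("hive-partitioned-load")
    .enableHiveSupport()
    .getOrCreate()
)

# Hypothetical source: semi-structured JSON events landed on HDFS.
events = spark.read.json("hdfs:///landing/events/")

# Write a Hive table partitioned by event date, stored as Parquet,
# so queries filtering on event_date can prune partitions cheaply.
(
    events.write.mode("overwrite")
    .format("parquet")
    .partitionBy("event_date")
    .saveAsTable("analytics.events")  # assumes the "analytics" database exists
)
```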
Suhas K: Education
• The University of Texas, Data Science