• Accomplished IT professional with 8+ years of experience, specializing in the Big Data ecosystem: data acquisition, ingestion, modeling, storage, analysis, integration, and processing.
• A Data Science enthusiast with strong problem-solving, debugging, and analytical capabilities who actively engages in understanding and delivering business requirements.
• Collaborated closely with business, product, production support, and engineering teams on a regular basis to dive deep on data, enable effective decision making, and support analytics platforms.
• Strong Hadoop and platform support experience with the entire suite of tools and services in major Hadoop distributions: Cloudera, Amazon EMR, Azure HDInsight, and Hortonworks.
• Extensive working experience with the Big Data ecosystem: Hadoop (HDFS, MapReduce, YARN), Spark, Kafka, Hive, Impala, HBase, Sqoop, Pig, Airflow, Oozie, Zookeeper, Ambari, Flume, NiFi.
• Sound experience with AWS cloud (EMR, EC2, RDS, EBS, S3, Kinesis, Lambda, Glue, Athena, Elasticsearch, SQS, DynamoDB, Redshift, ECS).
• Working knowledge of Azure cloud components (HDInsight, Databricks, Data Lake, Blob Storage, Data Factory, Storage Explorer, SQL DB, SQL DWH, Cosmos DB).
• Excellent knowledge of Hadoop cluster architecture and its key concepts: distributed file systems, parallel processing, high availability, fault tolerance, and scalability.
• Obtained and processed data from enterprise applications, clickstream events, API gateways, application logs, and database updates.
• Proficient at writing MapReduce jobs and UDFs to gather, analyze, transform, and deliver data per business requirements.
• Expertise in building PySpark and Spark-Scala applications for interactive analysis, batch processing, and stream processing.
• Profound knowledge of developing production-ready Spark applications utilizing Spark Core, Spark Streaming, Spark SQL, DataFrames, Datasets, and Spark ML.
• Experienced in writing Spark scripts in Python, Scala, Java, and SQL for development and analysis.
• Proficient at using Spark APIs for streaming real-time data, staging, cleansing, applying transformations, and preparing data for machine learning needs.
• Worked with various streaming ingestion services such as Kafka, Kinesis, Flume, and JMS.
• Involved in end-to-end implementation of enterprise data lakes with batch and real-time processing using Spark Streaming, Kafka, Flume, and Sqoop.
• Extensive experience developing Bash, T-SQL, and PL/SQL scripts.
-
Sr. Data Engineer
Directv | Jan 2020 - Present | El Segundo, CA, US
• Built a data pipeline and performed analytics using the AWS stack (EMR, EC2, S3, RDS, Lambda, Kinesis, Athena, Glue, SQS, Redshift, and ECS).
• Worked with the Data Science team running machine learning models on a Spark EMR cluster and delivered data as per business requirements.
• Automated the transformation and ingestion of terabytes of monthly data in Parquet format using Kinesis, S3, Lambda, and Airflow.
• Loaded data into S3 buckets using AWS Glue and PySpark; filtered data stored in S3 buckets using Elasticsearch and loaded it into Hive external tables. Utilized Spark's in-memory capabilities to handle large datasets in the S3 data lake.
• Developed Spark jobs on Databricks to perform data cleansing, validation, and standardization, then applied transformations as per the use cases.
• Migrated Java analytical applications to Scala; used Scala where performance and logic are critical.
• Created Airflow workflows to automate extraction of weblogs into the S3 data lake.
• Developed batch and stream processing applications requiring functional pipelining using Spark Scala and the Streaming API.
• Extracted and enriched multiple Cassandra tables using joins in Spark SQL; also converted Hive queries into Spark transformations.
• Hands-on experience in API design and development using Spring Boot for data movement across different systems.
• Fetched live data from an Oracle database using Spark Streaming and Amazon Kinesis, fed from an API Gateway REST service.
• Performed ETL operations on terabytes of data using Python, Spark SQL, S3, and Redshift to obtain customer insights.
• Performed interactive analytics (cleansing, validation, and quality checks) on data stored in S3 buckets using AWS Athena.
-
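The cleansing, validation, and standardization steps above ran as PySpark jobs on Databricks/EMR; as a minimal, dependency-free sketch of the same record-level logic (field names, formats, and rules here are hypothetical, not from the original pipeline):

```python
from datetime import datetime
from typing import Optional

# Hypothetical required fields for an incoming event record.
REQUIRED_FIELDS = ("customer_id", "event_ts")

def clean_record(raw: dict) -> Optional[dict]:
    """Validate and standardize one raw event record.

    Returns the cleaned record, or None when a required field is
    missing (such records would be routed to a rejects path).
    """
    # Trim surrounding whitespace from every string value.
    rec = {k: v.strip() if isinstance(v, str) else v for k, v in raw.items()}

    # Reject records missing required fields.
    if any(not rec.get(f) for f in REQUIRED_FIELDS):
        return None

    # Standardize the event timestamp to ISO-8601 date format.
    ts = datetime.strptime(rec["event_ts"], "%m/%d/%Y")
    rec["event_ts"] = ts.strftime("%Y-%m-%d")
    return rec
```

In a Spark job the same function shape would typically be applied per row (e.g. via a UDF or `mapPartitions`) before writing the cleaned output to S3 as Parquet.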
Senior Data Engineer
Global Atlantic Financial Group | Mar 2017 - Dec 2019 | New York, NY, US
• Worked with the Azure cloud platform (HDInsight, Databricks, Data Lake, Blob Storage, Data Factory, Synapse, SQL DB, SQL DWH, and Storage Explorer).
• Built an enterprise data lake using Data Factory and Blob Storage, enabling other teams to work on more complex scenarios and ML solutions.
• Used Azure Data Factory with the SQL and Mongo APIs to integrate data from MongoDB, MS SQL, and cloud sources (Blob, Azure SQL DB).
• Developed Spark Scala scripts for mining data and performed transformations on large datasets to provide real-time insights and reports.
• Supported the analytics platform, handled data quality, and improved performance using Scala's higher-order functions, lambda expressions, pattern matching, and collections.
• Implemented scalable microservices with Scala and Akka to handle concurrency and high traffic; optimized existing Scala code and improved cluster performance.
• Performed data cleansing and applied transformations using Databricks and Spark data analysis.
• Used Azure Synapse to manage processing workloads and serve data for BI and prediction needs.
• Designed and automated custom-built input adapters using Spark, Sqoop, and Airflow to ingest and analyze data from RDBMSs into Azure Data Lake.
• Reduced access time by refactoring data models and optimizing queries, and implemented a Redis cache to support Snowflake.
• Developed automated workflows for daily incremental loads, moving data from RDBMSs to the data lake.
• Monitored the Spark cluster using Log Analytics and the Ambari Web UI; transitioned log storage from MS SQL to Cosmos DB and improved query performance.
• Created automated ETL jobs in Talend and pushed the data to Azure SQL Data Warehouse.
• Managed resources and scheduling across the cluster using Azure Kubernetes Service.
• Used Azure DevOps for CI/CD and for debugging and monitoring jobs and applications.
• Used Azure Active Directory and Ranger for security.
-
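The daily incremental loads mentioned above typically hinge on a high-watermark check: only rows modified after the last successful load are picked up. A minimal sketch of that selection logic in plain Python (the column name and ISO-8601 watermark format are assumptions; the actual workflows used Data Factory and Spark):

```python
def select_incremental(rows, last_watermark):
    """Return rows newer than the last watermark, plus the new watermark.

    `rows` is an iterable of dicts with a sortable `modified_at` value
    (e.g. an ISO-8601 timestamp string). Rows at or before the previous
    watermark were already loaded, so they are skipped.
    """
    new_rows = [r for r in rows if r["modified_at"] > last_watermark]

    # Advance the watermark only if something new actually arrived.
    new_watermark = max(
        (r["modified_at"] for r in new_rows), default=last_watermark
    )
    return new_rows, new_watermark
```

Persisting the returned watermark after each successful run is what makes the load restartable without reprocessing old rows.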
Big Data Developer
Homesite Insurance | Jun 2016 - Feb 2017 | Boston, MA, US
• Responsible for building scalable and distributed data solutions using Cloudera CDH.
• Migrated large amounts of data from an on-prem Cloudera cluster to EC2 instances deployed in an Elastic MapReduce (EMR) cluster.
• Gathered data and performed analytics using the AWS stack (EMR, EC2, S3, RDS, Lambda, Redshift).
• Developed an ETL pipeline to extract archived logs from disparate sources and store them in the S3 data lake; used Cron and AutoSys schedulers for weekly automation.
• Implemented Spark Scala UDFs to handle data quality and to filter and validate datasets; also converted Java analytical applications to Scala.
• Converted Java MapReduce jobs to Scala UDFs and improved performance.
• Analyzed and optimized pertinent data stored in Snowflake using PySpark and Spark SQL.
• Worked with Impala for massively parallel processing of queries for ad-hoc analysis; designed and developed complex Hive and Impala queries for a logistics application.
• Developed Sqoop jobs for data ingestion and incremental data loads from RDBMSs to Snowflake.
• Created Bash scripts to add dynamic partitions to Hive staging tables; loaded bulk data into HBase using MapReduce jobs.
• Loaded data from web servers using Flume and the Spark Streaming API; used a Flume sink to write directly to indexers deployed on the cluster, allowing indexing during ingestion.
• Created broker topics, producers, and consumers to monitor, process, and archive live streaming data.
• Coordinated with the Kafka team and built an on-premise data pipeline; supported Kafka integrations, performed tuning, and identified bottlenecks to improve performance and throughput.
• Used StreamSets for analytics; debugged and optimized data pipelines collecting logs and metrics from various application APIs.
-
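The archived-log ETL above starts with turning raw access-log lines into structured records before they land in S3. A self-contained sketch of that parsing step, assuming the widely used Common Log Format (the real sources and format may have differed):

```python
import re

# Common Log Format, e.g.:
# 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_log_line(line):
    """Parse one access-log line into a dict, or None if malformed."""
    m = LOG_PATTERN.match(line)
    if not m:
        return None
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    # "-" means no body was returned; normalize it to 0 bytes.
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    return rec
```

Malformed lines return None so they can be counted and routed to a quarantine prefix instead of silently corrupting downstream tables.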
Big Data Developer
Broadridge | Nov 2014 - May 2016 | New York, NY, US
• Worked with the Hortonworks distribution; installed, configured, and maintained a Hadoop cluster based on business requirements.
• Experience with Apache Big Data components such as HDFS, MapReduce, YARN, Hive, HBase, Sqoop, Pig, Ambari, and NiFi.
• Implemented end-to-end ETL pipelines using Python and SQL for high-volume analytics; also reviewed use cases before onboarding to HDFS.
• Responsible for loading, managing, and reviewing terabytes of log files using the Ambari web UI.
• Wrote rack topology scripts and Java MapReduce programs to parse raw data.
• Migrated from JMS Solace to Apache Kafka; used Zookeeper to manage synchronization, serialization, and coordination across the cluster.
• Used Sqoop to migrate data between traditional RDBMSs and HDFS; ingested data from MS SQL, Teradata, and Cassandra databases.
• Identified required tables and views and exported them into Hive; performed ad-hoc queries using Hive joins, partitioning, and bucketing techniques for faster data access.
• Used NiFi to automate data flow between disparate systems; designed dataflow models and complex target tables to obtain relevant metrics from various sources.
• Developed Bash scripts to fetch log files from an FTP server and executed Hive jobs to parse them.
• Performed data analysis using HiveQL, Pig Latin, and custom MapReduce programs in Java.
• Enhanced scripts of existing Python modules; wrote APIs to load processed data into HBase tables.
• Migrated ETL jobs to Pig scripts to apply joins, aggregations, and transformations.
• Used Power BI as a front-end BI tool and MS SQL Server as a back-end database to design and develop dashboards, workbooks, and complex aggregate calculations.
• Troubleshot defects by identifying root causes and fixed them during the QA phase.
• Used Jenkins for CI/CD and SVN for version control.
-
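Hive partitioning, mentioned above for faster data access, works by grouping data into directories keyed by a partition column (commonly `dt=YYYY-MM-DD`). A small sketch of how records map to those partition paths (table location and field names are hypothetical; in production this layout is registered via `ALTER TABLE ... ADD PARTITION`):

```python
from collections import defaultdict

def bucket_by_partition(records, table_location):
    """Group records into Hive-style dt=YYYY-MM-DD partition directories.

    Returns a mapping of partition path -> list of records, mirroring
    the directory layout Hive scans when a query filters on `dt`.
    """
    buckets = defaultdict(list)
    for rec in records:
        # Partition key derived from the record's event date.
        part = f"{table_location}/dt={rec['event_date']}"
        buckets[part].append(rec)
    return dict(buckets)
```

Because queries filtering on the partition column only read the matching directories, partition pruning skips the rest of the table entirely.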
Data Analyst
Nationwide | Jun 2012 - Oct 2014 | US
• Played a key role in gathering business, system, and design requirements, gap analysis, use case diagrams, and flow charts.
• Performed ETL operations with Informatica PowerCenter: data extraction, staging, transformation, and loading into target data stores.
• Parsed complex files using Informatica data transformations (Normalizer, Lookup, Source Qualifier, Expression, Aggregator, Sorter, Rank, and Joiner) and loaded them into databases.
• Created complex SQL queries and scripts to extract, aggregate, and validate data from MS SQL, Oracle, and flat files using Informatica, and loaded it into a single data warehouse repository.
• Created database objects such as tables, views, stored procedures, triggers, packages, and functions using T-SQL to provide structure and maintain data efficiently.
• Designed SQL, SSIS, and Python based batch and real-time ETL pipelines to extract data from transactional and operational databases and load it into target databases and data warehouses.
• Wrote Python scripts to extract data from different APIs.
• Responsible for collecting, scrubbing, and extracting data; generated compliance reports using SSRS; analyzed and identified market trends to improve product sales.
• Performed data profiling and answered complex business questions by providing data to business users.
• Generated DDL and created tables and views in the corresponding architectural layers.
• Extracted, transformed, and analyzed measures/indicators from multiple sources to generate reports, dashboards, and analytical solutions.
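The extract-aggregate-validate SQL work described above can be sketched end to end with Python's built-in sqlite3 standing in for MS SQL/Oracle (the schema and quality rule here are illustrative, not from the original warehouse):

```python
import sqlite3

def summarize_policies(rows):
    """Load raw rows, aggregate premium per state over valid rows,
    and count rows failing a basic quality rule (non-positive premium)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE policy (id INTEGER, state TEXT, premium REAL)")
    conn.executemany("INSERT INTO policy VALUES (?, ?, ?)", rows)

    # Aggregate: total premium per state, only over valid rows.
    totals = conn.execute(
        "SELECT state, SUM(premium) FROM policy "
        "WHERE premium > 0 GROUP BY state ORDER BY state"
    ).fetchall()

    # Validation: count rows that fail the quality rule.
    (bad,) = conn.execute(
        "SELECT COUNT(*) FROM policy WHERE premium <= 0"
    ).fetchone()
    conn.close()
    return totals, bad
```

Keeping the validation count alongside the aggregate lets a load job fail fast or alert when the reject rate exceeds a threshold, rather than publishing silently incomplete totals.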