- 10 years of experience analyzing, designing, developing, and implementing data architectures and frameworks as a Senior Data Engineer.
- Specialized in Data Warehousing and Decision Support Systems, with extensive experience implementing full-lifecycle data warehousing projects and Hadoop/Big Data technologies for the storage, querying, processing, and analysis of data.
- Software development on cloud computing platforms including Amazon Web Services (AWS), Azure, and Google Cloud Platform (GCP).
- Expertise in using SSIS and Informatica to extract, transform, and load data across a variety of sources and targets.
- Created reusable Snowflake UDFs, stored procedures, and views to standardize constructs across multiple pipelines (see the sketch after this summary).
- Established near-zero-maintenance security policies, access controls, and MFA for a Snowflake account.
- Developed and maintained ETL processes to move data from source systems to Snowflake, including adding columns to Snowflake views and creating Snowflake tables from Teradata data.
- Strong proficiency in SQL, Presto SQL, Hive SQL, Python (Pandas, NumPy, SciPy, Matplotlib), Scala, Java, and Spark for handling large volumes of data.
- Expertise in database design and Business Intelligence development using SQL Server 2014/2016, Integration Services (SSIS), DTS packages, SQL Server Analysis Services (SSAS), DAX, OLAP cubes, and Star and Snowflake schemas.
- Knowledge of installing, configuring, and using Hadoop ecosystem components such as Hadoop MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, ZooKeeper, and Flume.
- Hands-on experience designing and implementing data engineering pipelines and analyzing data with the AWS stack (EMR, Glue, EC2, Lambda, Athena, Redshift) alongside Sqoop and Hive.
- Experience importing and exporting data with Sqoop between HDFS and relational database systems such as Teradata, Oracle, and SQL Server.
- Developed and maintained SSIS packages to extract, transform, and load (ETL) data from a variety of sources, including flat files, relational databases, and XML files; used SSIS to automate data movement and processing, including data cleaning, validation, and aggregation.
- Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, Elastic Load Balancing, and other AWS services; deep understanding of cloud architectures across AWS, Azure, and GCP.
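The reusable Snowflake UDF and view work mentioned above follows a common standardization pattern. The sketch below is a minimal, hypothetical illustration using the snowflake-connector-python client; the function and view names, schemas, and connection settings are all assumptions, not objects from the actual engagements.

```python
# Minimal sketch: registering a reusable Snowflake UDF and a view that
# applies it, via snowflake-connector-python. All object names and
# connection settings are hypothetical.
import os
import snowflake.connector

conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="UTIL",
)
cur = conn.cursor()

# A reusable SQL UDF that standardizes phone formatting across pipelines.
cur.execute("""
    CREATE OR REPLACE FUNCTION UTIL.NORMALIZE_PHONE(p VARCHAR)
    RETURNS VARCHAR
    AS $$ REGEXP_REPLACE(p, '[^0-9]', '') $$
""")

# A view that applies the UDF so every downstream pipeline shares one construct.
cur.execute("""
    CREATE OR REPLACE VIEW ANALYTICS.PUBLIC.V_MEMBERS AS
    SELECT member_id, UTIL.NORMALIZE_PHONE(phone) AS phone
    FROM RAW.PUBLIC.MEMBERS
""")
conn.close()
```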
Sr Data Engineer, CVS Health Care | Oct 2022 - Present | Texas, United States
- Developed Spark programs to parse raw data, populate staging tables, and store the refined data in partitioned tables in the enterprise data warehouse.
- Developed streaming applications using PySpark to read from Kafka and persist the data to NoSQL databases such as HBase and Cassandra (see the sketch after this entry).
- Implemented PySpark scripts using Spark SQL to load Hive tables into Spark for faster data processing.
- Performed upgrades, scaling actions, and zero-downtime migrations for Snowflake deployments handling over 100k DAU.
- Carried out POCs to validate the adoption of Snowflake for analytics, data science, and data lake use cases; evangelized Snowflake best practices among developers and analysts to optimize work with the cloud data platform.
- Worked on Big Data Hadoop cluster implementation and data integration in developing large-scale system software.
- Implemented AWS solutions using EC2, S3, RDS, EBS, Elastic Load Balancer, and Auto Scaling groups; optimized volumes and EC2 instances.
- Wrote Terraform templates for AWS Infrastructure as Code to build staging and production environments, and set up build automation with Jenkins.
- Developed and optimized Snowflake data models for 30+ analytics use cases across the sales, marketing, and finance departments.
- Loaded over 5 TB of structured and semi-structured data into Snowflake from S3, Kafka, and databases using Tasks and Streams.
- Migrated an entire Oracle database to BigQuery and used Power BI for reporting.
- Developed streaming and batch processing applications using PySpark to ingest data from various sources into an HDFS data lake.
- Developed DDL and DML scripts in SQL and HQL for analytics applications in RDBMS and Hive.
- Developed Python scripts to automate the ETL process using Apache Airflow, as well as cron scripts on UNIX.
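The Kafka-to-NoSQL streaming bullet above maps to a standard PySpark Structured Streaming pattern. The sketch below is a minimal, hypothetical version that reads a Kafka topic and persists micro-batches to Cassandra via foreachBatch; the topic, schema, keyspace, and broker address are assumptions, and it requires the spark-sql-kafka-0-10 and spark-cassandra-connector packages on the Spark classpath.

```python
# Minimal sketch: Kafka -> PySpark Structured Streaming -> Cassandra.
# Topic, schema, keyspace, and broker address are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-to-cassandra").getOrCreate()

schema = StructType([
    StructField("claim_id", StringType()),
    StructField("member_id", StringType()),
    StructField("amount", DoubleType()),
])

# Read the raw Kafka stream; values arrive as bytes and are parsed as JSON.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")
       .option("subscribe", "claims")
       .load())

events = (raw.select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

def write_batch(batch_df, batch_id):
    # Persist each micro-batch to a Cassandra table via the connector.
    (batch_df.write
     .format("org.apache.spark.sql.cassandra")
     .options(table="claims", keyspace="health")
     .mode("append")
     .save())

query = (events.writeStream
         .foreachBatch(write_batch)
         .option("checkpointLocation", "/tmp/checkpoints/claims")
         .start())
query.awaitTermination()
```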
Sr Data Engineer, Fifth Third Bank | Nov 2020 - Sep 2022 | Cincinnati, Ohio, United States
- Involved in the full Software Development Life Cycle (SDLC): business requirements analysis, preparation of technical design documents, data analysis, logical and physical database design, coding, testing, implementation, and deployment to business users.
- Transformed and analyzed data using PySpark and Hive based on ETL mappings; developed Spark programs, created data frames, and worked on transformations.
- Analyzed large, critical datasets using HDFS, HBase, Hive, Scala, HQL, Pig, Sqoop, Kubernetes, and ZooKeeper.
- Developed dashboards in Looker to visualize suspicious patterns and activities in real time for business users.
- Integrated data from multiple sources into SAP and tested data transformations to ensure accuracy and completeness.
- Created Star/Snowflake schemas as required, building complex views and stored procedures to design the fact and dimension tables for the tabular model and establish relationships.
- Developed a data pipeline using Flume, Kafka, and Spark Streaming to ingest data from the weblog server and apply transformations.
- Performed data analysis and profiling of source data to better understand the sources.
- Worked on downloading BigQuery data into pandas or Spark data frames for advanced ETL capabilities.
- Carried out data transformation and cleansing using SQL queries, Python, and PySpark.
- Wrote Hive SQL scripts to create complex tables with high-performance features such as partitioning, clustering (bucketing), and skew handling (see the sketch after this entry).
- Created an ETL pipeline using Spark and Hive to ingest data from multiple sources.
- Responsible for ETL and data validation using SQL Server Integration Services.
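The Hive SQL bullet above describes partitioned, bucketed tables; the sketch below shows one hypothetical way to express that from PySpark with Hive support enabled. Database, table, and column names are assumptions.

```python
# Minimal sketch: creating and loading a partitioned, bucketed Hive table
# from PySpark. Database, table, and column names are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-partitioned-tables")
         .enableHiveSupport()
         .getOrCreate())

# Dynamic partitioning so each txn_date lands in its own partition.
spark.conf.set("hive.exec.dynamic.partition", "true")
spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")

# Partitioned + bucketed table; bucketing by account_id co-locates rows
# for the same account, which speeds up joins and sampling on that key.
spark.sql("""
    CREATE TABLE IF NOT EXISTS txn.transactions (
        txn_id     STRING,
        account_id STRING,
        amount     DOUBLE
    )
    PARTITIONED BY (txn_date STRING)
    CLUSTERED BY (account_id) INTO 32 BUCKETS
    STORED AS ORC
""")

# Load from a (hypothetical) staging table into the partitioned table.
spark.sql("""
    INSERT INTO TABLE txn.transactions PARTITION (txn_date)
    SELECT txn_id, account_id, amount, txn_date
    FROM txn.transactions_staging
""")
```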
Data Engineer, Molina Healthcare | Oct 2019 - Oct 2020 | Bothell, Washington, United States
- Helped build a centralized data lake on the AWS Cloud utilizing primary services such as S3, EMR, Redshift, Lambda, and Glue.
- Developed scripts to automate the ingestion process using PySpark from various sources such as APIs, AWS S3, Teradata, and Redshift.
- Implemented AWS Redshift by extracting, transforming, and loading data from various heterogeneous data sources and destinations.
- Deployed Workload Management (WLM) in Redshift to prioritize basic dashboard queries over more complex, longer-running ad-hoc queries; this allowed a more reliable and faster reporting interface, with sub-second responses for basic queries.
- Built a real-time streaming pipeline utilizing Kafka, Spark Streaming, and Redshift.
- Worked on migrating datasets and ETL workloads from on-premises to the AWS Cloud.
- Built a series of Spark applications and Hive scripts to produce the analytical datasets needed by digital marketing teams; worked extensively on fine-tuning Spark applications and provided production support for pipelines running in production.
- Leveraged graphing capabilities for analyzing healthcare data.
- Worked on automating infrastructure setup and the launching and termination of EMR clusters.
- Created Kafka producers using the Kafka Producer API to connect to external REST live-stream applications and produce messages to Kafka topics.
- Implemented ETL migration services by developing AWS Lambda functions to generate a serverless data pipeline written to the AWS Glue Catalog and queried from AWS Athena.
- Developed a Python script using the Boto3 library to download files from an AWS S3 bucket, and used the script in an SSIS package for ETL processing through SQL stored procedures (see the sketch after this entry).
- Responsible for creating on-demand tables on S3 files using Lambda functions and AWS Glue with Python and PySpark.
- Developed robust pipelines to transfer data from S3 buckets to the Redshift database using Python and AWS Glue.
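The Boto3 bullet above describes downloading S3 files ahead of SSIS processing; the sketch below is a minimal, hypothetical version of that download step. The bucket name, key prefix, and local paths are assumptions.

```python
# Minimal sketch: downloading files from an S3 bucket with boto3 before
# handing them to an SSIS/stored-procedure ETL step. Bucket, prefix, and
# local paths are hypothetical.
import os
import boto3

BUCKET = "molina-etl-landing"   # hypothetical bucket name
PREFIX = "claims/2020/"         # hypothetical key prefix
LOCAL_DIR = "/data/incoming"

s3 = boto3.client("s3")

# Paginate so buckets with more than 1,000 objects are fully listed.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        if key.endswith("/"):   # skip folder placeholder keys
            continue
        target = os.path.join(LOCAL_DIR, os.path.basename(key))
        s3.download_file(BUCKET, key, target)
        print(f"downloaded s3://{BUCKET}/{key} -> {target}")
```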
Data Engineer, Grape Soft Solution | Nov 2016 - Aug 2019 | Bengaluru, Karnataka, India
- Performed end-to-end development of data warehouses, data marts, and data lakes with ETL tools such as Informatica PowerCenter and Big Data platforms (PySpark, Hive, Hadoop ecosystem).
- Created Databricks notebooks using Spark SQL, Scala, and Python, and automated notebooks using Databricks Jobs.
- Migrated data from on-premises environments to Azure using Azure Data Factory; analyzed and transformed on-premises SQL scripts and designed solutions implemented with PySpark in Databricks.
- Developed data ingestion jobs using streaming services such as Apache Kafka into data storage services in Azure and other enterprise data stores.
- Developed scalable data pipelines in Azure Databricks and ingested enriched data into the gold layer of the data lake (see the sketch after this entry).
- Built custom operator tasks in Azure Integration Services for Python-based data pipeline use cases; automated data processing pipelines with Azure Integration Services and monitored workflows end to end.
- Worked extensively on performance tuning and query optimization, contributing a 15-30% improvement in deployed code using partitioning and clustering techniques.
- Developed templates to trigger Azure jobs from requests and integrated Dataflow with Azure storage.
- Built the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and Azure "big data" technologies.
- Built and automated a data engineering ETL pipeline over Snowflake using Apache Spark; integrated data from disparate sources with Python APIs such as PySpark and consolidated it in a data mart (Star schema).
- Managed terabytes of historical data in cloud storage such as Azure Cloud Storage.
- Worked on implementing, building, and deploying CI/CD pipelines.
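The gold-layer bullet above corresponds to a common silver-to-gold enrichment step; the sketch below is a minimal, hypothetical PySpark/Delta Lake version as it might appear in a Databricks notebook. The paths, table layouts, and column names are assumptions.

```python
# Minimal sketch of a silver -> gold enrichment step, written as plain
# PySpark with Delta Lake (assumes a Databricks-like runtime with Delta
# available). Paths and columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("gold-layer-enrichment").getOrCreate()

silver = spark.read.format("delta").load("/mnt/lake/silver/orders")
dim_customer = spark.read.format("delta").load("/mnt/lake/silver/customers")

# Enrich order facts with customer attributes and a derived revenue column.
gold = (silver
        .join(dim_customer, "customer_id", "left")
        .withColumn("revenue", F.col("quantity") * F.col("unit_price")))

# Write the gold layer partitioned by order_date for cheap partition pruning.
(gold.write
 .format("delta")
 .mode("overwrite")
 .partitionBy("order_date")
 .save("/mnt/lake/gold/orders_enriched"))
```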
Big Data Engineer, Broadridge Financial Solutions | Oct 2014 - Oct 2016 | Hyderabad, Telangana, India
- Performed performance analysis and fixed issues in Spark jobs to optimize execution time and reduce the cost of execution resources.
- Ingested hundreds of millions of records daily from diverse data sources into a Cloudera Hadoop cluster in the Big Data Hadoop ecosystem.
- Utilized ZooKeeper for coordination and management of distributed systems; managed large datasets in a columnar format within a NoSQL data store.
- Utilized Kafka, Elasticsearch, and complex regexes for data ingestion, cleaning, and transformation for machine learning projects (see the sketch after this entry).
- Utilized AWS services such as S3, Glue, and Athena to build data catalogs and enable efficient data search.
- Set up Google Cloud Platform (GCP) Dataproc clusters for data transformation and analytics over Google Cloud Storage.
- Configured and managed NiFi data flow clusters for efficient data movement.
- Used Pentaho to build ETL pipelines integrating with Kafka, Elasticsearch, and AWS S3.
- Extracted and mapped data from various formats, including XML, JSON, binary, and Base64 encoding/decoding.
- Demonstrated expertise in the full software development cycle; applied object-oriented principles to design and implement scalable and maintainable solutions.
- Worked with distributed systems, multi-tier Service-Oriented Architecture (SOA), Java development, Scala, and shell scripting.
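The Kafka/Elasticsearch/regex bullet above suggests a parse-clean-index flow; the sketch below is a minimal, hypothetical version using Python's re module and the official elasticsearch-py client. The record layout, index name, and host are assumptions.

```python
# Minimal sketch: regex-based cleaning of raw pipe-delimited records,
# followed by indexing into Elasticsearch. Record layout, index name,
# and host are hypothetical.
import re
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Example raw line: "2016-03-01T10:15:00|ACCT-00123|  trade  |1,250.75"
LINE_RE = re.compile(
    r"^(?P<ts>[\d\-T:]+)\|(?P<account>ACCT-\d+)\|\s*(?P<event>\w+)\s*\|(?P<amount>[\d,\.]+)$"
)

def clean(line: str):
    m = LINE_RE.match(line.strip())
    if not m:
        return None                  # drop malformed records
    doc = m.groupdict()
    doc["amount"] = float(doc["amount"].replace(",", ""))  # strip thousands separators
    return doc

for raw in ["2016-03-01T10:15:00|ACCT-00123|  trade  |1,250.75"]:
    doc = clean(raw)
    if doc:
        es.index(index="trades", document=doc)
```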
Education
- Andhra University