Rahul V is a Senior Data Engineer at Apex Health.
Data Engineer | Apex Health | Jan 2023 - Present | Tampa, FL, US
● Established data pipelines that continuously synchronized data from Azure Cosmos DB to Azure Data Lake for downstream processing and analysis, ensuring data freshness for real-time reporting.
● Implemented data governance and access policies for Azure Data Lake, using Azure Active Directory (AAD) integration and role-based access control (RBAC) to secure sensitive data.
● Built complex data transformations and aggregations in Azure Databricks using PySpark, Scala, and SQL, preparing data for machine learning model training and business reporting.
● Translated business problems into Big Data solutions; defined Big Data strategy and roadmap; installed, configured, and maintained data pipelines.
● Developed features, scenarios, and step definitions for BDD (Behavior-Driven Development) and TDD (Test-Driven Development) using Cucumber, Gherkin, and Ruby.
● Designed the business requirement collection approach based on project scope and SDLC methodology.
● Created pipelines in Azure Data Factory (ADF) using Linked Services, Datasets, and Pipelines to extract, transform, and load data between sources such as Azure SQL, Blob Storage, Azure SQL Data Warehouse, and write-back tools.
● Extracted files from Hadoop and dropped them into S3 on a daily and hourly basis; worked with data governance and data quality teams to design models and processes.
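The ADF work above follows ADF's Linked Service / Dataset / Pipeline pattern. As a rough sketch of what such a pipeline definition looks like, the snippet below builds the JSON body for a single Copy activity; the pipeline, dataset, and activity names are hypothetical, not taken from the resume, and the exact property shape should be checked against the ADF REST documentation.

```python
import json

# Illustrative sketch (all names hypothetical): an ADF pipeline definition with
# one Copy activity moving data from a Blob storage dataset to an Azure SQL dataset.
def build_copy_pipeline(source_dataset: str, sink_dataset: str) -> dict:
    return {
        "name": "CopyBlobToAzureSql",
        "properties": {
            "activities": [
                {
                    "name": "CopyFromBlob",
                    "type": "Copy",
                    "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
                    "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
                    "typeProperties": {
                        "source": {"type": "BlobSource"},
                        "sink": {"type": "SqlSink", "writeBatchSize": 10000},
                    },
                }
            ]
        },
    }

pipeline = build_copy_pipeline("BlobInputDataset", "AzureSqlOutputDataset")
print(json.dumps(pipeline, indent=2))
```

In practice this JSON would be authored in the ADF UI or deployed via ARM templates rather than built by hand; the dict form just makes the Linked Service → Dataset → Pipeline layering explicit.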
Big Data Developer | Capital One | Aug 2019 - Dec 2021 | McLean, VA, US
● Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats.
● Used SSIS to build automated multi-dimensional cubes and imported table definitions and metadata using IBM InfoSphere DataStage Manager.
● Used Spark Streaming with Python to receive real-time data from Kafka and stored the stream data in HDFS and NoSQL databases such as HBase and Cassandra.
● Developed a Spark Streaming application to read raw packet data from Kafka topics, format it as JSON, and push it back to Kafka for future use cases.
● Collected data from an AWS S3 bucket in near-real time using Spark Streaming, performed transformations and aggregations on the fly to build the common learner data model, and persisted the data in HDFS.
● Analyzed existing SQL and IBM InfoSphere DataStage jobs for better performance.
● Validated test data in DB2 tables on mainframes and on Teradata using SQL queries.
● Identified and documented functional, non-functional, and other related business decisions for implementing Actimize-SAM to comply with AML regulations.
● Developed shell scripts for running IBM InfoSphere DataStage jobs and transferring files to other internal teams and external vendors.
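The Kafka-to-Kafka bullet above describes parsing raw packet records and re-emitting them as JSON. The core transformation can be sketched in plain Python, independent of the Spark Streaming and Kafka wiring; the record delimiter and field names here are hypothetical assumptions, not details from the resume.

```python
import json
from datetime import datetime, timezone

# Hypothetical raw packet record: "timestamp|src_ip|dst_ip|bytes"
# (the delimiter and fields are illustrative assumptions).
def packet_to_json(raw: str) -> str:
    ts, src, dst, size = raw.strip().split("|")
    record = {
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "event_ts": ts,
        "src_ip": src,
        "dst_ip": dst,
        "bytes": int(size),
    }
    return json.dumps(record)

# In the streaming job, this function would be mapped over each Kafka message
# before the result is produced back to an output topic.
print(packet_to_json("2021-06-01T12:00:00Z|10.0.0.1|10.0.0.2|512"))
```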
Hadoop Developer | Bank of America | Jan 2016 - Aug 2019 | Charlotte, NC, US
● Developed ETL processes (DataStage Open Studio) to load data from multiple sources into HDFS using Flume and Sqoop, and performed structural modifications using MapReduce and Hive.
● Developed Spark scripts and UDFs using both the Spark DSL and Spark SQL for data aggregation and querying, writing data back into RDBMS through Sqoop.
● Wrote multiple MapReduce jobs using the Java API, Pig, and Hive for data extraction, transformation, and aggregation from file formats including Parquet, Avro, XML, JSON, CSV, and ORC, with compression codecs such as gzip, Snappy, and LZO.
● Applied a strong understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
● Interacted with business partners, business analysts, and product owners to understand requirements and build scalable distributed data solutions on the Hadoop ecosystem.
● Developed Spark Streaming programs to process near-real-time data from Kafka with both stateless and stateful transformations.
● Wrote reports using SQL Server Reporting Services (SSRS), including drill-down, parameterized, cascading, conditional, table, matrix, chart, and sub-reports.
● Used the DataStax Spark connector to store data into and retrieve data from a Cassandra database.
● Wrote Oozie scripts and set up workflows using the Apache Oozie workflow engine for managing and scheduling Hadoop jobs.
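The stateless-versus-stateful distinction in the streaming bullet above can be sketched in plain Python, outside Spark: a stateless step handles each event independently, while a stateful step keeps a running aggregate across the stream (analogous to updateStateByKey/mapWithState in Spark Streaming). The event shape below is a hypothetical example.

```python
from collections import defaultdict

# Stateless transformation: each event is handled on its own.
def normalize(event: dict) -> dict:
    return {"user": event["user"].lower(), "amount": float(event["amount"])}

# Stateful transformation: a per-key running total is carried across events.
def running_totals(events):
    state = defaultdict(float)
    for event in map(normalize, events):
        state[event["user"]] += event["amount"]
        yield event["user"], state[event["user"]]

stream = [
    {"user": "Alice", "amount": "10.0"},
    {"user": "BOB", "amount": "5.0"},
    {"user": "alice", "amount": "2.5"},
]
print(list(running_totals(stream)))  # running total per user after each event
```

In a real Spark Streaming job the state lives in checkpointed RDD/DStream state rather than a local dict, but the per-key update logic is the same idea.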
Software Engineer | Mindtree | May 2013 - Dec 2015 | Bangalore, Karnataka, IN
● Implemented Lambda to configure the DynamoDB autoscaling feature and implemented a data access layer for AWS DynamoDB data.
● Automated the nightly build to run quality control using Python with the Boto3 library to ensure the pipeline does not fail, reducing effort by 70%.
● Worked with AWS services such as SNS to send automated emails and messages via Boto3 after the nightly run.
● Developed tools that automate AWS server provisioning and application deployments, and implemented basic failover among regions through the AWS SDKs.
● Created AWS Lambda functions, provisioned EC2 instances, implemented security groups, and administered Amazon VPCs.
● Used Jenkins pipelines to drive all microservice builds out to the Docker registry, then deployed to Kubernetes; created and managed Pods with Kubernetes.
● Developed Ansible playbooks with Python and SSH as a wrapper for managing AWS node configurations, and tested playbooks on AWS instances.
● Developed Python AWS serverless Lambdas with concurrency and multi-threading to speed up processing by executing callables asynchronously.
● Implemented CloudTrail to capture events related to API calls made to AWS infrastructure.
● Monitored containers on AWS EC2 machines using the Datadog API; ingested and enriched data into the internal cache system.
● Chunked larger data sets into smaller pieces using Python scripts for faster data processing.
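The chunking approach in the final bullet is commonly implemented as a small generator; a minimal sketch (the chunk size of 4 is an arbitrary example):

```python
def chunked(items, size):
    """Yield successive fixed-size chunks from a sequence;
    the last chunk may be shorter than `size`."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

records = list(range(10))
print(list(chunked(records, 4)))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Each chunk can then be processed (or uploaded) independently, which is what makes downstream processing faster and parallelizable.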
Rahul V — Education
JNTUH College of Engineering Hyderabad, Computer Science
Frequently Asked Questions about Rahul V
What company does Rahul V work for?
Rahul V works for Apex Health.
What is Rahul V's role at the current company?
Rahul V's current role is Senior Data Engineer.
What schools did Rahul V attend?
Rahul V attended JNTUH College of Engineering Hyderabad.