Rakesh D
• 9+ years of professional experience as a Data Engineer, including analytical programming and coding. Experience in Google Cloud Platform (GCP) using cloud-native tools such as BigQuery, Cloud Composer, Dataproc, Google Cloud Storage, Cloud Dataflow, and Cloud Data Fusion.
• Experience with AWS services including RDS, Networking, Route 53, IAM, S3, EC2, EBS, and VPC, and administering AWS resources using the Console and CLI.
• Hands-on experience in GCP: BigQuery, GCS, Cloud Functions, Cloud Dataflow, Pub/Sub, Cloud Shell, gsutil, the bq command-line utility, and Dataproc.
• Comprehensive working experience implementing Big Data projects using Apache Hadoop, Pig, Hive, HBase, Spark, Sqoop, Flume, Zookeeper, and Oozie.
• Experience working on the Hortonworks, Cloudera, and MapR distributions.
• Excellent working knowledge of the HDFS filesystem and Hadoop daemons such as ResourceManager, NodeManager, NameNode, DataNode, Secondary NameNode, and containers.
• In-depth understanding of Apache Spark job execution components: DAG, lineage graph, DAG scheduler, task scheduler, stages, and tasks.
• Experience working on Spark and Spark Streaming.
• Hands-on experience with major Hadoop ecosystem components: MapReduce, HDFS, YARN, Hive, Pig, HBase, Sqoop, Oozie, Cassandra, Impala, and Flume.
• Knowledge of installing, configuring, and using Hadoop ecosystem components such as MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, Spark, Kafka, Storm, Zookeeper, and Flume.
• Experience with the Hadoop 2.0 YARN architecture and developing YARN applications on it.
• Performed performance tuning to ensure assigned systems were patched, configured, and optimized for maximum functionality and availability; implemented solutions that reduced single points of failure and improved system uptime to 99.9% availability.
• Experience with distributed systems, large-scale non-relational data stores, and multi-terabyte data warehouses.
• Experience handling Python and Spark contexts when writing PySpark programs for ETL (see the sketch after this list).
• Experience managing and reviewing Hadoop log files.
• Real-time Hadoop/Big Data experience across storage, querying, processing, and analysis of data.
• Experience setting up Hadoop clusters on cloud platforms such as AWS.
• Customized dashboards and handled identity and access management in AWS.
• Worked with data serialization formats (Avro, Parquet, JSON, CSV) to convert complex objects into sequences of bits.
• Expertise in extending Hive and Pig core functionality by writing custom UDFs and UDAFs.
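The PySpark ETL work mentioned above typically follows an extract-cleanse-load shape. Below is a minimal, hypothetical sketch of that pattern; the bucket paths, column names, and table layout are illustrative assumptions, not details from this profile.

```python
# Minimal PySpark ETL sketch; all paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-example").getOrCreate()

# Extract: read raw CSV landed in cloud storage (path is illustrative).
raw = spark.read.option("header", True).csv("gs://example-bucket/raw/orders/")

# Transform: basic cleansing of the kind described above.
cleaned = (
    raw.dropDuplicates(["order_id"])
       .filter(F.col("order_id").isNotNull())
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("order_date", F.to_date("order_ts"))
)

# Load: write partitioned Parquet back to storage.
cleaned.write.mode("overwrite").partitionBy("order_date").parquet(
    "gs://example-bucket/curated/orders/"
)
```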
UPS Capital®
Website: upscapital.com
Employees: 3,295
GCP Data Engineer, UPS Capital® | Oct 2023 - Present | Atlanta, Georgia, United States
• Designed and implemented robust data pipelines and infrastructure using Python and SQL, optimizing data movement efficiency and processing times.
• Executed comprehensive data quality measures and monitoring systems, ensuring compliance with regulations and strengthening data governance.
• Collaborated cross-functionally to translate complex business needs into data solutions, improving the accuracy of data-driven decision-making.
• Engineered and optimized data pipelines using Airflow, Kafka, and Luigi, streamlining data flow and reducing latency (see the Airflow sketch after this list).
• Implemented data quality tools, Great Expectations and Trifacta, to reduce errors and accelerate the identification of data anomalies.
• Utilized cloud technologies, with AWS as the primary platform, to implement cost-effective and scalable solutions.
• Engaged in full systems life-cycle management, providing key insights for analyses, technical requirements, and coding.
• Implemented version control using Git, ensuring seamless collaboration and tracking of code changes.
• Executed strategic optimizations in data warehousing solutions, including BigQuery, Snowflake, and Redshift.
• Played a pivotal role in the adoption of additional cloud platforms such as GCP, expanding the company's cloud capabilities.
• Conducted training sessions for the team on Python and SQL integration for enhanced data manipulation.
• Implemented automated monitoring for IoT devices, reducing response time to device issues.
• Developed and maintained documentation for data pipelines and infrastructure, streamlining knowledge transfer.
• Collaborated with external vendors to integrate APIs, reducing data integration time.
• Participated in industry conferences and forums to stay abreast of the latest trends and technologies.
• Assisted in onboarding new team members, providing mentorship and support.
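As a rough illustration of the Airflow-on-GCP pipeline work described in this role, here is a minimal DAG sketch using the Google provider's BigQueryInsertJobOperator. The DAG name, project, dataset, and SQL are hypothetical placeholders, not details from this profile.

```python
# Minimal Airflow DAG sketch; dag_id, project, dataset, and SQL are
# hypothetical placeholders for illustration only.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import (
    BigQueryInsertJobOperator,
)

with DAG(
    dag_id="daily_orders_load",          # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Run a transformation in BigQuery and materialize the result.
    load_orders = BigQueryInsertJobOperator(
        task_id="load_orders",
        configuration={
            "query": {
                "query": "SELECT * FROM `example-project.raw.orders`",
                "destinationTable": {
                    "projectId": "example-project",
                    "datasetId": "curated",
                    "tableId": "orders",
                },
                "writeDisposition": "WRITE_TRUNCATE",
                "useLegacySql": False,
            }
        },
    )
```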
Big Data Engineer, Sam's Club | Aug 2022 - Sep 2023 | Bentonville, Arkansas, United States
• Experience with the complete SDLC process: staging, code reviews, source code management, and the build process.
• Experience building and architecting multiple data pipelines, with end-to-end ETL and ELT processes for data ingestion and transformation in GCP, coordinating tasks among the team.
• Implemented Big Data platforms using Cloudera CDH4 for data storage, retrieval, and processing.
• Developed a PySpark script to merge static and dynamic files and cleanse the data (see the sketch after this list).
• Built data pipelines in Airflow on GCP for ETL jobs using different Airflow operators.
• Experience with GCP Dataproc, GCS, Cloud Functions, and BigQuery.
• Experience moving data between GCP and Azure using Azure Data Factory.
• Used the Cloud Shell SDK in GCP to configure the Dataproc, Storage, and BigQuery services.
• Developed data pipelines using Flume, Sqoop, Pig, and MapReduce to ingest data into HDFS for analysis.
• Developed PySpark scripts that read MSSQL tables and push the data to the Big Data platform, where it is stored in Hive tables.
• Developed a Python application for Google Analytics aggregation and reporting, using Django configuration to manage URLs and application parameters.
• Developed Oozie workflows for daily incremental loads, pulling data from Teradata and importing it into Hive tables.
• Implemented Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation, queries, and writing data into HDFS through Sqoop.
• Developed Pig scripts to transform data into a structured format, automated through Oozie coordinators.
• Carried out data transformation and cleansing using SQL queries, Python, and PySpark.
• Developed a pipeline for continuous data ingestion using Kafka and Spark Streaming.
• Wrote Sqoop scripts for importing large datasets from Teradata into HDFS.
• Performed data ingestion from multiple internal clients using Apache Kafka.
• Wrote MapReduce jobs to discover trends in data usage by users.
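A minimal sketch of the static-plus-dynamic merge-and-cleanse step described above, assuming Spark 3.1+ for unionByName with allowMissingColumns; all paths and column names are hypothetical.

```python
# Sketch of merging a static reference file with daily dynamic files and
# cleansing the result; paths and columns are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("merge-cleanse").getOrCreate()

static_df = spark.read.parquet("gs://example-bucket/static/customers/")
dynamic_df = spark.read.parquet("gs://example-bucket/daily/customers/")

# Union the two feeds on a shared schema, then cleanse.
merged = (
    static_df.unionByName(dynamic_df, allowMissingColumns=True)
             .dropDuplicates(["customer_id"])
             .withColumn("email", F.lower(F.trim("email")))
             .filter(F.col("customer_id").isNotNull())
)

merged.write.mode("overwrite").parquet("gs://example-bucket/merged/customers/")
```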
Sr. AWS Data Engineer, AbbVie | Oct 2019 - Jul 2022 | Vernon Hills, Illinois, United States
• Developed ETL data pipelines using Sqoop, Spark, Spark SQL, Scala, and Oozie.
• Used Spark for interactive queries and processing of streaming data, and integrated it with popular NoSQL databases.
• Experience with AWS Cloud: IAM, Data Pipeline, EMR, S3, EC2.
• Supported continuous storage in AWS using Elastic Block Storage, S3, and Glacier; created volumes and configured snapshots for EC2 instances.
• Worked on ETL migration services by developing and deploying AWS Lambda functions to build a serverless data pipeline that writes to the Glue Catalog and can be queried from Athena (see the sketch after this list).
• Developed a fully automated continuous integration system using Git, Jenkins, MySQL, and custom tools developed in Python and Bash.
• Worked on AWS Elastic Beanstalk for fast deployment of applications developed in Java, PHP, Node.js, and Python on familiar servers such as Apache.
• Developed batch scripts to fetch data from AWS S3 storage and perform the required transformations.
• Developed PySpark and Spark SQL code to process data in Apache Spark on Amazon EMR, performing the necessary transformations based on the STMs developed.
• Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
• Experience with Terraform scripts that automate step execution in EMR to load data into ScyllaDB.
• Developed serverless Python AWS Lambda functions with concurrency and multithreading to speed up processing by executing callables asynchronously.
• Implemented scheduled downtime for non-prod servers to optimize AWS pricing.
• Developed a Kafka consumer API in Scala for consuming data from Kafka topics.
• Experienced in writing real-time processing and core jobs using Spark Streaming with Kafka as a data pipeline system, programmed in Scala.
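The serverless Lambda-to-Glue-to-Athena pattern mentioned above might look roughly like the sketch below: an S3-triggered handler that starts a Glue crawler so newly landed data becomes queryable in Athena. The crawler name and event wiring are assumptions for illustration.

```python
# Minimal sketch of an S3-triggered Lambda that refreshes the Glue Data
# Catalog; the crawler name is a hypothetical placeholder.
import boto3

glue = boto3.client("glue")

def handler(event, context):
    # Triggered by an S3 put event; log the keys that landed.
    for record in event.get("Records", []):
        key = record["s3"]["object"]["key"]
        print(f"New object landed: {key}")
    # Start the crawler that updates the catalog for the landing prefix,
    # making the new partitions queryable from Athena.
    glue.start_crawler(Name="example-landing-crawler")  # hypothetical name
    return {"status": "crawler started"}
```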
Data Engineer, Broadridge | Feb 2017 - Sep 2019 | North Hempstead, New York, United States
• Designed and developed Oozie and AutoSys workflows for various projects.
• Collaborated with data science and analyst teams to gather requirements and develop business-relevant stories.
• Developed Spark jobs in Python to process JSON data; used Spark SQL to perform joins and stored the data in Amazon S3.
• Used Glue to maintain data lakes and data marts, filtering data between S3, Redshift, and Glacier storage, with Athena as a quick query layer for specific use cases such as visualization in Tableau.
• Set up the Hadoop ecosystem and a Kafka cluster on AWS EC2 instances.
• Worked with different relational database systems such as Oracle PL/SQL; used Unix shell scripting and Python, with experience on AWS EMR instances.
• Implemented job workflow scheduling and monitoring tools such as Oozie and Zookeeper.
• Executed data analysis with Pig and Hive.
• Implemented Spark using Scala and Spark SQL for improved processing and testing of data.
• Employed Spark to extract data from Teradata, Netezza, and SQL Server into MongoDB after applying transformations with Spark RDDs.
• Developed end-to-end data processing pipelines, beginning with Kafka as a distributed messaging system to persist data into the relevant data objects.
• Implemented Kafka for collecting real-time transaction data, which was then processed with Spark Streaming in Python to recommend actionable insights (see the sketch after this list).
• Scheduled clusters with CloudWatch and created Lambdas to generate operational alerts for various workflows.
• Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts, and effective, efficient joins and transformations during the ingestion process itself.
• Worked on various optimization techniques to manage the processing and storage of Big Data in Hadoop.
• End-to-end ownership of the process, ensuring best practices of the Big Data stack.
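As a sketch of the Kafka-plus-Spark-Streaming ingestion described in this role, here is a minimal Structured Streaming job in Python, assuming the spark-sql-kafka connector is available on the classpath; the broker, topic, and paths are hypothetical.

```python
# Illustrative Structured Streaming job consuming transaction events from
# Kafka; broker, topic, and storage paths are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("txn-stream").getOrCreate()

events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("subscribe", "transactions")
         .load()
)

# Kafka delivers bytes; cast the value to string before parsing downstream.
parsed = events.select(F.col("value").cast("string").alias("txn_json"))

# Persist the stream as Parquet with checkpointing for fault tolerance.
query = (
    parsed.writeStream.format("parquet")
          .option("path", "s3a://example-bucket/stream/transactions/")
          .option("checkpointLocation", "s3a://example-bucket/checkpoints/txn/")
          .start()
)
query.awaitTermination()
```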
Hadoop Developer, Cybage Software | Jun 2014 - Nov 2016 | Hyderabad, Telangana, India
• Developed data pipelines using Flume, Sqoop, Pig, and MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
• Analyzed the Hadoop cluster and various Big Data analytic tools, including Pig, Hive, the HBase database, and Sqoop.
• Working knowledge of various AWS technologies such as SQS queuing, SNS notifications, S3 storage, Redshift, Data Pipeline, and EMR.
• Involved in converting Hive/SQL queries into transformations using Python.
• Designed and developed Informatica BDE applications and Hive queries to ingest the raw landing zone, transform the data with business logic into the refined zone, and load Greenplum data marts as the reporting layer for consumption through Tableau.
• Installed, configured, and maintained Big Data technologies and systems; maintained documentation and troubleshooting playbooks.
• Automated the installation and maintenance of Kafka, Storm, Zookeeper, and Elasticsearch using SaltStack.
• Performed data analysis, data migration, data cleansing, transformation, integration, import, and export through Python.
• Developed connectors for Elasticsearch and Greenplum for data transfer from a Kafka topic; performed data ingestion from multiple internal clients using Apache Kafka; developed Kafka Streams applications in Java for real-time data processing.
• Responded to and resolved access and performance issues; used the Spark API over Hadoop to perform analytics on data in Hive (see the sketch after this list).
• Explored Spark to improve the performance and optimization of existing Hadoop algorithms using the Spark context, Spark SQL, DataFrames, and Spark on YARN.
• Worked on the Oozie workflow engine for job scheduling; imported and exported data into MapReduce and Hive using Sqoop.
• Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS; good understanding of performance tuning with NoSQL, Kafka, Storm, and SQL technologies.
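A minimal sketch of using the Spark API over Hadoop to run analytics against Hive tables, as mentioned above; the database, table, and query are hypothetical examples.

```python
# Sketch of Spark-over-Hive analytics; database and table names are
# hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("hive-analytics")
                .enableHiveSupport()   # read tables from the Hive metastore
                .getOrCreate()
)

# Aggregate usage trends from a Hive table (illustrative query).
usage = spark.sql("""
    SELECT user_id, COUNT(*) AS events
    FROM analytics.user_activity
    GROUP BY user_id
    ORDER BY events DESC
    LIMIT 20
""")
usage.show()
```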
Rakesh D Education Details
Bachelor's Degree
Frequently Asked Questions about Rakesh D
What company does Rakesh D work for?
Rakesh D works for UPS Capital®.
What is Rakesh D's role at the current company?
Rakesh D's current role is Senior GCP Data Engineer at UPS | AWS | Big Data | Python | Azure | PySpark | Spark SQL | Azure Databricks | Hadoop | Snowflake | ETL | SQL | Airflow | Agile | Actively looking for new opportunities.
What schools did Rakesh D attend?
Rakesh D attended JNTUH College of Engineering Hyderabad.
Who are Rakesh D's colleagues?
Rakesh D's colleagues are Enilepena Peña, Jacqueline Ruffin, Tom Reamer, Nicola Mckenzie-Ramirez, Caroline Guesthier, Bailee Zeitler, Stephen Scruggs.