Rishitha S

• 8+ years of overall experience in the IT industry and 7+ years of hands-on experience as a Data Engineer, with expertise in designing, developing, and implementing data models and data pipelines for enterprise-level applications using Big Data tools and cloud technologies such as AWS and Azure.
• Experienced in working with Azure cloud platforms (HDInsight, Data Lake, Databricks, Blob Storage, Data Factory, Azure Functions, Azure SQL Data Warehouse, and Synapse).
• Proficient in migrating on-premises data sources to Azure Data Lake, Azure SQL Database, Databricks, and Azure SQL Data Warehouse using Azure Data Factory, and in granting access to users.
• Experienced in developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns (a minimal sketch follows this summary).
• Experienced with Azure Data Factory (ADF), Integration Runtime (IR), file system data ingestion, and relational data ingestion.
• Experienced with AWS services including S3, EC2, SQS, RDS, Neptune, EMR, Kinesis, Lambda, Step Functions, Terraform, Glue, Redshift, Athena, DynamoDB, Elasticsearch, Service Catalog, CloudWatch, and IAM, and with administering AWS resources using the Console and CLI.
• Hands-on experience in building the infrastructure necessary for optimal data extraction, transformation, and loading from a range of data sources using NoSQL and SQL on AWS and Big Data technologies (DynamoDB, Kinesis, S3, Hive/Spark).
• Developed and deployed a variety of Lambda functions using the built-in AWS Lambda libraries, as well as Lambda functions written in Scala with custom libraries.
• Capable of using AWS utilities such as EMR, S3, and CloudWatch to run and monitor Hadoop and Spark jobs on Amazon Web Services (AWS).
• Strong knowledge of working with Amazon EC2 to provide a complete solution for computing, query processing, and storage across a wide range of applications.
• Experienced in configuring Spark Streaming to receive real-time data from Apache Kafka and store the stream data to HDFS, with expertise in using Spark SQL with various data sources such as JSON, Parquet, and Hive.
• Extensively used the Spark DataFrames API on the Cloudera platform to perform analytics on Hive data, and used DataFrame operations to perform the required validations on the data.
• Expertise in developing production-ready Spark applications utilizing Spark Core, DataFrames, Spark SQL, Spark ML, and the Spark Streaming API.
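The Spark SQL extraction and aggregation work called out above could look like the following minimal PySpark sketch; the paths, column names, and the daily-usage metric are illustrative assumptions, not taken from the resume itself.

```python
# Minimal PySpark sketch of the multi-format extract/transform/aggregate
# pattern summarized above. Paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("usage-patterns").getOrCreate()

# Extract: the same logical feed arrives in two file formats.
json_events = spark.read.json("/mnt/raw/events/json/")
parquet_events = spark.read.parquet("/mnt/raw/events/parquet/")

# Transform: align the schemas and union the two sources.
cols = ["customer_id", "event_type", "event_ts"]
events = json_events.select(*cols).unionByName(parquet_events.select(*cols))

# Aggregate: daily event counts per customer expose usage patterns.
usage = (events
         .withColumn("event_date", F.to_date("event_ts"))
         .groupBy("customer_id", "event_date")
         .agg(F.count("*").alias("events_per_day")))

usage.write.mode("overwrite").parquet("/mnt/curated/usage_patterns/")
```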
Senior Data Engineer, Molina Health Care
Jun 2023 - Present | St Paul, Minnesota, United States
• Extensively worked with the Azure cloud platform (HDInsight, Data Lake, Databricks, Blob Storage, Data Factory, Synapse, SQL, SQL DB, DWH, and Data Storage Explorer).
• Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
• Designed and configured Azure Cloud relational servers and databases, analyzing current and future business requirements.
• Developed data ingestion pipelines on an Azure HDInsight Spark cluster using Azure Data Factory and Spark SQL.
• Configured input and output bindings of an Azure Function with an Azure Cosmos DB collection to read and write data from the container whenever the function executes.
• Developed robust ETL pipelines in Azure Data Factory (ADF) using Linked Services to pull from different sources, and loaded the data into Azure SQL Data Warehouse.
• Developed elastic pool databases and scheduled Elastic Jobs to execute T-SQL procedures.
• Developed Spark applications in Azure Databricks using Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
• Performed ETL operations in Azure Databricks by connecting to different relational database source systems using JDBC connectors (see the sketch below).
• Migrated data from Azure Blob Storage to Azure Data Lake using Azure Data Factory (ADF).
• Developed robust and scalable ETL applications from Azure Data Lake to the data warehouse for Medicaid and Medicare data using Azure Databricks.
• Built and automated a data engineering ETL pipeline over Snowflake using Apache Spark, integrated data from disparate sources with Python APIs such as PySpark, and consolidated it in a data mart (star schema).
• Ingested data in mini-batches and performed RDD transformations on those mini-batches using Spark Streaming to run streaming analytics in Databricks.
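A minimal sketch of the JDBC-based Databricks ETL described above, assuming it runs in a Databricks notebook where `spark` and `dbutils` are predefined; the server, secret scope, table, and column names are placeholders.

```python
# Hedged Databricks notebook sketch: `spark` and `dbutils` are provided by
# the notebook runtime. Server, scope, and table names are placeholders.
claims = (spark.read.format("jdbc")
          .option("url", "jdbc:sqlserver://onprem-sql:1433;databaseName=claims")
          .option("dbtable", "dbo.medicaid_claims")
          .option("user", dbutils.secrets.get("etl-scope", "sql-user"))
          .option("password", dbutils.secrets.get("etl-scope", "sql-password"))
          .load())

# Light validation before the warehouse load: drop duplicate claims and
# rows with no status.
curated = (claims
           .dropDuplicates(["claim_id"])
           .filter("claim_status IS NOT NULL"))

# Land the curated set in the data lake; ADF then loads it into the
# Azure SQL Data Warehouse / Synapse layer.
(curated.write
        .format("delta")
        .mode("overwrite")
        .save("/mnt/datalake/curated/claims"))
```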
Data Engineer, Amway
Sep 2021 - May 2023 | Ada, Michigan, United States
• Involved in designing and deploying multi-tier applications using AWS services (EC2, Route 53, S3, RDS, DynamoDB, SNS, SQS, and IAM), focusing on high availability, fault tolerance, and auto-scaling with AWS CloudFormation.
• Developed AWS data pipelines from various data sources, using AWS API Gateway to receive responses from AWS Lambda, converting the responses into JSON format, and storing them in Amazon Redshift.
• Developed scalable AWS Lambda code in Python for processing nested JSON files, including converting, comparing, and sorting (a hedged sketch follows this entry).
• Developed Spark applications using Scala and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
• Optimized the performance and efficiency of existing Spark jobs and converted MapReduce scripts to Spark SQL.
• Collected data from an AWS S3 bucket in real time using Spark Streaming, performed the appropriate transformations and aggregations, and persisted the data in HDFS.
• Implemented the AWS Glue Data Catalog with a crawler to pull data from S3 and perform SQL query operations.
• Developed robust and scalable data integration pipelines to transfer data from the S3 bucket to the Redshift database using Python and AWS Glue.
• Built and maintained a Hadoop cluster on AWS EMR and used AWS services such as EC2 and S3 for small-dataset processing and storage.
• Developed Python code for tasks, dependencies, and time sensors for each job, using Airflow for workflow management and automation.
• Scheduled Spark/Scala jobs using Oozie workflows in the Hadoop cluster and generated detailed design documentation for the source-to-target transformations.
• Designed reports and interactive dashboards in Tableau based on business requirements.
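The nested-JSON Lambda work described above might look like this hedged sketch: an API Gateway-triggered handler that flattens a nested payload and stages it to S3 for a downstream Redshift COPY. The bucket name and key layout are assumptions.

```python
# Hedged sketch of a Python Lambda handler for nested JSON: flatten the
# payload from API Gateway and stage it to S3 for a downstream Redshift
# COPY. The bucket name and key layout are assumptions.
import json
import boto3

s3 = boto3.client("s3")

def flatten(obj, prefix=""):
    """Recursively flatten nested dicts into dot-separated keys."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, f"{name}."))
        else:
            flat[name] = value
    return flat

def handler(event, context):
    # API Gateway proxy integration delivers the request body as a string.
    record = flatten(json.loads(event["body"]))
    s3.put_object(
        Bucket="example-staging-bucket",
        Key=f"incoming/{context.aws_request_id}.json",
        Body=json.dumps(record),
    )
    return {"statusCode": 200, "body": json.dumps({"stored": True})}
```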
Data Engineer, Infosys
Feb 2019 - Aug 2021 | Columbia, South Carolina, United States
• Extracted data from HDFS, including customer behavior, sales and revenue, and supply chain and logistics data.
• Transferred the data to AWS S3 using Apache NiFi, an open-source data integration tool that enables powerful and scalable dataflows.
• Validated and cleaned the data using Python scripts before storing it in S3.
• Used PySpark, a distributed computing framework for big data processing with a Python API, to process and transform the data.
• Loaded the transformed data into the AWS Redshift data warehouse for analysis.
• Scheduled the pipeline using Apache Oozie, a workflow scheduler system for managing Apache Hadoop jobs.
• Developed and maintained a library of custom Airflow DAG templates and operators, which improved consistency and code quality across the team (a sketch of the pattern follows this entry).
• Led a team of three data engineers in designing and implementing a complex data ingestion and processing pipeline for a new data source, which reduced time to insights by 50%.
• Analyzed the data in HDFS using Apache Hive, a data warehouse system that facilitates querying and managing large datasets.
• Converted Hive queries into PySpark transformations using PySpark RDDs and the DataFrame API.
• Monitored the data pipeline and applications using Grafana.
• Configured ZooKeeper to support distributed applications.
• Used functional programming concepts and the Scala collections framework to store and process complex data.
• Used GitHub as the version control system for managing code changes.
• Developed visualizations and dashboards using Tableau for reporting and business intelligence purposes.
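One plausible shape for the custom Airflow DAG template library mentioned above, assuming Airflow 2.4 or later; the factory name, default arguments, and task ids are illustrative, not taken from the original codebase.

```python
# Illustrative Airflow DAG factory (assumes Airflow 2.4+). The factory
# name, defaults, and task ids are assumptions about the template library.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

DEFAULT_ARGS = {
    "owner": "data-eng",
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}

def build_etl_dag(dag_id, extract_fn, transform_fn, load_fn, schedule="@daily"):
    """Stamp out a standard extract -> transform -> load DAG."""
    dag = DAG(
        dag_id,
        default_args=DEFAULT_ARGS,
        schedule=schedule,
        start_date=datetime(2024, 1, 1),
        catchup=False,
    )
    with dag:
        extract = PythonOperator(task_id="extract", python_callable=extract_fn)
        transform = PythonOperator(task_id="transform", python_callable=transform_fn)
        load = PythonOperator(task_id="load", python_callable=load_fn)
        extract >> transform >> load
    return dag
```

A pipeline module would then register concrete DAGs, e.g. `globals()["sales_etl"] = build_etl_dag("sales_etl", extract, transform, load)`, so the Airflow scheduler can discover them.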
Data Analyst, Karvy Data Management Services Limited
Apr 2017 - Nov 2018 | Hyderabad, Telangana, India
• Extensively used Informatica client tools: PowerCenter Designer, Workflow Manager, Workflow Monitor, and Repository Manager.
• Used Kafka for live streaming data and performed analytics on it; worked with Sqoop to transfer data between relational databases and Hadoop.
• Built real-time data pipelines using Kafka for streaming data ingestion and Spark Streaming for real-time consumption and processing (a sketch follows this entry).
• Loaded data from web servers and Teradata using Sqoop, Flume, and the Spark Streaming API.
• Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on AWS; implemented a Python-based distributed random forest via Python streaming.
• Wrote multiple MapReduce programs for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed formats.
• Extracted data from various heterogeneous sources such as Oracle and flat files.
• Developed complex mappings using the Informatica PowerCenter tool.
• Extracted data from Oracle, flat files, and Excel files, and applied Joiner, Expression, Aggregator, Lookup, Stored Procedure, Filter, Router, and Update Strategy transformations to load data into the target systems.
• Worked with the data modeler in developing star schemas.
• Involved in analyzing the existence of source feeds in the existing CSDR database.
• Handled a high volume of day-to-day Informatica workflow migrations.
• Reviewed Informatica ETL design documents and worked closely with development to ensure correct standards were followed.
• Wrote SQL queries against the repository database to find deviations from the company's ETL standards in user-created objects such as sources, targets, transformations, log files, mappings, sessions, and workflows.
• Leveraged existing PL/SQL scripts for daily ETL operations.
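The resume names Spark Streaming; the sketch below shows the same Kafka consumption pattern in PySpark's Structured Streaming API as an illustration. The broker address and topic name are placeholders, and the spark-sql-kafka connector must be on the classpath.

```python
# Illustrative PySpark Structured Streaming job for the Kafka pipeline
# described above; broker and topic names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka-consumer").getOrCreate()

stream = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "transactions")
          .load())

# Kafka delivers bytes; cast the value and count events per minute window.
counts = (stream
          .selectExpr("CAST(value AS STRING) AS payload", "timestamp")
          .groupBy(F.window("timestamp", "1 minute"))
          .count())

query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .start())
query.awaitTermination()
```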
Data Analyst, Karvy Data Management Services Limited
Oct 2015 - Mar 2017 | Hyderabad, Telangana, India
• Worked on projects involving machine learning, big data, data visualization, R and Python development, Unix, and SQL.
• Performed exploratory data analysis using NumPy, Matplotlib, and pandas (a pandas sketch follows this entry).
• Applied quantitative analysis, data mining, and data presentation to see beyond the numbers and understand trends and insights.
• Analyzed data with Python libraries including pandas, NumPy, SciPy, and Matplotlib.
• Created complex SQL queries and scripts to extract and aggregate data and validate its accuracy; gathered business requirements and translated them into clear and concise specifications and queries.
• Prepared high-level analysis reports with Excel and Tableau, and provided feedback on data quality, including identification of billing patterns and outliers.
• Worked with Tableau sorts and filters, including basic sorting, basic filters, quick filters, context filters, condition filters, top filters, and filter operations.
• Identified and documented data quality limitations that jeopardized internal and external analysts' work; wrote standard SQL queries to perform data validation; created Excel summary reports (pivot tables and charts); and gathered analytical data to develop functional requirements using data modeling and ETL tools.
• Read data from different sources such as CSV files, Excel, HTML pages, and SQL, performed data analysis, and wrote the results back to CSV, Excel, or a database.
• Used lambda functions with filter(), map(), and reduce() on pandas DataFrames to perform various operations.
• Used the pandas API for time-series analysis and created a regression test framework for new code.
• Developed and handled business logic through backend Python code.
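A small pandas sketch of the exploratory analysis and outlier identification described above; the CSV file, column names, and the three-sigma threshold are hypothetical.

```python
# Small pandas sketch of the exploratory analysis described above; the
# file, column names, and three-sigma outlier rule are hypothetical.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("billing_records.csv", parse_dates=["billed_on"])

# Summary statistics, then flag unusually large billed amounts.
stats = df["amount"].describe()
threshold = df["amount"].mean() + 3 * df["amount"].std()
outliers = df[df["amount"] > threshold]
print(stats, f"\n{len(outliers)} outlier rows above {threshold:.2f}")

# Monthly billing trend, the kind of view exported to Excel/Tableau reports.
monthly = df.set_index("billed_on")["amount"].resample("M").sum()
monthly.plot(title="Monthly billed amount")
plt.tight_layout()
plt.savefig("monthly_billing.png")
```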
Education Details
Miracle Educational Society Group Of Institutions, Computer Science