Swetha B


Senior Big Data Engineer at Safeway

Swetha B's Location
Irving, Texas, United States
About Swetha B

• Dynamic and motivated IT professional with 8+ years of experience, specializing in designing data-intensive applications using the Hadoop ecosystem, Big Data analytics, cloud data engineering, data warehousing, data visualization, reporting, and data quality solutions.
• Sound experience with AWS services such as Amazon EC2, Amazon S3, EMR, Amazon RDS, VPC, IAM, Elastic Load Balancing, Auto Scaling, CloudFront, CloudWatch, SNS, SES, and SQS.
• Extensive working experience with the Big Data ecosystem: Hadoop (HDFS, MapReduce, YARN), Spark, Kafka, Hive, Impala, HBase, Sqoop, Pig, Airflow, Oozie, ZooKeeper, Ambari, Flume, and NiFi.
• Good experience with Azure cloud components such as HDInsight, Databricks, Data Lake, Blob Storage, Data Factory, Storage Explorer, SQL DB, SQL DWH, and Cosmos DB.
• Extensive working experience building PySpark and Spark-Scala applications for interactive analysis, batch processing, and stream processing; expertise in writing Spark scripts in Python, Scala, Java, and SQL for development and analysis.

Swetha B's Current Company Details
Safeway
Senior Big Data Engineer at Safeway
United States
Website:
safeway.com
Employees:
10
Swetha B Work Experience Details
  • Safeway
    Senior Big Data Engineer
    Safeway Sep 2020 - Present
    US
    • Involved in building a data pipeline and performed analytics using the AWS stack (IAM, EMR, EC2, S3, RDS, Lambda, Athena, Glue, SQS, Redshift, and ECS).
    • Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover customer usage patterns.
    • Developed Spark jobs on Databricks to perform data cleansing, data validation, and standardization, then applied transformations per the use cases.
    • Used AWS Redshift, S3, Redshift Spectrum, and Athena to query large amounts of data stored on S3, creating a virtual data lake.
    • Created a Lambda flow that triggered data pipeline jobs to load data into RDS, execute stored procedures, and copy the data to Redshift.
    • Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift.
    • Used Spark Streaming to consume event-based data from Kafka and joined this data set with existing Hive table data to generate performance indicators for an application (illustrated in the sketch below).
    • Developed a data warehouse model in Snowflake for over 100 datasets using WhereScape; created reports in Looker based on Snowflake connections.
    • Created workflows using Airflow to automate the extraction of weblogs into the S3 data lake.
    • Responsible for running Spark jobs along with optimization, data validation, and automation.
    • Used AWS Lambda to run scripts and code snippets in response to events occurring in CloudWatch.
    • Integrated applications using Apache Tomcat servers on EC2 instances and automated data pipelines into AWS using Jenkins, Git, and Maven.
    • Developed and ran serverless Spark-based applications using the AWS Lambda service and PySpark.
    • Installed, configured, and managed RDBMS and NoSQL systems: SQL Server, MySQL, DB2, PostgreSQL, MongoDB, Cassandra.
    • Managed and deployed configurations for the entire datacenter infrastructure using Terraform.
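
A minimal PySpark sketch of the Kafka-to-Hive stream-static join described in the bullets above. The broker address, topic name, event schema, Hive table, and S3 paths are illustrative assumptions, not details taken from the profile:

```python
# Illustrative sketch only: broker, topic, schema, and table names are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = (SparkSession.builder
         .appName("event-indicators")
         .enableHiveSupport()  # needed to read the existing Hive table
         .getOrCreate())

# Assumed schema of the JSON event payload arriving on the Kafka topic.
event_schema = StructType([
    StructField("app_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

# Consume event-based data from Kafka as a structured stream.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "app-events")
          .load()
          .select(from_json(col("value").cast("string"), event_schema).alias("e"))
          .select("e.*"))

# Join the stream with existing Hive table data (the static side of the join).
app_metrics = spark.table("default.app_metrics")  # hypothetical Hive table
indicators = events.join(app_metrics, on="app_id", how="left")

# Write the derived performance indicators out; sink paths are placeholders.
query = (indicators.writeStream
         .format("parquet")
         .option("path", "s3a://bucket/indicators/")
         .option("checkpointLocation", "s3a://bucket/checkpoints/indicators/")
         .outputMode("append")
         .start())
query.awaitTermination()
```

In a stream-static join like this, the Hive side stays a plain batch table that Spark re-reads each micro-batch, so the join itself needs no streaming state.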
  • Neudesic
    Data Engineer
    Neudesic Aug 2018 - Aug 2020
    Irvine, CA, US
    • Built data pipelines using Azure Data Factory to load data from a legacy SQL Server into Azure databases, using Data Factory pipelines, API Gateway services, SSIS packages, Talend jobs, and custom Python code.
    • Worked with Azure platform services including HDInsight, Databricks, Data Lake, Blob Storage, Data Factory, Synapse, SQL DB, and SQL DWH; architected, designed, and developed business applications and data marts for reporting.
    • Involved in all phases of development, including analysis, design, coding, unit testing, integration testing, review, and release per the business requirements.
    • Performed data cleansing and applied transformations using Databricks and Spark data analysis (see the sketch after this entry).
    • Used Azure Synapse to manage processing workloads and serve data for BI and prediction needs.
    • Developed Big Data solutions focused on pattern matching and predictive modeling.
    • Implemented scalable microservices to handle concurrency and high traffic; optimized existing Scala code and improved cluster performance.
    • Reduced access time by refactoring data models and optimizing queries; implemented a Redis cache to support Snowflake.
    • Created Hive external tables to stage data, then moved the data from staging to the main tables.
    • Implemented a Big Data solution using Hadoop, Hive, and Informatica to pull and load data into HDFS.
    • Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, writing data back into the RDBMS through Sqoop.
    • Utilized Oozie workflows to run Spark, Pig, and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
    • Used Azure Data Factory with the SQL and Mongo APIs to integrate data from MongoDB, MS SQL, and cloud stores (Blob, Azure SQL DB).
    • Continuously tuned Hive UDFs for faster queries by employing partitioning and bucketing.
    • Implemented partitioning, dynamic partitions, and buckets in Hive.
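
As a rough illustration of the cleansing and standardization step mentioned above, here is a hedged PySpark sketch of the kind of pass that might run on Databricks; the column names, validation rule, and mount paths are hypothetical:

```python
# Hypothetical cleansing/standardization pass; columns, rules, and paths are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, trim, lower, to_date, when

spark = SparkSession.builder.appName("cleanse-standardize").getOrCreate()

raw = spark.read.option("header", True).csv("/mnt/raw/customers.csv")

cleansed = (raw
    # Standardize string columns: trim whitespace and normalize case.
    .withColumn("email", lower(trim(col("email"))))
    # Parse inconsistent date strings into a proper date type.
    .withColumn("signup_date", to_date(col("signup_date"), "yyyy-MM-dd"))
    # Basic validation: flag rows whose email fails a simple pattern check.
    .withColumn("is_valid",
                when(col("email").rlike(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"), True)
                .otherwise(False))
    # Drop exact duplicates and rows missing the key field.
    .dropDuplicates(["customer_id"])
    .na.drop(subset=["customer_id"]))

# Valid rows continue to the transformation stage; invalid rows are quarantined.
cleansed.filter(col("is_valid")).write.mode("overwrite").parquet("/mnt/clean/customers/")
cleansed.filter(~col("is_valid")).write.mode("overwrite").parquet("/mnt/quarantine/customers/")
```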
  • Semantic Web India Private Limited
    Big Data Developer
    Semantic Web India Private Limited Jan 2017 - Jul 2018
    Bangalore, Karnataka, IN
    - Worked as a Data Engineer to review business requirements and compose source-to-target data mapping documents.
    - Responsible for building scalable and distributed data solutions using Cloudera CDH.
    - Researched, evaluated, architected, and deployed new tools, frameworks, and patterns to build sustainable Big Data platforms.
    - Gathered data and performed analytics using the AWS stack (EMR, EC2, S3, RDS, Lambda, Redshift).
    - Designed and developed architecture for a data services ecosystem spanning relational, NoSQL, and Big Data technologies.
    - Responsible for data architecture design delivery, data model development, review, approval, and data warehouse implementation.
    - Designed and developed the conceptual, then logical and physical, data models to meet reporting needs.
    - Implemented Spark Scala UDFs to handle data quality and to filter and validate data sets (an equivalent PySpark sketch follows this entry); also involved in converting Java analytical applications to Scala.
    - Involved in converting Java MapReduce jobs to Scala UDFs and improved their performance.
    - Loaded data from web servers using Flume and the Spark Streaming API; used a Flume sink to write directly to indexers deployed on the cluster, allowing indexing during ingestion.
    - Used Cloudera Hue and Zeppelin notebooks to interact with the HDFS cluster; used Cloudera Manager, Search, and Navigator to configure and monitor resource utilization across the cluster.
    - Involved in designing and developing data models and data marts.
    - Generated ad-hoc SQL queries using joins, database connections, and transformation rules to fetch data from MySQL Workbench and SQL Server database systems.
    - Managed the metadata for the subject-area models in the data warehouse environment.
    - Generated DDL and created the tables and views in the corresponding architectural layers.
    - Handled importing of data from various data sources, performed transformations using MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
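
The profile describes these data-quality UDFs in Scala; the sketch below shows the same idea in PySpark, with an assumed field name and a made-up format rule:

```python
# Data-quality UDF sketch in PySpark (the profile's originals were Scala);
# the input path, the "ssn" field, and the validation rule are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col
from pyspark.sql.types import BooleanType

spark = SparkSession.builder.appName("dq-udf").getOrCreate()

def valid_ssn(value):
    """Return True when the value looks like a well-formed SSN (###-##-####)."""
    if value is None:
        return False
    parts = value.split("-")
    return (len(parts) == 3
            and tuple(len(p) for p in parts) == (3, 2, 4)
            and all(p.isdigit() for p in parts))

valid_ssn_udf = udf(valid_ssn, BooleanType())

records = spark.read.parquet("/data/members/")         # hypothetical input
clean = records.filter(valid_ssn_udf(col("ssn")))      # keep rows that validate
rejected = records.filter(~valid_ssn_udf(col("ssn")))  # route the rest to review
```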
  • UHG
    Big Data Developer
    UHG Jan 2016 - Dec 2016
    - Worked with the Hortonworks distribution; installed, configured, and maintained a Hadoop cluster based on the business requirements.
    - Involved in end-to-end implementation of ETL pipelines using Python and SQL for high-volume analytics; also reviewed use cases before onboarding them to HDFS.
    - Experience with Apache Big Data components including HDFS, MapReduce, YARN, Hive, HBase, Sqoop, Pig, Ambari, and NiFi; responsible for writing rack topology scripts and Java MapReduce programs to parse raw data.
    - Migrated from JMS Solace to Apache Kafka; used ZooKeeper to manage synchronization, serialization, and coordination across the cluster.
    - Responsible for loading, managing, and reviewing terabytes of log files using the Ambari web UI.
    - Used Sqoop to migrate data between traditional RDBMS and HDFS; ingested data from MS SQL, Teradata, and Cassandra databases.
    - Implemented various Hive queries for analytics; created external tables, optimized Hive queries, and improved cluster performance by 30%.
    - Identified required tables and views and exported them into Hive; performed ad-hoc queries using Hive joins and partitioning and bucketing techniques for faster data access (sketched below).
    - Used NiFi to automate data flow between disparate systems; designed dataflow models and complex target tables to obtain relevant metrics from various sources.
    - Migrated ETL jobs to Pig scripts to apply joins, aggregations, and transformations.
    - Developed Bash scripts to fetch log files from an FTP server and executed Hive jobs to parse them.
    - Performed data analysis using HiveQL, Pig Latin, and custom MapReduce programs in Java.
    - Enhanced scripts of existing Python modules; worked on writing APIs to load the processed data into HBase tables.
    - Used Power BI as a front-end BI tool and MS SQL Server as a back-end database to design and develop dashboards, workbooks, and complex aggregate calculations.
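
A sketch of the external-table, dynamic-partitioning, and bucketing patterns referenced above, driven from PySpark with Hive support; every table, column, and path name here is an assumption:

```python
# Hive partitioning/bucketing sketch; table, column, and path names are assumptions.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-partitioning")
         .enableHiveSupport()
         .getOrCreate())

# External table staged over raw log files already sitting in HDFS.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS staging_logs (
        user_id STRING,
        action  STRING,
        ts      TIMESTAMP
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'
    LOCATION '/data/raw/logs/'
""")

# Main table partitioned by date so queries can prune whole partitions.
spark.sql("""
    CREATE TABLE IF NOT EXISTS logs (
        user_id STRING,
        action  STRING,
        ts      TIMESTAMP
    )
    PARTITIONED BY (event_date STRING)
    STORED AS ORC
""")

# Dynamic partitioning: the partition value is derived per row at insert time.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("""
    INSERT INTO TABLE logs PARTITION (event_date)
    SELECT user_id, action, ts, CAST(to_date(ts) AS STRING) AS event_date
    FROM staging_logs
""")

# Bucketing for faster joins on user_id, using Spark's native writer as one approach.
(spark.table("staging_logs")
 .write.mode("overwrite")
 .bucketBy(32, "user_id")
 .sortBy("user_id")
 .saveAsTable("logs_bucketed"))
```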
  • Punjab National Bank Housing Finance Limited
    Java Application Developer
    Punjab National Bank Housing Finance Limited May 2015 - Dec 2015
    Pune, 410105, IN
    - Developed an intranet application using Spring (J2EE), Oracle, Okta, Redis, and Postman with a microservices architecture and REST services.
    - Created quality, working J2EE code to the design, schedule, and cost targets for implementing the use cases.
    - Used the Spring Framework to develop the entire business-tier module; built and deployed it to WebSphere Application Server.
    - Used the Spring application context for configuring and creating the various beans for the entire application.
    - Used Hibernate to develop the ORM layer for interacting with the Oracle database.
    - Involved in writing stored procedures for interacting with the Oracle database, MongoDB, and Cassandra.
    - Developed, tested, and deployed the application on IBM WebSphere using RAD.
    - Incorporated UML diagrams (class diagrams, activity diagrams, sequence diagrams) as part of the design and system documentation.
    - Involved in all phases of the software development life cycle (requirements gathering, analysis, design, development, testing, and maintenance).
    - Involved in the CI/CD process using Jenkins and Git; migrated from SiteMinder (single sign-on) to Okta and OAuth 2.0; used JUnit and Mockito for writing unit tests.
    - Used Log4j and Splunk for analyzing application performance, and JFrog to store the build artifacts.
    - Worked on front-end enhancements and added functionality using JavaScript, HTML, CSS, and Bootstrap.

Frequently Asked Questions about Swetha B

What company does Swetha B work for?

Swetha B works for Safeway.

What is Swetha B's role at the current company?

Swetha B's current role is Senior Big Data Engineer at Safeway.

Who are Swetha B's colleagues?

Swetha B's colleagues are Osleyner Sanarrusia, Shanda Gomes, Cristina Espinosa, Valerie Anderson, Anjum Sadia, Christine Tengan, Denise Reniker.
