Sai B
Data Engineer with 8+ years of overall experience and expertise in Big Data, cloud technologies, and Hadoop components such as HDFS, MapReduce, YARN, Apache Pig, Hive, Sqoop, Woopra (a web analytics application), shell scripting, Kafka, and Spark in Scala. An aspirational, results-oriented professional with a track record of developing large-scale data processing systems and data warehouse solutions for data analytics.
Experian
Senior Data Engineer | Experian | Sep 2020 - Present | Costa Mesa, CA, US
• Expertise in designing and deploying Hadoop clusters and various Big Data analytic tools, including Pig, Hive, HBase, Oozie, Sqoop, Flume, Spark, and Impala.
• Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Python.
• Implemented Spark using Python and Spark SQL for faster testing and processing of data.
• Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
• Worked with Spark to create structured data from the pool of unstructured data received.
• Implemented intermediate functionality such as event or record counts from Flume sinks or Kafka topics by writing Spark programs in Java and Python.
• Documented the requirements, including the available code to be implemented using Spark, Hive, and HDFS.
• Selected and exported data into CSV files, stored them in AWS S3 using AWS EC2, and then structured and loaded them into AWS Redshift.
• Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved it in Parquet format in HDFS.
• Experienced in transferring streaming data and data from different sources into HDFS and NoSQL databases.
• Created ETL mappings with Talend Integration Suite to pull data from sources, apply transformations, and load data into the target database.
• Used PySpark and Pandas to calculate the moving average and RSI score of stocks and loaded the results into the data warehouse.
• Developed batch scripts to fetch data from AWS S3 storage and perform the required transformations in Scala using the Spark framework.
• Worked on a Python script to extract data from Netezza databases and move it to AWS S3.
• Developed a PySpark program that writes DataFrames to HDFS as Avro files.
Environment: Hadoop, Hive, Flume, MapReduce, Sqoop, Kafka, Spark, YARN, Cassandra, Oozie, Shell Scripting, Scala, Maven, MySQL
Senior Data Engineer | Drug Plastics And Glass Co | Mar 2019 - Aug 2020
• Analyzed and prepared data and identified patterns in datasets by applying historical models; collaborated with senior data scientists to understand the data.
• Managed the activities required to maintain a data and process governance structure.
• Used data from an external provider to properly classify MDM data components (customer category, sub-category, etc.).
• Worked on migrating data from on-premises SQL Server to cloud databases (Azure Synapse Analytics (DW) and Azure SQL DB).
• Extracted, transformed, and loaded data from source systems to Azure Data Storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics). Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
• Experienced in working with the Spark ecosystem using Spark SQL and Scala queries on different formats such as text and CSV files.
• Implemented Spark using Scala and Spark SQL for faster testing and processing of data; responsible for managing data from different sources.
• Developed UDFs in Java for Hive and Pig; worked on reading multiple data formats on HDFS using Scala.
• Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
• Developed multiple POCs using Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
• Built price elasticity models for various bundled product and service offerings.
• A Data Platform Solution Architect presently associated with Confidential Corporation, with a strong consulting and presales background and hands-on experience in Big Data, Data Science, Cloud, and Enterprise applications.
Environment: MS SQL Server, R/RStudio, SQL Enterprise Manager, PySpark, Python, Redshift, MS Excel, Power BI, Tableau, T-SQL, ETL, MS Access, XML, MS Office, Outlook, AS E-Miner.
Data Engineer | Truist | Sep 2016 - Feb 2019 | Charlotte, North Carolina, US
• Experience in Big Data analytics and design in the Hadoop ecosystem using MapReduce programming, Spark, Hive, Pig, Sqoop, HBase, Oozie, Impala, and Kafka.
• Applied Hive tuning techniques such as partitioning, bucketing, and memory optimization.
• Worked on different file formats such as Parquet, ORC, JSON, and text files.
• Worked on migrating MapReduce programs into Spark transformations using Spark and Scala, initially done using Python (PySpark).
• Used Spark SQL to load data, created SchemaRDDs on top of it that load into Hive tables, and handled structured data using Spark SQL.
• Worked on analyzing the Hadoop cluster using different big data analytic tools, including Flume, Pig, Hive, HBase, Oozie, ZooKeeper, Sqoop, Spark, and Kafka.
• As a Big Data Developer, implemented solutions for ingesting data from various sources and processing data at rest using Big Data technologies such as Hadoop, MapReduce frameworks, MongoDB, Hive, Oozie, Flume, Sqoop, and Talend.
• Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, Spark on YARN, and PySpark.
• Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, Pig, and Sqoop.
• The Databricks platform follows best practices for securing network access to cloud applications.
• Hands-on experience with Git Bash commands: git pull to pull code from the source and develop it per requirements, git add to stage files, git commit after the code build, and git push to the pre-prod environment for code review; later used screwdriver.yaml, which builds the code and generates artifacts that are released into production.
ENVIRONMENT: Hadoop, MapReduce, HDFS, Hive, Cassandra, Sqoop, Oozie, SQL, Kafka, Spark, Scala, Java, GitHub, Talend Big Data Integration, Impala.
Hadoop Developer | Jarus Technologies Pvt Ltd | Jan 2015 - Jun 2016
• Worked extensively on Hadoop components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN, Spark, and MapReduce programming.
• Converted the existing relational database model to the Hadoop ecosystem.
• Worked with Linux systems and RDBMS databases on a regular basis to ingest data using Sqoop.
• Strong experience working with Elastic MapReduce and setting up environments on Amazon AWS EC2 instances.
• Able to spin up different AWS instances, including EC2-Classic and EC2-VPC, using CloudFormation templates.
• Collected data using Spark Streaming from an AWS S3 bucket in near real time, performed the necessary transformations and aggregations to build the data model, and persisted the data in HDFS.
• Managed and reviewed Hadoop and HBase log files.
• Worked extensively on importing metadata into Hive and migrated existing tables and applications to work on Hive.
• Designed and implemented Hive queries and functions for evaluating, filtering, loading, and storing data.
• Analyzed table data and implemented compression techniques such as Teradata multi-value compression.
• Involved in the ETL process from design, development, and testing through migration to production environments.
• Involved in writing ETL test scripts and guided the testing team in executing them.
• Involved in performance tuning of the ETL process by addressing various performance issues at the extraction and transformation stages.
• Provided guidance to the development team working on PySpark as the ETL platform.
• Wrote Hadoop MapReduce jobs to run on Amazon EMR clusters and created workflows for running jobs.
• Generated analytics reports on probe data by writing EMR (Elastic MapReduce) jobs to run on an Amazon VPC cluster and using Amazon data pipelines for automation.
Environment: Hadoop, HDFS, Pig, Hive, Flume, Sqoop, Oozie, Python, Shell Scripting, SQL, Talend, Spark, HBase, Elasticsearch, Linux (Ubuntu), Kafka.
Data Analyst | Gvk Bio Sciences | Jun 2013 - Dec 2014
• Gathered data and business requirements from end users and management. Designed and built data solutions to migrate existing source data in the data warehouse to the Atlas Data Lake (Big Data).
• Performed all Technical Data Quality (TDQ) validations, including header/footer validation, record count, data lineage, data profiling, checksum, empty file, duplicates, delimiter, threshold, and DC validations for all data sources.
• Analyzed large volumes of data and devised simple and complex Hive and SQL scripts to validate data flow in various applications. Performed Cognos report validation. Used MHUB to validate data profiling and data lineage.
• Devised PL/SQL statements (stored procedures, functions, triggers, views, and packages). Used indexing, aggregation, and materialized views to optimize query performance.
• Created reports using Tableau/Power BI/Cognos to perform data validation.
• Set up a governance process around Tableau dashboard processes.
• Worked with senior management to plan, define, and clarify Tableau dashboard goals, objectives, and requirements.
• Created Tableau dashboards using stacked bars, bar graphs, scatter plots, geographical maps, Gantt charts, etc., with the Show Me functionality, and built dashboards and stories as needed using Tableau Desktop and Tableau Server.
• Responsible for daily communications to management and internal organizations regarding the status of all assigned projects and tasks.
Environment: Hadoop, MapReduce, Hive, AWS Redshift, SQL, PL/SQL, T-SQL, XML, Informatica, Python, Tableau, OLAP, SSIS, SSRS, Excel, OLTP, Git.
Sai B Education Details
Jntuh College Of Engineering Hyderabad, Computer Science
Frequently Asked Questions about Sai B
What company does Sai B work for?
Sai B works for Experian
What is Sai B's role at the current company?
Sai B's current role is Senior Data Engineer at Experian | Actively Looking for Contract Jobs | Big Data | Hadoop | DataFactory | Databricks | Azure | SQL | Stream Analytics | Kafka | Python | Scala | AWS Glue | PySpark | Snowflake | GCP | BigQuery.
What schools did Sai B attend?
Sai B attended Jntuh College Of Engineering Hyderabad.
What skills is Sai B known for?
Sai B has skills like Cloudera, HDFS, R, Git, Amazon Web Services, Linux, YARN, MLlib, MongoDB, T-SQL, Microsoft Azure, Sqoop.