Smit P
• IT professional with 8+ years of overall experience, specializing in the Big Data ecosystem: data acquisition, ingestion, modeling, storage, analysis, integration, processing, and database management.
• Experience in application development, implementation, deployment, and maintenance using Hadoop- and Spark-based technologies such as Cloudera, Hortonworks, Amazon EMR, Azure SQL, Azure Data Factory, Azure Databricks, Azure Data Lake, and Azure HDInsight.
• Data Science enthusiast with strong problem-solving, debugging, and analytical capabilities who actively engages in understanding and delivering on business requirements.
• Ample work experience across the Big Data ecosystem: Hadoop (HDFS, MapReduce, YARN), Spark, Kafka, Hive, Impala, HBase, Sqoop, Pig, Airflow, Oozie, ZooKeeper, Ambari, and Flume.
• Good knowledge of Hadoop cluster architecture and its key concepts: distributed file systems, parallel processing, high availability, fault tolerance, and scalability.
• Complete knowledge of Hadoop architecture and the daemons of a Hadoop cluster: NameNode, DataNode, ResourceManager, NodeManager, and Job History Server.
-
AWS Data Engineer
Global Atlantic Financial Group, Indianapolis, IN | Mar 2021 - Present
• Worked on an Apache Spark data processing project to process data from RDBMS and several streaming sources; developed Spark applications using Python on AWS EMR.
• Designed and deployed multi-tier applications leveraging AWS services (EC2, Route 53, S3, RDS, DynamoDB) with a focus on high availability, fault tolerance, and auto-scaling, provisioned through AWS CloudFormation.
• Configured and launched AWS EC2 instances to execute Spark jobs on AWS Elastic MapReduce (EMR).
• Performed data transformations using Spark DataFrames, Spark SQL, Spark file formats, and Spark RDDs.
• Transformed data from different file formats (text, CSV, JSON) using Python scripts in Spark.
• Loaded data from RDBMS sources (MySQL, Teradata) using Sqoop jobs.
• Handled JSON datasets by writing custom Python functions to parse JSON data in Spark.
• Developed a preprocessing job using Spark DataFrames to flatten JSON documents into flat files.
• Improved cluster performance by optimizing existing algorithms in Spark.
• Performed wide and narrow transformations and actions (filter, lookup, join, count, etc.) on Spark DataFrames.
• Worked with Parquet files and Impala using PySpark, and with Spark Streaming using RDDs and DataFrames.
• Performed reporting analytics on data from the AWS stack by connecting it to BI tools (Tableau, Power BI).
• Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.
• Worked with the database administration team on SQL optimization for Oracle, MySQL, and MS SQL databases.
• Assisted in configuring and implementing MongoDB cluster nodes on AWS EC2 instances.
• Identified executor failures, data skew, and runtime issues by monitoring Spark applications through the Spark UI.
• Ensured database performance in production by stress testing AWS EC2 and DynamoDB environments.
• Automated deployments and routine tasks using UNIX shell scripting.
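The JSON-flattening preprocessing described above was done with Spark DataFrames; as an illustrative, stdlib-only sketch of the same shape of transformation (all field names here are hypothetical, not from the original job):

```python
# Illustrative sketch, not the original PySpark job: flatten a nested
# JSON document into a flat record with dotted keys, the kind of
# flattening the preprocessing job performed before writing flat files.
import json

def flatten(doc, parent_key="", sep="."):
    """Recursively flatten a nested dict into dotted flat keys."""
    items = {}
    for key, value in doc.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            items.update(flatten(value, new_key, sep=sep))
        else:
            items[new_key] = value
    return items

# Hypothetical input document.
raw = '{"policy": {"id": 42, "holder": {"name": "A. Smith"}}, "premium": 120.5}'
record = flatten(json.loads(raw))
# record == {"policy.id": 42, "policy.holder.name": "A. Smith", "premium": 120.5}
```

In Spark the equivalent is typically done by selecting nested columns into top-level aliases before writing out CSV or Parquet.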
-
Azure Data Engineer
Thomson Reuters, Eagan, MN | Apr 2019 - Feb 2021
• Designed and deployed data pipelines on the Azure cloud platform (HDInsight, Data Lake, Databricks, Blob Storage, Azure SQL, Azure Data Factory, Synapse, SQL DB, DWH, and Storage Explorer).
• Developed a custom-built ETL solution, batch processing, and a real-time data ingestion pipeline to move data in and out of the Hadoop cluster using PySpark and shell scripting.
• Integrated on-premises data (MySQL, HBase) with cloud stores (Blob Storage, Azure SQL DB) and applied transformations to load it back into Azure Synapse using Azure Data Factory.
• Built and published Docker container images using Azure Container Registry and deployed them into Azure Kubernetes Service (AKS).
• Imported metadata into Hive and migrated existing tables and applications to work on Hive and Azure.
• Created complex data transformations and manipulations using ADF and Scala.
• Configured Azure Data Factory (ADF) to ingest data from relational and non-relational databases to meet business functional requirements.
• Optimized workflows by building DAGs in Apache Airflow to schedule ETL jobs, and implemented additional Apache Airflow components such as pools, executors, and multi-node functionality.
• Improved Airflow performance by exploring and implementing the most suitable configurations.
• Configured Spark Streaming to receive real-time data from Apache Flume and store it, using Scala, in Azure Table storage and Data Lake for downstream processing and analytics; created DataFrames using Spark.
• Designed cloud architecture and implementation plans for hosting complex application workloads on MS Azure.
• Performed operations on the transformation layer using the Apache Spark RDD and DataFrame APIs and Spark SQL, applying various aggregations provided by the Spark framework.
• Provided real-time insights and reports by mining data with Spark Scala functions.
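The Airflow DAG scheduling mentioned above boils down to running ETL tasks in dependency order. A minimal stdlib-only sketch of that idea, with hypothetical task names (in Airflow itself the same dependencies would be declared with operators and the `>>` syntax):

```python
# Minimal sketch of DAG-based ETL ordering using only the standard
# library (Python 3.9+), standing in for an Apache Airflow DAG.
from graphlib import TopologicalSorter

# Each hypothetical task maps to the set of tasks it depends on.
etl_dag = {
    "extract_mysql": set(),
    "extract_hbase": set(),
    "transform": {"extract_mysql", "extract_hbase"},
    "load_synapse": {"transform"},
}

# static_order() yields every task only after all of its dependencies.
run_order = list(TopologicalSorter(etl_dag).static_order())
# e.g. both extracts come before "transform", which comes before "load_synapse".
```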
-
Data Engineer
Homesite Insurance, Boston, MA | Oct 2016 - Mar 2019
• Implemented Spark applications using Scala, utilizing the DataFrame and Spark SQL APIs for faster data processing.
• Ingested data from RDBMS, performed data transformations, and exported the transformed data to Cassandra per business requirements.
• Performed wide and narrow transformations and actions (filter, lookup, join, count, etc.) on Spark DataFrames.
• Worked with Parquet files and Impala using PySpark, and with Spark Streaming using RDDs and DataFrames.
• Aggregated log data from various servers and made it available to downstream systems for analytics using Apache Kafka.
• Improved Kafka performance and implemented security.
• Developed batch and streaming processing applications using Spark APIs to meet functional pipeline requirements.
• Worked with Spark to create structured data from a pool of unstructured data.
• Implemented intermediate functionality such as event and record counts from Flume sinks and Kafka topics by writing Spark programs in Java and Python.
• Documented the requirements, including the available code to be implemented using Spark, Hive, and HDFS.
• Experienced in transferring streaming data and data from different sources into HDFS and NoSQL databases.
• Created ETL mappings with Talend Integration Suite to pull data from sources, apply transformations, and load data into target databases.
• Transformed data from different file formats (text, CSV, JSON) using Python scripts in Spark.
• Loaded data from RDBMS sources (MySQL, Teradata) using Sqoop jobs.
• Well versed in database and data warehouse concepts such as OLTP, OLAP, and star schema.
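The event/record counting over Kafka topics described above can be sketched with the standard library alone; this is an illustrative stand-in (topic names and records are hypothetical, and no live Kafka cluster is involved):

```python
# Illustrative sketch of per-topic record counting, the kind of
# intermediate metric the Spark programs above computed over Kafka topics.
from collections import Counter

# (topic, payload) pairs standing in for consumed Kafka records.
consumed = [
    ("server-logs", "GET /quote 200"),
    ("server-logs", "POST /claim 201"),
    ("click-events", "user=7 page=home"),
]

counts_by_topic = Counter(topic for topic, _ in consumed)
# counts_by_topic == Counter({"server-logs": 2, "click-events": 1})
```

In a real Spark Streaming job the same count would be an aggregation over a DStream or structured stream keyed by topic.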
-
Hadoop Developer
Brio Technologies Private Limited, Hyderabad, India | Feb 2015 - Jul 2016
• Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java and Scala for data cleaning and preprocessing.
• Experienced in installing, configuring, and using Hadoop ecosystem components.
• Experienced in importing and exporting data between HDFS and Hive using Sqoop.
• Participated in the development and implementation of a Cloudera Hadoop environment.
• Experienced in running queries using Impala and used BI tools to run ad-hoc queries directly on Hadoop.
• Integrated Cassandra as a distributed persistent metadata store to provide metadata resolution for network entities.
• Involved in implementing and integrating NoSQL databases such as HBase and Cassandra.
• Installed and configured Hive, wrote Hive UDFs, and used JUnit for unit testing MapReduce jobs.
• Used DataStax Cassandra along with Pentaho for reporting.
• Queried and analyzed data from DataStax Cassandra for quick searching, sorting, and grouping.
• Experienced in working with data sources such as Teradata and Oracle; successfully loaded files from Teradata to HDFS, and loaded data from HDFS into Hive and Impala.
• Designed and implemented a product search service using Apache Solr/Lucene.
• Worked on installing the cluster, commissioning and decommissioning DataNodes, NameNode recovery, capacity planning, and slots configuration.
• Used the YARN architecture and MapReduce 2.0 in the development cluster for a POC.
• Supported MapReduce programs running on the cluster and was involved in loading data from the UNIX file system to HDFS.
• Loaded and transformed large sets of structured, semi-structured, and unstructured data.
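The MapReduce cleaning jobs above follow the classic map/shuffle/reduce pattern; a hedged, stdlib-only Python sketch of that pattern (the real jobs were written in Java and Scala against HDFS, and the input lines here are hypothetical):

```python
# Word-count-style sketch of the MapReduce pattern: map emits key/value
# pairs, reduce sums the values grouped by key.
from collections import defaultdict

def map_phase(line):
    """Map: emit (word, 1) for each cleaned, lowercased token."""
    for word in line.lower().split():
        yield word.strip(".,"), 1

def reduce_phase(pairs):
    """Reduce: sum the counts emitted for each key."""
    grouped = defaultdict(int)
    for key, count in pairs:
        grouped[key] += count
    return dict(grouped)

lines = ["Hadoop stores data in HDFS.", "Hive queries data in HDFS."]
word_counts = reduce_phase(p for line in lines for p in map_phase(line))
# word_counts["hdfs"] == 2, word_counts["data"] == 2
```

In real Hadoop the shuffle between the two phases is handled by the framework, which groups mapper output by key before it reaches the reducers.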
-
System Engineer
Hudda Infotech Private Limited, Hyderabad, India | Jul 2013 - Jan 2015
• Participated in the analysis, design, and development phases of the Software Development Lifecycle (SDLC).
• Developed test-driven web applications using Java/J2EE, the Struts 2.0 framework, Spring MVC, the Hibernate framework, JavaScript, and a SQL Server database, with deployments on IBM WebSphere.
• Designed and developed NSEP, an online web application where students can register, search for, and apply to available jobs; utilized Java/J2EE, JavaScript, SQL, HTML, CSS, and XML in Eclipse.
• Designed and developed a web portal using the Struts framework and J2EE; developed a newsletter as part of process-improvement tasks, using HTML and CSS to report weekly activities.
• Developed the front-end user interface using HTML, CSS, JSP, Struts, Angular, and NodeJS, with session validation using Spring AOP.
• Extensively used Java multi-threading to implement batch jobs with JDK 1.5 features and deployed them on the JBoss server.
• Ensured high availability and load balancing by configuring and implementing Oracle clustering on WebLogic Server 10.3.
• Improved productivity by developing an automated system health-check tool using UNIX shell scripts.
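The multi-threaded batch jobs above were written in Java on JDK 1.5; as an illustrative Python equivalent of the same worker-pool pattern (the per-item task here is a hypothetical stand-in):

```python
# Sketch of the thread-pool batch-job pattern: fan a list of work items
# out to a fixed pool of worker threads and collect results in order.
from concurrent.futures import ThreadPoolExecutor

def process_record(record_id):
    """Stand-in for one unit of batch work, e.g. a health-check probe."""
    return record_id * 2

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process_record, range(5)))
# results == [0, 2, 4, 6, 8]; pool.map preserves input order.
```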