Smit P


AWS Data Engineer @ Global Atlantic Financial Group
Smit P's Location
Louisville, Ohio, United States
About Smit P

• IT professional with 8+ years of overall experience, specialized in the Big Data ecosystem: data acquisition, ingestion, modeling, storage, analysis, integration, data processing, and database management.
• Experience in application development, implementation, deployment, and maintenance using Hadoop and Spark-based technologies such as Cloudera, Hortonworks, Amazon EMR, Azure SQL, Azure Data Factory, Azure Databricks, Azure Data Lake, and Azure HDInsight.
• A Data Science enthusiast with strong problem-solving, debugging, and analytical capabilities, who actively engages in understanding and delivering on business requirements.
• Ample work experience in the Big Data ecosystem: Hadoop (HDFS, MapReduce, YARN), Spark, Kafka, Hive, Impala, HBase, Sqoop, Pig, Airflow, Oozie, Zookeeper, Ambari, and Flume.
• Good knowledge of Hadoop cluster architecture and its key concepts: distributed file systems, parallel processing, high availability, fault tolerance, and scalability.
• Complete knowledge of Hadoop architecture and the daemons of a Hadoop cluster, including the NameNode, DataNode, ResourceManager, NodeManager, and Job History Server.

Smit P's Current Company Details

Global Atlantic Financial Group, Indianapolis, IN

AWS Data Engineer
Smit P Work Experience Details
  • Global Atlantic Financial Group, Indianapolis, IN
    AWS Data Engineer
    Mar 2021 - Present
    • Worked on an Apache Spark data processing project to process data from RDBMS and several streaming sources; developed Spark applications using Python on AWS EMR.
    • Designed and deployed multi-tier applications leveraging AWS services (EC2, Route 53, S3, RDS, DynamoDB), focusing on high availability, fault tolerance, and auto-scaling via AWS CloudFormation.
    • Configured and launched AWS EC2 instances to execute Spark jobs on AWS Elastic MapReduce (EMR).
    • Performed data transformations using Spark DataFrames, Spark SQL, Spark file formats, and Spark RDDs.
    • Transformed data from different file formats (Text, CSV, JSON) using Python scripts in Spark.
    • Loaded data from various RDBMS sources (MySQL, Teradata) using Sqoop jobs.
    • Handled JSON datasets by writing custom Python functions to parse JSON data in Spark.
    • Developed a preprocessing job using Spark DataFrames to flatten JSON documents into flat files (see the sketch after this list).
    • Improved cluster performance by optimizing existing algorithms in Spark.
    • Performed wide and narrow transformations and actions (filter, lookup, join, count, etc.) on Spark DataFrames.
    • Worked with Parquet files and Impala using PySpark, and with Spark Streaming using RDDs and DataFrames.
    • Performed reporting analytics on data from the AWS stack by connecting it to BI tools (Tableau, Power BI).
    • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
    • Worked with the database administration team on SQL optimization for databases such as Oracle, MySQL, and MS SQL.
    • Assisted in configuring and implementing MongoDB cluster nodes on AWS EC2 instances.
    • Identified executor failures, data skew, and runtime issues by monitoring Spark apps through the Spark UI.
    • Ensured database performance in production by stress testing AWS EC2 and DynamoDB environments.
    • Automated deployments and routine tasks using UNIX shell scripting.
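
    As an illustration of the JSON-flattening preprocessing job above, here is a minimal PySpark sketch. It is a sketch under stated assumptions, not the project's actual code: the S3 paths, field names, and CSV output format are hypothetical.

      # Minimal sketch: flatten nested JSON documents into flat (CSV) files.
      # Paths and field names below are hypothetical.
      from pyspark.sql import SparkSession
      from pyspark.sql import functions as F

      spark = SparkSession.builder.appName("flatten-json").getOrCreate()

      # Read newline-delimited JSON from S3 (hypothetical bucket/prefix).
      raw = spark.read.json("s3://example-bucket/input/events/")

      # Promote nested struct fields to top-level columns.
      flat = raw.select(
          F.col("id"),
          F.col("user.name").alias("user_name"),
          F.col("user.address.city").alias("city"),
          F.to_date("timestamp").alias("event_date"),
      )

      # Write the flattened records out as headed CSV "flat files".
      flat.write.mode("overwrite").option("header", True).csv(
          "s3://example-bucket/output/flat/")

      spark.stop()
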
  • Thomson Reuters, Eagan, MN
    Azure Data Engineer
    Apr 2019 - Feb 2021
    • Designed and deployed data pipelines on the Azure cloud platform (HDInsight, Azure Data Lake, Azure Databricks, Blob Storage, Azure SQL DB, Azure Data Factory, Azure Synapse, DWH, and Azure Storage Explorer).
    • Developed a custom-built ETL solution, batch processing, and a real-time data ingestion pipeline to move data in and out of the Hadoop cluster using PySpark and shell scripting.
    • Integrated on-premises data (MySQL, HBase) with the cloud (Blob Storage, Azure SQL DB) and applied transformations to load it back to Azure Synapse using Azure Data Factory.
    • Built and published Docker container images using Azure Container Registry and deployed them into Azure Kubernetes Service (AKS).
    • Imported metadata into Hive and migrated existing tables and applications to work on Hive and Azure.
    • Created complex data transformations and manipulations using ADF and Scala.
    • Configured Azure Data Factory (ADF) to ingest data from different sources, both relational and non-relational databases, to meet business functional requirements.
    • Optimized workflows by building DAGs in Apache Airflow to schedule ETL jobs (see the sketch after this list) and implemented additional Apache Airflow components such as pools, executors, and multi-node functionality.
    • Improved Airflow performance by exploring and implementing the most suitable configurations.
    • Configured Spark Streaming to receive real-time data from Apache Flume and stored the stream data, using Scala, in Azure Table Storage and Data Lake for all types of processing and analytics; created DataFrames using Spark.
    • Designed cloud architecture and implementation plans for hosting complex app workloads on MS Azure.
    • Performed operations on the transformation layer using the Apache Spark RDD and DataFrame APIs and Spark SQL, and applied various aggregations provided by the Spark framework.
    • Provided real-time insights and reports by mining data with Spark Scala functions.
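
    The Airflow DAG work above lends itself to a short example. The sketch below, assuming Airflow 2.x, wires three placeholder ETL steps into a daily schedule with retries; the DAG id, task names, and callables are hypothetical, not taken from the project.

      # Minimal sketch of an Airflow DAG scheduling a daily ETL job.
      # DAG id, task names, and the called functions are hypothetical.
      from datetime import datetime, timedelta

      from airflow import DAG
      from airflow.operators.python import PythonOperator

      def extract():
          print("extracting source data")  # placeholder extract step

      def transform():
          print("transforming data")  # placeholder transform step

      def load():
          print("loading into warehouse")  # placeholder load step

      with DAG(
          dag_id="daily_etl",
          start_date=datetime(2021, 1, 1),
          schedule_interval="@daily",
          catchup=False,
          default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
      ) as dag:
          t1 = PythonOperator(task_id="extract", python_callable=extract)
          t2 = PythonOperator(task_id="transform", python_callable=transform)
          t3 = PythonOperator(task_id="load", python_callable=load)

          # Run the steps strictly in sequence.
          t1 >> t2 >> t3
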
  • Homesite Insurance, Boston, MA
    Data Engineer
    Oct 2016 - Mar 2019
    • Implemented Spark using Scala, utilizing the DataFrame and Spark SQL APIs for faster data processing.
    • Ingested data from RDBMS, performed data transformations, and then exported the transformed data to Cassandra per the business requirements.
    • Performed wide and narrow transformations and actions (filter, lookup, join, count, etc.) on Spark DataFrames.
    • Worked with Parquet files and Impala using PySpark, and with Spark Streaming using RDDs and DataFrames.
    • Aggregated log data from various servers and made it available to downstream systems for analytics using Apache Kafka (see the sketch after this list).
    • Improved Kafka performance and implemented security.
    • Developed batch and streaming processing apps using Spark APIs for functional pipeline requirements.
    • Worked with Spark to create structured data from pools of unstructured data.
    • Implemented intermediate functionality, such as event or record counts from Flume sinks or Kafka topics, by writing Spark programs in Java and Python.
    • Documented the requirements, including the available code to be implemented using Spark, Hive, and HDFS.
    • Experienced in transferring streaming data and data from different sources into HDFS and NoSQL databases.
    • Created ETL mappings with Talend Integration Suite to pull data from sources, apply transformations, and load data into the target database.
    • Transformed data from different file formats (Text, CSV, JSON) using Python scripts in Spark.
    • Loaded data from various RDBMS sources (MySQL, Teradata) using Sqoop jobs.
    • Well versed in database and data warehouse concepts such as OLTP, OLAP, and star schemas.
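
    To illustrate the Kafka log aggregation above, here is a minimal Spark Structured Streaming sketch, a DataFrame-based stand-in for the RDD-based streaming described in this role. The broker address and topic name are hypothetical, and the job assumes the spark-sql-kafka connector package is available at submit time.

      # Minimal sketch: read log events from a Kafka topic with
      # Spark Structured Streaming. Broker and topic are hypothetical.
      from pyspark.sql import SparkSession
      from pyspark.sql import functions as F

      spark = SparkSession.builder.appName("kafka-log-stream").getOrCreate()

      # Subscribe to the (hypothetical) server-logs topic.
      logs = (
          spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "server-logs")
          .load()
      )

      # Kafka delivers key/value as binary; decode the value to text.
      lines = logs.select(F.col("value").cast("string").alias("line"))

      # Write the decoded stream to the console for inspection.
      query = lines.writeStream.format("console").outputMode("append").start()
      query.awaitTermination()
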
  • Brio Technologies Private Limited, Hyderabad, India
    Hadoop Developer
    Feb 2015 - Jul 2016
    • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java and Scala for data cleaning and preprocessing (a stand-in sketch follows this list).
    • Experienced in installing, configuring, and using Hadoop ecosystem components.
    • Experienced in importing and exporting data into HDFS and Hive using Sqoop.
    • Participated in the development and implementation of a Cloudera Hadoop environment.
    • Experienced in running queries using Impala and used BI tools to run ad-hoc queries directly on Hadoop.
    • Integrated Cassandra as a distributed, persistent metadata store to provide metadata resolution for network entities.
    • Involved in implementing and integrating various NoSQL databases such as HBase and Cassandra.
    • Installed and configured Hive, wrote Hive UDFs, and used MapReduce and JUnit for unit testing.
    • Used DataStax Cassandra along with Pentaho for reporting.
    • Queried and analyzed data from DataStax Cassandra for quick searching, sorting, and grouping.
    • Experienced in working with various data sources such as Teradata and Oracle; successfully loaded files from Teradata to HDFS, and from HDFS into Hive and Impala.
    • Designed and implemented a product search service using Apache Solr/Lucene.
    • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
    • Used the YARN architecture and MapReduce 2.0 in the development cluster for a POC.
    • Supported MapReduce programs running on the cluster and was involved in loading data from the UNIX file system to HDFS.
    • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
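
    The data-cleaning MapReduce jobs above were written in Java and Scala; as a Python stand-in, here is a minimal Hadoop Streaming mapper sketching the same kind of cleaning pass. The field layout and validation rules are hypothetical. A map-only job like this would typically be launched through the hadoop-streaming JAR with the reducer count set to zero.

      #!/usr/bin/env python3
      # Minimal Hadoop Streaming mapper sketching a data-cleaning pass.
      # Record width and validation rules below are hypothetical.
      import sys

      EXPECTED_FIELDS = 5  # hypothetical record width

      for line in sys.stdin:
          fields = line.rstrip("\n").split(",")
          # Drop malformed records that do not match the expected width.
          if len(fields) != EXPECTED_FIELDS:
              continue
          # Trim stray whitespace from every field.
          cleaned = [f.strip() for f in fields]
          # Skip records missing a primary key in the first column.
          if not cleaned[0]:
              continue
          print(",".join(cleaned))
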
  • Hudda Infotech Private Limited, Hyderabad, India
    System Engineer
    Jul 2013 - Jan 2015
    • Participated in the analysis, design, and development phases of the Software Development Lifecycle (SDLC).
    • Developed test-driven web applications using Java/J2EE, the Struts 2.0 framework, Spring MVC, the Hibernate framework, JavaScript, and a SQL Server database, with deployments on IBM WebSphere.
    • Designed and developed NSEP, an online web application where students can register, find, search, and apply for available jobs; utilized Java/J2EE, JavaScript, SQL, HTML, CSS, and XML in Eclipse.
    • Designed and developed a web portal using the Struts framework and J2EE; developed a newsletter as part of process-improvement tasks, using HTML and CSS to report weekly activities.
    • Developed the front-end user interface using HTML, CSS, JSP, Struts, Angular, and NodeJS, with session validation using Spring AOP.
    • Extensively used Java multi-threading to implement batch jobs with JDK 1.5 features and deployed them on the JBoss server.
    • Ensured high availability and load balancing by configuring and implementing Oracle clustering on WebLogic Server 10.3.
    • Improved productivity by developing an automated system health-check tool using UNIX shell scripts.

Frequently Asked Questions about Smit P

What company does Smit P work for?

Smit P works for Global Atlantic Financial Group, Indianapolis, IN.

What is Smit P's role at the current company?

Smit P's current role is AWS Data Engineer.
