Project- and results-oriented IT professional with more than 17 years of experience in the analysis, development, and deployment of corporate solutions across a wide variety of industries, including retail, transportation, banking, logistics, insurance, energy, and government.
• REST API development: Java, Spring Boot, unit testing, CI/CD, TeamCity, GitHub, GitLab, Jenkins
• Cloud services: AWS and GCP
• Hadoop ecosystem: HDFS, Kafka, Flink, Sqoop, Spark, Flume, Pig, Hive, Oozie, Tez, ZooKeeper, Hue
• Platforms: Hortonworks, Cloudera, Databricks
• NoSQL: HBase, Cassandra, DynamoDB
• Developer and consultant for Geographic Information Systems (GIS), Location-Based Systems (LBS), and Big Data and cloud solutions
• Technology management
• GIS tools: MapInfo, MapXtreme .NET, MapBasic, ArcView, ArcGIS, Arc/Info, OpenGIS, MapServer, OpenLayers, PostGIS, Leaflet, NAVTEQ MapTP API and web services, Geomedia, Oracle Spatial
• Languages and frameworks: ASP.NET, PHP 5, Visual Basic, C#, web services, Python, Scala
• Databases: MySQL, SQL Server, Oracle, Postgres/PostGIS
• Web: JavaScript, jQuery, Angular, Bootstrap, HTML, AJAX, REST
• Business Intelligence and Location Intelligence: MicroStrategy, Visual Crossing
• PMP, project management, SDLC, CMMI, Waterfall, Agile, Scrum, governance
Experience

Data Architect
Comcast | United States
Senior Data Engineer
Duke Energy Corporation | Feb 2024 - Present | Charlotte, North Carolina, US
• As a Senior Data Engineer on the Data Science team at Duke Energy, spearheaded the establishment of robust data pipelines and processing infrastructure to drive predictive modeling for customer energy consumption and usage alerts.
• Developed Terraform code to deploy autoscaling EMR clusters and EMR Serverless instances using Step Functions workflows.
• Engineered Terraform scripts to orchestrate Step Functions triggering PySpark jobs for predictive modeling, and to create Lambda functions that initiate State Machine workflows (a minimal handler sketch follows this entry).
• Crafted Terraform configurations to implement API Gateway and grant fine-grained IAM access based on least-privilege principles, ensuring security and compliance.
• Conducted rigorous unit and integration testing of the entire processing solution, followed by deployment of the Terraform code across development, QA, and production AWS environments.
• Fostered open communication and collaboration with stakeholders, product owners, data scientists, developers, and project managers to ensure the smooth operation of the data infrastructure.
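A minimal sketch of the kind of Lambda handler described above, one that starts a Step Functions state machine; the environment variable name and payload fields are hypothetical, and the ARN would be injected by Terraform:

```python
import json
import os

import boto3  # AWS SDK for Python, available in the Lambda runtime

sfn = boto3.client("stepfunctions")

def handler(event, context):
    """Start the predictive-modeling state machine for an incoming event."""
    # STATE_MACHINE_ARN is assumed to be set by the Terraform deployment.
    execution = sfn.start_execution(
        stateMachineArn=os.environ["STATE_MACHINE_ARN"],
        input=json.dumps({"source": event.get("source", "unknown")}),
    )
    return {"executionArn": execution["executionArn"]}
```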
Senior Data Engineer
Comcast | Aug 2023 - Dec 2023 | Philadelphia, PA, US
Optimized the corporate AWS platform, resulting in a 40% reduction in overall costs, through a thorough analysis of existing cloud resources, Python and PySpark application development, ETL redesign, and efficient data pipelines built on AWS cloud services.
• Conducted in-depth analysis of the corporate AWS platform, focusing on Redshift clusters, S3, Lambda, EMR, and EC2, leading to the design and implementation of a resource-optimization process.
• Developed Python and PySpark notebooks in Databricks for comprehensive analysis and optimization of S3 storage, extraction of VPC data about EMR and EC2, merging and optimizing Parquet files, and listing and filtering S3 bucket contents by data type.
• Analyzed and developed ETL processes, establishing data pipelines with AWS cloud services such as Lambda (using Python and the Boto3 API) and EventBridge for event scheduling.
• Migrated Redshift data pipelines to Athena by refactoring and optimizing Redshift queries, testing the code, and comparing results. Implemented a Python Lambda function to execute Athena queries with input parameters, format and compress the data, and optimize the output file size for storage in S3 (see the sketch after this entry).
• Leveraged Tableau to read data from S3, creating Hyper files with a purpose-built tool, and used Rundeck to orchestrate pipeline execution.
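A minimal sketch of a Lambda-style helper that runs an Athena query via Boto3, as described above; the database and output-location values are hypothetical:

```python
import time

import boto3

athena = boto3.client("athena")

def run_query(query: str, database: str, output_location: str) -> str:
    """Submit an Athena query, poll until it finishes, and return its ID."""
    qid = athena.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]
    while True:
        state = athena.get_query_execution(
            QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(2)  # Athena runs asynchronously, so we poll for completion
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {qid} ended in state {state}")
    return qid

# Hypothetical usage:
# run_query("SELECT * FROM events LIMIT 10", "analytics", "s3://my-bucket/athena/")
```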
Professional Development
Career Break | Feb 2023 - Jul 2023
Engaged in coursework for the AWS Certified Solutions Architect certification while concurrently undertaking personal projects.
ML Sr. Data Engineer
The Home Depot | Jun 2022 - Feb 2023 | Atlanta, Georgia, US
Worked on the campaign management team supporting the software lifecycle of a REST API that serves the data requirements of the Machine Learning (ML) team. The requirements included a reinforcement learning approach, training the model with user data to maximize the reward of marketing campaigns. The application was divided into:
• A front end developed in Angular and Node
• A back end developed in Java and Spring Boot
Other frameworks supported the back-end development, such as Spring Data Cloud Datastore and Hibernate for CRUD operations. Google Cloud services implemented:
• Datastore for schema-less storage
• Cloud SQL for MySQL for catalog storage
• BigQuery as the data warehouse for BI and analytics on ML models
• Cloud IAM for security and authentication
• Compute Engine for instance provisioning
Used Docker containers on compute instances to run the applications, TeamCity to automate deployment from GitHub, Postman to test the API responses, and Mockito and JUnit for unit testing.
The API had an interface to input campaign data and reward functions. Each campaign had a name, start and end dates, and either a Multi-Armed Bandit (MAB) or Contextual Multi-Armed Bandit (CMAB). Each MAB/CMAB had an algorithm (bandit policy type) such as Epsilon-Greedy, Thompson Sampling, or Upper Confidence Bound (UCB), with corresponding parameters (alpha, beta, etc.) and a reward function. For the reward functions, the input data was a name, description, campaign type, bandit policy, and SQL code to calculate the corresponding reward. The bandit policy types, MAB/CMAB, and other ML algorithms were implemented in Python with pandas and NumPy (a sketch follows this entry).
Attended demo meetings with the ML team to show progress. The result was a fully functional API with a front end, deployed into production, used by the ML team as a decision engine and campaign manager; its output data is used for BI, analytics, and ML modeling.
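A minimal NumPy sketch of the epsilon-greedy bandit policy named above; the arm count, success rates, and epsilon value are illustrative, not taken from the project:

```python
import numpy as np

rng = np.random.default_rng(42)

def epsilon_greedy(counts, rewards, epsilon=0.1):
    """Pick an arm: explore with probability epsilon, otherwise exploit.

    counts  -- number of pulls per arm
    rewards -- cumulative reward per arm
    """
    if rng.random() < epsilon:
        return int(rng.integers(len(counts)))  # explore a random arm
    means = rewards / np.maximum(counts, 1)    # avoid division by zero
    return int(np.argmax(means))               # exploit the best arm so far

# Simulate 1000 pulls against three arms with hidden success rates.
true_rates = np.array([0.2, 0.5, 0.7])
counts = np.zeros(3)
rewards = np.zeros(3)
for _ in range(1000):
    arm = epsilon_greedy(counts, rewards)
    counts[arm] += 1
    rewards[arm] += rng.random() < true_rates[arm]
print("pulls per arm:", counts)  # the 0.7 arm should dominate
```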
Big Data Engineer
Twilio | Oct 2021 - Apr 2022 | San Francisco, California, US
Assigned to the gross margin team to gather business requirements from stakeholders and business users; built an understanding of the business rules, analyzed current requirements, documented functional requirements, and discussed them with stakeholders.
Developed, tested, and deployed applications using Spark and Scala:
• Dim billable product
• Dim customers
• Fact COGS messages (cost of goods)
Maintained, updated, and deployed current production processes for gross margin:
• Summary of counters SVoT (Single Version of Truth)
• Stored value transactions
• ETL fact call summary
Assigned to the Financial Data Layer team to perform testing and data comparison between fact actual revenue tables and financial tables on gross margins; all data comparisons were done using SQL on Presto.
Assigned to on-call production support on PagerDuty, reviewing incidents, mostly from Airflow DAG SLAs (a minimal DAG sketch follows this entry):
• Review the DAG and its task statuses
• Review the log files and identify a probable cause of the error
• Clear the failed state, rerun, and follow up
• In case of an error in code or tables, raise a ticket
During onboarding, took several training courses: Application Security and Secure Coding, Code of Conduct, Incident Response, Privacy, Production Access, and Product Access.
Environment: Scala, Spark, Python, Presto, Airflow, SQL, Redshift, EC2, EMR, Avro, Parquet, Hive, GitHub, Admiral, Chef, Jenkins, Linux, Bash, PagerDuty, Jira, IntelliJ, DataGrip.
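A minimal Airflow DAG sketch showing the SLA mechanism behind the incidents described above; the DAG ID, job path, and SLA window are hypothetical:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

# A task-level SLA makes Airflow record an SLA miss (and alert) when the task
# has not finished within the window -- the kind of PagerDuty incident above.
with DAG(
    dag_id="gross_margin_fact_call_summary",  # hypothetical name
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    load = BashOperator(
        task_id="load_fact_call_summary",
        bash_command="spark-submit /jobs/fact_call_summary.py",  # hypothetical path
        sla=timedelta(hours=2),
    )
```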
Big Data Engineer
The Walt Disney Company | Mar 2020 - Sep 2021 | Burbank, CA, US
The project involved the modernization of Disney's advertising and sales platform. The original ETL was based on Oracle and in-house development of the AdVisor BI platform. The objective of the AdVisor web app was to deliver a robust platform and nimble front end, using current technologies, that supported the strategic and operational needs of the Customer Marketing & Sales (CMS) organization. The AdVisor program replaces the Rate Card, Proposal, Order Entry, and Inventory Management components of NCS with a system that improves business workflow, maximizes revenue potential, and supports cross-platform sales.
Objectives:
• Improving responsiveness to the needs of the business
• Simplifying data warehouse processing
• Reducing time to delivery
• Having a data mart ready for BI and data science
Contributions:
• Produced a data model for 105 tables
• Integrated tables into an ER model and transformed them using SQL and UDFs
• Developed an app to consume from Kafka, parse the data, create insert statements, connect to Snowflake, and pass them via the Snowflake API
• Defined an architectural model based on modules: Landing, Conformance, Consumption
• Oracle as the source; NiFi to read and produce rows for Kafka; topics as Avro, with Schema Registry for schema evolution/validation
• Data in S3 as Parquet, with Spark Core 2.1/3.0 and Java 1.8
• Deployed Java apps on EC2 m5.2xlarge and m5.4xlarge instances, with transformations stored in S3
• Developed an app in Java that reads from S3, parses the data, creates insert statements, connects to Snowflake, and populates tables
• Tested Spark Streaming, Kafka Streams, KSQL, the Flink API, and Flink SQL; chose Kafka, Schema Registry, and the Flink API/Flink SQL with Java
• Developed a Java application that reads a schema subject from Schema Registry, parses the fields, and creates a Snowflake DDL (a Python sketch of the same idea follows this entry)
Environment: Spark, Java, Kafka, Flink, NiFi, Airflow, TeamCity, GitHub, GitLab, Schema Registry, Confluent, Snowflake, Oracle, AWS (EC2, S3, KDA), Avro, Parquet, YAML, Bash scripting, Confluence, Jira
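The project's generator was written in Java; below is a hedged Python sketch of the same idea, fetching the latest Avro schema from the Confluent Schema Registry REST API and emitting a Snowflake CREATE TABLE. The registry URL, subject name, and type mapping are assumptions, and only primitive Avro types are handled:

```python
import json

import requests  # plain REST calls against the Schema Registry API

# Very rough Avro-to-Snowflake type mapping for primitive fields only.
TYPE_MAP = {"string": "VARCHAR", "int": "INTEGER", "long": "BIGINT",
            "float": "FLOAT", "double": "DOUBLE", "boolean": "BOOLEAN"}

def snowflake_ddl(registry_url: str, subject: str) -> str:
    """Fetch the latest Avro schema for a subject and emit a CREATE TABLE."""
    resp = requests.get(f"{registry_url}/subjects/{subject}/versions/latest")
    resp.raise_for_status()
    avro_schema = json.loads(resp.json()["schema"])
    cols = []
    for field in avro_schema["fields"]:
        ftype = field["type"]
        if isinstance(ftype, list):  # nullable union like ["null", "string"]
            ftype = next(t for t in ftype if t != "null")
        col_type = TYPE_MAP.get(ftype, "VARIANT") if isinstance(ftype, str) else "VARIANT"
        cols.append(f'  "{field["name"].upper()}" {col_type}')
    return f'CREATE TABLE "{avro_schema["name"].upper()}" (\n' + ",\n".join(cols) + "\n);"

# Hypothetical usage:
# print(snowflake_ddl("http://schema-registry:8081", "advisor.orders-value"))
```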
Big Data Consultant and Developer
iHeartMedia | Mar 2018 - Mar 2020 | New York, NY, US
Designed, developed, and implemented a cutting-edge big data analytics application to support the royalties business function. The project involved migrating legacy applications to new technologies, as well as building fault-tolerant, highly available, high-performing applications that provide a seamless experience to end users. The solutions were highly resilient, scaled using cloud-based technologies, and were automated with CI/CD.
• Worked closely with business stakeholders and the business analyst to understand and analyze requirements.
• Developed Hive scripts for data extraction and transformation for daily, weekly, monthly, and quarterly reports.
• Developed Oozie workflows for data extraction and transformation from S3 into Hive.
• Developed unit tests for Hive scripts using HiveRunner.
• Analyzed and developed Java and Scala applications using Spark to extract data from Elasticsearch.
• Built Spark applications in Scala and Java with Docker images to benchmark Spark without Hadoop on a single node.
• Worked with queries on SQL Server, provided by the data team, to build fact and dimension tables from their data warehouse.
• Created pipeline schedulers in GitLab to execute periodic data pipelines.
• Created and automated AWS Hadoop clusters to execute cluster steps for data pipelines (see the sketch after this entry).
• Benchmarked and queried database tables created using Redshift, Redshift Spectrum, and Aurora.
• Managed accounts, roles, and policies on AWS IAM (Identity and Access Management) for development and business team members.
• Monitored system and application performance, with troubleshooting, L3 engineering support, and resolution of escalated issues.
Environment: AWS (EMR, S3, IAM, Redshift, Aurora, Redshift Spectrum, cluster steps, AWS Marketplace), Hive, SQL Server, Hadoop, Hue, Docker, Spark, Tez, Java, Scala, Avro, Parquet, Ganglia, Oozie, GitLab (CI/CD, pipeline schedules)
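A hedged Boto3 sketch of the automation described above: launching a transient EMR cluster with a Hive step. The cluster name, release label, instance sizes, and S3 script path are hypothetical:

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")

def launch_cluster_with_hive_step() -> str:
    """Create a transient EMR cluster that runs one Hive script, then shuts down."""
    response = emr.run_job_flow(
        Name="royalties-nightly",                # hypothetical cluster name
        ReleaseLabel="emr-5.29.0",
        Applications=[{"Name": "Hive"}, {"Name": "Tez"}],
        Instances={
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            "KeepJobFlowAliveWhenNoSteps": False,  # terminate once steps finish
        },
        Steps=[{
            "Name": "daily-royalties-report",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["hive-script", "--run-hive-script",
                         "--args", "-f", "s3://my-bucket/scripts/daily_report.hql"],
            },
        }],
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
    )
    return response["JobFlowId"]
```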
Spark Developer
Apple | Jan 2018 - Mar 2018 | Cupertino, California, US
As part of the corporate ETL modernization effort, participated in the analysis, design, and development of a new solution integrating Kafka, Spark, Scala, Cassandra, Avro, Protobuf, and Oracle to deliver high throughput of data and transformations from a wide variety of applications such as AppleCare, Apple Music, iTunes, iCloud, and retailers.
• Built an understanding of the business rules, business logic, and use cases to be implemented.
• The solution is intended to be highly customizable by the end user, who can create their own data pipelines, transformations, and loading destinations.
• The data sources can be as disparate as CSV, Protobuf, Avro, Kafka, Teradata, and Oracle.
• The user can configure the format of the input data (including the data schema) and the transformations to be applied (joins, filters, SQL queries), then load the results to their own platform: Cassandra by default, but also Netezza, HDFS, Teradata, Oracle, etc. (a sketch follows this entry).
• The project was intended as a POC to demonstrate the parallel data processing achievable with big data ecosystem tools like Spark implemented in Scala.
• Unit testing and integration testing were performed using ScalaTest.
• Other tools used were Maven for managing the project lifecycle, Splunk for log reporting, Hubble for metrics management, and Jenkins for continuous delivery, along with other Apple corporate tools.
• The solution was deployed to a corporate Hadoop cluster of 250 nodes.
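The project itself was Scala; below is a hedged PySpark sketch of the configurable-pipeline idea (user-supplied schema, SQL transformation, and sink). The file paths, schema, and query are hypothetical stand-ins:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("configurable-etl-poc").getOrCreate()

# A user-supplied "pipeline config": input schema, a SQL transform, a sink.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("product", StringType()),
    StructField("amount", DoubleType()),
])

df = (spark.read
      .option("header", "true")
      .schema(schema)
      .csv("/data/incoming/events.csv"))  # could equally be Kafka, Avro, JDBC, ...

df.createOrReplaceTempView("events")
transformed = spark.sql("""
    SELECT product, SUM(amount) AS total_amount
    FROM events
    GROUP BY product
""")

# The project's default sink was Cassandra; Parquet on HDFS stands in here.
transformed.write.mode("overwrite").parquet("/data/curated/events_by_product")
```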
Hadoop Big Data Engineer & Architect
I3 Solutions Inc | May 2017 - Mar 2018 | Mississauga, ON, CA
• Developed big data projects using Hadoop, HDFS, and data lakes.
• Designed, built, and supported cloud and open-source systems to process geospatial data assets via an API-based platform.
• Processed high volumes of streaming geospatial data from IoT sensors with Kafka, Storm, and Spark Streaming, showing device status and location on the digital maps application (see the sketch after this entry).
• Imported and exported data between HDFS and RDBMSs using Sqoop.
• Used Pig Latin scripts and Hive (HiveQL) to perform data transformations and incremental loads.
• Used Flume to handle streaming data and load it into Hadoop clusters.
• Extensive knowledge of NoSQL databases such as HBase, MongoDB, and Cassandra.
• Deployed and configured a multi-clustered environment using the Cloudera and Hortonworks platforms.
• Created RDDs, Datasets, and DataFrames for the input data and performed transformations using Spark with Python.
• Developed and supported MapReduce jobs in Scala for data cleaning and preprocessing.
• Big data analytics, ETL, data analysis, and visualization using the Cloudera and Hortonworks platforms.
• Good experience working in a cloud environment with Amazon Web Services (AWS): EMR, IAM, Lambda, API Gateway, Cognito, CloudFormation, CloudWatch, DynamoDB, Data Pipeline, EC2, and S3.
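A hedged PySpark Structured Streaming sketch of the Kafka-to-map pipeline described above; the broker address, topic name, and JSON payload schema are assumptions, and the Kafka connector package must be on the classpath:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("iot-geostream").getOrCreate()

# Assumed JSON payload for a device position message.
payload = StructType([
    StructField("device_id", StringType()),
    StructField("lat", DoubleType()),
    StructField("lon", DoubleType()),
    StructField("status", StringType()),
])

positions = (spark.readStream
             .format("kafka")
             .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
             .option("subscribe", "device-positions")           # hypothetical topic
             .load()
             .select(from_json(col("value").cast("string"), payload).alias("p"))
             .select("p.*"))

# Print parsed positions to the console; a real job would feed the map UI.
query = positions.writeStream.outputMode("append").format("console").start()
query.awaitTermination()
```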
Big Data Architect-Engineer
Citizens Financial Group, Inc. | Oct 2015 - Apr 2017
Worked on the design, architecture, and implementation of a big data pipeline with HDFS ingestion from various sources, supporting efficient processing and real-time queries and analysis for financial risk management and decision making.
• Involved in the architecture and design of a distributed time-series database platform using NoSQL technologies such as Hadoop/HBase and ZooKeeper.
• Responsible for configuring the deployment environment for the application, using the Jetty server, WebLogic 10, and a Postgres database at the back end.
• Involved in the implementation of the Spring MVC pattern and developed the persistence layer using the Hibernate framework.
• Implemented ORM through Hibernate and was involved in preparing the database model for the project.
• Followed the Scrum methodology for application development.
• Supported MapReduce programs running on the cluster and developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
• Developed various helper classes using core Java multi-threaded programming and the Collections classes.
• Extracted data from Netezza databases into the Hadoop framework.
• Extracted data from various sources into HDFS using Sqoop and ran Pig scripts on large chunks of data.
• Used Pig for transformations, event joins, the Elephant Bird API, and pre-aggregations before loading JSON-format files onto HDFS.
• Involved in resolving performance issues in Pig and Hive, with an understanding of MapReduce execution and debugging commands for running optimized code.
• Good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance (see the sketch after this entry).
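The partitioning and bucketing described above were done in Hive DDL; this hedged PySpark sketch shows the same two concepts through the Spark writer API, with a hypothetical source path and table name:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("partition-bucket-demo")
         .enableHiveSupport()  # register the table in the Hive metastore
         .getOrCreate())

df = spark.read.parquet("/data/raw/transactions")  # hypothetical source

# Partitioning prunes whole directories at query time (one per trade_date);
# bucketing pre-hashes rows by account_id so joins on that key avoid a shuffle.
(df.write
   .partitionBy("trade_date")
   .bucketBy(32, "account_id")
   .sortBy("account_id")
   .mode("overwrite")
   .saveAsTable("risk.transactions_bucketed"))  # hypothetical table name
```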
Data Engineer & Hadoop Architect
Johnson Controls | Sep 2014 - Sep 2015 | Cork, Ireland
Built a pipeline, DataFrames, and datasets for analysis, which helped the company pinpoint issues and prioritize actions and investments to maximize ROI.
• Installed and administered the first Hadoop cluster, using the Cloudera distribution.
• Built and supported several AWS multi-server environments using Amazon EC2, EBS, and Redshift for benchmark testing and functional comparison.
• Gathered the requirements and prepared the architecture document for the big data project.
• Optimized Amazon Redshift clusters, Apache Hadoop clusters, data distribution, and data processing.
• Imported bulk data into HBase.
• Programmed ETL functions between Oracle and Amazon Redshift.
• Performed analytics on time-series data in Cassandra using the Cassandra API.
• Created Hive tables, loaded them with data, and wrote Hive queries to run internally.
• Collected, aggregated, and moved data from servers to HDFS using Flume and Sqoop.
• Migrated complex programs to in-memory Spark processing using transformations and actions.
• Collected real-time data from Kafka using Spark Streaming.
• Performed transformations and aggregations to build the data model and persisted the data into HBase.
• Worked on a POC for IoT device data with Spark.
• Used Scala to store streaming data to HDFS and implemented Spark for faster data processing.
• Created the RDDs and DataFrames for the required input data and performed the data transformations using Spark with Python (a short sketch follows this entry).
• Developed Spark SQL queries and DataFrames, imported data from data sources, performed transformations and read/write operations, and saved the results to an output directory in HDFS.
• Developed Pig scripts and UDFs for the analysis of semi-structured data.
• Developed Oozie workflows for scheduling and orchestrating the ETL process.
• Migrated ETL jobs to Pig scripts to do transformations, event joins, and some pre-aggregations before storing the data onto HDFS.
• Worked with the Avro data serialization system to handle JSON data formats.
• Used Amazon S3 to store data.
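A short PySpark sketch of the RDD-to-DataFrame pattern mentioned above; the sensor line format and field names are invented for illustration:

```python
from pyspark.sql import SparkSession, Row

spark = SparkSession.builder.appName("rdd-to-df-demo").getOrCreate()
sc = spark.sparkContext

# Raw sensor lines as they might arrive from Flume/Kafka (hypothetical format).
lines = sc.parallelize([
    "chiller-01,2015-03-02T10:00:00,4.2",
    "chiller-01,2015-03-02T10:05:00,4.9",
    "chiller-02,2015-03-02T10:00:00,7.1",
])

# RDD transformations: parse, then promote to a DataFrame for Spark SQL.
rows = lines.map(lambda l: l.split(",")).map(
    lambda f: Row(device=f[0], ts=f[1], kw=float(f[2])))
df = spark.createDataFrame(rows)

df.createOrReplaceTempView("readings")
spark.sql("SELECT device, AVG(kw) AS avg_kw FROM readings GROUP BY device").show()
```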
Geomarketing and BI Analyst
Softtek | Jul 2010 - Dec 2011 | Monterrey, Nuevo León, MX
Geomarketing analyst for Banamex, gathering business requirements and translating them into user and system requirements. Analyzed, designed, and tested a brand-new geomarketing web application in a distributed-systems environment using Location Intelligence solutions. The application enabled marketing analysis in a digital mapping environment, focused on identifying areas with high-potential markets for the banking business.
GIS Coordinator
Secretaría de Finanzas del Gobierno del Distrito Federal | Jan 2009 - Jun 2010 | MX
• Approved solutions addressing areas of opportunity in aligning ICT with the applicable regulations.
• Held daily Scrum meetings with the team to track progress on the different projects under development.
• Analyzed, coordinated, researched, and developed geographic applications to facilitate strategic decision-making in the different government institutions, and made GIS applications available to the general public.
• Developed back-end, server-side applications using C# and PHP with RDBMSs such as Postgres with PostGIS (to store spatial data and perform spatial queries) and Oracle (a small spatial-query sketch follows this entry).
• Extensive use of development tools, APIs, and languages such as Visual Studio, NetBeans, Eclipse, C#, Visual Basic, PHP, HTML, JavaScript, jQuery, Leaflet, OpenLayers, ArcGIS, and CartoDB.
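The back end here was C# and PHP; this hedged Python sketch shows the kind of PostGIS spatial query it issued. The connection string, table, and column names are hypothetical:

```python
import psycopg2  # PostgreSQL driver; PostGIS functions run server-side

conn = psycopg2.connect("dbname=gis user=gis_app host=localhost")

# Find the district polygon containing a given lon/lat point (SRID 4326).
with conn.cursor() as cur:
    cur.execute(
        """
        SELECT district_name
        FROM districts
        WHERE ST_Contains(geom, ST_SetSRID(ST_MakePoint(%s, %s), 4326))
        """,
        (-99.1332, 19.4326),  # central Mexico City
    )
    row = cur.fetchone()
    print(row[0] if row else "no district found")
conn.close()
```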
Web Development Manager
Neus | Dec 2007 - Jul 2008 | México, DF, MX
• Developed and maintained the web platform at a global level.
• Actively participated in the design and development of the company's and customers' BI systems.
• Actively participated in the design and maintenance of the back-office and customer support systems.
• Established the structure and processes required to create and grow the IT directorate within Neus.
Developer and Consultant
Telnorm | Jan 2007 - Jul 2008 | Dallas, Texas, US
Analyzed and developed a GIS web application with functionality such as geocoding, reverse geocoding, address search, points-of-interest search, routing, thematic maps, and socio-economic level by geographic area. The project employed NAVTEQ's MapTP mapping technology, MapXtreme, Google Maps, and Opti-Time for routing and logistics.
Job location: León, Gto. (13 months)
• Databases: MySQL, SQL Server 2005 and 2008
• Operating systems: Windows XP
• Programming languages: ASP.NET, C#, Visual Basic, MapTP API, MapXtreme API, ChronoX API
• Other: AJAX
Project Manager and GIS Analyst
Hildebrando | Jun 2005 - Dec 2006 | Mexico City, DF, Colonia Cuauhtémoc, MX
• Analyzed address-location systems worldwide using the geocoding module of SAP BusinessObjects Data Services; developed a geocoder for 80 countries.
• Developed spatial analyses with MicroStrategy Business Intelligence and Visual Crossing.
• Responsible for the entire Herbalife account, administering 90 human resources across different projects.
• Responsible for recruiting, interviewing, and assigning the resources required by the client.
• Prepared the monthly programmed costs and profits.
Sr. GIS Consultant
Neoris | Feb 2005 - Jun 2005 | Miami, FL, US
• Consulted on functional and operative processes for a GIS system for Cemex.
• Developed data acquisition processes.
• Analyzed and developed GIS functionality for a pilot test at Cemex Costa Rica.
GIS Developer
Gedas | Mar 2001 - Jul 2003 | Detroit, US
Analysis, design, development, and implementation of business solutions, testing, and documentation in the areas of Geographic Information Systems (GIS), Internet, and routing and logistics. Analysis and consulting for TAO Systems (Spain) on the products available for geographic and cadastral applications on the Internet. Developed a GIS laboratory within the company. Customers served included the Guanajuato Ministry of Education, the Ministry of Planning, Sabritas, Banamex, Grupo Infra, and Flecha Amarilla.
Cartographic Production Manager
INEGI | May 1993 - Aug 2000 | Aguascalientes, Aguascalientes, MX
• Coordinated, planned, and supervised activities leading to the development of quality cartographic products for a national program called PROCEDE, which certifies land ownership for rural people across the country.
• Coordinated, planned, and monitored digitization activities resulting from indirect field-measurement and data-collection techniques such as GPS, total station, aerial photography, and photogrammetry.
• Coordinated the actions necessary to implement a national GIS.
• Researched, analyzed, and integrated legislation and development plans on cadastre issues for the state and municipalities.
• Analyzed and integrated information from different government agencies to implement the National Cadastral Information Subsystem.
• Led a team of 4 office managers who worked with 27 GIS technicians.
Education
A Cloud Guru | Computer Engineering
Instituto Tecnológico de León | Systems
Tecnológico de Monterrey | Information Technology
Udemy Academy | Information Technology
Udemy Academy | Big Data