Brian T Di Bella

Practice Delivery Manager - Principal Site Reliability Engineer @ Deloitte | Founder, Lead Technologist, CTO/CIO
Brian T Di Bella's Location
Dallas, Texas, United States
About Brian T Di Bella

At Deloitte, my tenure as Practice Delivery Manager and Principal Site Reliability Engineer is marked by establishing cutting-edge SRE practices. Our team has developed and instituted pivotal SLOs and SLIs, substantially enhancing development team performance and bolstering system dependability. In parallel, as the founder and technical mastermind of a stealth-mode startup, my strategy aligns technology with our business objectives, fostering innovation and scalable growth. My commitment to disrupting cognitive fixedness has positioned us to stay at the vanguard of the digital transformation arena.

Brian T Di Bella's Current Company Details
Deloitte

Practice Delivery Manager - Principal Site Reliability Engineer @ Deloitte | Founder, Lead Technologist, CTO/CIO
Brian T Di Bella Work Experience Details
  • Deloitte
    Practice Delivery Manager - Principal Site Reliability Engineer
    Deloitte May 2024 - Present
    Worldwide
    Creating the SRE practice in the role of Practice Delivery Manager - Principal Site Reliability Engineer after converting to a direct-hire consultant.
    - Leading a team of SREs to support development and infrastructure teams.
    - Creating SLOs and SLIs for development teams.
    - Authoring, designing and conducting brown bag sessions on how to implement SRE processes for incident management and observability.
    - Properly implementing SLOs in Splunk and Dynatrace.
  • Stealth-Mode Startup Company
    Founder, Lead Technologist, CTO/CIO
    Stealth-Mode Startup Company Jun 2022 - Present
    Ashdod, IL
    Oversee technical aspects of the company's strategy to ensure alignment with its business goals.
    - Partner with functional and technical leaders to identify and plan capabilities necessary to meet short- and long-term business needs.
    - Break cognitive fixedness for teams.
    - Develop and execute strategic plans in support of key objectives in a timely and fiscally responsible manner.
    - Develop and write SMART goal proposals.
    - Provide a clear road map of technology and processes to digitally transform the company.
    - Develop a plan to scale flexibly in response to increased user load over time.
    - Evaluate research and market analysis, and meet with third-party vendors.
    - Maintain knowledge of industry innovations and technology platforms.
    - Establish and nurture strategic vendor relationships.
    - Find and implement new technologies that yield competitive advantage.
    - Oversee the delivery of systems that meet scaling expectations and benchmarks.
    - Define and oversee cloud budgeting and spend.
    - Leverage digitalization of processes across the enterprise to build efficiencies.
    - Oversee execution and adoption of all aspects of digital transformation.
    - Define and oversee information technology budgeting and spend.
    - Promote data provenance and democratization.
    - Define Service Level Agreements (SLAs) and Key Performance Indicators (KPIs) around infrastructure, application and API availability.
    - Monitor infrastructure, application and API performance metrics to ensure functionality and efficiency.
    - Communicate the value creation and business proposition to teams and functional partners.
    - Define and oversee internal relationships and contributions to open source communities.
    - Establish and define the incident management process: engagement, root cause analysis (RCA), postmortems, metrics and KPIs.
    - Establish and promote a blameless culture of continuous improvement.
    - Promote the shortening of feedback loops to improve communication and accelerate improvement of the code base.
  • Codeforce 360
    Practice Delivery Manager - Principal Site Reliability Engineer
    Codeforce 360 Nov 2023 - May 2024
    Alpharetta, Georgia, US
    Responsible for implementing the Site Reliability practice for Salesforce and writing Service Level Objectives (SLOs), Service Level Indicators (SLIs) and Error Budgets for the client.
    - Creating SLOs, SLIs and Error Budgets and communicating their purpose and use throughout the enterprise.
    - Creating SLO, SLI and Error Budget monitoring and alerting in Splunk and Dynatrace.
    - Fostering an SRE culture and breaking down siloed teams throughout the enterprise.
    - Capturing critical metrics that illustrate infrastructure and application performance, as well as the accuracy of the application's or service's error budget and availability.
    - Documenting all processes and procedures for maintaining and supporting the Salesforce environment.
    - Building Splunk and Dynatrace monitoring and alerting.
    - Implementing best practices for the Site Reliability Engineering practice.
    - Interviewing candidates for the SRE team.
    - Mentoring and training a team of junior engineers to adopt the processes, procedures and culture of a modern SRE team.
  • Comcast
    Senior Cloud Engineer
    Comcast Aug 2022 - Oct 2023
    Philadelphia, PA, US
    Responsible for AWS cloud infrastructure and architecture. Implemented AWS ElastiCache for Redis and supported the Xfinity application development team.
  • The Odyssey Project Inc.
    Cofounder, Lead Technologist, CTO/CIO, Principal Cloud Architect, DevOps and SRE
    The Odyssey Project Inc. Apr 2018 - Dec 2021
    Austin, Texas, US
    Responsible for technology choices, IT culture and processes to build a cloud-agnostic system that maintains 99.99% availability. Responsibilities:
    - Creating the vision for an IT infrastructure with the highest security, scalability, extreme resilience and outstanding performance.
    - Building the cloud-agnostic Kubernetes cluster across multiple availability zones, positioned to migrate across regions, for development, staging and production.
    - Containerizing current applications with Docker.
    - Designing, implementing and testing the disaster recovery plan and procedures.
    - Deploying and configuring monitoring and alerting with Prometheus, Grafana and Kiali.
    - Deploying and configuring the CI/CD pipeline with GitLab.
    - Deploying and configuring the Istio service mesh.
    - Capturing critical metrics that illustrate infrastructure and application performance, as well as the accuracy of the application's or service's error budget and availability.
    - Documenting all processes and procedures for maintaining and supporting the environment.
    - Implementing best practices for cluster maintenance, eliminating configuration drift.
    - Writing Python APIs and webhooks to process transactions and update internal resources.
    - Mentoring and training a team of junior engineers to adopt the processes, procedures and culture of a modern Site Reliability / DevOps team.
    - Building automation for manual tasks and processes.
    - Managing cloud costs and purchasing decisions.
    - Implementing blue-green, canary and dark deployments with Istio for zero-downtime deployments in production during peak traffic times.
  • Signify Health
    Senior Site Reliability Engineer and Technical Team Lead
    Signify Health Jun 2020 - Sep 2021
    Dallas, Texas, US
    Led the technical team for the Site Reliability Engineering (SRE) practice: implemented best practices for monitoring and alerting, helped teams adopt an on-call schedule, implemented a new incident response process and developed an SRE culture of blameless postmortems. Worked across the enterprise to guide support engineers and development teams in adopting an alert auto-escalation platform, which allowed teams to manage their on-call schedules, route alerts, create maintenance windows and produce analytics: Mean Time To Detect, Mean Time To Resolve and Mean Time To Failure. Reconfigured the existing monitoring to collect golden signals and configured threshold alerts to reduce noise, thereby reducing alert fatigue. Taught the SRE team the concepts and best practices of SRE by illustrating their purpose and the value they bring to the company. Worked with mission-critical tier 1, tier 2 and tier 3 application teams to raise the availability of their applications above 99.99%. Created customized alerting for Kafka lag, brokers dropping and consumer drop events, as well as other custom alerting. As a result, the availability of Signify Health's applications greatly improved, which reduced the revenue lost to application outages and greatly increased users' productivity. Users were able to rely on the applications that drove revenue, and the company surpassed revenue expectations, allowing it to go public about 10 months after my start date. Led the SRE team in continuing to improve our processes and finding opportunities to reduce toil through automation. Built a culture for high-performing teams. Implemented blue-green, canary and dark deployments with Linkerd for zero-downtime deployments in production during peak traffic times.
  • Southwest Airlines
    Team Lead and Senior Site Reliability Engineer
    Southwest Airlines May 2019 - Apr 2020
    Dallas, TX, US
    Responsible for working with the Technology Operations team to review practices and find improvements. Working with development teams to ensure the reliability of mission-critical applications. Deploying and configuring monitoring and alerting on latency, traffic, errors, saturation and utilization. Transferring knowledge and guiding technical decisions in adopting innovation. Mentoring junior Site Reliability Engineers and influencing the direction of documentation to capture tribal knowledge. Creating a DevOps/SRE culture of learning from our mistakes and documenting processes. Key engineer in developing an internal innovation team that solves critical engineering challenges for required-to-fly and required-to-operate applications. Recommended the deprecation of the crew application and the point-to-point network. Defining Service Level Objectives (SLOs), Service Level Indicators (SLIs) and Service Level Agreements (SLAs) for the Site Reliability Engineering team and users. Defining monitoring and alerting around walking the user's journey through the application. Creating and calculating Error Budgets for development teams.
  • 7-Eleven
    Senior DevOps Engineer / Cloud Architect / SRE Supporting Distributed Systems
    7-Eleven Oct 2018 - May 2019
    Irving, TX, US
    Product manager responsible for mission-critical internal business applications for a global retailer. Leading innovation and adoption of distributed systems to replace monolithic applications.
    - Supporting DevOps environment monitoring and the continuous delivery environment.
    - Architecting microservices environments.
    - Supporting distributed systems.
    - Architected, installed and configured Graylog, Grafana, StatsD, MongoDB, Elasticsearch and JMeter.
    - Practicing the CAMS (Culture, Automation, Measurement, Sharing) model daily, making work visible, and Kaizen, always improving processes.
    - Making work visible by documenting processes, creating run books and presenting completed projects using Slack, Kanban boards, Jira Projects, Jira Confluence and Agile process.
    - Architecting and supporting a microservices AWS environment (EC2, ECS, EBS, VPC, S3, Lambda, Elastic Beanstalk, CloudFormation, EFS, RDS, DynamoDB, Redshift, Route 53, Kinesis Firehose, SNS, SQS, API Gateway, OpsWorks, IAM and more).
    - Configuring IAM users, groups, roles and policies.
    - Architected and managed AWS subnets, security groups, and private and public networks.
    - Architected DevOps microservices Development, UAT and Production environments.
    - Installed and configured continuous delivery tools in AWS: CodePipeline, CodeBuild, CodeCommit and CodeDeploy.
    - Supporting serverless frameworks such as Lambda.
    - Installed and configured continuous delivery applications: Jenkins, Chef, Puppet, Git and Ansible.
    - Created a log retention policy; configured rsyslog, syslog-ng and logrotate.
    - Managed tickets using Jira Service Desk, Remedy and more.
    - Created and managed security certificates for AWS and other cloud Public Key Infrastructure.
    - Configured and managed SSL/TLS.
    - Security hardening of Linux and Unix servers.
    - Databases: MySQL, Oracle 12c, MariaDB, MongoDB, CockroachDB, Microsoft SQL Server, Postgres.
    - Simian Army: Chaos Monkey, Janitor Monkey, Conformity Monkey and Chaos Kong.
  • 7-Eleven
    Senior DevOps Engineer / AWS Cloud Engineer
    7-Eleven Mar 2017 - Sep 2018
    Irving, TX, US
    Remotely supporting public and private cloud environments.
    - Supporting a DevOps environment for a global retail company, focusing on the CAMS philosophy in an Agile environment.
    - Serious believer in the Kaizen philosophy, applying it to daily activities.
    - Managing users, groups and roles in an AWS cloud environment.
    - Architecting solutions, monitoring, installing and configuring Apache, Oracle Fusion Middleware, OpenStack, AWS and Google Cloud.
    - Daily use of Slack, Outlook, Thunderbird and Sococo.
    - Monitoring with New Relic, ThousandEyes, Graylog, Grafana, Elasticsearch, Graphite, StatsD and Splunk.
    - Installing and configuring Ansible automation with Jenkins for continuous integration and delivery.
    - Creating and deploying VMs in a cloud environment, attaching storage, and managing users and groups on Oracle Linux, Red Hat Linux and Ubuntu Linux.
    - Troubleshooting network connections. Creating security groups, virtual routers and networks.
    - Managing users, tenants, domains, groups and roles in a cloud environment.
    - Managing quotas, networking, storage and instances in a cloud environment.
    - Installed, configured, administered and supported Graylog, Splunk, Sensu, Graphite, StatsD, Elasticsearch, MongoDB, MySQL, MariaDB, Drupal, Composer, Python 2/3 and PHP.
    - Scripting in Bash and Python to automate alarms.
    - Security hardening of Debian, Ubuntu, CentOS 6/7, Red Hat 6/7, Fedora and Oracle Linux 5/6/7.
    - Architected a microservices, immutable architecture on AWS ECS instances with Docker.
    - Continuous integration and delivery using AWS Pipeline and Jenkins.
    - Version control using GitHub, Subversion and an AWS S3 repository.
    - Building a documentation library throughout the build and deployment process.
    - Created document standardization templates.
    - Standardized the process of posting completed documents to a Slack channel, making work visible.
    - Created the DevOps runbook and kept it current.
    - Simian Army: Chaos Monkey, Janitor Monkey, Conformity Monkey and Chaos Kong.
  • Insight Global, NTT Data, Harvard Pilgrim (Contractor)
    Exalogic Subject Matter Expert & Fusion Middleware Engineer
    Insight Global, NTT Data, Harvard Pilgrim (Contractor) Sep 2016 - Jan 2017
    - Installed, configured and performed administrator tasks on Exalogic X3-2, X4-2 and X5-2 engineered systems.
    - Extensive experience configuring, managing and patching Oracle Linux 5, 6 and 7, as well as Solaris 11.x, as a Linux/Unix administrator.
    - Monitored and managed Exalogic and Exadata environments using Oracle Enterprise Manager Cloud Control 12c (OEMCC).
    - Deployed agents to and monitored targets using OEMCC 12c (databases, application services, listeners and vServers).
    - Daily use of Enterprise Manager Ops Center (Exalogic Control) to create, monitor, control and manage vServers, compute nodes, cloud user accounts, the InfiniBand network and more.
    - Installed, configured and monitored the Exalogic environment using Oracle Enterprise Manager Cloud Control 11g and 12c.
    - Used Integrated Lights Out Manager (ILOM) to monitor the InfiniBand fabric and Exalogic leaf switches.
    - Patched vServers in Exalogic X3-2, X4-2 and X5-2 environments.
    - Participated in the deployment and configuration of the external ASR server.
    - Ran Exachk and other general health check utilities to validate the health of the Exalogic environment.
    - Backed up the Exalogic environment using ExaBR.
    - Installation, configuration and daily use of Oracle Virtual Manager (OVM) on commodity hardware outside of Exalogic.
    - Configured monitoring and alerts in Oracle Enterprise Manager Cloud Control 12c.
    - Managed and configured Oracle Traffic Director (OTD).
    - Created and configured pools, projects and shares on the ZFS appliances, and set up alerts within the Exalogic appliance.
    - Worked alongside Oracle Advanced Consulting Services (ACS) and Oracle Consulting Services (OCS) to triage Exalogic issues and handle hardware upgrades and maintenance; opened SRs using My Oracle Support.
    - Created and uploaded support bundles from the internal ZFS appliance and created ZFS snapshots.
    - Manually removed faults using the ZFS appliance command line interface.
    - Installed and configured Splunk.
  • BuzzClan, Bed Bath and Beyond, IBM, Sophisticated Systems, State of Ohio
    Oracle Fusion Middleware Administrator, Exalogic Engineer, Senior Associate
    BuzzClan, Bed Bath and Beyond, IBM, Sophisticated Systems, State of Ohio Aug 2014 - Sep 2016
    This Oracle partner is a business consulting company that provides Oracle software advisory and implementation services. Installed, configured and supported OpenStack (Horizon, Nova, Neutron, Keystone, Heat, Swift, Cinder) and RabbitMQ. Created and configured OpenStack virtual machines in Horizon and managed certificates for Public Key Infrastructure. Created and defined internal and external networks and routers in OpenStack; created floating IPs, flavors and volumes using Horizon and the CLI; triaged OpenStack issues using the command line interface. Exalogic/Private Cloud Engineer: Exalogic Elastic Cloud (physical and virtual), UNIX administration (Solaris, Linux), database and WebLogic. Managed and administered Exalogic using the EMOC, OEM, CLI and graphical user interfaces. Set up active-passive OTD as part of Exalogic. Managed and configured users and access. Applied PSU patches to the Exalogic rack. Interacted with Oracle Support. Created a worksheet to monitor ZFS storage performance. Backed up configurations with ExaBR and generated reports with Exachk and Exalogs. Managed and administered Exadata/Exalogic components such as InfiniBand switches, ZFS storage and compute nodes. Ran Exachk and RACcheck reports on Exadata/Exalogic. Handled OEM and ILOM alerts for all environments. Monitored the whole stack using EM12c. Other cloud technology: OpenStack, Docker, AWS, Simian Army (Chaos Monkey, Janitor Monkey, Conformity Monkey and Chaos Kong). Administered, monitored and managed multiple virtual machines in development, test and production environments.
  • Consulting
    Exalogic Engineer, Unix/Linux Systems Administrator, Oracle Fusion Middleware Administrator
    Consulting Jan 2002 - Jul 2014
    Configured and migrated operating systems, middleware applications and databases. Installed and configured Oracle SOA 11g. Installed, configured and troubleshot the Oracle Identity Management suite (Oracle Internet Directory, Oracle Access Manager, Oracle Identity Manager, WebGate configuration). Wrote scripts to stop, start, reload, configure and troubleshoot all aspects of WebLogic application servers. Responsible for monitoring daily operations, securing contracts with new clients and updating current contracts with the client base. Worked in a virtualized server environment based on Oracle Virtual Machine (Oracle VM). Configured multiple managed servers across machines in various topologies (including maximum availability architecture) with WebLogic and JMS clustering for high availability and scalability. Basic knowledge of Microsoft Active Directory, LDAP and Oracle RAC database environments. Monitored daily metrics and other activity reports. Developed creative solutions for structuring workflow, projects, new technology, data capture, web layout and functional design. Coordinated with business analysts, developers and DBAs to ensure optimal solutions, performance and reliability.
    Additional Experience: Compliance Manager and Project Manager. Interfaced directly with executives and sales channel management to report production numbers, compliance issues and all aspects of the project. Trained and supported Verizon FiOS representatives on using Verizon's sales software. Managed sales data for a team of 20 marketing representatives on the Verizon FiOS marketing campaign. Tracked completion and test scores using Verizon's online tracking system. Managed and tracked compliance for our team.
    Environment: Oracle WebLogic Server, Oracle Fusion Middleware applications, SOA/BPEL/BPM, Cacoo, HTML5, CSS3, SQL and AJAX, Taleo, PeopleSoft, Google Webmaster Tools, Google AdWords, Google Analytics, ArcGIS, ArcView, ArcInfo, Microsoft Office Suite, Open Office Suite.

Brian T Di Bella Education Details

  • Thayer School of Engineering at Dartmouth
    Digital Transformation
  • Harvard Business School Online
    Certificate in Entrepreneur Essentials
  • Harvard Business School Online
    Certificate of Specialization in Entrepreneurship and Innovation
  • Harvard Business School Online
    Certificate in Disruptive Strategy
  • Harvard Business School Online
    Certificate in Design Thinking and Innovation
  • Harvard Business School Online
    Business Analytics
  • UCLA
    Geography/Environmental Studies - GIS and Remote Sensing
  • Santa Monica College
    Environmental Geography / GIS

Frequently Asked Questions about Brian T Di Bella

What company does Brian T Di Bella work for?

Brian T Di Bella works for Deloitte.

What is Brian T Di Bella's role at the current company?

Brian T Di Bella's current role is Practice Delivery Manager - Principal Site Reliability Engineer @ Deloitte | Founder, Lead Technologist, CTO/CIO.

What schools did Brian T Di Bella attend?

Brian T Di Bella attended Thayer School of Engineering at Dartmouth, Harvard Business School Online, UCLA and Santa Monica College.
