Tony Holmes Email and Phone Number

Q: What is Tony Holmes's role at the current company?

Tony Holmes's current role is SRE Leader.

Q: What is Tony Holmes's email address?

Tony Holmes's email address is tony@crosswinds.net

Q: What schools did Tony Holmes attend?

Tony Holmes attended the University of Waterloo.

SRE Leader at Apple

Mountain View, CA, US

Tony Holmes's Location

Mountain View, California, United States

Tony Holmes's Contact Details

Tony Holmes work email

to****@****nds.net (valid)
to****@****ube.com (valid)

Tony Holmes personal email

n/a

About Tony Holmes

Seasoned veteran with extensive engineering leadership experience across multiple industries. Highly effective customer- and product-centric leader specializing in building high-performance teams and in connecting every level of an offering to the customer experience through cutting-edge observability and insights.

Tony Holmes's Current Company Details

Apple

SRE Leader

Mountain View, CA, US

Website: apple.com
Employees: 163,018

Tony Holmes Work Experience Details

SRE Leader

Apple

Mountain View, CA, US

Head of SRE

Affirm Apr 2024 - Present

San Francisco, California, US

Building and Evolving the SRE function at Affirm

SRE Leader

Apple Feb 2022 - Mar 2024

Cupertino, California, US

The ACS Platform SRE team is responsible for building and operating the platform on which the customer-loved Apple Services are built. I lead the SRE teams responsible for FaceTime, iMessage, iCloud Edge, Push Notification, and inter-service communications.
● Created a culture of strong psychological safety enabling engineers to be honest and open
● Empowered SMEs to drive decisions and inform leadership instead of "asking permission"
● Applied toil reduction measures to reduce on-call and interrupts, increasing project throughput
● Eliminated on-call overload by aligning the team with, and funding only, the most critical services
● Decisively eliminated toxic behaviors, and directly resolved pre-existing inter-team and interpersonal conflicts
● Effectively used the Code Red framework to provide stability and focus for critical service improvement
● Championed strategic bets where expected impact exceeded normally accepted (short) timeframes
● Technologies: Java, Go, Python, k8s, L4/L7 LBs (nginx, custom), DNS, service discovery, BGP, ECMP, DDoS, (m)TLS
Outcomes and Results:
● Toil/Operational work reduced from 80% (Apr 2022) to 63% (Feb 2023)
● Promotion of manager from M1 to M2, 5 engineers to Senior
● Defined and rolled out Service Criticality to all of iCloud
● Improved overall iCloud service intercommunication stability

Head of Site Reliability Engineering (SRE)

Wish Mar 2021 - Jan 2022

San Francisco, CA, US

The SRE team was responsible for defining and implementing SRE principles across Wish.
● Transformed the SRE organization at Wish from an operational to a customer-service-centric model
● Created and codified critical business-wide processes including platform/service criticality, incident management (IM), unbiased performance/promo templates and process, SLI/SLO creation, precise service performance measurement, and improved on-call
● Implemented toil reduction strategies including SLO definition, aligned alerts against SLOs, and enabled self-service
● Instilled a culture of constant incremental improvement, and earlier SRE engagement by implementing the PRR (production readiness review) process directly into the SDLC workflow, to streamline and accelerate launches
● Established relationships with Engineering, Product, Infrastructure, and Security orgs to champion Reliability focus, align SRE roadmap with key initiatives, and steer skill priorities for SRE hiring
● Mentored Product and Engineering leads on Reliability and Software best practices and org structure
● Enabled engineering teams to improve services by surfacing performance and cost efficiency data
● Technologies: k8s, L4/L7 LBs (nginx, ELB), AWS, DNS, HashiCorp Vault, BGP, ECMP, Python, Go, Prometheus
Outcomes and Results:
● Delivered clear prioritization framework and rubric for consistent company-wide use
● Doubled SRE team size by attracting Staff+ SREs to own and lead domains
● Shortened the new-product production deployment process from 16+ to 6 weeks
● Successful adoption of Code Yellow and large scale IM process across organizations
● Reduced SLI measurement error by 94% and reduced end-user latency by 40%
● Increased utilization of cloud resources from <25% to 55% on average, saving $4.5M monthly

Site Reliability Engineering Manager

YouTube Dec 2019 - Feb 2021

San Bruno, CA, US

The YouTube Search/ML SRE team owned the Search and Machine Learning infrastructure. When hired, the team owned 5 total service domains which imposed too much cognitive load. My first act was to identify domains of alignment, and reorganize into three teams.
● Resolved gaps in the feature cost tooling (predictive cost/capacity impact) by identifying the source of the gap, finding a complementary solution, PoC'ing, and driving adoption through all of YouTube
● Developed programs to improve End User qualitative experience, and create contextual mitigation strategies
● Leveraged previous remote-team experience to seamlessly transition into Work From Home context during COVID onset, successfully onboard new hires, and maintain strong team communications and delivery velocity
● Championed the End-User journey as the primary health metric (versus system health)
Outcomes and Results:
● Drove a 95% reduction in End User session level errors of all severities
● Served a 35% COVID-19 increase in traffic while decreasing deployed footprint by 25%
● Identified and fixed a low-level issue, saving 10% in service dedicated server costs
● Reduced Search p99 tail latency from 120s+ to 20ms

Sr Engineering Manager

Riot Games Aug 2018 - Jul 2019

Los Angeles, CA, US

Led the Core Identity Services team responsible for Identity, DNS, and Authentication platform services.
● Developed and mentored Managers, Leads, and ICs in career and craft development
● Co-driver of new org structure to increase engineering to manager ratio from 1:1 to 3:1
● Engaged with customers and stakeholders outside of the org to build strong alliances, identify gaps in current solutions to inform future strategies and roadmaps for a cloud-first PaaS offering
● Established the first formal documented PIP process which became the template
Outcomes and Results:
● Rebuilt team to address performance issues, resulting in 10x delivery throughput
● Delivered successful DNS consolidation in 6 weeks after 3 previous failed attempts over 5 years

Sr Engineering Manager

Netflix May 2016 - Aug 2018

Los Gatos, CA, US

The Labs team is responsible for supporting the platform and availability of the internal developer version of the Netflix application, managing pre-production and released external partner hardware (e.g., LG, Sony) for certification, and supporting the internal Certification and Test teams and external partners through the certification process.
● Restructured the single team into 3 domains of focus, and drove the creation of a test and certification PaaS offering supporting internal and external stakeholders, including bootstrapping SRE into the local org
● Mentored ICs, Leads, and Managers in career and craft development, in both technical and soft skills
● Engaged with partner teams to effectively surface critical metrics and signals for alerting
● Introduced cloud and datacenter paradigms to build a new high density device test environment
● Created physical device container and pluggable interface (patented) to simplify deployment
● Unified Netflix Reference Application (RefApp) pipelines into single release framework that supported diverse cadences for external customer (6 month) and internal development (rapid iteration)
Outcomes and Results:
● Increased device (consumer quality) availability from 40% to 96%, and RefApp use for testing by 100x
● Decreased AWS cost from $1.2M to under $200k yearly
● Reduced average RefApp deployment time from 3.5 days to 10 seconds

Site Reliability Engineering Manager

LinkedIn Sep 2015 - May 2016

Sunnyvale, CA, US

Led the Identity Site Reliability team responsible for all Profile Data, Connections, and Network Service (SaaS).
● Resolved deep morale and technical issues, and rebuilt into a motivated and cohesive team
● Mentored and developed each of the engineers into the SME for areas of interest and development
● Implemented SRE principles to align and prioritize work via OKRs, SLOs, and a toil reduction plan
● Improved the quality of the observability/metrics platform by eliminating 95% of (non-useful) data
Outcomes and Results:
● Mentored an IC into management, one to Staff level, one to Senior level
● Established and published first internal service SLOs and drafted SLAs, used OKRs to prioritize work
● Scaled profile capacity from 400k rps to 1 million rps
● Built measurement system to measure and validate capacity and SLO
● Delivered first and only zero-incident year end / New Year (2016)

Sr Engineering Manager

Sysomos / Marketwired Jul 2013 - Jun 2015

The Systems Engineering team owned all aspects of the Sysomos social analytics and Marketwired PR service including the infrastructure, networking, colocation, systems management, security, deployments, and HA / disaster recovery. I led a team of 2 Managers, 2 engineering ICs, and 45 Contractors.
● Drove the move from an ad-hoc, unstructured platform to an SOA based 3-tier architecture, allowing the legacy systems to operate and scale without code change.
● Led the design and build-out of an 8000 node Hadoop platform across two datacenters. Used this as an opportunity to mentor the Managers and ICs into primary ownership of domains, growing them.
● Spearheaded the design of a next-gen virtualized Hadoop PoC with IBM and our developers to reduce data-center footprint by 98%
● Managed all aspects of the RFP and RFQ process, OPEX and CAPEX reporting for the infrastructure / hardware, colocation and datacenter contracts, and service partnerships
Outcomes and Results:
● Automated deployments reduced delivery from days to minutes
● Reduced data-center costs by 30% by scaling down systems during off peak
● Reduced incident rates from multiple per night to one per week
● SOA pattern decoupled and isolated legacy code, accelerating new development

Team Lead/Senior Systems Administrator

Datawire Communication Networks Inc. / First Data May 2010 - May 2013

The Systems Platform team was responsible for all operational aspects of a PCI-1 / SOX compliant transaction transport network supporting 330,000 merchants in 28 datacenters worldwide.
● Primary SME for yearly PCI / SOC compliance audits
● Drove modernization of the platform from single-use servers to a virtualized environment improving security, observability, deployments, and rollbacks
● Planned and executed mission critical load balancer migration, and modernization of observability stack
● Lead architect of the distributed back-office environment
● Primary Security SME for SSL/TLS services, responsible for exploit mitigation and large certificate migration
● Designed and deployed SSL acceleration to improve capacity and performance
Outcomes and Results:
● Backoffice: SLA availability improved from 99.5% to 99.99%, costs reduced by 73%, and record update latency reduced from 5 min to under 1 s
● SSL improvements increased total network capacity from 20k rps to over 120k rps with no additional hardware
● Virtualization platform reduced deployment/rollback from 5 hours to 30 mins (<1 min emergency)

Team Lead/Senior Unix Administrator

Sunwing Travel Group / Signature Vacations Oct 2007 - Apr 2010

● Team Lead of Systems Admin team focused on security best practices, business continuity and disaster recovery plans, budgeting, and infrastructure evolution planning
● Primary responsibility was the security, reliability, and performance of heterogeneous Unix systems, storage, and WebLogic based Java applications
● Drove automation efforts to minimize manual intervention and reduce downtime

Co-Founder / CTO / Systems Architect

Crosswinds Internet Communications Inc 1997 - Oct 2007

● Architected and deployed a secure and reliable free web- and email-based service for 1.8 million users
● 100% distributed office; led a team of 12 engineers around the world to deliver services
● Responsible for all technical systems for production service, internal development, and productivity

Design Engineer

Leitch Technology International Inc Apr 1996 - Jun 1998

● Designed and built custom hardware and OS for video data and control network (Still File projects)
● Created roadmap and implemented mixed Unix/Windows network for global R&D with disaster recovery, data backup and recovery, and redundant connectivity
● Reverse-engineered the IPX protocol and built a compatible clean-room implementation for FreeBSD
● Contributed to software and hardware process standards, security and performance consulting

Tony Holmes Skills

Linux, Unix, High Availability, Servers, Networking, Data Center, System Architecture, Load Balancing, Disaster Recovery, Cloud Computing, Network Security, Team Leadership, Operating Systems, MySQL, Network Architecture, Integration, Shell Scripting, Unix Shell Scripting, VPN, Ubuntu, Red Hat Linux, IT Infrastructure Operations, TCP/IP, Network Monitoring Tools, Apache, Hardware, FreeBSD, Virtualization, Perl, Security, Firewalls, Windows, PHP, HP-UX, LAMP Administration, Xen, DRBD, VMware ESX, ESXi, OSPF, PostgreSQL, DNS, Hadoop, IT Strategy, System Deployment, Technical Vision, Agile Methodologies, Distributed Systems, Scalability, Organizational Design, Software Development Life Cycle, Software Development, Strategy, Performance Management, Engineering, Project Management, Web Services, Scrum, Amazon Web Services, Python, SQL, Leadership, Management, Performance Improvement, Cross Functional Team Leadership, Technical Direction, Site Reliability Engineering, Microservices, Engineering Leadership

Tony Holmes Education Details

University of Waterloo

Computer Engineering with Co-op

Frequently Asked Questions about Tony Holmes

What company does Tony Holmes work for?

Tony Holmes works for Apple

What are some of Tony Holmes's interests?

Tony Holmes has interest in Children, Economic Empowerment, Politics, Environment, Education, Science And Technology, Disaster And Humanitarian Relief, Health.

What skills is Tony Holmes known for?

Tony Holmes has skills like Linux, Unix, High Availability, Servers, Networking, Data Center, System Architecture, Load Balancing, Disaster Recovery, Cloud Computing, Network Security, Team Leadership.

Who are Tony Holmes's colleagues?

Tony Holmes's colleagues are Paige Miller, Rao Priyanka, Kapil Singh, Sriram Moorthy, Joy Chinen, Yiğit Efe Karaş, Alixandra Peña.
