Matthew Evans

mattevans.cloud | me@mattevans.cloud | [ 704-207-6585 ]

Site Reliability Engineer with a strong belief in K.I.S.S., judicious usage of AI to improve productivity, and a love of automation, observability, and working collaboratively to achieve the impossible.

Skills

site reliability engineering, programming, devops, databases, containers, linux, systems administration, webservers, project management, system architecture & design, hybrid-clouds

SRE: Grafana, Loki, Prometheus / VictoriaMetrics, Datadog, New Relic
DevOps: Puppet, Ansible, Terraform, Jenkins, ArgoCD, Kubernetes, Docker
Cloud: AWS, Azure, AWS CDK, Hybrid (VMware, Proxmox, Hyper-V)
Programming: Bash, Python, Go, GitHub Actions
Databases: Postgres, Aurora Postgres, MongoDB, SQLite
Linux: RHEL & Derivatives, Debian & Derivatives

Experience

Director of Site Reliability Engineering, Cyware

2023 to Present, Charlotte, NC (remote)

Hired to separate the SRE function from DevOps while remaining very hands-on.
Led team of 5 Senior Site Reliability Engineers, reporting to Senior VP of Engineering.
Evaluated several observability platforms and chose to continue with the Grafana stack because it offered the highest ROI.
Wrote numerous custom API solutions in Python to ingest critical metrics from Cyware core applications into Prometheus and to automate mundane tasks, such as EC2 disk expansions.
Deployed ‘masterless Puppet’ GitOps-based state configuration for legacy EC2 platform, to stabilize and prevent recurring issues due to inconsistent configurations.
Wrote a custom solution (Python + Go) to pull metrics for all alerts, enabling data-driven decision making.
Wrote logic and Grafana dashboard to track ‘all-customer’ and ‘per-customer’ uptime and SLA metrics.
Built out a follow-the-sun on-call rotation in Opsgenie, introduced blameless post-mortems, and established the first set of SLIs, which led to achievable SLOs and SLAs.
In < 1 year, decreased the average number of monthly alerts from over 1,000 to fewer than 50, and brought uptime from 95% to 99.98% across all customers.
Partnered with DevOps to champion “Next-Generation” Kubernetes-based GitOps platform, rolled out in January 2024.
Responsible for disaster recovery and cloud-related BCP architecture and SOPs.
Championed and deployed hybrid-cloud for non-production workloads, decreasing monthly AWS spend by over 50%.
Finally, championed a “document-all-the-things!” approach, to prevent accumulation of tribal knowledge, prevent errors and inconsistencies, and improve operational time to resolution.

Technologies used: Python, Bash, Terraform, AWS, Azure, Grafana, Loki, LogQL, Prometheus, PromQL, ArgoCD, Puppet, Kubernetes, Helm, Docker

Director of Site Reliability Engineering, Prometheus Alternative Investments

2020 to 2023, Charlotte, NC (remote)

Joined this mobile-first startup, which unfortunately lost funding in early 2023 and closed shop.
Led team of 4 employees and 6 contractors in Pakistan.
Led the development and execution of the cloud strategy, resulting in a 400% reduction in AWS costs while improving reliability from < 90% to > 99.99%, as measured by BetterStack.
With no defined DevOps function, combined DevOps & SRE into one team, greatly improving collaboration from developer workstation through production deployment.
Re-architected AWS platform using AWS Fargate and MongoDB Cloud to reduce costs and complexity, while allowing platform to scale rapidly to meet MAU target.
Worked in partnership with a Datadog Partner to quickly deploy a full RUM and observability stack while transferring operational knowledge to my team, saving both time and money versus the go-it-alone approach.
Established SLIs, SLOs, and resultant SLAs.

Technologies used: AWS, CDK, Terraform, Fargate, Datadog, BetterStack, Opsgenie, GitHub Actions

Director of SRE & CISO, Alpha Theory (SaaS) / Centerbook Partners (Hedge Fund)

2011 to 2020, Charlotte, NC (remote)

As a member of the executive team, participated in organizational planning and direction, interviewing of all new hires, and ensuring security had a voice at the highest levels of the organization.
Managed team of 5 employees and 4 outsourced contractors.
Led the organization from startup, through midlife, and finally to a parent organization with $200 billion under management, while acting as interim CTO to the wholly owned hedge fund subsidiary, Centerbook Partners ($2B AUM).
Led the efforts to move from on-prem to an AWS/Azure hybrid infrastructure, allowing the SaaS application to utilize ephemeral virtual machines, reducing various daily jobs’ time by several orders of magnitude.
As CISO, led annual black-box penetration testing for infrastructure and application, and was responsible for the vulnerability management program and the overall security posture across the entire organization.

Technologies used: AWS, Azure, VMware, pfSense, Cisco, Datacenter, Megaport, Puppet, Bash, Jenkins

Other Noteworthy Employers

2023, Wells Fargo
2009 to 2011, Honda Aircraft
2007 to 2009, IBM
2005 to 2007, AIG (United Guaranty)

Awards & Recognition

Won bid to deploy the first public WiFi in Center City Park in Downtown Greensboro, NC.
Honored by Honda Aircraft CEO Michimasa Fujino for an ingenious internet-based streaming broadcast system for the Oshkosh airshow, saving the company several million over a traditional satellite live-broadcast provider.

Projects

ipcheck.sh (Python to Golang learning project)

Certifications

Certified Information Systems Security Professional (CISSP)

2017-Present
Verify at Credly

AWS Solutions Architect, Associate

2023-Present
Verify at Credly

Certified Kubernetes Administrator (CKA)

2024-Present

Certified Kubernetes Security Specialist (CKS)

2024-Present