SSL expiry: why we forget until the site is down

HTTPS outages from expired certs are embarrassing because they are preventable. The cert was fine yesterday; today every browser shows a full-stop warning.

Shorter lifetimes, more discipline

Public CAs issue shorter-lived certificates to encourage automation and to limit the damage when a key is compromised. Manual annual purchases still exist for some enterprise certificates, but Let’s Encrypt (with 90-day certificates) and other ACME-based CAs pushed the industry toward frequent, automated renewal.

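Staging and admin subdomains are the first to be forgotten. A wildcard certificate for `*.example.com` does not help if someone issued a separate certificate for `tools.example.com` three years ago through a different vendor, and that separate certificate is the one the server still presents.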

Auto-renewal hooks fail silently when DNS validation breaks, HTTP-01 challenges hit the wrong load balancer, or firewall rules change. Monitoring must verify the certificate chain the server actually presents, not just that the renewal job is green.
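Here is a minimal sketch of checking what is actually served. It assumes Python 3 with only the standard library. The helper names `served_cert_not_after` and `days_until` are hypothetical. Python's default SSL context verifies the chain and the hostname. A missing intermediate or a mismatched certificate therefore raises `ssl.SSLCertVerificationError` instead of returning a date. That is stricter than some browsers, which can paper over a missing intermediate.

```python
import socket
import ssl
import time
from typing import Optional


def served_cert_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the notAfter field of the certificate the server presents to clients."""
    # The default context verifies the chain against the system trust store and
    # checks the hostname; an incomplete chain raises ssl.SSLCertVerificationError.
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return cert["notAfter"]


def days_until(not_after: str, now: Optional[float] = None) -> float:
    """Days remaining before a notAfter value such as 'Jan  1 00:00:00 2030 GMT'."""
    if now is None:
        now = time.time()
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400
```

For example, `days_until(served_cert_not_after("www.example.com"))` gives the days left on the certificate clients see right now. That is the number worth alerting on, rather than the renewal job's exit status.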

What to check beyond expiry date

Expiry is only one line on the report. Also check intermediate chain completeness, supported TLS versions, cipher suites, and HSTS preload eligibility.
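Most of these checks are easiest in an external lab, but the HSTS part is just header parsing. The sketch below assumes the header rules hstspreload.org listed at the time of writing: a `max-age` of at least 31536000 seconds, plus `includeSubDomains` and `preload`. The preload list has further requirements that this does not check, such as redirecting HTTP to HTTPS. The function names are hypothetical.

```python
import http.client
from typing import List


def fetch_hsts_header(host: str, timeout: float = 5.0) -> str:
    """Return the Strict-Transport-Security value from a HEAD request to https://host/."""
    conn = http.client.HTTPSConnection(host, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        return conn.getresponse().getheader("Strict-Transport-Security", "")
    finally:
        conn.close()


def hsts_preload_problems(header: str) -> List[str]:
    """List the ways a Strict-Transport-Security value misses the preload header rules."""
    directives = {}
    for part in header.split(";"):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        # Directive names are case-insensitive; values may be quoted.
        directives[name.strip().lower()] = value.strip().strip('"')
    problems = []
    try:
        max_age = int(directives.get("max-age", ""))
    except ValueError:
        max_age = -1  # missing or malformed max-age
    if max_age < 31536000:
        problems.append("max-age below 31536000 (one year)")
    if "includesubdomains" not in directives:
        problems.append("missing includeSubDomains")
    if "preload" not in directives:
        problems.append("missing preload")
    return problems
```

An empty list means only that the header passes. The domain may still fail the preload list's other checks.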

Fixing the main certificate does not guarantee a clean page. If embeds or third-party scripts load assets over plain HTTP, browsers block them or flag the page as mixed content.
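A rough way to spot those HTTP embeds is to scan a page's HTML for subresources loaded over `http://`. This sketch uses Python's standard `html.parser`, and the function name is hypothetical. The tag list is an illustrative subset. The scan does not see `srcset`, CSS `url()` references, or requests that scripts inject at runtime. It may also flag `<link>` tags that are not subresources, such as canonical links. A browser's developer console remains the authoritative check.

```python
from html.parser import HTMLParser
from typing import List, Tuple

# Tag -> attribute that loads a subresource (illustrative, not exhaustive).
RESOURCE_ATTRS = {
    "script": "src",
    "img": "src",
    "iframe": "src",
    "link": "href",
    "audio": "src",
    "video": "src",
    "source": "src",
}


class MixedContentFinder(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.insecure: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        wanted = RESOURCE_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value and value.strip().lower().startswith("http://"):
                self.insecure.append((tag, value.strip()))


def find_mixed_content(html: str) -> List[Tuple[str, str]]:
    """Return (tag, url) pairs for subresources requested over plain HTTP."""
    finder = MixedContentFinder()
    finder.feed(html)
    finder.close()
    return finder.insecure
```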

The SSL Certificate Checker on DroidXP provides a planning checklist, OpenSSL command snippets for your hostname, and links to external test labs. We do not scan your servers from our side; you run the checks from your own machine, with clear consent.

Operational habits that work

Centralize inventory: every hostname, issuer, renewal method, owner team. Review quarterly even if alerts exist.
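The inventory can start as plain records kept in the infra repo. Below is a minimal sketch that assumes a hypothetical `CertRecord` shape: the fields listed above plus an expiry date. A hypothetical `review` pass runs each quarter. It flags entries with no owner, entries renewed manually, and entries expiring within a 90-day horizon. The horizon is an arbitrary choice.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CertRecord:
    hostname: str
    issuer: str
    renewal_method: str        # e.g. "acme-http01", "acme-dns01", "manual"
    owner_team: Optional[str]  # None means nobody has claimed it yet
    not_after: date


def review(records: List[CertRecord], today: date, horizon_days: int = 90) -> List[str]:
    """Return human-readable findings for a periodic inventory review."""
    findings = []
    for r in records:
        if not r.owner_team:
            findings.append(f"{r.hostname}: no owner team")
        if r.renewal_method == "manual":
            findings.append(f"{r.hostname}: manual renewal")
        days_left = (r.not_after - today).days
        if days_left <= horizon_days:
            findings.append(f"{r.hostname}: expires in {days_left} days")
    return findings
```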

Alert at 30 and 7 days before expiry, plus on any failed renewal job. Route the marketing microsite to PagerDuty too, since that is where executives notice first.
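The alert policy fits in a small function. In this sketch the 30-day threshold raises a warning and the 7-day threshold pages. That severity split, the message format, and the name `alerts_for` are assumptions. The function is stateless. A real scheduler would deduplicate repeated alerts between runs.

```python
from typing import List

WARN_DAYS = 30  # first reminder
PAGE_DAYS = 7   # page someone


def alerts_for(hostname: str, days_left: int, last_renewal_ok: bool) -> List[str]:
    """Return the alerts a single monitoring run should raise for one hostname."""
    alerts = []
    # A failed renewal job is worth an alert even while the current cert is still valid.
    if not last_renewal_ok:
        alerts.append(f"page: renewal job failed for {hostname}")
    if days_left < 0:
        alerts.append(f"page: {hostname} certificate has expired")
    elif days_left <= PAGE_DAYS:
        alerts.append(f"page: {hostname} expires in {days_left} days")
    elif days_left <= WARN_DAYS:
        alerts.append(f"warn: {hostname} expires in {days_left} days")
    return alerts
```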

After migrating to a CDN, confirm that the edge certificate is managed there, not only the origin certificate. Two certificates with two different expiry dates confuse incident response.
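One way to see both dates is to fetch the certificate twice. Fetch it once through the CDN hostname, then again by connecting to the origin address directly while sending the same SNI name. This is a sketch under several assumptions. `not_after_via` and `soonest` are hypothetical helpers. The origin must be reachable from where you run the check, and its certificate must chain to a publicly trusted root. A CDN-issued origin certificate that is not publicly trusted will fail verification unless you load that CA into the context.

```python
import socket
import ssl
from typing import Dict, Tuple


def not_after_via(host: str, connect_addr: str, port: int = 443, timeout: float = 5.0) -> str:
    """Fetch notAfter from the server at connect_addr while presenting SNI for host."""
    ctx = ssl.create_default_context()
    with socket.create_connection((connect_addr, port), timeout=timeout) as sock:
        # server_hostname drives both SNI and hostname verification.
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def soonest(dates: Dict[str, str]) -> Tuple[str, str]:
    """Return the (layer, notAfter) pair that expires first."""
    return min(dates.items(), key=lambda item: ssl.cert_time_to_seconds(item[1]))
```

For example, `soonest({"edge": not_after_via("www.example.com", "www.example.com"), "origin": not_after_via("www.example.com", "203.0.113.10")})` names the layer that expires first. Here `203.0.113.10` is a documentation placeholder for your origin IP.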

Incident response when it happens

Renew or reissue immediately, deploy the full chain (leaf plus intermediates), and verify with `openssl s_client` and an external scanner such as Qualys SSL Labs. Post a short status update if user impact was broad.
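Before reloading the server, one cheap sanity check is that the file you are about to deploy contains more than the leaf certificate. This is a heuristic sketch, and the names in it are hypothetical. It counts PEM certificate blocks but does not check their order or their signatures. The live `openssl s_client` verification is still required.

```python
import re
from typing import List

# Matches one PEM-encoded certificate block, non-greedily, across newlines.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----\s+.*?-----END CERTIFICATE-----", re.DOTALL
)


def pem_certificates(pem_text: str) -> List[str]:
    """Return each PEM certificate block found in the text."""
    return PEM_CERT.findall(pem_text)


def looks_like_full_chain(pem_text: str) -> bool:
    """Heuristic: a full-chain file holds the leaf plus at least one intermediate."""
    return len(pem_certificates(pem_text)) >= 2
```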

Typical root causes: a missing alias (a hostname not listed on the certificate), a neglected subdomain, or a bypassed process, such as a “quick” self-signed certificate in staging that leaked into production.

Keep the runbook in the same repo as the infrastructure code. The next outage should not turn into a treasure hunt through email for the registrar login.