The only things that continue to amaze me are the number of (mostly "enterprise") software products that simply won't get with the times (or get it wrong, like renewing the cert, but continuing to use the old one until something is manually restarted), and the countless IT departments that still don't support any kind of API for their internal domains...
You've got to remember that the reduction to a 45-day duration is industry-wide - driven by the browsers. Any CA not offering automated renewal (which in practice means ACME) is going to lose a lot of customers over the next few years.
The other CAs aren't prohibitively priced for anyone who has a business need for lots of certificates, in case Let's Encrypt disappears or goes rogue.
Not if you look at the per-cert pricing, but if you factor in the cost of "dealing with incompetent sales" and "convincing accounting to keep the contract going", they absolutely are.
> If you have a server or other device that requires automatic issuance of certificates and supports the ACME protocol, you can use our free 90-day ACME certificates on all plans.
Zerossl is integrated with Caddy by default and there’s no indication from Caddy that you would only be able to renew the cert twice before needing to cough up some money.
Yeah, no one's rewriting a bunch of software to support automating a specific, internet-facing, sometimes-reliable CA.
Yes it's ACME, a standard you say. A standard protocol with nonstop changing profile requirements at LE's whim. Who's going to keep updating the software every 3 months to keep up? When the WebPKI sneeze in a different direction and change their minds yet again. Because 45 will become 30 will become 7 and they won't stop till the lifetime is 6 hours.
"Enterprise" products are more often than not using internal PKI so it's a waste.
I would like to see the metrics on how much time and resources are wasted babysitting all this automation vs. going in and updating a certificate manually once a year and not having to worry the automation will fail in a week.
Well, I could regale you with my anecdotes on how all my 'grab a LetsEncrypt cert on Docker deploy and whenever remaining lifetime goes below 75%' services have not had a single TLS-related outage in at least a year, and how we just had a multi-day meltdown caused by a load-bearing cert that everyone forgot about expiring, but I doubt you'll be interested.
I'm not here to take away your right to set up a few more 5-year+ internal CAs (or, hey, how about an entire self-signed hierarchy? can't be that hard...), and neither is LetsEncrypt. But on the Big Bad Internet (where Google is the leading cause of More Security, and LetsEncrypt is merely the leading vendor catering to that), things have moved on slightly.
Also, everything is using https now. Living in a low-income country, certificates were too expensive to use them where they weren't absolutely required, but not anymore. This is orthogonal to automation, I'm just pointing out that LE is not as demonic as you make it out to be.
I'm afraid enterprise users are on their own, probably approximately no-one else is interested in going back to the old ways of doing it. (Maybe embedded.)
The best solution I’ve found so far was to implement a custom cert manager using the formidable acmez library.
I have multiple systems. Legacy that I've inherited, and modern that are able to automatically update their certs.
The automated updates require almost zero maintenance, there was a week a couple of years ago when I had to switch some configuration for a new root cert.
The legacy system requires manual installation and multiple weeks of my time every year because of the number of environments, and since generation of the certs requires me to file a request for someone to manually create it, they invariably typo something and it has to be redone everywhere.
So multiple engineers, over multiple weeks?
Manual process at a guess is £50k pa, while the automated is close to an annual £0?
After all, the logical approach is to schedule a daily cron job for your renewal process, which only contacts the server when the current cert has less than X days of validity remaining. Scheduling a one-off cron job every, say, 60 days would be extremely error-prone!
With regards to future reductions: the point is to force companies to automate renewal, as they have proven to be unable to renew in time during incidents due to dozens of layers of bureaucracy. 45 days means one renewal a month, which is probably sufficient for that. Anything below 7 days isn't likely to happen, because at that point the industry considers certification revocation unnecessary.
Internal PKI is not relevant here. Anyone with weird internal cert use shouldn't be using public certs anyways, so moving those people to certs backed by self-signed internal CAs is a win. They are free to use whatever validity they want with those.
It really doesn't change that often. And whether this is a "breaking" change is something that's debatable (you shouldn't hard-code the cert lifetime, but I suspect many programs do, so de-facto it may or may not be a breaking change).
If you look at the Go implementation for example (https://github.com/golang/crypto/commits/master/acme), then there haven't been any major changes since it was first written in 2018: just bugfixes and basic maintenance tasks. Maybe I missed a commit but it certainly doesn't require 3-month update cycles. You can probably take that 2018 code and it'll work fine today. And since it doesn't hard-code the 90 days it will continue to work in 2028 when this change is made. So it's more "you need to keep updating the software every decade to keep up".
> We expect DNS-PERSIST-01 to be available in 2026
Very exciting!
https://datatracker.ietf.org/doc/html/draft-sheurich-acme-dn...
I really would have felt better with a random token that was tied to the account, rather than the account number itself. The CA side can of course decide to implement it either way , but all examples are about the account ID.
There are pros and cons of various approaches.
> Up to 50 certificates can be issued per registered domain (or IPv4 address, or IPv6 /64 range) every 7 days. This is a global limit, and all new order requests, regardless of which account submits them, count towards this limit.
This is hard to deal with when you have a large number of subdomains and you'd rather (as per their recommendations) not issue SAN certificates with multiple subdomains on them.
We are working on further improvements to our rate limits, including adding more automation to how we adjust them. We're not ready to announce that yet.
We wanted to get this post out as soon as we'd decided on a timeline so everyone's on the same page here.
[1] https://letsencrypt.org/docs/rate-limits/#new-certificates-p...
- Have one node with its own ACME account. It controls key generation and certificate renewal, and then the new key+cert is copied to all nodes that need it. Some people don't like this solution because it means you're copying private keys around your infrastructure.
- Have one node with its own ACME account. The other nodes generate their own TLS keys, then provide a CSR to the central node and ask it to do the ACME renewal flow on their behalf. This means you're never copying keys around, but it means that central node needs to (essentially) be an ACME server of its own, which is a more complex process to run.
- Have one ACME account, but copy its account key to every node. Have each node be in charge of its own renewal, all using that shared account. This again requires copying private keys around (though this time its the ACME key and not the TLS key).
- Give every node its own ACME account, and have each node be in charge of its own renewal.
The last solution is arguably the easiest. None of the nodes have to care about any of the others. However, it might run into rate limits; for example, LE limits the number of new account registrations per IPv6 range, so if you spin up a bunch of nodes all at once, some of them might fail to register their new accounts. And if your organization is large enough, it might run into some of LE's other rate limits, like the raw certificates-per-domain limit. Any of the above solutions would run into that rate limit at the same time, but rate limit overrides are most easily granted on a per-account basis, so having all the nodes share one account is useful in that regard.
Another factor in the decision-making process is what challenge you're using. If you're using a DNS-based challenge, then any of these solutions work equally well (though you may prefer to use one of the centralized solutions so that your DNS API keys don't have to live on every individual node). If you're using an HTTP-based challenge, you might be required to use a centralized solution, if you can't control which of your frontends receives the HTTP request for the challenge token.
Anyway, all of that is a long-winded way to say "there's no particularly wrong or right answer". What you're doing right now makes sense for your scale, IMO.
Don't pin to certs you don't control.
The real modern way to do certificate pinning is to not do certificate pinning at all, but I'm sure that you've already heard this countless times before. An alternative option would be to run your own private CA, generate a new public/private keypair every 45 days, and generate certificates with that public key using both your private CA and Let's Encrypt, and then pin your private CA instead of the leaf certificates.
It's allowed, but the intent of short cert expiration is to also have short private key lifetimes, so that point in time key compromises have time limited forgery potential. If the same key is used for a long period, someone who got the key once can download your public certificate and use that with the key they have.
> The real modern way to do certificate pinning is to not do certificate pinning at all, but I'm sure that you've already heard this countless times before. An alternative option would be to run your own private CA, generate a new public/private keypair every 45 days, and generate certificates with that public key using both your private CA and Let's Encrypt, and then pin your private CA instead of the leaf certificates.
The tricky thing here is you need to serve your private CA signed cert to pinned clients and a PKI cert to browser clients. If you want cert pinning for browsers (which afaik, is very limited availability), you should pick at least two CAs that you are pretty sure will continue to issue you certs and won't be delisted; bonus if you're also confident they won't issue certs for your domains without your explicit consent.
I would also recommend two private CAs, if you're doing private CAs. Store them physically separate, so if one is compromised or becomes unavailable, you can use the other.
On the other hand, if you're using an app specific server, there's no need for you to use public certificates. A self-generated one with a five or ten year validity will pin just as nicely. That breaks if you need web browsers or third parties to talk to the same API, of course.
- Make sure your app checks that enough trusted embedded Signed Certificate Timestamps are present in the certificate (web browsers and the iOS and Android frameworks already do this by default).
- Disallow your app to trust certificates that are more recently requested than N hours. This might be hard to do.
- Set up monitoring to the certificate transparency logs to verify that no bad actor has obtained a certificate (and make sure you are always able to revoke them within N hours).
- Make sure you always have fresh keys with certificates in cold storage older than N hours, because you can't immediately use newly obtained certificates
But then they set some shorter lifetime, and I was forced to set up automation and now I've gotten a way of doing it and it's pretty easy to do. So now I either `cloudflared` or make sure certbot is doing its thing.
Perhaps if they'd made that more inconvenient I would have started using Caddy or Traefik instead of my good old trusty nginx knowledge.
So you can start renewing with 30d of lifetime remaining. You probably want to retry once or twice before alerting. So lets say 28d between alert and expiry.
That seems somewhat reasonable. But is basically the lower margin of what I consider so. I feel like I should be able to walk away from a system for a month with no urgent maintenance needed. 28d is really cutting it close. I think the previous 60d was generous but that is probably a good thing.
I really hope they don't try to make it shorter than this. Because I really don't want to worry about certificate expiry during a vacation.
Alternatively they could make the acceptable behaviour much higher. For example make 32d certificates but it is acceptable to start renewing them after 24h. Because I don't really care how often my automation renews them. What matters is the time frame between being alerted due to renewal failure and expiry.
You might want to consider force-renewing all your certs a few days before your vacation. Then you can go away for over 40 days. (Unless something else breaks…)
You could also run a simple program that checks each site and tells you the remaining lifetime of the cert used, to verify that you didn’t miss any cert.
It all depends on the scale of your operations, of course.
Hopefully their software stack will be able to automate this by 2028.
“One quantitative benefit is that the maximum lifetime of certificates sets a bound on the size of certificate revocation lists. John Schanck has done heroic work on CRLite at Mozilla to compress CRLs, and the reduction from 398 days to 47 days further shrinks them by a factor of more than 8. For Let’s Encrypt the current limit is 90, so a more modest but still useful factor of 2.”
https://lobste.rs/s/r2bamx/decreasing_certificate_lifetimes_...
Until web browsers started to believe that no, that was too much of a convenience, so now long expiration certs became rejected. What's the proposed solution from the "industry"? to run a whole automation pipeline just to update a file in each example folder every few months? bonkers. These should be static examples, no reason to having to update those any earlier than every few years, at most.
One of the biggest issues is handling renewals at scale, and I hate it. Another increasingly frusturation is challenges via DNS are not quick.
It is already getting dangerously close to the duration of holiday freeze windows, compliance/audit enforced windows, etc.
Not to mention the undue bloat of CT logs.
How do those affect automated processes though? If the automation were to fail somehow during a freeze window, then surely that would be a case of fixing a system and thus not covered by the freeze window.
> Not to mention the undue bloat of CT logs.
I'm not sure what you mean by "CT logs", but I assume it's something to do with the certificate renewal automation. I can't see that you'd be creating GBs of logs that would be difficult to handle. Even a home-based selfhosted system would easily cope with certificate logs from running it hourly.
Reducing the lifetime of certificates increases the number of certificates that have to be issued, and therefore the number of certs that are logged to CT. This increases the cost to CT operators, which is unfortunate since the set of operators is currently very small.
However, a number of recent improvements (like static-ct-api and the upcoming Merkle Tree Certs) are making great strides in reducing the cost of operating a CT log, so we think that the ecosystem will be able to keep up with reductions in cert lifetime.
45 days ? So long ? Who needs so long living certificates ? A couple of miliseconds shall be enough. /s
I dont follow. Why? Why not an hour? A ssl failure is a very effective way to shut down a site.
"you should verify that your automation is compatible with certificates that have shorter validity periods.
To ensure your ACME client renews on time, we recommend using ACME Renewal Information (ARI). ARI is a feature we’ve introduced to help clients know when they need to renew their certificates. Consult your ACME client’s documentation on how to enable ARI, as it differs from client to client. If you are a client developer, check out this integration guide."
Oh that sounds wonderful. So every small site that took the LE bait needs expensive help to stay online.
Do they track and publish the sites they take down?
To your actual content, unless you did something weird and special snowflake like, everything will just keep working with this.
>So every small site that took the LE bait needs expensive help to stay online.
It's all automated. They don't need help to stay online.
2) Clock skew.
I agree with the terminology "bait", because the defaults advocated by letsencrypt are horrible. Look at this guide [0].
They strongly push you towards the HTTP-01 challenge which is the one that requires the most amount of infrastructure (http webserver + certbot) and is the hardest to setup. The best challenge type in that list is TLS-ALPN-01 which they dissuade you from! "This challenge is not suitable for most people."
And yet when you look at the ACME Client for JVM frameworks like Micronaut [1], the default is TLS and its the simplest to set up (no DNS access or external webserver). Crazy.
[0] https://letsencrypt.org/docs/challenge-types/
[1] https://micronaut-projects.github.io/micronaut-acme/5.5.0/gu...
You’re completely misinterpreting the linked document. See what it says at the start:
> Most of the time, this validation is handled automatically by your ACME client, but if you need to make some more complex configuration decisions, it’s useful to know more about them. If you’re unsure, go with your client’s defaults or with HTTP-01.
This is absolutely the correct advice. For Micronaut, this will guide you to using TLS-ALPN-01, which is better than HTTP-01 if the software supports it. But for a user who doesn’t know what’s what, HTTP-01 is both the easiest and the most reliable, because, as they say, “It works with off-the-shelf web servers.” Typical web servers which don’t know about ACME themselves can be told “serve the contents of such-and-such a directory at /.well-known/acme-challenge/” which is enough to facilitate HTTP-01 through another client; but they don’t give you the TLS handshake control required to facilitate TLS-ALPN-01.
The goal is to move to short lived certs to make the fragile system of revocation lists and public certificate logs unnecessary.
This is a living organism with moving parts and a time limit - you update nginx with a change that breaks .well-known by accident, or upgrade to a new version of Ubuntu and suddenly some dependency isn't loading correctly, or that UUID generator you depended on to generate the name for the challenge doesn't get loaded, or certbot becomes obsolete because of some API change and you can't upgrade to the latest because the OS is older and you installed it from the package manager.
You eventually see it in your exception monitoring or when an ssl monitor detects the cert is about to expire. Then you have to drop that other urgent thing you needed to get done, come in and debug it, fix it, and re-issue all the certs at the rate limit allowed. That's assuming you have that monitoring - most sites probably don't.
If you detect that issue with 1/3 of the cert left, you will now have 15 days to figure that out instead of 30. If you can't finish it in time, or you don't learn about it in time, the site(s) hard fail on every web browser that visits and you've effectively got a full site outage until you repair it.
So you discover it's because of certbot not working with a new API change, and you can't upgrade with the package manager. Now you need to figure out how to compile it from source, but it doesn't like the python that is currently installed and now you need to install that from source, but that version of python breaks your python web app so you have to figure out how to migrate your app to that version of python before you can do that, and the programmer that can do that is on a week long whitewater rafting trip in Idaho.
Aside from all that, what happens if a hacker manages to wreck the let's encrypt infra so badly they need 2 weeks to get it back online? The internet archive was offline for weeks after a ddos attack. The cloudflare outage took one site of mine down for less than 10 minutes, it's not hard to imagine a much worse outage for the web here.
Sometimes these kind of decisions seem to come from bodies that think the Internet exists solely for doing the thing they do.
Happens to me with the QA people at our org. They behave as if anything happens just for the purpose of having them measure it, creating a Heisenberg situation where their incessant narrow-minded meddling makes actually doing anything nearly imposible.
Consider the inevitable need for immediate renewal due to an incident. Would you rather have this renewal happen via a fast, automated and well-tested process, or a silently broken slow and manual one?
You knew exactly when it was going to fail and you could put it on your calendar to schedule the work, which consisted of an email validation process and running a command to issue the certificate request from your generated key.
The only moving part was the issued certificate, which you copied and pasted over and reloaded the server. There are a lot less things to go wrong on this process, which at one point I could do once every two years, than in a really complicated automated background task that has to happen within 15 days.
I love short duration automated free certs, but I think we really need to have a conversation about how short we can make them before we make it so humans no longer have the time required to fix problems anymore.
There are other CAs that offer certs via ACME. For example, Google Trust Services.