Many Let's Encrypt renewals had errors today
112 points
1 hour ago
| 10 comments
| letsencrypt.status.io
| HN
jaas
54 minutes ago
[-]
Let's Encrypt has been working normally for most of the day. There was a ~90 minute period during which some of our users would have received a higher error rate due to upstream networking issues, but the majority of requests were successful even during that period.

It seems our status.io notes are being misinterpreted as much more severe than they were intended to reflect.

Edit: Note that this was written in response to a previous submission title implying that Let's Encrypt was entirely down most of the day.

reply
widdakay
37 minutes ago
[-]
I'm not sure if your higher error rate is sticky per user or something, but I've tried 10+ times throughout the day and have had 0 successes. They all come back as internal server error. That's why I eventually posted.
reply
jaas
29 minutes ago
[-]
It would not have been sticky for the entire day. If it was sticky at all, it would have been only during the 90 minute period I referenced. It's most likely that there is some other issue with how you're requesting the cert. Folks can help debug at: https://community.letsencrypt.org/
reply
widdakay
28 minutes ago
[-]
I ran the exact same command now and it's working, so it is possible I was unlucky and was hitting all the worst possible cases.
reply
sgt
25 minutes ago
[-]
Could it be that he was simply throttled while retrying? That seems plausible, and it would make it seem like a long outage.
reply
widdakay
24 minutes ago
[-]
I updated the post title to say (Fixed) now.
reply
jaas
21 minutes ago
[-]
Since Let's Encrypt wasn't down most of the day, it would be helpful if you could update the title to reflect that.
reply
widdakay
13 minutes ago
[-]
I updated the title. Let me know if you think it's more accurate. It did appear as down for me though.
reply
jaas
12 minutes ago
[-]
Yeah, thanks
reply
widdakay
1 minute ago
[-]
I did not intend this to hit the top of the front page lol. I just posted it and then came back 15 minutes later to it having exploded.
reply
dlcarrier
1 hour ago
[-]
That explains why one of my IoT vendors is using an expired certificate.

I wish Firefox would just give a mild warning for a recently expired certificate, instead of treating it the same as a true man-in-the-middle attack. It's not like someone who couldn't factor the private key in 200 days could in 201 days or even 300 days.

I'm convinced that we'd have better security if we didn't have so much security theater. You'd think TLS is useless, from the warning my phone gives if I connect to a public Wi-Fi AP, but then again there's nothing in TLS (or WPA) that prevents it from being used in a way that is completely useless: https://www.youtube.com/watch?v=M1si1y5lvkk

reply
jaas
53 minutes ago
[-]
> That explains why one of my IoT vendors is using an expired certificate.

I don't think so. There was a dip in success rates for 90 minutes today, but nobody should be renewing their certificate within 90 minutes of expiration. If you're at that point, something went wrong weeks ago.

reply
mannyv
22 minutes ago
[-]
"nobody should be renewing their certificate within 90 minutes of expiration"

You obviously haven't worked with hardware guys.

"I mean, what's the point of those last 30 days if you need to renew it 30 days before expiration? Why not just renew it before it expires? If I'm required to renew it 30 days before the expiration date then the expiration date is a lie, isn't it?"

reply
ozim
14 minutes ago
[-]
If they add a 7-day grace period, then the expiration date will be a lie, and of course everyone will use the grace period as if it were the normal thing ;)
reply
LtWorf
31 minutes ago
[-]
> weeks ago

How long do you think a certificate lives?

reply
jaas
24 minutes ago
[-]
Mostly 90 days, and we recommend renewing at 60 days for 90 day certs. That gives more than four weeks of leeway.

If you're one of the few early adopters of short-lived (6-day) certs you should renew at 3 days, giving you 3 days for a successful renewal. A 90 minute outage, even if it was a full outage, would not interfere with a successful renewal.
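
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the numbers quoted in this comment (90-day certs renewed at 60 days, 6-day certs renewed at 3 days, a 90 minute disruption); the function names `leeway` and `survives_outage` are made up for the example and are not part of any ACME client.

```python
from datetime import timedelta


def leeway(lifetime: timedelta, renew_after: timedelta) -> timedelta:
    """Time left between the first renewal attempt and expiry."""
    if renew_after >= lifetime:
        raise ValueError("renewal must be scheduled before expiry")
    return lifetime - renew_after


def survives_outage(lifetime: timedelta, renew_after: timedelta,
                    outage: timedelta) -> bool:
    """True if an issuer outage starting at the first renewal attempt
    still ends before the certificate expires."""
    return outage < leeway(lifetime, renew_after)


if __name__ == "__main__":
    outage = timedelta(minutes=90)
    schedules = [
        ("90-day cert, renew at day 60", timedelta(days=90), timedelta(days=60)),
        ("6-day cert, renew at day 3", timedelta(days=6), timedelta(days=3)),
    ]
    for label, lifetime, renew_after in schedules:
        print(label, "leeway:", leeway(lifetime, renew_after),
              "survives 90 min outage:",
              survives_outage(lifetime, renew_after, outage))
```

Under these assumptions, even the worst case (the outage begins exactly when renewal is first attempted) leaves days of slack for both schedules.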

reply
bebop
18 minutes ago
[-]
90 days, moving to 45, but you can and should renew earlier than that. Automating this process means that you should be requesting a new certificate roughly 60 days (or 30 soon) after the issuance of the previous certificate. That way you have plenty of time to deal with renewal issues. The renewal process should have backoff and retries built in. This prevents a situation where downtime at the issuer means that your production environments are non-functional.
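
A rough sketch of what "back off and retries" could look like, in Python. Everything here is illustrative: `try_renew` stands in for whatever your ACME client does (it is a hypothetical callable returning True on success), and the base delay, cap, and attempt count are arbitrary assumptions rather than recommended values.

```python
import random
from datetime import timedelta


def backoff_delays(base, cap, attempts, jitter=0.0, rng=None):
    """Exponential backoff: base, 2*base, 4*base, ... capped at `cap`.

    `jitter` (0.0 to 1.0) randomly scales each delay by up to that
    fraction so many clients do not retry in lockstep.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        delay = min(cap, base * (2 ** attempt))
        if jitter:
            delay = delay * rng.uniform(1 - jitter, 1 + jitter)
        delays.append(delay)
    return delays


def renew_with_retries(try_renew, delays, sleep):
    """Call try_renew() until it succeeds, sleeping between failures.

    `try_renew` is a hypothetical callable wrapping your ACME client;
    `sleep` is injected (e.g. time.sleep) so the loop is testable.
    """
    for delay in delays:
        if try_renew():
            return True
        sleep(delay.total_seconds())
    # One final attempt after the last delay.
    return try_renew()
```

With a 5 minute base and a 6 hour cap, a 90 minute disruption is covered by the first handful of retries, and the remaining weeks of leeway are untouched.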
reply
Biganon
12 minutes ago
[-]
They work at letsencrypt, I'm pretty sure they know.
reply
dingaling
42 minutes ago
[-]
> I wish Firefox would just give a mild warning for a recently expired certificate

Nope, if the SSL industry continues to insist on increasingly short cert lifetimes then I want Firefox to give no quarter when a cert expires.

Play by their rules and fall by their rules too.

reply
mannyv
19 minutes ago
[-]
Certificate expiry is less severe than an untrusted issuer or a host mismatch.

The former is most likely an administrative error (i.e., someone forgot to renew, or the auto-renew is failing). The latter is more likely to be a MITM attack.

I'm not sure how you would use an expired cert as an attack vector. By loading an old cert onto an expired domain so you could spoof older content?

reply
tgsovlerkhgsel
6 minutes ago
[-]
Revocation information may not be available for expired certificates. Not that it matters much because the last time I checked revocation didn't really work for non-expired certificates either, but I think that (+ the risk of people treating expired certificates as worthless and thus increasing the risk of exposure) is the main reason.

Also of course domains changing owners, but again... I don't think we have good monitoring for that during the current long lifetime, so maybe a grace period where a warning is shown but it's easier to click through would be a good idea. Perhaps combined with a requirement to keep revocation information (and keep revoking expired certificates) X days past expiry.

reply
mcpherrinm
10 minutes ago
[-]
If a key is breached, the certificate can be revoked, but that revocation goes away once the certificate is expired.

Expiry is a pretty fundamental part of the security model of certificates.

reply
MobiusHorizons
24 minutes ago
[-]
How does that help? Seems like mostly the end user suffers.
reply
bruce511
31 minutes ago
[-]
But it's only the extreme warning that alerts the website (usually via a customer complaining) that the cert hasn't been renewed. Having the lesser warning just kicks the can down the road.

The IoT should have updated the certs weeks in advance. If they haven't done it by day 0 then their process is broken and delaying the scary warning to say day +5 won't solve anything.
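
One way to avoid relying on a customer complaint is to watch the time remaining yourself. A minimal Python sketch, assuming you already have the certificate's `notAfter` string in the format that `ssl.SSLSocket.getpeercert()` reports; the 14-day warning threshold and the function names are arbitrary choices for illustration.

```python
import ssl
from datetime import datetime, timedelta, timezone


def time_remaining(not_after: str, now: datetime) -> timedelta:
    """Time until expiry, given a notAfter string such as
    'Jun 15 12:00:00 2030 GMT' and a timezone-aware `now`."""
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return expires - now


def alert_level(remaining: timedelta,
                warn: timedelta = timedelta(days=14)) -> str:
    """Classify remaining lifetime; the threshold is an assumption."""
    if remaining <= timedelta(0):
        return "expired"
    if remaining < warn:
        return "warn"
    return "ok"
```

If a check like this runs daily and starts warning two weeks out, a renewal process that has been silently failing is noticed long before day 0.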

reply
tgsovlerkhgsel
4 minutes ago
[-]
A warning with a clear clickthrough button would work for alerting - the default TLS warnings are designed to be somewhat hard to bypass to make people think twice.
reply
fragmede
40 minutes ago
[-]
omg new tom7!
reply
Kesseki
1 hour ago
[-]
To be clear, “Degraded Performance” means just that, not “down.” Let’s Encrypt’s issuance is mostly working fine.
reply
saagarjha
1 hour ago
[-]
I see you are unfamiliar with status page-ese. “Degraded performance” is a term which means some form of “the entire datacenter is probably on fire”.
reply
Kesseki
1 hour ago
[-]
Although I only post here personally, I work for Let’s Encrypt.
reply
ofrzeta
35 minutes ago
[-]
It would be better to say this upfront. I am not blaming you in any way but this would prevent responses such as the parent's (hopefully).
reply
number6
1 hour ago
[-]
Thank you for your work!
reply
dlcarrier
1 hour ago
[-]
Let them know that they're having an outage. If their monitors aren't telling them so, they might need to host them off-site.
reply
Kesseki
1 hour ago
[-]
Let's Encrypt is operating normally. If you're having trouble, please post the details on the community forum so that folks can help you out. There is external monitoring in place.
reply
AceJohnny2
45 minutes ago
[-]
A common confusion; this interpretation only applies to OVH.

ref: https://www.reuters.com/article/world/millions-of-websites-o...

reply
xarope
30 minutes ago
[-]
That would be Microsoft-ese: "Some regions are encountering issues" => "The entire world is down, but our status page is working"
reply
AceJohnny2
59 minutes ago
[-]
I thought it meant "electricity has ceased to be a physical phenomenon in the general vicinity of our servers"
reply
widdakay
1 hour ago
[-]
I have tried many times to renew my certs and have had 0 successes throughout today. It seems to be 100% degraded to me.
reply
Kesseki
1 hour ago
[-]
That’s unexpected. Please post details on the “Help” topic of the Let’s Encrypt community forum so that folks can take a look.
reply
gib444
1 hour ago
[-]
What % of requests succeeded vs failed? How many certificates were issued during the outage vs the average? That might actually clear things up
reply
saagarjha
1 hour ago
[-]
Seems not ideal for an entity who seems to be pushing for shorter expiration periods all the time
reply
xp84
59 minutes ago
[-]
I think it’s mostly Apple and maybe Google who have the hard-ons for the shortest expiries possible.
reply
fragmede
30 minutes ago
[-]
To be fair, if someone managed to steal a set of keys to Gmail.com and icloud.com, I would want them to expire in as short a time as possible too.
reply
spragl
3 minutes ago
[-]
That is right, but one thing is not like the other. You have always been free to set expiry low on your own certificates, but that is not the same as enforcing it on everyone's certificates.
reply
notrealyme123
28 minutes ago
[-]
I think revoking them would be better in such a case.
reply
Dylan16807
56 minutes ago
[-]
If it goes past 24 hours, that becomes a real worry.

If anyone is renewing certificates with less than a day remaining, that's an issue on their end far more than anything else.

reply
tonyhart7
1 hour ago
[-]
Isn't it the other way around? Shorter expiration times result in more certificates being issued, which makes the service more prone to downtime.
reply
pibaker
1 hour ago
[-]
What are the viable alternatives to LE? And in case none exists, what does it take to build one?

Requirements: free, available to everyone, automation friendly, issues certificates that are actually considered trustworthy by other parties.

reply
treesknees
1 hour ago
[-]
ZeroSSL – free 90-day certs via ACME, also has a web UI for cert management

Google Trust Services – free ACME certs, requires a Google account for registration

SSL.com Free DV SSL – offers free 90-day certs through ACME
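
Since these all speak ACME, a client can in principle fall back from one CA to another. A hedged Python sketch: the directory URLs below are what I believe the public ACME endpoints to be at the time of writing, so verify them against each CA's documentation (SSL.com's endpoint is left out for that reason). `issue` is a hypothetical callable wrapping whatever ACME client you use, and some of these CAs may require account credentials (external account binding) before they will issue.

```python
from typing import Callable, Sequence, Tuple

# Assumed ACME directory URLs; confirm against each CA's own docs.
ACME_DIRECTORIES: Sequence[Tuple[str, str]] = (
    ("Let's Encrypt", "https://acme-v02.api.letsencrypt.org/directory"),
    ("ZeroSSL", "https://acme.zerossl.com/v2/DV90"),
    ("Google Trust Services", "https://dv.acme-v02.api.pki.goog/directory"),
)


def issue_with_fallback(
    domain: str,
    issue: Callable[[str, str], str],
    directories: Sequence[Tuple[str, str]] = ACME_DIRECTORIES,
) -> Tuple[str, str]:
    """Try each CA in order; return (ca_name, certificate) from the
    first one that succeeds.

    `issue(directory_url, domain)` is a hypothetical wrapper around
    your ACME client that returns a certificate or raises on failure.
    """
    errors = []
    for name, url in directories:
        try:
            return name, issue(url, domain)
        except Exception as exc:  # record the failure, try the next CA
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all ACME CAs failed: " + "; ".join(errors))
```

Some ACME clients and servers support configuring more than one issuer, so check your client's documentation before writing this yourself.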

reply
polpo
53 minutes ago
[-]
I use acme.sh for certs on my personal server and was a little surprised when it started using ZeroSSL by default. Despite being more "corporate" I decided to roll with it and it's worked just fine.
reply
JumpCrisscross
25 minutes ago
[-]
Have the EU or Canada pushed to launch an analog of their own?

It seems a bit silly that a service that could be forced by EO to revoke foreign certificates is the backbone of so much of the internet.

reply
dlcarrier
56 minutes ago
[-]
This video explores a little on how certificate authorities were given their authority and a lot on how it can fail: https://www.youtube.com/watch?v=M1si1y5lvkk

It's a bit mathy, but if you can make it through that, I highly recommend watching the whole video, especially if you like dad jokes.

reply
evbogue
1 hour ago
[-]
Like peers could sign sites?
reply
otabdeveloper4
1 hour ago
[-]
> What are the viable alternatives to LE?

None. Big tech intentionally made Let's Encrypt a single point of giant failure.

> And in case none exists, what does it take to build one?

A new Internet and Web standards stack. The whole problem is self-imposed -- we could have published self-signed Ed25519 keys on the DNS instead, and the result would be more secure than whatever it is we have now.

reply
ardeaver
1 hour ago
[-]
I realize this is very much not the point, but the fact that the "Active Incident" banner is green is upsetting.
reply
Kesseki
1 hour ago
[-]
The banner's colour is based on the "Incident Status;" it's green because services are currently operational. It would be yellow or red if the impact were more severe.
reply
dlcarrier
1 hour ago
[-]
Their monitors don't seem to be detecting the outage. Sometimes they run directly on the server, and aren't able to detect routing or DNS problems.
reply
NewJazz
1 hour ago
[-]
We're operating normally, but with reduced redundancy. We continue to work with our upstream ISP to identify and resolve the issue.
reply
nubinetwork
1 hour ago
[-]
It's a good thing that acme clients try to renew early, rather than leaving it to the last minute...
reply
drsalt
1 hour ago
[-]
That's too bad.
reply
hermeticlock
1 hour ago
[-]
:(
reply
tomalbrc
1 hour ago
[-]
The amount of misinformation on this site is astonishing. "Hacker News"..
reply
bruce511
20 minutes ago
[-]
You are getting down-voted for this, which I think is a bit unfair. (I expect I'll get the same.)

Although you don't expand your thesis, as a general feeling, I agree. But, to be fair, it has always been thus, and it has been this way in every forum ever.

I'm old enough to remember the irony in "I read about it on the internet so it must be true" statements, which have existed since the internet was News (NNTP) not web.

In truth, any time you get a random group of people together, of different ages and backgrounds, all of whom self-describe as "smart" you're going to get a lot of chaff mixed in with the wheat.

To some extent you need to simply ignore the nonsense. There's plenty of it and "correcting people who are wrong" is seldom received well.

reply