I'm also rather sceptical of things that "sanitise" HTML, both because there's a long history of them having holes, and because it's not immediately clear what that means, and what exactly is considered "safe".
[1] https://developer.mozilla.org/en-US/docs/Web/API/Element/set...
Good idea to ship that one first, since it's easier to implement and is going to be the unsafe fallback going forward.
The mythical refactor where all deprecated code is replaced with modern code. I'm not sure it has ever happened.
I don't have an alternative of course, adding new methods while keeping the old ones is the only way to edit an append-only standard like the web.
(Assuming transpilers have stopped outputting it, which I'm not confident about.)
For example, esbuild will emit var when targeting ESM, for performance and minification reasons. Because ESM has its own inherent scope barrier, this is fine, but it won't apply the same optimizations when targeting (e.g.) IIFE, because it's not fine in that context.
But I can see what you mean, even if it would still be better for it to print the code that does what you want (which uses a few Wh) than to do the actual transformation itself (which is prone to mistakes and injection attacks, and uses however many tokens your input data is).
Maybe the last 10 years saw so much more modern code than the cumulative 40+ years before that modern code is statistically more likely to be output? Or maybe they assign higher weights to more recent commits/sources during training? Not sure, but it seems to be good at picking this up. And you can always feed the info into its context window until then.
>> Maybe the last 10 years saw so much more modern code than the last cumulative 40+ years of coding and so modern code is statistically more likely to be output?
The rate of change has made defining "modern" even more difficult and the timeframe brief, plus all that new code is based on old code, so it's more like a leaning tower than some sort of solid foundation.
Don't even try to allow inline <svg> from untrusted sources! (and then you still must sanitise any svg files you host)
But I agree, my default approach has usually been to only use innerText if it has untrusted content:
So if their demo is this:

container.setHTML(`<h1>Hello, ${name}</h1>`);

Mine would be:

let greetingHeader = document.createElement("h1");
greetingHeader.innerText = `Hello, ${name}`;

Content-Security-Policy: require-trusted-types-for 'script'
…then it blocks you from passing regular strings to the methods that don't sanitize.
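A minimal sketch of how that CSP directive plugs into code (the policy name here is arbitrary, and DOMPurify is just an illustrative choice of sanitizer):

```js
// With `require-trusted-types-for 'script'` in effect, assigning a plain
// string to innerHTML throws a TypeError; only TrustedHTML objects pass.
const policy = trustedTypes.createPolicy("sanitize", {
  createHTML: (input) => DOMPurify.sanitize(input),
});

container.innerHTML = policy.createHTML(untrustedInput); // allowed
// container.innerHTML = untrustedInput;                 // blocked by the CSP
```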
Preventing one bug class (script execution) is good, but this still allows arbitrary markup into the page (even <style> CSS rules), if I'm reading the docs correctly. You could give PayPal a fresh look for anyone who opens your profile page, if they used this. Who would ever want this?
If you mean to convey that it's possible to configure it to filter properly, let me introduce you to `textContent`, which is older than Firefox (I'm struggling to find a date, it's so old).
How would I set a header level using textContent?
const heading = document.createElement("h1");
heading.textContent = `Hello, ${username}!`;
container.append(heading);
If you allow <h1> in the setHTML configuration or use the default, users with the tag in their username also always get it rendered as markup.

.setHTML("<h1>Hello</h1>", new Sanitizer({}))

will strip all elements out. That's not too difficult.

Plus this is defense-in-depth. Backends will still need to sanitize usernames to some standard anyhow (there aren't a lot of systems out there that should take arbitrary Unicode input as usernames), and backends SHOULD (in the RFC sense [1]) still HTML-escape anything they output that they don't want to be raw HTML.
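The backend escaping step recommended above can be sketched in a few lines of JavaScript (a minimal illustration; real services should rely on their templating engine's built-in escaping or a maintained library):

```javascript
// Escape the five characters that let text break out of an HTML
// text or attribute context.
function escapeHTML(value) {
  return String(value).replace(/[&<>"']/g, (c) => ({
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
  }[c]));
}

console.log(escapeHTML("<h1>Hello</h1>")); // &lt;h1&gt;Hello&lt;/h1&gt;
```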
If that's true, seems like it's still a security risk given what you can do with CSS these days: https://news.ycombinator.com/item?id=47132102
Or I guess you could completely restyle and change the text of UI elements, so it looks like the user is doing one thing when they're actually doing something completely different, like sending you money.
The main case I can think of is wanting some forum functionality. Perhaps you want to allow your users to be able to write in markdown. This would provide an extra layer of protection as you could take the HTML generated from the markdown and further lock it down to only an allowed set of elements like `h1`. Just in case someone tried some of the markdown escape hatches that you didn't expect.
I think this might be the answer. There's no point to it by itself (either you separate data and code or you don't and let the user do anything to your page), but if you're already using a sanitiser and you can't use `textContent` because (such as with Markdown) there'll be HTML tags in the output, then this could be extra hardening. Thanks!
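That extra hardening layer might look something like this (a sketch: `markdownToHTML` stands in for whatever Markdown converter you use, and the shape of the `sanitizer` option follows the current Sanitizer API draft, which may still change):

```js
// Convert untrusted Markdown, then let the browser enforce a small
// allow-list, even if the converter was tricked into emitting
// <script>, <style>, inline event handlers, etc.
const html = markdownToHTML(userPost);
container.setHTML(html, {
  sanitizer: {
    elements: ["h1", "p", "em", "strong", "a", "ul", "ol", "li", "code", "pre"],
    attributes: ["href"],
  },
});
```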
How exactly, given that setHTML sanitizes the input? If you don't want to have any HTML tags allowed, seems you can configure that already? https://wicg.github.io/sanitizer-api/#built-in-safe-default-...
The article says that the output is:
<h1>Hello my name is</h1>
So it keeps (non-script) HTML tags (and presumably also attributes) in the input. Idk how you're asking "how" since it's the default behavior.

Stripping HTML tags completely has always been possible with the drop-in replacement `textContent`. Making a custom configuration object for that is much more roundabout.
I can see how it's a way of allowing some tags like bold and italic without needing a library or some custom parser, but I didn't understand what the point of this default could be and so why it exists (a sibling comment proposed a plausible answer: hardening on top of another solution)
> Yes, because that's the default configuration, if you don't want that, stop using the default configuration?
"don't use it if it's not what you want" is perhaps the silliest possible answer to the question "what's the use-case for this"
Maybe you meant .innerHTML? .innerText AFAIK doesn't try to parse HTML (why would it?), but I don't understand what you mean with nonstandard, both .innerHTML and .innerText are part of the standards, and I think they've been for a long time.
> but I didn't understand what the point of this default could be and so why it exists (a sibling comment proposed a plausible answer: hardening on top of another solution) [...] the question "what's the use-case for this"
I guess maybe third time could be the charm: it's for preventing XSS holes that are very common when people use .innerHTML
That information is in the question, so sadly no this still doesn't make sense to me because I don't understand any scenario in which this is what the developer wants. You always still need more code (to filter the right tags) or can just use textContent (separating data and code completely, imo the recommended solution)
> Maybe you meant .innerHTML? .innerText AFAIK doesn't try to parse HTML (why would it?)
No, I didn't mean that, yes it does, and no I don't know why it is this way. If you don't believe me and don't want to check it out for yourself, I'm not sure what more I can say to help the conversation
The default might be suitable for something like an internal blog where you want to allow people to sometimes go crazy with `<style>` tags etc, just not inject scripts, but I would expect it to almost always make sense to define a specific allowed tag and attribute list, as is usually done with the userland predecessors to this API.
Don't get me wrong, better than nothing, but also really really consider just using "setText" instead and never allow the user to add any sort of HTML to the document.
This new method they've cooked up would be called eval(code, options) if HTML were anything other than a markup language.
https://stackoverflow.com/questions/78516750/parametrize-tab...