FilterHN

Exif Smuggling (2025)

105 points

by rolph

5 days ago

| 8 comments

| github.com

▲

BoppreH

5 days ago

Oh, that's clever. It's not just hiding the payload in the Exif, it's hiding the fact that the payload came from the network at all, by reading it from the browser cache (presumably after embedding the image into a page the user visited).

So you have a package that doesn't include (directly) malicious code or make network calls, yet it can still run malicious code from the network. This is much better than simple obfuscation because you can vary the payload, like a command-and-control server.

▲

nine_k

5 days ago

More than that; the trigger code can sit passively and just check the cache for whatever payloads may come its way.

I suppose image sanitizers will come to browsers soon. Only sanitized images will be cached; anything the browser can't make sense of will be thrown away.

▲

account42

4 days ago

Exif is only the most convenient method here - you can use steganography to hide arbitrary data right in the image content itself. Sanitizing that would mean messing with how images look.
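
To illustrate the point about hiding data in the image content itself, here is a minimal sketch of least-significant-bit embedding. It assumes you already have a raw, uncompressed buffer of pixel bytes (for example from a lossless format); the function names and the 4-byte length prefix are hypothetical choices for this example, and a lossy re-encode would destroy the hidden bits.

```python
def embed_lsb(pixels: bytes, payload: bytes) -> bytes:
    """Hide payload in the least significant bit of successive pixel bytes."""
    # Assumption: a 4-byte big-endian length prefix tells the extractor where to stop.
    data = len(payload).to_bytes(4, "big") + payload
    bits = [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("payload too large for this pixel buffer")
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the payload bit.
        out[i] = (out[i] & 0xFE) | bit
    return bytes(out)


def extract_lsb(pixels: bytes) -> bytes:
    """Recover a payload written by embed_lsb."""
    def read_bytes(start_bit: int, count: int) -> bytes:
        value = bytearray()
        for b in range(count):
            byte = 0
            for k in range(8):
                byte = (byte << 1) | (pixels[start_bit + b * 8 + k] & 1)
            value.append(byte)
        return bytes(value)

    length = int.from_bytes(read_bytes(0, 4), "big")
    return read_bytes(32, length)
```

Each pixel byte changes by at most 1, which is why this is hard to see, and it is also why stripping it would require altering the pixel values. Capacity in this scheme is one payload bit per pixel byte.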

▲

8n4vidtmkvmk

5 days ago

ComfyUI embeds workflows in the EXIF data. It's very handy. Would be a little sad if they stripped that out but there are alternatives. I suppose if it's only cached images and not manually downloaded images it wouldn't be bad. It'd probably break some website somehow though.

▲

Grom_PE

5 days ago

It isn't necessary to use Exif to embed arbitrary data inside an image. One could just as well use an extra PNG chunk, a JFIF APP marker, or simply append data to the end of the file.

It would be more interesting to devise a method that survives all extra data stripping and re-encoding, perhaps taking advantage of deterministic encoders, assuming they don't randomize pixel data on purpose.

In other words: turning the image data stream itself into a polyglot.
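
As a rough sketch of the "extra PNG chunk" option mentioned above: a PNG file is a fixed signature followed by chunks, each laid out as length, type, data, and CRC-32 over type plus data. The chunk name `smUg` and all function names below are made up for illustration, and the sketch assumes the input ends with a standard empty IEND chunk and no trailing bytes.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC-32 of type+data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def minimal_png() -> bytes:
    """Build a 1x1 8-bit grayscale PNG to experiment with."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # filter byte 0 + one black pixel
    return (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"IDAT", idat) + make_chunk(b"IEND", b""))


def add_private_chunk(png: bytes, chunk_type: bytes, payload: bytes) -> bytes:
    """Insert an extra chunk just before the final 12-byte IEND chunk."""
    if not png.startswith(PNG_SIGNATURE) or png[-8:-4] != b"IEND":
        raise ValueError("expected a PNG ending in an IEND chunk")
    return png[:-12] + make_chunk(chunk_type, payload) + png[-12:]


def iter_chunks(png: bytes):
    """Yield (type, data) for every chunk in the file."""
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        yield png[pos + 4:pos + 8], png[pos + 8:pos + 8 + length]
        pos += 12 + length
```

The lowercase first letter of the chunk type marks it as ancillary, so decoders that do not recognize it are expected to skip it. A re-encode that only copies known chunks would drop it, which is the stripping problem the comment goes on to describe.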

▲

Levitating

5 days ago

Do you mean steganography?

▲

algoth1

5 days ago

Isn't this the principle behind SynthID?

▲

Grom_PE

5 days ago

Maybe if you look at it from far away enough.

Watermarking tries to resist image data manipulation. Smuggling data is concerned with preservation of bytes.

Though if we're executing arbitrary code on the target anyway, ways of embedding data in an image are vast, including watermarking/steganography.

▲

nine_k

5 days ago

Steganography has rather obvious size limits if you want the image to continue looking innocent. EXIF data is way less limited.

▲

Gigachad

5 days ago

More generally it’s called steganography.

▲

Omni5cience

5 days ago

Why is this a link to a random fork that has no commits, rather than the original?

▲

dfedbeef

5 days ago

GitHub star farming, I'm guessing?

▲

_def

5 days ago

Many, many years ago I saw someone using an image hoster which only checked the MIME type, and not the filename. That's the important bit after all, right? Uploading an image as image.php worked, and if the Exif comment contained PHP code, it ran.

▲

qingcharles

4 days ago

More than that, you need to check the file is a valid image, not just the MIME type. I remember a host that let me upload an aspx file as a jpg and it allowed me to execute it and browse their entire file system until I found the SQL Server and network administrator passwords in a text file.

The passwords were both "internet".
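
The two upload stories above suggest checking both the filename and the content rather than trusting a client-supplied MIME type. Below is a minimal sketch under stated assumptions: the allowlist, the function name, and the choice of formats are hypothetical, and a signature check is only a first filter, not proof that the file is a valid image (a full decode would be needed for that).

```python
from pathlib import PurePosixPath

# Assumption: only these extensions are accepted, each with its known file signatures.
ALLOWED_SIGNATURES = {
    ".jpg": (b"\xff\xd8\xff",),
    ".jpeg": (b"\xff\xd8\xff",),
    ".png": (b"\x89PNG\r\n\x1a\n",),
    ".gif": (b"GIF87a", b"GIF89a"),
}


def is_acceptable_upload(filename: str, content: bytes) -> bool:
    """Accept only allowlisted extensions whose content starts with a matching signature."""
    suffix = PurePosixPath(filename).suffix.lower()
    signatures = ALLOWED_SIGNATURES.get(suffix)
    if signatures is None:
        # Rejects image.php, shell.aspx, and anything else not on the list.
        return False
    return content.startswith(signatures)
```

Note that a genuine JPEG carrying PHP in a comment field still passes this check; it is harmless only as long as the server never hands `.jpg` files to an interpreter.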

▲

pwdisswordfishs

21 hours ago

For server implementations that aren't braindead, that is indeed the important bit. Computers don't inherently know how to run PHP. If the request handler doesn't look at the file extension to decide whether or not to pass the contents to the PHP interpreter (if PHP is even installed on the system), then image.php isn't going to run any PHP.

▲

motohagiography

5 days ago

Is this within the category of normal steganographic encodings and packers, or does it have the ability to execute itself? You can encode anything as anything. My reading is that it's a slightly interesting tool for fooling signature-based detection, but not something like running a weird machine in an external decoder.

▲

ale42

5 days ago

Weren't similar techniques already used years ago by malvertisers to hide malicious code in images published as ads so it wouldn't be detected? (Although it might have been more like steganography.)

▲

saghm

5 days ago

I'm not sure if this is exactly what you're referring to, but apparently years ago there were exploits bundling JAR files into GIFs to sneakily have them executed by the Java browser plugin: https://en.wikipedia.org/wiki/Polyglot_(computing)#GIFAR_att...
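
The GIFAR trick works because a JAR is a ZIP archive, ZIP readers locate the archive's directory from the end of the file, and image decoders read from the start. A small sketch of that general idea follows; the function names are hypothetical, and `GIF_STUB` is only a placeholder that begins with a GIF signature, not a claim of a fully decodable image.

```python
import io
import zipfile

# Assumption: a stand-in for real GIF bytes; only the leading signature matters here.
GIF_STUB = b"GIF89a" + bytes([1, 0, 1, 0, 0, 0, 0]) + b";"


def make_gif_zip_polyglot(gif_bytes: bytes, files: dict[str, bytes]) -> bytes:
    """Append a ZIP archive containing `files` to the end of the GIF bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, content in files.items():
            zf.writestr(name, content)
    return gif_bytes + buf.getvalue()


def read_from_polyglot(blob: bytes, name: str) -> bytes:
    """Open the combined file as a ZIP and read one member."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return zf.read(name)
```

The same bytes pass a "starts with GIF89a" check and open as an archive, which is why signature checks alone did not stop this class of attack.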

▲

mpeg

5 days ago

Back in the day I wrote a PoC exploit for my employer's app that abused an image upload API by embedding a JAR file inside an SVG as XXE, which then got me RCE. Fun times.

▲

rolph

5 days ago

If anything, I would use EXIF data to enhance stego.

Generally it's the JPEG standard that allows the payload; manipulating EXIF is how you operate the exploit.

JPEG specifies segments of up to 64 KB, and you can abuse one to hold any "data" you want, extending into further segments for more storage.

In its most primitive form, raw steganography is a comparison of two photos, one of which is pixel-shifted to encode the data.

In advanced form, the pixels hold the encrypted data, but the application segments of the JPEG hold keys and/or matrix values, and you need a reference image. You can move fairly large volumes of ASCII representation like this before it's noticed.

You basically write a webpage that locally caches the payload and keys, then abuses EXIF to build and execute an exploit on the target.
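
The 64 KB limit comes from the segment layout: a marker (0xFF 0xE0 through 0xFF 0xEF for APP0 to APP15) followed by a 2-byte big-endian length that counts itself, leaving at most 65,533 payload bytes per segment. The sketch below packs arbitrary bytes into application segments and reads them back; the function names are hypothetical, the payload is not a valid Exif structure, and inserting directly after the start-of-image marker is a simplification.

```python
import struct

SOI = b"\xff\xd8"
MAX_SEGMENT_PAYLOAD = 0xFFFF - 2  # the 2-byte length field counts itself


def make_app_segment(n: int, payload: bytes) -> bytes:
    """Build one APPn segment (n in 0..15) carrying arbitrary bytes."""
    if not 0 <= n <= 15:
        raise ValueError("APP marker index must be 0..15")
    if len(payload) > MAX_SEGMENT_PAYLOAD:
        raise ValueError("payload exceeds one segment")
    return bytes([0xFF, 0xE0 + n]) + struct.pack(">H", len(payload) + 2) + payload


def split_into_segments(n: int, payload: bytes) -> list[bytes]:
    """Spread a larger payload across several APPn segments."""
    step = MAX_SEGMENT_PAYLOAD
    return [make_app_segment(n, payload[i:i + step]) for i in range(0, len(payload), step)]


def insert_after_soi(jpeg: bytes, segment: bytes) -> bytes:
    """Place a segment directly after the start-of-image marker."""
    if not jpeg.startswith(SOI):
        raise ValueError("not a JPEG stream")
    return SOI + segment + jpeg[2:]


def iter_segments(jpeg: bytes):
    """Yield (marker_byte, payload) for header segments up to SOS or EOI."""
    pos = 2
    while pos + 2 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError("expected a marker")
        marker = jpeg[pos + 1]
        if marker in (0xDA, 0xD9):  # start of scan or end of image
            return
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        yield marker, jpeg[pos + 4:pos + 2 + length]
        pos += 2 + length
```

A sanitizer would walk the same segment list and drop whatever it does not recognize, which is what makes metadata smuggling easier to strip than data hidden in the pixels.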

▲

firefax

5 days ago

this is a variation on a common theme in steganography, but still interesting and giving something a name can be a useful contribution in itself

▲

porphyra

5 days ago

Mildly annoying how almost everything strips out EXIF data nowadays, in part due to security concerns like this, and then I can't find out what camera, lens, and settings were used to take photos.

▲

AndrewStephens

5 days ago

My static site generator strips out exif data from images and I would expect all sensible sites would do the same. There is a lot of personal information jammed in there - if you post a picture of your dog making a funny face to social media you don’t want the exact GPS coordinates of your house plastered over the internet.

You have to be selective, though: some of the EXIF data specifies things like color space and orientation that browsers use to display the image properly.

▲

dllu

5 days ago

For my personal website I have a lot of photography-oriented blog posts [1], but I have special code to strip out GPS info from the location if it's close to my home [2].

EDIT: my vibe-coding slop agent put my home GPS lat long in the example config in the README lol. Please don't rob my house; I'll go run git-filter-repo later.

[1] https://daniel.lawrence.lu/blog/2023-12-20-trip-to-europe/

[2] https://github.com/dllu/pupphoto/blob/main/gps.py#L81

▲

booi

5 days ago

Does this inadvertently reveal the location of your home? It's like a cloud of photos except in this one circle.

▲

dllu

5 days ago

Yeah, if I take a dense grid of photos near my house, it would reveal a 500 m circle. But in practice I don't take _that_ many photos in the neighborhood. Also, the circle isn't perfectly centered on my home.
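
For readers curious how a "strip GPS near home" rule might look, here is a minimal sketch using the haversine great-circle distance. It is not the code from the linked gps.py; the function names and coordinates are placeholders, the 500 m radius is taken from the comment above, and the exclusion center is assumed to be deliberately offset from the real address, as described.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, a common approximation


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def should_strip_gps(photo: tuple[float, float],
                     center: tuple[float, float],
                     radius_m: float = 500.0) -> bool:
    """True if the photo was taken inside the exclusion circle around `center`."""
    return haversine_m(*photo, *center) <= radius_m
```

As noted in the replies, a dense set of geotagged photos would still outline the circle, so the offset center and a generous radius matter more than the distance formula.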

▲

mkoryak

5 days ago

I hid my toy vibe-coded site's code inside the alpha channel of its logo. https://dogself.com

I probably should have minified it too...