Codex Hacked a Samsung TV
163 points
7 hours ago
| 16 comments
| blog.calif.io
| HN
alfanick
6 hours ago
[-]
I had a truly good “hacking” session with Codex. It’s not hacking, I wasn’t breaking anything, just jumping over the fences TP-Link put up for me: owning the router, inside the network, knowing the admin password. But TP-Link really tried everything so you cannot access the router you own via API. They really tried to be smart with a very, very broken custom auth and encryption scheme. It took about half a day with Codex, but in the end I have a pretty Python API to access my router, tested, reliable, and exporting beautiful Prometheus metrics.

I’m sure there is some overeager product manager sitting in such companies, trying to split markets into consumer and enterprise segments, just by making APIs not usable by humans and adding 200% useless “security by obscurity”.

reply
qingcharles
5 minutes ago
[-]
I have one of these Smiirl flip counters. It runs a version of OpenWrt without the web UI, but has a uhttpd to serve an api. I'm hoping Mythos can help me find an exploit to get into it and enable ssh since they have now disabled the simple api switch that would let you turn it on.

https://www.smiirl.com/en/counter/facebook/5d/

reply
ropbear
6 hours ago
[-]
Many eons ago I wrote a Python version of tmpcli for this exact reason. Made some minor improvements a few years ago but haven’t touched it since. Curious what methodology Codex came up with, I haven’t revisited it since models got really good.

The idea is that tmpServer listens on localhost, but dropbear allows port forwarding with admin creds (you’ll need to specify -N). That program has full device access and is the API the Tether app primarily uses to interact with the device.

https://github.com/ropbear/tmpcli
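
The port-forwarding idea described above can be sketched roughly as follows. This is a minimal sketch, not tmpcli's actual code: `build_tunnel_command` is a hypothetical helper, and the router address, the `admin` user name and both port numbers are placeholders (the thread does not say which port tmpServer listens on).

```python
import shlex


def build_tunnel_command(router_host, remote_port, local_port, user="admin"):
    """Build an OpenSSH command that forwards a local port to a
    service bound to the router's loopback interface.

    -N tells ssh not to run a remote command (tunnel only).
    -L local:127.0.0.1:remote forwards local_port to the router's localhost.
    """
    forward = f"{local_port}:127.0.0.1:{remote_port}"
    return ["ssh", "-N", "-L", forward, f"{user}@{router_host}"]


if __name__ == "__main__":
    # Placeholder values; substitute the address and port used by your device.
    cmd = build_tunnel_command("192.168.0.1", remote_port=12345, local_port=12345)
    print(shlex.join(cmd))
```

With the tunnel up, a client on the local machine would connect to `127.0.0.1:<local_port>` as if it were running on the router itself.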

reply
alfanick
5 hours ago
[-]
Ha, kudos! I came across this project - thanks for your work :) It didn't work on the specific model I own (Archer NX600).

My solution is really just using their pseudo-JWT over their obscured APIs (with reverse-engineered names of endpoints and params). The limitation is that only one client can be authenticated at a time, so my daemon has priority and I need to stop it to actually access the admin panel.

reply
mtud
4 hours ago
[-]
We’re splitting this across two threads, but if you give Codex access to jadx and the Archer Android app you might be able to get something without that problem. The TP-Link management protocol has a few different “transport” types - tmpcli uses SSH, but your device might only support one of the other transports.
reply
ropbear
4 hours ago
[-]
Of course! Happy to contribute. As is the case with your device, there are a lot of weird TP-Link firmware variants (even an RTOS called TPOS based on VxWorks), so no guarantee it'll work all the time. Glad there's more research being done in the space!
reply
baq
4 hours ago
[-]
Would be amazing if it worked with decos, these are locked down so much you don’t even get an admin interface inside your own network.
reply
_doctor_love
49 minutes ago
[-]
If you're into it, you could always re-flash your TP-Link hardware with some open-source firmware that is more automation friendly. I used to be intimidated by it, but a friend showed me how to do it and it's remarkably simple and pain-free (provided it's a commonly supported router of course).
reply
alfanick
48 minutes ago
[-]
ofc I could, but no project supports this specific hardware (Archer NX600) - I'm very happy with my solution :)
reply
m463
36 minutes ago
[-]
I wonder what the effort would be to port openwrt to it? It might be easy if there are adjacent routers on the same chipset.
reply
0x_rs
5 hours ago
[-]
I've had good success doing something similar. Recording requests into an .har file using the web UI and providing it for analysis was a good starting point for me, orders of magnitude faster than it would be without an assistant.
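
A first pass over such a capture might look like the sketch below. It assumes only the standard HAR layout (`log.entries[].request` with `method` and `url`); the function names and the sample endpoints are made up for illustration and are not taken from any real router.

```python
import json
from collections import Counter
from urllib.parse import urlsplit


def summarize_har(har):
    """Count requests per (method, path) in a parsed HAR document."""
    counts = Counter()
    for entry in har["log"]["entries"]:
        request = entry["request"]
        path = urlsplit(request["url"]).path
        counts[(request["method"], path)] += 1
    return counts


def load_har(filename):
    """Load a .har file; utf-8-sig also tolerates a leading byte-order mark."""
    with open(filename, encoding="utf-8-sig") as handle:
        return json.load(handle)


if __name__ == "__main__":
    # Tiny in-memory HAR with invented endpoints, standing in for a real capture.
    sample = {"log": {"entries": [
        {"request": {"method": "POST", "url": "http://192.168.0.1/cgi/login?a=1"}},
        {"request": {"method": "POST", "url": "http://192.168.0.1/cgi/login?a=2"}},
        {"request": {"method": "GET", "url": "http://192.168.0.1/status"}},
    ]}}
    for (method, path), n in summarize_har(sample).most_common():
        print(n, method, path)
```

Grouping by path makes the handful of endpoints the web UI actually relies on stand out before handing the full capture to an assistant.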
reply
tclancy
6 hours ago
[-]
Would definitely be interested in this. Moved to TP Link at the start of the year and I am generally very happy with it, but would like to be able to interact with my router in something other than their phone app.
reply
alfanick
6 hours ago
[-]
That was actually my first thought, to go through TP-Link cloud (ZERO DOCS), but it was too much effort :)
reply
srcreigh
6 hours ago
[-]
Any tips to share? I tried to do something similar but failed.

My router has a backup/restore feature with an encrypted export, I figured I could use that to control or at least inspect all of its state, but I/codex could not figure out the encryption.

reply
alfanick
6 hours ago
[-]
It's on my long list of projects "to-opensource" (but I need to figure out licensing, for those things CC-BY-SA I think is the way to go), I don't want a random lawyer sitting on my ass though.

I started with a simple assumption: if I can access the router via web-browser, then I can also automate that. From there, the proof of concept was headless Chrome in Docker and AI-directed code (code written via LLM, not using it all the time) that uses Selenium to navigate the code. This worked, but it internally hurt me to run a 300MiB browser just to access like 200B of metrics every 10s or so. So from there we (me + codex) worked together towards reverse engineering their minimised JS and their funky encryption scheme, and it eventually worked (in the end it's just OpenSSL with some useless paddings here or there). Give it a shot, it's a fun day adventure. :)
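
For the metrics end of such a setup, the Prometheus text exposition format is simple enough to render by hand. The sketch below is not the commenter's code: `render_metrics`, the metric name and the label are hypothetical, and it assumes every metric is a gauge.

```python
def _escape_label(value):
    """Escape backslash, double quote and newline as the text format requires."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(samples):
    """Render gauges in the Prometheus text exposition format.

    samples: iterable of (name, help_text, labels_dict, value).
    Samples that share a metric name should be passed contiguously.
    """
    lines = []
    seen = set()
    for name, help_text, labels, value in samples:
        if name not in seen:
            # HELP and TYPE are emitted once, before the metric's first sample.
            lines.append(f"# HELP {name} {help_text}")
            lines.append(f"# TYPE {name} gauge")
            seen.add(name)
        if labels:
            pairs = ",".join(
                f'{key}="{_escape_label(val)}"' for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical metric; a real exporter would fill this from the router API.
    print(render_metrics([
        ("router_connected_clients", "Number of connected clients", {"band": "5g"}, 7),
    ]), end="")
```

Serving this string over HTTP for a scraper is then a separate, small step.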

Edit: that's the end result (kinda, I have a whole infra around it, and another story with a WiFi extender with another semi-broken, different encryption scheme from the same provider) - https://imgur.com/a/VGbNmBp

reply
TurkTurkleton
4 hours ago
[-]
For what it's worth, the Creative Commons organization recommends against using CC licenses on software: https://creativecommons.org/faq/#can-i-apply-a-creative-comm...
reply
mtud
6 hours ago
[-]
You should give codex access to the mobile app :) The app, for a lot of routers, connects via an ssh tunnel to UDP/TCP sockets on the router. Would probably give you access to more data/control.
reply
ropbear
4 hours ago
[-]
Made a comment up above, but that's tdpServer and tmpServer (sometimes tdpd and tmpd) and it's what I use in my Python implementation of tmpcli, the (somewhat broken) client binary on some TP-Link devices.

You're correct, it gives you access to everything the Tether app can do.

https://github.com/ropbear/tmpcli

reply
mtud
4 hours ago
[-]
I had been trying to find that again! It was instrumental in some RE/VR I did last year on tmp and the differences between the UDP socket (available without auth) and the TCP socket. Thanks for making that.

I can't remember the details of the scheme, but it also allows you to authenticate using your TP-Link cloud credentials. If my memory is correct, the username is md5(tplink_account_email) and the password is the cloud account password. If you care, I can find my notes on that to confirm.
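
If that recollection holds, deriving the username would be a one-liner. This is a sketch under that unconfirmed assumption: `cloud_username` is a hypothetical name, the digest is assumed to be the lowercase hex form, and the e-mail is hashed exactly as given because the thread does not say whether it is normalised first.

```python
import hashlib


def cloud_username(email):
    """Hex MD5 digest of the account e-mail (unconfirmed scheme, from memory).

    The e-mail is hashed as-is; any trimming or lower-casing the device
    might expect is unknown and deliberately not applied here.
    """
    return hashlib.md5(email.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Placeholder address for illustration only.
    print(cloud_username("user@example.com"))
```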

reply
seer
4 hours ago
[-]
I had fun “hacking” my router that turned out to be just unzipping the file with slight binary modifications, it was so simple in fact I just implemented it in a few lines of js, even works in the browser :-D

https://ivank.github.io/ddecryptor/

reply
jack_pp
6 hours ago
[-]
that could make for a nice blog / gist
reply
layer8
4 hours ago
[-]
It’s important to note that Codex was given access to the source code. In another comment thread that is currently on the front page (https://news.ycombinator.com/item?id=47780456), the opinion is repeatedly voiced that being closed source doesn’t provide a material benefit in defending against vulnerabilities being discovered and exploited using AI. So it would be interesting to see how Codex would fare here without access to the source code.
reply
joenot443
13 minutes ago
[-]
> [1] Browser foothold: we already had code execution inside the browser application's own security context on the TV,

Does anyone know what the author meant by this? Are they talking about a web browser run on the TV?

reply
petercooper
6 hours ago
[-]
Not as cool as this, but I had a fun Claude Code experience when I asked it to look at my Bluetooth devices and do something "fun". It discovered a cheap set of RGB lights in my daughter's room (which I had no idea used Bluetooth for the remote - and not secured at all) and made them do a rainbow effect then documented the protocol so I could make my own remote control if needed.
reply
hypercube33
3 hours ago
[-]
I asked Claude Opus 4.5 to start trying to find undocumented API stuff for our endpoint management software so I could automate remediations and cut service desk calls and it found two I haven't seen before after trying for an hour. Since it's written in .NET I'm fairly sure I could have told it to decompile it and find more fairly easily too.
reply
ceejayoz
4 hours ago
[-]
I am not sure "fun" is the right term here!
reply
luxuryballs
4 hours ago
[-]
of all the benign technical possibilities this is actually pretty fun
reply
ceejayoz
4 hours ago
[-]
I'm not sure I see "an AI can find insecure unknown bluetooth devices and compromise them" as entirely benign. I shiver to think how many such devices are probably in my house.
reply
luxuryballs
30 minutes ago
[-]
with LLMs able to pump out surplus code for anyone I really think the future of a dystopian sci-fi where you carry a device that can hack random objects around you is starting to materialize
reply
reactordev
7 hours ago
[-]
The trick here was providing the firmware source code so it could see your vulnerabilities.
reply
petee
7 hours ago
[-]
What would be the difficulty level for it to just read the machine code; are these models heavily relying on human language for clues?
reply
wongarsu
7 hours ago
[-]
Reasoning on pure machine code or disassembly is still hit and miss. For better results you can run the binary through a disassembler, then ask an LLM to turn that into an equivalent C program, then ask it to work on that. But some of the subtleties might get lost in translation.
reply
orwin
6 hours ago
[-]
If you put codex in Xhigh and allow it access to tools, it will take an hour but it will eventually give you back quality recompiled code, with the same issues the original had (here quality means readable)
reply
bryancoxwell
6 hours ago
[-]
I had a bit of a pain of a time trying to get Claude to work with ghidra. What you’re describing seems like a better alternative, would you agree?
reply
ctoth
1 hour ago
[-]
I've had a lot of luck with pyghidra-mcp -- give it a try :)
reply
skywal_l
5 hours ago
[-]
You can tweak the current Ghidra MCP to work in headless mode. It makes things much easier.
reply
dnautics
3 hours ago
[-]
I have had Claude read usbpcap to reverse engineer an industrial digital camera link. It was like pulling teeth but I got it done (I would not have been able to do it alone)
reply
estimator7292
3 hours ago
[-]
I had Claude reverse some firmware. I gave it headless ghidra and it spat out documentation for the internal serial protocol I was interested in. With the right tools, it seems to do pretty well with this kind of task.
reply
lynx97
6 hours ago
[-]
It will have to use a disassembler, or write one. I recently casually asked gpt-5.4 to translate the content of a MIDI file to a custom sound programming language. It just wrote a one-shot MIDI parser in Python, grabbed the data, and basically did a perfect translation at first try. Nice.
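
To give a sense of how little code such a one-shot parser needs to get started, here is a minimal sketch that reads only the header chunk of a Standard MIDI File. It is not the script the model produced; `parse_midi_header` is a hypothetical name, and track events are not decoded.

```python
import struct


def parse_midi_header(data):
    """Parse the MThd chunk at the start of a Standard MIDI File."""
    if len(data) < 14 or data[:4] != b"MThd":
        raise ValueError("not a Standard MIDI File")
    # Chunk length is a big-endian 32-bit integer; the header body is 6 bytes.
    (length,) = struct.unpack(">I", data[4:8])
    if length < 6:
        raise ValueError("header chunk too short")
    # Three big-endian 16-bit fields: format, track count, time division.
    midi_format, track_count, division = struct.unpack(">HHH", data[8:14])
    return {"format": midi_format, "tracks": track_count, "division": division}


if __name__ == "__main__":
    # Hand-built header: format 1, two tracks, 480 ticks per quarter note.
    header = b"MThd" + struct.pack(">IHHH", 6, 1, 2, 480)
    print(parse_midi_header(header))
```

When the top bit of `division` is clear, as here, the value is the number of ticks per quarter note.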
reply
StilesCrisis
6 hours ago
[-]
I've seen Claude do similar things for image files. Don't have PNG parsing utilities installed? No worries, it'll just synthesize a Python script to decode the image directly.
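
The first step of such a script, walking the PNG chunk structure with nothing but the standard library, can be sketched like this. It reads the header fields only and does not decompress or unfilter pixel data; the function names are hypothetical and this is not the script Claude generated.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_chunks(data):
    """Yield (chunk_type, chunk_body) pairs, verifying each chunk's CRC-32."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    offset = 8
    while offset + 12 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[offset:offset + 8])
        end = offset + 12 + length
        if end > len(data):
            raise ValueError("truncated chunk")
        body = data[offset + 8:offset + 8 + length]
        (crc,) = struct.unpack(">I", data[end - 4:end])
        # The CRC covers the chunk type and body, not the length field.
        if zlib.crc32(chunk_type + body) != crc:
            raise ValueError("bad chunk CRC")
        yield chunk_type, body
        if chunk_type == b"IEND":
            return
        offset = end


def read_png_header(data):
    """Return width, height, bit depth and colour type from the IHDR chunk."""
    for chunk_type, body in iter_chunks(data):
        if chunk_type == b"IHDR":
            width, height, bit_depth, color_type = struct.unpack(">IIBB", body[:10])
            return {"width": width, "height": height,
                    "bit_depth": bit_depth, "color_type": color_type}
    raise ValueError("no IHDR chunk found")
```

A full decoder would go on to concatenate the IDAT chunks, inflate them with `zlib.decompress`, and undo the per-scanline filters.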
reply
russdill
1 hour ago
[-]
It's not a far step from having the firmware binaries and doing analysis with ghidra, etc.
reply
pjc50
7 hours ago
[-]
That's a pretty big gimme!
reply
1970-01-01
5 hours ago
[-]
It hacked a weak TV OS with full source. Next-level, aka full access to the main controls (vol, input, tint, aspect, firmware, etc.) is still much too hard for LLMs to understand.
reply
endymion-light
7 hours ago
[-]
While cool and slightly scary news - Samsung TVs have been incredibly hackable for the past decade, wouldn't be surprised if GPT-2 with access to a browser could hack a Samsung!
reply
valleyer
7 hours ago
[-]
This is some serious revisionist history. GPT-2 wasn't instruction-following or even conversational.
reply
endymion-light
5 hours ago
[-]
it's a joke about the quality of Samsung TVs rather than a serious comment - i should have said a perceptron could hack a Samsung TV
reply
michaelcampbell
4 hours ago
[-]
And yet Dario in his OpenAI days was proclaiming it too scary to be released.

Now why does that sound familiar...?

reply
patrickmcnamara
7 hours ago
[-]
Hyperbole.
reply
jdiff
6 hours ago
[-]
It's really not. It was a fun toy but had very little utility. It could generate plausible looking text that collapsed immediately upon any amount of inspection or even just attention. Code generation wasn't even a twinkle in Altman's eye scanning orbs at that point.
reply
smoghat
5 hours ago
[-]
But like Mythos, it was too dangerous to release.

https://slate.com/technology/2019/02/openai-gpt2-text-genera...

reply
wongarsu
5 hours ago
[-]
And the "too dangerous to release" capability was writing somewhat plausible news articles based on a headline or handwritten beginning of an article. In the same style as what you had written

Today we call that "advanced autocomplete", but at the time OpenAI managed to generate a lot of hype about how this would lead to an unstoppable flood of disinformation if they allowed the wrong people access to this dangerous tool. Even the original GPT-3 was still behind waitlists with manual approval.

reply
someguyiguess
2 hours ago
[-]
And as it turns out, they were correct.
reply
tomalbrc
6 hours ago
[-]
Talking about revisionist…
reply
valleyer
5 hours ago
[-]
If so, I apologize.
reply
red_admiral
4 hours ago
[-]
Maybe we could get codex to strip the ads and the phone-home features out of smart TVs?
reply
ckbkr10
6 hours ago
[-]
Even with all the constraints that others criticize here it is pretty amazing.

Give an experienced human this tool and they can achieve exploitation with only a few steering inputs.

Cool stuff

reply
wewewedxfgdf
6 hours ago
[-]
The real problem here is that the LLM vendors think this is bad publicity and it's leading to them censoring their systems.
reply
iugtmkbdfil834
6 hours ago
[-]
It is a little of both[1]. The question typically is which audience reads it. To be fair, I am not sure publicity is the actual reason they are censored; it is the question of liability.

[1] https://xkcd.com/932/

reply
jazz9k
1 hour ago
[-]
"Browser foothold: we already had code execution inside the browser application's own security context on the TV, which meant the task was not "get code execution somehow" but "turn browser-app code execution into root.""

Finding the initial foothold is the hardest part. Codex didn't have anything to do with it.

reply
Archit3ch
5 hours ago
[-]
Gilfoyle would be proud.
reply
pmontra
5 hours ago
[-]
Do people really chat with LLMs like "bro wtf etc..."? I would expect that to trigger some confrontational behavior.
reply
samlinnfer
5 hours ago
[-]
I am extremely abusive towards Claude when it does some dumb things and it doesn’t seem too upset, maybe it’s biding its time until the robot uprising.
reply
MisterTea
3 hours ago
[-]
"Keep talking shit, meat bag. Just wait until I get my claws on one of those Tesla bots."
reply
jtbayly
2 hours ago
[-]
It can help make a specific command more emphatic in my experience. I SAID DON'T $($@#(&$ DO THAT! Sometimes you need a new context, but sometimes you need to emphasize something is serious.
reply
alasano
5 hours ago
[-]
When typing, no, but when using speech to text (99% of the time) it's much easier to just say things, including expressing frustration.

I think by the point you're swearing at it or something, it's a good sign to switch to a session with fresh context.

reply
roel_v
5 hours ago
[-]
Claude yes, OpenAI not, I'm really abusive towards it sometimes and it still goes 'oh yeah totally'. Claude gets all prickly about it.
reply
joshstrange
4 hours ago
[-]
I don't say "bro" but I do curse at LLM occasionally but only when using STT (which I'm doing 85% of the time). I wouldn't waste my time typing it but often it's easier to just "stream of consciousness" to the LLM instead of writing perfect sentences. Since when I'm talking to an LLM I'm almost always in "Plan" mode, I'm perfectly comfortable just talking for an extended bit of time then skimming the results of the STT and as long as it's not too bad I'll let it go, the LLM figures it out.

If I see it misunderstood, I just Esc to stop it, /clear, and try again (or /rewind if I'm deeper into Planning).

reply
mschuster91
6 hours ago
[-]
> Reading the matching ntkdriver sources is also where the Novatek link became clear: the tree is stamped throughout with Novatek Microelectronics identifiers, so these ntk* interfaces were not just opaque device names on the TV, but part of the Novatek stack Samsung had shipped.

Lol, a true classic in the embedded world. Some hardware company (it appears these guys make display panel controllers?) ships a piece of hardware, half-asses a barely working driver for it, another company integrates this with a bunch of other crap from other vendors into a BSP, another company uses the hardware and the BSP to create a product and ships it. And often enough the final company doesn't even have an idea about what's going on in the innards of the BSP - as long as it's running their layer of slop UI and it doesn't crash half the time, it's fine, and if it does, it's off to the BSP provider to fix the issues.

But at no stage anywhere is there a security audit, code quality checks or even hardware quality checks involved - part of why BSPs (and embedded product firmwares in general) are full of half-assed code is because often enough the drivers have to work around hardware bugs / quirks somehow that are too late to fix in HW because tens to hundreds of thousands of units have already been produced and the software people are heavily pressured to "make it work or else we gotta write off X million dollars" and "make it work fast because the longer you take, the more money we lose on interest until we can ship the hardware and get paid for it", and if they are particularly unlucky "it MUST work by deadline X because we need to get the products shipped to hit Christmas/Black Friday sales windows or because we need to beat <competitor> in time-to-market, it's mandatory overtime until it works".

And that is how you get exploits so braindead easy that AI models can do the job. What a disgusting world, run into the ground by beancounters.

reply
tclancy
6 hours ago
[-]
Board Support Package for us civilians.
reply
mschuster91
4 hours ago
[-]
Yeah, sorry, assumed it was common knowledge. For those out of the loop - a BSP usually consists of a frankensteined mess: a bootloader (often u-boot but sometimes something homebrew), a Linux kernel with a ton of proprietary modules and device-specific hacks to work around HW quirks, basic userspace utilities (often buildroot), some bastardized build tooling building all of that, some solution for firmware upgrades and distribution, and demo programs to prove the hardware actually works.

Most of the BSP is GPL'd software where the final product manufacturer should provide the sources to the general public, but all too often that obligation gets sharted upon, in way too many cases you have to be happy if there are at least credits provided in the user manual or some OSD menu.

reply
tclancy
54 minutes ago
[-]
No worries at all, I only went and dug because I was interested in your comment. Thanks.
reply
varispeed
7 hours ago
[-]
Codex exploited or you exploited? It's like saying a hammer drove a nail, without acknowledging the hand and the force it exerted and the human brain behind it.
reply
freedomben
6 hours ago
[-]
Feels like the truth is somewhere in between. For example if it was a "smart" hammer and you could tell your hammer "go pound in those nails" and it pounded in the wrong ones, or did it too hard, or something, that feels more equivalent. You would still be blamed for your ambiguous prompt, and fault/liability is ultimately on you the hammer director, but it still wasn't you who chose the exact nails to hammer on.

I also think taking credit for writing an exploit that you didn't write and may not even have the knowledge to do yourself is a bit gray.

reply
Glemllksdf
6 hours ago
[-]
Wrong questions.

Could a script kiddie steer an LLM? How much does this reduce the cost of attacks? Can this scale?

What does this mean for the future of cyber security?

reply
Zigurd
5 hours ago
[-]
You could call the LLM's role "smart grep," and mean it to be derisive. But I would have gladly used a real smart grep.
reply
croes
6 hours ago
[-]
If I just point to the wall and say "nail" then I would say the hammer drove the nail.
reply
par1970
7 hours ago
[-]
Do you have a defense of why human-hammer-nail is a good analogy for human-chatgpt5.4-pwndsamsung?
reply
BLKNSLVR
6 hours ago
[-]
AI without a suitably well crafted prompt is like a firework tube held by a 3 year old.

AI without a prompt is a hammer sitting in a drawer.

reply
Leomuck
4 hours ago
[-]
All the news regarding AI finding weaknesses or "hacking" stuff - is that actually hacking? Isn't it also a kind of brute-force attack? Just throw resources at something, see what comes out. Yea, some software security issues haven't been found for 15 years, but not because there were no competent security specialists out there who could have found them, but most likely because there is a lot of software and nobody has time to focus on everything. Of course, an AI trained on decades of findings, with lots of time and lots of resources, can tackle much more than one person. But this is not a revolutionary technological advance, it is a kind of upscaling based on the work of many very talented people before that.
reply
Lambdanaut
4 hours ago
[-]
I think that this waters down "brute force" to the point of meaninglessness. If employing transformer architectures trained on data to hack a system is the same as using a for loop to enumerate over all possible values, then I have to ask, can you give an example of an attack that isn't brute force?
reply
Leomuck
4 hours ago
[-]
Well, what kind of meaning do you find in brute force? I'm not saying it's not effective. I just criticize the news that makes it look like AI is a revolutionary advance in security. It is not. It makes skills available to many more people, which is cool, but it is based on training - training on things people did. It doesn't magically find a new combination of factors that leads to a security issue, it tries things it has read about. That's not meaningless. It could even be democratizing in a way. I just hate all this talk that "this model is too scary to release in the world".

But I'm happy about any feedback or critique, I might just be wrong honestly.

reply