It lets you generate JSON or YAML from the coordinates you tack on the image. tack runs entirely in your browser; there is no server-side component, so it's good in terms of privacy.
Hope this is helpful.
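For anyone wondering what "coordinates as JSON" could look like in practice, here is a rough Python sketch. To be clear, this is not tack's actual output format: the `Box` and `Point` types, the field names, and the sample values are all hypothetical and only illustrate the idea of handing a model numbers instead of pen marks.

```python
import json
from dataclasses import dataclass, asdict


# Hypothetical shapes; tack's real schema may look quite different.
@dataclass
class Box:
    label: str
    x: int       # left edge, in image pixels
    y: int       # top edge, in image pixels
    width: int
    height: int


@dataclass
class Point:
    label: str
    x: int
    y: int


def to_json(boxes: list[Box], points: list[Point]) -> str:
    """Serialize the marked regions and points into one JSON document."""
    payload = {
        "boxes": [asdict(b) for b in boxes],
        "points": [asdict(p) for p in points],
    }
    return json.dumps(payload, indent=2)


# Placeholder values, invented purely for illustration.
annotations = to_json(
    [Box("light-strip", 120, 40, 300, 20)],
    [Point("power-socket", 560, 410)],
)
print(annotations)
```

The resulting text can be pasted straight into a prompt next to the unmarked image.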
My first prompt used an image annotated in Preview: I drew bounding boxes in green where the lights would go and marked the power socket with a red dot.
I just tried the exact same prompt, but using tack instead of marking up the image with the 'pen'. It completed the task with much better results, in one shot (instead of my having to steer it 3 times), and in 1/20th of the original time.
Token usage for the original approach: `Tokens ↑ 2.4m • ↓ 50.2k • 2.2m (cached) • 10.1k (reasoning)`
Token usage using coordinates instead of drawing: `Tokens ↑ 79.2k • ↓ 4.7k • 72.7k (cached) • 1.5k (reasoning)`
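To put those two lines side by side, here is a quick sketch that turns the shorthand counts into numbers and divides them. The figures are copied from the usage lines above; reading ↑ as input tokens and ↓ as output tokens is my assumption about the display, and `parse_count` and `ratios` are just helper names made up for this example.

```python
SUFFIXES = {"k": 1_000, "m": 1_000_000}


def parse_count(text: str) -> float:
    """Convert a shorthand count such as '2.4m' or '50.2k' to a number."""
    text = text.strip().lower()
    if text and text[-1] in SUFFIXES:
        return float(text[:-1]) * SUFFIXES[text[-1]]
    return float(text)


# Assumption: ↑ is input tokens and ↓ is output tokens.
drawn = {"input": "2.4m", "output": "50.2k", "cached": "2.2m", "reasoning": "10.1k"}
coords = {"input": "79.2k", "output": "4.7k", "cached": "72.7k", "reasoning": "1.5k"}


def ratios(before: dict[str, str], after: dict[str, str]) -> dict[str, float]:
    """Return how many times larger each 'before' count is than its 'after' count."""
    return {key: parse_count(before[key]) / parse_count(after[key]) for key in before}


for name, ratio in ratios(drawn, coords).items():
    print(f"{name}: {ratio:.1f}x fewer tokens with coordinates")
```

By these numbers the coordinate version used roughly 30x fewer input tokens and about 10x fewer output tokens.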
Not very surprising that a JSON of coordinates is more efficient than drawing a crude line on an image, but I couldn't be bothered because GitHub still charges per request instead of per token. If I'd had this 20 minutes earlier I WOULD have bothered, though. Nice work :)
Which means you correctly detected I'm on mobile, where your app doesn't work, so instead of a broken app you will now show me a short video of how your app works on desktop, right?!?