Show HN: BNNR – a closed-loop pipeline for improving vision models
Hi HN,

We’ve been working on computer vision models for a while, and one thing kept coming up: improving them is surprisingly unstructured.

You train a model, try a few augmentations, tweak some hyperparameters, run it again - and if the metric goes up, you keep it. But it’s often unclear why it improved, or whether the model is actually learning something better.

We kept running into questions like:

- Did this change actually help, or is it just noise?

- Is the model focusing on the right features?

- Are we improving generalization, or just overfitting differently?

So we built BNNR (Bulletproof Neural Network Recipe) - an open-source PyTorch toolkit that turns model improvement into a closed loop:

- Train a controlled baseline

- Explain what the model actually learned (via XAI methods like OptiCAM / GradCAM)

- Improve by testing candidate strategies in parallel

- Prove the result with structured comparisons

One thing we focused on is avoiding “blind” changes. Instead of committing to a single idea (e.g. an augmentation), BNNR evaluates multiple candidates and only keeps those that measurably improve a selected metric. The goal is to reduce manual trial and error, not to take control away from you.
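To make the candidate-selection idea concrete, here is a minimal sketch of that logic (hypothetical names, not BNNR’s actual API): each candidate strategy is trained/evaluated, and only those that beat the baseline by a minimum margin are kept.

```python
# Sketch of baseline-vs-candidates selection. The names Candidate,
# select_candidates, and min_gain are illustrative, not BNNR's API.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    name: str
    evaluate: Callable[[], float]  # trains + evaluates, returns the metric


def select_candidates(baseline_score: float,
                      candidates: List[Candidate],
                      min_gain: float = 0.005) -> List[Candidate]:
    """Keep only candidates whose metric beats the baseline by min_gain."""
    kept = []
    for cand in candidates:
        score = cand.evaluate()
        if score - baseline_score >= min_gain:
            kept.append(cand)
    return kept


# Toy usage: fixed scores stand in for real validation metrics.
candidates = [
    Candidate("horizontal_flip", lambda: 0.84),
    Candidate("color_jitter", lambda: 0.811),
    Candidate("icd_masking", lambda: 0.86),
]
kept = select_candidates(baseline_score=0.82, candidates=candidates)
print([c.name for c in kept])  # -> ['horizontal_flip', 'icd_masking']
```

In the real pipeline the `evaluate` step is a full training run, which is why candidates are explored in parallel branches rather than sequentially.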

We also use explainability as part of the loop, not just for visualization.

For example, in one experiment a model classifying airplanes was mostly focusing on the sky background rather than the object itself. This kind of behavior is hard to spot from metrics alone. After applying a targeted modification based on the model’s attention, the focus shifted toward the airplane, and performance improved on held-out data.

Under the hood, some of the improvements are driven by XAI-based transformations:

- ICD (Intelligent Coarse Dropout) masks the most salient regions (what the model relies on too much), forcing it to learn from broader context

- AICD (Anti-ICD) does the opposite - it masks less relevant regions and keeps only what the model considers important

We don’t treat these as “magic augmentations”, but as ways to test hypotheses about what the model is actually using. BNNR works with its own augmentations as well as external libraries (e.g. Albumentations).
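A rough sketch of the idea behind ICD/AICD (illustrative only, not BNNR’s implementation — and assuming image and saliency map share the same 2D shape): given a per-pixel saliency map from an XAI method, ICD zeroes the most salient fraction of pixels, while AICD zeroes everything except that fraction.

```python
import numpy as np


def icd(image: np.ndarray, saliency: np.ndarray,
        drop_fraction: float = 0.2) -> np.ndarray:
    """ICD-style masking: hide the regions the model relies on most."""
    threshold = np.quantile(saliency, 1.0 - drop_fraction)
    out = image.copy()
    out[saliency >= threshold] = 0.0
    return out


def aicd(image: np.ndarray, saliency: np.ndarray,
         keep_fraction: float = 0.2) -> np.ndarray:
    """AICD-style masking: keep only what the model considers important."""
    threshold = np.quantile(saliency, 1.0 - keep_fraction)
    out = image.copy()
    out[saliency < threshold] = 0.0
    return out


# Toy usage on a 4x4 "image" with a linear saliency ramp:
img = np.ones((4, 4))
sal = np.arange(16).reshape(4, 4) / 15.0
icd_out = icd(img, sal, drop_fraction=0.25)    # 4 most-salient pixels zeroed
aicd_out = aicd(img, sal, keep_fraction=0.25)  # only those 4 pixels survive
```

In practice you would recompute the saliency map per image (e.g. via GradCAM) rather than reuse a fixed one, so the mask tracks what the current model attends to.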

Each run is tracked and visualised in a live dashboard, where you can see:

- baseline vs improved metrics

- per-class performance

- attention maps before/after

- candidate branches being explored in parallel and which ones were selected or discarded

There is a trade-off: evaluating multiple candidates in parallel adds compute cost. In practice, we’ve found it comparable to (or better than) manually running multiple experiments and tuning setups by hand - but with much more structured results.

It’s still early, but BNNR currently supports:

- image classification

- multi-label classification

- object detection (just added today)

Would really appreciate feedback - especially from people experimenting with vision models or training pipelines.

You can try it here:

GitHub: <https://github.com/bnnr-team/bnnr>

Website: <https://www.bnnr.dev/>

Colab: <https://colab.research.google.com/github/bnnr-team/bnnr/blob...>
