Show HN: Nomadic – Minimize RAG Hallucinations with 1 Hyperparameter Experiment
94 points
11 days ago
| 12 comments
Hey HN! Mustafa, Lizzie, and Varun here from NomadicML (https://nomadicml.com). We’re excited to show you Nomadic (https://github.com/nomadic-ml/nomadic): a platform focused on parameter search to continuously optimize AI systems.

Here’s a simple demo notebook where you get the best-performing, statistically significant configurations for your RAG — and improve hallucination metrics by 4X in just 5 minutes — with a single Nomadic experiment: https://tinyurl.com/4xmaryyw

Our lightweight library is now live on PyPI (`pip install nomadic`). Try one of the README examples :) Input your model, define an evaluation metric, specify the dataset, and choose which parameters to test.
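To make that workflow concrete, here's a minimal, library-agnostic sketch of the kind of sweep Nomadic automates (evaluate_config is a hypothetical stand-in for your own pipeline and metric; see the README examples for the actual Nomadic API):

    # Library-agnostic sketch: pick params, score each config on an eval set, keep the best.
    # evaluate_config() is a hypothetical stand-in for your own RAG pipeline + metric.
    from itertools import product

    def evaluate_config(temperature: float, top_k: int, dataset: list) -> float:
        """Run the pipeline with these settings and return an eval score (higher is better)."""
        return 0.0  # placeholder: call your pipeline on each example and score it

    dataset = [{"question": "What is the refund window?", "answer": "30 days"}]
    search_space = {"temperature": [0.1, 0.5, 0.9], "top_k": [3, 5, 10]}

    best_score, best_config = float("-inf"), None
    for temperature, top_k in product(search_space["temperature"], search_space["top_k"]):
        score = evaluate_config(temperature, top_k, dataset)
        if score > best_score:
            best_score, best_config = score, {"temperature": temperature, "top_k": top_k}

    print(best_config, best_score)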

Nomadic emerged from our frustration with existing HPO (hyperparameter optimization) solutions. We heard over and over that, for the sake of deploying fast, folks resort to setting HPs through a single expensive grid search or, worse, intuition-based “vibes”. From fine-tuning to inference, small tweaks to HPs can have a huge impact on performance.

We wanted a tool to make that “drunken wander” systematic, quick, and interpretable, so we started building Nomadic. Our goal is to build the best parameter search platform for ML systems, keeping your hyperparameters, prompts, and every other part of your AI system production-grade. We've been aggregating top parameter search techniques from popular tools and research (Bayesian optimization, cost-frugal variants).

Among us: Built Lyft’s driver earnings platform, automated Snowflake’s just-in-time compute resource allocation, became a finalist for the INFORMS Wagner Prize (top prize in industrial optimization), and developed a fintech fraud screening system for half a million consumers. You might say we love optimization.

If you’re building AI agents or applications across LLM safety, fintech, support, or especially compound AI systems (multiple components > monolithic models), and you want to deeply understand your ML system’s best levers for boosting performance as it scales, get in touch.

Nomadic is being actively developed. Up next: support for text-to-SQL pipelines (TAG) and a Workspace UI (preview it at https://demo.nomadicml.com). We’re eager to hear honest feedback: likes, dislikes, feature requests, you name it. If you’re also an optimization junkie, we’d love for you to join our community on Discord: https://discord.gg/PF869aGM

add-sub-mul-div
10 days ago
[-]
Lots of grassroots interest in this from a flood of new accounts created in the last few hours.

========

baileyw6 2 hours ago [flagged] [dead] excellent work!

r0sh 3 hours ago [flagged] [dead] cracked team!

mlw14 3 hours ago [flagged] [dead] Interesting library, is it like unit testing for RAGs? Can't wait to try it out!

lncheine 2 hours ago [flagged] [dead] Interesting library, can't wait to try it out!

Linda_ll 2 hours ago [flagged] [dead] Congrats on the launch! Excited for what’s to come :)

bmountain17 3 hours ago [flagged] [dead] Great new platform to boast AI performance, can't wait to try the Python library!

jjBailey 1 hour ago [flagged] [dead] Cool library, I’ll test it out

sidkapoor39 3 hours ago [flagged] [dead] Congrats on the launch! Excited to see how this streamlines Hyperparameter optimization. Keep up the great work!

brucetry 1 hour ago [flagged] [dead] Ver interesting, similar to unit test for RAGs? Love to try it out

jjBailey 1 hour ago [flagged] [dead] Very interesting library!! Can’t wait to try it!

luxxxxx 1 hour ago [flagged] [dead] Interesting library! Is it like unit testing for RAGs? Can’t wait to try it out!

kangjl888 2 hours ago [flagged] [dead] Huge congratulations to the NomadicML team on the launch of Nomadic! The platform looks like a game-changer for optimizing AI systems, excited to see how it transforms hyperparameter search for the community.

nishsinha2345 21 minutes ago [flagged] [dead] Excited to try out this library! would this help make unit testing easier? Or be used instead of unit testing?

greysongy5 19 minutes ago [flagged] [dead] Wow, this seems like it would really help automated RAG testing. What are the top use cases today?

sidvijay10 5 minutes ago [flagged] [dead] We're looking for a RAG testing framework for searching UGC. So far we've just been running evals manually w/o a library. Will try out Nomadic and see if it's more convenient.

reply
mutant
7 days ago
[-]
Llm answer bros. I was wondering when we'd start seeing this more.
reply
_eric_z_lin
10 days ago
[-]
This looks like a really useful tool for keeping AI systems optimized, especially as models and data evolve over time. I'm curious, have you considered how Nomadic might integrate into CI/CD pipelines? It seems like it could be valuable for automatically re-tuning parameters and ensuring performance doesn't degrade with new model versions or data updates. Any plans for features that would support this kind of continuous optimization workflow?
reply
simbasdad
10 days ago
[-]
Thank you so much. Yes, we believe CI/CD pipelines are a treasure trove of data for continuous ML system optimization: these are non-deterministic systems run repeatedly, with new evaluation results at each run, so you get to learn about your own ML system over time. Nomadic integrates well here, continuously collecting data that it can then use to identify better HP configs for the same system. Every time you run your CI/CD pipeline, you get more data to learn from, and Nomadic is the engine that turns it into better configurations.
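As a simplified illustration of that loop, here's a hypothetical CI step that re-runs an eval on every pipeline run and fails the build on regression (run_eval and the baseline file are stand-ins, not Nomadic APIs):

    # Hypothetical CI gate: re-run the eval suite each pipeline run, compare against the
    # last known-good score, and fail on regression. run_eval() is a stand-in for your
    # own eval (or a Nomadic experiment run).
    import json, sys
    from pathlib import Path

    BASELINE = Path("eval_baseline.json")

    def run_eval() -> float:
        """Return the current eval score for the deployed config (higher is better)."""
        return 0.0  # placeholder

    score = run_eval()
    baseline = json.loads(BASELINE.read_text())["score"] if BASELINE.exists() else float("-inf")
    if score < baseline - 0.01:  # tolerance for noisy, non-deterministic evals
        sys.exit(f"Eval regression: {score:.3f} < baseline {baseline:.3f}")
    BASELINE.write_text(json.dumps({"score": max(score, baseline)}))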
reply
elizabethhu
10 days ago
[-]
To add on: if you're more interested in real-time optimization (where the best configs are automatically set and iterated on in your system), Nomadic can integrate directly at the application level within your production code. You can then make calls like nomadic.get_optimal_value(experiment_id="...", default=...) to fetch the most recent optimal hyperparameters for your system. This approach lets you continuously refine and deploy the best version of your production system using both your CI/CD pipeline and historical production data.
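A minimal sketch of how that might look in application code (only the get_optimal_value call is described above; the experiment id and surrounding plumbing are illustrative):

    # Fetch the most recent optimal hyperparameter at request time, with a safe fallback.
    import nomadic

    def run_llm(question: str, temperature: float) -> str:
        """Stand-in for your actual LLM / RAG call."""
        return f"(answer to {question!r} at temperature={temperature})"

    def answer(question: str) -> str:
        temperature = nomadic.get_optimal_value(
            experiment_id="rag-temperature-experiment",  # hypothetical experiment id
            default=0.2,  # used if no optimized value is available yet
        )
        return run_llm(question, temperature=temperature)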
reply
mustafabal
10 days ago
[-]
Hi all, we've received a few questions offline on what type of developers can benefit the most from Nomadic's offerings. We believe the Nomadic SDK & Workspace can benefit a wide range of developers:

1. Solo ML practitioners looking to streamline their workflows
2. MLEs in small to mid-size companies wanting FAANG-level capabilities
3. Data science teams aiming to productionize models more efficiently
4. Startups needing to quickly deploy ML features without a large engineering team

Our goal is to provide tools that let any team serve high-quality ML features, regardless of size or resources. We're trying to bridge the gap between cutting-edge ML research and optimized, deployable solutions.

If you want to dig deeper, please peruse our Nomadic Docs (https://docs.nomadicml.com) and Workspace Demo (https://demo.nomadicml.com), or contact us at info@nomadicml.com.

reply
elizabethhu
10 days ago
[-]
Hey HN! I'm Lizzie, one of the cofounders of NomadicML - excited to get your thoughts on our demo and repo.

We started working on Nomadic because we saw that people wanted to ship powerful, reliable systems but very often didn't have a map for getting there:

Which embedding model works best for my RAG? What temperature to set? What threshold for similarity search?

We wanted a tool that makes answering these kinds of questions systematic and affordable, instead of resorting to intuition or a single expensive grid search that then gets set and forgotten. Give us your most honest feedback!

reply
bonnet_clement
9 days ago
[-]
This is a really cool library built in such a short time! I'm very excited to try it out! Small feature suggestion: I could be wrong, but having the standard deviation or some statistical significance alongside the score (whether it's retrieval, inference, or overall) would strengthen the decision-making around parameter optimization. Easier to know the confidence around a hyper-parameter choice. Great work!!
reply
Jadiker
10 days ago
[-]
It looks like the hallucination score is somewhat related to perplexity in the sense that it relies on specific tokens. This could cause issues because rephrasing or using slightly different terms could lead to a higher hallucination score. E.g. if the correct answer is "John Smith is the world's best baker" then "Mary Kay is the world's best baker" would have a better score (lower hallucination) than "Leading maker of baked items across all the continents: John Smith" according to your metric.
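A quick sketch of the concern, using plain unigram precision against the reference (not necessarily the demo's exact metric):

    # Unigram precision vs. the reference: a wrong-but-similarly-worded answer can
    # score better than a correct-but-rephrased one.
    def unigram_precision(candidate: str, reference: str) -> float:
        cand, ref = candidate.lower().split(), set(reference.lower().split())
        return sum(token in ref for token in cand) / len(cand)

    reference = "John Smith is the world's best baker"
    wrong     = "Mary Kay is the world's best baker"
    rephrased = "Leading maker of baked items across all the continents: John Smith"

    print(unigram_precision(wrong, reference))      # high overlap despite the wrong name
    print(unigram_precision(rephrased, reference))  # low overlap despite being correct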

Are there any plans to make updates to this score or add in different metrics for more accurately detecting hallucinations that don't penalize rephrasing?

reply
varunkrishnan17
10 days ago
[-]
Thanks for the well-thought-out question, Jadiker!

This is a potential limitation of N-gram precision with context matching, which we were using in the RAG demo for simplicity (though even with this, I don't think it would be so extreme :-) )

We already offer two other hallucination detection approaches that should mitigate this problem: an LLM-as-a-judge model for evaluation, and semantic similarity matching. We've also considered using metrics such as BERTScore. Do you have other ideas? :-)
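For reference, here's a minimal sketch of semantic similarity matching using sentence-transformers (one common approach, not necessarily Nomadic's exact implementation):

    # Embed the reference and candidate answers and compare cosine similarity, so
    # rephrasings of the correct answer can still score well.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")
    reference = "John Smith is the world's best baker"
    candidates = [
        "Mary Kay is the world's best baker",                                  # wrong entity
        "Leading maker of baked items across all the continents: John Smith",  # rephrased, correct
    ]
    ref_emb = model.encode(reference, convert_to_tensor=True)
    cand_emb = model.encode(candidates, convert_to_tensor=True)
    print(util.cos_sim(ref_emb, cand_emb))  # similarity score for each candidate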

reply
piinuma
6 days ago
[-]
Interesting! This looks great. Would want to see it develop more
reply
wwang4768
9 days ago
[-]
Can't wait to try out the library and hear more about the use cases/updates!
reply
Jadiker
10 days ago
[-]
Looks interesting! How does it compare to things like Optuna, Ray Tune, or Weights & Biases?
reply
varunkrishnan17
10 days ago
[-]
Really appreciate it, Jadiker! We're obviously in a similar space, but we think we offer some strong differentiators: (1) functionality that is more specific to LLM use cases (for example, easily kicking off a RAG retrieval/inference experiment); (2) the ability to easily customize and visualize your results, for example through custom evaluators and carefully curated heatmaps, both via our SDK and our managed service.
reply
altairmn
10 days ago
[-]
Our customers use our platform to build low-latency voice and video pipelines. They utilize RAG in voice bots to improve response accuracy.

Is it possible to programmatically interface with Nomadic’s hyperparameter search through an authenticated endpoint, with the ability to generate user-specific tokens for secure access?

reply
mustafabal
10 days ago
[-]
Certainly!

The Nomadic SDK supports 1st-party integrations with various open & closed-source ML/LLM providers. These are done through authenticated endpoints for interfacing securely with your models. Also, as noted in the Custom Evaluation section of our docs (https://docs.nomadicml.com/features/evaluations), you can provide your custom objective_functions and detail your model access logic, which may include custom authentication & access rules. A sample of this is present in our "Basic RAG" cookbook (link: https://colab.research.google.com/drive/1rv2f-qxgoN_eVDFu6Um...).

When integrated with the upcoming Nomadic Workspace, you can obtain your Nomadic API key and sync your local Nomadic models, experiments & experiment results with our managed service. A demo of this model/experiment/result visualization is live at https://demo.nomadicml.com; please check it out and let us know your thoughts!

reply
varunkrishnan17
11 days ago
[-]
Hi, I'm Varun, one of the cofounders of Nomadic!

Been a pleasure to work with Mustafa and Lizzie on this! Hopefully it solves a pain point I've personally had for so long: how can you easily verify that your model continues to perform well?

reply
rnvarma
11 days ago
[-]
For my company, we don't have complex chains; we're generally passing in a large context and looking to get structured outputs. Curious how this could help with that? We don't currently use any eval frameworks.
reply
varunkrishnan17
11 days ago
[-]
That's a great use case for Nomadic! We support many eval approaches in the optimization; one is an LLM-as-a-Judge model, where you can assign custom weights based on your metrics of interest. Adhering to a proper structure could be one of them :-)
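To make "custom weights" concrete, here's a hypothetical evaluator that combines a deterministic structure check with a judged-quality score (illustrative only, not Nomadic's built-in judge):

    # Hypothetical weighted evaluator: combine a structure check (does the output parse
    # as JSON with the required keys?) with a judged score, using custom weights.
    import json

    REQUIRED_KEYS = {"summary", "citations"}
    WEIGHTS = {"structure": 0.5, "faithfulness": 0.5}  # your metrics of interest

    def structure_score(output: str) -> float:
        try:
            parsed = json.loads(output)
        except json.JSONDecodeError:
            return 0.0
        if not isinstance(parsed, dict):
            return 0.0
        return len(REQUIRED_KEYS & set(parsed)) / len(REQUIRED_KEYS)

    def combined_score(output: str, faithfulness: float) -> float:
        """faithfulness would come from an LLM-as-a-judge or similarity metric."""
        return WEIGHTS["structure"] * structure_score(output) + WEIGHTS["faithfulness"] * faithfulness

    print(combined_score('{"summary": "Refunds within 30 days.", "citations": ["policy.md"]}', 0.9))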
reply