Here’s a walkthrough: https://www.youtube.com/watch?v=TbOfx6UPuX4.
ML teams waste too much time on generic heavy lifting. Every project follows the same pattern: roughly 20% understanding objectives, 60% wrangling data and engineering features, and 20% experimenting with models. Most of this work is formulaic, yet it burns months of engineering time. Throwing LLMs at it isn't the answer either; that just trades engineering time for compute costs and worse accuracy. Plexe automates this repetitive 80% so your team can move faster on the work that actually creates value.
You describe your problem in plain English ("fraud detection model for transactions" or "product embedding model for search"), connect your data (Postgres, Snowflake, S3, direct upload, etc.), and then Plexe:

- Analyzes the data and engineers features automatically
- Runs experiments across multiple architectures (from logistic regression to neural nets)
- Generates evaluation reports with error analysis, robustness testing, and prioritized, actionable recommendations
- Deploys the best model with monitoring and automatic retraining
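For a sense of what that flow can look like in code, here's a rough sketch. The class and argument names are illustrative assumptions rather than the documented interface; treat the open-source README as the source of truth.

    # Rough sketch only: class/argument names are assumptions for illustration,
    # not necessarily the library's documented API.
    import pandas as pd
    import plexe

    transactions = pd.read_csv("transactions.csv")  # hypothetical dataset

    model = plexe.Model(
        intent="Flag fraudulent transactions",
        input_schema={"amount": float, "merchant": str, "account_age_days": int},
        output_schema={"is_fraud": bool},
    )
    model.build(datasets=[transactions])  # agents analyze data, engineer features, run experiments
    print(model.predict({"amount": 129.99, "merchant": "acme", "account_age_days": 12}))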
We did a Show HN for our open-source library five months ago (https://news.ycombinator.com/item?id=43906346). Since then, we've launched our commercial platform with interactive refinement, production-grade model evaluations, retraining pipeline, data connectors, analytics dashboards, and deployment for online and batch inference.
We use a multi-agent architecture where specialized agents handle different pipeline stages. Each agent focuses on its domain: data analysis, feature engineering, model selection, deployment, and so on. The platform tracks all experiments and generates exportable Python code.
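Purely as an illustration of the hand-off idea (this is not our actual implementation), you can picture each stage as a function that enriches a shared build state and logs its artifacts for experiment tracking:

    # Purely illustrative: each "agent" owns one stage and passes artifacts on.
    from dataclasses import dataclass, field

    @dataclass
    class BuildState:
        artifacts: dict = field(default_factory=dict)

    def data_analysis_agent(state: BuildState) -> BuildState:
        state.artifacts["data_profile"] = {"rows": 100_000, "target": "is_fraud"}  # placeholder
        return state

    def feature_engineering_agent(state: BuildState) -> BuildState:
        state.artifacts["features"] = ["amount_log", "merchant_freq"]  # placeholder
        return state

    def model_selection_agent(state: BuildState) -> BuildState:
        state.artifacts["best_model"] = "gradient_boosting"  # placeholder
        return state

    state = BuildState()
    for agent in (data_analysis_agent, feature_engineering_agent, model_selection_agent):
        state = agent(state)  # every stage's outputs are logged as experiment metadata
    print(state.artifacts)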
Our open-source core (https://github.com/plexe-ai/plexe, Apache 2.0) remains free for local development. For the paid product, pricing is usage-based with a minimum top-up of $10. Enterprises can self-host the entire platform. You can sign up at https://console.plexe.ai. Use promo code `LAUNCHDAY20` to get $20 to try out the platform.
We’d love to hear your thoughts on the problem and feedback on the platform!
    curl -X POST "XXX/infer" \
      -H "Content-Type: application/json" \
      -H "x-api-key: YOUR_API_KEY" \
      -d '{}'
How do I know what the inputs/outputs are for one of my models? I see I could have set the response variable manually before training, but I was hoping the auto-infer would work.
Separately, it'd be ideal if, when I ask for a model you can't train (I asked for an embedding model as a test), the platform told me it couldn't do that instead of making me choose a dataset that has nothing to do with what I asked for.
All in all, super cool space, I can't wait to see more!
I'm a former YC founder turned investor living in Dogpatch. I'd love to chat more if you're down!
1. Depending on your dataset, training could take anywhere from 45 minutes to a few hours. We do need to add an ETA for the build in the UI.
2. The input schema is inferred towards the end of the model building process, not right at the start, because the final schema depends on decisions made about input features, model architecture, etc. during the build. You should see the sample curl update soon with the actual input fields (illustrative sketch at the end of this list).
3. Great point about rejecting builds upfront for model types we don't yet support. We'll be sure to add this soon!
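On point 2, here's a Python equivalent of the sample curl showing what the call looks like once the schema is filled in. The field names are purely illustrative; the real ones come from the inferred schema shown in the UI.

    # Python equivalent of the sample curl; field names are purely illustrative.
    import requests

    response = requests.post(
        "XXX/infer",  # your model's endpoint URL from the UI
        headers={"x-api-key": "YOUR_API_KEY"},
        json={"amount": 129.99, "merchant": "acme", "account_age_days": 12},
    )
    print(response.json())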
We're in London at the moment, but we'd love to connect with you and/or meet in person next time we're in SF - drop us a note on LinkedIn or something :)
A few questions:
1. Can it work with tabular data, images, text and audio?
2. Is the data preprocessing code deployed with the model?
3. Have you tested use cases where an ML model was not needed? For example, you could simply go with an average. I'm curious whether the agent can propose not to use ML in such cases.
4. Do you have an agent for model interpretation?
5. Are you using a generic LLM, or your own LLM tuned on ML tasks?
1. Tabular data only, for now. Text/images also work if they're in a table, but unfortunately not unstructured text or folders of loose image files. Full support for images, video, audio etc coming sometime in the near future.
2. Input pre-processing is deployed in the model endpoint to ensure feature engineering is applied consistently across training and inference. Once a model is built, you can see the inference code in the UI and you'll notice the pre-processing code mirrors the feature engineering code. If you meant something like deploying scheduled batch jobs for feature processing, we don't support that yet, but it's in our plans!
3. The agent isn't explicitly instructed to "push back" on using ML, but it is instructed to develop a predictor that is as simple and lightweight as possible, including simple baseline heuristics (average, most popular class, etc). Whatever performs best on the test set is selected as the final predictor, and that could just be the baseline heuristic if none of the models outperform it (see the sketch after this list). I like the idea of explicitly pushing back on developing a model if the use case clearly doesn't call for it!
4. Yes, we have a model evaluator agent that runs an extensive battery of tests on the final model to understand things like robustness to missing data, feature importance, biases, etc. You can find all the info in the "Evaluations" tab of a built model. I'm guessing this is close to what you meant by "model interpretation"?
5. A mix of generic and fine-tuned, and we're actively experimenting with the best models to power each of the agents in the workflow. Unsurprisingly, our experience has been that Anthropic's models (Sonnet 4.5 and Haiku 4.5) are best at the "coding-heavy" tasks like writing a model's training code, while OpenAI's models seem to work better at more "analytical" tasks like reviewing results for logical correctness and writing concise data analysis scripts. Fine-tuning for our specific tasks is, however, an important part of our implementation strategy.
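To make point 3 concrete, here's a simplified illustration of the selection logic (not our actual generated code): a trivial baseline competes against the trained candidates on the same held-out test set, and whichever scores best wins.

    # Simplified illustration (not Plexe's generated code): a trivial baseline
    # competes with a trained model on the same held-out test set.
    from sklearn.datasets import make_regression
    from sklearn.dummy import DummyRegressor
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.metrics import mean_absolute_error
    from sklearn.model_selection import train_test_split

    X, y = make_regression(n_samples=2000, n_features=10, noise=10.0, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    candidates = {
        "baseline_mean": DummyRegressor(strategy="mean"),
        "gradient_boosting": GradientBoostingRegressor(random_state=0),
    }
    scores = {}
    for name, estimator in candidates.items():
        estimator.fit(X_train, y_train)
        scores[name] = mean_absolute_error(y_test, estimator.predict(X_test))

    best = min(scores, key=scores.get)  # lower MAE wins; the baseline can win too
    print(scores, "->", best)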
Hope this covers all your questions!
1. AutoML tools work on clean data. Data preparation requires understanding the business context, reasoning about the data in that context, and then producing code for the required data transformations. Since this process can't be automated with "templated" pipelines, teams using AutoML still have to do the hardest - and arguably most important - part of the data science job themselves.
2. AutoML tools use "templated" models for regression, classification, etc, which may not result in as good a "task-data-model fit" as the sort of purpose-written ML code a data scientist or ML engineer might produce.
3. AutoML tools still require a working understanding of data science technicalities. They automate the running of ML training experiments, but not the task of deciding what to do in the first place, or the task of understanding whether what was done actually fits the task.
With this in mind, we've seen that most ML teams don't find traditional AutoML tools useful (they only automate the "easy" part), while software teams don't find them accessible (data science knowledge is still required).
Plexe addresses both of these issues: the agents' reasoning capabilities enable it to work with messy data (as long as you provide business context) and to ENTIRELY abstract away the deeper technicalities of building custom models that fit the task and the data. We believe this makes Plexe both useful to ML teams and accessible to non-ML teams.
Does this line up with your experience of AutoML tools?
Sounds very practical for real-world use cases. I trained an ML model a couple of months ago; I think it's a good case to test this product on.
This also highlights the important role of the user as a (potentially non-technical) domain expert. Hope that makes sense!
P.S. Thanks for the feedback on the video! We'll update it to show the cleaning and labelling process :)
It would be more useful if the export had an option to include everything from the session (or did so by default).
p.s. kudos on the promo code that lets folks kick the tires with as little friction as possible.
Caveat: as a more technical user, you can currently "hack" around this limitation by storing your images as byte arrays in a parquet file, in which case the platform can ingest your data and train a CV model for you. We haven't tested the performance extensively though, so your mileage may vary here.
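For anyone who wants to try that workaround, the packing step is only a few lines. Paths, column names, and labels below are placeholders; adjust them to your dataset.

    # Pack a folder of images into a parquet file with one bytes column.
    # Paths, column names, and labels are placeholders for illustration.
    from pathlib import Path
    import pandas as pd

    rows = [
        {"filename": p.name, "image_bytes": p.read_bytes(), "label": p.parent.name}
        for p in Path("images").rglob("*.jpg")
    ]
    pd.DataFrame(rows).to_parquet("images.parquet", index=False)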