LLM Workflows then Agents: Getting Started with Apache Airflow
128 points | 2 days ago | 9 comments | github.com
jumski
1 day ago
Nice to see some workflow engine action on Hacker News! :-)

I'm currently building pgflow, which is a simple, postgres-first engine that uses task queues to perform real work.

It has an explicit DAG approach, strong type safety, a nice DSL in TypeScript, and a dedicated task queue worker that allows it to run solely on Supabase without any external tools.

I'm super close to the alpha release. If you want more info, check out the README for the SQL core (https://github.com/pgflow-dev/pgflow/tree/main/pkgs/core#rea...) or my Twitter (https://x.com/pgflow_dev).

Hope that grabs someone's attention :-) Cheers

CjHuber
1 day ago
Exactly what I was looking for without even knowing it :) EDIT: well, I knew I needed something like this, but I thought I'd have to build a very rudimentary version myself. Thank you for saving me tons of time on my project.

jumski
1 day ago
Thanks! That's the reason I'm building it - I needed something like this but there was nothing available.

I'm very close to releasing an alpha, will post here when ready!

mushufasa
1 day ago
this is really cool!

That said, my impression is that Airflow is a really dated choice for a greenfield project. There isn't a clear successor, though. I looked into this recently and was quickly overwhelmed by Prefect, Dagster, Temporal, and even newer ones like Hatchet and Hamilton.

Most of these frameworks now have docs / plugins / sister libraries geared around AI agents.

It would be really helpful to read a good technical blog doing a landscape of design patterns in these different approaches, and thoughts on how to fit things together well into a pipeline given various quirks of LLMs (e.g. nondeterminism).

This page is a good start, even if it is written as an airflow-specific how-to!

Hasz
1 day ago
Dated doesn’t mean bad (usually the opposite in my experience!) What issues do you have with Airflow?

gre
1 day ago
Here are my problems with MWAA (Amazon-hosted Airflow). I have about 100 DAGs, which maxes out the scheduler thread. Airflow parses all the files every minute, so it's always parsing at around 94% CPU. I could run a second scheduler thread if I coordinate with my SRE team and get the Terraform deployed... it's really tedious.

Possibly related: my DAGs get kill -9'd for no apparent reason. RAM usage is not that high, maybe 2 GB out of 8 GB of system RAM in use. No reason is given in the logs.

I am trying to switch to Dagster, not because it's awesome, but because it hasn't crashed randomly on me.
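
For reference, the parsing pressure described above usually comes down to a couple of scheduler settings, and MWAA accepts Airflow configuration overrides. A minimal sketch using the boto3 MWAA client; the environment name and values are illustrative, and your environment/IAM setup has to permit the call:

```python
# Sketch: easing DAG-parsing load on an MWAA environment.
# "my-airflow-env" and the numbers are placeholders, not recommendations.
import boto3

mwaa = boto3.client("mwaa")
mwaa.update_environment(
    Name="my-airflow-env",  # hypothetical environment name
    AirflowConfigurationOptions={
        # Re-parse each DAG file at most every 5 minutes instead of ~30s.
        "scheduler.min_file_process_interval": "300",
        # How often the scheduler re-scans the DAGs folder for new files.
        "scheduler.dag_dir_list_interval": "600",
        # More parsing processes, if the instance class has CPU to spare.
        "scheduler.parsing_processes": "4",
    },
)
```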

alittletooraph2
1 day ago
This feels like an MWAA issue but I understand how that often gets conflated with it being an Airflow issue.

gre
1 day ago
You're right, it doesn't happen when developing locally, only in MWAA. This was the answer given by the Airflow team as well, and I figured they would punt before I even asked.

I realize Amazon is taking an open source project and making a ton of money on it (the instance prices are ridiculous for what you get), and the incentives are misaligned: the Airflow team has little reason to help AWS make it better unless AWS pays them to fix it.

It's crap all around, and Airflow gets a bad rap from AWS's terrible MWAA product built on top of it.

mblast311
1 day ago
MWAA is hot garbage. I had similar issues and switched to running it on EKS instead.

mdaniel
1 day ago
> What issues do you have with Airflow?

Their operational perspective is catastrophic; how does one view the logs for a DAG through the UI[1]? Why can't it store the Python in the database they have attached to their deployment, versus making me jump through 80,000 hoops to put the files in the right magic directory on disk of every worker[2]?

1: no, not <https://airflow.apache.org/docs/apache-airflow/stable/ui.htm...> I mean the log, you know, like in the old days of $(tail -f /var/log/the.thing). I'm open to the answer hiding somewhere in this gobbledygook <https://airflow.apache.org/docs/apache-airflow/stable/admini...> but who is the target audience for having such a fancy UI and omitting log viewing from it, doubly so if there's some alleged HTTP service just for viewing logs

2: https://airflow.apache.org/docs/apache-airflow/stable/core-c... and double-plus-good anytime Python software mentions PYTHONPATH -- that's how you know you're in for a hot good time https://airflow.apache.org/docs/apache-airflow/stable/admini...

6LLvveMx2koXfwn
1 day ago
We deploy on K8s in OpenStack from a scheduled GitHub Actions pipeline which aggregates DAGs into a new container build based on hashes of hashes. This works well with almost no intervention.

WRT your 1 above, any DAG output to stdout/err is available via the Logs tab from the graph view of the individual tasks. Almost all our DAGs leverage the PythonOperator though; not sure if that standardises this for us and whether your experience is muddied by more complexity than we currently have?
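
For anyone hunting for the same thing: anything a task writes to stdout or via the standard logging module ends up in that per-task Logs tab. A minimal TaskFlow-style sketch (DAG and task names are made up):

```python
import logging

import pendulum
from airflow.decorators import dag, task

log = logging.getLogger(__name__)


@dag(schedule=None, start_date=pendulum.datetime(2024, 1, 1), catchup=False)
def logging_demo():
    @task
    def say_hello():
        print("plain stdout lands in this task's log")  # captured by Airflow
        log.info("and so does the logging module")      # airflow.task handler

    say_hello()


logging_demo()
```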

WRT 2, we generate an uber requirements.txt by running pyreqs from the pipeline and install everything in the container automatically. Again, no issues currently, although we do need to manually add the installation of test libraries to the pipeline job, as for some reason auto-discovery is flakier for unit-test frameworks.

jedberg
1 day ago
I'd be curious if this scratches your itch:

https://www.dbos.dev/blog/durable-execution-crashproof-ai-ag...

hbarka
1 day ago
Pleasantly surprised to see the name Mike Stonebraker in the About Us.

jedberg
1 day ago
About to jump into an eng meeting with him right now!

bashfulpup
1 day ago
This space is honestly a mess. I did an in-depth survey around 1.5 yrs ago and my eventual conclusion was just to build with Airflow.

You either get simplicity, with the caveat that your systems need to align perfectly, or you get complexity but something that will work with basically anything (Airflow).

febed
1 day ago
Would be interested to know what drawbacks you found with Dagster or Prefect.

jt_b
1 day ago
Prefect is amazing. I built out an ETL pipeline system with it at my last job and would love to get it incorporated into the current one, but unfortunately we have a lot of legacy stuff in Airflow. Being able to debug stuff locally was amazing, and the integration with K8s was super clean.

nikolayasdf123
1 day ago
+1 to this. Other solutions over-promise and under-deliver, have poor developer relations and communication, and push "open-source, but pay us"-style open source. It is indeed a mess.

itsallrelative
1 day ago
Truthfully, I have been a little skeptical of how many workloads will actually need "agents" vs. doing something totally deterministic with a little LLM augmentation. Seems like I'm not the only one who thinks the latter works a lot of the time!

petesergeant
1 day ago
Yes! I just wrote an article on this: https://sgnt.ai/p/hell-out-of-llms/

drdaeman
1 day ago
I'm sorry, I don't really know Airflow, but what's the point of `@task.agent`, as compared to plain old `return my_agent.run_sync(...)`? To me it feels like a more restrictive[1], and possibly less intuitive[2] API.

[1]: Limited to what decorator arguments can do. I suspect it could become an issue with `@task.branch` if some post-processing were needed to adjust for smaller models' finickiness.

[2]: Because the final step is described at the top of the function.
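
For context, the "plain old" version being compared against would look roughly like this: an ordinary Airflow task that calls a Pydantic AI agent directly, leaving any post-processing up to you. Model name, prompt, and task name are illustrative, not taken from the SDK:

```python
from airflow.decorators import task
from pydantic_ai import Agent

summarizer = Agent(
    "openai:gpt-4o-mini",
    system_prompt="Summarize the text you are given in one sentence.",
)


@task
def summarize(text: str) -> str:
    result = summarizer.run_sync(text)
    # Free to post-process however you like here -- the flexibility being
    # asked about. (result.output is result.data on older Pydantic AI.)
    return str(result.output)
```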

jlaneve
1 day ago
Disclaimer: author of the SDK here.

It is _potentially_ more restrictive than writing pure Python functions, but the plus side is that we can inject certain Airflow-specific features into how the agent runs. And this isn't meant for someone who knows agents inside & out / wants low-level customizability.

The best example of this today is log groups: Airflow lets you log things out as part of a "group" which has some UI abstractions to make it easier. This SDK takes the raw agent tool calls and turns them each into a log group, so you can see a) at a high level what the agent is doing, and b) drill down into a specific tool call to understand what's happening within the tool call.
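
For the curious, recent Airflow versions render `::group::<name>` / `::endgroup::` markers in task logs as collapsible sections, so the mechanism being described could look roughly like this (a sketch of the idea, not the SDK's actual code; `agent_result.tool_calls` is hypothetical):

```python
import json


def log_tool_call_as_group(tool_name: str, args: dict, result: str) -> None:
    # Everything printed between the markers shows up as one collapsible
    # group in the Airflow log view.
    print(f"::group::tool call: {tool_name}")
    print(json.dumps(args, indent=2))
    print(f"result: {result}")
    print("::endgroup::")


# Hypothetical usage after an agent run:
# for call in agent_result.tool_calls:
#     log_tool_call_as_group(call.name, call.args, call.result)
```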

To your point about `@task.llm_branch`, the SDK and Pydantic AI (which the SDK uses under the hood) will re-prompt the LLM for up to a certain number of attempts if it receives output that isn't the name of a downstream task. So there shouldn't be much finickiness.

greatgib
1 day ago
The decorators in the usage example look useless, more for show than a real convenience.

In a real-life program, I don't think you will have so many hundreds of LLM or agent calls in your app that the decorator saves you any code. On the contrary, the decorator will make it very hard to use parametric values, or values that are not hard-coded but come from config and aren't set up upfront at application startup like globals. That is a bad practice...

jlaneve
1 day ago
Disclaimer: author of the SDK here.

Airflow actually uses decorators to indicate something is an explicit task in a data pipeline vs just a utility function, so this follows that pattern!

It also uses an "operator" under the hood (Airflow's term for a pre-built, parameterized task), which can be subclassed if you want to do any customization.
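
For readers who haven't used Airflow's TaskFlow API, the two layers being described look roughly like this; `GreetOperator` is a toy stand-in, not one of the SDK's operators:

```python
from airflow.decorators import task
from airflow.models.baseoperator import BaseOperator


@task
def plain_python_task(name: str) -> str:
    # The decorator marks this as a pipeline task rather than a utility function.
    return f"hello {name}"


class GreetOperator(BaseOperator):
    """A pre-built, parameterized task; subclass it to customize behaviour."""

    def __init__(self, name: str, **kwargs):
        super().__init__(**kwargs)
        self.name = name

    def execute(self, context):
        return f"hello {self.name}"
```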

datadrivenangel
1 day ago
I'm looking into using LLM calls inside SQL triggers to make agents / 'agentic' workflows. LLM-powered workflows can get you powerful results and are basically the equivalent of 'spinning up' an agent.

jumski
1 day ago
I'm not sure if this would scratch your itch, but I'm building a Postgres-native workflow engine that separates orchestration from execution. I want to be able to start my flows from within db triggers.

It just exposes a set of functions that propagate the DAG through its states and queue tasks for a task worker, which performs the actual work and acknowledges completion or failure back to the SQL orchestrator.
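
To make the split concrete, here is a tiny generic illustration of that pattern (orchestrator owns state and the queue, a worker performs tasks and acknowledges back). Plain Python with made-up names, not pgflow's API:

```python
import queue

task_queue: "queue.Queue[tuple[str, str]]" = queue.Queue()
task_state: dict = {}  # step name -> "queued" / "done" / "failed"


def orchestrator_enqueue(step: str, payload: str) -> None:
    task_state[step] = "queued"
    task_queue.put((step, payload))


def orchestrator_ack(step: str, ok: bool) -> None:
    task_state[step] = "done" if ok else "failed"
    # A real engine would now mark dependent steps runnable, retry, etc.


def worker_loop() -> None:
    while not task_queue.empty():
        step, payload = task_queue.get()
        try:
            print(f"performing {step}: {payload}")  # the actual work
            orchestrator_ack(step, ok=True)
        except Exception:
            orchestrator_ack(step, ok=False)


orchestrator_enqueue("scrape", "https://example.com")
orchestrator_enqueue("summarize", "output of scrape")
worker_loop()
print(task_state)
```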

I've been working on it for the last few months and it will be ready in the coming weeks. The first version is dedicated to Supabase, but I plan to make it agnostic.

If you want to learn more, check out the SQL Core readme which explains the whole concept (https://github.com/pgflow-dev/pgflow/tree/main/pkgs/core#rea...) or my Twitter for updates and some demos (https://x.com/pgflow_dev).

fancy_pantser
1 day ago
Been having a great time with postgresml for this exact kind of thing. If you don't need a complex DAG but have a simple pipeline or work queue that can be easily represented in postgres anyway, it's very straightforward to work with and nicely encapsulates all of your processing (traditional data munging and LLM calls) together with a modest extension of a familiar system.

jackthetab
1 day ago
Know of any online examples of the same?

falcor84
1 day ago
This is about workflows that use AI, but it led me to actually think of the inverse - has anyone experimented with AI agents defining and iterating upon long-running workflows?

nikolayasdf123
1 day ago
nice. airflow is a good fit for this

ldjkfkdsjnv
1 day ago
Extremely bearish on existing tools solving agentic workflows well. If anyone does, it will be Temporal. Airflow and the like simply were not designed for highly dynamic execution, and so have all sorts of annoyances that will make them lose.

alittletooraph2
1 day ago
Temporal's great! That being said, there is something to being able to orchestrate LLMs and agents with what many already use to orchestrate their data workflows, because the reliability, scalability, observability, etc. are already proven out. I'm sure there are boundary conditions for really advanced agentic workflows though…

acchow
1 day ago
Temporal is for a static graph with idempotent nodes. Powerful LLM workflows don’t fit this model.

ldjkfkdsjnv
1 day ago
Temporal is absolutely not for a static graph; idempotent nodes, yes. Please explain your argument more.

jedberg
1 day ago
Have you checked out DBOS Transact[0]? DBOS is designed for highly dynamic execution, and doesn't have the overhead or complexity of Temporal [1].

Disclosure, I'm the CEO of DBOS.

[0] https://github.com/dbos-inc/dbos-transact-py

[1] https://www.dbos.dev/blog/durable-execution-coding-compariso...

ldjkfkdsjnv
1 day ago
I have seen it! And appreciate your response, just haven't had the time to dive in.

If you want a product hint from me: I think adding native integrations into the platform that would let vibe coders build asynchronous agents more easily would really boost revenue. Email, text, etc.

Probably not your vision, just a suggestion

jedberg
1 day ago
> that would allow vibe coders

We've experimented with that actually! Six months ago it was terrible, but the new models are getting pretty good.

And it's definitely easier for an AI to generate DBOS code that makes a fully formed distributed system than to build a fully formed distributed system somewhere else.

ldjkfkdsjnv
8 hours ago
Making an asynchronous AI agent is still hard; there is a disconnect between the agentic LLM code (LangGraph, OpenAI Agents, etc.) and asynchronous distributed systems / message passing. True AI agents will have a cohesive joining of the two.

nikolayasdf123
1 day ago
Maybe it is not "highly dynamic execution" in the first place. A daily/hourly schedule for batch processing is not too bad. And of course, rarely run jobs (e.g. GitHub review, Slack, etc., as the author says in the post) are definitely OK.