FilterHN

Ask HN: How do you verify cron jobs did what they were supposed to?

5 points

by BlackPearl02

12 hours ago

| past

| 2 comments

| HN

I've been running into this issue where my cron jobs "succeed" but don't actually do their job correctly.

For example:

Backup cron runs, exit code 0, but creates empty files

Data sync completes successfully but only processes a fraction of records

Report generator finishes but outputs incomplete data

The logs say everything's fine, but the results are wrong. Actually, the errors are probably in the logs somewhere, but who checks logs proactively? I'm not going through log files every day to see if something silently failed.

I've tried:

Adding validation in scripts - works, but you still need to check the logs

Webhook alerts - but you have to write connectors for every script

Error monitoring tools - but they only catch exceptions, not wrong results

I ended up building a simple monitoring tool that watches job results instead of just execution - you send it the actual results (file size, count, etc.) and it alerts if something's off. No need to dig through logs.

But I'm curious: how do you all handle this? Are you actually checking logs regularly, or do you have something that proactively alerts you when results don't match expectations?

▲

krunck

1 hour ago

[-]

My way of doing things:

  1. Scripts should always return an error (>0) when things did not go as planned and 0 when they did. Always.
  2. Scripts should always notify you when they return >0. Either in their own way or via emails sent by Cron.
  3. Use chronic ( from Debian moreutils package) to ensure that cron jobs only email output when they ended in error. That way you don't need to worry about things sent to STDOUT spamming you.
  4. Create wrapper scripts for jobs that need extra functionality: notification, logging, or sanity checks.

▲

PenguinCoder

14 minutes ago

[-]

+1 for chronic. Very useful for knowing when a cron fails without needing to manually review every log run.

▲

maliciouspickle

32 minutes ago

[-]

this is not a direct answer to the original question, but problems like this are what let to the creation of orchestrator tools like airflow, luigi, dagster, prefect, etc.. these tools provide features which help increase task/job observability, ease of debugging, and overall reliability of scheduled jobs/tasks.

it is a natural progression to move on from cron and adopt an orchestrator tool (many options nowadays) when you need more insight into cron, or when you start finding yourself building custom features around it.

i would do some research into orchestators and see if there are any that meet your requirements. many have feature sets and integrations that’s solve some of the exact problems you’re describing

(as a data engineer my current favorite general purpose orchestrator is dagster. it’s lightweight yet flexible)

edit: as a basic example, in most orchestrators, there is a first class way to define data quality checks, if you have less data than expected, or erroneous data (based upon your expectations) you can define this as an automated check

you can then choose to fail the job, or set a number re-retries before failing , or send a notification to some destination of your choice( they have integrations with slack, and many other alerting tools)

i like dagster because it is geared for hooking into the data itself. you can use it to ‘run a job’ like a some function, but really it shines when you use its ‘data asset features’ that tracks the data itself over time, and provides a nice UI to view and compare data from each run over time. hook in alerting for anomalies and you’re good to go!

they have many more features depending on the tool , and some more or less complicated to set up.