I'm certainly willing to believe that yaml is not the ideal answer but unless we're comparing it to a concrete alternative, I feel like this is just a "grass is always greener" type take.
You write a compiler that enforces stronger invariants above and beyond everything is an array/string/list/number/pointer.
Good general-purpose programming languages provide type systems that do just this. It is criminal that the industry simply ignores this and chooses blobs of YAML/JSON/XML instead, with disastrous results: ad-hoc programming languages without a type system, built in their poison of choice.
YAML is used for the declarative part of structuring the job graph. The host (in this case, GitHub) would need to call into your code to build the job graph. Which means it would need to compile your code, which means it needs its own build step. This means it would need to run on a build machine that uses minutes because GitHub is not going to just run arbitrary code for free.
There's no guarantee that your arbitrary language is thread-safe or idempotent, so it can't really run in parallel the way a declarative file can.
So now you're in a situation where you add another spin up and tear down step even if your actual graph gen call is zero cost.
There's a reason it works the way it does.
https://www.reddit.com/r/funny/comments/eccj2/how_to_draw_an...
I am not sure you can do this while keeping granular job reporting (i.e. either you need one YAML block per job, or you have all your jobs in one single 'status' item?). Is it actually doable?
I don't think anybody serious has any argument in favor of CloudFormation templates.
Note: mostly using Deno these days for this, though I will use .net/grate for db projects.
Some do just that: dagger.io. It is not all roses but debugging is certainly easier.
When something is written in a real programming language (that doesn't just compile down to YAML or some other data format), this becomes much more challenging. What should you do in that case? Attempt to parse the configuration into an AST and operate over the AST? But in many programming languages, the AST can become arbitrarily complex. Behavior can be implemented in such a way as to make it difficult to discover or introspect.
Of course, YAML can also become difficult to parse. If the system consuming the YAML supports in-band signalling (i.e. proprietary non-YAML directives), then you would need to first normalize the YAML, using that system to interpret and expand those signals. But in principle, that's still more tractable than trying to parse an AST.
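As a sketch of that normalization step (the directive syntax here is invented, not any real system's):

```python
# Toy normalization pass: expand proprietary ${NAME}-style directives before
# handing the document to a plain YAML parser. The directive syntax is
# invented for illustration.
import re

DIRECTIVE = re.compile(r"\$\{(\w+)\}")

def normalize(source: str, env: dict) -> str:
    """Replace in-band ${NAME} directives with known values; unknown
    directives are left untouched. The result is plain YAML that any
    off-the-shelf parser can handle."""
    return DIRECTIVE.sub(lambda m: env.get(m.group(1), m.group(0)), source)

doc = "image: app:${TAG}\nreplicas: ${COUNT}\n"
print(normalize(doc, {"TAG": "1.2.3", "COUNT": "3"}))
```

Only after a pass like this can a static analysis tool reason about the document as ordinary YAML.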
cough CloudFormation cough
There are multiple ways to safely run untrusted code.
I for one enjoy how build.rs in Rust does it: you have Rust code that controls the entire build just by printing directives to stdout.
There are other ways of course
In any case, no matter what clever method you try to use, even if you're successful, it's not as straightforward, easily understood, and extensible as an OPA policy. Let's say you succeed in governing Rust code. OK, but now I have developers who are writing Python and Java and TypeScript. What now? Develop a new, customized solution for each one? No thanks.
You already don't have to use YAML. Use whatever language you want to define the configuration, and then dump it as YAML. By using your own language and outputting YAML, you get to implement any solution you want, and GitHub gets to spend more cycles building features.
Simple example:
1. Create a couple inherited Python classes
2. Write class functions to enable/disable GHA features and validate them
3. Have the functions store data in the class object
4. Use a library to output the class as YAML
5. Now craft your GHA config by simply calling a Python object
6. Run code, save output file, apply to your repo
I don't know why nobody has made this yet, but it wouldn't be hard. Read the GHA docs, write Python classes to match, output as YAML. If you want more than GHA feature support [via configuration], use the GHA API (https://docs.github.com/en/rest/actions) or the scripted workflows feature (https://github.com/actions/github-script).
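A rough stdlib-only sketch of what such a wrapper might look like (all class and field names are invented; a real version would likely lean on PyYAML's `safe_dump` rather than hand-rolled rendering):

```python
# Hypothetical sketch of the "Python classes -> GHA YAML" idea, stdlib only.
# Class names, fields, and the rendering logic are all invented for
# illustration; a real tool would use a proper YAML emitter.
from dataclasses import dataclass, field

@dataclass
class Step:
    name: str
    run: str

@dataclass
class Job:
    runs_on: str = "ubuntu-latest"
    steps: list = field(default_factory=list)

@dataclass
class Workflow:
    name: str
    on: list = field(default_factory=lambda: ["push"])
    jobs: dict = field(default_factory=dict)

    def to_yaml(self) -> str:
        # Hand-rolled rendering: good enough to show the shape of the idea.
        lines = [f"name: {self.name}", "on: [" + ", ".join(self.on) + "]", "jobs:"]
        for job_id, job in self.jobs.items():
            lines.append(f"  {job_id}:")
            lines.append(f"    runs-on: {job.runs_on}")
            lines.append("    steps:")
            for step in job.steps:
                lines.append(f"      - name: {step.name}")
                lines.append(f"        run: {step.run}")
        return "\n".join(lines) + "\n"

wf = Workflow(name="ci", jobs={"test": Job(steps=[Step("Run tests", "pytest")])})
print(wf.to_yaml())
```

Validation logic (step 2 above) would go in the class methods; the YAML stays a build artifact you never edit by hand.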
There are existing solutions around, but they miss out on a bunch of things that are blatantly absent in the space:
- workflow visualisations (this is already working - you can see an example of workflow relationship and breakdowns on a non-trivial example at https://github.com/http4k/http4k/tree/master/.github/typeflo...);
- running workflows through an event simulator so you can tell cause and effect when it comes to what triggers what. Testing workflows anyone? :)
- security testing on workflows - to avoid the many footguns that there are in GHA around secrets etc;
- compliance tests around permitted Action versions;
- publishing of reusable repository files as binary dependencies that can be upgraded and compiled into your projects - including not just GHA actions and workflows but also things like version files, composable Copilot/Claude/Cursor instruction files;
- GitLab, CircleCI, Bitbucket, Azure DevOps support using the same approach and in multiple languages;
Early days yet, but I am planning to make it free for OSS and paid for commercial users. I'm also dogfooding it on one of my other open source projects to make sure it can handle non-trivial cases. Lots to do - and hopefully it will be valuable enough for commercial companies to pay for!
Wish me luck!
Here's some fun examples to see why HCL sucks:
- Create an if/elseif/else statement
- Do anything remotely complex with a for loop (tip: you're probably going to have to use `flatten` a lot)
https://ant-contrib.sourceforge.net/tasks/tasks/if.html
<if>
  <equals arg1="${foo}" arg2="bar" />
  <then>
    <echo message="The value of property foo is 'bar'" />
  </then>
  <elseif>
    <equals arg1="${foo}" arg2="foo" />
    <then>
      <echo message="The value of property foo is 'foo'" />
    </then>
  </elseif>
  <else>
    <echo message="The value of property foo is not 'foo' or 'bar'" />
  </else>
</if>
https://ant-contrib.sourceforge.net/tasks/tasks/for.html
<for param="file">
  <path>
    <fileset dir="${test.dir}/mains" includes="*.cpp"/>
  </path>
  <sequential>
    <propertyregex override="yes"
        property="program" input="@{file}"
        regexp=".*/([^.]*)\.cpp" replace="\1"/>
    <mkdir dir="${obj.dir}/${program}"/>
    <mkdir dir="${build.bin.dir}"/>
    <cc link="executable" objdir="${obj.dir}/${program}"
        outfile="${build.bin.dir}/${program}">
      <compiler refid="compiler.options"/>
      <fileset file="@{file}"/>
      <linker refid="linker-libs"/>
    </cc>
  </sequential>
</for>
Yes, programming with them was as fun as you're imagining.
(if (equals foo "bar")
    (then (echo "The value of property foo is 'bar'"))
    (elseif (equals foo "foo")
      (then (echo "The value of property foo is 'foo'")))
    (else (echo "The value of property foo is not 'foo' or 'bar'")))
That's not a Lisp-like language I particularly like, but it's not flat-out insane like Ant appears to be. Advocates for the inappropriate use of XML (basically, anywhere it was used as anything other than a markup language) have a lot to answer for.
It takes your programming-language version and turns it into GitHub Actions YAML, so you don't need to do any of that sort of thing.
- workflow visualisations (this is already working - you can see an example of workflow relationship and breakdowns on a non-trivial example at https://github.com/http4k/http4k/tree/master/.github/typeflo...);
- running workflows through an event simulator so you can tell cause and effect when it comes to what triggers what;
- security testing on workflows - to avoid the many footguns that there are in GHA around secrets etc;
- compliance tests around permitted Action versions;
- publishing of reusable repository files as binary dependencies that can be upgraded and compiled into your projects - including not just GHA actions and workflows but also things like version files, composable Copilot/Claude/Cursor instruction files;
- GitLab, CircleCI, Bitbucket, Azure DevOps support using the same approach and in multiple languages;
Lots to do - and hopefully it will be valuable enough for commercial companies to pay for!
:)
I like the things I depend on to actually have a funding model, so that's actually more appealing to me than something fully free.
Could I possibly ask you to reply with the model of your phone, so I can make sure it works OK after I've fixed it?
Your blog looks just perfect to me, and loads fine even on my extremely slow Redmi A5. Even the code examples look fine.
So sorry! I must've accidentally written my comment on the wrong page.
The post I thought I was responding to showed a website with an animated "breathing" background which was completely locking up my phone.
Edit: Just to drive this point home, if you're the author https://blog.yossarian.net you've made a perfect looking blog and I wouldn't change a thing.
I really enjoyed working with the Earthfile format[1] used for Earthly CI, which unfortunately seems like a dead end now. It's a mix of Dockerfile and Makefile, which made it very familiar to read and write. Best of all, it allowed running the pipeline locally exactly as it would run remotely, which made development and troubleshooting so much easier. The fact that GH Actions doesn't have something equivalent is awful UX[2].
Honestly, I wish the industry hadn't settled on GitHub and GH Actions. We need better tooling and better stewards of open source than a giant corporation who has historically been hostile to open source.
[1]: https://earthly.dev/earthfile
[2]: Yes, I'm aware of `act`, but I've had nothing but issues with it.
That is the key function any serious CI platform needs to tackle to get me interested. FORCE me to write something that can run locally. I'll accept using containers, or maybe even VMs, but make sure that whatever I build for your server ALSO runs on my machine.
I absolutely detest working on GitHub Actions because all too often it ends up requiring that I create a new repo where I can commit to master (because for some reason everybody loves writing actions that only work on master). Which means I have to move all the fucking secrets too.
Solve that for me PLEASE. Don't give me more YAML features.
Working with ADO pipelines is painful.
- Make change locally
- Push change
- Run pipeline
- Wait forever because ADO is slow
- Debug the error caused by some syntax issue in their bastardized version of yaml
- Repeat
I've seen few thousands-line YAML files with anchors riddled all over the place. It was impossible to deal with. Rewriting it in Jsonnet paid off immediately.
Another example is Nixpkgs. It's quite pleasant to deal with despite the size of its codebase.
Jokes aside, I like proper yaml anchors. Other CI's do support these and it made writing yaml actions much easier, esp. complicated cross-building recipes with containers and qemu.
I say this as someone that built entire Jenkins Groovy frameworks for automating large Jenkins setups (think hundreds of nodes, thousands of Jenkins jobs, stuff like that).
Although, I think it is generally an accepted practice to use declarative configuration over imperative configuration? In part, maybe what the article is getting at, maybe?
We write Bash or Python, and our tool will produce the YAML pipeline reflecting it.
So we dont need to maintain YAML with over-complicated format.
The resulting YAML is not meant to be read by an actual human, since it's absolute garbage, but the code we want to run runs when we want, without our having to maintain the YAML.
And we can easily test it locally.
Honestly, just having a linter should be enough. Ideally, anything complicated in your build should just be put into a script anyways - it minimizes the amount of lines in that massive YAML file and the potential for merge conflicts when making small changes.
I use CUE to read yamhell too
GitHub Actions has a lot of rules, logic, and multiple sublanguages in lots of places (e.g. conditions, shell scripts, etc.). YAML is completely superficial; XML would be an improvement due to less whitespace sensitivity alone.
Plus it has exactly enough convenience-feature-related sharp edges to be risky to hand to a newbie, while wearing the dress of something that should be too bog-simple to have that problem. I, too, enjoy languages that arbitrarily decide the Norwegian TLD is actually a Boolean "false."
It's the workflow for developing pipelines that's the problem. If I had something I could run locally, even in a debug dry-run-only form, that would go a long way toward debugging job dependencies, testing that failure cases flow through conditional logic in the expected manner, etc.
Language implementations for yaml vary _wildly_.
What does the following parse as:
some_map:
  key: value
  no: cap
If I google "yaml online" and paste it in, one gives me:
{'some_map': {False: 'cap', 'key': 'value'}}
The other gives me:
{'some_map': {'false': 'cap', 'key': 'value'}}
... and neither gives what a human probably intended, huh?
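The divergence comes down to which scalar-resolution rules the parser implements. Here's a sketch of the two boolean rule sets, transcribed from the YAML 1.1 type repository and the YAML 1.2 core schema (simplified to booleans only):

```python
# Illustrative only: boolean scalar resolution under YAML 1.1 vs. the
# YAML 1.2 core schema. The patterns are transcribed from the two specs;
# real parsers also resolve ints, floats, null, etc.
import re

YAML_11_BOOL = re.compile(
    r"^(?:y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF)$"
)
YAML_12_BOOL = re.compile(r"^(?:true|True|TRUE|false|False|FALSE)$")

def resolve_scalar_11(s: str):
    """Resolve a plain scalar the way a YAML 1.1 parser would (bools only)."""
    if YAML_11_BOOL.match(s):
        return s.lower() in ("y", "yes", "true", "on")
    return s

def resolve_scalar_12(s: str):
    """Resolve a plain scalar under the YAML 1.2 core schema (bools only)."""
    if YAML_12_BOOL.match(s):
        return s.lower() == "true"
    return s

print(resolve_scalar_11("no"))  # False -- the "Norway problem"
print(resolve_scalar_12("no"))  # 'no' stays a string
```

A 1.1-flavored parser turns the `no:` key into `False`; a strict 1.2 parser leaves it as the string `'no'`. The online tools above are each doing one (or a mix) of these.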
Most notably it only offers three base types (scalar string, array, object) and moves the work of parsing values to stronger types (such as int8 or boolean) to your codebase where you tend to wrap values parsed from YAML into other types anyway.
Fewer surprises and headaches, but very niche, unfortunately.
i.e:
.scoped-env: &scoped-env
  key1: value1
  ...
Dot targets are ignored semantically; only inheritors make them useful. Then further down you can reuse *scoped-env wherever you need.
You can also have anchors on individual lines and compose them. It's useful.
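For instance, something like this (an illustrative fragment; the job name and keys are invented):

```yaml
some-job:
  variables:
    <<: *scoped-env      # merge the anchored map into this job's variables
    EXTRA_KEY: extra-value
```

The merge key pulls in the anchored map while still letting the job add or override individual entries.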
The author suggests using ad-hoc syntax, or meta-keys. Gitlab supports that [1] and I use them as well.
Different but also combinable uses, for various people. Nothing wrong with it.
Now if only they supported paths filters for the `workflow_call` [2] event in addition to push/pull_request, my life would be a lot easier. Nontrivial repos have an unfortunate habit of building some sort of broken version of change detection themselves.
The limit of 20 unique workflow calls is quite low too but searching the docs for a source maybe they have removed it? It used to say
> You can call a maximum of 20 unique reusable workflows from a single workflow file.
but now it's max of 4 nested workflows without loops, which gives a lot of power for the more complex repos [3]. Ooh. Need to go test this.
[1] https://docs.github.com/en/actions/reference/workflows-and-a...
[2] https://docs.github.com/en/actions/reference/workflows-and-a...
[3] https://docs.github.com/en/actions/how-tos/reuse-automations...
Generate it from Dhall, or cue, or python, or some real language that supports actual abstractions.
If your problem is you want to DRY out yaml, and you use more yaml features to do it, you now have more problems, not fewer.
I find it an absolute shame that languages like Dhall did not become more popular earlier. Now everything in devops is YAML, and I think many developers pick YAML configs not for good reasons but simply default to it because of its ubiquity.
yaml 1.2 was released in 2009, and it fixed this problem. this is an implementation issue.
Every single implementation people actually use seems to be a messy mix of yaml 1.1 and 1.2....
Maybe if the yaml project wants to consider this fixed, they should have written some correct reference parsers themselves for any languages in need, and encouraged their use.
So the Norway problem persists.
is there a parser that says that it's Yaml 1.2 compliant that uses that regex? I don't know of one.
The config generators are very simple, and should be written in whatever language your developers already know, which likely means Python or JavaScript or Go.
Asking the team to add a new build dependency, learn a new language, and add a new build step would create considerably more problems, not fewer. Used sparingly and as needed, YAML anchors are quite easy to read. A good editor will even allow you to jump to the source definition just as it would any other variable.
Being self-contained without any additional dependencies is a huge advantage, particularly for open source projects, IMHO. I'd wager very few people are going to learn Dhall in order to fix an issue with an open source project's CI.
YAML isn't that hard. Most GitHub Actions configs I see are well under 500 lines; they're not crumbling under the weight of complexity.
I'm saying GHA should use a proper programming language instead of assembly.
If generating your GitHub Actions config from a programming language works for you, fantastic. I'm just happy we now have another (IMHO, attractive) option.
I mostly agree with the article: with GitHub Actions specifically, I try to refactor things to the top-level "workflow" level first, and then, yeah, resort to copy and paste in most other cases.
I'm a little less adamant that GitHub should remove anchor support again than the original poster, but I do sympathize greatly, having had to debug some CircleCI YAML and Helm charts making heavy use of YAML anchors. CircleCI's YAML is so bad I have explored options to build it with a build process. Yeah, it does create new problems and none of those explorations got far enough to really improve the process, but one of the pushes to explore them was certainly that YAML anchors are a mess to debug, especially when you've got some other tool concatenating YAML files together and can result in anchor conflicts (and also other parts of the same YAML that depend on a particular form of how anchor conflicts overwrite each other, oof). I don't see GitHub Actions necessarily getting that bad just by enabling anchors, but I have seen enough of where anchors become a crutch and a problem.
If you're dealing with 10s of files that are 1000s of lines long, then YAML anchors may very well not be the ideal option. Having the choice lets each team find what works best for them.
:) :) :)
.github/workflows in my current project: 33 files, 3913 lines total, 1588 lines unique.
(and this was _after_ we moved all we can into custom actions and sub-workflows)
Templating GitHub Actions is very powerful (I've worked with such a setup) but it has its own headaches and if you don't _need_ custom tooling better to not have it.
I can wish for improvements on the native setup without reaching out for the sledgehammer.
Above a certain level of complexity, sure. But having nothing in between is an annoying state of affairs. I use anchors in Gitlab pipelines and I hardly curse their names.
OPs main argument seems to be "I don't have a use for it and find it hard to read so it should be removed".
I don't think this is a fair characterization: it's not that I don't have a use for it, but that I think the uses are redundant with existing functionality while also making static and human analysis of workflows harder.
YAML anchors are a standard that I can learn and use in a lot of places.
The idiosyncrasies of GitHub actions aren’t really useful for me to learn.
Just my $0.02
That in turn means that there's no way to construct a source span back to the anchor itself, because the parsed representation doesn't know where the anchor came from (only that it was flattened).
I think it makes way more sense for GitHub to support YAML anchors given they are after all part of the YAML spec. Otherwise, don't call it YAML! (This was a criticism of mine for many years, I'm very glad they finally saw the light and rectified this bug)
Yes, it's just difficult. The point made in the post isn't that it's impossible, but that it significantly changes the amount of "ground work" that static analysis tools have to do to produce useful results for GitHub Actions.
> I think it makes way more sense for GitHub to support YAML anchors given they are after all part of the YAML spec. Otherwise, don't call it YAML! (This was a criticism of mine for many years, I'm very glad they finally saw the light and rectified this bug)
It's worth noting that GitHub doesn't support other parts of the YAML spec: they intentionally use their own bespoke YAML parser, and they don't have the "Norway" problem because they intentionally don't apply the boolean value rules from YAML.
All in all, I think conformance with YAML is a red herring here: GitHub Actions is already its own thing, and that thing should be easy to analyze. Adding anchors makes it harder to analyze.
maybe, but not entirely sure. 'Two wrongs don't make a right' kind of thinking on my side here.
But if they call it GFY and do what they want, then that would probably be better for everyone involved.
> they don't have the "Norway" problem because they intentionally don't apply the boolean value rules from YAML.
I think this is YAML 1.2. I have not done or seen a breakdown to see if GitHub is aiming for YAML 1.2 or not but they appear to think that way, given the discussion around merge keys
--
(though it's still not clear why flattening the YAML would not be sufficient for a static analysis tool. If the error report references a key that was actually merged out, I think users would still understand the report; it's not clear to me that's a bad thing actually)
Yes, agreed.
> I think this is YAML 1.2. I have not done or seen a breakdown to see if GitHub is aiming for YAML 1.2 or not but they appear to think that way, given the discussion around merge keys
I think GitHub has been pretty ambiguous about this: it's not clear to me at all that they intend to support either version of the spec explicitly. Part of the larger problem here is that programming language ecosystems as a whole don't consistently support either 1.1 or 1.2, so GitHub is (I expect) attempting to strike a happy balance between their own engineering goals and what common language implementations of YAML actually parse (and how they parse it). None of this makes for a great conformance story :-)
> (though it's still not clear why flattening the YAML would not be sufficient for a static analysis tool. If the error report references a key that was actually merged out, I think users would still understand the report; it's not clear to me that's a bad thing actually)
The error report includes source spans, so the tool needs to map back to the original location of the anchor rather than its unrolled document position.
(This is table stakes for integration with formats like SARIF, which expect static analysis results to have physical source locations. It's not good enough to just say "there's a bug in this element and you need to find out where that's introduced," unfortunately.)
Or in other words: if your problem is DRYness, GitHub should be fixing or enhancing the ~dozen other ways in which the components of a workflow shadow and scope with each other. Adding a new cross-cutting form of interaction between components makes the overall experience of using GitHub Actions less consistent (and less secure, per the points about static analysis challenges) for the benefit of a small amount of deduplication.
So true, not the first time, not the last time.
I'm at the point of exploring Gerrit as an alternative
(As the post notes, neither I nor GitHub appears to see full compliance with YAML 1.1 to be an important goal: they still don't support merge keys, and I'm sure they don't support all kinds of minutiae like non-primitive keys that make YAML uniquely annoying to analyze. Conforming to a complex specification is not inherently a good thing; sometimes good engineering taste dictates that only a subset should be implemented.)
That's a long way to say "yes, actually"
"Because I don't like it" makes it sound like I don't have a technical argument here, which I do. Do you think it's polite or charitable to reduce peoples' technical arguments into "yuck or yum" statements like this?
When you say something comes down to engineering "taste" then you've already reduced your own argument.
If it makes things easier for you, you can substitute between these two.
Kind of a hard disagree here; if you don't want to conform to a specification, don't claim that you're accepting documents from that specification. Call it github-flavored YAML (GFY) or something and accept a different file extension.
https://github.com/actions/runner/issues/1182
> YAML 1.1 to be an important goal: they still don't support merge keys
right, they don't do merge keys because it's not in YAML 1.2 anymore. Anchors are, however. They haven't said that noncompliance with YAML 1.2 spec is intentional
Sure, I wouldn't be upset if they did this.
To be clear: there aren't many fully conforming YAML 1.1 and 1.2 parsers out there: virtually all YAML parsers accept some subset of one or the other (sometimes a subset of both), and virtually all of them emit the JSON object model instead of the internal YAML one.
(This post is written from my perspective as a static analysis tool author. It's my opinion from that perspective that the benefits of anchors are not worth their costs in the specific context of GitHub Actions, for the reasons mentioned in the post.)
This also means that, if you use an off-the-shelf implementation to parse these files, you're "doing it wrong", as you are introducing a parser differential: I can put code in one of these files that one tool uses and another tool ignores. (Hopefully, the file just gets entirely rejected if I use the feature, but I do not remember what the experience I had was when I tried using the feature myself; but, even that is a security issue.)
> Except: GitHub Actions doesn’t support merge keys! They appear to be using their own internal YAML parser that already had some degree of support for anchors and references, but not for merge keys.
Well, hopefully they also prioritize fixing that? Doing what GitHub did, is apparently still doing, and what you are wanting them to keep doing (just only in your specific way) is not actually using "YAML": it is making a new bespoke syntax that looks a bit like YAML and then insisting on calling it "YAML" even though it isn't actually YAML and you can neither read the YAML documentation nor use off-the-shelf YAML libraries.
Regardless, it sounds like your tool already supports YAML anchors, as your off-the-shelf implementation of YAML (correctly) supports YAML anchors. You are upset that this implementation doesn't provide you source map attribution: that was also a problem with C preprocessors for a long time, but that can and should be fixed inside of the parser, not by deciding the language feature shouldn't exist because of library limitations.
but there isn't a single YAML spec, there are at least 2 in common use: yaml 1.1, and 1.2, which have discrete specs and feature-sets. re: anchor stuff specifically, 1.1 supports merge keys whereas 1.2 explicitly does not, so that's one thing
and github actions does not actually specify which yaml spec/version it uses when parsing workflow yaml files
it's unfortunately just not the case that "YAML means something" that is well-defined in the sense that you mean here
Sure, agreed. Another comment notes that GitHub probably should call this their own proprietary subset of YAML, and I wouldn't object to that.
> Well, hopefully they also prioritize fixing that?
I expect they won't, since it's not clear what version of YAML they even aim to be compatible with.
However, I don't understand why engineers who wouldn't jump off of a bridge because someone told them to would follow a spec to the dot just because it exists. Specifications are sometimes complicated and bad, and implementing a subset is sometimes the right thing to do!
GitHub Actions, for example, doesn't make use of the fact that YAML is actually a multi-document format, and most YAML libraries don't gracefully handle multiple documents in a single YAML stream. Should GitHub Actions support this? It's entirely unclear to me that there would be any value in them doing so; subsets are frequently the right engineering choice.
Half the argument against supporting YAML anchors appears to boil down to some level of tool breakage. While you can rely on simplifying assumptions, you take the risk that your software breaks when an assumption is invalidated. I don't think that's a reason to stop evolving software.
I've never seen a project use any of the tools the author listed, but I have seen duplicated config. That's not to say the tools have no value, but rather that I don't want to be artificially restricted to better support tools I don't use. I'll grant that the inability to merge keys isn't ideal, but I'll take what I can get.
aliases:
  common-env: &common-env
    key1: value1
    key2: value2

tasks:
  - key: some-task
    run: ...
    env:
      <<: *common-env
https://www.rwx.com/docs/mint/aliases
- The complaint is Github using a non-standard, custom fork of yaml
- This makes it harder to develop linters/security tools (as those have to explicitly deal with all features available)
- The author of this blogpost is also the author of zizmor, the most well-known Github Actions security linter (also the only one I'm aware of)
Using anchors would have improved the security of this, as well as the maintenance. The examples cited don't remotely demonstrate the cases where anchors would have been useful in GA.
I agree that YAML is a poor choice of format regardless but still, anchor support would have benefitted a number of projects ages ago.
(Also, as a personal bias, merge keys are really bad because they are ambiguous, and I haven't implemented them in my C++ yaml library (yaml-cpp) because of that.)
[1]: https://ktomk.github.io/writing/yaml-anchor-alias-and-merge-...
E.g.
(#1=(a b) c d e #1#)
encodes ((a b) c d e (a b))
where the two (a b) occurrences are one object. It can express circular structures:
#1=(a b c . #1#)
encodes an infinite circular list (a b c a b c a b c ...)
The object to be duplicated is prefixed with #<decimal-integer>=. This associates the object with the integer. The integer is later referenced as #<decimal-integer># to replicate it.
The thing is, you don't see a lot of this in human-written files, whether they are source code or data.
This is not the primary way that Lisp systems use for specifying replicated data in configurations, let alone code.
Substructure sharing occurs whether you use the notation or not due to interned symbols. (Plus compilers can deduplicate strings and such.) In (a a a) there is only one object a, a symbol.
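A loose analogue outside Lisp, if it helps: Python's string interning also gives you shared objects without any special notation (this is just an analogy, not Lisp symbol semantics):

```python
# Rough Python analogue of interned symbols: equal interned strings are the
# same object, so "substructure sharing" happens implicitly, with no special
# reader notation.
import sys

a1 = sys.intern("some-symbol")
a2 = sys.intern("some-symbol")
print(a1 is a2)  # True: one shared object, like the single symbol in (a a a)

lst = [a1, a2, a1]
print(all(x is lst[0] for x in lst))  # True: three references, one object
```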
If you feed the implementation circular source code though, ANSI CL says the behavior is undefined. Some interpreters can handle it under the right circumstances. In particular ones that don't try to do a full macro-expanding code walk before running the code. Compilers, not so much.
Turns out it does report values at their targets (which is desirable) but doesn't know or indicate that they're anchors (undesirable).
Also tested something with yq - if you tell it to edit a node that is actually from a yaml anchor, it updates the original anchor without warning you that that's what you're doing. Yikes.
(For anyone who wants to test it: https://pypi.org/project/yamlgrep/)
One day we might even see for-loop in CSS...
I hypothesize that a TC-complete language for something like CSS that included deep tracking under the hood for where values are coming from and where they are going would be very useful, i.e., you would have the ability to point at a particular part of the final output and the language runtime could give you a complete accounting of where it came from and what went into making the decisions. That could end up giving us the auditability that we really want from the "declarative" languages while giving us the full power of the programming languages we clearly want. However, I don't have the time to try to manifest such a thing myself, and I don't know of any existing language that does what I'm thinking of. Some of the more powerful languages could theoretically do it as a library. It's not entirely unlike the auditing monad I mention towards the end of https://www.jerf.org/iri/post/2958/ . It's not something I'd expect a general-purpose language to do by default, since it would have bad general-purpose performance, but I think for specialized cases of a TC-complete configuration language it could have value, and one could always run it as a debugging option and have an optimized code path that didn't track the sources of everything.
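A toy sketch of the provenance-tracking idea as a library (every name here is invented; a real system would hook this into the language runtime rather than a wrapper class):

```python
# Toy value-provenance tracking: every Traced value remembers which source
# locations contributed to it, so the final output can be audited back to
# its inputs. All names are invented for illustration.
class Traced:
    def __init__(self, value, sources):
        self.value = value
        self.sources = frozenset(sources)

    def __add__(self, other):
        # The result's audit trail is the union of both inputs' trails.
        return Traced(self.value + other.value, self.sources | other.sources)

    def __repr__(self):
        return f"Traced({self.value!r}, from={sorted(self.sources)})"

base = Traced(10, {"config.yaml:3"})
override = Traced(5, {"override.yaml:7"})
total = base + override
print(total)  # value 15, provenance lists both source locations
```

Doing this pervasively is what costs general-purpose performance; as an opt-in debugging mode for a configuration language, the overhead seems much easier to justify.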
TBH it's getting a bit exhausting watching us go through this hamster wheel again and again and again.
In GitLab, where YAML anchors have been supported for years, I personally find them very useful; they're the only way of "code" reuse, really. GitLab has a special editor just for .gitlab-ci.yml, which shows both the original view and the combined read-only view (with all anchors expanded).
I agree that it's hard to point to the specific line of the source code, but in case of an error it's enough to output the action name, the action property name, and the actual property value that caused the error. Based on these three things, a developer can easily find the correct line.
Not really. You can also use the include/extends pattern. If that is not enough, there is the dynamic pipeline generation feature.
An example of how I normally use them and why I still find them useful:
    # Imports whole pipeline architecture jobs
    include:
      - project: company/ci-templates
        ref: "master"
        file: "languages/stack.yaml"

    # Define a command I need to use exactly the same
    # across different jobs I'm going to override
    .vpn_connect: &vpn_connect
      - cmd1
      - cmd2
      - cmd3 &

    job1:
      extends: imported1
      script:
        - ...
        - *vpn_connect # 2nd command
        - ...

    job2:
      extends: imported2
      script:
        - ...
        - *vpn_connect # 3rd command
        - ...
Interesting, although to me it looks more like a way to split one file into several (which is rather useful).
> extends
What's the difference from anchors? It looks the same, except it works with include (and doesn't work with any other YAML tool).
> dynamic pipeline generation
Which is even harder to reason about compared to anchors, although certainly powerful.
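For what it's worth, the practical difference is where the reuse is resolved: `extends` is handled by GitLab after all `include:`s are merged, while anchors are resolved by the YAML parser itself and so only work within a single file. A rough sketch (job names invented):

```yaml
# extends: resolved by GitLab, works even if .tests came from include:
.tests:
  stage: test
  script: [run-tests]

job-a:
  extends: .tests

# anchors: resolved by the YAML parser, so only within this one file
.tests-anchor: &tests
  stage: test
  script: [run-tests]

job-b:
  <<: *tests
```

That's also why, as noted, `extends` plays well with `include` while anchors don't, and why anchors work in any generic YAML tool while `extends` means nothing outside GitLab.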
This is terrible advice from a security standpoint: given that env variables are often used for secrets, you really _don't_ want to set them at the top level. Secrets should be scoped as narrowly as possible!
For example, if you have a few jobs, and some of them need to download some data in the first step (which needs a secret), then your choices are (a) copy-paste the "env" block into each of those steps, (b) use the new YAML anchors, or (c) set the secret at top-level scope. It is pretty clear to me that (c) is the worst idea security-wise: it makes the secret available to every step in the workflow, which makes it much easier for malware to exfiltrate.
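Option (b) might look something like this (a sketch assuming GitHub Actions' anchor support; the job, step, and secret names are invented):

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Download data
        env: &fetch-env                           # anchor defined once...
          DATA_TOKEN: ${{ secrets.DATA_TOKEN }}   # hypothetical secret
        run: ./fetch-data.sh
      - name: Build
        run: make build          # no secret in scope here

  test:
    runs-on: ubuntu-latest
    steps:
      - name: Download data
        env: *fetch-env          # ...reused here, still step-scoped
        run: ./fetch-data.sh
      - name: Test
        run: make test
```

The secret is only ever visible to the two download steps, with no copy-paste drift between them.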
Custom YAML anchors with custom support and surprise corner cases: bad.
Or I could use a YAML anchor.
First, he can just not use the feature, not advocate for its removal.
Second, his example alternative is wrong: it would set variables for all steps, not just those two. He didn't consider the scenario where there are three steps and you need common envs in just two of them.
> First, he can just not use the feature, not advocate for its removal.
I maintain a tool that ~thousands of projects use to analyze their workflows and actions. I can avoid using anchors, but I can't avoid downstreams using them. That's why the post focuses on static analysis challenges.
> Second, his example alternative is wrong: it would set variables for all steps, not just those 2, he didn't think of a scenario where there are 3 steps and you need to have common envs in just 2 of them.
This is explicitly addressed immediately below the example.
If you have two workflows... one to handle PR creation/update and another to handle the merge operation, it is like pulling teeth to get the final commit properly identified so you can grab any uploaded artifacts from the PR workflow.
Once you allow setting and reading of variables in a configuration file, you lose the safety that makes the format useful. You might as well be using a bash script at that point.
I think allowing both setting and reading of variables directly in the configuration file is a problem.
Not reading variables that have been set outside of the configuration file alone.
I think they should be supported, because it's surprising and confusing if you start saying "actually, it's a proprietary subset of YAML". No more reason is needed than that.
There, that's better.
Give me a proper platform that I can run locally on my development machine.
But if those anchors are a blessed, standard YAML feature that YAML tools will provide real assertions about, unlike the ${{}} stuff, where you're basically doing commit-push-run-wait without any proper debug tools besides prints?
Then yes, they should use them.