It seems as though this project is a sort of system for creating Makefiles, and that would be great for folks who are unfamiliar with them.
I'm not sure of the audience, though. At least in my research area, there are mainly two groups:
1. People who use LaTeX (and scripting languages) and are comfortable writing Makefiles.
2. People who work in Excel (etc.) and incorporate results into MS Word.
Neither group seems a likely candidate for this (admittedly intriguing) software.
I spent some time getting mathematicians to work together via version control rather than email, and it was a bit of a mixed bag even with something simpler (e.g. svn). Eventually we moved back to email, except the rule was: email me your update as a reply to the version you edited. I scripted something on my end to put it all into a repo and manage merges, etc. It worked OK. Better than the version where we locked files for editing but people forgot to unlock them and went off to a conference...
If I was doing the same now, I'd probably set up on github, give each person a branch off main, and give them scripts for "send my changes" and "update other changes" - then manage all the merges behind the scenes for anyone who didn't want to bother.
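That branch-per-person setup can be sketched end to end. Everything below is illustrative (the repo path, branch names, and user names are made up, and the two "wrappers" are plain shell functions rather than polished scripts); a bare repo stands in for GitHub:

```shell
#!/bin/sh
# Sketch: shared repo, one branch per person, and two wrapper functions so
# collaborators never have to touch git directly.
set -e
work=$(mktemp -d)

# Stand-in for the GitHub repo.
git init -q --bare "$work/origin.git"
git -C "$work/origin.git" symbolic-ref HEAD refs/heads/main

# Maintainer seeds main.
git clone -q "$work/origin.git" "$work/maint"
cd "$work/maint"
git config user.email maint@example.com
git config user.name Maint
git symbolic-ref HEAD refs/heads/main
echo "intro" > paper.tex
git add paper.tex
git commit -qm "initial draft"
git push -q origin main

# A collaborator gets a clone, a branch off main, and the two wrappers.
git clone -q "$work/origin.git" "$work/alice"
cd "$work/alice"
git config user.email alice@example.com
git config user.name Alice
git checkout -q -b alice origin/main

send_my_changes() {        # "send my changes"
  git add -A
  git commit -qm "update" || true            # no-op if nothing changed
  git push -q origin "$(git rev-parse --abbrev-ref HEAD)"
}
update_other_changes() {   # "update other changes"
  git fetch -q origin
  git merge -q origin/main                   # maintainer resolves conflicts on main
}

echo "results" >> paper.tex
send_my_changes
update_other_changes
```

The point of the design is that all conflict resolution happens on main, behind the scenes, so `update_other_changes` normally fast-forwards cleanly for everyone else.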
I think expecting everyone in a working group to acquire the skills to deal with merge issues properly is a step too far if they don't already do significant software work. If they do, teach them.
The trouble with make is that, unless you're very disciplined or very lucky, if you build the images and documents on your machine and I do the same on mine, we're going to get artifacts that look similar but hash differently, if for no other reason than a timestamp showing up somewhere and throwing it off (though often for more concerning reasons, involving the versions of whatever your Makefile is calling).
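A toy illustration of the timestamp problem (nothing here comes from a real Makefile; the "build" step is a stand-in that embeds the current time): two builds of identical inputs hash differently until the timestamp is normalized away.

```shell
#!/bin/sh
set -e
dir=$(mktemp -d)

# Stand-in "build" step: same inputs every time, but a build
# timestamp sneaks into the output artifact.
build() {
  printf 'figure-data: 42\nbuilt-at: %s\n' "$(date +%s)" > "$1"
}

build "$dir/mine.txt"
sleep 1                      # you building a second later on your machine
build "$dir/yours.txt"

h1=$(sha256sum "$dir/mine.txt"  | cut -d' ' -f1)
h2=$(sha256sum "$dir/yours.txt" | cut -d' ' -f1)
echo "mine:  $h1"
echo "yours: $h2"            # differs, though the real content is identical

# Stripping the timestamp (naively, for the demo) restores agreement.
n1=$(grep -v '^built-at' "$dir/mine.txt"  | sha256sum | cut -d' ' -f1)
n2=$(grep -v '^built-at' "$dir/yours.txt" | sha256sum | cut -d' ' -f1)
[ "$n1" = "$n2" ] && echo "normalized hashes match"
```

Real toolchains bury the timestamp somewhere less convenient than a labeled line, which is why conventions like SOURCE_DATE_EPOCH exist for reproducible builds.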
That prevents any kind of automated consensus about the functional dependency between the artifacts. Instead, reviewers must rebuild the artifacts themselves and compare the outputs visually before they can be confident that the data and visualizations are actually connected.
So if we want to get to a place where papers can be more readily trusted, a place where the parts of research that can be replicated automatically are replicated automatically, then we're going to need something that provides a bit more structure than make (something like nix, with a front end like the Jacquard lab notebook).
The idea that we could take some verifiable computational step and represent it in a UI such that the status of that verification is accessible, rather than treating the makefile as an authoritative black box... I think it's rather exciting. Even if I don't really care about the UI so much, having the computational structure be accessible is important.
In the physical sciences, no one commits academic fraud by introducing a discrepancy between the graphs they publish and the data they collected... they just enter bad data to start with, or apply thoroughly invalid statistical methods, or the like.
You can't fix this by trying to attest the data pipeline.
Recently I found a paper where the authors must've mislabeled something, because the data for mutant A actually corresponded to the plot for mutant B.
Other times it'll take days of tinkering just to figure out which versions of the dependencies are necessary to make it work at all.
None of that sort of sleuthing should've required a human in the loop at all. I should be able to hold one thing constant (be it the data or the pipeline), change the other, and rebuild the paper to determine whether the conclusions are invariant to the change I made.
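The hold-one-thing-constant idea can be sketched as a tiny harness. Everything below is illustrative: the "pipeline" is a stand-in awk one-liner and the data files are made up; the point is only the shape of the check — rebuild with a changed input and compare the stated conclusion.

```shell
#!/bin/sh
set -e
dir=$(mktemp -d)

# Original data and a perturbed variant (the thing we change).
printf '1\n2\n3\n'   > "$dir/data.orig"
printf '1\n2\n300\n' > "$dir/data.perturbed"

# Stand-in pipeline (the thing we hold constant): data in, "conclusion" out.
pipeline() {
  awk '{s+=$1} END {print (s/NR > 10 ? "effect" : "no effect")}' "$1"
}

c_orig=$(pipeline "$dir/data.orig")
c_pert=$(pipeline "$dir/data.perturbed")
echo "original:  $c_orig"
echo "perturbed: $c_pert"

if [ "$c_orig" = "$c_pert" ]; then
  echo "conclusion is invariant to the change"
else
  echo "conclusion depends on the changed input"
fi
```

Swap the roles (fix the data, vary the pipeline) and the same comparison tells you whether the conclusion survives a different analysis.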
Human patience for applying skepticism to complex things is scarce. I want to automate as much of replication as possible so that what skepticism is available is applied more effectively. It would just be a nicer world to live in.
I have a day job, but I spend a lot of time thinking about ways to improve academic/technical publishing in the modern era. There are a lot of problems with our current model: pay-walled articles and limited public access to research; many articles with no or limited access to the raw data or analysis code; and articles that don't make use of modern technology to enhance communication (interactive plots, animations, CAD files, video, etc.).
Top-level academic journals are trying to raise the bar on research publication standards (partly to avoid the embarrassment of publishing fraudulent research), but they're all stuck not wanting to kill the golden goose. Academic publishing is a multi-billion-dollar affair, and making research open would damage their revenue model.
We need a GitHub for science... not in the sense of Microsoft owning a publishing platform, but in the sense of what GitHub provides for computer science: a platform for public collaboration on code and ideas. We need a federated, open platform for managing experiments and data (i.e. an electronic lab notebook) and for communicating research to the public (via code, animations, plots, written text in Typst/LaTeX/Markdown, video, audio, presentations, etc.). Ideally this platform would also have an associated forum for discussion of and feedback on research.
Anyway, it’s a really cool project, and I’m looking forward to seeing how it grows.
“This ‘Dropbox’ project of yours looks neat, but why wouldn’t people just use ftp and rsync?”
When I was working on my PhD thesis 20 years ago, I had a giant Makefile that generated my graphs and tables and then built the thesis from LaTeX.
All of it was in version control, which made everything so much easier, but there's no way anyone other than someone who already uses those tools would be able to figure it out.
I've always been impressed by the amount of effort people are willing to put in to avoid using version control. I used Mercurial about 18 years ago, then moved to git when that took off, and I never write much text for work or leisure without putting it in git. I don't even use branches outside of work; it's just so that the previous versions are always available. This applies to my reading notes, travel plans, budgeting, etc.
It just looks like "conf_paper1.tex" "conf_paper3.tex" "conf_paper_friday.tex" "conf_paper_20240907.tex" "conf_paper_last_version.tex" "conf_paper_final.tex"
...
"conf_paper_final2.tex"
Oh, and the figures reference files in a local directory structure.
And the actual, eventually-published version only exists in email back-and-forth with the publisher over style files, etc.