FilterHN

IncRnd

11 months ago

[-]

From the top of the page:

find + mkdir is Turing complete (retracted) The proof is flawed and I retract the claim that I proved that find + mkdir is Turing complete. See https://news.ycombinator.com/item?id=41117141. I will update the article if I could fix the proof.

11 months ago

[-]

It's already fixed.

https://www.danieltemkin.com/Esolangs/Folders/

vanderZwan

11 months ago

[-]

So can you implement Folders with it?

dspillett

11 months ago

[-]

> In Windows, folders are entirely free in terms of disk space! For proof, create say 352,449 folders and get properties on it.

To be that guy for a moment: well, hackchewally…¹

Directory entries do take up space in the MFT, but that doesn't show up in explorer which is only counting allocated blocks elsewhere. You will eventually hit a space issue creating empty directories as the MTF grows to accept their allocation.

You can do similar tricks with small files. Create an empty text file and check, it will show 0 bytes size and 0 bytes on disk. Put in ~400 bytes of text and check again: explorer will show 400 bytes in length but 0 size on disk because the data is in the directory entry in the pre-allocated MFT. Double up that data, and it will be big enough that a block on the disk is allocated: in properties in Explorer you'll now see 800 bytes length and 4,096 bytes (one block) on disk. Drop it back to 400 bytes and it won't move the data back into the MFT, you'll now see 400 bytes length, 4096 bytes consumed on disk.

[1] though don't let this put you off enjoying the splendid thing overall!

chungy

11 months ago

[-]

NTFS is very much an old school file system design, where most/all metadata is stuffed into predefined regions on the disk. NTFS's MFT, in particular, can and does grow as needed (though it never shrinks!), but as entries in the MFT vanish, that space is just waiting for future allocations. It makes it a bit difficult to tell what an empty directory or file "really" takes, because when it's just part of the MFT, there's (usually) no change in used/free disk space just by creating them.

In the Unix land, the same sort of thing exists with file systems such as ext4 and UFS2: they also depend on predefined regions for metadata. If you'd like to venture into ZFS, however, (almost) all metadata is dynamically created and destroyed on an as-needed basis, and as such, ZFS always reports the on-disk usage by counting both user data and metadata. It's easy to see a directory grow to many megabytes by just creating a lot of empty files inside of it.

woodrowbarlow

11 months ago

[-]

interesting! does that mean a windows exe could repeatedly create empty files, stuff a bunch of metadata in them, and eventually eat up all the disk space in a way that can't be fixed without re-installing windows? have viruses been known to exploit this?

dspillett

11 months ago

[-]

> and eventually eat up all the disk space in a way that can't be fixed without re-installing windows
Not quite, as the MFT can't consume all the device, so you can't fill the whole lot that way, but you could cause significant inconvenience that is not undoable without migrating to another filesystem. That migration might be possible without a reinstall as such, though it would take a lot of manual jigger-pokery (and I don't know of any tools that would help, it isn't something I expect someone has felt the need to write tools for) so the reinstall would likely be easier.

> have viruses been known to exploit this?*

I doubt it. If the virus is intended to extract something from the user (exfiltrating data, ransom, making them part of a bot-net, etc.) then it wants to work in the background not causing inconvenience like that (until it fires in the case of encryption & ransom, but that is a different sort of inconvenience), and if the goal of the virus is just to cause a mess then there are more effective methods of doing so.

vanderZwan

11 months ago

[-]

Remember that esolangs are arguably the "purest" medium of artistic expression for programming, and that artworks often are about challenging or confronting implicit assumptions. Or to put it differently: yes, that is indeed the joke :).

Thank you for elaborating the technical details though (plus I did not know the small file trick, that's a neat bit of trivia).

sumtechguy

11 months ago

[-]

Also MFT can also grow in size if you exceed the current allocated size. Depending on your version of windows that growth rate is different. Also once the MFT grows it will not shrink. The tools for MFT cleanup are rather poor. With the usual recommendation of 'just format a new drive and start over'.

philistine

11 months ago

[-]

Apple finally managed to switch its file system and escape the pains of late 80s technology. When will Microsoft replace NTFS?

Negitivefrags

11 months ago

[-]

In theory ReFS is the replacement. It hasn’t seen large uptake though.

deepsun

11 months ago

[-]

"Don't touch if it works" (c) /s

lelandfe

11 months ago

[-]

Master File Table

moffkalast

11 months ago

[-]

"Disk manufacturers hate this simple trick for free data storage."

dspillett

11 months ago

[-]

Oh, that might make a great April 1st release: a mirror filesystem module for WinFSP that splits files into 500 byte chunks on disk. “See, we saved that 4Mbyte photo to the new filesystem, and it, using the NTFS infinite space for small files trick, made it take absolutely zero disk space! Now we'll make a sub-folder and drop a few more files in that, and look, they show as taking no space in the magic backing store but can be properly opened as normal!”.

Drop it on youtube or tt, get your popular friends (this might scupper me, if anyone I know is an online influenza they have the good sense not to let me know about such proclivities!) to make a review of it, and see how far and wide it spreads with people either in on the joke or idiots just parroting it for views.

devbent

11 months ago

[-]

Back in the days of dos, I use one of the disK compression utilities to store 20 MB on a regular 3.5-in floppy. The entire 20 MB consisted of metadata for the compression system!

Moosturm

11 months ago

[-]

This is phenomenal

11 months ago

[-]

I don't understand how this shows Turing completeness. The implementation of the rule 110 automaton seems to be limited by both width (not Turing complete because there is a finite number of states of a given width) and iteration limit (not be Turning complete because it always terminates).

Can you write an implementation of rule 110 with arbitrary (i.e. unbounded) width and depth?

viraptor

11 months ago

[-]

It's still ok if the implementation limits it rather than the concept. I mean, your computer has finite memory rather than infinite tape, so it doesn't meet that requirement either regardless of language/method.

11 months ago

[-]

I am not talking about limitations of find or mkdir like other commenters are.

I can write a Python program that simulates rule 110 with unbounded state width and unbounded iteration depth. I might not be able to execute it on any computer (it's going to run out of memory at some point), but I can still reason about it and its behavior with a (theoretical) infinite memory. After reading the blog post, I am not convinced I can write such a program using `find` and `mkdir`, since the provided example uses explicit limits for WIDTH and ITER in the program itself.

11 months ago

[-]

The same argument would make C non turing complete. Because the size of pointers is a compile time constant and because everything needs to have an address that puts a large, but hard limit on tape length.

There are ways to argue arround that, e.g. C might be able to interface with a infinite tape file via the stantard library, and maybe strict aliasing and pointer provenance let's you create a system where bit identical pointers can be different. But the mental model most people have of C wouldn't be turing complete.

11 months ago

[-]

While strictly speaking true I don't think it is the same argument at all. You are talking about a restriction of the runtime (much like mkdir argument length or maximum filesystem depth), even though it leaks into the standard because standard people care about physical hardware, not theoretical ones.

The WIDTH and ITER limit being actual constants that are part of the program makes all the difference compared to C pointer limitations that are part of the execution environment.

Joker_vD

11 months ago

[-]

But C is not Turing-complete, because its numbers (and pointers) are required to have an arbitrary limit which itself must be representable and useable in C. Python, on the other hand, has no such requirement and you could imagine an implementation that could grow indefinitely (well, until it ran out of the physical universe). C is prohibited from doing that, there is always a theoretical limit present. You can set that limit hella high, but it's still there.

a_cardboard_box

11 months ago

[-]

It's still not comparable to mkdir and find.

The arbitrary limit in C is not fixed by the code. So if you run out of space on a 32-bit machine with sizeof(size_t) == 4, you can run the same code on a 64-bit machine with sizeof(size_t) == 8. With mkdir and find, you have to change the code to do this.

You can translate any Turing Machine into a single C program, which will behave identically so long as you have enough memory. You cannot do this if you need to change the program when the amount of memory changes.

11 months ago

[-]

I'd argue that the process of taking a C program and compiling and running it on ever larger pointer sizes it turing complete, but not a single iteration of this process.

lupire

11 months ago

[-]

"Python" doesn't exist though. It's silly to claim that Python is more powerful just because it doesn't tell you what it's going to do (run on some limited hardware) and C lets you choose. Reality is fundamentally, necessarily different from theory. Everything real is a mere approximation of theory, and vice versa. Even Turing's machine is limited by all the paper in the universes, unless you ignore reality (which is fine!). It's false precision to say that C in reality with bounded pointers is different from C with unbounded pointers, but Python in reality with secretly bounded pointers is the same as Python with unbounded pointers.

Joker_vD

11 months ago

[-]

No, it's not a false precision. The C requires the numbers (and memory size) to have an upper limit which it is obliged to tell you. The Python doesn't require such limitations to exist. They will, of course, exist in practice since it's impossible to build a Turing machine but only a sufficiently huge finite-state machine, but that is the practical consideration. In practice, all language implementations are FSMs. But C is a FSM even in theory, unlike Python.

You would have better luck trying to argue that there is a reading of standard that allows for unboundedly huge file streams and that fread()/fwrite()/fseek() then could be used to faithfully implement Turing machine.

majorexception

11 months ago

[-]

I don't think there's real difference between Python and C. With Python you can't make your program use more than 4 GB of address space if you run it with a 32-bit interpreter. You have to swap the interpreter. In C the same goes for the compiler. And yes, you can inspect the upper limit by looking at the width of size_t, but it will be seen differently with 32 and 64-bit compilers although the program will be the same. And you _can_ make program behave differently basing on size_t's width, but you're not required to. It doesn't change that fundamentally Python is no more Turing-complete than C just because you can't do it in Python (that's my assumption, I don't know Python well enough actually).

Maybe it all boils down to how CPUs work, and maybe it's safe to say that the incompleteness comes from the CPU implementation? You can of course argue that Python interpreters are written in C/C++, but of course we can imagine they can be written in assembly.

Edit: after I read some other comments I think I see the point - that indeed the problem is the implementation (on CPU).

11 months ago

[-]

> But C is a FSM even in theory, unlike Python

Write a python interpreter in C and it's clear to see why your logic fails. You've reaped your claimed benefits of Python while remaining in C.

11 months ago

[-]

Python-the-language can be Turing-complete even if Python-as-actually-implemented is not.

11 months ago

[-]

Then C-the-language can be Turing complete, even if C-as-actually-implemented is not. Just implement a python interpreter. (Or you can also just implement bignums in C and use those for your computation)

11 months ago

[-]

You can't implement a Python interpreter with access to infinite memory in C as specified. That is the point.

11 months ago

[-]

Why not? Whether you're running python code in your C interpreter or just running C code, the same memory restrictions will apply based on your hardware. CPython doesn't place a lower bound on bignums over a non-C based implementation

EDIT: See the GMP library, which states "There is no practical limit to the precision except the ones implied by the available memory in the machine GMP runs on"[0]

https://gmplib.org/#WHAT

11 months ago

[-]

The C specification limits programs to addressing a finite amount of memory, though it can be made arbitrarily large by an implementation. The Python specifications do not imply this though real interpreters do.

11 months ago

[-]

> though it can be made arbitrarily large by an implementation

Yes, this is my entire point

Why should I care what the language specification states in a computability theory discussion? There only needs to exist a method to accomplish our goal-Whether the method conforms to specification or not doesn't seem relevant to me.

sambaumann

11 months ago

[-]

Would it be fair to say then that "Python" is Turing complete, while CPython/PyPy implementations are not turing complete, because they will always implicitly run up against C's memory limitations, therefore they do have a hard limit. Python itself as a language is turing complete because it does not place realistic limitations on the user like C does?

layer8

11 months ago

[-]

A C program which doesn’t inspect its pointer representations can conceivably use unlimited memory. For pointer values whose representation can be statically determined to be uninspected by the program, the C implementation can use memory locations outside of the representable address space, and thus make unlimited memory accessible. Functionally there is no need for self-contained C programs to ever inspect their pointer representation.

11 months ago

[-]

The difference is very small though, you could say that WIDTH amd ITER must be defined in the execution enviroment (shell)before execution and that the rest of the code is the program, and we are at the same situation as in C.

11 months ago

[-]

If you define WIDTH and ITER prior to execution you are just giving arguments to your program.

Maybe using the term "environment" was not the best choice; what I mean is that WIDTH and ITER are program variables that impact program behavior and output (appear in regexes etc.) whereas (most) C programs don't actually reference or depend on the pointer width (other than crashing if it's too small); it is an internal detail of the C compiler and underlying hardware that only happens to be visible to the programmer due to leaky abstractions. I don't think those are comparable.

horsawlarway

11 months ago

[-]

I mean... isn't this just `sizeof` in c?

Honestly, I'm struggling to think of any real world code base I've worked with in C that didn't care about pointer size.

tomsmeding

11 months ago

[-]

On the other hand, a C running on a machine with a significantly larger address space would have appropriately larger pointers. The C standard does not specify any particular pointer bitwidth. With these things together, C as a language has a decent claim to Turing-completeness.

11 months ago

[-]

Yes, but you still configure (choose a compiler) it to a fixed size before running, that is in my mind no different than specifying a fixed tape size, like in the find + mkdir example.

oecumena

11 months ago

[-]

For any C program there is a number N, that depends on the program, compiler, architecture, etc., but does not depend on the program input, such that the program won't be able to access more than N bits of state at any moment in any of its possible executions. Hence, the program is equivalent to a finite state automaton.

oecumena

11 months ago

[-]

C is definitely not Turing complete. The standard library provides no escape, because file sizes are also limited (due to ftell (3)), and there is no chdir in the C standard library, so the total number of files is also limited. I have a recollection of an attempt to construct a possible Turing-complete interpretation of the C standard involving recursion and va_arg, but I don't think it went anywhere.

11 months ago

[-]

C is indeed not Turing-complete for more or less this reason.

ori_b

11 months ago

[-]

Neither is the universe: we have (as far as we know) a limited number of matter and energy that can be converted to computation, which limits the size of an implementable system.

Real Turing completeness is necessarily theoretical.

layer8

11 months ago

[-]

We don’t know that at all. It’s a sensible assumption that the universe is infinite in size, and we have no indication to the contrary. The biggest impediments are the accelerating expansion of the universe (which however isn’t fully explained and thus may not be inevitable) and the heat death, which limits time for meaningful causal interaction.

11 months ago

[-]

Yes. C is not Turing-complete even in theory. Other languages are. It doesn't especially matter.

11 months ago

[-]

Thank you for your comment on my article. I think I've managed to fix the proof by implementing a tag system. Would you (anyone) mind reviewing my code [1][2]? The key point is using back references, which I think gave us capabilities beyond regular expressions to achieve Turing completeness. However, I'm not very familiar with tag systems, and I'm worried I might have missed something.

[1] https://onecompiler.com/bash/42mux2442 [2] https://onecompiler.com/bash/42mux3nf8

11 months ago

[-]

I am not an expert in either tag systems (or `find`), but this seems right. Using backreferences to handle copies sounds right (it does add expressive power to regular expressions but does not give Turing completeness, afaik).

I think your first example is missing the "any word of length < 2 is a halting word" condition but it is present in your second example.

11 months ago

[-]

Thank you! I've updated the article with a link to my comment.

uzerfcwn

11 months ago

[-]

The definition of tag systems says "add P(x) to the end" but doesn't define x.

I can't spot errors in the logic or code, though I'm not an expert either.

11 months ago

[-]

Thank you! I have fixed the sentence and some edge case handling.

thih9

11 months ago

[-]

Looks like you're treating Python as a spec; in this case I'd say we should treat mkdir+find as a spec too.

Programming language implementations often have some hard limits built in. E.g.: https://peps.python.org/pep-0611/

11 months ago

[-]

I'm treating mkdir+find as a spec. The values for WIDTH and ITER are in the program itself and will impact any implementation of mkdir and find you use, including theoretical ones.

thih9

11 months ago

[-]

Thanks for explaining. I missed that WIDTH and ITER are in the program.

cryptonector

11 months ago

[-]

I interpreted `-maxdepth` as a safety device.

golol

11 months ago

[-]

I think it's not that simple. It's always confusing to talk about Turing machines and the requirement of infinite memory vs. the reality of finite memory. I think "Turing completeness" is not so obvious to define rigorously and the way people use it does maybe not exactly capture the idea of "arbitrary computation" being possible. I'll try to clarify some things for myself and maybe others.

First of all, recall that a dynamical system is a set X with a map f: X -> X. The evolution of the system is given by the iterated application of f. A dynamical system is finite if the set X is finite.

I think it is useful to broaden this concept and define an IO-system as three sets X and I and O with a map f: I × X -> O × X. This means at every evolution step an "input" value i ∈ I has to be provided and an "output" value o ∈ O is obtained.

A Turing machine m consists of a finite alphabet A of symbols and a finite IO-system h: A × S -> O × S, where O = {move left, move right, print symbol a ∈ A}. This represents how the "head" of the Turing machine updates its internal state s ∈ S when reading a symbol from the alphabet I. We call this IO-system h the head of the Turing machine. You could specify the Turing machine with the data T = (A, S, O, h).

You now couple this Turing machine with another IO-system, which we call the "tape". It is either an infinite (N = ∞) tape or a finite, circular tape of length N. It has states X = {1, ..., N} × I × ... × I where the product I × ... × I has length N. It's set of inputs is the set O and its set of outputs is A. It's operation is given by a function t: O × X -> A × X, which describes the intended reaction of the type to the instructions from the head, i.e. depending on the instruction in O it either moves the "position counter" of the tape to the left, to the right, or it prints a symbol onto the tape. After it has performed this it reads the symbol at the current position and gives this output back to the head.

We can now combine the head h and the tape t into a "machine" dynamical system m: X × S × O -> X × S × O where h(x, s, o) = (t(o, x)_X, h(t(o, x)_A, s)_S, h(t(o, x)_A, s)_O). This represents the evolution of the Turing machine together with the tape. We call this the [machine dynamicals system with memory N of the Turing machine T].

Definition 1. Let's say that [the dynamical system f: X -> X simulates another dynamical system g: Y -> Y] if there exists an injective map u: Y -> X such that g(y) = f(u(y)). In order to compute the evolution g(g(...(g(y))...)) we can instead compute f(u(f(u(...(f(u(y))...)) and use injectivity of u to get back a result in Y.

Lemma 2. Any finite dynamical system is simulated by the machine dynamical system of some Turing machine with tape length N = 1. proof: Just set the head of the Turing machine to be the desired dynamical system and trivialize all the other objects.

This is a triviality result and tells us that this is not a good attempt to investigate universality of Turing machines in a "finite memory" setting.

False Hypothesis 3. There exists a universal Turing machine U in the sense that this Turing machine has the property that its machine dynamical system with infinite memory simulates the machine dynamical system with infinite memory of any other Turing machine T.

As far as I know this hypothesis is false because the sense of simulation mentioned above is far too strong. At this point I think there are many definitions one can make so let's stick with the one of Alan Turing.

Definition 4. We say that [the dynamical system f: X -> X simulates another dynamical system g: Y -> Y with respect to the "result" functions R: X -> {null, 0, 1} and Q: Y -> {null, 0, 1}] if there exists an injective map u: Y -> X such that the sequences Q(g^n(y)) and R((f ∘ u)^n(y)) are "result equivalent", meaning they are equal if you delete all instances of "null".

We now extend the concept of a Turing machine T by adding to it a result function r: O -> {null, 0, 1}.

Definition 5 (A. Turing, 1936). We say that [the Turing machine T with result function r: O -> {null, 0, 1} (N,M)-simulates another Turing machine T' with result function r': O' -> {null, 0, 1}] if the machine dynamical system of T with memory N simulates the machine dynamical system of T' with memory M, with respect to the result functions R: X × S × O -> {null, 0, 1} given by R(x, s, o) = r(o) and R': X' × S' × O' -> {null, 0, 1} given by R'(x, s, o) = r'(o).

Definition 6. We say that [a Turing machine U with result function r is (N,M)-universal] if it (N,M)-simulates any other Turing machine with result function.

Theorem 7 (A. Turing, 1936). There exists a (∞,∞)-universal Turing machine.

Definition 8. We say that [a Turing machine U with result function r is finite-weakly universal] if for any finite M there exists some finite N such that it (N,M) simulates any other Turing machine with result function.

Now it gets difficult becasue I don't actually know the answers anymore. I am pretty sure that any (∞,∞)-universal Turing machine is also finite-weakly universal. Even more so, it might be the case that finite-weak universality is equivalent to (∞,∞)-universality. Most certainly finite-weak universality is not a trivial concept and captures an interesting aspect of the concept of computation. I want to make the point that in my opinion infinite memory should not be seen as requirement in order to talk about these concepts of computation like Turing machines and universality.

It is also unclear how exactly to define the "Turing completeness" of a system, as I don't think there exists a definition of Turing completeness for dynamical systems. You have to specify how you are allowed to put an input into the dynamical system at least. I think that in some sense one could use what OP found and prove a rigorous result that with `find` + `mkdir` one can somehow construct a finite-weakly universal Turing machine.

upwardbound

11 months ago

[-]

I don't know this field very well so I might be misunderstanding, but I think this is different than "infinite tape" in Turing Machines. As I understand it, the proof of universality for Rule 110 required that the program code which is called the "production rules" be repeated infinitely on the tape even for a finite size program. https://en.wikipedia.org/wiki/Rule_110#:~:text=An%20infinite...

If you had a halting problem oracle to tell you how much runtime is needed to run a certain program to completion, you could get away with having only a finite number of repetitions of the "production rules", and simply pretending that they're infinitely repeated. This would only work for programs that halt.

If I understand correctly, any program that loops forever, if implemented within Rule 110 Cyclic Tags, requires infinite repetition of the production rules. I think this is a difference of Rule 110 vs Turing Machine tape. If I understand correctly, a Turing Machine with finite, even quite small, tape can loop forever. But a Rule 110 program must have infinitely sized tape to be able to loop forever.

Basically (if I understand correctly), Rule 110 Cyclic Tags essentially "consume" tape symbols as basically a non-renewable resource, like an electrical computer server powered by the burning of coal. Infinite runtime (looping forever) requires infinite repetition of the tape symbols (both the "production rules" and the "clock pulses" - see the Wiki page above). I believe this is unlike Turing Machines, which can loop forever without "consuming" any non-renewable resource.

To clearly state this again: Running a simple "while(true)" loop in a Turing Machine only needs finite tape, but requires infinite tape in Rule 110.

klyrs

11 months ago

[-]

  +[->+]

Turing machines can also eat tape infinitely. If they're allowed such an appetite, why would we forbid it for rule 110?

To be fair I've never been 100% sold on the Turing-completeness of rule 110, but your argument isn't landing with me either.

https://en.wikipedia.org/wiki/Turing_machine#:~:text=Cells%2...

upwardbound

11 months ago

[-]

The initial state of a Turing Machine tape is that beyond the finite-length input, all of the rest of the infinite tape is blank.

Whereas the Rule 110 Cyclic Tags engine requires the infinite tape to contain infinite repetitions of structured patterns, even in order to simply run "while(true)". That's a key difference.

I also agree with zamadatix's sibling comment.

klyrs

11 months ago

[-]

Oh, I see what you mean. That sounds quite easy to work around, actually. Due to the speed of light, only a section containing the disruptions to the repeated pattern need be considered for the initial state; and then you can compute outward from that.

zamadatix

11 months ago

[-]

"Allowed" is probably covering too wide a meaning in your description. Just because something is capable of defining infinite consumption does not mean it was allowed to do so in the proof.

adrianN

11 months ago

[-]

C is maybe technically not Turing compete either: https://cs.stackexchange.com/questions/60965/is-c-actually-t...

FartyMcFarter

11 months ago

[-]

According to the second answer, C99 is Turing complete.

safeimp

11 months ago

[-]

The author has since updated their post:

> The proof is flawed and I retract the claim that I proved that find + mkdir is Turing complete. See https://news.ycombinator.com/item?id=41117141. I will update the article if I could fix the proof.

indigo0086

11 months ago

[-]

Retracted >The proof is flawed and I retract the claim that I proved that find + mkdir is Turing complete

aabhay

11 months ago

[-]

I thought that this was going to use some interesting form of lambda calculus but instead it simply relies on the regex parser of find to compute things.

userbinator

11 months ago

[-]

Not the first time someone has coerced a regex into doing some nontrivial computation; here's a memorable example:

http://realgl.blogspot.com/2013/08/battlecode.html (scroll down to "Regular Expression Pathfinding"

niederman

11 months ago

[-]

I suspect this proof could be greatly simplified by the use of tag systems (https://en.m.wikipedia.org/wiki/Tag_system) rather than cellular automata.

usr1106

11 months ago

[-]

Of course no implementation is infinite, but in this case PATH_MAX with 4096 as a typical value seems particularly low.

Bootvis

11 months ago

[-]

Check out section “Expected questions and answers”. For GNU it seems to work with path lengths larger than 4096.

Arch-TK

11 months ago

[-]

The blog post addresses this by using relative paths. Tested up to a path length of 30k apparently.

https://github.com/llvm/llvm-project/issues/43825

omoikane

11 months ago

[-]

I believe somewhere after 30k is when they will run into file system limits. Despite being able to create directories nested to arbitrary depth with short file names, find() might not be able to read a directory if the total path length is too long.

I found this out when I couldn't open a file with the path "./././name.h" where there are lots of "./" in front. And the reason why I got so many "./" was due to a clang preprocessor bug that modifies __FILE__:

peter_d_sherman

11 months ago

[-]

Observation: Any piece of software/service or piece of software/service used in a software/service chain which implements and/or consumes Regular Expressions (aka RE's, RegExp's) -- is potentially Turing Complete, and should be audited for Turing completeness if security in that context is a concern...

acchow

11 months ago

[-]

Speaking strictly, the original definition of Regex required only a finite state machine with zero stacks.

You need 2 stacks for Turing completeness.

Tho a lot of regex libraries can support much more than just “regex”

https://en.wikipedia.org/wiki/Rule_110

peter_d_sherman

11 months ago

[-]

>"You need 2 stacks for Turing completeness."

I am not completely sure about that assertion...

We know that Rule 110 is Turing complete:

>"Rule 110 with a particular repeating background pattern is known to be Turing complete.[2]"

So if Rule 110 = Turing completeness, then we could either prove Turing completeness by proving Turing completeness OR we could prove Turing completeness by proving Rule 110 equivalence...

Next we have Markov algorithms:

https://en.wikipedia.org/wiki/Markov_algorithm

>"a Markov algorithm is a string rewriting system that uses grammar-like rules to operate on strings of symbols.

Markov algorithms have been shown to be Turing-complete

, which means that they are suitable as a general model of computation and can represent any mathematical expression from its simple notation."

(Note that Markov algorithms do not use stacks (nor do Turing machines, nor does Rule 110, nor do stackless "Turing tarpit" esoteric languages, nor does Langton's Ant or other Turing complete cellular automata).)

RegExp's are basically a "string rewriting system that uses grammar-like rules to operate on strings of symbols"

So if a such a string rewriting system used in conjunction with a Regular Expression functionality can be proved to be a Markov algorithm, then we have automatic proof that it is also Turing complete, with no need for stacks!

Why not read the following:

Simplest Turing-complete ruleset for Markov algorithm

https://cs.stackexchange.com/questions/44717/simplest-turing...

And possibly this:

https://esolangs.org/wiki/Nopfunge

>"Nopfunge is a fungeoid designed by Hubert Lamontagne in 2015. It is a two-dimensional esoteric programming language based on a severely restricted subset of the well known Befunge language. Its goal is to show that having access to a sufficiently flexible program geometry is indeed the only thing that is needed to achieve Turing completeness."

[...]

>"The ONLY valid commands in Nopfunge are the PC direction change commands < > v ^ and empty space (which are the same as in Befunge). This means that Nopfunge has no stack, no numbers and no conditionals: there are

NO stack manipulation commands

and NO commands to store or retrieve data from the program grid. There are no variables or data storage or functions or objects of any kind. The ONLY thing that ever happens in Nopfunge is PC movement.

In spite of this, Nopfunge is Turing complete."

Point is: If it were me, and I were designing a system, then I'd be highly careful (perhaps "circumspect" is a better word) about code that implements or evaluates, produces or consumes Regular Expressions (or implements any text rewrite rules for that matter!) if the system which that code was to be part of, was intended to be as secure as possible...

11 months ago

[-]

> >"You need 2 stacks for Turing completeness."

> I am not completely sure about that assertion...

What GP means is that a finite state machine is not Turing-complete, and neither is a finite state machine with a single stack (pushdown automaton / stack automation).

klyrs

11 months ago

[-]

Piet is another near-exception to this -- the language only has a single "stack" but the "stack" is equipped with a 'roll' operation that cannot be implemented with a proper stack and O(1) memory.

anonymoushn

11 months ago

[-]

> do not use stacks (nor do Turing machines

A Turing machine has two stacks. They're the part of the tape to the left of the head and the part of the tape to the right of the head.

The other Turing complete systems described use arbitrarily large amounts of storage that are addressed more often, and for example Langton's Ant uses a two-dimensional tape, which is "not two stacks" in the sense that it is more complex than two stacks.

lupire

11 months ago

[-]

More complex how? Both are abstract and can simulate each other. The difference in complexity would wend into practicality arguments that don't apply in Turing world.

Turing machine uses a tape, which is equivalent to two stacks, and also equivalent to a 2-dimensional tape (Farey Sequence), and (I guess, per previous comment) equivalent to Rule 110.

The word "Equivalent" always carries some load, though. For example, the Langton Ant tape and Rule 110 use a more interesting version of "blank" initial state, similar to the "send a message by flipping a coin on a chess board with an arbitrarily flipped coin on each square" puzzle.

nibbula

11 months ago

[-]

My find is not only provably Turing complete, but not even a Turing tarpit and compiles to native code.

behnamoh

11 months ago

[-]

What's the implication of this for users of these commands?

langcss

11 months ago

[-]

Not much practically other than watch out for non-terminating find queries.

PokestarFan

11 months ago

[-]

I mean I had never thought about the case where the command that find executes creates subfolders of the directory it's executed in.

zmmmmm

11 months ago

[-]

well, it also supports `-exec` so it would imply rather a big problem with the host OS if that wasn't turing complete

kachnuv_ocasek

11 months ago

[-]

Why would an OS need to be Turing complete?

eru

11 months ago

[-]

Yes, that wouldn't necessarily be a problem.

Of course, getting a computer that's useful in practice out of this would require some thought.

A simple model: you could only allow programs written in Coq (or similar), ie progams that come with a proof of termination (or a slight generalisation, that allows for infinite event loops, as long as each run threw the loop behaves well, in some sense).

There's a trivial escape hatch, where you just take your normal unproven program but forcefully terminate it after 2^64 steps. That's strictly speaking not Turing complete, but you wouldn't be able to tell the difference during the lifetime of the computer.

XorNot

11 months ago

[-]

Neat now we just need a compiler for a scripting language...

anthk

11 months ago

[-]

Check awka for posix awk.

Zambyte

11 months ago

[-]

They probably meant one that targets the environment described in the post :-)

adrian_b

11 months ago

[-]

I have found interesting this in the parent article:

"The proof leverages a common technique: showing the system can execute Rule 110."

because I was not aware about "Rule 110".

Nevertheless, reading the Wikipedia page about "Rule 110", I find it astonishing that "Rule 110" not only has been the subject of a research paper, but that paper has been even the ground for a legal affair based on a non-disclosure agreement with Wolfram Research, which has blocked the publication of the paper for several years.

The demonstration that "Rule 110" is capable of universal computation is completely trivial and it requires no more than a sentence. It cannot be the subject of a research paper of the last decades.

There are several known pairs of functions that are sufficient for computing any Boolean functions, for example AND and NOT, OR and NOT, OR and XOR, AND and XOR. The last pair is a.k.a. multiplication and addition modulo 2.

Whenever there is a domain where all the possible functions can be expressed as combinations of a finite set of primitives, it is also possible to express all the members of the finite set of primitives by using a single primitive function that combines all the other primitives in such a way that composing that function with itself in various ways can separate each of the original primitives from the compound primitive.

Applying this concept to Boolean functions it is possible to obtain various choices for a single primitive function that can generate all Boolean functions, for instance NAND, which combines NOT and AND or NOR, which combines NOT and OR.

In general all the ado about how various kinds of computational domains can be reduced to a single primitive function is not warranted and it is not interesting at all. The reason is that such combined primitives do not change in any way the actual number of primitives. They just replace N distinct simple primitives with 1 compound primitive that must be used in N distinct ways. This does not change in any way the complexity of the domain and it does not make it easier to understand in any way.

"Rule 110" is just another banal example of this technique. Like NAND combines NOT and AND in a separable way, "Rule 110" combines multiplication and addition modulo 2, a.k.a. AND and XOR, in a separable way. Therefore it can express any Boolean function, therefore, by encoding, also any computable function.

There is absolutely no advantage in showing that some system can compute "Rule 110". It is simpler and clearer to show that it can compute AND and XOR, or AND and NOT.

eigenket

11 months ago

[-]

As far as I can tell from your comment you have the terms "functional complete" and "Turing complete" confused. These are emphatically not the same thing.

A circuit of (e.g.) NAND gates defines a mathematical function over a fixed, finite number of variables (the number of input wires to your circuit) and with a fixed number of outputs (likewise the output wires).

A Turing complete computer accepts inputs which are unbounded in length, I.e. it accepts an input of at least length n for any natural number n. It can also output unbounded strings.

These two are fundamentally completely different. Functional completeness for a set of gates doesn't tell you much about Turing completeness. For all of the interesting stuff to do with Turing machines you need this unbounded input size so you can do things like consider descriptions of other Turing machines as inputs to your Turing machine.

Essentially what you need is something equivalent to looping or recursion. Note that the Halting problem is completely trivial for NAND circuits, exactly because there is no looping.

adrian_b

11 months ago

[-]

Perhaps you have not read TFA.

Of course "functional complete" is not a sufficient condition for being Turing complete (because a Turing machine is not reduced to its arithmetic-logic unit, which must be functionally complete, but it is a complete automaton with memories).

However, "functional complete" is a necessary condition for being Turing complete.

Most proofs of Turing completeness do not go as low as the Boolean functions, but they suppose the availability of higher level functions that ensure the functional completeness, i.e. incrementing, decrementing and testing if a value is zero (which in real hardware must be implemented by using Boolean functions).

My comment was based on the line that I have quoted from the parent article, which was one of its first lines.

Moreover, a Turing machine with infinite memory has a fundamental difference only in theory, i.e. in the set of problems that can be solved with it, in comparison with a machine having an identical structure, but finite memory.

For practical purposes, the difference between a machine that is identical with a Turing machine, except by having a finite memory, and other simpler machines, like an automaton with one stack or a finite-state automaton, is much more fundamental.

Because an infinite memory is irrealizable, all the proofs that some real system is "Turing complete" are proofs that the system is equivalent with a Turing machine whose infinite memory is replaced by a finite memory, which is actually the only kind of computing machine that can be made.

So in such a context, any discussion about the infinite memory of a Turing machine is pointless.

eigenket

11 months ago

[-]

Specifically I was responding to this

>There is absolutely no advantage in showing that some system can compute "Rule 110". It is simpler and clearer to show that it can compute AND and XOR, or AND and NOT.

Which is explicitly wrong. You need extra stuff (roughly) equivalent to looping as well as the ability to interact with an unbounded inputs (110 does this by emulating a tag system). Fixed-width boolean circuits implement AND and XOR, but they are not Turing complete.

Showing a system can implement rule 110 is a lot stronger than showing it can implement AND and XOR or AND and NOT. You can even have a system with a single unbounded stack and access to those functions and it still won't be able to implement rule 110 (it will be a pushdown automaton).

adrian_b

11 months ago

[-]

That sentence did not contain any reference to Turing machines.

"Rule 110" by itself is just a Boolean function, which, as I have mentioned is completely equivalent with the pair AND + XOR.

By using a suitably initialized memory and an automaton that besides other features that are needed to address the memory (a.k.a. "move the tape", when the memory is modeled as a tape or shift register) is able to compute "Rule 110", it is possible to build the equivalent of a Turing machine.

My point is that using "Rule 110" does not bring any simplification or any other advantage instead of just using the pair AND + XOR. The machine using "Rule 110" and which is equivalent with a Turing machine is completely equivalent with an otherwise identical machine, except that the "Rule 110" function is replaced by the AND + XOR pair. The only effect of "Rule 110" is to make the description of the machine more complicated and more obscure and if that machine were implemented in real hardware or software it would be more inefficient, due to redundant gates or computations.

Even using the equivalent machine with AND + XOR does not bring any advantage instead of the traditional definition of a Turing machine, which uses higher-level functions, whose implementation details are not relevant for proving Turing completeness.

eigenket

11 months ago

[-]

Rule 110 is not a simple boolean function. It's a cellular automata. The boolean function is part of its description but not the whole thing.

For example if you take the standard rule 110, but run it with a different background pattern (for example the one where every cell is by default in state 0) it isn't Turing complete any more.

I suggest you take a look at the proof that 110 is Turing complete (pdf here http://www.complex-systems.com/pdf/15-1-1.pdf). It doesn't just follow from elementary properties of the boolean gates AND and XOR.

lupire

11 months ago

[-]

To be fair, it's pretty nasty that the Rules are named ambiguously, where a critical part of the "Rule+" of interest is something (the background pattern) in the CA system but outside the Rule system. It's fixable, but still gross, and plays into Wolfram's style of making things seem more profound by hiding the ball.