FilterHN

6 months ago

[-]

Most explanations of C++'s std::move fail because they don't focus on its actual effect: controlling function overloading.

Most developers have no trouble getting the idea of C++'s function overloading for parameter types that are totally different, e.g. it's clear what foo("xyz") will call if you have:

   void foo(int x);
   void foo(std::string x);

It's also not too hard to get the idea with const and mutable references:

   void foo(std::string& x);
   void foo(const std::string& x);

Rvalue references allow another possibility:

   void foo(std::string&& x);
   void foo(const std::string& x);

(Technically it's also possible to overload with rvalue and non-const regular references, or even all three, but this is rarely done in practice).

In this pairing, the first option would be chosen for a temporary object (e.g. foo(std::string("xyz")) or just foo("xyz")), while the second would be chosen if passing in a named variable (std::string x; foo(x)). In practice, the reason you bother to do this is so the first overload can pilfer memory resources from its argument (whereas, presumably, the second will need to do a copy).

The point of std::move() is to choose the first overload. This has the consequence that its argument will probably end up being modified (by foo()) even though std::move() itself does not contain any substantial code.
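
A minimal sketch of that overload pair, assuming each foo simply returns a label so the chosen overload is visible (the return type, Picks, and demo_overloads are made up for illustration):

```cpp
#include <string>
#include <utility>

// Hypothetical overload pair: each returns a label naming itself.
const char* foo(std::string&&)      { return "rvalue overload"; }
const char* foo(const std::string&) { return "const-lvalue overload"; }

// Records which overload each kind of argument selects.
struct Picks {
    const char* temporary;
    const char* named;
    const char* moved;
};

Picks demo_overloads() {
    std::string x = "xyz";
    Picks p;
    p.temporary = foo(std::string("xyz")); // a temporary binds to std::string&&
    p.named     = foo(x);                  // a named variable binds to const std::string&
    p.moved     = foo(std::move(x));       // the cast makes x eligible for std::string&&
    return p;
}
```

Calling demo_overloads() from main and printing the three fields should show the rvalue overload for the temporary and for std::move(x), and the const-lvalue overload for plain x. Nothing here modifies x, because these foo bodies never touch their argument.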

All of the above applies to constructors, since they are functions and they can also be overloaded. Therefore, the following function is very similar in most practical situations since std::string has overloaded copy and move constructors:

   void foo(std::string x);
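
To make the constructor overloading visible, here is a rough sketch using a hypothetical Tracer type in place of std::string (Tracer and Origin are not from the comment, they are only there to record which constructor ran):

```cpp
#include <utility>

// Hypothetical tracer type that remembers which constructor built it.
struct Tracer {
    enum class Origin { Fresh, Copied, Moved };
    Origin origin = Origin::Fresh;

    Tracer() = default;
    Tracer(const Tracer&) : origin(Origin::Copied) {}
    Tracer(Tracer&&) noexcept : origin(Origin::Moved) {}
};

// By-value parameter: the overload resolution now happens among
// Tracer's constructors when the parameter x is initialized.
Tracer::Origin foo(Tracer x) { return x.origin; }
```

With a named Tracer t, foo(t) reports Copied and foo(std::move(t)) reports Moved, which is the same choice the two-overload version of foo makes explicitly.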

rocqua

6 months ago

[-]

To clarify, you are saying the point of std::move is that it returns an rvalue reference, allowing the called function to pick the overload variant that is allowed to trample and destroy its argument?

Specifically, what you did not make clear is the return type of std::move.

6 months ago

[-]

Yes that's exactly right.

ryanianian

6 months ago

[-]

std::move is just a cast operation. A better name might be std::cast_as_rvalue to force the overload that allows it to forward to move constructors/etc that intentionally "destroy" the argument (leave it in a moved-from state).
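
A small sketch of the "it is only a cast" point; cast_only and cast_and_consume are names invented for this example:

```cpp
#include <string>
#include <type_traits>
#include <utility>

// std::move applied to a std::string lvalue has type std::string&&.
static_assert(std::is_same<decltype(std::move(std::declval<std::string&>())),
                           std::string&&>::value,
              "std::move yields an rvalue reference");

// The cast alone leaves the object untouched: nothing consumes the reference.
std::string cast_only() {
    std::string s = "hello";
    std::string&& r = std::move(s);  // just a reference bound to s
    (void)r;
    return s;                        // s still holds "hello"
}

// Writing the cast by hand selects the move constructor, just as std::move would.
std::string cast_and_consume() {
    std::string s = "hello";
    std::string t = static_cast<std::string&&>(s);
    return t;
}
```

Both functions return "hello"; the difference is only in what state s is left in, and that is decided by the move constructor, not by the cast.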

tialaramex

6 months ago

[-]

They don't destroy the argument - this is of course a big problem because the semantic programmers actually wanted (even when C++ 98 didn't have move and papers were proposing this new feature) was what C++ programmers now call "destructive move" ie the move Rust has. This is sometimes now portrayed as some sort of modern idea, but it actually was clearly what everybody wanted 15-20 years ago, it's just that C++ didn't deliver that.

What they got was this awful compromise: it's not destroyed. C++ promises that it will only finally be destroyed when the scope ends, and always then, so instead some "hollowed out" state is created which is some state (usually unspecified but predictable) in which it is safe to destroy it.

Creating the "hollowed out" new state for the moved-from object so that it can later be destroyed is not zero work, it's usually trivial, but given that we're not gaining any benefit by doing this work it's pure waste.

This constitutes one of several unavoidable performance leaks in modern C++. They're not huge, but they're a problem when you still have people who mistake C++ for a performance language rather than a language like COBOL focused intently on compatibility with piles of archaic legacy code.

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2002/n13...

6 months ago

[-]

Thanks for pointing this out. It's an absolute myth that C++ move semantics are due to backwards compatibility. The original paper on move semantics dating back to 2002 explicitly mentions destructive move semantics by name:

It does bring up an issue involving how to handle destructive moves in a class hierarchy, and while that's an issue, it's a local issue that would need careful consideration only in a few corner cases as opposed to the move semantics we have today which sprinkle the potential for misuse all over the codebase.

uvas_pasas_per

6 months ago

[-]

I started a new project recently and chose C++ because I wanted cross platform, and a language that let me write the highest performance code I could imagine. C is so lacking in abstractions, I don't think I can deal with it. But C++ is such a pain, I keep looking at Rust and feeling temptation. I'm doing some number crunching, and geometric algorithms, among other things. Not sure if Rust is as good as C++ there.

tialaramex

6 months ago

[-]

I'm the wrong person to ask probably because for me Rust seemed like home almost immediately and that's not most people's reaction.

The brute optimisation for Rust is being done by LLVM, just like if you used Clang to compile C++, so your pure number crunching ought to be fine. If anything you may find it's easier to end up correctly expressing the thing you meant with good performance in Rust. If you rely on a C++ library of geometric algorithms, clearly "I can't find an equivalent in Rust" would be a showstopper and so it's worth stopping by crates.io to try a few searches for whatever keywords are in your head.

Also, if you know that learning new stuff fogs up your process, you might not want to try to both learn Rust and work on this novel project simultaneously. Some people thrive pairing learning a language with a new project, others hate that and would rather pick, either do something old in a new language, or do something new in an existing one.

If you decide this isn't the right time but keep feeling a twinge, I encourage you to try it for something else, not everybody is going to like Rust, but it's a rare C++ programmer who spends serious time learning Rust and then decides there was nothing they valued from the experience -- particularly if you have no experience in an ML (F# or Ocaml are modern examples)

uvas_pasas_per

6 months ago

[-]

Thanks. I've learned a lot of languages and enjoy doing it, especially when much of it is a step up, so not a problem there. I may need to just dive in and try it out on a larger project. It was only after doing that with C++ where I really understood what I liked and what I didn't. A lot of the latter is the tooling/IDEs, which doesn't show up reading about the language. One thing I'm not sure about with Rust is porting a UI class hierarchy from C++. Base class `View`, sub classes `Button`, `VStack`, `TextField`, etc. I see how to replace virtual functions with a trait and impls for the various types. But for stuff (fields or methods) shared in the base class, this looks like one area where Rust is uglier than C++.

shortrounddev2

6 months ago

[-]

You can trample and destroy a regular lvalue reference as well. The point of casting to an rvalue reference (and invoking the rvalue reference constructor) is to copy a pointer to the underlying data of one container to a new container and then delete the pointer on the original container (set it to null, not destroy the data). This has the effect of transferring ownership of the underlying data from one container to the other. You can do this with an lvalue reference as well, but the semantics are different.

This is useful for copying the data of a temporary string to another string without actually copying each byte of the data. Since the underlying characters live in the heap, there's no point in copying each byte to a new area in the heap. Instead, use move semantics to transfer ownership of the pointer to a new string container
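
Roughly what that pointer hand-off looks like, using a hypothetical Buffer class as a stand-in for a string or vector (real standard library containers are more involved than this):

```cpp
#include <cstddef>
#include <utility>

// Hypothetical owning buffer, standing in for a string or vector.
class Buffer {
public:
    explicit Buffer(std::size_t n) : data_(new char[n]()), size_(n) {}
    ~Buffer() { delete[] data_; }  // deleting a null pointer is a no-op

    // Copy: allocate new storage and duplicate every byte.
    Buffer(const Buffer& other)
        : data_(new char[other.size_]), size_(other.size_) {
        for (std::size_t i = 0; i < size_; ++i) data_[i] = other.data_[i];
    }

    // Move: take the pointer, then null the source so only one owner remains.
    Buffer(Buffer&& other) noexcept : data_(other.data_), size_(other.size_) {
        other.data_ = nullptr;
        other.size_ = 0;
    }

    Buffer& operator=(const Buffer&) = delete;  // assignment left out for brevity

    const char* data() const { return data_; }
    std::size_t size() const { return size_; }

private:
    char* data_;
    std::size_t size_;
};
```

Constructing Buffer b(std::move(a)) leaves b holding a's original pointer and a holding nullptr, so a's destructor still runs safely at scope exit but frees nothing.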

bluescarni

6 months ago

[-]

> So apparently, move does not prevent generation of a copy, but the empty string instead of expected text “Dave” is very interesting. Apparently, after termination of show after the move, the object is invalidated. This does not affect the Person object, but only the string object. Recognize that I speak about a factual behavior on the hardware. I think we have undefined behavior here. And no compilation error.

There is a lot wrong in this paragraph:

- a "copy" was not generated, at least not in the sense that the actual content of the string was copied anywhere;

- there's no undefined behaviour here and no invalidation of the string. Standard library types are required to be left in an unspecified but valid state after move. "Valid" here means that you can go on and inspect the state of the string after move, so you can query whether it is empty or not, count the number of characters, etc. etc. "Unspecified" means that the implementation gets to decide what is the status of the string after move. For long enough strings, typical implementation strategy is to set the moved-from string in an empty state.
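
A short sketch of what "valid but unspecified" allows in practice; reuse_after_move is a name invented here, and the 64-character length is only an assumption meant to exceed typical small-string buffers:

```cpp
#include <string>
#include <utility>

// Reuses a moved-from string without relying on what it happens to contain.
std::string reuse_after_move() {
    std::string source(64, 'x');
    std::string sink = std::move(source);  // source is now valid but unspecified

    // Queries without preconditions are fine; only their answers are unspecified.
    (void)source.size();
    (void)source.empty();

    source.clear();    // now definitely empty
    source = "fresh";  // or assign a known value
    return source + ":" + std::to_string(sink.size());
}
```

The function returns "fresh:64": the destination is guaranteed to hold the original 64 characters, and the source is usable again as soon as it has been put into a known state.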

flohofwoe

6 months ago

[-]

> at least not in the sense that the actual content of the string was copied anywhere

...unless it's a short string within the limits of the small-string-optimization capacity.

I think what confuses many people is that a C++ move assignment still can copy a significant amount of bytes since it's just a flat copy plus 'giving up' ownership of dangling data in the source object.

For a POD struct, 'move assignment' and 'copy assignment' are identical in terms of cost.
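
This can be checked with type traits. A sketch with a hypothetical Samples struct that owns nothing:

```cpp
#include <type_traits>
#include <utility>

// Hypothetical plain struct with no owned resources.
struct Samples {
    int a[1000];
};

static_assert(std::is_trivially_copyable<Samples>::value,
              "a flat byte copy is a valid copy");
static_assert(std::is_trivially_move_constructible<Samples>::value,
              "the implicit move constructor is a flat copy too");

// Returns true if the source still holds its values after being "moved" from.
bool source_survives_move() {
    Samples src{};
    src.a[0] = 7;
    src.a[999] = 9;
    Samples dst = std::move(src);  // copies all 1000 ints
    return src.a[0] == 7 && src.a[999] == 9 &&
           dst.a[0] == 7 && dst.a[999] == 9;
}
```

There is nothing to steal from a struct like this, so the "move" costs the same as a copy and leaves the source unchanged.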

nemetroid

6 months ago

[-]

The same is true of Rust. I have no idea why the author decided to print addresses only for C++ and not for Rust.

  // (1)
  struct Person {
      name: String,
      age: u8,
  }
  
  fn show(person: Person) {
      println!("Person record is at address  {:p}", &person);
      println!("{} is {} years old", person.name, person.age);
  }
  
  fn main() {
      let p = Person { name: "Dave".to_string(), age: 42 }; // (2)
      println!("Person record is at address  {:p}", &p);
      show(p); // (3)
  }

Its output is:

  Person record is at address  0x7ffcfb2b4e40
  Person record is at address  0x7ffcfb2b4ec0
  Dave is 42 years old

6 months ago

[-]

I feel like that's a pedantic detail. True, yes, but irrelevant. You may as well also point out that the return address is going to be copied to the instruction pointer when the constructor returns.

6 months ago

[-]

It's a real semantic difference, not a pedantic detail: It means that there is a practical reason that the moved-from object could be non-empty.

A few standard library types do guarantee that the moved-from object is empty (e.g., the smart pointer types).
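
For instance, std::unique_ptr and std::shared_ptr are specified to be null after being moved from. A quick sketch (the function name is made up):

```cpp
#include <memory>
#include <utility>

// Returns true if both smart pointers are null after being moved from.
bool smart_pointers_are_null_after_move() {
    std::unique_ptr<int> u1(new int(42));
    std::unique_ptr<int> u2 = std::move(u1);

    std::shared_ptr<int> s1 = std::make_shared<int>(7);
    std::shared_ptr<int> s2 = std::move(s1);

    return u1 == nullptr && *u2 == 42 &&
           s1 == nullptr && *s2 == 7 && s2.use_count() == 1;
}
```

Unlike the string case, code may rely on this result on any conforming implementation.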

For some others (basically, all containers except string), it is not explicitly stated that this is the case but it is hard to imagine an implementation that doesn't (due to time complexity and iterator invalidation rules). Arguably, this represents a bigger risk than string's behaviour, but it's still interesting.

6 months ago

[-]

>It's a real semantic difference, not a pedantic detail

What's the semantic difference? Of course moving a class will involve some amount of copying. How could it be any other way? If you have something like struct { int a[1000]; }, how are you supposed to move the contents of the struct without copying anything? What, you take a pair of really tiny scissors and cut a teeny tiny piece of the RAM, then glue the capacitors somewhere else?

Joker_vD

6 months ago

[-]

> how are you supposed to move the contents of the struct without copying anything?

By taking the physical page this one struct resides in, and mapping it into the virtual address space the second time. This approach is usually used in the kernel-level development, but there has been a lot of research done since the seventies on how to use it in runtimes for high-level programming languages.

Now, it does involve copying an address of this struct from one place to another, that I cede.

6 months ago

[-]

Sure. At the cost of needing >=4K per object, since otherwise "moving" an object involves also moving the other objects sharing the same page.

Asraelite

6 months ago

[-]

I think it's a worthwhile distinction to bring up because it highlights a common misconception people have about strings and vectors. A string value is not the string content itself, just a small struct containing a pointer and other metadata. If we're talking about the in-depth semantics of a language then it's important to point out that this struct is the string, and the array of UTF-8 characters it points to is not. C++ obfuscates this distinction because of how it automatically deep copies vectors and strings for you in many cases.

epcoa

6 months ago

[-]

> then it's important to point out that this struct is the string, and the array of UTF-8 characters it points to is not.

So then under this model, what’s the difference between a string and a string_view?

otabdeveloper4

6 months ago

[-]

> So then under this model, what’s the difference between a string and a string_view?

string_view doesn't do any deep copying.

Asraelite

6 months ago

[-]

...one is a string and one is a string view?

I'm not sure what you're getting at. They're both small structs holding pointers to char data, they just operate on that data differently.

6 months ago

[-]

Exactly, thinking about things in terms of their implementations is usually not a good way to actually understand what that thing is. By arguing that std::string is just the struct itself, which consists of who knows what... you fail to appreciate the actual semantics of std::string and how those semantics are really what defines the std::string.

std::string_view also has implementation details that in principle could be similar to std::string, it's a pointer with a size, but the semantics of std::string_view are very different from the semantics of std::string.

And that's the crux of the issue, it's better to understand classes in terms of their semantics, how they operate, rather than their implementations. Implementations can change, and two very separate things can have the same or very similar implementations.

A std::string is not just some pointers and some record keeping data; a std::string is best understood as a class used to own and manage a sequence of characters with the various operations that one would expect for such management. A std::string_view is non-owning, read-only variation of such a class that operates on an existing sequence of characters.
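
The owning versus non-owning difference shows up as soon as you copy one of each. A sketch, assuming C++17 for std::string_view (function names are invented for the example):

```cpp
#include <string>
#include <string_view>

// Copying a std::string duplicates the characters, so the copies are independent.
bool string_copy_is_independent() {
    std::string original = "owning";
    std::string copy = original;
    copy[0] = 'O';
    return original == "owning" && copy == "Owning";
}

// Copying a std::string_view copies only the pointer and length,
// so every view observes edits made through the owner.
bool view_copy_shares_characters() {
    std::string original = "owning";
    std::string_view v1 = original;
    std::string_view v2 = v1;
    original[0] = 'O';  // in-place edit, no reallocation
    return v1 == "Owning" && v2 == "Owning" && v1.data() == original.data();
}
```

Both functions return true. The two types may look similar in memory, but only std::string is responsible for the lifetime of the characters.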

How these are implemented and their structural details is not really what's important, it's how someone is expected to use them and what can be done with them that counts.

Asraelite

6 months ago

[-]

My original comment was just saying that it's useful to point out to people that the concrete representation of a string in memory is a struct when relevant, since some people might not realize that. I'm not claiming anything about the best way to think about it overall.

> How these are implemented and their structural details is not really what's important

Usually this isn't important, unless you're talking about low level details impacting performance, which is exactly what the article is about.

epcoa

6 months ago

[-]

> Usually this isn't important, unless you're talking about low level details impacting performance,

And if you’re going down that path, the string may not have a pointer at all.

“A string value is not the string content itself”, but in most cases it is if the string is short enough, implementation dependent disclaimer and all that.

epcoa

6 months ago

[-]

That I think the description “the array is not the string” isn’t very elucidating for someone that doesn’t understand the nuance of the ownership/lifetime and move semantics (the topic of the article).

“C++ obfuscates this distinction because of how it automatically deep copies vectors and strings”

It does this because it has to, to guarantee its interface invariants. That “array” (if there is one) really is the string. Just because there might be an indirection doesn’t change that.

> they just operate on that data differently.

Well they operate on the memory “array” of the char data differently (well in the latter not at all).

Also a nitpick: std::string unlike String in Rust or other languages is not married to an encoding. And C++ managed to fuck that one up even more so recently.

jvanderbot

6 months ago

[-]

It should be, but it's very much not in the real world at least as far as I've seen.

Using std::move for anything other than "unique ownership without pointers" really messes things up. People put std::move everywhere expecting performance gains, just like we used to put "&" everywhere expecting performance gains. It's a bit of cargo cultism that can be nicely dispelled by realizing std::move is just std::copy with a compiler-defined constructor invocation potentially run to determine the old value. With that phrasing, it's hard to hallucinate performance gains that might come automatically.

6 months ago

[-]

> std::move is just std::copy with a compiler-defined constructor invocation potentially run to determine the old value

I have no idea what that means.

std::move is a cast to an rvalue reference. That can potentially trigger a specific overloaded function to be selected and possibly, ultimately, a move constructor or assignment operator to be called.

For an explicit move to be profitable, an expression would have otherwise chosen a copy constructor for a type with an expensive copy constructor and a cheap move constructor.

std::copy is a range algorithm, not sure what's the relevance.

jvanderbot

6 months ago

[-]

Yes, typed too fast. I meant the explicit copy constructor. Luckily, HN will hide my garbage text quickly enough. Thanks for the correction!

https://devblogs.microsoft.com/oldnewthing/20231124-00/?p=10...

colejohnson66

6 months ago

[-]

In fact, using std::move everywhere can actually make your performance worse!
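
One common case is returning a local variable. A sketch with a hypothetical Counted type that counts move constructions; the exact count for the plain version depends on whether the compiler elides the copy, which it is allowed but not required to do:

```cpp
#include <utility>

// Hypothetical type that counts how often it is move-constructed.
struct Counted {
    static int moves;
    Counted() = default;
    Counted(const Counted&) = default;
    Counted(Counted&&) noexcept { ++moves; }
};
int Counted::moves = 0;

// Plain return: the local is eligible for copy elision (NRVO).
Counted make_plain() {
    Counted c;
    return c;
}

// Returning std::move(c): the cast rules elision out, so a move always runs.
Counted make_forced() {
    Counted c;
    return std::move(c);
}

// Runs a factory and reports how many move constructions it caused.
int moves_during(Counted (*factory)()) {
    Counted::moves = 0;
    Counted result = factory();
    (void)result;
    return Counted::moves;
}
```

moves_during(make_forced) is always at least 1, while moves_during(make_plain) is never higher than that and is typically 0 on current compilers. Some compilers also warn about the second form.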

jvanderbot

6 months ago

[-]

The real gem of the article is the interlude. E.g., reaching back to C days and pointing out that "It's either copy, or pointer". Once someone has that mental model solidly in hand, all the syntax sugar in the world cannot harm you.

Also "It was an ergonomic advancement." hides a lot of the overwrought syntax sugar in C++ that causes it to be such a weird language if you come from elsewhere. But still an excellent insight into the state of affairs.

I think the "Apparently" language makes it seem like this is some kind of accident that nobody would know about, when really the author was probably just being a creative writer, and the example was fundamental to the post.

6 months ago

[-]

You can think of a c++ move as a shallow copy that takes ownership of all objects originally owned by the source.

mort96

6 months ago

[-]

I mean it'll copy 3 pointers worth of data in all cases. It's just that for short strings, those 3 pointers worth of data contains the text of the string.

bluGill

6 months ago

[-]

There is a lot wrong, but your analysis misses the elephant: the function takes a copy and so a copy must be generated. std::move will move if possible but in this case move isn't possible and so a copy will be made.

Move is allowed to not move because in generic code you don't want to have to check for if move is possible for the type in question.
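
Two situations where std::move quietly falls back to a copy, sketched with hypothetical CopyOnly and Both types (this illustrates the general fallback, not the article's Person example):

```cpp
#include <utility>

// Hypothetical type with a copy constructor only; declaring it
// suppresses the implicit move constructor.
struct CopyOnly {
    bool copied = false;
    CopyOnly() = default;
    CopyOnly(const CopyOnly&) : copied(true) {}
};

// Hypothetical type with both constructors.
struct Both {
    char how = 'n';  // 'c' = copied, 'm' = moved, 'n' = neither
    Both() = default;
    Both(const Both&) : how('c') {}
    Both(Both&&) noexcept : how('m') {}
};

bool copy_only_still_compiles() {
    CopyOnly a;
    CopyOnly b = std::move(a);  // the rvalue binds to const CopyOnly&, so this copies
    return b.copied;
}

char move_from_const() {
    const Both a{};
    Both b = std::move(a);      // const Both&& cannot bind to Both&&, so this copies
    return b.how;
}

char move_from_mutable() {
    Both a;
    Both b = std::move(a);      // binds to Both&&, so this moves
    return b.how;
}
```

Generic code can therefore write std::move unconditionally: it compiles either way, and the type decides whether anything is actually moved.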

GrantMoyer

6 months ago

[-]

In the case of the example, there is a move, and std::move works in the example.

The function, show, doesn't take a copy, it takes a Person object. Persons can be copy constructed or move constructed (both constructors are implicit, since there's no user-defined constructors). std::move returns an r-value reference to main's p, so Person's implicit move constructor is called, and show's p argument is move constructed from main's p. The reported address changes because moving creates a new object in C++, but the moved-to object may take ownership of the heap allocated memory and other resources from the moved-from object.

In this case, the moved-to Person takes ownership of the heap allocation from the moved-from Person's string member and sets the moved-from Person's string member to an empty string. Without std::move, show's p is copy constructed, including its string member.
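
A reconstruction of that situation as a sketch. The Person layout is assumed from the discussion, and Report, call_with_move, and call_with_copy are invented here so the result can be inspected instead of printed:

```cpp
#include <cstdint>
#include <string>
#include <utility>

// Assumed shape of the article's record.
struct Person {
    std::string name;
    std::uint8_t age;
};

struct Report {
    bool distinct_object;  // parameter lives at a different address than the caller's object
    std::string name;      // what the parameter held
};

// Takes Person by value, like the article's show().
Report show(Person person, const Person* caller_object) {
    return Report{ &person != caller_object, person.name };
}

Report call_with_move() {
    Person p{ "Dave", 42 };
    return show(std::move(p), &p);  // parameter is move constructed from p
}

Report call_with_copy() {
    Person p{ "Dave", 42 };
    return show(p, &p);             // parameter is copy constructed from p
}
```

Both calls report a distinct object holding "Dave", which is why comparing addresses cannot tell a move from a copy. The difference is only in whether the string's resources were duplicated or handed over, and for a name as short as "Dave" many implementations keep the characters inline anyway.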

littlestymaar

6 months ago

[-]

C++ making the most inscrutable semantic possible, speedrun any %.

virtualritz

6 months ago

[-]

> "Unspecified" means that the implementation gets to decide what is the status of the string after move. For long enough strings, typical implementation strategy is to set the moved-from string in an empty state.

Thusly, what happens in code that accesses the string after the move is UB.

In the implementation of C++ the article uses, the string was just empty. But for all we know it may still contain a 1:1 copy of the original or 20 copies or a gobbledygook of bytes.

Any code that relies on the string being something (even empty) may behave different if it isn't. That's the very definition of UB.

"A typical implementation strategy" is meaningless for someone writing code against a language specification.

You're then writing code against a specific compiler/std lib and that's fine. But let's be honest about it.

UncleMeat

6 months ago

[-]

That's not what UB means. "This will behave differently on different implementations" is implementation defined behavior. Compilers are not allowed to assume that implementation defined behavior never occurs or reject your program if they can prove that it happens.

Undefined behavior is a stronger statement and says that if the behavior occurs then the entire program is simply not valid. This allows the compiler to make vastly more aggressive changes to your program.

6 months ago

[-]

There is nothing in the standard or definition of C++ that states that undefined behavior renders a program invalid.

On the contrary the actual C++ standard explicitly states that permissible undefined behavior includes, and I quote "behaving during translation or program execution in a documented manner characteristic of the environment".

It's also worth noting that numerous well known and used C++ libraries explicitly make use of undefined behavior, including boost, Folly, Qt. Furthermore, as weird and ironic as this sounds, implementing cryptographic libraries is not possible without undefined behavior.

6 months ago

[-]

"valid program" is not really a term that is used in the standard (I only count one normative usage). What the standard does say is:

"A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this document places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation)."

I.e. a program that contains UB is undefined.

Of course, as you observe, an implementation can go beyond the standard and extend the abstract machine to give defined semantics to those undefined operations.

That's still different from implementation defined behaviour, where a conforming implementation must give defined semantics.

bluescarni

6 months ago

[-]

> Thusly, what happens in code that accesses the string after the move is UB.

No, it is implementation-defined behaviour.

> In the implementation of C++ the article uses the string was just empty. But for all we know it may still contain a 1:1 copy of the original or 20 copies or a gobbledygook of bytes.

Yes, and if you want to make sure that the string is empty before you do something else with it, you just use a clear() (which will be optimised away by the compiler anyway).

Or, if you prefer, you can assign another string to it, or anything else really.

> Any code that relies on the string being something (even empty) may behave different if it isn't. That's the very definition of UB.

No it is not.

> "A typical implementation strategy" is meaningless for someone writing code against a language specification.

Then don't rely on that specific implementation detail and make sure that the string is in the state you want or, even better, don't touch the moved-from string ever again.

nayuki

6 months ago

[-]

Some basic things in the article appear to be factually wrong.

> Then we ask us the following questions:

> 1. When we passed Dave to show, did we create a copy?

> 2. If so, how do we avoid creating a copy?

> C++ example

> 1. Yes. You can insert cout << "Person record is at address " << &p << endl; before the call of show as well as the beginning of show. This reveals different memory addresses of the record.

Judging copies by the object's address is incorrect methodology. In both C++ and Rust, "moving" an object will still copy the struct fields, but will avoid copying any of the pointees (such as the variable-size array that the string owns).

> 2. Replace void show(Person person) with void show(Person& person). So only the function needs to change. The caller does not have to adapt to it.

Passing by reference is a different concept to moving. While the author used this approach for C++, they did not use the same approach for Rust. This is comparing apples to oranges.

ajross

6 months ago

[-]

> In both C++ and Rust, "moving" an object will still copy the struct fields, but

Most people consider a shallow copy a "copy", certainly a shallow copy isn't a "reference"! One of the big problems in this space is in fact the divergence of terminology that leads to arguments like this.

The introduction of move semantics to C++ was a terrible, terrible mistake; not because it doesn't solve a real problem but because the language is objectively much worse now as a routine tool for general developers. People used to hack on code to implement features, now they get confused over and argue about how many "&" characters they need in a function signature.

It was a problem that was best left unsolved, basically.

webnrrd2k

6 months ago

[-]

Re: "problem that was best left unsolved"

This is a good example of a hard-won life lesson... There might be a solution to a problem, but the solution is worse than the original problem. I semi-jokingly call this "the healing power of apathy". The reality of it is that, sometimes, there are problems in life where benign neglect is the best response.

otabdeveloper4

6 months ago

[-]

> now they get confused

Sounds like a skill issue. Maybe they should go shopping.

Jokes aside though, yeah, move semantics is taught badly. Once you start using it (say, with a unique_ptr in a container) it will quickly start making sense.

bluetomcat

6 months ago

[-]

This is a shallow understanding of C++. It happens because the Person object is a POD type that doesn't define a move constructor, and the compiler creates a default one that calls the move constructors of the members. The string member has a well-defined move constructor, but the primitive uint8_t type doesn't.

flohofwoe

6 months ago

[-]

A move constructor/operator for POD or primitive types doesn't make any sense in the first place though (also AFAIK an object that contains a std::string - like Person - is definitely not a POD?). Even if Person had a manually provided move-constructor and move-assignment-operator, a move would still perform a flat copy from the source to the destination object.

6 months ago

[-]

Correct on all counts. It is definitely not a POD nor a standard layout type (the modern version of POD).

mort96

6 months ago

[-]

Person has an implicitly generated constructor and destructor which calls std::string's constructor and destructor. It's non-POD.

bluetomcat

6 months ago

[-]

> It's non-POD.

For a stricter definition of POD which requires that byte-by-byte copies are possible. More informally, it's a POD because it only defines members and all the constructors and destructors are implicitly generated.

jcranmer

6 months ago

[-]

The historical notion of POD is that it's a class type that has no C++ shenanigans going on, and thus works like it does in C. As a result, while there are a few slightly different definitions of POD, all of them share the commonality that having a non-POD member makes the class non-POD; in other words, POD-ness has a recursive quality.

It doesn't make a lot of sense to not have this recursive quality to POD-ness, because the fact that C++ shenanigans are involved doesn't go away just because it's implicitly handled for you by the compiler.

flohofwoe

6 months ago

[-]

I've never seen this definition of 'POD' tbh, 'Plain Old Data' kinda implies that it behaves the same as a C struct when copying and destructing (e.g. the compiler is able to use a memcpy for copying, and destruction is a no-op - neither is the case when there's an embedded std::string object).

[1] https://en.cppreference.com/w/cpp/language/classes#POD_class

mort96

6 months ago

[-]

I haven't heard your personal informal definition of POD before. I've only concerned myself with the standard's definition of POD. If you were using a different definition of POD than the standard, you should have specified that. Or better yet, not used the term "POD", since it is widely understood to mean what the standard refers to as "POD".

EDIT: It seems I've had a slightly incorrect impression of "POD": what makes 'Person' non-POD isn't that it has an implicitly defined constructor but simply that it contains a non-POD type. The requirements for POD classes[1] includes "has no non-static data members of type non-POD class (or array of such types)". std::string is certainly a non-POD class, which makes discussion about Person's constructors and destructors moot. Not that it changes anything, but I don't wanna spread misinformation.

elteto

6 months ago

[-]

POD means you can memcpy without incurring undefined behavior, same as you would in C to copy a struct.
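
The memcpy criterion corresponds to std::is_trivially_copyable, so the question can be put to the compiler. A sketch, where Person is the assumed layout from the discussion and PlainPerson is a made-up C-style counterpart:

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <type_traits>

// Assumed shape of the article's record: contains a std::string.
struct Person {
    std::string name;
    std::uint8_t age;
};

// Hypothetical C-style counterpart with an inline character array.
struct PlainPerson {
    char name[16];
    std::uint8_t age;
};

static_assert(!std::is_trivially_copyable<Person>::value,
              "the std::string member rules out memcpy");
static_assert(std::is_trivially_copyable<PlainPerson>::value,
              "a flat byte copy is a valid copy");

// Copying by bytes is well defined only for the trivially copyable struct.
PlainPerson copy_bytes(const PlainPerson& src) {
    PlainPerson dst;
    std::memcpy(&dst, &src, sizeof dst);
    return dst;
}
```

copy_bytes on a PlainPerson{"Dave", 42} yields an identical record. The same memcpy on Person would duplicate the string's internal pointer without its ownership, which is exactly the C++ shenanigans the POD notion is meant to exclude.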

6 months ago

[-]

you are probably confusing POD with aggregate.

6 months ago

[-]

Q: What's "Hylo"? Should I have heard of it?

A: It's a niche programming language the author is involved with. It's not widely-used enough to get its own Wikipedia page. It used to be called "Val". See: https://www.hylo-lang.org/

https://www.youtube.com/watch?v=5lecIqUhEl4

amaurose

6 months ago

[-]

It's the brainchild of Dave Abrahams, who is rather big in C++.

Gualdrapo

6 months ago

[-]

Maybe it's just me, but I'm no fan of them using the keyword `fun` to define a function. Nor Rust's `fn`.

Also is it a bit strange they wrote "rust" along all the article instead of "Rust"?

6 months ago

[-]

Well, you're no fun :-(

Anyway, that's pure bikeshedding. "function" is a full word in English, but almost 3x the length.

consp

6 months ago

[-]

And thus 3x more readable than fn. And otherwise it's bkshd for a 150% reduction.

zozbot234

6 months ago

[-]

To be fair, "fun" is also a full word in English. Also, it's just plain fun.

diggan

6 months ago

[-]

Personally I prefer `defn` for defining functions. `fn` is just a function that hasn't been declared or defined, obviously.

6 months ago

[-]

>Apparently, after termination of show after the move, the object is invalidated. This does not affect the Person object, but only the string object. Recognize that I speak about a factual behavior on the hardware. I think we have undefined behavior here. And no compilation error.

The std::string is not invalidated, it's reset to its empty state (i.e. null pointer and zero length). Standard classes are all in defined, valid states after being moved, such that using them again is safe. User-defined classes may be coded to be left in either valid or invalid states after being moved. It's the responsibility of the programmer to decide which is appropriate according to the situation. There are valid reasons to want to reuse a moved object. For example, you might want to force the release of an object's internal memory:

std::string() = std::move(s);

It's somewhat unfortunate that there's no way to signal to the compiler that an object is not safe for reuse, though.

6 months ago

[-]

> >Apparently, after termination of show after the move, the object is invalidated. This does not affect the Person object, but only the string object. Recognize that I speak about a factual behavior on the hardware. I think we have undefined behavior here. And no compilation error.

You're right to pick up on this. The author of the article is confused here, or at least using incorrect terminology. There's certainly no "undefined behaviour" going on.

But your corrections aren't quite right either, or at least use slightly odd definitions.

> User-defined classes may be coded to be left in either valid or invalid states after being moved.

No, even user defined classes have to be valid after a move, because their destructor will still be run. If you had your own vector-like class that points to invalid memory (or the same memory as the moved-to object) then you will get corruption when its destructor tries to free that memory.

Ok, it's true that you could manually define an "invalid" state in your class, perhaps by adding an internal Boolean flag which you set when the object is moved from. Then you could throw an exception or abort or whatever when any method (except the destructor) is called with this flag set. But you'd have to go out of your way to do this and I've never seen it done. I don't think this is what most people would understand your statement to mean.
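For what it's worth, the "internal Boolean flag" idea could look roughly like the sketch below. Token, value() and use_after_move_throws() are hypothetical names invented for illustration, not anything from the article or the standard library, and this is only one way to arrange it.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical class (not from the article): it remembers that it has been
// moved from and rejects every later use except destruction.
class Token {
public:
    explicit Token(std::string value) : value_(std::move(value)) {}

    Token(Token&& other) noexcept : value_(std::move(other.value_)) {
        other.moved_from_ = true;  // mark the source as unusable
    }

    const std::string& value() const {
        if (moved_from_) {
            throw std::logic_error("Token used after move");
        }
        return value_;
    }

private:
    std::string value_;
    bool moved_from_ = false;
};

// Returns true if reading a moved-from Token throws std::logic_error.
bool use_after_move_throws() {
    Token a("secret");
    Token b(std::move(a));
    assert(b.value() == "secret");  // the destination received the value
    try {
        (void)a.value();
    } catch (const std::logic_error&) {
        return true;
    }
    return false;
}
```

The destructor needs no special handling here, because the moved-from std::string member is still a valid object that can be destroyed normally.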

> The std::string is not invalidated, it's reset to its empty state (i.e. null pointer and zero length).

I'm not sure whether you're implying this is a strict requirement or just happens to be what happened in this case. In fact, the standard does not require this: the string could be left in any (valid, whatever that means) state. It could be empty, unchanged, or anything else. As other comments have noted, if the string's length is below the short string optimisation threshold then it's quite likely the original string will retain its value unchanged. Only a few specific types in the standard library have the guarantee that they will be empty after being moved from, and string isn't one of those.
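A small sketch of that distinction, with function names made up for illustration: std::unique_ptr is one of the types that is guaranteed to be empty (null) after a move, while for std::string the portable approach is to put the object back into a known state before reading it.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Returns true if the moved-from unique_ptr is null, which the standard guarantees.
bool unique_ptr_is_null_after_move() {
    auto p = std::make_unique<int>(42);
    auto q = std::move(p);
    return p == nullptr && *q == 42;
}

// A moved-from std::string is valid but unspecified, so this sketch resets it
// explicitly instead of assuming it is empty.
std::string reuse_after_move() {
    std::string s = "a string long enough to defeat any small-string optimisation";
    std::string t = std::move(s);
    assert(!t.empty());  // the destination holds the original contents
    s.clear();           // known state again, whatever the move left behind
    s += "reused";
    return s;
}
```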

6 months ago

[-]

>No, even user defined classes have to be valid after a move, because their destructor will still be run.

By "valid" I mean that you can use the object like normal; being able to destruct the object is not enough. If the destructor is unsafe to run (for example because the object ends up owning a dangling pointer) you just have an outright bug. An invalid state would be one where any further use of the object (other than destroying it) is an error.

>I'm not sure whether you're implying this is a strict requirement or just happens to be what happened in this case.

Yes, I'm saying that's what happened in that case. The string was not invalidated, it was reset.

stonemetal12

6 months ago

[-]

>No, even user defined classes have to be valid after a move, because their destructor will still be run.

So the compiler will complain and not compile your program?? Nope. It should be if you want a program that functions correctly, but have to? No, C++ doesn't force that on you.

alkonaut

6 months ago

[-]

This sounds like an enormous footgun (but as I understand it there are warnings that will tell you). An object isn't "valid" in any reasonable business logic sense just because the fields are initialized to anything at all, such as their default state? If the valid state of a Person is "the name is not empty" and this is enforced by a constructor, then I don't want the program to ever have a Person object floating around with a blank name? I either want a compiler error (good) or an immediate crash at runtime (bad), but at least I don't want an invalid object in a still running program (worse). Maybe I misunderstand what the reset was or how big this risk is though.

6 months ago

[-]

>An object isn't "valid" in any reasonable business logic sense just because the fields are initialized to anything at all, such as their default state

That very much depends on your use case.

>If the valid state of a Person is "the name is not empty " and this is enforced by a constructor then I don't want the program to ever have Person object floating around with a blank name

If you have such strict requirements then you shouldn't be moving around Persons to begin with. You should just be using std::make_unique() and then moving the pointer. Person should not even have a move constructor defined. If you code your class such that it's possible to let it reach an invalid state, that's no one's fault but your own.

Joker_vD

6 months ago

[-]

Even if the std::string was guaranteed to hold a "SORRY, THIS STRING HAS BEEN MOVED FROM, PLEASE CONTACT YOUR LOCAL STRING SUPPLIER" string in it after being moved from, I doubt this would actually help that much with the overall correctness of the application.

There are very, very few cases where it is "sensible" to do anything with such an "arbitrarily conjured" state other than disposing of or overwriting it. In fact, the only example I can vaguely remember (and can't for the life of me find on Google) is that one scheme of storing some sort of lookup index in two arrays that store indices into each other, where it's not necessary to zero out those arrays before using them because the access algorithm is cleverly arranged in such a way that no matter what numbers are stored in the unused parts of the arrays, it will still work correctly.

6 months ago

[-]

>Even if the std::string was guaranteed to hold a "SORRY, THIS STRING HAS BEEN MOVED FROM, PLEASE CONTACT YOUR LOCAL STRING SUPPLIER" string in it after being moved from

That's a rather weak "even if", given most implementations just reset to the empty string after moving.

>I doubt this would actually help that much with the overall correctness of the application.

Like I said, it depends on your use case. A pattern I use frequently when processing input is to have an accumulator that I build up progressively, and then when ready I move it into a result container, and since that resets the accumulator I can simply keep using it. If my algorithm required the initial state "SORRY, THIS STRING HAS BEEN MOVED FROM, PLEASE CONTACT YOUR LOCAL STRING SUPPLIER" rather than the empty string, such an idiosyncratic post-move value would be rather convenient.
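A minimal sketch of that accumulator pattern, assuming a simple separator-splitting task (split and check_split are illustrative names). Unlike the description above, this version calls clear() after each move, so it does not depend on what state the implementation leaves the moved-from string in.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Splits `input` on `sep`, moving each finished token into the result.
std::vector<std::string> split(const std::string& input, char sep) {
    std::vector<std::string> result;
    std::string acc;
    for (char c : input) {
        if (c == sep) {
            result.push_back(std::move(acc));
            acc.clear();  // explicit reset; no reliance on the moved-from state
        } else {
            acc.push_back(c);
        }
    }
    result.push_back(std::move(acc));
    return result;
}

// Small self-check of the sketch above.
void check_split() {
    const auto parts = split("a,bb,,c", ',');
    assert(parts.size() == 4);
    assert(parts[2].empty());
}
```

On implementations that already reset the source to empty, the extra clear() is cheap; it mainly makes the intent explicit.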

alkonaut

6 months ago

[-]

> If you code your class such that it's possible to let it reach an invalid state, that's no one's fault but your own.

I don't know C++ so I was given the impression in the article that the person writing the class could try very hard to make it impossible to reach an invalid state, but that this work could be ignored elsewhere by making a move of this kind which would work without any special requirements on the type itself.

6 months ago

[-]

You can delete the move constructor and the move assignment operator from a class, making it completely impossible to move its objects (other than through pointer arithmetic). If you have really specific class invariants it's what you should be doing. OR, the move functions should leave the moved-from members in valid states according to your invariants.
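As a rough sketch of the first option, here is a hypothetical PinnedPerson type (the name and the hand_over helper are invented for illustration) whose move operations are deleted. Ownership is then transferred by moving a std::unique_ptr to it rather than the object itself.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical type whose invariant is "name is never empty".
class PinnedPerson {
public:
    explicit PinnedPerson(std::string name) : name_(std::move(name)) {
        assert(!name_.empty());
    }

    // Declaring the move operations (even as deleted) also makes the
    // implicit copy operations deleted.
    PinnedPerson(PinnedPerson&&) = delete;
    PinnedPerson& operator=(PinnedPerson&&) = delete;

    const std::string& name() const { return name_; }

private:
    std::string name_;
};

static_assert(!std::is_move_constructible_v<PinnedPerson>);
static_assert(!std::is_copy_constructible_v<PinnedPerson>);

// Ownership can still change hands by moving a pointer to the object.
std::unique_ptr<PinnedPerson> hand_over(std::unique_ptr<PinnedPerson> p) {
    return p;
}
```

The object itself never moves, so no code can observe it with an empty name; only the pointer is moved, and a moved-from unique_ptr is simply null.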

6 months ago

[-]

The lack of so called destructive moves in C++ is not great. You either add a proper empty state to your type and make it properly part of the invariant, which is not always possible or meaningful, or you need a special moved from state for which your object invariant doesn't hold, which is "less than ideal" to say the least.

account42

6 months ago

[-]

While the language doesn't forbid use after move, occurrences of it are most likely a programmer error. Which is why clang-tidy has the bugprone-use-after-move check.

w10-1

6 months ago

[-]

Hasn't a language feature failed if even experts disagree on it? How would lay developers ever use it? This is not an algorithmic nicety; it's supposed to be second nature to write and automatic to read.

And it seems weird to omit Swift from this comparison, since Swift seems to have the most user-friendly (but incomplete?) implementation of move-only types.

6 months ago

[-]

Not even the people who implement C++ compilers can agree on how certain C++ features are supposed to work.

Night_Thastus

6 months ago

[-]

I can't say examples like this sell me on Rust, coming from C++. I need to manually call to_string() every single time I want to use strings?

And that bizarre scoping of Person p feels very un-intuitive. How would you work around that if you need to keep using it after show()? (Which is an extremely common use case)

winrid

6 months ago

[-]

to_string() gives you an owned string (like std::string) vs a borrowed string slice (kind of like char*). If you already have an owned string you don't need to do that obviously

If you need to keep using Person after calling show() then don't pass ownership to show() - you can pass a reference or a mutable reference, or use Rc<> etc

aseipp

6 months ago

[-]

A raw string literal gets embedded into the binary's data section at compile time, just like it would in C or C++. What this means is that the type of the string literal is actually a reference (to an underlying memory address). And so it has type '&str' which reflects the fact you are using a reference to a value that exists somewhere else.

The type 'String' is instead an "owned" type, which means that it is not a reference, and instead a complete value and has a copy of the data. to_string() will create a String (owned value) from a &str (reference) by copying it. This is no different than if you had a global static compile-time string in C and you wanted to modify or update it: you would memcpy the global (statically allocated) string into a local buffer of the appropriate size and then modify it and pass it onward to other things that need it. You would not modify the static string in place.

In short, no, you do not need to_string() every time you want to work with a string. You need it to convert a reference type to an owned type. Rust's type system is just used here to codify the more implicit parts of C or C++'s behavior that you are already familiar with, but the underlying bits and bytes behave as you would expect coming from C++.

> And that bizarre scoping of Person p feels very un-intuitive. How would you work around that if you need to keep using it after show()

You take a reference just like you would in C++. Possibly a mutable reference if you want to modify the thing and then use it afterwards. This is in the article as the "Advanced rust example" at the end, it's right there and not hidden or anything.

It isn't really bizarre honestly; it's a matter of defaults. The difference is that Rust uses move-by-default, not copy-by-default or ref-by-default. Every time you write `x = y` for a given owned type, you are doing a move of `y` into `x` and thus making `y` invalid.

    let g: &str   = "Austin";      // statically allocated string
    let x: String = g.to_string(); // do a copy
    let y: String = x;             // no copy, x is moved

Once you internalize this a lot more stuff will make sense, or at least it did for me.

Slyfox33

6 months ago

[-]

"Dave" by itself is basically the same as in c++, just a pointer to a string literal. Dave.to_string() is like std::string {"Dave"}, it allocates a heap based string from said literal. So you can use "Dave" perfectly fine if you just want a string literal.

saghm

6 months ago

[-]

> I think before rust, language designers mixed up the various properties these values can have. As a result, many incomprehensible designs were the result. rust models the most important memory-related properties through its two call conventions (passing or borrowing). And Hylo moves even more properties into the call conventions. Namely, Hylo uses the keywords let, set, sink, and inout. This way Hylo additionally represents e.g. initialization (rust models this with a separate type).

Is anyone able to clarify what's meant by "initialization" here and what "separate type" Rust uses for this (e.g. something defined specifically for each type getting passed this way, or a generic wrapper type in the standard library)? Offhand, my understanding is that three of the Hylo keywords listed correspond to passing by ownership, shared reference, or mutable reference in Rust, and whichever doesn't correspond to one of those is something that a separate type is used for in Rust, but I'm not confident that my understanding is correct because the only thing I can think of that might be related to "initialization" is constructors, which Rust notably does _not_ have any formal concept of in the language, since functions that return types are just like any other function implemented on a type without a self parameter.

I'm also not completely sure what the intended distinction is being made between whatever separate type is and references in Rust, since a reference is also a separate type from the type of the value it references. I could imagine someone might think that references are different than user-defined types in a way that other standard library types like Box and Arc aren't, but I'd argue that the unique syntax that references have is actually not that significant, and semantically being located inside std makes them far closer to references in terms of potentially behaving in special ways due to them having access to certain unstable APIs around things like allocations and the fact that std is developed in tandem with the compiler, which leaves the door open for those types to take advantage of any additional internal APIs that get added in the future.

Measter

6 months ago

[-]

They mean whether the value is properly initialized, as in all the bytes that make up that value have set values that are valid for that type. For example, in Rust the only valid values a boolean can have are 0 and 1, anything else is invalid. Notably, in the abstract machine, bytes actually have 257 values: 0-255 and uninitialized. Uninitialized means that an initialized value was never written to it. Reading a value that is not properly initialized is undefined behaviour, and optimization passes can result in unpredictable changes in behaviour of the code.

The type they mentioned is MaybeUninit (https://doc.rust-lang.org/std/mem/union.MaybeUninit.html), which is used to represent values that are not fully initialized. It's worth reading the documentation for that type.

saghm

6 months ago

[-]

Ah, I see. Since I don't touch unsafe Rust very much at all, I completely forgot about this type. It makes sense that having a "safe" way of dealing with this would be useful, especially for a "C++ successor" language.

hmry

6 months ago

[-]

My best guess is they're referring to writing functions that initialize something using an "out" parameter in Hylo, which would be equivalent to a "&mut MaybeUninit<...>" parameter in Rust.

pjmlp

6 months ago

[-]

> We learned that working on pointers directly often leads to memory bugs. So we introduced references.

Minor pedantic correction, references predate having pointers all over the place, in most systems languages.

C adopting pointers for all use cases isn't as great as they thought.

khold_stare

6 months ago

[-]

I see some confusion in the comments about C++ moves. I wrote an article in 2013 after it clicked for me: https://kholdstare.github.io/technical/2013/11/23/moves-demy... . It goes over motivation, how it works under the hood etc, has diagrams if you are a more visual learner.

https://docs.hylo-lang.org/language-tour/bindings

enugu

6 months ago

[-]

In this discussion of a specific point in the post, the promise of Hylo language and mutable value semantics can be overlooked.

Namely, we get a lot of the convenience of functional programming (mutating one variable doesn't change any other variable) with the performance of imperative languages (purely functional data structures have higher costs relative to in-place mutation and are more gc-intensive).

fuhsnn

6 months ago

[-]

Copy or move for C++ is just choosing which constructor/assignment overload to call. I believe it's possible to make C++ move-by-default if one goes through the trouble of overloading every class you use with custom move procedures.
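The overload selection can be made visible with a small counting type. Tracker and demo_overloads are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <utility>

// Hypothetical type that counts which constructor overload was selected.
struct Tracker {
    static inline int copies = 0;
    static inline int moves = 0;

    Tracker() = default;
    Tracker(const Tracker&) { ++copies; }
    Tracker(Tracker&&) noexcept { ++moves; }
};

void demo_overloads() {
    Tracker::copies = 0;
    Tracker::moves = 0;

    Tracker a;
    Tracker b = a;             // lvalue argument: copy constructor
    assert(Tracker::copies == 1 && Tracker::moves == 0);

    Tracker c = std::move(a);  // rvalue argument: move constructor
    assert(Tracker::copies == 1 && Tracker::moves == 1);
}
```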

eterevsky

6 months ago

[-]

In C++ you can force the move of the parameter by wrapping it with std::move(). This should take care of unnecessarily cloning the argument in the example.

masklinn

6 months ago

[-]

std::move does not force anything, it is a cast to an rvalue reference (a movable-from).

Whether the object is moved depends on whether the target / destination / sink cares.
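A short sketch of both cases, with measure, consume and demo_cast as made-up names: a sink that takes a const reference never moves, and a const source silently falls back to the copy constructor.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Takes its argument by const reference, so it has no way to move from it.
std::size_t measure(const std::string& s) { return s.size(); }

// Takes its argument by value, so an rvalue argument is moved into `s`.
std::size_t consume(std::string s) { return s.size(); }

void demo_cast() {
    std::string a = "hello";
    measure(std::move(a));  // only a cast: nothing is moved
    assert(a == "hello");

    const std::string b = "world";
    std::string c = std::move(b);  // const rvalue: the copy constructor is selected
    assert(b == "world" && c == "world");

    consume(std::move(a));  // here a move does happen; `a` is now valid but unspecified
}
```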

Thorrez

6 months ago

[-]

>I compiled the C++ examples with godbolt with “x86-64 gcc (trunk)” and “-Wall -Wextra -Wno-pessimizing-move -Wno-redundant-move”.

Edit: everything below is incorrect.

-Wno-pessimizing-move is automatically enabled by -Wall, so doesn't need to be specified manually. -Wno-redundant-move is automatically enabled by -Wextra, so doesn't need to be specified manually.

quuxplusone

6 months ago

[-]

-Wno-foo is turning off those warnings, not turning them on.

https://gcc.gnu.org/onlinedocs/gcc-9.1.0/gcc/C_002b_002b-Dia...

Thorrez

6 months ago

[-]

Wow, thanks. The gcc documentation appears to have a problem.

It lists -Wreorder as a warning, and says it's enabled by -Wall . It lists -Wno-pessimizing-move as a warning, and says it's enabled by -Wall .

I think the documentation should be edited to not list -Wno-pessimizing-move , and instead list -Wpessimizing-move .

cpp_noob

6 months ago

[-]

  struct Person {
    string name;
    uint8_t age;
  };

isn't this missing a move constructor?

  Person::Person(Person&& p) : name(std::move(p.name)), age(p.age) {}

or is C++ able to make these implicitly now?

6 months ago

[-]

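The move and copy constructors are implicitly generated here, since Person declares no copy operations, move operations, or destructor of its own.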
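One way to check this, as a sketch: std::string's copy constructor is not noexcept but its move constructor is, so the nothrow trait below can only hold if Person really has an implicit move constructor. The struct mirrors the one quoted above; make_and_move is an illustrative helper.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>
#include <utility>

struct Person {
    std::string name;
    std::uint8_t age;
};

// Holds only because the compiler generated a move constructor that moves `name`.
static_assert(std::is_nothrow_move_constructible_v<Person>);

Person make_and_move() {
    Person a{"Ada", 36};
    Person b = std::move(a);  // uses the implicitly generated move constructor
    assert(b.name == "Ada");
    return b;
}
```

Declaring a destructor or copy constructor in Person would suppress the implicit move constructor, and moves would then quietly become copies.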

6 months ago

[-]

Not sure why the author compares Rust's:

    println!("{} is {} years old", person.name, person.age);

with C++:

    cout << person.name << " is " << unsigned(person.age)
         << " years old" << endl;

... while C++ actually has:

    println("{} is {} years old", person.name, person.age);

essentially identical to Rust. See: https://en.cppreference.com/w/cpp/io/println

Aurelius108

6 months ago

[-]

It’s very new to the standard library (latest version of GCC this year was the first version to support it). Additionally, I’ve found that println adds 30+ seconds to my compile time even for hello world so I’ll be avoiding it unless I need it

6 months ago

[-]

> It’s very new

True, but Hylo is so new that it's not even an established language. Plus using this should serve to highlight the differences the author actually cares about between the languages.

6 months ago

[-]

https://godbolt.org/z/MTo11voes > println takes 9 seconds

https://godbolt.org/z/he6Phr7nG > cout takes 6 seconds

What machine / compiler are you on where the difference between these is 30 seconds? GCC is also quite a bit faster based on a quick test in godbolt.

nicce

6 months ago

[-]

> https://godbolt.org/z/MTo11voes > println takes 9 seconds https://godbolt.org/z/he6Phr7nG > cout takes 6 seconds

That is 50% increase.

6 months ago

[-]

I don't believe I claimed anywhere it is not a 50% increase. The OC said 30 second difference.

nicce

6 months ago

[-]

I missed the "Hello, world!" mention, but otherwise you only need to have 10 prints in your whole project to have the 30 second increase. That is pretty significant.

6 months ago

[-]

It is not linear in the number of prints. 1 vs 2 prints will likely have zero noticeable effect.

cjfd

6 months ago

[-]

Some people are noticing that println is very new. But there already is https://github.com/fmtlib/fmt and it has been there quite a long time.

account42

6 months ago

[-]

Using random libraries in example code isn't good practice though.

Still, even (C) printf would have been better than the iostreams monstrosity.

tovej

6 months ago

[-]

fmt is not a random library, it's the inspiration and reference implementation for std::format

Philpax

6 months ago

[-]

That would require introducing a dependency, which is a digression from the point of the article and would complicate reproduction for the reader.

6 months ago

[-]

I can assure you that using a new language is a substantially greater task than introducing a dependency (or using -std=c++23). So you might as well show off the latest and greatest for all the competitors.

vlovich123

6 months ago

[-]

Well C++23 is fairly new so they probably just didn't know about it?