Most developers have no trouble getting the idea of C++'s function overloading for parameter types that are totally different, e.g. it's clear what foo("xyz") will call if you have:
void foo(int x);
void foo(std::string x);
It's also not too hard to get the idea with const and mutable references:
void foo(std::string& x);
void foo(const std::string& x);
Rvalue references allow another possibility:
void foo(std::string&& x);
void foo(const std::string& x);
(Technically it's also possible to overload with rvalue and non-const regular references, or even all three, but this is rarely done in practice.)
In this pairing, the first option is chosen for a temporary object (e.g. foo(std::string("xyz")) or just foo("xyz")), while the second is chosen when passing in a named variable (std::string x; foo(x)). In practice, the reason you bother to do this is so that the first overload can pilfer memory resources from its argument (whereas, presumably, the second will need to make a copy).
The point of std::move() is to choose the first overload. This has the consequence that its argument will probably end up being modified (by foo()) even though std::move() itself does not contain any substantial code.
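To make the overload selection concrete, here is a minimal sketch. The names which(), OverloadChoices and demo_overloads() are made up for illustration; the overloads only return a label, so nothing is actually pilfered from the argument.

```cpp
#include <string>
#include <utility>

// Hypothetical overload pair mirroring the foo() pairing above; each overload
// returns a label so the selected one can be observed.
inline const char* which(std::string&&) { return "rvalue overload"; }
inline const char* which(const std::string&) { return "const lvalue overload"; }

// Which overload was picked for each kind of argument.
struct OverloadChoices {
    const char* temporary;
    const char* named;
    const char* moved;
};

inline OverloadChoices demo_overloads() {
    std::string x = "xyz";
    OverloadChoices r{};
    r.temporary = which(std::string("xyz"));  // a temporary binds to std::string&&
    r.named = which(x);                       // a named variable is an lvalue
    r.moved = which(std::move(x));            // the cast makes x eligible for &&
    return r;
}
```

Calling demo_overloads() should report the rvalue overload for the temporary and for std::move(x), and the const lvalue overload for the plain named variable.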
All of the above applies to constructors, since they are functions and they can also be overloaded. Therefore, the following function is very similar in most practical situations since std::string has overloaded copy and move constructors:
void foo(std::string x);
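A rough way to see the by-value version at work is to count constructor calls. Tracer, sink() and the two counters below are hypothetical names for this sketch, standing in for std::string and foo().

```cpp
#include <utility>

// Counters for this sketch only (assumed names, not from the comment above).
static int tracer_copies = 0;
static int tracer_moves = 0;

// Hypothetical stand-in for std::string that records how it was constructed.
struct Tracer {
    Tracer() = default;
    Tracer(const Tracer&) { ++tracer_copies; }
    Tracer(Tracer&&) noexcept { ++tracer_moves; }
};

// By-value parameter: the caller's argument expression decides which
// constructor is used to build x.
inline void sink(Tracer x) { (void)x; }

inline void demo_sink() {
    tracer_copies = 0;
    tracer_moves = 0;
    Tracer t;
    sink(t);             // lvalue argument: copy constructor
    sink(std::move(t));  // std::move'd argument: move constructor
}
```

After demo_sink() each counter should be 1, so a single by-value signature covers both cases, at the cost of one extra move compared with the reference overloads.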
Specifically, what you did not make clear is the return type of std::move.
What they got was this awful compromise: the object is not destroyed, because C++ promises that it will only finally be destroyed when the scope ends, and always then. Instead a "hollowed out" state is created, some state (usually unspecified but predictable) in which it is safe to destroy it.
Creating the "hollowed out" state for the moved-from object so that it can later be destroyed is not zero work. It's usually trivial, but given that we gain no benefit from doing this work, it's pure waste.
This constitutes one of several unavoidable performance leaks in modern C++. They're not huge, but they're a problem when you still have people who mistake C++ for a performance language rather than a language like COBOL focused intently on compatibility with piles of archaic legacy code.
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2002/n13...
It does bring up an issue involving how to handle destructive moves in a class hierarchy, and while that's an issue, it's a local issue that would need careful consideration only in a few corner cases as opposed to the move semantics we have today which sprinkle the potential for misuse all over the codebase.
The brute optimisation for Rust is being done by LLVM, just like if you used Clang to compile C++, so your pure number crunching ought to be fine. If anything you may find it's easier to end up correctly expressing the thing you meant with good performance in Rust. If you rely on a C++ library of geometric algorithms, clearly "I can't find an equivalent in Rust" would be a showstopper, so it's worth stopping by crates.io to try a few searches for whatever keywords are in your head.
Also, if you know that learning new stuff fogs up your process, you might not want to try to both learn Rust and work on this novel project simultaneously. Some people thrive pairing learning a language with a new project, others hate that and would rather pick, either do something old in a new language, or do something new in an existing one.
If you decide this isn't the right time but keep feeling a twinge, I encourage you to try it for something else, not everybody is going to like Rust, but it's a rare C++ programmer who spends serious time learning Rust and then decides there was nothing they valued from the experience -- particularly if you have no experience in an ML (F# or Ocaml are modern examples)
This is useful for copying the data of a temporary string to another string without actually copying each byte of the data. Since the underlying characters live in the heap, there's no point in copying each byte to a new area in the heap. Instead, use move semantics to transfer ownership of the pointer to a new string container.
There is a lot wrong in this paragraph:
- a "copy" was not generated, at least not in the sense that the actual content of the string was copied anywhere;
- there's no undefined behaviour here and no invalidation of the string. Standard library types are required to be left in an unspecified but valid state after move. "Valid" here means that you can go on and inspect the state of the string after move, so you can query whether it is empty or not, count the number of characters, etc. etc. "Unspecified" means that the implementation gets to decide what is the status of the string after move. For long enough strings, typical implementation strategy is to set the moved-from string in an empty state.
...unless it's a short string within the limits of the small-string-optimization capacity.
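A small sketch of what "valid but unspecified" allows in practice. take_and_reset() is a hypothetical helper; the point is that operations without preconditions are fine on the moved-from string, while assuming it is empty is not portable.

```cpp
#include <string>
#include <utility>

// Moves `src` into a new string, then puts `src` back into a known state.
// Returns the moved-to string.
inline std::string take_and_reset(std::string& src) {
    std::string dst = std::move(src);  // src is now valid but unspecified
    // Fine here: size(), empty(), clear(), assignment (no preconditions).
    // Not portable: assuming src.empty() is true at this point, since a short
    // string may keep its contents.
    src.clear();                       // from here on src is guaranteed empty
    return dst;
}
```

After the call the caller can keep using the source string, for example by assigning a new value to it.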
I think what confuses many people is that a C++ move assignment still can copy a significant amount of bytes since it's just a flat copy plus 'giving up' ownership of dangling data in the source object.
For a POD struct, 'move assignment' and 'copy assignment' are identical in terms of cost.
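This can be checked with type traits. Point3 below is a made-up plain struct for illustration; for such a type the implicit move constructor is a member-wise copy, so the source is left unchanged.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical plain struct with no owned resources.
struct Point3 {
    double x, y, z;
};

static_assert(std::is_trivially_copyable<Point3>::value,
              "copy and move are both plain member-wise copies");
static_assert(std::is_trivially_move_constructible<Point3>::value,
              "the implicit move constructor does nothing special");

// "Moves" from a Point3 and reports whether the source still holds its value.
inline bool source_unchanged_after_move() {
    Point3 a{1.0, 2.0, 3.0};
    Point3 b = std::move(a);  // same cost and effect as Point3 b = a;
    return a.x == 1.0 && a.y == 2.0 && a.z == 3.0 &&
           b.x == 1.0 && b.y == 2.0 && b.z == 3.0;
}
```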
// (1)
struct Person {
    name: String,
    age: u8,
}
fn show(person: Person) {
    println!("Person record is at address {:p}", &person);
    println!("{} is {} years old", person.name, person.age);
}
fn main() {
    let p = Person { name: "Dave".to_string(), age: 42 }; // (2)
    println!("Person record is at address {:p}", &p);
    show(p); // (3)
}
Its output is:
Person record is at address 0x7ffcfb2b4e40
Person record is at address 0x7ffcfb2b4ec0
Dave is 42 years old
A few standard library types do guarantee that the moved-from object is empty (e.g., the smart pointer types).
For some others (basically, all containers except string), it is not explicitly stated that this is the case but it is hard to imagine an implementation that doesn't (due to time complexity and iterator invalidation rules). Arguably, this represents a bigger risk than string's behaviour, but it's still interesting.
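For the smart pointer case the guarantee is easy to demonstrate. The function name below is made up for this sketch; the behaviour it checks is specified for std::unique_ptr.

```cpp
#include <memory>
#include <utility>

// Returns true if the source unique_ptr is null after being moved from and
// the destination now owns the original value.
inline bool unique_ptr_is_null_after_move() {
    auto src = std::make_unique<int>(42);
    std::unique_ptr<int> dst = std::move(src);
    return src == nullptr && dst != nullptr && *dst == 42;
}
```

No such assertion is portable for std::string, which is the distinction being drawn here.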
What's the semantic difference? Of course moving a class will involve some amount of copying. How could it be any other way? If you have something like struct { int a[1000]; }, how are you supposed to move the contents of the struct without copying anything? What, you take a pair of really tiny scissors and cut a teeny tiny piece of the RAM, then glue the capacitors somewhere else?
By taking the physical page this one struct resides in, and mapping it into the virtual address space the second time. This approach is usually used in the kernel-level development, but there has been a lot of research done since the seventies on how to use it in runtimes for high-level programming languages.
Now, it does involve copying an address of this struct from one place to another, that I cede.
So then under this model, what’s the difference between a string and a string_view?
string_view doesn't do any deep copying.
I'm not sure what you're getting at. They're both small structs holding pointers to char data, they just operate on that data differently.
std::string_view also has implementation details that in principle could be similar to std::string, it's a pointer with a size, but the semantics of std::string_view are very different from the semantics of std::string.
And that's the crux of the issue, it's better to understand classes in terms of their semantics, how they operate, rather than their implementations. Implementations can change, and two very separate things can have the same or very similar implementations.
A std::string is not just some pointers and some record keeping data; a std::string is best understood as a class used to own and manage a sequence of characters with the various operations that one would expect for such management. A std::string_view is a non-owning, read-only variation of such a class that operates on an existing sequence of characters.
How these are implemented and their structural details is not really what's important, it's how someone is expected to use them and what can be done with them that counts.
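The semantic difference shows up without looking at any implementation detail. In this sketch (owning_vs_viewing() is a hypothetical name, and std::string_view needs C++17), a copied std::string owns its own characters while a std::string_view keeps observing the original sequence.

```cpp
#include <string>
#include <string_view>

// Returns true if a copied std::string is independent of the original while a
// std::string_view keeps observing the original's characters.
inline bool owning_vs_viewing() {
    std::string owner = "hello";
    std::string owner_copy = owner;  // owns a separate sequence of characters
    owner_copy[0] = 'X';             // changes only the copy

    std::string_view view = owner;   // non-owning: refers to owner's characters
    owner[0] = 'J';                  // in-place change, no reallocation

    return owner == "Jello"          // the original changed
        && owner_copy == "Xello"     // the copy never saw that change
        && view == "Jello";          // the view did
}
```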
> How these are implemented and their structural details is not really what's important
Usually this isn't important, unless you're talking about low level details impacting performance, which is exactly what the article is about.
And if you’re going down that path, the string may not have a pointer at all.
“A string value is not the string content itself”, but in most cases it is if the string is short enough, implementation dependent disclaimer and all that.
“C++ obfuscates this distinction because of how it automatically deep copies vectors and strings”
It does this because it has to, to guarantee its interface invariants. That “array” (if there is one) really is the string. Just because there might be an indirection doesn’t change that.
> they just operate on that data differently.
Well they operate on the memory “array” of the char data differently (well in the latter not at all).
Also a nitpick: std::string unlike String in Rust or other languages is not married to an encoding. And C++ managed to fuck that one up even more so recently.
Using std::move for anything other than "unique ownership without pointers" really messes things up. People put std::move everywhere expecting performance gains, just like we used to put "&" everywhere expecting performance gains. It's a bit of cargo cultism that can be nicely dispelled by realizing std::move is just std::copy with a compiler-defined constructor invocation potentially run to determine the old value. With that phrasing, it's hard to hallucinate performance gains that might come automatically.
I have no idea what that means.
std::move is a cast to an rvalue reference. That can potentially trigger a specific overloaded function to be selected and possibly, ultimately, a move constructor or assignment operator to be called.
For an explicit move to be profitable, an expression would have otherwise chosen a copy constructor for a type with an expensive copy constructor and a cheap move constructor.
std::copy is a range algorithm, not sure what's the relevance.
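A short sketch of the "it's only a cast" point. The helper names are invented for illustration; the static_assert and both functions rely only on standard behaviour.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// std::move only changes the value category of the expression.
static_assert(std::is_same<decltype(std::move(std::declval<std::string&>())),
                           std::string&&>::value,
              "std::move on a std::string lvalue yields std::string&&");

// Returns true if a string that was cast with std::move, but never handed to
// a move constructor or move assignment, still holds its original value.
inline bool cast_alone_changes_nothing() {
    std::string s = "still here";
    std::string&& r = std::move(s);  // just a reference binding; nothing moves
    return s == "still here" && &r == &s;
}

// Moving from a const object selects the copy constructor, because a
// const std::string rvalue cannot bind to std::string&&.
inline bool const_source_is_copied() {
    const std::string s = "const source";
    std::string t = std::move(s);    // copy, not move
    return s == "const source" && t == s;
}
```

The second function is the case where an explicit std::move buys nothing: no cheaper overload exists for the expression, so the copy constructor is chosen anyway.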
https://devblogs.microsoft.com/oldnewthing/20231124-00/?p=10...
Also "It was an ergonomic advancement." hides a lot of the overwrought syntax sugar in C++ that causes it to be such a weird language if you come from elsewhere. But still an excellent insight into the state of affairs.
I think the "Apparently" language makes it seem like this is some kind of accident that nobody would know about, when really the author was probably just being a creative writer, and the example was fundamental to the post.
Move is allowed to not move because in generic code you don't want to have to check for if move is possible for the type in question.
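Here is a sketch of why that matters for generic code. CopyOnly and push_into() are hypothetical; the template writes std::move unconditionally and the type decides whether that ends up as a move or a copy.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical type with a user-declared copy constructor and therefore no
// implicit move constructor. It counts copies through an external counter.
struct CopyOnly {
    int value;
    int* copy_counter;
    CopyOnly(int v, int* counter) : value(v), copy_counter(counter) {}
    CopyOnly(const CopyOnly& other)
        : value(other.value), copy_counter(other.copy_counter) {
        ++*copy_counter;
    }
};

// Generic code can write std::move without checking whether T is movable:
// it moves if T has a move constructor and falls back to copying otherwise.
template <typename T>
void push_into(T& source, std::vector<T>& out) {
    out.push_back(std::move(source));
}
```

With CopyOnly the push_back ends up calling the copy constructor once and leaves the source untouched; with std::string the same template moves.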
The function, show, doesn't take a copy, it takes a Person object. Persons can be copy constructed or move constructed (both constructors are implicit, since there's no user-defined constructors). std::move returns an r-value reference to main's p, so Person's implicit move constructor is called, and show's p argument is move constructed from main's p. The reported address changes because moving creates a new object in C++, but the moved-to object may take ownership of the heap allocated memory and other resources from the moved-from object.
In this case, the moved-to Person takes ownership of the heap allocation from the moved-from Person's string member and sets the moved-from Person's string member to an empty string. Without std::move, show's p is copy constructed, including its string member.
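The difference between the object's address and the string's heap buffer can be observed directly. This is a sketch under assumptions: Observation, demo_move() and demo_copy() are made-up names, show() is given two extra parameters for the comparison, and the buffer reuse in the move case is what mainstream standard libraries do for long strings rather than something the standard spells out.

```cpp
#include <cstdint>
#include <string>
#include <utility>

struct Person {
    std::string name;
    std::uint8_t age;
};

// What show() observed about its by-value parameter.
struct Observation {
    bool same_object;  // does the parameter live at the caller's address?
    bool same_buffer;  // do its characters live in the caller's original buffer?
};

inline Observation show(Person person, const void* caller_object,
                        const void* caller_buffer) {
    return {static_cast<const void*>(&person) == caller_object,
            static_cast<const void*>(person.name.data()) == caller_buffer};
}

inline Observation demo_move() {
    Person p{"Dave, with a name long enough to need a heap allocation", 42};
    const void* object = &p;
    const void* buffer = p.name.data();
    return show(std::move(p), object, buffer);  // parameter is move constructed
}

inline Observation demo_copy() {
    Person p{"Dave, with a name long enough to need a heap allocation", 42};
    const void* object = &p;
    const void* buffer = p.name.data();
    return show(p, object, buffer);             // parameter is copy constructed
}
```

In both cases same_object is false, since the parameter is always a new Person. Only the copy is guaranteed to report a different character buffer; with std::move the buffer is typically handed over as is.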
Thusly, what happens in code that accesses the string after the move is UB.
In the implementation of C++ the article uses the string was just empty. But for all we know it may still contain a 1:1 copy of the original or 20 copies or a gobbledygook of bytes.
Any code that relies on the string being something (even empty) may behave differently if it isn't. That's the very definition of UB.
"A typical implementation strategy" is meaningless for someone writing code against a language specification.
You're then writing code against a specific compiler/std lib and that's fine. But let's be honest about it.
Undefined behavior is a stronger statement and says that if the behavior occurs then the entire program is simply not valid. This allows the compiler to make vastly more aggressive changes to your program.
On the contrary the actual C++ standard explicitly states that permissible undefined behavior includes, and I quote "behaving during translation or program execution in a documented manner characteristic of the environment".
It's also worth noting that numerous well known and used C++ libraries explicitly make use of undefined behavior, including boost, Folly, Qt. Furthermore, as weird and ironic as this sounds, implementing cryptographic libraries is not possible without undefined behavior.
"A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this document places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation)."
I.e. a program that contains UB is undefined.
Of course, as you observe, an implementation can go beyond the standard and extend the abstract machine to give defined semantics to those undefined operations.
That's still different from implementation defined behaviour, where a conforming implementation must give defined semantics.
No, it is implementation-defined behaviour.
> In the implementation of C++ the article uses the string was just empty. But for all we know it may still contain a 1:1 copy of the original or 20 copies or a gobbledygook of bytes.
Yes, and if you want to make sure that the string is empty before you do something else with it, you just use a clear() (which will be optimised away by the compiler anyway).
Or, if you prefer, you can assign another string to it, or anything else really.
> Any code that relies on the string being something (even empty) may behave different if it isn't. That's the very definition of UB.
No it is not.
> "A typical implementation strategy" is meaningless for someone writing code against a language specification.
Then don't rely on that specific implementation detail and make sure that the string is in the state you want or, even better, don't touch the moved-from string ever again.
> Then we ask us the following questions:
> 1. When we passed Dave to show, did we create a copy?
> 2. If so, how do we avoid creating a copy?
> C++ example
> 1. Yes. You can insert cout << "Person record is at address " << &p << endl; before the call of show as well as the beginning of show. This reveals different memory addresses of the record.
Judging copies by the object's address is incorrect methodology. In both C++ and Rust, "moving" an object will still copy the struct fields, but will avoid copying any of the pointees (such as the variable-size array that the string owns).
> 2. Replace void show(Person person) with void show(Person& person). So only the function needs to change. The caller does not have to adapt to it.
Passing by reference is a different concept to moving. While the author used this approach for C++, they did not use the same approach for Rust. This is comparing apples to oranges.
Most people consider a shallow copy a "copy", certainly a shallow copy isn't a "reference"! One of the big problems in this space is in fact the divergence of terminology that leads to arguments like this.
The introduction of move semantics to C++ was a terrible, terrible mistake; not because it doesn't solve a real problem but because the language is objectively much worse now as a routine tool for general developers. People used to hack on code to implement features, now they get confused over and argue about how many "&" characters they need in a function signature.
It was a problem that was best left unsolved, basically.
This is a good example of a hard-won life lesson... There might be a solution to a problem, but the solution is worse than the original problem. I semi-jokingly call this "the healing power of apathy". The reality of it is that, sometimes, there are problems in life where benign neglect is the best response.
Sounds like a skill issue. Maybe they should go shopping.
Jokes aside though, yeah, move semantics is taught badly. Once you start using it (say, with a unique_ptr in a container) it will quickly start making sense.
This is a shallow understanding of C++. It happens because the Person object is a POD type that doesn't define a move constructor, and the compiler creates a default one that calls the move constructors of the members. The string member has a well-defined move constructor, but the primitive uint8_t type doesn't.
For a stricter definition of POD which requires that byte-by-byte copies are possible. More informally, it's a POD because it only defines members and all the constructors and destructors are implicitly generated.
It doesn't make a lot of sense to not have this recursive quality to POD-ness, because the fact that C++ shenanigans are involved doesn't go away just because it's implicitly handled for you by the compiler.
EDIT: It seems I've had a slightly incorrect impression of "POD": what makes 'Person' non-POD isn't that it has an implicitly defined constructor but simply that it contains a non-POD type. The requirements for POD classes[1] includes "has no non-static data members of type non-POD class (or array of such types)". std::string is certainly a non-POD class, which makes discussion about Person's constructors and destructors moot. Not that it changes anything, but I don't wanna spread misinformation.
[1] https://en.cppreference.com/w/cpp/language/classes#POD_class
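The properties discussed here can be checked at compile time. A minimal sketch, re-declaring the article's Person with qualified names; person_has_cheap_move() is a made-up helper.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>

struct Person {
    std::string name;
    std::uint8_t age;
};

// The compiler generates a move constructor that moves each member.
static_assert(std::is_move_constructible<Person>::value,
              "Person has an implicit move constructor");
// The std::string member makes Person non-trivial, hence not POD.
static_assert(!std::is_trivially_copyable<Person>::value,
              "Person cannot be copied byte by byte");
// For the primitive member, moving is just copying the value.
static_assert(std::is_trivially_copyable<std::uint8_t>::value,
              "moving age is a plain copy");

// True if Person's implicit move is noexcept, like std::string's, while
// Person itself is not trivially copyable.
constexpr bool person_has_cheap_move() {
    return std::is_nothrow_move_constructible<Person>::value &&
           !std::is_trivially_copyable<Person>::value;
}
```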
A: It's a niche programming language the author is involved with. It's not widely-used enough to get its own Wikipedia page. It used to be called "Val". See: https://www.hylo-lang.org/
Also, is it a bit strange that they wrote "rust" throughout the article instead of "Rust"?
Anyway, that's pure bikeshedding. "function" is a full word in English, but almost 3x the length.
The std::string is not invalidated, it's reset to its empty state (i.e. null pointer and zero length). Standard classes are all in defined, valid states after being moved, such that using them again is safe. User-defined classes may be coded to be left in either valid or invalid states after being moved. It's the responsibility of the programmer to decide which is appropriate according to the situation. There are valid reasons to want to reuse a moved object. For example, you might want to force the release of an object's internal memory:
std::string() = std::move(s);
It's somewhat unfortunate that there's no way to signal to the compiler that an object is not safe for reuse, though.
You're right to pick up on this. The author of the article is confused here, or at least using incorrect terminology. There's certainly no "undefined behaviour" going on.
But your corrections aren't quite right either, or at least use slightly odd definitions.
> User-defined classes may be coded to be left in either valid or invalid states after being moved.
No, even user defined classes have to be valid after a move, because their destructor will still be run. If you had your own vector-like class that points to invalid memory (or the same memory as the moved-to object) then you will get corruption when its destructor tries to free that memory.
Ok, it's true that you could manually define an "invalid" state in your class, perhaps by adding an internal Boolean flag which you set when the object is moved from. Then you could throw an exception or abort or whatever when any method (except the destructor) is called with this flag set. But you'd have to go out of your way to do this and I've never seen it done. I don't think this is what most people would understand your statement to mean.
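For what it's worth, the flag approach would look roughly like this. Label is a hypothetical class written for this sketch, not something from the article or a library.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical class that tracks whether it has been moved from.
class Label {
public:
    explicit Label(std::string text) : text_(std::move(text)) {}

    Label(Label&& other) noexcept : text_(std::move(other.text_)) {
        other.moved_from_ = true;
    }
    Label& operator=(Label&& other) noexcept {
        if (this != &other) {
            text_ = std::move(other.text_);
            moved_from_ = false;
            other.moved_from_ = true;
        }
        return *this;
    }
    Label(const Label&) = default;
    Label& operator=(const Label&) = default;

    // Every accessor checks the flag; the destructor does not need to,
    // because the std::string member is still destructible.
    const std::string& text() const {
        if (moved_from_) throw std::logic_error("Label used after move");
        return text_;
    }

private:
    std::string text_;
    bool moved_from_ = false;
};
```

Calling text() on a moved-from Label throws, while destroying it or move-assigning a fresh value into it still works.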
> The std::string is not invalidated, it's reset to its empty state (i.e. null pointer and zero length).
I'm not sure whether you're implying this is a strict requirement or just happens to be what happened in this case. In fact, the standard does not require this: the string could be left in any (valid, whatever that means) state. It could be empty, unchanged, or anything else. As other comments have noted, if the string's length is below the short string optimisation threshold then it's quite likely the original string will retain its value unchanged. Only a few specific types in the standard library have the guarantee that they will be empty after being moved from, and string isn't one of those.
By "valid" I mean that you can use the object like normal; being able to destruct the object is not enough. If the destructor is unsafe to run (for example because the object ends up owning a dangling pointer) you just have an outright bug. An invalid state would be one where any further use of the object (other than destroying it) is an error.
>I'm not sure whether you're implying this is a strict requirement or just happens to be what happened in this case.
Yes, I'm saying that's what happened in that case. The string was not invalidated, it was reset.
So the compiler will complain and not compile your program?? Nope. It should be if you want a program that functions correctly, but have to? No, C++ doesn't force that on you.
That very much depends on your use case.
>If the valid state of a Person is "the name is not empty " and this is enforced by a constructor then I don't want the program to ever have Person object floating around with a blank name
If you have such strict requirements then you shouldn't be moving around Persons to begin with. You should just be using std::make_unique() and then moving the pointer. Person should not even have a move constructor defined. If you code your class such that it's possible to let it reach an invalid state, that's no one's fault but your own.
There are very, very few cases where it is "sensible" to do anything with such an "arbitrarily conjured" state other than disposing of or overwriting it. In fact, the only example I can vaguely remember (and can't for the life of me find on Google) is that one scheme of storing some sort of lookup index in two arrays that store indices into each other, where it's not necessary to zero out those arrays before using them because the access algorithm is cleverly arranged so that no matter what numbers are stored in the unused parts of the arrays, it will still work correctly.
That's a rather weak "even if", given most implementations just reset to the empty string after moving.
>I doubt this would actually help that much with the overall correctness of the application.
Like I said, it depends on your use case. A pattern I use frequently when processing input is to have an accumulator that I build up progressively, and then when ready I move it into a result container, and since that resets the accumulator I can simply keep using it. If my algorithm required the initial state "SORRY, THIS STRING HAS BEEN MOVED FROM, PLEASE CONTACT YOUR LOCAL STRING SUPPLIER" rather than the empty string, such an idiosyncratic post-move value would be rather convenient.
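A sketch of that accumulator pattern, with an explicit clear() so it does not depend on what the implementation leaves behind after the move. split() is a made-up example function.

```cpp
#include <string>
#include <utility>
#include <vector>

// Splits `input` on `sep`, reusing one accumulator string for every token.
inline std::vector<std::string> split(const std::string& input, char sep) {
    std::vector<std::string> result;
    std::string acc;
    for (char c : input) {
        if (c == sep) {
            result.push_back(std::move(acc));  // hand the contents to the container
            acc.clear();  // portable reset; usually acc is already empty here
        } else {
            acc.push_back(c);
        }
    }
    result.push_back(std::move(acc));          // last token
    return result;
}
```

For example, split("a,bc,,d", ',') yields the four tokens "a", "bc", "" and "d".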
I don't know C++ so I was given the impression in the article that the person writing the class could try very hard to make it impossible to reach an invalid state, but that this work could be ignored elsewhere by making a move of this kind which would work without any special requirements on the type itself.
And it seems weird to omit Swift from this comparison, since Swift seems to have the most user-friendly (but incomplete?) implementation of move-only types.
And that bizarre scoping of Person p feels very un-intuitive. How would you work around that if you need to keep using it after show()? (Which is an extremely common use case)
If you need to keep using Person after calling show() then don't pass ownership to show() - you can pass a reference or a mutable reference, or use Rc<> etc
The type 'String' is instead an "owned" type, which means that it is not a reference, and instead a complete value and has a copy of the data. to_string() will create a String (owned value) from a &str (reference) by copying it. This is no different than if you had a global static compile-time string in C and you wanted to modify or update it: you would memcpy the global (statically allocated) string into a local buffer of the appropriate size and then modify it and pass it onward to other things that need it. You would not modify the static string in place.
In short, no, you do not need to_string() every time you want to work with a string. You need it to convert a reference type to an owned type. Rust's type system is just used here to codify the more implicit parts of C or C++'s behavior that you are already familiar with, but the underlying bits and bytes behave as you would expect coming from C++.
> And that bizarre scoping of Person p feels very un-intuitive. How would you work around that if you need to keep using it after show()
You take a reference just like you would in C++. Possibly a mutable reference if you want to modify the thing and then use it afterwards. This is in the article as the "Advanced rust example" at the end, it's right there and not hidden or anything.
It isn't really bizarre honestly; it's a matter of defaults. The difference is that Rust uses move-by-default, not copy-by-default or ref-by-default. Every time you write `x = y` for a given owned type, you are doing a move of `y` into `x` and thus making `y` invalid.
let g: &str = "Austin"; // statically allocated string
let x: String = g.to_string(); // do a copy
let y: String = x; // no copy, x is moved
Once you internalize this a lot more stuff will make sense, or at least it did for me.
Is anyone able to clarify what's meant by "initialization" here and what "separate type" Rust uses for this (e.g. something defined specifically for each type getting passed this way, or a generic wrapper type in the standard library)? Offhand, my understanding is that three of the Hylo keywords listed correspond to passing by ownership, shared reference, or mutable reference in Rust, and whichever doesn't correspond to one of those is something that a separate type is used for in Rust, but I'm not confident that my understanding is correct because the only thing I can think of that might be related to "initialization" is constructors, which Rust notably does _not_ have any formal concept of in the language, since functions that return types are just like any other function implemented on a type without a self parameter.
I'm also not completely sure what distinction is intended between whatever the separate type is and references in Rust, since a reference is also a separate type from the type of the value it refers to. I could imagine someone might think that references are different than user-defined types in a way that other standard library types like Box and Arc aren't, but I'd argue that the unique syntax that references have is actually not that significant, and semantically being located inside std makes them far closer to references in terms of potentially behaving in special ways due to them having access to certain unstable APIs around things like allocations and the fact that std is developed in tandem with the compiler, which leaves the door open for those types to take advantage of any additional internal APIs that get added in the future.
The type they mentioned is MaybeUninit (https://doc.rust-lang.org/std/mem/union.MaybeUninit.html), which is used to represent values that are not fully initialized. It's worth reading the documentation for that type.
Minor pedantic correction, references predate having pointers all over the place, in most systems languages.
C adopting pointers for all use cases isn't as great as they thought.
Namely, we get a lot of the convenience of functional programming (mutating one variable doesn't change any other variable) with the performance of imperative languages (purely functional data structures have higher costs relative to in-place mutation and are more gc-intensive).
Whether the object is moved depends on whether the target / destination / sink cares.
Edit: everything below is incorrect.
-Wno-pessimizing-move is automatically enabled by -Wall, so doesn't need to be specified manually. -Wno-redundant-move is automatically enabled by -Wextra, so doesn't need to be specified manually.
It lists -Wreorder as a warning, and says it's enabled by -Wall . It lists -Wno-pessimizing-move as a warning, and says it's enabled by -Wall .
I think the documentation should be edited to not list -Wno-pessimizing-move , and instead list -Wpessimizing-move .
https://gcc.gnu.org/onlinedocs/gcc-9.1.0/gcc/C_002b_002b-Dia...
struct Person {
    string name;
    uint8_t age;
};
isn't this missing a move constructor?
Person::Person(Person&& p) : name(std::move(p.name)), age(p.age) {}
or is C++ able to make these implicitly now?
println!("{} is {} years old", person.name, person.age);
with C++:
cout << person.name << " is " << unsigned(person.age)
     << " years old" << endl;
... while C++ actually has:
println("{} is {} years old", person.name, person.age);
essentially identical to Rust. See: https://en.cppreference.com/w/cpp/io/println
True, but Hylo is so new that it's not even an established language. Plus using this should serve to highlight the differences the author actually cares about between the languages.
What machine / compiler are you on where the difference between these is 30 seconds? GCC is also quite a bit faster based on a quick test in godbolt.
That is a 50% increase.
Still, even (C) printf would have been better than the iostreams monstrosity.
std::cout << std::format(....) ;
has been available since C++20. Still not really the point of the article.