(n & ((n & 1) - 1)) + ((n ^ (n >> 1)) & 1)
Or a much more readable version [ n, 1, n + 1, 0 ][n % 4]
which makes it clear that this function cycles through a pattern of length four.

Why this works can be seen if we start with some n that is divisible by four, i.e. it has the two least significant bits clear, and then keep XORing it with its successors. We start with xxxxxx00, which is our n. Then we XOR it with n + 1, which is xxxxxx01, and that clears all the x's and leaves us with 00000001. Now we XOR it with n + 2, which is xxxxxx10, and that yields xxxxxx11, which is n + 3. The cycle finishes when we XOR it with n + 3, which yields 00000000. So we get n, 1, n + 3, 0, and then the cycle repeats as we are back at zero and at n + 4, which is again divisible by four.
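A quick brute-force sanity check of that closed form (a Python sketch):

```python
from functools import reduce

def xor_to(n):
    # XOR of all integers 0..n, computed the slow way
    return reduce(lambda a, b: a ^ b, range(n + 1), 0)

def xor_to_closed(n):
    # the length-four cycle described above
    return [n, 1, n + 1, 0][n % 4]

# the two agree for every n
assert all(xor_to(n) == xor_to_closed(n) for n in range(1000))
```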
My offhand solution not using XOR is to subtract the sum of the array from the sum of 1 to n, which has a closed-form solution. The closed form roughly halves the execution time, as we only have to iterate over the range once.
Good to know there's a similar speedup available on the xor path...
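For illustration, a sketch of both speedups side by side (my own Python, not from the article): the sum path uses Gauss's formula, and the XOR path uses the length-four cycle, so in both cases only the array itself is traversed.

```python
def missing_by_sum(nums, n):
    # Gauss's closed form for 1 + 2 + ... + n, minus what we actually saw
    return n * (n + 1) // 2 - sum(nums)

def missing_by_xor(nums, n):
    # XOR of 0..n via the length-four cycle (0 contributes nothing),
    # then cancel the observed values
    acc = [n, 1, n + 1, 0][n % 4]
    for v in nums:
        acc ^= v
    return acc

nums = [v for v in range(1, 101) if v != 37]
assert missing_by_sum(nums, 100) == 37
assert missing_by_xor(nums, 100) == 37
```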
xxxxxxx0 ^ xxxxxxx1 = 00000001
If we start at a number divisible by four and do this twice, we get one twice. xxxxxx00 ^ xxxxxx01 = 00000001
xxxxxx10 ^ xxxxxx11 = 00000001
And combining the two of course yields zero and we are right back at the start.

xor =: (16 + 2b0110) b.
f =: 3 : 'xor/ y + i. 4'
f"0 ] 2 * 1 + i. 100
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Summing a hundred million: +/ f"0 ] 2 * i. 100000000 gives zero (it takes a few seconds). So it seems the stated property holds for every even n.
xxxxxxx0 ^ xxxxxxx1 = 00000001
Doing this twice with four consecutive numbers then also cancels the remaining one. That also means that you do not have to use consecutive numbers; you can use two arbitrary pairs: 2m ^ (2m + 1) ^ 2n ^ (2n + 1)
and for example 16 ^ 17 ^ 42 ^ 43
should be zero.

So the cycle of (N, 1, N+3, 0) corresponds to (A) and (B) being: (0,0), (0,1), (1,1), (1,0) - i.e. the 4 possible combinations of these states.
I think the following is true: for odd k the cycle is k long, and for even k it is 2k long. Why? Because the units place of the generalized XOR from 1 to k-1 is (k^2-k)/2, which is zero mod k if k is odd; if k is even, repeating it twice gets us to zero. For every higher digit, k consecutive numbers share the same digit, and k copies of the same digit always sum to zero mod k, so any whole number of such blocks cancels. Thus for odd k we have a zero whenever n is divisible by k, and for even k whenever n is divisible by 2k - which matches the familiar k = 2 case, where the cycle has length four.
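This can be checked empirically; a sketch, taking digit-wise addition mod k in base k as the generalized XOR. (The running total is also trivially zero at n = 1, since that prefix contains only 0.)

```python
def gxor(a, b, k):
    # digit-wise addition mod k in base k (no carries); k = 2 is ordinary XOR
    out, place = 0, 1
    while a or b:
        out += ((a % k + b % k) % k) * place
        a, b, place = a // k, b // k, place * k
    return out

def zero_points(k, limit):
    # n values in 1..limit where the running gxor of 0..n-1 is back at zero
    acc, zeros = 0, []
    for n in range(1, limit + 1):
        acc = gxor(acc, n - 1, k)
        if acc == 0:
            zeros.append(n)
    return zeros

# odd k: a zero at every multiple of k
assert set(range(3, 91, 3)) <= set(zero_points(3, 90))
# even k: the first non-trivial return to zero is already at 2k
assert min(z for z in zero_points(4, 90) if z > 1) == 8
```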
>n + 3 is the value after n XOR n + 1 XOR n + 2, so the n in the array index expression is n + 2 from the explanation and n + 3 results from (n + 2) + 1.
The reason I think it's off is that array index expression the start of the sum is 1, but in the explanation the start of the sum is n. So I don't think it's as simple as the ending being different by 2.
n = 8 [ 8, 1, 9, 0 ][ 8 % 4] = 8 = n
n = 9 [ 9, 1, 10, 0 ][ 9 % 4] = 1
n = 10 [ 10, 1, 11, 0 ][10 % 4] = 11 = n + 1
n = 11 [ 11, 1, 12, 0 ][11 % 4] = 0
n = 8: n = 8 = n
n ^ n + 1 = 8 ^ 9 = 1
n ^ n + 1 ^ n + 2 = 8 ^ 9 ^ 10 = 11 = n + 3
n ^ n + 1 ^ n + 2 ^ n + 3 = 8 ^ 9 ^ 10 ^ 11 = 0
So in the explanation we get 11 = n + 3 still with reference to the starting value n = 8; in the array expression, on the other hand, we have moved on to n = 10 when we pull 11 = n + 1 out of the array.

Also, because we end up back at zero whenever n % 4 == 3, there is some flexibility about the starting point. I wrote 1 to n because that is what the article used, but it would be mathematically cleaner to start at zero, which actually changes nothing because XORing with zero does nothing. And we do not have to start at zero at all: we can start at any number divisible by four and less than n, because the running XOR sum becomes zero just before each multiple of four. So XORing together 0...n, 1...n, 4...n, 8...n or generally 4k...n will give the same result. The explanation part looked at one cycle starting at 4k and ending at 4k + 3, with the running XOR sum back at zero at the end. Maybe that would have been the less confusing explanation: just using 4k instead of reusing n with the constraint that it is divisible by four.
[(n & ~3), 1, (n & ~3) + 3, 0][n % 4]
where the (n & ~3) makes sure those lower 2 bits are cleared. But note that we only ever can look at the first element when n % 4 == 0. In that case, (n & ~3) == n already. And further, we only ever can look at the third element when n % 4 == 2. In that case (n & ~3) == n - 2, so (n & ~3) + 3 == n + 1. Hence the array can be simplified to the one given in the other comment.
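A quick check that the simplification is sound (a Python sketch):

```python
def lookup_explicit(n):
    # the version with the lower two bits cleared explicitly
    return [(n & ~3), 1, (n & ~3) + 3, 0][n % 4]

def lookup_simplified(n):
    # the simplified array from the other comment
    return [n, 1, n + 1, 0][n % 4]

# the reachable entries coincide, so the two lookups always agree
assert all(lookup_explicit(n) == lookup_simplified(n) for n in range(10000))
```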
The problem is that in vector sets, the HNSW graph has the invariant that each node has bidirectional links to a set of N nodes: if A links to B, then B links to A. This is unlike most other HNSW implementations. In mine, links are required to be reciprocal; otherwise you get a crash.
Now, combine this with another fact: for speed concerns, Redis vector sets are not serialized as
element -> vector
and then reloaded and added back to the HNSW - that would be slow. Instead, what I do is serialize the graph itself: each node with its unique ID and all its links. But when I load the graph back, I must be sure it is "sane" and will not crash my systems, and reciprocal links are one of the things to check. Checking that all the links are reciprocal could be done with a hash table (as in the post's problem), but that would be slower and memory-consuming, so how do we use XOR instead? Each time I see a link A -> B, I normalize it by swapping A and B in case A > B. So if links are reciprocal I'll see A -> B two times, and if I accumulate the two IDs into a register with XOR, then at the end a register that is NOT zero means I have issues: some link may not be reciprocal.

However, in this specific case, there is a problem: collisions. The register may be 0 even if there are non-reciprocal links, in case they are fancy - that is, the non-reciprocal links are few and happen to XOR to 0. So, to fix this part, I use a strong (and large) hash function that makes a collision extremely unlikely.
It is nice now to see this post, since I was not aware of this algorithm when I used it a few weeks ago. At this point I'm old enough that I never pretend I invented something, so I was sure this had already been used in the past. But in case it was not used for reciprocal-links testing before, this is a new interview question you may want to use for advanced candidates.
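A toy sketch of the check in Python - SHA-256 stands in here for the "strong (and large) hash function", and plain integer node IDs are an assumption for illustration:

```python
import hashlib

def link_check_register(links):
    # accumulate a XOR register over normalized (lo, hi) link pairs;
    # reciprocal pairs contribute the same digest twice and cancel out
    acc = 0
    for a, b in links:
        lo, hi = (a, b) if a <= b else (b, a)  # normalize direction
        digest = hashlib.sha256(f"{lo}->{hi}".encode()).digest()
        acc ^= int.from_bytes(digest, "big")
    return acc

# every A -> B has its B -> A: the register cancels to zero
assert link_check_register([(1, 2), (2, 1), (2, 3), (3, 2)]) == 0
# drop one direction: the register is non-zero, flagging the graph as not sane
assert link_check_register([(1, 2), (2, 1), (2, 3)]) != 0
```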
For every normalized link id x:
y = (x << k) | h(x) # append a k-bit hash to the id
acc ^= y
If acc is zero, all links are reciprocal (same guarantee as before).

If acc is non-zero, split it back into (x', h'):
* Re-compute h(x').
* If it equals h', exactly one link is unpaired and x' tells you which one (or an astronomically unlikely collision). Otherwise there are >= 2 problems.
This has collision-resistance like the parent comment and adds the ability to pinpoint a single offending link without a second pass or a hash table.
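A sketch of this scheme in Python; the hash h and the width K below are illustrative assumptions, not a recommendation:

```python
K = 16  # width of the appended hash; an illustrative choice

def h(x, k=K):
    # toy k-bit hash, an assumption for illustration (use something strong in practice)
    return (x * 2654435761) % (1 << k)

def check(link_ids):
    acc = 0
    for x in link_ids:
        acc ^= (x << K) | h(x)  # append a k-bit hash to the id
    if acc == 0:
        return "all reciprocal"
    x_prime, h_prime = acc >> K, acc & ((1 << K) - 1)
    if h(x_prime) == h_prime:
        # exactly one unpaired link (or an astronomically unlikely collision)
        return f"single unpaired link: {x_prime}"
    return ">= 2 problems"

assert check([5, 9, 5, 9]) == "all reciprocal"
assert check([5, 9, 5]) == "single unpaired link: 9"
assert check([5, 9]) == ">= 2 problems"
```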
The XOR solution was a valid answer, but not the only answer we would have happily accepted.
The interview question was chosen such that it's very easy to understand and quick to solve, meaning it would indicate the candidate knew at least the basics of programming in C#. Almost surprisingly, we actually had candidates applying for "senior" level positions who struggled with this.
It could be solved in a multitude of ways, e.g:
- XOR as above
- Use of a HashSet<int>
- Use a for loop and a List containing each number and its count.
- Use LINQ to group the numbers and then find the one with the right count.
As long as what they did worked, it was a "valid" answer, we could then often discuss the chosen solution with the candidate and see how they reacted when we let them know of other valid solutions.
It was really great for not being a "one clever trick" question and could act as a springboard to slightly deeper discussions into their technical thought processes and understanding.
a := a + b;
b := a - b;
a := a - b;
I'm still proud of little me and I always remember this solution when I encounter XOR tricks. I didn't know about bitwise arithmetic at that time, but sometimes a simple `+` can work just as well.

It's an array of integers, so it fits in memory (otherwise it wouldn't be called an array). As it fits in memory, n cannot be that big. I'd still ask for more requirements, TopCoder-problem style: I want to know how big n can be such that the array still fits in memory.
I didn't know that XOR trick. My solution would be a bit array with n bits and two for loops: one to set each bit corresponding to a number, and one to find the missing number.
And if my bit array doesn't fit in memory, then neither does the array from the problem (and certainly not the HashSet etc.).
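A sketch of the bit-array approach, with Python's bytearray standing in for a raw bit array:

```python
def missing_via_bit_array(nums, n):
    # set one bit per value seen, then scan 1..n for the unset bit
    bits = bytearray(n // 8 + 1)
    for v in nums:
        bits[v >> 3] |= 1 << (v & 7)
    for v in range(1, n + 1):
        if not bits[v >> 3] & (1 << (v & 7)):
            return v

assert missing_via_bit_array([v for v in range(1, 101) if v != 64], 100) == 64
```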
That makes the problem harder which makes it more interesting, a lot of the solutions wouldn't work anymore (this isn't necessarily a good interview question though)
> We can thus search for u by applying this idea to one of the partitions and finding the missing element, and then find v by applying it to the other partition.
Since you already have u^v, you need only search for u, which immediately gives you v.
tromp is saying the last step can be simplified. There is no need to use the "XOR of all elements" method on the second partition to find v, since the earlier steps have given us u^v and u, so simply XORing those two values together gives v.
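Putting the whole two-missing-numbers procedure together, with tromp's shortcut at the end (a Python sketch):

```python
from functools import reduce

def two_missing(nums, n):
    # XOR of everything present and everything expected leaves u ^ v
    uv = reduce(lambda a, b: a ^ b, nums, 0) ^ reduce(lambda a, b: a ^ b, range(1, n + 1), 0)
    # u and v differ at every set bit of u ^ v; pick the lowest one
    bit = uv & -uv
    # apply the single-missing trick to the partition where that bit is set...
    u = 0
    for x in nums:
        if x & bit:
            u ^= x
    for x in range(1, n + 1):
        if x & bit:
            u ^= x
    # ...and, per the comment above, v falls out of u ^ v for free
    return u, uv ^ u

nums = [x for x in range(1, 11) if x not in (3, 8)]
assert set(two_missing(nums, 10)) == {3, 8}
```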
Basically xor swapping a[i] with a[j] triggered the evil logic when i was equal to j.
The state of RC4 consists of a random permutation of bytes. Whenever it outputs a value, it further permutes the state by swapping some bytes of the state. The XOR swap trick sets one of these values to zero whenever RC4 attempts to swap an item with itself within the permutation. This gradually zeros out the state, until RC4 outputs the plaintext.
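The failure mode is easy to reproduce (a Python sketch of the classic XOR swap):

```python
def xor_swap(a, i, j):
    # the classic in-place XOR swap -- buggy when i == j
    a[i] ^= a[j]  # if i == j, this zeroes the slot...
    a[j] ^= a[i]  # ...and the remaining lines keep it at zero
    a[i] ^= a[j]

a = [10, 20]
xor_swap(a, 0, 1)
assert a == [20, 10]  # fine for distinct indices

a = [10, 20]
xor_swap(a, 0, 0)
assert a[0] == 0  # same index: the value is destroyed
```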
If you’re working on an architecture where a single multiplication and a bit shift is cheaper than N xor’s, and where xor, add, and sub are all the same cost, then you can get a performance win by computing the sum as N(N+1)/2; and you don’t need a blog post to understand why it works.
[n, 1, n+1, 0][n%4]
(https://en.wikipedia.org/wiki/Abelian_group -- I'll use ⋆ as the Abelian group's operation, and ~ for inversion, below.)
I believe you are implying:
(g(1) ⋆ ... ⋆ g(n)) ⋆ ~(g(i(1)) ⋆ g(i(2)) ⋆ ... ⋆ g(i(n-1))) = g(m)
where "m" is the group element index that is not covered by "i".
However, for this to work, it is required that you can distribute the inversion ~ over the group operation ⋆, like this:
~(g(i(1)) ⋆ g(i(2)) ⋆ ... ⋆ g(i(n-1))) = ~g(i(1)) ⋆ ~g(i(2)) ⋆ ... ⋆ ~g(i(n-1))
because it is only after this step (i.e., after the distribution) that you can exploit the associativity and commutativity of operation ⋆, and reorder the elements in
g(1) ⋆ ... ⋆ g(n) ⋆ ~g(i(1)) ⋆ ~g(i(2)) ⋆ ... ⋆ ~g(i(n-1))
such that they pairwise cancel out, and leave only the "unmatched" (missing) element -- g(m).
However, where is it stated that inversion ~ can be distributed over group operation ⋆? The above wikipedia article does not spell that out as an axiom.
Wikipedia does mention "antidistributivity":
https://en.wikipedia.org/wiki/Distributive_property#Antidist...
(which does imply the distributivity in question here, once we restore commutativity); however, WP says this property is indeed used as an axiom ("in the more general context of a semigroup with involution"). So why is it not spelled out as one for Abelian groups?
... Does distributivity of inversion ~ over operation ⋆ follow from the other Abelian group axioms / properties? If so, how?
It does. For all x and y:
(1) ~x ⋆ x = 0 (definition of the inverse)
(2) ~y ⋆ y = 0 (definition of the inverse)
(3) (~x ⋆ x) ⋆ (~y ⋆ y) = 0 ⋆ 0 = 0 (from (1) and (2))
(4) (~x ⋆ ~y) ⋆ (x ⋆ y) = 0 (via associativity and commutativity)
In (4) we see that (~x ⋆ ~y) is the inverse of (x ⋆ y). That is to say, ~(x ⋆ y) = (~x ⋆ ~y). QED.

Example 1: Find the missing number
xor =: (16 + 2b0110) b.
iota1000 =: (i. 1000)
missingNumber =: (xor/ iota1000) xor (xor/ iota1000 -. 129)
echo 'The missing number is ' , ": missingNumber
This prints 'The missing number is 129'.

Example 2: Using a random permutation, find the missing number.
permuted =: (1000 ? 1000)
missingNumber =: (xor/ permuted) xor (xor/ permuted -. ? 1000)
Example 3: Find the missing number in this matrix.

_ (< 2 2) } 5 5 $ (25 ? 25)
12 9 1 20 19
6 18 3 4 8
24 7 _ 15 23
11 21 10 2 5
0 16 17 22 14
Final test: repeat example 3 ten times (random matrices) and collect the time it takes you to solve each one in a list of times, then compute the linear-regression best fit with times %. (1 ,. i. 10)
Did you get better at solving it by playing more times?

I am not affiliated with J, but in case you want to try some J code, there is a playground: https://jsoftware.github.io/j-playground/bin/html2/
Edited: It seems I am procrastinating a lot about something I have to do but don't want to.
f =: ]`1:`>:`0:@.(4&|)"0
Then: (,. ; #: ; [: #: f) i.16
0 0 0 0 0 0 0 0 0
1 0 0 0 1 0 0 0 1
2 0 0 1 0 0 0 1 1
3 0 0 1 1 0 0 0 0
4 0 1 0 0 0 1 0 0
5 0 1 0 1 0 0 0 1
6 0 1 1 0 0 1 1 1
7 0 1 1 1 0 0 0 0
8 1 0 0 0 1 0 0 0
9 1 0 0 1 0 0 0 1
....

The parent's comment (also mine) has a style that was designed not to scare non-J programmers. One should also consider that some people dislike J code, so downvotes are the usual result except when the post provides some additional insight.
Finally, thank you for this small J lesson; it is a pleasure to find fellow J programmers here.
> If more than two elements are missing (or duplicated), then analyzing the individual bits fails because there are several combinations possible for both 0 and 1 as results. The problem then seems to require more complex solutions, which are not based on XOR anymore.
If you consider XOR to be a little bit more general, I think you can still use something like the partitioning algorithm. That is to say, considering that XOR on a bit level behaves like XOR_bit(a,b) = (a+b) % 2, you might consider a generalized XOR_bit(a,b,k) = (a+b) % k. With this I think you can decide partitions with up to k missing numbers, but I'm too tired to verify/implement this right now.
That is, one should also prove a ^ (b ^ c) = (a ^ b) ^ c. Instinctive, but non-trivial.
In a typical error-correcting code usage, you have an encoder which takes your message, and adds some extra symbols at the end which are calculated so that the syndrome is zero. Then when receiving your message, the receiver calculates the syndrome and if it's not zero, they know that at least one error has occurred. By using the code's decoding algorithm, they can figure out the fewest (and thus hopefully most likely) number of changes which would result in that error syndrome, and use this information to (hopefully) correct the transmission error.
For the missing numbers problem, you can set x_i to "how many times does the number i appear?". Then since the syndrome is sum(x_i * G_i), you can compute the syndrome on an unordered list of the i's. You are expecting the syndrome to be the same as the syndrome of full set 1...n, so when it is not, you can figure out which few x_i's are wrong that would lead to the syndrome you observed. You have an advantage because you know how many numbers are missing, but it's only a slight one.
The author's solution is called the Hamming code: you set F(i) = i, and you do the additions by XORing. Using error-correcting codes generalizes to more missing numbers as well, including using XOR, but the math becomes more complicated: you would want to use a fancier code such as a BCH or Goppa code. These also use XOR, but in more complicated ways.
XOR is equivalent to addition over the finite field F_2^m. So, in this field, we're calculating the sum. If we have two numbers missing, we calculate the sum and sum of squares, so we know:
x + y
x^2 + y^2
From which we can solve for x and y. (Note all the multiplications are Galois Field multiplications, not integer!)
Similarly, for k missing numbers we calculate sums of higher powers and get a higher-order polynomial equation that gives our answer. Of course, the same solution works over the integers, and I'd imagine modular arithmetic as well (I haven't checked, though).
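The integer variant is easy to sketch for two missing numbers: from the sum and the sum of squares you recover the elementary symmetric functions and solve the quadratic (my own illustration, not the finite-field version):

```python
import math

def two_missing_by_sums(nums, n):
    # nums is 1..n with two values x < y removed
    s = n * (n + 1) // 2 - sum(nums)                               # x + y
    q = n * (n + 1) * (2 * n + 1) // 6 - sum(v * v for v in nums)  # x^2 + y^2
    p = (s * s - q) // 2                                           # xy (dividing by 2 is fine over the integers)
    d = math.isqrt(s * s - 4 * p)                                  # |x - y|, from the quadratic formula
    return (s - d) // 2, (s + d) // 2

nums = [x for x in range(1, 11) if x not in (4, 7)]
assert two_missing_by_sums(nums, 10) == (4, 7)
```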
This is also how BCH error-correction codes work (see https://en.wikipedia.org/wiki/BCH_code): a valid BCH codeword has sum(x^i where bit x is set in the codeword) = 0 for t odd powers i=1,3,5, ... Then if some bits get flipped, you will get a "syndrome" s_i := sum(x^i where bit x was flipped) for those odd powers. Solving from the syndrome to get the indices of the flipped bits is the same problem as here.
The general decoding algorithm is a bit involved, as you can see in the Wikipedia article, but it's not horribly difficult:
• First, extend the syndrome: it gives sum(x^i) for odd i, but you can compute the even powers s_2i = s_i^2.
• The syndrome is a sequence of field values s_i, but we can imagine it as a "syndrome polynomial" S(z) := sum(s_i z^i). This is only a conceptual step, not a computational one.
• We will find a polynomial L(z) which is zero at all errors z=x and nowhere else. This L is called a "locator" polynomial. It turns out (can be checked with some algebra) that L(z) satisfies a "key equation" where certain terms of L(z) * S(z) are zero. The key equation is (almost) linear: solve it with linear algebra (takes cubic time in the number of errors), or solve it faster with the Berlekamp-Massey algorithm (quadratic time instead, maybe subquadratic if you're fancy).
• Find the roots of L(z). There are tricks for this if its degree is low. If the degree is high then you usually just iterate over the field. This takes O(#errors * size of domain) time. It can be sped up by a constant factor using Chien's search algorithm, or by a logarithmic factor using an FFT or AFFT.
You can of course use a different error-correcting code if you prefer (e.g. binary Goppa codes).

Edit: bullets are hard.
Further edit just to note: the "^" in the above text refers to powers over the finite field, not the xor operator.
> constant factor using Chien's search algorithm
Chien's search is only really reasonable for small field sizes... which I think doesn't really make sense in this application, where the list is long and the missing elements are relatively few.
Fortunately, in characteristic 2 it's quite straightforward and fast to just factor the polynomial using the Berlekamp trace algorithm.
L(z) = z^2 - (x+y)z + xy.
You already have x+y, but what's xy? You can compute it as ((x+y)^2 - (x^2 + y^2))/2. This technique generalizes to higher powers, though I forget the exact details: basically you can generate the coefficients of L from the sums of powers with a recurrence.

Then you solve for the roots of L, either using your finite field's variant of the quadratic formula, or e.g. just by trying everything in the field.
* But wait, this doesn't actually work! *
Over fields of small characteristic, such as F_2^m, you need to modify the approach and use different powers. For example, in the equations above, I divided by 2. But over F_2^m in the example shown above, you cannot divide by 2, since 2=0. In fact, you cannot solve for (x,y) at all with only x+y and x^2 + y^2, because
(x+y)^2 = x^2 + y^2 + 2xy = x^2 + y^2 + 0xy (since 2=0) = x^2 + y^2
So having that second polynomial gives you no new information. So you need to use other powers such as cubes (a BCH code), or some other technique (e.g. a Goppa code). My sibling comment to yours describes the BCH case.

And sometimes even faster than a load immediate, hence XOR AX, AX instead of MOV AX, 0.
Shorter usually means faster, even if the instruction itself isn't faster.
Shorter basically means you can fit more in instruction cache, which should in theory improve performance marginally.
IIRC, Intel said a mov was the way to go for some now ancient x86 CPUs, though.
> Shorter usually means faster
It depends, so spouting generalities doesn't mean anything. Instruction cache line filling vs. cycle reduction vs. reservation station ordering is typically a compiler constraints optimization problem(s).
Because Arm64 has a zero register, and Arm32 has small immediates, and all instructions are uniformly long.
So you don't actually need the first loop (at least for the set of integers 1..n example), but bringing that up is probably out of scope for this article.
It's all theoretical though. On real world data sets that aren't small I don't see why you wouldn't hand these tasks off to C/Zig/Rust unless you're only running them once or twice.
I probably would have written it with a single loop, using the `enumerate` iterator adapter. But in Python, two loops is almost certainly more efficient.
Having only one loop gives you a better memory access pattern, because it's 2 XOR operations in between each memory access. Two loops is the same number of instructions, but it spends one loop ignoring memory and then another loop doing rapid-fire memory accesses. In python there's enough overhead that it's unlikely to matter. But in a faster language running on a busy machine that could make a real difference.
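For what it's worth, the single-loop variant with `enumerate` looks like this; whether it actually wins in CPython is unclear, as noted above (a sketch):

```python
def two_loops(A, n):
    result = 0
    for value in range(1, n + 1):
        result ^= value
    for value in A:
        result ^= value
    return result

def one_loop(A, n):
    # A holds n - 1 values, so enumerate's indices only reach n - 1;
    # fold the final index n in up front
    result = n
    for i, value in enumerate(A, start=1):
        result ^= i ^ value
    return result

A = [v for v in range(1, 1001) if v != 617]
assert two_loops(A, 1000) == one_loop(A, 1000) == 617
```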
Is pretty much a standard for loop. Between that and
for n in numbers:
You can do pretty much the same things as a more conventional language.
You could also solve it pretty simply like this:
expected_sum = n * (n + 1) // 2
missing_num = expected_sum - sum(numbers)
This only iterates the list once. This would probably be my solution if I was given this task in an interview.
You seem to believe that "O(2n)"
for value in range(1, n + 1):
result ^= value
for value in A:
result ^= value
is slower than "O(n2)"

for value in range(1, n + 1):
result ^= value
result ^= A[value-1]
simply because the latter has one "for loop" less. Am I misunderstanding you, or if not, why would this matter for speed?

A "for" loop in Python isn't particularly cheap. It compiles to some static overhead to set up the iterator; then each loop iteration compiles to the "FOR_ITER" opcode, a "STORE_FAST" opcode to assign the iteration value to a variable, and the body of the loop.
"FOR_ITER" calls the "__next__()" method of the iterator (which is on top of the interpreter object stack), catches the StopIteration exception to know when to terminate the loop (by jumping past the loop body), and stores the iterator value to the top of the stack. What "__next__()" does is totally opaque - we don't know what kind of object A is - but since we've added the overhead of a function call already it wouldn't matter if it was a super tight bit of machine code, we're already paying a (relatively) hefty runtime cost.
A particularly bad implementation of "__next__()" for some custom iterable collection might be so stupid as to walk through the collection until it reaches the current index's item and returns that, so "for value in A" could in fact be O(n^2).
Plus, "result ^= A[value-1]" is substantially more work than "result ^= value", so even just on the loop bodies the two examples aren't very similar at all. Evaluating "A[value-1]" may wind up calling a "__getitem__()" method on A.
If A is, say, a linked list or binary tree, iterating it is very cheap but indexing it is O(n), so the second loop might be O(n^2) where the first is O(n).
So maybe we be a bit more Pythonic, and do:
for i, value in enumerate(A):
result ^= i
result ^= value
One loop, no indexing of A! But we've not actually saved anything: the __next__() method of enumerate's iterator will increment its index then call the __next__() method of A's iterator, (approximately) the same work as if we'd done two FOR_ITER, one for an index and one for A.

Why would this matter for speed? I don't know. Unless 'n' is pretty big, a human won't even notice the execution time of any of this code.
Each iteration of a for loop performs one index update and one termination comparison. For a simple body that is just an XOR, that's the difference between performing 5 operations (update, exit check, read array, XOR with value, XOR with index) per N elements in the one loop case versus 7 operations (update, exit, read array, XOR with value, then update, exit, XOR with index) in the two loop case. So we're looking at a 29% savings in operations.
It gets worse if the looping structure does not optimize to a raw, most basic for loop and instead constructs some kind of lazy collection iterator generalized for all kinds of collections it could iterate over.
The smaller the loop body, the higher the gains from optimizing the looping construct itself.
Whether it's a win in Python to use one or two loops isn't so clear, as a lot is hidden behind complex opcodes and opaque iterator implementations. Imperative testing might help, but a new interpreter version could change your results.
In any case, if we want to nitpick over performance we should be insisting on a parallel implementation to take advantage of the gobs of cores CPUs now have, but now we're on a micro-optimisation crusade and are ignoring the whole point of the article.
It's silly to ask a web dev these questions and expect these XOR approaches.
Low-level developers ("bare metal" as the kids say), on the other hand? They should have a deep enough understanding of binary representation and bitwise operations to approach these problems with logic gates.
The epitome of turning technical interviews into a trivia contest to make them feel smart. Because isn't that the point of a tech interview?
I haven't even read the article so I don't know what this is about really but if an interviewer seriously asked me about some obscure xor trick I'd laugh at them.
I believe you under-estimate what a good interviewer is trying to do with questions such as these:
Either you've seen the trick before and you get an opportunity to show the interviewer that you're an honest person by telling him you have. Huge plus and the interview can move on to other topics.
Or you haven't, and you can demonstrate to the interviewer your analytical skills by dissecting the problem step by step and understanding what the code actually does and how.
Bonus if you can see the potential aliasing problem when used to swap two variables.
Not a trivia question at all.
You shouldn't expect it to be possible during the course of the interview for those who don't know it already, it makes no sense to expect that.
At best, the question will check if someone memorized such stuff. But I don't see a lot of value in that.
Stop asking these asinine questions and ask questions relevant to real-world software engineering. Software engineers are their own worst enemies.
One aspect of XOR is that it is the same as binary addition without carry, and therefore it does not overflow.
They're really evil on modern CPUs.
`XOR[0...n] = 0 ^ 1 .... ^ n = [n, 1, n + 1, 0][n % 4]`
XOR[0...x] = (x&1^(x&2)>>1)+x*(~x&1)
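Both closed forms agree, which is easy to confirm by brute force (a Python sketch):

```python
def xor_to_branchless(x):
    # the bit-twiddled closed form quoted above
    return (x & 1 ^ (x & 2) >> 1) + x * (~x & 1)

def xor_to_table(x):
    # the length-four lookup table version
    return [x, 1, x + 1, 0][x % 4]

assert all(xor_to_branchless(x) == xor_to_table(x) for x in range(100000))
```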
Actually I found something through Gemini based on the table mod 4 idea in previous post. Thanks.
xor ax, ax
rather than: mov ax, 0h
This is nonsensical, where does the second truth table come from? Instead you just observe that, by definition, 1^0 == 0^1.
For those who do/did assembly: this is the common way to set a register to zero in x86 assembly (and probably not only x86), because the instruction does not need an operand and so is shorter, and it executes in only one cycle.
However, then it is clearly still easier to just phrase everything in terms of = (equality) instead!
Equality for binary inputs is also called XNOR, biconditional, iff, ↔, etc., which is the negation of XOR. But thinking of it immediately as "=" is much more straightforward.
Another advantage of = over ≠/xor is that equality is not just commutative and associative, it's intuitively obvious that it is associative. The associativity of ≠/xor is less obvious. Moreover, equality is also transitive, unlike inequality/xor.
Overall, equality seems a much more natural concept to reason with, yet I don't know of any languages which have a bitwise equality/XNOR/↔ operator, i.e. one that operates on integers rather than Booleans.
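Lacking such a primitive, bitwise XNOR is still only a complement-and-mask away from XOR (a Python sketch):

```python
def bitwise_eq(a, b, width=8):
    # bitwise equality (XNOR): a result bit is set where a and b agree;
    # it is just the masked complement of XOR
    return ~(a ^ b) & ((1 << width) - 1)

assert bitwise_eq(0b00001100, 0b00001010) == 0b11111001
assert bitwise_eq(42, 42) == 0b11111111  # a value agrees with itself everywhere
```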