(n & ((n & 1) - 1)) + ((n ^ (n >> 1)) & 1)
Or a much more readable version [ n, 1, n + 1, 0 ][n % 4]
which makes it clear that this function cycles through a pattern of length four.

Why this works can be seen if we start with some n that is divisible by four, i.e. it has the two least significant bits clear, and then keep XORing it with its successors. We start with xxxxxx00, which is our n. Then we XOR it with n + 1, which is xxxxxx01, and that clears all the x's and leaves us with 00000001. Now we XOR it with n + 2, which is xxxxxx10, and that yields xxxxxx11, which is n + 3. The cycle finishes when we XOR it with n + 3, which yields 00000000. So we get n, 1, n + 3, 0, and then the cycle repeats as we are back at zero and at n + 4, which is again divisible by four.
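A quick brute-force sanity check of that closed form (a Python sketch):

```python
from functools import reduce

def xor_to(n):
    # XOR of all integers 0..n, computed the slow way
    return reduce(lambda a, b: a ^ b, range(n + 1), 0)

def xor_to_closed(n):
    # the length-four cycle described above
    return [n, 1, n + 1, 0][n % 4]

# the two agree for every n
assert all(xor_to(n) == xor_to_closed(n) for n in range(1000))
```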
My offhand solution not using XOR is to subtract the sum of the array from the sum of 1 to n, which has a closed-form solution. The closed form roughly halves the execution time, as we only have to iterate over the range once.
Good to know there's a similar speedup available on the xor path...
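For illustration, a sketch of both speedups side by side (my own Python, not from the article): the sum path uses Gauss's formula, and the XOR path uses the length-four cycle, so in both cases only the array itself is traversed.

```python
def missing_by_sum(nums, n):
    # Gauss's closed form for 1 + 2 + ... + n, minus what we actually saw
    return n * (n + 1) // 2 - sum(nums)

def missing_by_xor(nums, n):
    # XOR of 0..n via the length-four cycle (0 contributes nothing),
    # then cancel the observed values
    acc = [n, 1, n + 1, 0][n % 4]
    for v in nums:
        acc ^= v
    return acc

nums = [v for v in range(1, 101) if v != 37]
assert missing_by_sum(nums, 100) == 37
assert missing_by_xor(nums, 100) == 37
```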
xxxxxxx0 ^ xxxxxxx1 = 00000001
If we start at a number divisible by four and do this twice, we get one twice. xxxxxx00 ^ xxxxxx01 = 00000001
xxxxxx10 ^ xxxxxx11 = 00000001
And combining the two of course yields zero and we are right back at the start.

xor =: (16 + 2b0110) b.
f =: 3 : 'xor/ y + i. 4'
f"0 ] 2 * 1 + i. 100
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Summing a hundred million: +/ f"0 ] 2 * i. 100000000 gives zero (it takes a few seconds). So it seems the stated property holds for every even n.
xxxxxxx0 ^ xxxxxxx1 = 00000001
Doing this twice with four consecutive numbers then also cancels the remaining one. That also means that you do not have to use consecutive numbers; you can use two arbitrary pairs: 2m ^ (2m + 1) ^ 2n ^ (2n + 1)
and for example 16 ^ 17 ^ 42 ^ 43
should be zero.

So the cycle of (N, 1, N+3, 0) corresponds to (A) and (B) being: (0,0), (0,1), (1,1), (1,0) - i.e. the 4 possible combinations of these states.
I think the following is true: for odd k the cycle is k long, and for even k it is 2k long. Why? Because the units place of the generalized XOR from 1 to k-1 is (k^2-k)/2, which is zero mod k if k is odd; if k is even, repeating it twice gets us to zero. For every higher digit, k consecutive numbers share the same digit, and k copies of the same digit always sum to zero mod k, so any whole number of such blocks cancels. Thus for odd k we have a zero whenever n is divisible by k, and for even k whenever n is divisible by 2k - which matches the familiar k = 2 case, where the cycle has length four.
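This can be checked empirically; a sketch, taking digit-wise addition mod k in base k as the generalized XOR. (The running total is also trivially zero at n = 1, since that prefix contains only 0.)

```python
def gxor(a, b, k):
    # digit-wise addition mod k in base k (no carries); k = 2 is ordinary XOR
    out, place = 0, 1
    while a or b:
        out += ((a % k + b % k) % k) * place
        a, b, place = a // k, b // k, place * k
    return out

def zero_points(k, limit):
    # n values in 1..limit where the running gxor of 0..n-1 is back at zero
    acc, zeros = 0, []
    for n in range(1, limit + 1):
        acc = gxor(acc, n - 1, k)
        if acc == 0:
            zeros.append(n)
    return zeros

# odd k: a zero at every multiple of k
assert set(range(3, 91, 3)) <= set(zero_points(3, 90))
# even k: the first non-trivial return to zero is already at 2k
assert min(z for z in zero_points(4, 90) if z > 1) == 8
```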
>n + 3 is the value after n XOR n + 1 XOR n + 2, so the n in the array index expression is n + 2 from the explanation and n + 3 results from (n + 2) + 1.
The reason I think it's off is that array index expression the start of the sum is 1, but in the explanation the start of the sum is n. So I don't think it's as simple as the ending being different by 2.
n = 8 [ 8, 1, 9, 0 ][ 8 % 4] = 8 = n
n = 9 [ 9, 1, 10, 0 ][ 9 % 4] = 1
n = 10 [ 10, 1, 11, 0 ][10 % 4] = 11 = n + 1
n = 11 [ 11, 1, 12, 0 ][11 % 4] = 0
n = 8: n = 8 = n
n ^ n + 1 = 8 ^ 9 = 1
n ^ n + 1 ^ n + 2 = 8 ^ 9 ^ 10 = 11 = n + 3
n ^ n + 1 ^ n + 2 ^ n + 3 = 8 ^ 9 ^ 10 ^ 11 = 0
So in the explanation we get 11 = n + 3 still with reference to the starting value n = 8; in the array expression, on the other hand, we have moved on to n = 10 when we pull 11 = n + 1 out of the array.

Also, because we end up back at zero whenever n % 4 == 3, there is some flexibility about the starting point. I wrote 1 to n because that is what the article used, but it would be mathematically cleaner to start at zero, which actually changes nothing because XORing with zero does nothing. And we do not have to start at zero at all: we can start at any number divisible by four and less than n, because the running XOR sum becomes zero just before each multiple of four. So XORing together 0...n, 1...n, 4...n, 8...n or generally 4k...n will give the same result. The explanation part looked at one cycle starting at 4k and ending at 4k + 3, with the running XOR sum back at zero at the end. Maybe that would have been the less confusing explanation: just using 4k instead of reusing n with the constraint that it is divisible by four.
[(n & ~3), 1, (n & ~3) + 3, 0][n % 4]
where the (n & ~3) makes sure those lower 2 bits are cleared. But note that we only ever can look at the first element when n % 4 == 0. In that case, (n & ~3) == n already. And further, we only ever can look at the third element when n % 4 == 2. In that case (n & ~3) == n - 2, so (n & ~3) + 3 == n + 1. Hence the array can be simplified to the one given in the other comment.
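A quick check that the simplification is sound (a Python sketch):

```python
def lookup_explicit(n):
    # the version with the lower two bits cleared explicitly
    return [(n & ~3), 1, (n & ~3) + 3, 0][n % 4]

def lookup_simplified(n):
    # the simplified array from the other comment
    return [n, 1, n + 1, 0][n % 4]

# the reachable entries coincide, so the two lookups always agree
assert all(lookup_explicit(n) == lookup_simplified(n) for n in range(10000))
```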
The problem is that in vector sets, the HNSW graph has the invariant that each node has bidirectional links to a set of N nodes: if A links to B, then B links to A. This is unlike most other HNSW implementations. In mine, links are required to be reciprocal; otherwise you get a crash.
Now, combine this with another fact: for speed concerns, Redis vector sets are not serialized as
element -> vector
and then reloaded and added back to the HNSW - that would be slow. Instead, what I do is serialize the graph itself: each node with its unique ID and all its links. But when I load the graph back, I must be sure it is "sane" and will not crash my systems, and reciprocal links are one of the things to check. Checking that all the links are reciprocal could be done with a hash table (as in the post's problem), but that would be slower and memory-consuming, so how do we use XOR instead? Each time I see a link A -> B, I normalize it by swapping A and B in case A > B. So if links are reciprocal I'll see A -> B two times, and if I accumulate the two IDs into a register with XOR, then at the end a register that is NOT zero means I have issues: some link may not be reciprocal.

However, in this specific case, there is a problem: collisions. The register may be 0 even if there are non-reciprocal links, in case they are fancy - that is, the non-reciprocal links are few and happen to XOR to 0. So, to fix this part, I use a strong (and large) hash function that makes a collision extremely unlikely.
It is nice now to see this post, since I was not aware of this algorithm when I used it a few weeks ago. At this point I'm old enough that I never pretend I invented something, so I was sure this had already been used in the past. But in case it was not used for reciprocal-links testing before, this is a new interview question you may want to use for advanced candidates.
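A toy sketch of the check in Python - SHA-256 stands in here for the "strong (and large) hash function", and plain integer node IDs are an assumption for illustration:

```python
import hashlib

def link_check_register(links):
    # accumulate a XOR register over normalized (lo, hi) link pairs;
    # reciprocal pairs contribute the same digest twice and cancel out
    acc = 0
    for a, b in links:
        lo, hi = (a, b) if a <= b else (b, a)  # normalize direction
        digest = hashlib.sha256(f"{lo}->{hi}".encode()).digest()
        acc ^= int.from_bytes(digest, "big")
    return acc

# every A -> B has its B -> A: the register cancels to zero
assert link_check_register([(1, 2), (2, 1), (2, 3), (3, 2)]) == 0
# drop one direction: the register is non-zero, flagging the graph as not sane
assert link_check_register([(1, 2), (2, 1), (2, 3)]) != 0
```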
For every normalized link id x:
y = (x << k) | h(x) # append a k-bit hash to the id
acc ^= y
If acc is zero, all links are reciprocal (same guarantee as before).

If acc is non-zero, split it back into (x', h'):
* Re-compute h(x').
* If it equals h', exactly one link is unpaired and x' tells you which one (or an astronomically unlikely collision). Otherwise there are >= 2 problems.
This has collision-resistance like the parent comment and adds the ability to pinpoint a single offending link without a second pass or a hash table.
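A sketch of this scheme in Python; the hash h and the width K below are illustrative assumptions, not a recommendation:

```python
K = 16  # width of the appended hash; an illustrative choice

def h(x, k=K):
    # toy k-bit hash, an assumption for illustration (use something strong in practice)
    return (x * 2654435761) % (1 << k)

def check(link_ids):
    acc = 0
    for x in link_ids:
        acc ^= (x << K) | h(x)  # append a k-bit hash to the id
    if acc == 0:
        return "all reciprocal"
    x_prime, h_prime = acc >> K, acc & ((1 << K) - 1)
    if h(x_prime) == h_prime:
        # exactly one unpaired link (or an astronomically unlikely collision)
        return f"single unpaired link: {x_prime}"
    return ">= 2 problems"

assert check([5, 9, 5, 9]) == "all reciprocal"
assert check([5, 9, 5]) == "single unpaired link: 9"
assert check([5, 9]) == ">= 2 problems"
```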
The XOR solution was a valid answer, but not the only answer we would have happily accepted.
The interview question was chosen such that it's very easy to understand and quick to solve, meaning it would indicate the candidate knew at least the basics of programming in C#. Almost surprisingly, we actually had candidates applying for "senior" level positions who struggled with this.
It could be solved in a multitude of ways, e.g:
- XOR as above
- Use of a HashSet<int>
- Use a for loop and a List containing each number and its count.
- Use LINQ to group the numbers and then find the one with the right count.
As long as what they did worked, it was a "valid" answer, we could then often discuss the chosen solution with the candidate and see how they reacted when we let them know of other valid solutions.
It was really great for not being a "one clever trick" question and could act as a springboard to slightly deeper discussions into their technical thought processes and understanding.
a := a + b;
b := a - b;
a := a - b;
I'm still proud of little me and I always remember this solution when I encounter XOR tricks. I didn't know about bitwise arithmetic at that time, but sometimes a simple `+` can work just as well.

It's an array of integers, so it fits in memory (otherwise it wouldn't be called an array). As it fits in memory, n cannot be that big. I'd still ask for more requirements, TopCoder-problem style: I want to know how big n can be such that the array still fits in memory.
I didn't know that XOR trick. My solution would be a bit array with n bits and two for loops: one to set each bit corresponding to a number, and one to find the missing number.
And if my bit array doesn't fit in memory, then neither does the array from the problem (and certainly not the HashSet etc.).
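A sketch of the bit-array approach, with Python's bytearray standing in for a raw bit array:

```python
def missing_via_bit_array(nums, n):
    # set one bit per value seen, then scan 1..n for the unset bit
    bits = bytearray(n // 8 + 1)
    for v in nums:
        bits[v >> 3] |= 1 << (v & 7)
    for v in range(1, n + 1):
        if not bits[v >> 3] & (1 << (v & 7)):
            return v

assert missing_via_bit_array([v for v in range(1, 101) if v != 64], 100) == 64
```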
That makes the problem harder which makes it more interesting, a lot of the solutions wouldn't work anymore (this isn't necessarily a good interview question though)
> We can thus search for u by applying this idea to one of the partitions and finding the missing element, and then find v by applying it to the other partition.
Since you already have u^v, you need only search for u, which immediately gives you v.
tromp is saying the last step can be simplified. There is no need to use the "XOR of all elements" method on the second partition to find v, since the earlier steps have given us u^v and u, so simply XORing those two values together gives v.
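Putting the whole two-missing-numbers procedure together, with tromp's shortcut at the end (a Python sketch):

```python
from functools import reduce

def two_missing(nums, n):
    # XOR of everything present and everything expected leaves u ^ v
    uv = reduce(lambda a, b: a ^ b, nums, 0) ^ reduce(lambda a, b: a ^ b, range(1, n + 1), 0)
    # u and v differ at every set bit of u ^ v; pick the lowest one
    bit = uv & -uv
    # apply the single-missing trick to the partition where that bit is set...
    u = 0
    for x in nums:
        if x & bit:
            u ^= x
    for x in range(1, n + 1):
        if x & bit:
            u ^= x
    # ...and, per the comment above, v falls out of u ^ v for free
    return u, uv ^ u

nums = [x for x in range(1, 11) if x not in (3, 8)]
assert set(two_missing(nums, 10)) == {3, 8}
```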
Basically xor swapping a[i] with a[j] triggered the evil logic when i was equal to j.
The state of RC4 consists of a random permutation of bytes. Whenever it outputs a value, it further permutes the state by swapping some bytes of the state. The XOR swap trick sets one of these values to zero whenever RC4 attempts to swap an item with itself within the permutation. This gradually zeros out the state, until RC4 outputs the plaintext.
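The failure mode is easy to reproduce (a Python sketch of the classic XOR swap):

```python
def xor_swap(a, i, j):
    # the classic in-place XOR swap -- buggy when i == j
    a[i] ^= a[j]  # if i == j, this zeroes the slot...
    a[j] ^= a[i]  # ...and the remaining lines keep it at zero
    a[i] ^= a[j]

a = [10, 20]
xor_swap(a, 0, 1)
assert a == [20, 10]  # fine for distinct indices

a = [10, 20]
xor_swap(a, 0, 0)
assert a[0] == 0  # same index: the value is destroyed
```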
If you’re working on an architecture where a single multiplication and a bit shift is cheaper than N xor’s, and where xor, add, and sub are all the same cost, then you can get a performance win by computing the sum as N(N+1)/2; and you don’t need a blog post to understand why it works.
[n, 1, n+1, 0][n%4]
(https://en.wikipedia.org/wiki/Abelian_group -- I'll use ⋆ as the Abelian group's operation, and ~ for inversion, below.)
I believe you are implying:
(g(1) ⋆ ... ⋆ g(n)) ⋆ ~(g(i(1)) ⋆ g(i(2)) ⋆ ... ⋆ g(i(n-1))) = g(m)
where "m" is the group element index that is not covered by "i".
However, for this to work, it is required that you can distribute the inversion ~ over the group operation ⋆, like this:
~(g(i(1)) ⋆ g(i(2)) ⋆ ... ⋆ g(i(n-1))) = ~g(i(1)) ⋆ ~g(i(2)) ⋆ ... ⋆ ~g(i(n-1))
because it is only after this step (i.e., after the distribution) that you can exploit the associativity and commutativity of operation ⋆, and reorder the elements in
g(1) ⋆ ... ⋆ g(n) ⋆ ~g(i(1)) ⋆ ~g(i(2)) ⋆ ... ⋆ ~g(i(n-1))
such that they pairwise cancel out, and leave only the "unmatched" (missing) element -- g(m).
However, where is it stated that inversion ~ can be distributed over group operation ⋆? The above wikipedia article does not spell that out as an axiom.
Wikipedia does mention "antidistributivity":
https://en.wikipedia.org/wiki/Distributive_property#Antidist...
(which does imply the distributivity in question here, once we restore commutativity); however, WP says this property is indeed used as an axiom ("in the more general context of a semigroup with involution"). So why is it not spelled out as one for Abelian groups?
... Does distributivity of inversion ~ over operation ⋆ follow from the other Abelian group axioms / properties? If so, how?
It does. For all x and y:
(1) ~x ⋆ x = 0 (definition of the inverse)
(2) ~y ⋆ y = 0 (definition of the inverse)
(3) (~x ⋆ x) ⋆ (~y ⋆ y) = 0 ⋆ 0 = 0 (from (1) and (2))
(4) (~x ⋆ ~y) ⋆ (x ⋆ y) = 0 (via associativity and commutativity)
In (4) we see that (~x ⋆ ~y) is the inverse of (x ⋆ y). That is to say, ~(x ⋆ y) = (~x ⋆ ~y). QED.

Example 1: Find the missing number
xor =: (16 + 2b0110) b.
iota1000 =: (i. 1000)
missingNumber =: (xor/ iota1000) xor (xor/ iota1000 -. 129)
echo 'The missing number is ' , ": missingNumber
This prints 'The missing number is 129'.

Example 2: Using a random permutation, find the missing number.
permuted =: (1000 ? 1000)
missingNumber =: (xor/ permuted) xor (xor/ permuted -. ? 1000)
Example 3: Find the missing number in this matrix.

_ (< 2 2) } 5 5 $ (25 ? 25)
12 9 1 20 19
6 18 3 4 8
24 7 _ 15 23
11 21 10 2 5
0 16 17 22 14
Final test: repeat example 3 ten times (random matrices) and collect the time it takes you to solve each one in a list of times, then compute the linear-regression best fit with times %. (1 ,. i. 10)
Did you get better at solving it by playing more times?

I am not affiliated with J, but in case you want to try some J code, there is a playground: https://jsoftware.github.io/j-playground/bin/html2/
Edited: It seems I am procrastinating a lot about something I have to do but don't want to.
f =: ]`1:`>:`0:@.(4&|)"0
Then: (,. ; #: ; [: #: f) i.16
0 0 0 0 0 0 0 0 0
1 0 0 0 1 0 0 0 1
2 0 0 1 0 0 0 1 1
3 0 0 1 1 0 0 0 0
4 0 1 0 0 0 1 0 0
5 0 1 0 1 0 0 0 1
6 0 1 1 0 0 1 1 1
7 0 1 1 1 0 0 0 0
8 1 0 0 0 1 0 0 0
9 1 0 0 1 0 0 0 1
....

The parent's comment (also mine) has a style that was designed not to scare non-J programmers. One should also consider that some people dislike J code, so downvotes are the usual result except when the post provides some additional insight.
Finally, thank you for this small J lesson; it is a pleasure to find fellow J programmers here.
> If more than two elements are missing (or duplicated), then analyzing the individual bits fails because there are several combinations possible for both 0 and 1 as results. The problem then seems to require more complex solutions, which are not based on XOR anymore.
If you consider XOR to be a little bit more general, I think you can still use something like the partitioning algorithm. That is to say, considering that XOR on a bit level behaves like XOR_bit(a,b) = (a+b) % 2, you might consider a generalized XOR_bit(a,b,k) = (a+b) % k. With this I think you can decide partitions with up to k missing numbers, but I'm too tired to verify/implement this right now.
That is, one should also prove a ^ (b ^ c) = (a ^ b) ^ c. Instinctive, but non-trivial.
In a typical error-correcting code usage, you have an encoder which takes your message, and adds some extra symbols at the end which are calculated so that the syndrome is zero. Then when receiving your message, the receiver calculates the syndrome and if it's not zero, they know that at least one error has occurred. By using the code's decoding algorithm, they can figure out the fewest (and thus hopefully most likely) number of changes which would result in that error syndrome, and use this information to (hopefully) correct the transmission error.
For the missing numbers problem, you can set x_i to "how many times does the number i appear?". Then since the syndrome is sum(x_i * G_i), you can compute the syndrome on an unordered list of the i's. You are expecting the syndrome to be the same as the syndrome of full set 1...n, so when it is not, you can figure out which few x_i's are wrong that would lead to the syndrome you observed. You have an advantage because you know how many numbers are missing, but it's only a slight one.
The author's solution is called the Hamming code: you set F(i) = i, and you do the additions by XORing. Using error-correcting codes generalizes to more missing numbers as well, including using XOR, but the math becomes more complicated: you would want to use a fancier code such as a BCH or Goppa code. These also use XOR, but in more complicated ways.
XOR is equivalent to addition over the finite field F_2^m. So, in this field, we're calculating the sum. If we have two numbers missing, we calculate the sum and sum of squares, so we know:
x + y
x^2 + y^2
From which we can solve for x and y. (Note all the multiplications are Galois Field multiplications, not integer!)
Similarly, for k missing numbers we calculate sums of higher powers and get a higher-order polynomial equation that gives our answer. Of course, the same solution works over the integers, and I'd imagine modular arithmetic as well (I haven't checked, though).
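The integer variant is easy to sketch for two missing numbers: from the sum and the sum of squares you recover the elementary symmetric functions and solve the quadratic (my own illustration, not the finite-field version):

```python
import math

def two_missing_by_sums(nums, n):
    # nums is 1..n with two values x < y removed
    s = n * (n + 1) // 2 - sum(nums)                               # x + y
    q = n * (n + 1) * (2 * n + 1) // 6 - sum(v * v for v in nums)  # x^2 + y^2
    p = (s * s - q) // 2                                           # xy (dividing by 2 is fine over the integers)
    d = math.isqrt(s * s - 4 * p)                                  # |x - y|, from the quadratic formula
    return (s - d) // 2, (s + d) // 2

nums = [x for x in range(1, 11) if x not in (4, 7)]
assert two_missing_by_sums(nums, 10) == (4, 7)
```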
This is also how BCH error-correction codes work (see https://en.wikipedia.org/wiki/BCH_code): a valid BCH codeword has sum(x^i where bit x is set in the codeword) = 0 for t odd powers i=1,3,5, ... Then if some bits get flipped, you will get a "syndrome" s_i := sum(x^i where bit x was flipped) for those odd powers. Solving from the syndrome to get the indices of the flipped bits is the same problem as here.
The general decoding algorithm is a bit involved, as you can see in the Wikipedia article, but it's not horribly difficult:
• First, extend the syndrome: it gives sum(x^i) for odd i, but you can compute the even powers s_2i = s_i^2.
• The syndrome is a sequence of field values s_i, but we can imagine it as a "syndrome polynomial" S(z) := sum(s_i z^i). This is only a conceptual step, not a computational one.
• We will find a polynomial L(z) which is zero at all errors z=x and nowhere else. This L is called a "locator" polynomial. It turns out (can be checked with some algebra) that L(z) satisfies a "key equation" where certain terms of L(z) * S(z) are zero. The key equation is (almost) linear: solve it with linear algebra (takes cubic time in the number of errors), or solve it faster with the Berlekamp-Massey algorithm (quadratic time instead, maybe subquadratic if you're fancy).
• Find the roots of L(z). There are tricks for this if its degree is low. If the degree is high then you usually just iterate over the field. This takes O(#errors * size of domain) time. It can be sped up by a constant factor using Chien's search algorithm, or by a logarithmic factor using an FFT or AFFT.
You can of course use a different error-correcting code if you prefer (e.g. binary Goppa codes).

Edit: bullets are hard.
Further edit just to note: the "^" in the above text refers to powers over the finite field, not the xor operator.
> constant factor using Chien's search algorithm
Chien's search is only really reasonable for small field sizes... which I think doesn't really make sense in this application, where the list is long and the missing elements are relatively few.
Fortunately, in characteristic 2 it's quite straightforward and fast to just factor the polynomial using the Berlekamp trace algorithm.
L(z) = z^2 - (x+y)z + xy.
You already have x+y, but what's xy? You can compute it as ((x+y)^2 - (x^2 + y^2))/2. This technique generalizes to higher powers, though I forget the exact details: basically you can generate the coefficients of L from the sums of powers with a recurrence.

Then you solve for the roots of L, either using your finite field's variant of the quadratic formula, or e.g. just by trying everything in the field.
* But wait, this doesn't actually work! *
Over fields of small characteristic, such as F_2^m, you need to modify the approach and use different powers. For example, in the equations above, I divided by 2. But over F_2^m in the example shown above, you cannot divide by 2, since 2=0. In fact, you cannot solve for (x,y) at all with only x+y and x^2 + y^2, because
(x+y)^2 = x^2 + y^2 + 2xy = x^2 + y^2 + 0xy (since 2=0) = x^2 + y^2
So having that second polynomial gives you no new information. So you need to use other powers such as cubes (a BCH code), or some other technique (e.g. a Goppa code). My sibling comment to yours describes the BCH case.

And sometimes even faster than a load immediate, hence XOR AX, AX instead of MOV AX, 0.
Shorter usually means faster, even if the instruction itself isn't faster.
Shorter basically means you can fit more in instruction cache, which should in theory improve performance marginally.
IIRC, Intel said a mov was the way to go for some now ancient x86 CPUs, though.
> Shorter usually means faster
It depends, so spouting generalities doesn't mean anything. Instruction cache line filling vs. cycle reduction vs. reservation station ordering is typically a compiler constraints optimization problem(s).
Because Arm64 has a zero register, and Arm32 has small immediates, and all instructions are uniformly long.
So you don't actually need the first loop (at least for the set of integers 1..n example), but bringing that up is probably out of scope for this article.
It's all theoretical though. On real world data sets that aren't small I don't see why you wouldn't hand these tasks off to C/Zig/Rust unless you're only running them once or twice.
I probably would have written it with a single loop, using the `enumerate` iterator adapter. But in Python, two loops is almost certainly more efficient.
Having only one loop gives you a better memory access pattern, because it's 2 XOR operations in between each memory access. Two loops is the same number of instructions, but it spends one loop ignoring memory and then another loop doing rapid-fire memory accesses. In python there's enough overhead that it's unlikely to matter. But in a faster language running on a busy machine that could make a real difference.
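For what it's worth, the single-loop variant with `enumerate` looks like this; whether it actually wins in CPython is unclear, as noted above (a sketch):

```python
def two_loops(A, n):
    result = 0
    for value in range(1, n + 1):
        result ^= value
    for value in A:
        result ^= value
    return result

def one_loop(A, n):
    # A holds n - 1 values, so enumerate's indices only reach n - 1;
    # fold the final index n in up front
    result = n
    for i, value in enumerate(A, start=1):
        result ^= i ^ value
    return result

A = [v for v in range(1, 1001) if v != 617]
assert two_loops(A, 1000) == one_loop(A, 1000) == 617
```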
Is pretty much a standard for loop. Between that and
for n in numbers:
You can do pretty much the same things as a more conventional language.
You could also solve it pretty simply like this:
expected_sum = n * (n + 1) // 2
missing_num = expected_sum - sum(numbers)
This only iterates the list once. This would probably be my solution if I was given this task in an interview.
You seem to believe that "O(2n)"
for value in range(1, n + 1):
result ^= value
for value in A:
result ^= value
is slower than "O(n2)"

for value in range(1, n + 1):
result ^= value
result ^= A[value-1]
simply because the latter has one "for loop" less. Am I misunderstanding you, or if not, why would this matter for speed?

A "for" loop in Python isn't particularly cheap. It compiles to some static overhead to set up the iterator; then each loop iteration compiles to the "FOR_ITER" opcode, a "STORE_FAST" opcode to assign the iteration value to a variable, and the body of the loop.
"FOR_ITER" calls the "__next__()" method of the iterator (which is on top of the interpreter object stack), catches the StopIteration exception to know when to terminate the loop (by jumping past the loop body), and stores the iterator value to the top of the stack. What "__next__()" does is totally opaque - we don't know what kind of object A is - but since we've added the overhead of a function call already it wouldn't matter if it was a super tight bit of machine code, we're already paying a (relatively) hefty runtime cost.
A particularly bad implementation of "__next__()" for some custom iterable collection might be so stupid as to walk through the collection until it reaches the current index's item and returns that, so "for value in A" could in fact be O(n^2).
Plus, "result ^= A[value-1]" is substantially more work than "result ^= value", so even just on the loop bodies the two examples aren't very similar at all. Evaluating "A[value-1]" may wind up calling a "__getitem__()" method on A.
If A is, say, a linked list or binary tree, iterating it is very cheap but indexing it is O(n), so the second loop might be O(n^2) where the first is O(n).
So maybe we be a bit more Pythonic, and do:
for i, value in enumerate(A):
result ^= i
result ^= value
One loop, no indexing of A! But we've not actually saved anything: the __next__() method of enumerate's iterator will increment its index then call the __next__() method of A's iterator, (approximately) the same work as if we'd done two FOR_ITER, one for an index and one for A.

Why would this matter for speed? I don't know. Unless 'n' is pretty big, a human won't even notice the execution time of any of this code.
Each iteration of a for loop performs one index update and one termination comparison. For a simple body that is just an XOR, that's the difference between performing 5 operations (update, exit check, read array, XOR with value, XOR with index) per N elements in the one loop case versus 7 operations (update, exit, read array, XOR with value, then update, exit, XOR with index) in the two loop case. So we're looking at a 29% savings in operations.
It gets worse if the looping structure does not optimize to a raw, most basic for loop and instead constructs some kind of lazy collection iterator generalized for all kinds of collections it could iterate over.
The smaller the loop body, the higher the gains from optimizing the looping construct itself.
Whether it's a win in Python to use one or two loops isn't so clear, as a lot is hidden behind complex opcodes and opaque iterator implementations. Imperative testing might help, but a new interpreter version could change your results.
In any case, if we want to nitpick over performance we should be insisting on a parallel implementation to take advantage of the gobs of cores CPUs now have, but now we're on a micro-optimisation crusade and are ignoring the whole point of the article.
It's silly to ask a web dev these questions and expect these XOR approaches.
Low-level developers ("bare metal" as the kids say), on the other hand? They should have a deep enough understanding of binary representation and bitwise operations to approach these problems with logic gates.
The epitome of turning technical interviews into a trivia contest to make them feel smart. Because isn't that the point of a tech interview?
I haven't even read the article so I don't know what this is about really but if an interviewer seriously asked me about some obscure xor trick I'd laugh at them.
I believe you under-estimate what a good interviewer is trying to do with questions such as these:
Either you've seen the trick before and you get an opportunity to show the interviewer that you're an honest person by telling him you have. Huge plus and the interview can move on to other topics.
Or you haven't, and you can demonstrate to the interviewer your analytical skills by dissecting the problem step by step and understanding what the code actually does and how.
Bonus if you can see the potential aliasing problem when used to swap two variables.
Not a trivia question at all.
You shouldn't expect it to be possible during the course of the interview for those who don't know it already, it makes no sense to expect that.
At best, the question will check if someone memorized such stuff. But I don't see a lot of value in that.
Stop asking these asinine questions and ask questions relevant to real-world software engineering. Software engineers are their own worst enemies.
One aspect of XOR is that it is the same as binary addition without carry, and therefore it does not overflow.
They're really evil on modern CPUs.
`XOR[0...n] = 0 ^ 1 .... ^ n = [n, 1, n + 1, 0][n % 4]`
XOR[0...x] = (x&1^(x&2)>>1)+x*(~x&1)
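Both closed forms agree, which is easy to confirm by brute force (a Python sketch):

```python
def xor_to_branchless(x):
    # the bit-twiddled closed form quoted above
    return (x & 1 ^ (x & 2) >> 1) + x * (~x & 1)

def xor_to_table(x):
    # the length-four lookup table version
    return [x, 1, x + 1, 0][x % 4]

assert all(xor_to_branchless(x) == xor_to_table(x) for x in range(100000))
```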
Actually I found something through Gemini based on the table mod 4 idea in previous post. Thanks.
xor ax, ax
rather than: mov ax, 0h
This is nonsensical, where does the second truth table come from? Instead you just observe that, by definition, 1^0 == 0^1.
For those who do/did assembly: this is the common way to set a register to zero in x86 assembly (and probably not only x86), because the instruction does not need an operand and so is shorter, and it executes in only one cycle.
However, then it is clearly still easier to just phrase everything in terms of = (equality) instead!
Equality for binary inputs is also called XNOR, biconditional, iff, ↔, etc., which is the negation of XOR. But thinking of it immediately as "=" is much more straightforward.
Another advantage of = over ≠/xor is that equality is not just commutative and associative, it's intuitively obvious that it is associative. The associativity of ≠/xor is less obvious. Moreover, equality is also transitive, unlike inequality/xor.
Overall, equality seems a much more natural concept to reason with, yet I don't know of any languages which have a bitwise equality/XNOR/↔ operator, i.e. one that operates on integers rather than Booleans.
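Lacking such a primitive, bitwise XNOR is still only a complement-and-mask away from XOR (a Python sketch):

```python
def bitwise_eq(a, b, width=8):
    # bitwise equality (XNOR): a result bit is set where a and b agree;
    # it is just the masked complement of XOR
    return ~(a ^ b) & ((1 << width) - 1)

assert bitwise_eq(0b00001100, 0b00001010) == 0b11111001
assert bitwise_eq(42, 42) == 0b11111111  # a value agrees with itself everywhere
```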