Fireworks F1: A Breakthrough in Complex Reasoning with Compound AI
17 points | 5 days ago | 5 comments | fireworks.ai | HN
heisenzombie
5 days ago
The math proof on the front page seems to be wrong. It gets the correct answer, of course, but the reasoning is nonsensical.

It incorrectly says to pick one point A and draw a great circle through it and the “antipodal point” A’ (which is then forgotten about). It then says the great circle “contains A” and that one hemisphere must have at least two points. Finally, it asserts that 1+2=4 (!) and therefore that 4 points lie on a hemisphere.

Instead, the question should have said to pick two points A and B, and construct a great circle through those. Then continue as before and 2+2=4, QED.

(This is not just a notation thing: A’ can’t, in general, be one of the given points, since there’s no guarantee that any two points will be antipodal.)
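
The corrected counting can be checked numerically. The sketch below assumes the front-page problem is the classic one (five points on a sphere, show that some closed hemisphere contains at least four of them), which the thread does not state explicitly. The helper names are hypothetical and the code only illustrates the two-point construction described above; it is not the proof shown on the Fireworks page.

```python
import math
import random


def random_unit_vector(rng):
    """Sample a point on the unit sphere by normalising a Gaussian vector."""
    while True:
        v = [rng.gauss(0, 1) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:
            return [c / norm for c in v]


def cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def points_in_fuller_hemisphere(points):
    """Draw the great circle through the first two points (A and B) and
    count how many points lie in the fuller closed hemisphere.

    A and B lie on the circle, so they belong to both closed hemispheres.
    """
    a, b, rest = points[0], points[1], points[2:]
    normal = cross(a, b)  # normal of the plane through A, B and the centre
    on_or_above = sum(1 for p in rest if dot(normal, p) >= 0)
    below = len(rest) - on_or_above
    # Pigeonhole: 3 remaining points over 2 sides, so one side has >= 2.
    return 2 + max(on_or_above, below)


if __name__ == "__main__":
    rng = random.Random(0)
    worst = min(
        points_in_fuller_hemisphere([random_unit_vector(rng) for _ in range(5)])
        for _ in range(10_000)
    )
    print("smallest count seen:", worst)
```

With only one point A fixed on the circle, the same pigeonhole step gives 1 + 2 = 3, which is why the second point B is needed to reach 4.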

adt
5 days ago
ff7250
5 days ago
Test-time computation with Compound AI marks a significant breakthrough. It seems that model diversity fosters greater innovation and could potentially offer a more comprehensive solution than the more narrowly focused o1 approach. Let’s see how this unfolds.
jaggs
5 days ago
Just failed the Rs in strawberry test.
anonzzzies
5 days ago
What a clever question!

Let's break down the phrase "many mammas" to count the number of "ms":

many: "m" appears once Total "ms" so far: 1

mammas: "mm" appears together (which means 2 "ms") Total "ms" in "mammas": 2

Adding this to the previous total: 1 (from "many") + 2 (from "mammas") = 3

Therefore, there are 3 "ms" in the phrase "many mammas".
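
For comparison, a direct character count disagrees with the answer quoted above: "many" has one "m" and "mammas" has three, for a total of four. A minimal Python check (the helper name `count_letter` is just for illustration) also covers the "strawberry" case mentioned elsewhere in the thread:

```python
def count_letter(text: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in text."""
    return text.lower().count(letter.lower())


if __name__ == "__main__":
    print(count_letter("many", "m"))         # 1
    print(count_letter("mammas", "m"))       # 3
    print(count_letter("many mammas", "m"))  # 4
    print(count_letter("strawberry", "r"))   # 3
```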

ff7250
5 days ago
Just tried. It said 'There are 3 Rs in the word "strawberry".'
jaggs
5 days ago
Hmm...interesting. Definitely failed for me. Worrying inconsistency?
bartleeanderson
4 days ago
It would be nice if, when they give you the fancy "come join us" marketing BS, they told you it is a paid service. I have no intention of paying for AI that doesn't make me money. Sheesh.