Arcee Trinity Mini: US-Trained MoE Model
33 points | 3 hours ago | 3 comments | arcee.ai
halJordan
2 hours ago
Looks like a slightly weaker version of Qwen3 30B-A3B, which makes sense because it's slightly smaller. If they can keep that efficiency going into the large one it'll be sick.

Trinity Large [will be] a 420B-parameter model with 13B active parameters. Just perfect for a large RAM pool at Q4.
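
Back-of-envelope math on the RAM claim (my assumptions, not Arcee's numbers: ~4.5 effective bits per weight for a Q4-style quant once you count scales and zero points, plus a rough allowance for KV cache):

    # Napkin math only; bits/weight and overhead are assumptions, not published figures.
    def moe_memory_gb(total_params_b, bits_per_weight=4.5, overhead_gb=8):
        # Every expert has to stay resident even though only a few are active per token,
        # so memory scales with *total* params, not active params.
        weights_gb = total_params_b * 1e9 * bits_per_weight / 8 / 1e9
        return weights_gb + overhead_gb

    print(f"Trinity Large: ~{moe_memory_gb(420):.0f} GB")  # ~244 GB
    print(f"Trinity Mini:  ~{moe_memory_gb(26):.0f} GB")   # ~23 GB

So roughly a 256 GB RAM pool for the large one at Q4, give or take the quant and context length.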

htrp
2 hours ago
Trinity Nano Preview: 6B parameter MoE (1B active, ~800M non-embedding), 56 layers, 128 experts with 8 active per token

Trinity Mini: 26B parameter MoE (3B active), fully post-trained reasoning model

They did the pretraining themselves and are still training the large version on 2048 B300 GPUs.
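
For anyone wondering what "128 experts with 8 active per token" means mechanically: it's top-k router gating. A minimal PyTorch sketch (sizes and gating details are my guesses, not Arcee's actual implementation):

    import torch
    import torch.nn.functional as F

    def route(hidden, router_weight, k=8):
        # hidden: [tokens, d_model]; router_weight: [d_model, n_experts]
        logits = hidden @ router_weight               # score every expert for each token
        topk_vals, topk_idx = logits.topk(k, dim=-1)  # keep only the k best experts
        gates = F.softmax(topk_vals, dim=-1)          # renormalise over the chosen k
        return topk_idx, gates                        # which experts fire, and their weights

    h = torch.randn(4, 512)        # 4 tokens, d_model=512 (made-up sizes)
    w = torch.randn(512, 128)      # router for 128 experts
    idx, gates = route(h, w)
    print(idx.shape, gates.shape)  # torch.Size([4, 8]) torch.Size([4, 8])

Only the selected experts' FFN blocks run for each token, which is how a 26B-total model ends up with only ~3B active parameters per forward pass.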

bitwize
2 hours ago
A moe model you say? How kawaii is it? uwu
ghc
1 hour ago
Capitalization makes a surprising amount of difference here...
donw
43 minutes ago
Meccha at present, but it may reach sugoi levels with fine-tuning.
noxa
2 hours ago
I hate that I laughed at this. Thanks ;)