ZAYA1-8B: An 8B MoE Model with 760M Active Params Matching DeepSeek-R1 on Math
4 points | 1 hour ago | 0 comments | firethering.com | HN