Batched reward model inference and Best-of-N sampling
33 points
4 days ago