The thing I am curious about is vertical evals. BrowseComp and DRACO are useful, but neither one is a finance, health, legal, or procurement workflow. If you have built a domain-specific research eval, I would like to compare notes!