There has been considerable effort to measure language model performance on academic tasks and in chatbot settings, but these high-level benchmarks are not directly applicable to specific industry use cases. Here we begin to remedy this by reporting our application-specific findings and live leaderboard results on LegalBench, a large, crowd-sourced collection of legal reasoning tasks.