stanford-crfm-website Hot List - Hot点·热榜

1 HELM Arabic

2 HELM Arabic

3 HELM Long Context

4 Reliable and Efficient Amortized Model-Based Evaluation

5 Surprisingly Fast AI-Generated Kernels We Didn’t Mean to Publish (Yet)

6 BountyBench: Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems

7 HELM Capabilities: Evaluating LMs Capability by Capability

8 General-Purpose AI Needs Coordinated Flaw Reporting

9 HELM Safety: Towards Standardized Safety Evaluations of Language Models

10 Advancing Customizable Benchmarking in HELM via Unitxt Integration