Blog on EleutherAI Blog

Updated 2026-05-15 01:28 · 50 posts
  1. Early Indicators of Reward Hacking via Reasoning Interpolation
  2. Reward Hacking Research Update
  3. Pretraining Data Filtering for Open-Weight AI Safety
  4. Attention Probes
  5. Research Update: Applications of Local Volume Measurement
  6. Studying inductive biases of random networks via local volumes
  7. The Common Pile v0.1
  8. Product Key Memory Sparse Coders
  9. SAEs trained on the same data don’t learn the same features
  10. Partially rewriting an LLM in natural language
  11. Third-party evaluation to identify risks in LLMs’ training data
  12. Mechanistic Anomaly Detection Research Update 2
  13. RLHF and RLAIF in GPT-NeoX
  14. The Practitioner's Guide to the Maximal Update Parameterization
  15. Mechanistic Anomaly Detection Research Update
  16. Open Source Automated Interpretability for Sparse Autoencoder Features
  17. Experiments in Weak-to-Strong Generalization
  18. Free Form Least-Squares Concept Erasure Without Oracle Concept Labels
  19. VINC-S: Closed-form Optionally-supervised Knowledge Elicitation with Paraphrase Invariance
  20. Pile-T5
  21. Yi-34B, Llama 2, and common practices in LLM training: a fact check of the New York Times
  22. The Foundation Model Development Cheatsheet
  23. Least-Squares Concept Erasure with Oracle Concept Labels
  24. Diff-in-Means Concept Editing is Worst-Case Optimal
  25. The third New England RLHF Hackers Hackathon
  26. Extending the RoPE
  27. How the Foundation Model Transparency Index Distorts Transparency
  28. Llemma: An Open Language Model For Mathematics
  29. The second New England RLHF Hackers Hackathon
  30. Contributor Spotlight: Mohammad Aflah Khan
  31. The first New England RLHF Hackers Hackathon
  32. EleutherAI's Thoughts on the EU AI Act
  33. Minetester: A fully open RL environment built on Minetest
  34. 🐶Safetensors audited as really safe and becoming the default
  35. Alignment Research @ EleutherAI
  36. Transformer Math 101
  37. Exploratory Analysis of TRLX RLHF Transformers with TransformerLens
  38. EleutherAI Second Retrospective: The long version
  39. The View from 30,000 Feet: Preface to the Second EleutherAI Retrospective
  40. Announcing GPT-NeoX-20B
  41. A Preliminary Exploration into Factored Cognition with Language Models
  42. Multiple Choice Normalization in LM Evaluation
  43. Downstream Evaluations of Rotary Position Embeddings
  44. What A Long, Strange Trip It's Been: EleutherAI One Year Retrospective
  45. Why Release a Large Language Model?
  46. On the Sizes of OpenAI API Models
  47. Evaluating Different Fewshot Description Prompts on GPT-3
  48. Finetuning Models on Downstream Tasks
  49. Activation Function Ablation
  50. Rotary Embeddings: A Relative Revolution