Blog on EleutherAI Blog

Updated 2026-05-15 01:28 · 50 posts
  1. Early Indicators of Reward Hacking via Reasoning Interpolation
  2. Reward Hacking Research Update
  3. Pretraining Data Filtering for Open-Weight AI Safety
  4. Attention Probes
  5. Research Update: Applications of Local Volume Measurement
  6. Studying inductive biases of random networks via local volumes
  7. The Common Pile v0.1
  8. Product Key Memory Sparse Coders
  9. SAEs trained on the same data don’t learn the same features
  10. Partially rewriting an LLM in natural language
  11. Third-party evaluation to identify risks in LLMs’ training data
  12. Mechanistic Anomaly Detection Research Update 2
  13. RLHF and RLAIF in GPT-NeoX
  14. The Practitioner's Guide to the Maximal Update Parameterization
  15. Mechanistic Anomaly Detection Research Update
  16. Open Source Automated Interpretability for Sparse Autoencoder Features
  17. Experiments in Weak-to-Strong Generalization
  18. Free Form Least-Squares Concept Erasure Without Oracle Concept Labels
  19. VINC-S: Closed-form Optionally-supervised Knowledge Elicitation with Paraphrase Invariance
  20. Pile-T5
  21. Yi-34B, Llama 2, and common practices in LLM training: a fact check of the New York Times
  22. The Foundation Model Development Cheatsheet
  23. Least-Squares Concept Erasure with Oracle Concept Labels
  24. Diff-in-Means Concept Editing is Worst-Case Optimal
  25. The third New England RLHF Hackers Hackathon
  26. Extending the RoPE
  27. How the Foundation Model Transparency Index Distorts Transparency
  28. Llemma: An Open Language Model For Mathematics
  29. The second New England RLHF Hackers Hackathon
  30. Contributor Spotlight: Mohammad Aflah Khan
  31. The first New England RLHF Hackers Hackathon
  32. EleutherAI's Thoughts on the EU AI Act
  33. Minetester: A fully open RL environment built on Minetest
  34. 🐶Safetensors audited as really safe and becoming the default
  35. Alignment Research @ EleutherAI
  36. Transformer Math 101
  37. Exploratory Analysis of TRLX RLHF Transformers with TransformerLens
  38. EleutherAI Second Retrospective: The long version
  39. The View from 30,000 Feet: Preface to the Second EleutherAI Retrospective
  40. Announcing GPT-NeoX-20B
  41. A Preliminary Exploration into Factored Cognition with Language Models
  42. Multiple Choice Normalization in LM Evaluation
  43. Downstream Evaluations of Rotary Position Embeddings
  44. What A Long, Strange Trip It's Been: EleutherAI One Year Retrospective
  45. Why Release a Large Language Model?
  46. On the Sizes of OpenAI API Models
  47. Evaluating Different Fewshot Description Prompts on GPT-3
  48. Finetuning Models on Downstream Tasks
  49. Activation Function Ablation
  50. Rotary Embeddings: A Relative Revolution