cs.CL updates on arXiv.org Hot List - Hot点·热榜

1 ORCE: Order-Aware Alignment of Verbalized Confidence in Large Language Models ↗

2 SmellBench: Evaluating LLM Agents on Architectural Code Smell Repair ↗

3 Multimodal Abstractive Summarization of Instructional Videos with Vision-Language Models ↗

4 100,000+ Movie Reviews from Kazakhstan: Russian, Kazakh, and Code-Switched Texts ↗

5 Breaking Down and Building Up: Mixture of Skill-Based Vision-and-Language Navigation Agents ↗

6 The Challenge and Reward of Fair Play in Narrative: A Computational Approach ↗

7 Diffusion-State Policy Optimization for Masked Diffusion Language Models ↗

8 RETUYT-INCO at BEA 2026 Shared Task 2: Meta-prompting in Rubric-based Scoring for German ↗

9 HEBATRON: A Hebrew-Specialized Open-Weight Mixture-of-Experts Language Model ↗

10 Instructions Shape Production of Language, not Processing ↗

11 ReVision: Scaling Computer-Use Agents via Temporal Visual Redundancy Reduction ↗

12 How Does Differential Privacy Affect Social Bias in LLMs? A Systematic Evaluation ↗

13 The Bicameral Model: Bidirectional Hidden-State Coupling Between Parallel Language Models ↗

14 Decomposing Evolutionary Mixture-of-LoRA Architectures: The Routing Lever, the Lifecycle Penalty, and a Substrate-Conditional Boundary ↗

15 Adaptive Contrastive Learning on Multimodal Transformer for Review Helpfulness Predictions ↗

16 ReAD: Reinforcement-Guided Capability Distillation for Large Language Models ↗

17 Modeling Narrative Structure in Latin Epic Poetry with Automatically Generated Story Grammars ↗

18 Predicting Psychological Well-Being from Spontaneous Speech using LLMs ↗

19 Towards Fine-Grained Code-Switch Speech Translation with Semantic Space Alignment ↗

20 SOMA: Efficient Multi-turn LLM Serving via Small Language Model ↗

21 MoshiRAG: Asynchronous Knowledge Retrieval for Full-Duplex Speech Language Models ↗

22 Large Language Models for Causal Relations Extraction in Social Media: A Validation Framework for Disaster Intelligence ↗

23 READ: Recurrent Adapter with Partial Video-Language Alignment for Parameter-Efficient Transfer Learning in Low-Resource Video-Language Modeling ↗

24 An Empirical Study of Automating Agent Evaluation ↗

25 RACC: Representation-Aware Coverage Criteria for LLM Safety Testing ↗

26 Deep Reasoning in General Purpose Agents via Structured Meta-Cognition ↗

27 More Edits, More Stable: Understanding the Lifelong Normalization in Sequential Model Editing ↗

28 Freeze Deep, Train Shallow: Interpretable Layer Allocation for Continued Pre-Training ↗

29 Do Enterprise Systems Need Learned World Models? The Importance of Context to Infer Dynamics ↗

30 Agent-BRACE: Decoupling Beliefs from Actions in Long-Horizon Tasks via Verbalized State Uncertainty ↗

31 KV-Fold: One-Step KV-Cache Recurrence for Long-Context Inference ↗

32 StoicLLM: Preference Optimization for Philosophical Alignment in Small Language Models ↗

33 How far can bias go? Tracing bias from pretraining data to alignment ↗

34 Robust Biomedical Publication Type and Study Design Classification with Knowledge-Guided Perturbations ↗

35 Reconstructing Sepsis Trajectories from Clinical Case Reports using LLMs: the Textual Time Series Corpus for Sepsis ↗

36 A Study on Hidden Layer Distillation for Large Language Model Pre-Training ↗

37 OASIS: A Multilingual and Multimodal Dataset for Culturally Grounded Spoken Visual QA ↗

38 Checkup2Action: A Multimodal Clinical Check-up Report Dataset for Patient-Oriented Action Card Generation ↗

39 AutoMonitor-Bench: Evaluating the Reliability of LLM-Based Misbehavior Monitor ↗

40 Taming Extreme Tokens: Covariance-Aware GRPO with Gaussian-Kernel Advantage Reweighting ↗

41 Not Worth Mentioning? A Pilot Study on Salient Proposition Annotation ↗

42 Three Regimes of Context-Parametric Conflict: A Predictive Framework and Empirical Validation ↗

43 Self-Consolidating Language Models: Continual Knowledge Incorporation from Context ↗

44 BitLM: Unlocking Multi-Token Language Generation with Bitwise Continuous Diffusion ↗

45 PlantMarkerBench: A Multi-Species Benchmark for Evidence-Grounded Plant Marker Reasoning ↗

46 Ada-MK: Adaptive MegaKernel Optimization via Automated DAG-based Search for LLM Inference ↗

47 Characterizing the Robustness of Black-Box LLM Planners Under Perturbed Observations with Adaptive Stress Testing ↗

48 Efficient LLM-based Advertising via Model Compression and Parallel Verification ↗

49 Differentially Private Synthetic Text Generation for Retrieval-Augmented Generation (RAG) ↗

50 DiffScore: Text Evaluation Beyond Autoregressive Likelihood ↗

51 KV Cache Offloading for Context-Intensive Tasks ↗

52 PRISM: A Geometric Risk Bound that Decomposes Drift into Scale, Shape, and Head ↗

53 ClinicalBench: Stress-Testing Assertion-Aware Retrieval for Cross-Admission Clinical QA on MIMIC-IV ↗

54 When Emotion Becomes Trigger: Emotion-style dynamic Backdoor Attack Parasitising Large Language Models ↗

55 UniVLR: Unifying Text and Vision in Visual Latent Reasoning for Multimodal LLMs ↗

56 OmniThoughtVis: A Scalable Distillation Pipeline for Deployable Multimodal Reasoning Models ↗

57 World Action Models: The Next Frontier in Embodied AI ↗

58 Enhancing Multilingual Counterfactual Generation through Alignment-as-Preference Optimization ↗

59 Reconstruction of Personally Identifiable Information from Supervised Finetuned Models ↗

60 Human-Grounded Multimodal Benchmark with 900K-Scale Aggregated Student Response Distributions from Japan's National Assessment of Academic Ability ↗

61 Multi-Stream LLMs: Unblocking Language Models with Parallel Streams of Thoughts, Inputs and Outputs ↗

62 Robust LLM Unlearning Against Relearning Attacks: The Minor Components in Representations Matter ↗

63 MEME: Multi-entity & Evolving Memory Evaluation ↗

64 Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation ↗

65 Gradient-Boosted Decision Tree for Listwise Context Model in Multimodal Review Helpfulness Prediction ↗

66 Training-Inference Consistent Segmented Execution for Long-Context LLMs ↗

67 To Err Is Human; To Annotate, SILICON? Toward Robust Reproducibility in LLM Annotation ↗

68 Safety-Oriented Evaluation of Language Understanding Systems for Air Traffic Control ↗

69 Synthetic Function Demonstrations Improve Generation in Low-Resource Programming Languages ↗

70 From Token to Token Pair: Efficient Prompt Compression for Large Language Models in Clinical Prediction ↗

71 Express Your Doubts -- Probabilistic World Modeling Should not be Based on Token logprobs ↗

72 Choosing features for classifying multiword expressions ↗

73 Reflect then Learn: Active Prompting for Information Extraction Guided by Introspective Confusion ↗

74 Probabilistic Calibration Is a Trainable Capability in Language Models ↗

75 Prompting from the bench: Large-scale pretraining is not sufficient to prepare LLMs for ordinary meaning analysis ↗

76 Self-Distilled Trajectory-Aware Boltzmann Modeling: Bridging the Training-Inference Discrepancy in Diffusion Language Models ↗

77 When the Gold Standard Isn't Necessarily Standard: Challenges of Evaluating the Translation of User-Generated Content ↗

78 Concordance Comparison as a Means of Assembling Local Grammars ↗

79 Model-Dowser: Data-Free Importance Probing to Mitigate Catastrophic Forgetting in Multimodal Large Language Models ↗

80 Qwen-Scope: Turning Sparse Features into Development Tools for Large Language Models ↗

81 Invisible failures in human-AI interactions ↗

82 YFPO: A Preliminary Study of Yoked Feature Preference Optimization with Neuron-Guided Rewards for Mathematical Reasoning ↗

83 Asymmetric Advantage Modulation Calibrates Entropy Dynamics in RLVR ↗

84 Enhancing Target-Guided Proactive Dialogue Systems via Conversational Scenario Modeling and Intent-Keyword Bridging ↗

85 RLearner-LLM: Balancing Logical Grounding and Fluency in Large Language Models via Hybrid Direct Preference Optimization ↗

86 On Predicting the Post-training Potential of Pre-trained LLMs ↗

87 LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling ↗

88 Towards Visually-Guided Movie Subtitle Translation for Indic Languages ↗

89 GRC: Unifying Reasoning-Driven Generation, Retrieval and Compression ↗

90 Learning Agentic Policy from Action Guidance ↗

91 ANCHOR: Abductive Network Construction with Hierarchical Orchestration for Reliable Probability Inference in Large Language Models ↗

92 SAGE: Scalable Automated Robustness Augmentation for LLM Knowledge Evaluation ↗

93 Modality-Inconsistent Continual Learning of Multimodal Large Language Models ↗

94 Caraman at SemEval-2026 Task 8: Three-Stage Multi-Turn Retrieval with Query Rewriting, Hybrid Search, and Cross-Encoder Reranking ↗

95 Understanding the Performance Gap in Preference Learning: A Dichotomy of RLHF and DPO ↗

96 SkillGraph: Skill-Augmented Reinforcement Learning for Agents via Evolving Skill Graphs ↗

97 Coevolutionary Continuous Discrete Diffusion: Make Your Diffusion Language Model a Latent Reasoner ↗

98 Is Child-Directed Language Optimized for Word Learning? A Computational Study of Verb Meaning Acquisition ↗

99 HE-SNR: Uncovering Latent Logic via Entropy for Guiding Mid-Training on SWE-bench ↗

100 Do Language Models Encode Knowledge of Linguistic Constraint Violations? ↗