1. ORCE: Order-Aware Alignment of Verbalized Confidence in Large Language Models
2. SmellBench: Evaluating LLM Agents on Architectural Code Smell Repair
3. Multimodal Abstractive Summarization of Instructional Videos with Vision-Language Models
4. 100,000+ Movie Reviews from Kazakhstan: Russian, Kazakh, and Code-Switched Texts
5. Breaking Down and Building Up: Mixture of Skill-Based Vision-and-Language Navigation Agents
6. The Challenge and Reward of Fair Play in Narrative: A Computational Approach
7. Diffusion-State Policy Optimization for Masked Diffusion Language Models
8. RETUYT-INCO at BEA 2026 Shared Task 2: Meta-prompting in Rubric-based Scoring for German
9. HEBATRON: A Hebrew-Specialized Open-Weight Mixture-of-Experts Language Model
10. Instructions Shape Production of Language, not Processing
11. ReVision: Scaling Computer-Use Agents via Temporal Visual Redundancy Reduction
12. How Does Differential Privacy Affect Social Bias in LLMs? A Systematic Evaluation
13. The Bicameral Model: Bidirectional Hidden-State Coupling Between Parallel Language Models
14. Decomposing Evolutionary Mixture-of-LoRA Architectures: The Routing Lever, the Lifecycle Penalty, and a Substrate-Conditional Boundary
15. Adaptive Contrastive Learning on Multimodal Transformer for Review Helpfulness Predictions
16. ReAD: Reinforcement-Guided Capability Distillation for Large Language Models
17. Modeling Narrative Structure in Latin Epic Poetry with Automatically Generated Story Grammars
18. Predicting Psychological Well-Being from Spontaneous Speech using LLMs
19. Towards Fine-Grained Code-Switch Speech Translation with Semantic Space Alignment
20. SOMA: Efficient Multi-turn LLM Serving via Small Language Model
21. MoshiRAG: Asynchronous Knowledge Retrieval for Full-Duplex Speech Language Models
22. Large Language Models for Causal Relations Extraction in Social Media: A Validation Framework for Disaster Intelligence
23. READ: Recurrent Adapter with Partial Video-Language Alignment for Parameter-Efficient Transfer Learning in Low-Resource Video-Language Modeling
24. An Empirical Study of Automating Agent Evaluation
25. RACC: Representation-Aware Coverage Criteria for LLM Safety Testing
26. Deep Reasoning in General Purpose Agents via Structured Meta-Cognition
27. More Edits, More Stable: Understanding the Lifelong Normalization in Sequential Model Editing
28. Freeze Deep, Train Shallow: Interpretable Layer Allocation for Continued Pre-Training
29. Do Enterprise Systems Need Learned World Models? The Importance of Context to Infer Dynamics
30. Agent-BRACE: Decoupling Beliefs from Actions in Long-Horizon Tasks via Verbalized State Uncertainty
31. KV-Fold: One-Step KV-Cache Recurrence for Long-Context Inference
32. StoicLLM: Preference Optimization for Philosophical Alignment in Small Language Models
33. How far can bias go? Tracing bias from pretraining data to alignment
34. Robust Biomedical Publication Type and Study Design Classification with Knowledge-Guided Perturbations
35. Reconstructing Sepsis Trajectories from Clinical Case Reports using LLMs: the Textual Time Series Corpus for Sepsis
36. A Study on Hidden Layer Distillation for Large Language Model Pre-Training
37. OASIS: A Multilingual and Multimodal Dataset for Culturally Grounded Spoken Visual QA
38. Checkup2Action: A Multimodal Clinical Check-up Report Dataset for Patient-Oriented Action Card Generation
39. AutoMonitor-Bench: Evaluating the Reliability of LLM-Based Misbehavior Monitor
40. Taming Extreme Tokens: Covariance-Aware GRPO with Gaussian-Kernel Advantage Reweighting
41. Not Worth Mentioning? A Pilot Study on Salient Proposition Annotation
42. Three Regimes of Context-Parametric Conflict: A Predictive Framework and Empirical Validation
43. Self-Consolidating Language Models: Continual Knowledge Incorporation from Context
44. BitLM: Unlocking Multi-Token Language Generation with Bitwise Continuous Diffusion
45. PlantMarkerBench: A Multi-Species Benchmark for Evidence-Grounded Plant Marker Reasoning
46. Ada-MK: Adaptive MegaKernel Optimization via Automated DAG-based Search for LLM Inference
47. Characterizing the Robustness of Black-Box LLM Planners Under Perturbed Observations with Adaptive Stress Testing
48. Efficient LLM-based Advertising via Model Compression and Parallel Verification
49. Differentially Private Synthetic Text Generation for Retrieval-Augmented Generation (RAG)
50. DiffScore: Text Evaluation Beyond Autoregressive Likelihood
51. KV Cache Offloading for Context-Intensive Tasks
52. PRISM: A Geometric Risk Bound that Decomposes Drift into Scale, Shape, and Head
53. ClinicalBench: Stress-Testing Assertion-Aware Retrieval for Cross-Admission Clinical QA on MIMIC-IV
54. When Emotion Becomes Trigger: Emotion-style dynamic Backdoor Attack Parasitising Large Language Models
55. UniVLR: Unifying Text and Vision in Visual Latent Reasoning for Multimodal LLMs
56. OmniThoughtVis: A Scalable Distillation Pipeline for Deployable Multimodal Reasoning Models
57. World Action Models: The Next Frontier in Embodied AI
58. Enhancing Multilingual Counterfactual Generation through Alignment-as-Preference Optimization
59. Reconstruction of Personally Identifiable Information from Supervised Finetuned Models
60. Human-Grounded Multimodal Benchmark with 900K-Scale Aggregated Student Response Distributions from Japan's National Assessment of Academic Ability
61. Multi-Stream LLMs: Unblocking Language Models with Parallel Streams of Thoughts, Inputs and Outputs
62. Robust LLM Unlearning Against Relearning Attacks: The Minor Components in Representations Matter
63. MEME: Multi-entity & Evolving Memory Evaluation
64. Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation
65. Gradient-Boosted Decision Tree for Listwise Context Model in Multimodal Review Helpfulness Prediction
66. Training-Inference Consistent Segmented Execution for Long-Context LLMs
67. To Err Is Human; To Annotate, SILICON? Toward Robust Reproducibility in LLM Annotation
68. Safety-Oriented Evaluation of Language Understanding Systems for Air Traffic Control
69. Synthetic Function Demonstrations Improve Generation in Low-Resource Programming Languages
70. From Token to Token Pair: Efficient Prompt Compression for Large Language Models in Clinical Prediction
71. Express Your Doubts -- Probabilistic World Modeling Should not be Based on Token logprobs
72. Choosing features for classifying multiword expressions
73. Reflect then Learn: Active Prompting for Information Extraction Guided by Introspective Confusion
74. Probabilistic Calibration Is a Trainable Capability in Language Models
75. Prompting from the bench: Large-scale pretraining is not sufficient to prepare LLMs for ordinary meaning analysis
76. Self-Distilled Trajectory-Aware Boltzmann Modeling: Bridging the Training-Inference Discrepancy in Diffusion Language Models
77. When the Gold Standard Isn't Necessarily Standard: Challenges of Evaluating the Translation of User-Generated Content
78. Concordance Comparison as a Means of Assembling Local Grammars
79. Model-Dowser: Data-Free Importance Probing to Mitigate Catastrophic Forgetting in Multimodal Large Language Models
80. Qwen-Scope: Turning Sparse Features into Development Tools for Large Language Models
81. Invisible failures in human-AI interactions
82. YFPO: A Preliminary Study of Yoked Feature Preference Optimization with Neuron-Guided Rewards for Mathematical Reasoning
83. Asymmetric Advantage Modulation Calibrates Entropy Dynamics in RLVR
84. Enhancing Target-Guided Proactive Dialogue Systems via Conversational Scenario Modeling and Intent-Keyword Bridging
85. RLearner-LLM: Balancing Logical Grounding and Fluency in Large Language Models via Hybrid Direct Preference Optimization
86. On Predicting the Post-training Potential of Pre-trained LLMs
87. LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
88. Towards Visually-Guided Movie Subtitle Translation for Indic Languages
89. GRC: Unifying Reasoning-Driven Generation, Retrieval and Compression
90. Learning Agentic Policy from Action Guidance
91. ANCHOR: Abductive Network Construction with Hierarchical Orchestration for Reliable Probability Inference in Large Language Models
92. SAGE: Scalable Automated Robustness Augmentation for LLM Knowledge Evaluation
93. Modality-Inconsistent Continual Learning of Multimodal Large Language Models
94. Caraman at SemEval-2026 Task 8: Three-Stage Multi-Turn Retrieval with Query Rewriting, Hybrid Search, and Cross-Encoder Reranking
95. Understanding the Performance Gap in Preference Learning: A Dichotomy of RLHF and DPO
96. SkillGraph: Skill-Augmented Reinforcement Learning for Agents via Evolving Skill Graphs
97. Coevolutionary Continuous Discrete Diffusion: Make Your Diffusion Language Model a Latent Reasoner
98. Is Child-Directed Language Optimized for Word Learning? A Computational Study of Verb Meaning Acquisition
99. HE-SNR: Uncovering Latent Logic via Entropy for Guiding Mid-Training on SWE-bench
100. Do Language Models Encode Knowledge of Linguistic Constraint Violations?