cs.CL updates on arXiv.org

Updated 2026-05-15 01:27 · 100 items in total
  1. ORCE: Order-Aware Alignment of Verbalized Confidence in Large Language Models
  2. SmellBench: Evaluating LLM Agents on Architectural Code Smell Repair
  3. Multimodal Abstractive Summarization of Instructional Videos with Vision-Language Models
  4. 100,000+ Movie Reviews from Kazakhstan: Russian, Kazakh, and Code-Switched Texts
  5. Breaking Down and Building Up: Mixture of Skill-Based Vision-and-Language Navigation Agents
  6. The Challenge and Reward of Fair Play in Narrative: A Computational Approach
  7. Diffusion-State Policy Optimization for Masked Diffusion Language Models
  8. RETUYT-INCO at BEA 2026 Shared Task 2: Meta-prompting in Rubric-based Scoring for German
  9. HEBATRON: A Hebrew-Specialized Open-Weight Mixture-of-Experts Language Model
  10. Instructions Shape Production of Language, not Processing
  11. ReVision: Scaling Computer-Use Agents via Temporal Visual Redundancy Reduction
  12. How Does Differential Privacy Affect Social Bias in LLMs? A Systematic Evaluation
  13. The Bicameral Model: Bidirectional Hidden-State Coupling Between Parallel Language Models
  14. Decomposing Evolutionary Mixture-of-LoRA Architectures: The Routing Lever, the Lifecycle Penalty, and a Substrate-Conditional Boundary
  15. Adaptive Contrastive Learning on Multimodal Transformer for Review Helpfulness Predictions
  16. ReAD: Reinforcement-Guided Capability Distillation for Large Language Models
  17. Modeling Narrative Structure in Latin Epic Poetry with Automatically Generated Story Grammars
  18. Predicting Psychological Well-Being from Spontaneous Speech using LLMs
  19. Towards Fine-Grained Code-Switch Speech Translation with Semantic Space Alignment
  20. SOMA: Efficient Multi-turn LLM Serving via Small Language Model
  21. MoshiRAG: Asynchronous Knowledge Retrieval for Full-Duplex Speech Language Models
  22. Large Language Models for Causal Relations Extraction in Social Media: A Validation Framework for Disaster Intelligence
  23. READ: Recurrent Adapter with Partial Video-Language Alignment for Parameter-Efficient Transfer Learning in Low-Resource Video-Language Modeling
  24. An Empirical Study of Automating Agent Evaluation
  25. RACC: Representation-Aware Coverage Criteria for LLM Safety Testing
  26. Deep Reasoning in General Purpose Agents via Structured Meta-Cognition
  27. More Edits, More Stable: Understanding the Lifelong Normalization in Sequential Model Editing
  28. Freeze Deep, Train Shallow: Interpretable Layer Allocation for Continued Pre-Training
  29. Do Enterprise Systems Need Learned World Models? The Importance of Context to Infer Dynamics
  30. Agent-BRACE: Decoupling Beliefs from Actions in Long-Horizon Tasks via Verbalized State Uncertainty
  31. KV-Fold: One-Step KV-Cache Recurrence for Long-Context Inference
  32. StoicLLM: Preference Optimization for Philosophical Alignment in Small Language Models
  33. How far can bias go? Tracing bias from pretraining data to alignment
  34. Robust Biomedical Publication Type and Study Design Classification with Knowledge-Guided Perturbations
  35. Reconstructing Sepsis Trajectories from Clinical Case Reports using LLMs: the Textual Time Series Corpus for Sepsis
  36. A Study on Hidden Layer Distillation for Large Language Model Pre-Training
  37. OASIS: A Multilingual and Multimodal Dataset for Culturally Grounded Spoken Visual QA
  38. Checkup2Action: A Multimodal Clinical Check-up Report Dataset for Patient-Oriented Action Card Generation
  39. AutoMonitor-Bench: Evaluating the Reliability of LLM-Based Misbehavior Monitor
  40. Taming Extreme Tokens: Covariance-Aware GRPO with Gaussian-Kernel Advantage Reweighting
  41. Not Worth Mentioning? A Pilot Study on Salient Proposition Annotation
  42. Three Regimes of Context-Parametric Conflict: A Predictive Framework and Empirical Validation
  43. Self-Consolidating Language Models: Continual Knowledge Incorporation from Context
  44. BitLM: Unlocking Multi-Token Language Generation with Bitwise Continuous Diffusion
  45. PlantMarkerBench: A Multi-Species Benchmark for Evidence-Grounded Plant Marker Reasoning
  46. Ada-MK: Adaptive MegaKernel Optimization via Automated DAG-based Search for LLM Inference
  47. Characterizing the Robustness of Black-Box LLM Planners Under Perturbed Observations with Adaptive Stress Testing
  48. Efficient LLM-based Advertising via Model Compression and Parallel Verification
  49. Differentially Private Synthetic Text Generation for Retrieval-Augmented Generation (RAG)
  50. DiffScore: Text Evaluation Beyond Autoregressive Likelihood
  51. KV Cache Offloading for Context-Intensive Tasks
  52. PRISM: A Geometric Risk Bound that Decomposes Drift into Scale, Shape, and Head
  53. ClinicalBench: Stress-Testing Assertion-Aware Retrieval for Cross-Admission Clinical QA on MIMIC-IV
  54. When Emotion Becomes Trigger: Emotion-style dynamic Backdoor Attack Parasitising Large Language Models
  55. UniVLR: Unifying Text and Vision in Visual Latent Reasoning for Multimodal LLMs
  56. OmniThoughtVis: A Scalable Distillation Pipeline for Deployable Multimodal Reasoning Models
  57. World Action Models: The Next Frontier in Embodied AI
  58. Enhancing Multilingual Counterfactual Generation through Alignment-as-Preference Optimization
  59. Reconstruction of Personally Identifiable Information from Supervised Finetuned Models
  60. Human-Grounded Multimodal Benchmark with 900K-Scale Aggregated Student Response Distributions from Japan's National Assessment of Academic Ability
  61. Multi-Stream LLMs: Unblocking Language Models with Parallel Streams of Thoughts, Inputs and Outputs
  62. Robust LLM Unlearning Against Relearning Attacks: The Minor Components in Representations Matter
  63. MEME: Multi-entity & Evolving Memory Evaluation
  64. Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation
  65. Gradient-Boosted Decision Tree for Listwise Context Model in Multimodal Review Helpfulness Prediction
  66. Training-Inference Consistent Segmented Execution for Long-Context LLMs
  67. To Err Is Human; To Annotate, SILICON? Toward Robust Reproducibility in LLM Annotation
  68. Safety-Oriented Evaluation of Language Understanding Systems for Air Traffic Control
  69. Synthetic Function Demonstrations Improve Generation in Low-Resource Programming Languages
  70. From Token to Token Pair: Efficient Prompt Compression for Large Language Models in Clinical Prediction
  71. Express Your Doubts -- Probabilistic World Modeling Should not be Based on Token logprobs
  72. Choosing features for classifying multiword expressions
  73. Reflect then Learn: Active Prompting for Information Extraction Guided by Introspective Confusion
  74. Probabilistic Calibration Is a Trainable Capability in Language Models
  75. Prompting from the bench: Large-scale pretraining is not sufficient to prepare LLMs for ordinary meaning analysis
  76. Self-Distilled Trajectory-Aware Boltzmann Modeling: Bridging the Training-Inference Discrepancy in Diffusion Language Models
  77. When the Gold Standard Isn't Necessarily Standard: Challenges of Evaluating the Translation of User-Generated Content
  78. Concordance Comparison as a Means of Assembling Local Grammars
  79. Model-Dowser: Data-Free Importance Probing to Mitigate Catastrophic Forgetting in Multimodal Large Language Models
  80. Qwen-Scope: Turning Sparse Features into Development Tools for Large Language Models
  81. Invisible failures in human-AI interactions
  82. YFPO: A Preliminary Study of Yoked Feature Preference Optimization with Neuron-Guided Rewards for Mathematical Reasoning
  83. Asymmetric Advantage Modulation Calibrates Entropy Dynamics in RLVR
  84. Enhancing Target-Guided Proactive Dialogue Systems via Conversational Scenario Modeling and Intent-Keyword Bridging
  85. RLearner-LLM: Balancing Logical Grounding and Fluency in Large Language Models via Hybrid Direct Preference Optimization
  86. On Predicting the Post-training Potential of Pre-trained LLMs
  87. LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
  88. Towards Visually-Guided Movie Subtitle Translation for Indic Languages
  89. GRC: Unifying Reasoning-Driven Generation, Retrieval and Compression
  90. Learning Agentic Policy from Action Guidance
  91. ANCHOR: Abductive Network Construction with Hierarchical Orchestration for Reliable Probability Inference in Large Language Models
  92. SAGE: Scalable Automated Robustness Augmentation for LLM Knowledge Evaluation
  93. Modality-Inconsistent Continual Learning of Multimodal Large Language Models
  94. Caraman at SemEval-2026 Task 8: Three-Stage Multi-Turn Retrieval with Query Rewriting, Hybrid Search, and Cross-Encoder Reranking
  95. Understanding the Performance Gap in Preference Learning: A Dichotomy of RLHF and DPO
  96. SkillGraph: Skill-Augmented Reinforcement Learning for Agents via Evolving Skill Graphs
  97. Coevolutionary Continuous Discrete Diffusion: Make Your Diffusion Language Model a Latent Reasoner
  98. Is Child-Directed Language Optimized for Word Learning? A Computational Study of Verb Meaning Acquisition
  99. HE-SNR: Uncovering Latent Logic via Entropy for Guiding Mid-Training on SWE-bench
  100. Do Language Models Encode Knowledge of Linguistic Constraint Violations?