cs.CV updates on arXiv.org
AI
Updated 2026-05-15 01:27
100 items in total
1
UniCustom: Unified Visual Conditioning for Multi-Reference Image Generation
2
ReLIC-SGG: Relation Lattice Completion for Open-Vocabulary Scene Graph Generation
3
Distill, Diffuse, and Semanticize (DDS): Annotation-Free 3D Scene Understanding Based on Multi-Granularity Distillation and Graph-Diffusion-Based Segmentation
4
STORM: Segment, Track, and Object Re-Localization from a Single Image
5
GraphThinker: Reinforcing Temporally Grounded Video Reasoning with Event Graph Thinking
6
LoREnc: Low-Rank Encryption for Securing Foundation Models and LoRA Adapters
7
Compact 3D Gaussian Splatting For Dense Visual SLAM
8
M3Net: A Macro-to-Meso-to-Micro Clinical-inspired Hierarchical 3D Network for Pulmonary Nodule Classification
9
VideoSEAL: Mitigating Evidence Misalignment in Agentic Long Video Understanding by Decoupling Answer Authority
10
M2Retinexformer: Multi-Modal Retinexformer for Low-Light Image Enhancement
11
Pyramid Self-contrastive Learning Framework for Test-time Ultrasound Image Denoising
12
SSDA: Bridging Spectral and Structural Gaps via Dual Adaptation for Vision-Based Time Series Forecasting
13
What Happens Before Decoding? Prefill Determines GUI Grounding in VLMs
14
CROP: Expert-Aligned Image Cropping via Compositional Reasoning and Optimizing Preference
15
GUIGuard-Bench: Toward a General Evaluation for Privacy-Preserving GUI Agents
16
Improving Diffusion Posterior Samplers with Lagged Temporal Corrections for Image Restoration
17
Gradient-Free Noise Optimization for Reward Alignment in Generative Models
18
DistractMIA: Black-Box Membership Inference on Vision-Language Models via Semantic Distraction
19
Robust and Explainable Bicuspid Aortic Valve Diagnosis Using Stacked Ensembles on Echocardiography
20
3D Primitives are a Spatial Language for VLMs
21
Reasoning to Edit: Hypothetical Instruction-Based Image Editing with Visual Reasoning
22
TrackCraft3R: Repurposing Video Diffusion Transformers for Dense 3D Tracking
23
SymbolSight: Minimizing Inter-Symbol Interference for Reading with Prosthetic Vision
24
A Data Efficiency Study of Synthetic Fog for Object Detection Using the Clear2Fog Pipeline
25
DeCo-DETR: Decoupled Cognition DETR for efficient Open-Vocabulary Object Detection
26
MambaPanoptic: A Vision Mamba-based Structured State Space Framework for Panoptic Segmentation
27
Does it Really Count? Assessing Semantic Grounding in Text-Guided Class-Agnostic Counting
28
DIVER: Diving Deeper into Distilled Data via Expressive Semantic Recovery
29
CoWorld-VLA: Thinking in a Multi-Expert World Model for Autonomous Driving
30
CRAFT: Clinical Reward-Aligned Finetuning for Medical Image Synthesis
31
Breaking Down and Building Up: Mixture of Skill-Based Vision-and-Language Navigation Agents
32
No One Knows the State of the Art in Geospatial Foundation Models
33
Data Agent: Learning to Select Data via End-to-End Dynamic Optimization
34
Visual Aesthetic Benchmark: Can Frontier Models Judge Beauty?
35
Towards Long-horizon Embodied Agents with Tool-Aligned Vision-Language-Action Models
36
MMCL-Bench: Multimodal Context Learning from Visual Rules, Procedures, and Evidence
37
Taming the Long Tail: Rebalancing Adversarial Training via Adaptive Perturbation
38
Inline Critic Steers Image Editing
39
Realtime-VLA FLASH: Speculative Inference Framework for Diffusion-based VLAs
40
Is Video Anomaly Detection Misframed? Evidence from LLM-Based and Multi-Scene Models
41
LENS: Multi-level Evaluation of Multimodal Reasoning with Large Language Models
42
Just Ask for a Table: A Thirty-Token User Prompt Defeats Sponsored Recommendations in Twelve LLMs
43
UNIV: Unified Foundation Model for Infrared and Visible Modalities
44
WildPose: A Unified Framework for Robust Pose Estimation in the Wild
45
When Diffusion Breaks Constraints: Sequential Autoregressive Generation with RL and MCTS
46
FRAME: Forensic Routing and Adaptive Multi-path Evidence Fusion for Image Manipulation Detection
47
Aligning Forest and Trees in Images & Long Captions for Visually Grounded Understanding
48
AssemblyBench: Physics-Aware Assembly of Complex Industrial Objects
49
Kiwi-Edit: Versatile Video Editing via Instruction and Reference Guidance
50
PRISM: Perinuclear Ring-based Image Segmentation Method for Acute Lymphoblastic Leukemia Classification
51
NTIRE 2026 The Second Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results
52
Prediction of Rectal Cancer Regrowth from Longitudinal Endoscopy
53
COHERENCE: Benchmarking Fine-Grained Image-Text Alignment in Interleaved Multimodal Contexts
54
Adaptive Conformal Prediction for Reliable and Explainable Medical Image Classification
55
PicoEyes: Unified Gaze Estimation Framework for Mixed Reality with a Large-Scale Multi-View Dataset
56
GuardMarkGS: Unified Ownership Tracing and Edit Deterrence for 3D Gaussian Splatting
57
Evidence-based Decision Modeling for Synthetic Face Detection with Uncertainty-driven Active Learning
58
Anatomy-Slot: Unsupervised Anatomical Factorization for Homologous Bilateral Reasoning in Retinal Diagnosis
59
A Mimetic Detector for Adversarial Image Perturbations
60
AuraMask: An Extensible Pipeline for Developing Aesthetic Anti-Facial Recognition Image Filters
61
VIP: Visual-guided Prompt Evolution for Efficient Dense Vision-Language Inference
62
CRePE: Curved Ray Expectation Positional Encoding for Unified-Camera-Controlled Video Generation
63
Energy Scaling Laws for Diffusion Models: Quantifying Compute in Image Generation
64
DirectTryOn: One-Step Virtual Try-On via Straightened Conditional Transport
65
Test-Time Training with KV Binding Is Secretly Linear Attention
66
Debunking Grad-ECLIP: A Comprehensive Study on Its Incorrectness and Fundamental Principles for Model Interpretation
67
(Sparse) Attention to the Details: Preserving Spectral Fidelity in ML-based Weather Forecasting Models
68
Seg-Agent: Test-Time Multimodal Reasoning for Training-Free Language-Guided Segmentation
69
ThermalTap: Passive Application Fingerprinting in VR Headsets via Thermal Side Channels
70
AdaFocus: Adaptive Relevance-Diversity Sampling with Zero-Cache Look-back for Efficient Long Video Understanding
71
On Hallucinations in Inverse Problems: Fundamental Limits and Provable Assessment Methods
72
GTA: Advancing Image-to-3D World Generation via Geometry Then Appearance Video Diffusion
73
What Limits Vision-and-Language Navigation?
74
Reducing Bias and Variance: Generative Semantic Guidance and Bi-Layer Ensemble for Image Clustering
75
DeepFilters: Scattering-Aware Pupil Engineering with Learned Digital Filter Reconstruction for Extended Depth of Field Microscopy
76
Asymmetric Flow Models
77
Min Generalized Sliced Gromov Wasserstein: A Scalable Path to Gromov Wasserstein
78
ImageAttributionBench: How Far Are We from Generalizable Attribution?
79
History Anchors: How Prior Behavior Steers LLM Decisions Toward Unsafe Actions
80
Amortized Guidance for Image Inpainting with Pretrained Diffusion Models
81
PVLM: Parsing-Aware Vision Language Model with Dynamic Contrastive Learning for Zero-Shot Deepfake Attribution
82
OCH3R: Object-Centric Holistic 3D Reconstruction
83
NFR: Neural Feature-Guided Non-Rigid Shape Registration
84
PRISM: Prior Rectification and Uncertainty-Aware Structure Modeling for Diffusion-Based Text Image Super-Resolution
85
Scalable Object Detection in the Car Interior With Vision Foundation Models
86
ViDR: Grounding Multimodal Deep Research Reports in Source Visual Evidence
87
Unmasking Puppeteers: Leveraging Biometric Leakage to Expose Impersonation in AI-Based Videoconferencing
88
CoGE: Sim-to-Real Online Geometric Estimation for Monocular Colonoscopy
89
The Joint Gromov Wasserstein Objective for Multiple Object Matching
90
EgoForce: Robust Online Egocentric Motion Reconstruction via Diffusion Forcing
91
Make-It-Poseable: Feed-forward Latent Posing Model for 3D Characters
92
Revealing the Gap in Human and VLM Scene Perception through Counterfactual Semantic Saliency
93
Inference-Time Dynamic Modality Selection for Incomplete Multimodal Classification
94
Uncertainty-aware Spatial-Frequency Registration and Fusion for Infrared and Visible Images
95
Perception with Guarantees: Certified Pose Estimation via Reachability Analysis
96
BrainAnytime: Anatomy-Aware Cross-Modal Pretraining for Brain Image Analysis with Arbitrary Modality Availability
97
Exploring Multimodal LMMs for Online Episodic Memory Question Answering on the Edge
98
Edit-Compass & EditReward-Compass: A Unified Benchmark for Image Editing and Reward Modeling
99
MedOpenClaw and MedFlowBench: Auditing Medical Agents in Full-Study Workflows
100
HarmoGS: Robust 3D Gaussian Splatting in the Wild via Conflict-Aware Gradient Harmonization