ViewActive: Active Viewpoint Optimization from a Single Image

2409.09997v5

Chinese Title (English translation)

ViewActive: Active Viewpoint Optimization from a Single Image

English Title

ViewActive: Active viewpoint optimization from a single image

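Chinese Abstract (English translation)

When observing an object, humans can use their spatial visualization and mental rotation abilities to envision potentially optimal viewpoints from the current observation. This capability is essential for robots to achieve efficient and robust scene perception during operation, because optimal viewpoints provide the key, informative features needed to represent a scene accurately in 2D images, which in turn improves downstream tasks. To give robots this human-like active viewpoint optimization capability, we propose ViewActive, a modernized machine learning approach inspired by aspect graphs that provides viewpoint optimization guidance based solely on the current 2D image input. Specifically, we introduce the 3D Viewpoint Quality Field (VQF), a compact and consistent representation of the viewpoint quality distribution, similar to an aspect graph, composed of three general-purpose viewpoint quality metrics: self-occlusion ratio, occupancy-aware surface normal entropy, and visual entropy. We use pre-trained image encoders to extract robust visual and semantic features, then decode them into the 3D VQF, which lets our model generalize effectively across diverse objects, including unseen categories. The lightweight ViewActive network (72 FPS on a single GPU) significantly improves the performance of state-of-the-art object recognition pipelines and can be integrated into real-time motion planning for robotic applications. Our code and dataset are available here: https://github.com/jiayi-wu-umd/ViewActive.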

English Abstract

When observing objects, humans benefit from their spatial visualization and mental rotation abilities to envision potential optimal viewpoints based on the current observation. This capability is crucial for enabling robots to achieve efficient and robust scene perception during operation, as optimal viewpoints provide essential and informative features for accurately representing scenes in 2D images, thereby enhancing downstream tasks. To endow robots with this human-like active viewpoint optimization capability, we propose ViewActive, a modernized machine learning approach drawing inspiration from aspect graphs, which provides viewpoint optimization guidance based solely on the current 2D image input. Specifically, we introduce the 3D Viewpoint Quality Field (VQF), a compact and consistent representation of viewpoint quality distribution similar to an aspect graph, composed of three general-purpose viewpoint quality metrics: self-occlusion ratio, occupancy-aware surface normal entropy, and visual entropy. We utilize pre-trained image encoders to extract robust visual and semantic features, which are then decoded into the 3D VQF, allowing our model to generalize effectively across diverse objects, including unseen categories. The lightweight ViewActive network (72 FPS on a single GPU) significantly enhances the performance of state-of-the-art object recognition pipelines and can be integrated into real-time motion planning for robotic applications. Our code and dataset are available here: https://github.com/jiayi-wu-umd/ViewActive.
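
The abstract names visual entropy as one of the three viewpoint quality metrics but does not define it. As a rough illustration only, the sketch below assumes that visual entropy can be approximated by the Shannon entropy of a view's grayscale intensity histogram. The function names `visual_entropy` and `best_view` are hypothetical and are not taken from the ViewActive code base, and the sketch scores already-rendered candidate views rather than predicting a quality field from a single image as the paper does.

```python
import math
from collections import Counter


def visual_entropy(pixels):
    """Shannon entropy (in bits) of a grayscale intensity histogram.

    `pixels` is a flat iterable of integer intensities, e.g. 0..255.
    Assumption: this histogram entropy stands in for the paper's
    "visual entropy" metric, whose exact definition is not given here.
    """
    counts = Counter(pixels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


def best_view(views):
    """Return the index of the candidate view with the highest entropy.

    `views` is a list of flattened grayscale images (hypothetical input).
    """
    return max(range(len(views)), key=lambda i: visual_entropy(views[i]))


# Three toy "views": uniform, four distinct levels, two distinct levels.
candidates = [[0, 0, 0, 0], [0, 1, 2, 3], [0, 0, 1, 1]]
print(best_view(candidates))
```

Under this assumption, a view that shows more varied intensities scores higher. A fuller viewpoint score would also need the self-occlusion ratio and the occupancy-aware surface normal entropy, which require 3D geometry and are not sketched here.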

PDF Access

View the Chinese PDF - 2409.09997v5
