zikele

ViewActive: Active Viewpoint Optimization from a Single Image

2409.09997v5

English title#

ViewActive: Active viewpoint optimization from a single image

English abstract#

When observing objects, humans benefit from their spatial visualization and mental rotation abilities to envision potential optimal viewpoints based on the current observation. This capability is crucial for enabling robots to achieve efficient and robust scene perception during operation, as optimal viewpoints provide essential and informative features for accurately representing scenes in 2D images, thereby enhancing downstream tasks. To endow robots with this human-like active viewpoint optimization capability, we propose ViewActive, a modernized machine learning approach drawing inspiration from aspect graphs, which provides viewpoint optimization guidance based solely on the current 2D image input. Specifically, we introduce the 3D Viewpoint Quality Field (VQF), a compact and consistent representation of viewpoint quality distribution similar to an aspect graph, composed of three general-purpose viewpoint quality metrics: self-occlusion ratio, occupancy-aware surface normal entropy, and visual entropy. We utilize pre-trained image encoders to extract robust visual and semantic features, which are then decoded into the 3D VQF, allowing our model to generalize effectively across diverse objects, including unseen categories. The lightweight ViewActive network (72 FPS on a single GPU) significantly enhances the performance of state-of-the-art object recognition pipelines and can be integrated into real-time motion planning for robotic applications. Our code and dataset are available here: https://github.com/jiayi-wu-umd/ViewActive.
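
The abstract names visual entropy as one of the three viewpoint quality metrics but does not define it. As a rough illustration only, the sketch below assumes that visual entropy behaves like the Shannon entropy of a view's pixel-intensity distribution; that is an assumption made here, not the paper's stated definition. The function names `shannon_entropy` and `best_viewpoint` and the toy pixel data are hypothetical. Note also that this sketch scores views that are already available, whereas ViewActive predicts the quality field from a single image.

```python
import math
from collections import Counter
from typing import Mapping, Sequence


def shannon_entropy(values: Sequence[int]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    if not values:
        return 0.0
    total = len(values)
    counts = Counter(values)
    # H = sum over distinct values of p * log2(1 / p)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def best_viewpoint(views: Mapping[str, Sequence[int]]) -> str:
    """Return the name of the view with the highest intensity entropy.

    ASSUMPTION: entropy of pixel intensities stands in for a viewpoint
    quality score; the actual ViewActive metrics may differ.
    """
    return max(views, key=lambda name: shannon_entropy(views[name]))


# Hypothetical toy data: flattened grayscale pixels for three candidate views.
views = {
    "front": [0, 0, 0, 0, 255, 255, 255, 255],        # two intensity levels
    "side": [0, 0, 0, 0, 0, 0, 0, 0],                 # uniform image
    "oblique": [0, 32, 64, 96, 128, 160, 192, 255],   # eight distinct levels
}

for name, pixels in views.items():
    print(f"{name}: {shannon_entropy(pixels):.2f} bits")
print("highest-entropy view:", best_viewpoint(views))
```

Under this assumption, a view that reveals more varied appearance scores higher than a flat, uninformative one, which is the intuition behind ranking viewpoints by information content.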

Get the PDF#

View the Chinese PDF - 2409.09997v5
