"Does the cafe entrance look accessible? Where is the door?" Towards Geospatial AI Agents for Visual Inquiries

2508.15752v1

English Title

"Does the cafe entrance look accessible? Where is the door?" Towards Geospatial AI Agents for Visual Inquiries

English Abstract

Interactive digital maps have revolutionized how people travel and learn about the world; however, they rely on pre-existing structured data in GIS databases (e.g., road networks, POI indices), limiting their ability to address geo-visual questions related to what the world looks like. We introduce our vision for Geo-Visual Agents: multimodal AI agents capable of understanding and responding to nuanced visual-spatial inquiries about the world by analyzing large-scale repositories of geospatial images, including streetscapes (e.g., Google Street View), place-based photos (e.g., TripAdvisor, Yelp), and aerial imagery (e.g., satellite photos), combined with traditional GIS data sources. We define our vision, describe sensing and interaction approaches, provide three exemplars, and enumerate key challenges and opportunities for future work.
