DOI: 10.48550/arXiv.2402.02385 · Corpus ID: 267411728
@article{Xu2024ASO,
  title={A Survey on Robotics with Foundation Models: toward Embodied AI},
  author={Zhiyuan Xu and Kun Wu and Junjie Wen and Jinming Li and Ning Liu and Zhengping Che and Jian Tang},
  journal={ArXiv},
  year={2024},
  volume={abs/2402.02385},
  url={https://api.semanticscholar.org/CorpusID:267411728}
}
- Zhiyuan Xu, Kun Wu, Jian Tang
- Published in arXiv.org 4 February 2024
- Computer Science, Engineering
This survey provides a comprehensive and up-to-date overview of foundation models in robotics, focusing on autonomous manipulation, encompassing both high-level planning and low-level control, and showcasing commonly used datasets, simulators, and benchmarks.
63 References
- Yingdong Hu, Fanqi Lin, Tong Zhang, Li Yi, Yang Gao
- 2023
Computer Science, Engineering
ArXiv
This study introduces Robotic Vision-Language Planning (ViLa), a novel approach for long-horizon robotic planning that leverages vision-language models (VLMs) to generate sequences of actionable steps. Experiments demonstrate ViLa's superiority over existing LLM-based planners across a wide array of open-world manipulation tasks.
- Mingxiao Huo, Mingyu Ding, W. Zhan
- 2023
Computer Science, Engineering
ArXiv
The Task Fusion Decoder is introduced as a plug-and-play embedding translator that exploits the underlying relationships among perceptual skills to guide representation learning toward encoding structure meaningful to all of those skills, ultimately improving learning of downstream robotic manipulation tasks.
- Tianhe Yu, Ted Xiao, F. Xia
- 2023
Computer Science, Engineering
Robotics: Science and Systems
This work uses state-of-the-art text-to-image diffusion models to perform aggressive data augmentation on existing robotic manipulation datasets, inpainting various unseen manipulation objects, backgrounds, and distractors with text guidance. It shows that manipulation policies trained on data augmented this way can solve completely unseen tasks with new objects and behave more robustly with respect to novel distractors.
- 58 citations
- Anthony Brohan, Noah Brown, Brianna Zitkovich
- 2023
Computer Science, Engineering
Robotics: Science and Systems
This paper presents a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties, and verifies these conclusions in a study of different model classes and their ability to generalize as a function of data size, model size, and data diversity, based on large-scale data collection from real robots performing real-world tasks.
- 326 citations · Highly Influential
- Wenlong Huang, F. Xia, Brian Ichter
- 2022
Computer Science
CoRL
This work proposes that by leveraging environment feedback, LLMs are able to form an inner monologue that allows them to more richly process and plan in robotic control scenarios, and finds that closed-loop language feedback significantly improves high-level instruction completion on three domains.
- 400 citations
- Jensen Gao, Bidipta Sarkar, Dorsa Sadigh
- 2023
Engineering, Computer Science
ArXiv
It is demonstrated that fine-tuning a VLM on PhysObjects improves its understanding of physical object concepts, including generalization to held-out concepts, by capturing human priors of these concepts from visual appearance.
- 13 citations
- Wenhao Yu, Nimrod Gileadi, F. Xia
- 2023
Computer Science, Engineering
ArXiv
A new paradigm is introduced that harnesses the semantic richness of LLMs to define reward parameters that can be optimized to accomplish a variety of robotic tasks, effectively bridging the gap between high-level language instructions or corrections and low-level robot actions.
- 65 citations
- Wenlong Huang, F. Xia, Brian Ichter
- 2023
Computer Science, Engineering
ArXiv
This guided decoding strategy is able to solve complex, long-horizon embodiment tasks in a robotic setting by leveraging the knowledge of both models.
- 47 citations · Highly Influential
- Junjie Wen, Yichen Zhu, Jian Tang
- 2024
Computer Science, Engineering
ArXiv
This work utilizes a multi-modal large language model (MLLM) to weave knowledge of object locations into natural-language instructions, aiding the policy network in mastering actions for versatile manipulation, and presents a feature-reuse mechanism that integrates vision-language features from an off-the-shelf pre-trained MLLM into policy networks.
- Michael Ahn, Anthony Brohan, Mengyuan Yan
- 2022
Computer Science, Engineering
CoRL
This work proposes providing real-world grounding through pretrained skills, which constrain the model to propose natural-language actions that are both feasible and contextually appropriate. It shows how low-level skills can be combined with large language models so that the language model supplies high-level knowledge about procedures for performing complex, temporally extended instructions.
- 813 citations · Highly Influential
...