DOI: 10.48550/arXiv.2402.02385 · Corpus ID: 267411728
@article{Xu2024ASO,
  title={A Survey on Robotics with Foundation Models: toward Embodied AI},
  author={Zhiyuan Xu and Kun Wu and Junjie Wen and Jinming Li and Ning Liu and Zhengping Che and Jian Tang},
  journal={ArXiv},
  year={2024},
  volume={abs/2402.02385},
  url={https://api.semanticscholar.org/CorpusID:267411728}
}
- Zhiyuan Xu, Kun Wu, Jian Tang
- Published in arXiv.org 4 February 2024
- Computer Science, Engineering
This survey provides a comprehensive and up-to-date overview of foundation models in robotics, focusing on autonomous manipulation, encompassing both high-level planning and low-level control, and showcasing commonly used datasets, simulators, and benchmarks.
63 References
- Yingdong Hu, Fanqi Lin, Tong Zhang, Li Yi, Yang Gao
- 2023
Computer Science, Engineering
ArXiv
This study introduces Robotic Vision-Language Planning (ViLa), a novel approach for long-horizon robotic planning that leverages vision-language models (VLMs) to generate sequences of actionable steps. Experiments demonstrate ViLa's superiority over existing LLM-based planners across a wide array of open-world manipulation tasks.
- Mingxiao Huo, Mingyu Ding, W. Zhan
- 2023
Computer Science, Engineering
ArXiv
The Task Fusion Decoder is introduced as a plug-and-play embedding translator that exploits the underlying relationships among perceptual skills to guide representation learning toward encoding structure meaningful to all of those skills, ultimately improving learning of downstream robotic manipulation tasks.
- Tianhe Yu, Ted Xiao, F. Xia
- 2023
Computer Science, Engineering
Robotics: Science and Systems
This work uses state-of-the-art text-to-image diffusion models to perform aggressive data augmentation on existing robotic manipulation datasets, inpainting various unseen manipulation objects, backgrounds, and distractors with text guidance. It shows that manipulation policies trained on data augmented this way can solve completely unseen tasks with new objects and behave more robustly with respect to novel distractors.
- 58 citations
- Anthony Brohan, Noah Brown, Brianna Zitkovich
- 2023
Computer Science, Engineering
Robotics: Science and Systems
This paper presents a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties, and verifies these conclusions in a study of different model classes and their ability to generalize as a function of data size, model size, and data diversity, based on large-scale data collection from real robots performing real-world tasks.
- 326 citations · Highly Influential
- Wenlong Huang, F. Xia, Brian Ichter
- 2022
Computer Science
CoRL
This work proposes that by leveraging environment feedback, LLMs are able to form an inner monologue that allows them to more richly process and plan in robotic control scenarios, and finds that closed-loop language feedback significantly improves high-level instruction completion on three domains.
- 400 citations
- Jensen Gao, Bidipta Sarkar, Dorsa Sadigh
- 2023
Engineering, Computer Science
ArXiv
It is demonstrated that fine-tuning a VLM on PhysObjects improves its understanding of physical object concepts, including generalization to held-out concepts, by capturing human priors of these concepts from visual appearance.
- 13 citations
- Wenhao Yu, Nimrod Gileadi, F. Xia
- 2023
Computer Science, Engineering
ArXiv
A new paradigm is introduced that harnesses the semantic richness of LLMs to define reward parameters that can be optimized to accomplish a variety of robotic tasks, effectively bridging the gap between high-level language instructions or corrections and low-level robot actions.
- 65 citations
- Wenlong Huang, F. Xia, Brian Ichter
- 2023
Computer Science, Engineering
ArXiv
This guided decoding strategy is able to solve complex, long-horizon embodiment tasks in a robotic setting by leveraging the knowledge of both models.
- 47 citations · Highly Influential
- Junjie Wen, Yichen Zhu, Jian Tang
- 2024
Computer Science, Engineering
ArXiv
This work utilizes a multi-modal large language model (MLLM) to weave knowledge of object locations into natural-language instructions, aiding the policy network in mastering actions for versatile manipulation, and presents a feature-reuse mechanism that integrates vision-language features from an off-the-shelf pre-trained MLLM into policy networks.
- Michael Ahn, Anthony Brohan, Mengyuan Yan
- 2022
Computer Science, Engineering
CoRL
This work proposes providing real-world grounding through pretrained skills, which constrain the model to propose natural-language actions that are both feasible and contextually appropriate. It shows how low-level skills can be combined with large language models so that the language model supplies high-level knowledge about procedures for performing complex, temporally extended instructions.
- 813 citations · Highly Influential
...