A Survey on Robotics with Foundation Models: toward Embodied AI | Semantic Scholar (2024)

@article{Xu2024ASO,
  title   = {A Survey on Robotics with Foundation Models: toward Embodied AI},
  author  = {Zhiyuan Xu and Kun Wu and Junjie Wen and Jinming Li and Ning Liu and Zhengping Che and Jian Tang},
  journal = {ArXiv},
  year    = {2024},
  volume  = {abs/2402.02385},
  url     = {https://api.semanticscholar.org/CorpusID:267411728}
}
  • Zhiyuan Xu, Kun Wu, Jian Tang
  • Published in arXiv.org 4 February 2024
  • Computer Science, Engineering

This survey aims to provide a comprehensive and up-to-date overview of foundation models in robotics, focusing on autonomous manipulation and encompassing both high-level planning and low-level control, and it showcases commonly used datasets, simulators, and benchmarks.

63 References

Look Before You Leap: Unveiling the Power of GPT-4V in Robotic Vision-Language Planning
    Yingdong Hu, Fanqi Lin, Tong Zhang, Li Yi, Yang Gao

    Computer Science, Engineering

    ArXiv

  • 2023

This study introduces Robotic Vision-Language Planning (ViLa), a novel approach for long-horizon robotic planning that leverages vision-language models (VLMs) to generate a sequence of actionable steps, and demonstrates ViLa's superiority over existing LLM-based planners across a wide array of open-world manipulation tasks.
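
To make the closed-loop pattern described above concrete, here is a minimal sketch in Python: a VLM is queried with the current image and goal for the next actionable step, the step is executed, and planning repeats from the new observation. `query_vlm` and `execute_skill` are hypothetical stand-ins, not ViLa's actual interfaces.

```python
# Minimal sketch of the closed-loop VLM planning pattern (not ViLa's code).
# query_vlm and execute_skill are hypothetical stand-ins for a GPT-4V-style
# model and a low-level skill controller.

def query_vlm(image, goal, history):
    """Ask the VLM for the single next actionable step, or 'done'."""
    prompt = (
        f"Goal: {goal}\n"
        f"Steps completed so far: {history}\n"
        "What is the single next step? Reply 'done' if the goal is achieved."
    )
    raise NotImplementedError("send prompt + image to a vision-language model")

def execute_skill(step):
    """Hand the textual step to a low-level controller; return a new image."""
    raise NotImplementedError("call the robot's skill library here")

def vila_style_loop(goal, image, max_steps=20):
    history = []
    for _ in range(max_steps):
        step = query_vlm(image, goal, history)   # replan from the latest image
        if step.strip().lower() == "done":
            break
        image = execute_skill(step)              # act, then observe again
        history.append(step)
    return history
```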

Human-oriented Representation Learning for Robotic Manipulation
    Mingxiao Huo, Mingyu Ding, W. Zhan

    Computer Science, Engineering

    ArXiv

  • 2023

The Task Fusion Decoder is introduced as a plug-and-play embedding translator that exploits the underlying relationships among perceptual skills to guide representation learning toward encoding structure that matters for all of them, ultimately empowering learning of downstream robotic manipulation tasks.
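
A speculative sketch of what a plug-and-play embedding translator of this kind could look like: learned query tokens, one per perceptual skill, cross-attend over frozen backbone features and are fused into a single embedding for the policy. Layer sizes and the fusion rule are illustrative assumptions, not the paper's architecture.

```python
# Speculative sketch of a plug-and-play "embedding translator": skill-specific
# query tokens cross-attend over frozen backbone features and are fused back
# into one embedding for the policy head. All choices here are assumptions.
import torch
import torch.nn as nn

class EmbeddingTranslator(nn.Module):
    def __init__(self, feat_dim=768, num_skills=3):
        super().__init__()
        self.skill_queries = nn.Parameter(torch.randn(num_skills, feat_dim))
        self.cross_attn = nn.MultiheadAttention(feat_dim, num_heads=8, batch_first=True)
        self.fuse = nn.Linear(num_skills * feat_dim, feat_dim)

    def forward(self, backbone_tokens):
        # backbone_tokens: (B, N, feat_dim) from a frozen visual encoder
        B = backbone_tokens.shape[0]
        queries = self.skill_queries.unsqueeze(0).expand(B, -1, -1)
        skill_feats, _ = self.cross_attn(queries, backbone_tokens, backbone_tokens)
        return self.fuse(skill_feats.flatten(1))   # (B, feat_dim) for the policy

# out = EmbeddingTranslator()(torch.randn(2, 196, 768))   # -> shape (2, 768)
```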

Scaling Robot Learning with Semantically Imagined Experience
    Tianhe Yu, Ted Xiao, F. Xia

    Computer Science, Engineering

    Robotics: Science and Systems

  • 2023

This work uses state-of-the-art text-to-image diffusion models to perform aggressive data augmentation on top of existing robotic manipulation datasets, inpainting various unseen objects, backgrounds, and distractors with text guidance, and shows that manipulation policies trained on data augmented this way can solve completely unseen tasks with new objects and behave more robustly with respect to novel distractors.
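
The augmentation recipe can be sketched with an off-the-shelf inpainting pipeline: mask a region of a logged robot frame (an object, the background, or a distractor location) and inpaint it according to a text prompt. The model id, file paths, mask source, and resolution below are assumptions for illustration; the paper's actual pipeline and mask generation differ.

```python
# Sketch of text-guided inpainting augmentation over robot frames, using the
# open-source diffusers inpainting pipeline as a stand-in for the paper's
# text-to-image model. Model id and paths are illustrative assumptions.
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

def augment_frame(frame: Image.Image, mask: Image.Image, prompt: str) -> Image.Image:
    """Inpaint the masked region (e.g. the target object or background)
    according to a text prompt, leaving the robot arm untouched."""
    return pipe(prompt=prompt, image=frame, mask_image=mask).images[0]

# Hypothetical usage:
# frame = Image.open("episode_0/frame_042.png").convert("RGB").resize((512, 512))
# mask  = Image.open("episode_0/object_mask_042.png").convert("L").resize((512, 512))
# new_frame = augment_frame(frame, mask, "a blue ceramic mug on the table")
```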

RT-1: Robotics Transformer for Real-World Control at Scale
    Anthony Brohan, Noah Brown, Brianna Zitkovich

    Computer Science, Engineering

    Robotics: Science and Systems

  • 2023

This paper presents a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties, and verifies these conclusions in a study of different model classes and their ability to generalize as a function of data size, model size, and data diversity, based on large-scale data collection on real robots performing real-world tasks.

  • 326 citations
  • Highly Influential
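
A toy skeleton of the general recipe: image tokens plus a language embedding go through a Transformer that predicts one categorical distribution per action dimension over discretized bins. All dimensions and layer choices here are illustrative; RT-1 itself uses a FiLM-conditioned EfficientNet, TokenLearner, and 256 action bins.

```python
# Toy skeleton in the spirit of RT-1: encode visual tokens plus a language
# embedding with a Transformer and predict discretized action bins.
# Sizes and layers are illustrative, not RT-1's actual architecture.
import torch
import torch.nn as nn

class TinyRobotTransformer(nn.Module):
    def __init__(self, d_model=256, action_dims=11, bins=256):
        super().__init__()
        self.proj = nn.Linear(512, d_model)           # project visual tokens
        self.lang = nn.Linear(512, d_model)           # project language embedding
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, action_dims * bins)
        self.action_dims, self.bins = action_dims, bins

    def forward(self, visual_tokens, lang_embedding):
        # visual_tokens: (B, N, 512), lang_embedding: (B, 512)
        tokens = torch.cat(
            [self.lang(lang_embedding).unsqueeze(1), self.proj(visual_tokens)], dim=1
        )
        pooled = self.encoder(tokens).mean(dim=1)
        logits = self.head(pooled)                    # one softmax per action dim
        return logits.view(-1, self.action_dims, self.bins)

# logits = TinyRobotTransformer()(torch.randn(2, 48, 512), torch.randn(2, 512))
# logits.shape -> (2, 11, 256); argmax over the last axis gives binned actions.
```
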
Inner Monologue: Embodied Reasoning through Planning with Language Models
    Wenlong Huang, F. Xia, Brian Ichter

    Computer Science

    CoRL

  • 2022

This work proposes that by leveraging environment feedback, LLMs are able to form an inner monologue that allows them to more richly process and plan in robotic control scenarios, and finds that closed-loop language feedback significantly improves high-level instruction completion on three domains.
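
The closed-loop pattern can be sketched as a prompt that grows with textual feedback from the environment: after each action, success detection and a scene description are appended before the LLM is asked for the next step. All callables below are hypothetical stand-ins, not the paper's actual components.

```python
# Sketch of the closed-loop "inner monologue" pattern: environment feedback is
# folded back into the prompt before the LLM proposes the next action.
# llm, execute, detect_success, and describe_scene are hypothetical stand-ins.

def inner_monologue(instruction, llm, execute, detect_success, describe_scene,
                    max_steps=15):
    transcript = [f"Human: {instruction}"]
    for _ in range(max_steps):
        action = llm("\n".join(transcript) + "\nRobot:")
        if "done" in action.lower():
            break
        execute(action)
        # Close the loop: feed textual feedback back into the prompt.
        transcript.append(f"Robot: {action}")
        transcript.append(f"Success: {detect_success(action)}")
        transcript.append(f"Scene: {describe_scene()}")
    return transcript
```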

Physically Grounded Vision-Language Models for Robotic Manipulation
    Jensen Gao, Bidipta Sarkar, Dorsa Sadigh

    Engineering, Computer Science

    ArXiv

  • 2023

It is demonstrated that fine-tuning a VLM on PhysObjects improves its understanding of physical object concepts, including generalization to held-out concepts, by capturing human priors of these concepts from visual appearance.

Language to Rewards for Robotic Skill Synthesis
    Wenhao Yu, Nimrod Gileadi, F. Xia

    Computer Science, Engineering

    ArXiv

  • 2023

A new paradigm is introduced that harnesses the semantic richness of LLMs to define reward parameters that can be optimized to accomplish a variety of robotic tasks, effectively bridging the gap between high-level language instructions or corrections and low-level robot actions.
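
A rough sketch of the two stages: the LLM emits reward parameters, and a low-level optimizer searches for actions that maximize that reward. The reward terms and the random-search "optimizer" below are toy stand-ins introduced for illustration; the paper couples LLM-written rewards to a real model-predictive controller.

```python
# Sketch of the "language to rewards" idea: an LLM turns an instruction into
# reward parameters, and a low-level optimizer maximizes that reward. The
# reward terms, the 7-DoF action space, and simulate() are toy assumptions.
import numpy as np

def reward_from_params(state, params):
    """Weighted sum of simple reward terms; 'state' is a dict of features."""
    return (params["w_height"] * state["object_height"]
            - params["w_dist"] * state["gripper_object_dist"])

def optimize_action(params, simulate, n_samples=256):
    """Toy random search over actions against the LLM-specified reward."""
    actions = np.random.uniform(-1.0, 1.0, size=(n_samples, 7))   # 7-DoF deltas
    scores = [reward_from_params(simulate(a), params) for a in actions]
    return actions[int(np.argmax(scores))]

# An LLM prompted with "lift the apple" might emit parameters such as:
# params = {"w_height": 5.0, "w_dist": 1.0}
# best_action = optimize_action(params, simulate=my_simulator_step)
```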

Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control
    Wenlong Huang, F. Xia, Brian Ichter

    Computer Science, Engineering

    ArXiv

  • 2023

This guided decoding strategy is able to solve complex, long-horizon embodiment tasks in a robotic setting by leveraging the knowledge of both models.

  • 47 citations
  • Highly Influential
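
The decoding rule can be sketched as a token-level combination of two scores: the LLM's next-token log-probability and a grounded model's score (for example, an affordance or safety estimate). Both scoring functions below are hypothetical stand-ins; a practical implementation would restrict candidates to the LLM's top-k tokens rather than a full vocabulary.

```python
# Sketch of grounded decoding: each candidate token is scored by the LLM's
# log-probability plus the log of a grounded model's score.
# llm_logprob and grounded_score are hypothetical stand-ins.
import math

def grounded_decode(prefix, vocab, llm_logprob, grounded_score, max_tokens=20):
    tokens = list(prefix)
    for _ in range(max_tokens):
        best_tok, best_score = None, -math.inf
        for tok in vocab:
            # log p_LLM(tok | tokens) + log p_grounded(tok | tokens, observation)
            score = llm_logprob(tokens, tok) + math.log(
                max(grounded_score(tokens, tok), 1e-9)
            )
            if score > best_score:
                best_tok, best_score = tok, score
        tokens.append(best_tok)
        if best_tok == "<eos>":
            break
    return tokens
```
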
Object-Centric Instruction Augmentation for Robotic Manipulation
    Junjie Wen, Yichen Zhu, Jian Tang

    Computer Science, Engineering

    ArXiv

  • 2024

This work uses a multi-modal large language model (MLLM) to weave knowledge of object locations into natural-language instructions, aiding the policy network in mastering actions for versatile manipulation, and presents a feature-reuse mechanism that integrates vision-language features from an off-the-shelf pre-trained MLLM into the policy network.
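
A minimal sketch of the instruction-weaving step: detected object names and coarse locations (from a hypothetical detector or MLLM stand-in) are folded into the natural-language instruction before it is handed to the policy. The grid-cell phrasing is an assumption for illustration, not the paper's format.

```python
# Sketch of weaving object locations into the instruction before it reaches
# the policy network. The detections and the grid-cell phrasing come from a
# hypothetical detector/MLLM stand-in, not the paper's actual pipeline.

def augment_instruction(instruction, detections):
    """detections: list of (name, (row, col)) pairs on a coarse image grid."""
    clauses = [f"the {name} is near grid cell ({r}, {c})" for name, (r, c) in detections]
    return f"{instruction}. Note that {'; '.join(clauses)}."

# augment_instruction("pick up the mug", [("mug", (3, 5)), ("plate", (2, 1))])
# -> "pick up the mug. Note that the mug is near grid cell (3, 5);
#     the plate is near grid cell (2, 1)."
```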

Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
    Michael Ahn, Anthony Brohan, Mengyuan Yan

    Computer Science, Engineering

    CoRL

  • 2022

This work proposes to provide real-world grounding by means of pretrained skills, which are used to constrain the model to propose natural-language actions that are both feasible and contextually appropriate, and shows how low-level skills can be combined with large language models so that the language model supplies high-level knowledge about the procedures for performing complex, temporally extended instructions.

  • 813 citations
  • Highly Influential
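
The grounding rule can be sketched as a product of two scores per candidate skill: the LLM's estimate that the skill is a useful next step ("say") and a pretrained value function's estimate that the skill can succeed in the current state ("can"). `llm_score` and `affordance` are hypothetical stand-ins for the paper's language-model scoring and learned affordances.

```python
# Sketch of the "say" x "can" scoring rule: combine the LLM's usefulness score
# for each candidate skill with a value function's feasibility estimate, then
# execute the argmax. llm_score and affordance are hypothetical stand-ins.

def select_skill(instruction, history, skills, llm_score, affordance, state):
    best_skill, best = None, float("-inf")
    for skill in skills:
        usefulness = llm_score(instruction, history, skill)    # "say"
        feasibility = affordance(state, skill)                 # "can"
        combined = usefulness * feasibility
        if combined > best:
            best_skill, best = skill, combined
    return best_skill

# skills = ["pick up the sponge", "go to the counter", "put down the sponge"]
# next_skill = select_skill("clean the spill", [], skills, llm_score, affordance, state)
```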

...

    FAQs

    What is the Robotics and Embodied AI Lab (REAL)?

    The Robotics and Embodied AI Lab (REAL) is a research lab in DIRO at the Université de Montréal and is also affiliated with Mila.

    How is AI used in robotics?

    One of the key ways in which AI is used in robotics is through machine learning. This technique enables robots to learn and perform specific tasks by observing and mimicking human actions. AI also gives robots computer vision, which enables them to navigate, detect objects, and determine their reactions accordingly.

    How will AI and robotics affect education?

    Role of Robots in Education

    They can promote active engagement, problem-solving, and collaboration among students as active learning tools. By introducing robotics in the classroom, children can develop their critical thinking and creativity skills.

    What is the difference between AI, robotics, and machine learning?

    In summary, AI is the broad concept of creating intelligent machines, machine learning is a subset of AI that focuses on learning from data, and RPA is the automation of repetitive tasks using software robots.

    What is an example of embodied AI?

    Self-driving cars and drones leverage embodied AI to sense and navigate their environments safely. This technology is essential for making transportation more efficient and reducing accidents. Robots equipped with embodied AI capabilities enhance manufacturing processes by automating repetitive and intricate tasks.

    What is the most realistic AI robot in the world?

    Ameca is the world's most advanced human-shaped robot, representing the forefront of human-robotics technology.

    Will AI disrupt education?

    With the advent of generative AI also came the fear of the existential impact of artificial general intelligence (AGI). In the delivery of learning, we can expect some significant changes that will make our classes more personalized, more closely monitored and more likely to meet outcomes.

    How will robotics change the future?

    The future of robotics includes performing operations remotely and exploring extra-terrestrial worlds by 2030. Robotics, one of the fastest-growing tech fields, is shaping the future of travel, work, and adventure. Advances in AI, computing, and IoT are driving these developments.

    What is the future of robotics in education?

    The Future Vision of Robotics in Education

    Looking ahead, the future of robotics in education is likely to be characterized by even more advanced and intuitive technologies. The concept of robotic assistants in classrooms could become a reality, providing support to teachers and offering students personalized guidance.

    Which is better to study, AI or robotics?

    However, choosing between AI and robotics depends on the specific task at hand. Both fields have their unique strengths and weaknesses, and it's crucial to evaluate which one is best suited for the task. For example, if the task involves analyzing large amounts of data, AI may be the best option.

    What is AI in simple words?

    Artificial intelligence is the science of making machines that can think like humans. It can do things that are considered "smart." AI technology can process large amounts of data in ways humans cannot. The goal for AI is to be able to do things such as recognize patterns, make decisions, and judge like humans.

    Is it better to learn AI or machine learning?

    If you're passionate about robotics or computer vision, for example, it might serve you better to jump into artificial intelligence. However, if you're exploring data science as a general career, machine learning offers a more focused learning track.

    Are self-replicating robots real?

    On November 29, 2021, a team at Harvard's Wyss Institute announced the first living robots that can reproduce.

    What is the purpose of a robotics lab?

    Robotics labs provide an ideal environment for students to apply science, technology, engineering, and mathematics (STEM) concepts in a practical and meaningful way. Through building and programming robots, students can directly observe how these concepts come to life.

    What is embodied intelligence in robotics?

    Embodied Intelligence refers to “a computational approach to the design and understanding of intelligent behaviour in embodied and situated agents through the consideration of the strict coupling between the agent and its environment (situatedness), mediated by the constraints of the agent's own body, perceptual and ...

    Is robotics the study of robots (true or false)?

    Robotics is an interdisciplinary sector of science and engineering dedicated to the design, construction and use of mechanical robots.
