Here we present a number of real-world examples, filmed in an apartment never seen by the robot during training. All results are collected using policies produced by FLaRe. In each video, we show the robot's RGB camera input as well as a third-person perspective. All videos are sped up by up to 40x for ease of viewing.
For tasks that are in the training data of the base model that FLaRe fine-tunes, FLaRe can effectively align the policies toward task completion, thereby achieving state-of-the-art performance. Below, we show examples of tasks from the CHORES benchmark.
The FLaRe policy finds a bed after exploring an unseen apartment with many rooms.
FLaRe successfully identifies and picks up a mug.
FLaRe identifies and navigates to an apple, correctly positions its base so that the apple is reachable, and successfully picks the apple up.
FLaRe explores the house extremely efficiently without requiring a map and successfully visits all 6 rooms, even though it has never seen this layout during training.
Besides achieving SoTA performance on the aforementioned tasks, FLaRe can also be used to efficiently adapt to new capabilities that are not present in the training data of the base model. These capabilities can be new tasks, new embodiments, new reasoning skills, or new objectives. Below, we show examples of FLaRe adapting to tasks and embodiments never seen during training.
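For readers curious about what this kind of adaptation looks like mechanically, the following is a minimal, hypothetical sketch only: it fine-tunes a pretrained policy on a new task reward with plain on-policy REINFORCE on a toy environment. Every name in it (TinyPolicy, dummy_env_step, finetune) is an illustrative stand-in, not part of the FLaRe implementation, which performs large-scale RL fine-tuning in simulation.

```python
"""Hypothetical sketch: adapting a pretrained policy to a new task with
on-policy policy-gradient updates (REINFORCE, for brevity). All names are
illustrative stand-ins, not part of the FLaRe codebase."""
import torch
import torch.nn as nn


class TinyPolicy(nn.Module):
    """Stand-in for a pretrained base policy loaded from a checkpoint."""

    def __init__(self, obs_dim: int = 8, n_actions: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, n_actions)
        )

    def forward(self, obs: torch.Tensor) -> torch.distributions.Categorical:
        return torch.distributions.Categorical(logits=self.net(obs))


def dummy_env_step(obs, action):
    """Toy stand-in for the new-task simulator: random next observation,
    reward of 1 for (arbitrarily) taking action 0, never terminates early."""
    return torch.randn(8), float(action == 0), False


def finetune(policy, env_step, iters=100, horizon=32, lr=3e-5):
    # A small learning rate adapts the base policy without destroying it.
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(iters):
        obs, log_probs, rewards = torch.zeros(8), [], []
        for _ in range(horizon):  # collect one on-policy episode
            dist = policy(obs)
            action = dist.sample()
            log_probs.append(dist.log_prob(action))
            obs, reward, done = env_step(obs, action)
            rewards.append(reward)
            if done:
                break
        # Returns-to-go: reverse cumulative sum of the episode rewards.
        returns = torch.flip(
            torch.cumsum(torch.flip(torch.tensor(rewards), [0]), 0), [0]
        )
        loss = -(torch.stack(log_probs) * returns).mean()  # REINFORCE loss
        opt.zero_grad()
        loss.backward()
        opt.step()


finetune(TinyPolicy(), dummy_env_step)
```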
FLaRe can repurpose the base model for new embodiments. Here, we show the FLaRe policy controlling a LoCoBot (as opposed to a Stretch) to follow an instruction from the ObjectNav task.
The RoomNav task requires the robot to understand concepts related to room types, rather than just focusing on the objects. Here, the robot successfully identifies the living room as instructed.
The ObjNavAfford task requires a high-level understanding of the affordances of different objects. In this video, the FLaRe policy correctly identifies the TV as a target object and navigates to it after a long episode of exploring the environment.
The ObjNavRelAttr task requires the robot not just to find a specific type of object, but also to reason about the object's attributes relative to other objects. Here, the FLaRe policy controls the robot to find the chair in the kitchen, ignoring other chairs along the way.
To better understand the decision-making process of the FLaRe agent, we show multiple examples of FLaRe's behavior both in simulation and in the real world.
FLaRe learns closed-loop behaviors. When the first grasp fails, the robot repositions itself and tries again, successfully grasping the spray bottle.
Even though the FLaRe agent does not have access to depth, it learns a pickup policy that is robust to different surface heights in the real world.
We found the FLaRe policy to be quite robust across distinct real-world environments, thanks to the large number of houses it has seen during training in simulation. In this video, we show the robot conducting room exploration in RoboThor.
Another affordance task completed in the real world: the robot correctly identifies the right produce (i.e., an apple) and calls the DONE action.
The Transformer architecture enables long-horizon memory and reasoning. In this clip, we show the robot completing the room-visit task in an extremely challenging house layout with 7 rooms, demonstrating exceptional memory capacity and spatial reasoning.
One typical failure mode in the real world is grasp failure. In the future, we plan to address this by 1) equipping the robot with depth sensing capabilities, and 2) improving the physical realism of our simulator.
Another failure mode is object mismatch between simulation and the real world. For example, we found that the FLaRe policy had trouble recognizing the vases in the real world (as shown in the clip). Upon investigation, we found that a primary cause of this issue is that the vases in simulation look quite different from the real-world vases we tested with. We expect further diversification of objects in simulation to alleviate this issue.
@article{hu2024flare,
  title={FLaRe: Achieving Masterful and Adaptive Robot Policies with Large-Scale Reinforcement Learning Fine-Tuning},
  author={Jiaheng Hu and Rose Hendrix and Ali Farhadi and Aniruddha Kembhavi and Roberto Martin-Martin and Peter Stone and Kuo-Hao Zeng and Kiana Ehsani},
  journal={arXiv},
  year={2024},
  eprint={2409.16578},
}