FLaRe: Achieving Masterful and Adaptive Robot Policies
with Large-Scale Reinforcement Learning Fine-Tuning

1Allen Institute for AI, 2University of Texas at Austin, 3University of Washington, 4Sony AI


FLaRe fine-tunes transformer-based behavior cloning (BC) policies, pre-trained on large robotics datasets, with large-scale reinforcement learning, achieving SoTA results across a set of long-horizon mobile manipulation tasks in both simulation and the real world. Furthermore, FLaRe shows exceptional adaptation capabilities, efficiently mastering new tasks, new embodiments, and new objectives.
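For intuition, here is a minimal sketch of the kind of fine-tuning loop this implies: an actor initialized from the pre-trained BC checkpoint is updated with an on-policy, PPO-style clipped objective. This is an illustration under our own assumptions (the distribution-returning policy API and the name ppo_finetune_step are hypothetical), not the authors' exact implementation.

import torch

def ppo_finetune_step(policy, optimizer, obs, actions, old_logp,
                      advantages, clip_eps=0.2):
    # One gradient step on a batch of on-policy rollout data.
    # `policy(obs)` is assumed to return a torch.distributions
    # action distribution; this API is hypothetical.
    dist = policy(obs)
    logp = dist.log_prob(actions)
    ratio = torch.exp(logp - old_logp)  # importance ratio vs. rollout snapshot
    # PPO's clipped surrogate keeps each update close to the
    # (BC-initialized) behavior policy, which stabilizes fine-tuning.
    surrogate = torch.min(
        ratio * advantages,
        torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages)
    loss = -surrogate.mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

The key design choice is the initialization: starting from the BC checkpoint and taking conservative on-policy steps lets large-scale RL refine the pre-trained behaviors rather than overwrite them.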




Real-world Qualitative Results

Here we present a number of real-world examples filmed in an apartment never seen by the robot during training. All results are collected using policies produced by FLaRe. In each video, we show the robot's RGB camera inputs as well as a third-person perspective. All videos are sped up by up to 40x for ease of viewing.


FLaRe for seen capabilities

For tasks that appear in the training data of the base model that FLaRe fine-tunes, FLaRe effectively aligns the policies toward task completion, achieving state-of-the-art performance. Below, we show examples of tasks from the CHORES benchmark.


Find a Bed (from ObjectNav Task)

The FLaRe policy finds a bed after exploring an unseen apartment with many rooms.

Grasp a Mug (from Pickup Task)

FLaRe successfully identifies and picks up a mug.

Find and Hold an Apple (from Fetch Task)

FLaRe identifies and navigates to an apple, correctly positions its base so that the apple is reachable, and successfully picks the apple up.

Visit all 6 Rooms in the House (from RoomVisit Task)

FLaRe explores the house efficiently without requiring a map and successfully visits all 6 rooms, even though it has never seen this layout during training.

FLaRe for unseen capabilities

Besides achieving SoTA performance on the aforementioned tasks, FLaRe can also be used to efficiently adapt to new capabilities that are not present in the training data of the base model. These capabilities can be new tasks, new embodiments, new reasoning skills, or new objectives. Below, we show examples of FLaRe adapting to tasks and embodiments never seen during training.

Adapt to New Embodiment (LocoBot)

FLaRe can repurpose the base model for new embodiments. Here, we show the FLaRe policy controlling a LoCoBot (as opposed to a Stretch) to follow an instruction from the ObjectNav Task.

Find the Living Room (RoomNav)

The RoomNav task requires the robot to understand concepts related to room types, rather than just focusing on objects. Here, the robot successfully identifies the living room as instructed.

Navigate to Equipment for Watching Movies and Shows (ObjNavAfford)

The ObjNavAfford task requires a high-level understanding of the affordances of different objects. In this video, the FLaRe policy correctly identifies the TV as the target object and navigates to it after a long episode of exploring the environment.

Locate the Chair Closest to the Refrigerator in the Kitchen (ObjNavRelAttr)

The ObjNavRelAttr task requires the robot not just to find a specific type of object, but also to reason about the object's attributes relative to other objects. Here, the FLaRe policy controls the robot to find the chair in the kitchen while ignoring other chairs along the way.

Behavior Analysis in Simulation and the Real World

To better understand the decision-making process of the FLaRe agent, we show multiple examples of FLaRe's behavior, both in simulation and in the real world.

Grasp Retry in Fetch Task (Take a Spray Bottle)

FLaRe learns closed-loop behaviors. When the first grasp fails, the robot repositions itself and tries again, successfully grasping the spray bottle.

Grasp the Houseplant / Apple

Even though the FLaRe agent does not have access to depth, it learns a pickup policy that is robust to different surface heights in the real world.

RoomVisit in RoboThor (Distinct Scene)

We found the FLaRe policy to be quite robust across distinct real-world environments, thanks to the large number of houses it saw during training in simulation. In this video, we show the robot conducting room exploration in RoboThor.

Locate a Produce that is best for Eating Fresh as Snacks (ObjNavAfford)

Another affordance task completed in the real world: the robot correctly identifies the right produce (i.e., an apple) and calls the DONE action.

Long-horizon Reasoning and Memory (RoomVisit)

The Transformer architecture enables long-horizon memory and reasoning. In this clip, we show the robot completing the RoomVisit task in an extremely challenging house layout with 7 rooms, demonstrating its exceptional memory capacity and spatial reasoning abilities.
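As a rough sketch of why a transformer helps here: each step's observation is appended to a token history, and causal self-attention over that history lets the policy recall rooms it has already visited. The model below (the HistoryPolicy name, dimensions, and layer counts) is illustrative only, not FLaRe's actual architecture.

import torch
import torch.nn as nn

class HistoryPolicy(nn.Module):
    # Toy causal-transformer policy over a growing observation history.
    def __init__(self, obs_dim=128, d_model=256, n_actions=10):
        super().__init__()
        self.embed = nn.Linear(obs_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_actions)

    def forward(self, history):  # history: (batch, T, obs_dim)
        T = history.shape[1]
        # Causal mask: step t may only attend to steps <= t.
        mask = nn.Transformer.generate_square_subsequent_mask(T)
        h = self.encoder(self.embed(history), mask=mask)
        return self.head(h[:, -1])  # action logits for the current step

Because the full history stays attendable, an observation from hundreds of steps ago (e.g., a doorway already entered) can still influence the current action, which is exactly what map-free exploration of a 7-room layout demands.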

Failure Mode Analysis

One typical failure mode in the real world is grasp failure. In the future, we plan to address this by 1) equipping the robot with depth sensing capabilities, and 2) improving the physical realism of our simulator.

Another failure mode is object mismatch between simulation and the real world. For example, we found that the FLaRe policy had trouble recognizing vases in the real world (as shown in the clip). Upon investigation, we realized that a primary cause is that the vases in simulation look quite different from the real-world vases we tested with. We expect further diversification of objects in simulation to alleviate this issue.


BibTeX

@article{hu2024flare,
        title={FLaRe: Achieving Masterful and Adaptive Robot Policies with Large-Scale Reinforcement Learning Fine-Tuning},
        author={Jiaheng Hu and Rose Hendrix and Ali Farhadi and Aniruddha Kembhavi and Roberto Martin-Martin and Peter Stone and Kuo-Hao Zeng and Kiana Ehsani},
        journal={arXiv},
        year={2024},
        eprint={2409.16578},
}