Microsoft, University of Washington and NVIDIA unveil OmniReset for robotic learning
A new reinforcement learning approach developed by Microsoft Research, the University of Washington, and NVIDIA targets large-scale training of robotic systems without manual reward design.
Robotic arms trained using reinforcement learning perform assembly tasks in simulation as part of the OmniReset system developed by Microsoft Research, University of Washington, and NVIDIA.
Microsoft Research, the University of Washington, and NVIDIA have developed OmniReset, a reinforcement learning method designed to train robotic systems at scale without relying on manual reward engineering.
The work, shared by Microsoft Research Principal Researcher Andrey Kolobov via LinkedIn, focuses on improving how robots learn complex physical tasks and could influence the development of physically embodied AI systems used in industry and education.
The approach centers on enabling reinforcement learning in simulation to produce policies that can transfer to real-world robots, with a focus on assembly and manipulation tasks.
Removing reward design from reinforcement learning
In the LinkedIn post, Kolobov describes OmniReset as a method that uses “resets to enable RL in sim to learn policies for robotics tasks far beyond pick-and-place, with a level of robustness unprecedented for robotic manipulation,” positioning it as an attempt to extend reinforcement learning beyond narrow use cases.
Kolobov explains that one of the core limitations in robotics has been the need for highly specific reward functions, noting that RL “required painstaking and bespoke reward function design for every new task,” particularly in assembly scenarios, where this process becomes difficult to scale.
OmniReset addresses this by introducing what Kolobov calls a repeatable “recipe” that “sidesteps reward function design and instead relies on state resets to overcome RL's perennial exploration challenges,” allowing the same approach to be applied across a broader set of tasks.
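The OmniReset paper itself is not yet public, so the mechanics above are known only from Kolobov's description. As a rough, hypothetical illustration of the general idea — replacing a hand-shaped reward with a sparse success signal plus a reverse curriculum of state resets that start episodes near the goal — here is a toy sketch. The chain environment, hyperparameters, and reset schedule are illustrative assumptions, not details of OmniReset:

```python
import random

# Toy stand-in for an assembly task: states 0..GOAL on a chain; actions move -1/+1.
# The reward is sparse (1 only at GOAL) -- no shaped reward function is designed.
GOAL = 10
ACTIONS = (-1, +1)

def step(state, action):
    """Environment transition with a sparse success signal."""
    nxt = max(0, min(GOAL, state + action))
    return nxt, float(nxt == GOAL)

def train(episodes=3000, alpha=0.5, gamma=0.95, eps=0.2):
    """Tabular Q-learning with reset-aided exploration (illustrative only)."""
    q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
    for ep in range(episodes):
        # Reset-aided exploration: early episodes are reset to states near the
        # goal, later ones farther away (a reverse curriculum of state resets),
        # so the sparse reward is reachable without reward shaping.
        frontier = max(0, GOAL - 1 - ep * GOAL // episodes)
        state = random.randint(frontier, GOAL - 1)
        for _ in range(4 * GOAL):
            if random.random() < eps:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: q[(state, x)])
            nxt, r = step(state, a)
            best_next = max(q[(nxt, b)] for b in ACTIONS)
            q[(state, a)] += alpha * (r + gamma * best_next - q[(state, a)])
            state = nxt
            if r:
                break
    return q

def rollout(q, start=0):
    """Greedily follow the learned policy; return True if the goal is reached."""
    state, steps = start, 0
    while state != GOAL and steps < 4 * GOAL:
        state, _ = step(state, max(ACTIONS, key=lambda a: q[(state, a)]))
        steps += 1
    return state == GOAL
```

The key point of the sketch is that exploration difficulty is handled by where episodes *begin*, not by what the reward *says*: value estimates propagate backward from goal-adjacent resets, so a bare success signal suffices where a sparse-reward agent starting from scratch would rarely stumble onto the goal.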
Focus on simulation-to-real-world transfer
The system is designed with deployment in mind as well, with Kolobov stating that “OmniReset is also about transferring these policies to physical robots,” underscoring the project's focus on sim-to-real transfer.
Kolobov argues that reinforcement learning offers efficiency advantages over imitation-based approaches, explaining that “RL-learned policies can operate a robot's embodiment much more efficiently than those learned by imitating a person.”
According to Kolobov, this translates into operational gains, with “higher task execution robustness, faster task execution, and higher throughput,” which remain key factors in industrial robotics environments.
Implications for physical AI and skills development
Beyond individual tasks, the work also targets the challenge of scaling data for training physical AI systems. Kolobov states that OmniReset is “about unlocking large-scale data generation for robotic assembly tasks,” an area where data scarcity has limited progress in building generalist models.
Kolobov adds that combining approaches may be necessary in practice, noting that “combining simulation data generated using reset-aided RL with physical demonstrations where they are viable is a promising path to economically valuable physical AI.”
The project involves contributors from Microsoft Research, the University of Washington, and NVIDIA, and is scheduled for presentation at ICLR 2026.