Building and using appropriate benchmarks is a key driver of progress in RL algorithms. For value-based deep RL algorithms, there is the Arcade Learning Environment. For continuous control, there is MuJoCo. For multi-agent RL, there is the StarCraft Multi-Agent Challenge. As part of the move toward more general agents, benchmarks have emerged that exhibit more open-ended dynamics, such as procedural world generation, skill acquisition and reuse, long-term dependencies, and continual learning. This led to the creation of environments such as MiniHack, Crafter, MALMO, and the NetHack Learning Environment.
Unfortunately, these environments are slow to run, which makes them impractical for current methods unless large compute resources are available. At the same time, JAX has surged in popularity in the RL community as the speed of end-to-end compiled RL pipelines becomes fully apparent. Experiments that used to take days on large compute clusters can now be completed in minutes on a single GPU thanks to effective parallelization, compilation, and the elimination of CPU-GPU transfers.
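To make the idea of an end-to-end compiled pipeline concrete, here is a minimal sketch in JAX. It does not use Craftax or any real benchmark: the grid world, its reward, and the constants NUM_ENVS, NUM_STEPS, and GRID_SIZE are all hypothetical and chosen only for illustration. It shows the three ingredients mentioned above: jax.vmap for parallelization across environments, jax.jit for compilation, and jax.lax.scan to keep the whole rollout loop on the accelerator.

```python
import jax
import jax.numpy as jnp

NUM_ENVS = 1024   # hypothetical number of parallel environments
NUM_STEPS = 128   # hypothetical rollout length
GRID_SIZE = 8     # hypothetical toy grid, unrelated to Craftax's real map

# Action index -> (dx, dy) move: right, left, down, up.
MOVES = jnp.array([[1, 0], [-1, 0], [0, 1], [0, -1]], dtype=jnp.int32)


def env_step(pos, action):
    """Toy grid world: move, clip to the grid, reward 1 at the far corner."""
    new_pos = jnp.clip(pos + MOVES[action], 0, GRID_SIZE - 1)
    reward = jnp.all(new_pos == GRID_SIZE - 1).astype(jnp.float32)
    return new_pos, reward


def rollout(key):
    """Run one environment for NUM_STEPS with a uniformly random policy."""
    def body(pos, step_key):
        action = jax.random.randint(step_key, (), 0, MOVES.shape[0])
        new_pos, reward = env_step(pos, action)
        return new_pos, reward

    start = jnp.zeros(2, dtype=jnp.int32)
    step_keys = jax.random.split(key, NUM_STEPS)
    final_pos, rewards = jax.lax.scan(body, start, step_keys)
    return final_pos, rewards.sum()


# vmap batches the rollout over environments; jit compiles the whole loop,
# so no per-step data has to travel back to the Python host.
batched_rollout = jax.jit(jax.vmap(rollout))

env_keys = jax.random.split(jax.random.PRNGKey(0), NUM_ENVS)
final_positions, returns = batched_rollout(env_keys)
```

In a real pipeline the random policy would be replaced by a neural network and the learning update would be compiled together with the rollout, but the structure stays the same: environment and agent both live inside one compiled function.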
To bring these two lines of work together, recent research from the University of Oxford and University College London introduces the Craftax benchmark, a JAX-based environment that runs significantly faster than comparable environments while exhibiting complex, open-ended dynamics. One specific example is Craftax-Classic, a JAX reimplementation of Crafter that runs 250 times faster than the original Python version.
With easy access to far more time steps, the researchers show that a basic PPO agent can solve Craftax-Classic (reaching 90% of maximum return) in 51 minutes. Because Craftax-Classic is within reach of a basic agent, they also present Craftax, a much more difficult environment that borrows mechanics from NetHack and, more generally, the Roguelike genre. The main Craftax environment is designed to pose a substantially harder challenge while keeping a fast runtime, and it introduces a variety of new game mechanics. Using pixels merely adds another layer of representation learning to the problem, and many of the properties Crafter tests for (navigation, memory) are independent of the exact form of the observation. Therefore, they provide variants of Craftax with symbolic and pixel-based observations. The symbolic variant is about 10 times faster.
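The difference between symbolic and pixel-based observations can be illustrated with a small JAX sketch. This is not Craftax's actual observation format: the number of block types, the tile size, the view dimensions, and the flat-colour textures below are all hypothetical. The point is only that the same underlying grid of block IDs can be exposed either as a compact one-hot vector or as a much larger rendered image.

```python
import jax
import jax.numpy as jnp

NUM_BLOCK_TYPES = 4   # hypothetical number of block types
TILE = 8              # hypothetical tile size in pixels
VIEW = (7, 9)         # hypothetical local view: rows x cols of blocks

# Placeholder textures: each block type is a flat-colour TILE x TILE tile.
COLOURS = jnp.array([
    [0.1, 0.6, 0.1],
    [0.5, 0.5, 0.5],
    [0.1, 0.3, 0.8],
    [0.4, 0.25, 0.1],
])
TEXTURES = jnp.broadcast_to(
    COLOURS[:, None, None, :], (NUM_BLOCK_TYPES, TILE, TILE, 3)
)


@jax.jit
def symbolic_obs(block_ids):
    """Flat one-hot encoding: one small vector entry per (cell, block type)."""
    return jax.nn.one_hot(block_ids, NUM_BLOCK_TYPES).reshape(-1)


@jax.jit
def pixel_obs(block_ids):
    """Render the grid by looking up a texture tile for every cell."""
    rows, cols = block_ids.shape
    tiles = TEXTURES[block_ids]                 # (rows, cols, TILE, TILE, 3)
    tiles = tiles.transpose(0, 2, 1, 3, 4)      # (rows, TILE, cols, TILE, 3)
    return tiles.reshape(rows * TILE, cols * TILE, 3)


block_ids = jax.random.randint(
    jax.random.PRNGKey(0), VIEW, 0, NUM_BLOCK_TYPES
)
sym = symbolic_obs(block_ids)   # shape (252,)
pix = pixel_obs(block_ids)      # shape (56, 72, 3)
```

With these illustrative sizes the symbolic observation has 252 entries while the pixel observation has 12,096, and an agent consuming pixels must additionally learn to recover the block types from the image.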
The researchers' experiments show that currently available approaches perform poorly on Craftax. The team therefore hopes it will pose a significant challenge for future RL research while still allowing experiments with limited computational resources.
The team also hopes Craftax-Classic will serve as a seamless introduction to Craftax for those already familiar with the Crafter benchmark.
Check out the Paper, GitHub, and Project. All credit for this research goes to the researchers of this project.
Dhanshree Shenwai is a computer science engineer with a keen interest in AI applications and good experience in FinTech companies covering finance, cards and payments, and banking sectors. She is passionate about exploring new technologies and advancements in today’s evolving world that make life easier for everyone.