This section describes the basic procedure for making a submission with a model trained in simulation using reinforcement learning with PyTorch. It can be used as a starting point for any of the LF, LFV, and LFVI challenges.
Check out the DtRewardWrapper and modify the rewards (set them higher or lower and see what happens).
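For example, a reward wrapper in the same spirit might look like this minimal sketch (the scaling factors are arbitrary values to experiment with, not the ones the baseline ships with):

```python
import gym


class MyRewardWrapper(gym.RewardWrapper):
    """Rescales the simulator's reward; both factors are hypothetical
    starting points to tune."""

    def reward(self, reward):
        if reward < 0:
            return reward * 10.0  # punish crashes/infractions harder (guess)
        return reward * 0.5       # dampen positive rewards (guess)


# Usage: env = MyRewardWrapper(env)
```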
Try making the observation image grayscale instead of color. While you’re at it, try stacking multiple frames, e.g., four monochrome images instead of one color image.
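A minimal sketch of both ideas, assuming observations are (H, W, 3) uint8 arrays and using the old gym step/reset API that gym-duckietown follows (the stack size is illustrative):

```python
from collections import deque

import gym
import numpy as np


class GrayscaleWrapper(gym.ObservationWrapper):
    """Converts an RGB observation (H, W, 3) into a single-channel image."""

    def __init__(self, env):
        super().__init__(env)
        h, w, _ = env.observation_space.shape
        self.observation_space = gym.spaces.Box(0, 255, (h, w, 1), dtype=np.uint8)

    def observation(self, obs):
        # Naive luminance: average the three color channels.
        return obs.mean(axis=2, keepdims=True).astype(np.uint8)


class FrameStackWrapper(gym.Wrapper):
    """Stacks the last k frames along the channel axis."""

    def __init__(self, env, k=4):
        super().__init__(env)
        self.k = k
        self.frames = deque(maxlen=k)
        h, w, c = env.observation_space.shape
        self.observation_space = gym.spaces.Box(0, 255, (h, w, c * k), dtype=np.uint8)

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        for _ in range(self.k):
            self.frames.append(obs)
        return np.concatenate(self.frames, axis=2)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.frames.append(obs)
        return np.concatenate(self.frames, axis=2), reward, done, info


# Usage: env = FrameStackWrapper(GrayscaleWrapper(env), k=4)
```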
You can also try increasing the contrast of the input to make the difference between the road and road signs clearer. You can do so by adding another observation wrapper.
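A sketch of such a wrapper, using a simple linear contrast stretch around the image mean (the gain value is an arbitrary assumption to tune):

```python
import gym
import numpy as np


class ContrastWrapper(gym.ObservationWrapper):
    """Linearly stretches pixel values around the per-image mean."""

    def __init__(self, env, gain=1.5):  # hypothetical gain; tune it
        super().__init__(env)
        self.gain = gain

    def observation(self, obs):
        mean = obs.mean()
        stretched = (obs.astype(np.float32) - mean) * self.gain + mean
        return np.clip(stretched, 0, 255).astype(np.uint8)
```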
Cut off the horizon from the image (and correspondingly change the convnet parameters).
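A cropping wrapper might look like this sketch (the fraction of the image to discard is a guess; remember that your network’s expected input size changes with it):

```python
import gym
import numpy as np


class CropHorizonWrapper(gym.ObservationWrapper):
    """Drops the top rows of the image, which mostly show sky/horizon."""

    def __init__(self, env, crop_fraction=0.33):  # hypothetical fraction
        super().__init__(env)
        h, w, c = env.observation_space.shape
        self.top = int(h * crop_fraction)
        self.observation_space = gym.spaces.Box(
            0, 255, (h - self.top, w, c), dtype=np.uint8
        )

    def observation(self, obs):
        return obs[self.top:, :, :]
```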
Check out the default hyperparameters in duckietown_rl/args.py and fiddle around with them to see if other values work better. For example, increase expl_noise or start_timesteps to get better exploration.
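For instance (the parameter names come from the baseline’s args.py, but the values below are guesses, not recommendations):

```python
# More action noise and a longer purely-random warm-up phase,
# both of which encourage exploration. Values are illustrative.
args.expl_noise = 0.2
args.start_timesteps = 10000
```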
(more sophisticated) Use a different map in the simulator, or, even better, use randomized maps.
(more sophisticated) Use a different or bigger convnet for your actor/critic, and add better weight initialization.
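As an illustration, here is a sketch of a slightly deeper actor network with orthogonal initialization in PyTorch; the architecture and layer sizes are arbitrary choices, not the baseline’s:

```python
import torch.nn as nn


def init_weights(m):
    """Orthogonal initialization, a common choice for RL networks."""
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.orthogonal_(m.weight, gain=nn.init.calculate_gain("relu"))
        nn.init.zeros_(m.bias)


class BiggerActor(nn.Module):
    def __init__(self, in_channels, action_dim, max_action=1.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),
        )
        self.max_action = max_action
        self.apply(init_weights)

    def forward(self, x):
        # Scale uint8 pixels to [0, 1] before the convnet.
        return self.max_action * self.net(x / 255.0)
```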
(super sophisticated) Use the ground truth from the simulator to construct a better reward.
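A sketch of what this could look like, assuming access to the simulator’s lane-position helper (gym-duckietown has exposed a helper along the lines of get_lane_pos2; check the version you use, and treat the weights below as assumptions to tune):

```python
import gym


class GroundTruthRewardWrapper(gym.Wrapper):
    """Shapes the reward with the simulator's own lane position."""

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        sim = self.env.unwrapped
        try:
            lp = sim.get_lane_pos2(sim.cur_pos, sim.cur_angle)
            # Reward driving along the lane (dot_dir) and staying
            # close to its center (dist); weights are hypothetical.
            reward += 1.0 * lp.dot_dir - 10.0 * abs(lp.dist)
        except Exception:
            pass  # not in a lane; keep the original reward
        return obs, reward, done, info
```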
(crazy sophisticated) Use an entirely different training algorithm, like PPO, A2C, or DQN. Go nuts. But this might take significant time, even if you’re familiar with these methods.
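If you do, one low-effort way to experiment (a sketch, not part of this baseline; details depend on your gym and Stable-Baselines3 versions) is to hand your wrapped environment to an off-the-shelf implementation:

```python
from stable_baselines3 import PPO

# env is your wrapped Duckietown environment from above.
model = PPO("CnnPolicy", env, verbose=1)
model.learn(total_timesteps=1_000_000)
model.save("ppo_duckietown")
```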
Doing great on the simulated challenges, but not on the real evaluation? Or doing great in training, but not on our simulated, held-out environments? Take a look at env.py. You’ll notice that we launch the Simulator class from gym-duckietown. If you take a look at its constructor, you’ll notice that we aren’t using all of the parameters listed. In particular, the three you should focus on are:
map_name: What map to use; hint: take a look at gym_duckietown/maps for more choices
domain_rand: Applies domain randomization, a popular black-box sim2real technique
randomized_maps_on_reset: Slows down training, but increases the variety of training data.
Mixing and matching different values for these will help you increase your training diversity and thereby improve your evaluation robustness!
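For example, a sketch of launching the simulator with these options (the map name is just one of the choices under gym_duckietown/maps, and the exact constructor signature may differ between versions):

```python
from gym_duckietown.simulator import Simulator

env = Simulator(
    map_name="zigzag_dists",        # pick any map from gym_duckietown/maps
    domain_rand=True,               # randomize textures, lighting, etc.
    randomized_maps_on_reset=True,  # sample a new map on every reset
)
```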
If you’re interested in more advanced techniques, such as learning a representation that is easier for your network to work with, or one that transfers better across the simulation-to-reality gap, there are alternative methods you may want to try out. In addition, don’t forget the logs infrastructure, which you can also use for things like imitation learning!