Exploration is risky Reinforcement learning agents need to explore their environments in order to learn optimal behaviors.
We think constrained RL may turn out to be more useful than normal RL for ensuring that agents satisfy safety requirements.
These results suggest that larger data sets encourage more generic winning tickets than smaller data sets.
Generalizing to other domains and learning methods: RL and NLP So far, the lottery ticket phenomenon has only been tested in the context of supervised learning for vision-centric classification...
In this paper, we propose a Transferable Dialogue State Generator (TRADE) that generates dialogue states from utterances using a copy mechanism, facilitating knowledge transfer when predicting (domain, slot, value) triplets not encountered during training.