This research project investigated how people perceive the behaviours of reinforcement learning agents through sound.
It served as a preliminary step towards designing an expressive workflow that allows humans to teach reinforcement learning agents specific behaviours based on subjective preferences. This workflow was eventually implemented and evaluated within the Co-Explorer.
We ran a controlled experiment in which participants guided three types of agents in two sound synthesis environments using positive or negative feedback. One agent was always exploring, one was always exploiting, and one balanced exploration with exploitation. One sound environment had linear timbral attributes, and the other had a timbral obstruction.
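To make the three agent types concrete, here is a minimal sketch of how they could differ. It assumes an epsilon-greedy rule over a small set of discrete actions, with feedback encoded as +1 or -1. The summary above does not state the algorithm actually used in the experiment, so the class name, the epsilon values, the learning rate, and the stateless action set are all hypothetical, and the two sound environments are not modelled.

```python
import random


class FeedbackAgent:
    """Hypothetical tabular agent that learns action values from +1/-1 feedback."""

    def __init__(self, n_actions, epsilon, learning_rate=0.5, seed=0):
        self.epsilon = epsilon              # probability of taking a random action
        self.learning_rate = learning_rate  # assumed value, not from the study
        self.values = [0.0] * n_actions     # estimated value of each action
        self.rng = random.Random(seed)

    def act(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def learn(self, action, feedback):
        # Move the estimate for this action towards the feedback signal.
        self.values[action] += self.learning_rate * (feedback - self.values[action])


def make_agents(n_actions):
    # Assumed mapping of the three agent types onto epsilon values.
    return {
        "exploring": FeedbackAgent(n_actions, epsilon=1.0),
        "exploiting": FeedbackAgent(n_actions, epsilon=0.0),
        "balanced": FeedbackAgent(n_actions, epsilon=0.5),
    }
```

In this sketch a participant's judgement would be passed to `learn(action, +1)` or `learn(action, -1)` after each action. With `epsilon=1.0` the agent still records feedback but never acts on it, which is one way to read "always exploring".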
Participants successfully interacted with all three agents to reach a sonic goal. Subjective evaluations showed that the agents’ exploration behaviour, rather than their learning to reach a sonic goal, was critical to whether participants heard them as collaborative.