Exploration epsilon: 0.15 Item density: 300 Learning rate alpha: 0.005
The items in this world begin as 'flipme' items. The agents come in two types: apple-loving and poison-loving. Once an agent touches one of these items, the item becomes the opposite of what the agent likes. My hope is that two different agents will learn an optimal strategy of 'hanging out' together so that they can flip items for each other's mutual benefit. Yay cooperation!
Note. To test whether or not cooperation is happening, click 'Load Pretrained Agent' (or train your own), reduce the epsilon and alpha parameters to 0, and reduce the number of items to 0 or only a small number (such as 10 or 20). You will see the agents interact with one another, but their strategy will perhaps surprise you.