Multi-Agent Hide and Seek

The self-supervised emergent complexity in this simple environment further suggests that multi-agent co-adaptation may one day produce extremely complex and intelligent behavior.
bremnetsays...

Another entrant in the incredibly long line of adaptation / adaptive learning / intelligent systems / artificial intelligence demonstrations that aren't. The agents act based on a set of rules / freedoms / constraints prescribed by a human. The agents "learn" based on the objective functions defined by the human. And it takes an enormous number of iterations to get there (how many times did the narrator say "millions" in the video?). Sure, it is a good demonstration of how adaptive learning works, but the hype-fog is getting a bit thick and sickening, folks. This is a very complex optimization problem being solved with impressive, current technology, but it is certainly not behavioural intelligence.

L0ckysays...

This isn't really true, though, and it greatly understates how amazing this demo, and current AI in general, actually are.

Saying the agents obey a set of human-defined rules / freedoms / constraints and objective functions would lead one to imagine something more like video game AI.

Typically, video game AI works on a set of weighted decisions and actions, where the weights, decisions, and actions are all defined by the developer; a more complex variation of:

if my health is low, move towards the health pack,
otherwise, move towards the opponent
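
In code, that hand-authored style might look something like this (a hypothetical Python sketch; the threshold and the function name are made up for illustration):

import math

# A hand-coded rule of the kind described above: every threshold and
# branch is authored by a developer, nothing is learned.
def choose_move(health, my_pos, health_pack_pos, opponent_pos):
    target = health_pack_pos if health < 30 else opponent_pos
    dx, dy = target[0] - my_pos[0], target[1] - my_pos[1]
    dist = math.hypot(dx, dy) or 1.0
    return (dx / dist, dy / dist)  # unit step towards the chosen target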

In this demo, no such rules exist. The agent is given no weights (health), no rules (if health is low), and no instructions (move towards the health pack). You could apply neural networks to traditional game AI to learn the decision weights (which are typically hard-coded by the developer), but that would be far less interesting than what's actually happening here.

Instead, the agent is given a set of inputs, a set of available outputs, and a goal.

4 Inputs:
- Position of the agent itself
- Position and type (other agent, box, ramp) of objects within a limited, forward-facing conical view
- Position (but not type) of objects within a small radius around the agent
- Reward: Whether they are doing a good job or not

Note the agent is given no information about each type of object, or what they mean, or how they behave. You may as well call them A, B, C rather than agent, box, ramp.
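
To make that concrete, the per-step observation might be structured roughly like this (a sketch only; the field names are my own, not OpenAI's):

from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class SeenObject:
    position: Tuple[float, float]
    kind: Optional[str]  # effectively 'A'/'B'/'C' to the agent; None when only position is sensed

@dataclass
class Observation:
    own_position: Tuple[float, float]
    in_view_cone: List[SeenObject]  # position and type, forward-facing cone only
    nearby: List[SeenObject]        # position but not type, small radius
    reward: float                   # the only feedback: am I doing a good job?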

3 Outputs:
- Move
- Grab
- Lock

Again, the agent knows nothing about what these mean, only that it can enable and disable each of them at any time. A good analogy is someone handing you a controller for a game you've never played: the controller has a stick and two buttons, and you figure out what they do by using them. It'd be just as accurate to call the outputs stick, A, and B rather than move, grab, and lock.
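
In the same spirit, the whole action space is tiny; something like this (again a hypothetical sketch, using the controller naming from the analogy above):

from dataclasses import dataclass
from typing import Tuple

@dataclass
class Action:
    stick: Tuple[float, float]  # 'move': the direction the agent pushes itself
    a: bool                     # 'grab': hold the object in front of you
    b: bool                     # 'lock': freeze an object so others can't move it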

Goal:
- Do a good job.

The goal is simply for the reward input to be maximised. A good analogy is saying 'good girl' or giving a treat to a dog that you are training when it does the right thing. It's up to the dog to figure out what it's doing that's good.

The reward is entirely separate from the agent, and agent behaviour can be completely changed just by changing when the reward is given. This demo is about hide and seek, where hiders are rewarded for staying unseen, seekers are rewarded for seeing their opponents, and both are penalised for leaving the play area. The same agents also succeeded at other games, where the only difference, from the agent's perspective, was when the reward was given.
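
A sketch of what that reward signal could look like for hide and seek (the exact numbers here are assumptions; the OpenAI post describes team-based rewards plus a penalty for leaving the arena):

# Hypothetical per-step rewards; changing only these two functions would
# train the very same agents to play a completely different game.
def hider_reward(any_hider_seen: bool, left_arena: bool) -> float:
    r = -1.0 if any_hider_seen else 1.0
    return r - (10.0 if left_arena else 0.0)

def seeker_reward(any_hider_seen: bool, left_arena: bool) -> float:
    r = 1.0 if any_hider_seen else -1.0
    return r - (10.0 if left_arena else 0.0)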

It isn't really different from physically building the same play space, dropping some rats in it, and rewarding them with cheese when they are hidden from their opponents - except rats are unlikely to figure out how to maximise their reward in such a 'complex' game.

Given this description of how the AI actually works, the fact that the agents came up with complex strategies like blocking doors, ramp surfing, and taking the ramp away to stop their opponents from ramp surfing, plus general cooperation with other agents, all without any code describing any of those things, is pretty amazing.
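
For intuition, the outer loop of this kind of self-play training looks roughly like the following. This is a structural sketch only: env and the policy objects are hypothetical stand-ins, and the real system used large-scale reinforcement learning across many parallel games.

def train(env, policies, num_games=85_000_000):
    for _ in range(num_games):
        obs = env.reset()
        trajectories = {agent: [] for agent in policies}
        done = False
        while not done:
            actions = {agent: policies[agent].act(obs[agent]) for agent in policies}
            next_obs, rewards, done = env.step(actions)
            for agent in policies:
                trajectories[agent].append((obs[agent], actions[agent], rewards[agent]))
            obs = next_obs
        # Each policy is nudged to make high-reward behaviour more likely
        # next time; nobody ever writes "block the door" or "surf the ramp".
        for agent in policies:
            policies[agent].update(trajectories[agent])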

You can find out more about how the agents were trained, and other exercises they performed here:

https://openai.com/blog/emergent-tool-use/

bremnetsays...

Thanks for the link and the education, truly appreciated. I'm still stuck on "there has to be more to it" ... but I guess after 85 million games, the outcome is bound to be a winner. Same philosophy I have for the Leafs winning the Cup.

jmdsays...

Am I the only one who, after hearing "Then the seekers learned to climb on top of boxes and surf them around to overcome barriers," immediately thought, "That's not smart AI, that's just shitty coding"?
