search results matching tag: OpenAI


Artificial Intelligence: Last Week Tonight with John Oliver

Multi-Agent Hide and Seek

L0cky says...

This isn't really true, though, and it greatly understates how amazing this demo, and current AI in general, actually are.

Saying the agents are obeying a set of human-defined rules / freedoms / constraints and objective functions would lead one to imagine something more like video game AI.

Typically video game AI works on a set of weighted decisions and actions, where the weights, decisions and actions are defined by the developer; a more complex variation of:

if my health is low, move towards the health pack,
otherwise, move towards the opponent
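That hand-coded style of decision rule can be sketched in a few lines. This is a hypothetical illustration (the names `choose_action` and `HEALTH_THRESHOLD` are mine, not from any real game), just to make concrete what "developer-defined weights and actions" means:

```python
HEALTH_THRESHOLD = 30  # a "weight" chosen by the developer, not learned

def choose_action(health, health_pack_pos, opponent_pos):
    """Hard-coded decision rule: the designer, not the agent, decides
    what 'low health' means and what to do about it."""
    if health < HEALTH_THRESHOLD:
        return ("move_towards", health_pack_pos)
    return ("move_towards", opponent_pos)
```

Everything interesting here (the threshold, the two possible actions, when each applies) was written down in advance by a human, which is exactly what the hide-and-seek agents do *not* have.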

In this demo, no such rules exist. The agent is given no weights (health), no rules (if health is low), and no instructions (move towards the health pack). I guess you could apply neural networks to traditional game AI to determine the decision weights (which are typically hard-coded by the developer), but that would be far less interesting than what's actually happening here.

Instead, the agent is given a set of inputs, a set of available outputs, and a goal.

4 Inputs:
- Position of the agent itself
- Position and type (other agent, box, ramp) of objects within a limited forward facing conical view
- Position (but not type) of objects within a small radius around the agent
- Reward: Whether they are doing a good job or not

Note the agent is given no information about each type of object, or what they mean, or how they behave. You may as well call them A, B, C rather than agent, box, ramp.
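The four inputs above could be sketched as a simple observation record. This is a hypothetical mock-up (the class `Observation` and helper `in_view_cone` are illustrative names, not OpenAI's actual interface), showing the shape of what the agent sees, including the geometry of the forward-facing cone:

```python
import math
from dataclasses import dataclass, field

@dataclass
class Observation:
    own_position: tuple            # input 1: the agent's own position
    cone_objects: list = field(default_factory=list)    # input 2: (position, type) pairs in the view cone
    nearby_objects: list = field(default_factory=list)  # input 3: positions only, within a small radius
    reward: float = 0.0            # input 4: scalar "am I doing a good job?" signal

def in_view_cone(agent_pos, facing, obj_pos, half_angle=math.pi / 4):
    """True if obj_pos lies inside the agent's forward-facing cone."""
    dx, dy = obj_pos[0] - agent_pos[0], obj_pos[1] - agent_pos[1]
    angle_to_obj = math.atan2(dy, dx)
    # wrap the angular difference into [-pi, pi] before comparing
    diff = (angle_to_obj - facing + math.pi) % (2 * math.pi) - math.pi
    return abs(diff) <= half_angle
```

Note that the object "type" field is just a label with no attached meaning, which is the point: the agent must learn what a "ramp" is for by experience.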

3 Outputs:
- Move
- Grab
- Lock

Again, the agent knows nothing about what these mean, only that it can enable and disable each of them at any time. A good analogy is someone handing you a controller for a game you've never played: the controller has a stick and two buttons, and you figure out what they do by using them. It'd be just as accurate to call the outputs stick, A and B rather than move, grab and lock.
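The unlabeled-controller idea can be captured as a bare action record. Again a hypothetical sketch (the `Action` class is mine, not the demo's API): three outputs the agent can set, with no semantics attached to any of them.

```python
from dataclasses import dataclass

@dataclass
class Action:
    move: tuple = (0.0, 0.0)   # the "stick": a direction, meaningless until tried
    grab: bool = False         # "button A": can be toggled on or off at any time
    lock: bool = False         # "button B": likewise

# the agent just emits values; it has to discover what they do
a = Action(move=(1.0, 0.0), grab=True)
```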

Goal:
- Do a good job.

The goal is simply for the reward input to be maximised. A good analogy is saying 'good girl' or giving a treat to a dog you are training when it does the right thing. It's up to the dog to figure out what it's doing that's good.

The reward is entirely separate from the agent, and agent behaviour can be completely changed just by changing when the reward is given. The demo is about hide and seek, where the agents are rewarded for not being seen / seeing their opponent (and not leaving the play area). The agents also succeeded at other games, where the only difference to the agent was when the reward was given.
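That separation can be made concrete with a reward function that lives entirely outside the agent. This is a simplified hypothetical version (the function name and the exact values are illustrative; the real setup is described in the OpenAI post linked below): swap this one function out and the same agent code learns a different game.

```python
def hide_and_seek_reward(is_hider, any_hider_seen, in_play_area):
    """Reward signal for one timestep, from the environment's point of view.
    The agent never sees this logic, only the number it returns."""
    if not in_play_area:
        return -10.0                           # penalty for leaving the play area
    if is_hider:
        return -1.0 if any_hider_seen else 1.0  # hiders want to stay unseen
    return 1.0 if any_hider_seen else -1.0      # seekers get the mirror image
```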

It isn't really different from physically building the same play space, dropping some rats in it, and rewarding them with cheese when they are hidden from their opponents - except rats are unlikely to figure out how to maximise their reward in such a 'complex' game.

Given this description of how the AI actually works, the fact that the agents came up with complex strategies like blocking doors, ramp surfing, taking the ramp to stop their opponents from ramp surfing, and the general cooperation with other agents - all without any code describing any of those things - is pretty amazing.
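The whole learning setup can be caricatured as pure trial and error. This toy sketch is *not* OpenAI's actual training method (they use large-scale reinforcement learning; this is a simple epsilon-greedy bandit with names of my choosing), but it shows the essential point: an agent with unlabeled actions, given only a reward signal, converges on the "good" behaviour without ever being told which action is which.

```python
import random

def train(reward_of, n_actions=3, episodes=2000, eps=0.1, seed=0):
    """Learn which unlabeled action pays off, purely from reward."""
    rng = random.Random(seed)
    value = [0.0] * n_actions   # estimated reward per action, initially unknown
    count = [0] * n_actions
    for _ in range(episodes):
        if rng.random() < eps:                  # occasionally explore at random
            act = rng.randrange(n_actions)
        else:                                   # otherwise exploit the best guess
            act = max(range(n_actions), key=lambda i: value[i])
        r = reward_of(act)
        count[act] += 1
        value[act] += (r - value[act]) / count[act]   # running average
    return max(range(n_actions), key=lambda i: value[i])

# action 2 secretly pays off; the agent is never told this
best = train(lambda act: 1.0 if act == 2 else 0.0)
```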

You can find out more about how the agents were trained, and other exercises they performed here:

https://openai.com/blog/emergent-tool-use/

bremnet said:

Another entrant in the incredibly long line of adaptation / adaptive learning / intelligent systems / artificial intelligence demonstrations that aren't. The agents act based on a set of rules / freedoms / constraints prescribed by a human. The agents "learn" based on the objective functions defined by the human, given enough iterations (how many times did the narrator say "millions" in the video?). Sure, it is a good demonstration of how adaptive learning works, but the hype-fog is getting a bit thick and sickening, folks. This is a very complex optimization problem being solved with impressive and current technologies, but it is certainly not behavioural intelligence.

Elon Musk's 'Dota 2' Experiment is Disrupting Esports

Jinx says...

1v1 mid to 5v5 is like checkers to chess, so I'd be interested to see if it can also master all of that.

It's curious how aggressively it plays. When you watch AlphaGo, I think one of its defining features was that it didn't look for huge leads, because the margin of victory is meaningless. OpenAI seems totally different: it practices a sort of brinkmanship that is almost human. The games I've seen it lose were because of moves that actually did seem very human - it went all-in up high ground and got unlucky, for instance. I'd have somewhat expected it to evaluate doing that as an unnecessary risk when it can just extend its advantage in other ways... but then maybe it is overestimating the ability of its human opponents to keep up with last hits and raze placement.
