Although the Turing Test (Turing, 1950) is known as a measure of human intelligence, it is no longer regarded as comprehensive. Instead it is seen as insufficient to cover all aspects of what we now consider human intelligence to be. Games, however, are not required to actually be intelligent; they only need to present the illusion of intelligence to the player(s). For this reason an AI test in a game setting should not look for evidence of intelligence, but for overall gameplay improvement. A Turing-style test focusing on believability rather than intelligence could therefore be applicable in the game industry, provided it formed part of a more complete test.
Games are currently known for their ‘bad’ AI: a patrolling guard dies loudly without attracting the attention of the guard a few feet away, or a persistent opponent attacks your base in the same place every time despite being defeated on every attempt. These unrealistic situations can pull the player out of the game, and implementing behaviourism could help. Using neural networks to train, or ‘condition’, agents may be one solution: AI opponents could then learn from their mistakes and, for example, attack bases from other angles to avoid troop losses. Behaviourism rests on the idea that only observable behaviour should be analysed, rather than the internal thought process behind it. This suits the game industry because all the player ever sees is the behaviour; we do not necessarily need to know why the AI reacted a certain way, only that it did. Behaviourism is already being used to train conversational AI aimed at beating the Turing Test and winning the Loebner Prize, with performance measured by lexical semantic analysis, so as academic research progresses, game AI may progress with it. Although this technique is not yet widely used in the game industry, it is becoming more common.
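To make the idea of ‘conditioning’ concrete, here is a minimal sketch, assuming a strategy-game opponent choosing which direction to attack from. It uses a simple value-update rule rather than a neural network, but it captures the behaviourist point made above: the agent adjusts purely on the basis of observed outcomes, with no model of the player’s internal reasoning. The attack angles, reward values and exploration rate are illustrative assumptions, not drawn from any of the cited work.

```python
import random

# Sketch of "conditioning" an RTS opponent purely from observed outcomes:
# the agent never models why an attack failed, only how well each choice
# has worked so far. All constants here are invented for illustration.

ATTACK_ANGLES = ["north", "south", "east", "west"]

class ConditionedAttacker:
    def __init__(self, epsilon=0.1, learning_rate=0.2):
        self.epsilon = epsilon              # chance of trying something new
        self.learning_rate = learning_rate  # how quickly outcomes reshape behaviour
        self.value = {angle: 0.0 for angle in ATTACK_ANGLES}

    def choose_angle(self):
        # Mostly exploit what has worked; occasionally explore.
        if random.random() < self.epsilon:
            return random.choice(ATTACK_ANGLES)
        return max(self.value, key=self.value.get)

    def observe_outcome(self, angle, reward):
        # Reward could be damage dealt minus troops lost; only the
        # observable result is used, never the player's "reasoning".
        self.value[angle] += self.learning_rate * (reward - self.value[angle])

# Example: repeated failed attacks from the north push the agent to try,
# and eventually prefer, other approaches.
agent = ConditionedAttacker()
for _ in range(50):
    angle = agent.choose_angle()
    reward = -1.0 if angle == "north" else random.uniform(0.0, 1.0)
    agent.observe_outcome(angle, reward)
print(agent.value)
```

In practice a neural network or a richer learning rule could replace the simple table of values, but the measure of success would remain the observable behaviour: does the opponent stop throwing troops at the same defended wall?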
In theory any game can be subjected to this kind of test, because every game is a form of imitation game. For example, can a player tell the difference between an AI and a human when running around a virtual world? Can a player tell whether the opponent in a strategy game is human or AI? But it is not as simple as that. The problem with testing for believability is that it is subjective; people’s perceptions can differ with their cultural background, as Mac Namee (2004) discovered in the following scenario:
‘In one
test, subjects are shown two versions of a virtual bar populated by AI agents.
In one version the behaviour of the agents is controlled in a deliberate manner
because they try to satisfy long-term goals – buy and drink beer, talk to
friends, or go to the toilet. In the second version the agents randomly pick
new short-term goals every time they complete an action. One Italian subject
felt that having agents return to sit at the same table time after time was
unrealistic, whereas the other (Irish) subjects mostly felt that this behaviour
was more believable. Sadly, further tests to determine whether this – or other
– results are indeed culturally significant have yet to be carried out; the
possibility that such differences exist does appear quite likely, however.’
Mac Namee also points out that players new to a game can experience its AI differently from veterans, simply through lack of exposure to the game environment. Veterans know exactly what they are looking for and have more attention to spare for the AI, because other game-related skills have become second nature.
It is also possible for a player to be concentrating so hard on the game that they do not notice the AI at all, or, in some genres, to get too little interaction with the AI to form a firm judgement. One fix is to make the AI more obvious: Butcher and Griesemer (2002) found that players need exaggerated visual cues even to notice the AI. But surely if the AI is exaggerated, and for example over-emotive, it becomes less realistic and less human? On the other hand, it is possible to overdo things in the opposite direction and sway the judge by exaggerating a mistake that humans commonly make. This is exactly what McGlinchey and Livingstone (2004) found: when their Pong-playing AI accidentally moved the bat in a slow, jerky way, players often mistook it for a human. Some suggest using an observer as the judge to alleviate these issues, but it should be remembered that the aim is to please the player, not the people watching.
Laird and van Lent (1999) ran a test in which observers judged players of mixed skill playing against the Soar Quakebot, and found that the opponent was considered ‘more human’ if it had a slower reaction speed, some inaccuracy when shooting, and tactical reasoning (although none of the observers thought the AI was actually human). The results led them to devise three simple principles (a rough sketch of the first two follows the list):
· AI should have human-like reaction speeds, not super-fast ones
· AI should not have ‘superhuman’ abilities such as overly accurate aiming
· AI should have some kind of strategic reasoning, so it does not just react to each situation
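As a hedged illustration of the first two principles, the following sketch delays a shooter bot’s reaction to a newly spotted enemy and adds noise to its aim. The specific delay range and error spread are assumptions made up for the example; choosing values that read as human, rather than as superhuman or incompetent, is precisely what a believability test would have to judge.

```python
import random

# Sketch: a bot that cannot react or aim with superhuman precision.
# The delay range and aim error below are illustrative assumptions.

HUMAN_REACTION_SECONDS = (0.2, 0.35)   # assumed plausible human reaction window
AIM_ERROR_STD_DEGREES = 3.0            # assumed human-like aiming "shakiness"

class BelievableShooter:
    def __init__(self):
        self.pending_target = None
        self.react_at = None

    def on_enemy_spotted(self, target_angle_deg, now_seconds):
        # Do not respond instantly; queue the reaction for a short,
        # human-like delay after the enemy is first seen.
        self.pending_target = target_angle_deg
        self.react_at = now_seconds + random.uniform(*HUMAN_REACTION_SECONDS)

    def update(self, now_seconds):
        # Once the reaction delay has elapsed, return the angle to shoot
        # at, with Gaussian aiming error added; otherwise return None.
        if self.pending_target is None or now_seconds < self.react_at:
            return None
        aim = self.pending_target + random.gauss(0.0, AIM_ERROR_STD_DEGREES)
        self.pending_target = None
        return aim

# Example: the bot spots an enemy at 90 degrees but only fires a
# slightly-off shot a fraction of a second later.
bot = BelievableShooter()
bot.on_enemy_spotted(target_angle_deg=90.0, now_seconds=0.0)
for t in (0.1, 0.2, 0.3, 0.4):
    shot = bot.update(t)
    if shot is not None:
        print(f"fired at {shot:.1f} degrees at t={t}s")
        break
```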
One of the obvious flaws of the original Turing Test is that it gives a very black-or-white result: the AI either passes or it does not. A creative industry needs more flexibility than that, and there have been attempts to score the AI on a scale across different features. Using the following parameters, formulated by Daniel Livingstone from an analysis of common ‘bad AI’ situations in games, it could be possible to devise a series of believability tests covering different aspects (a small sketch of how such a scorecard might work follows the list):
· Plan
  o demonstrate some degree of strategic/tactical planning
  o be able to coordinate actions with the player/other AI
  o not repeatedly attempt a previous, failed plan or action
· Act
  o act with human-like reaction times and abilities
· React
  o react to the player’s presence and actions appropriately
  o react to changes in the local environment
  o react to the presence of foes and allies
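As a sketch of how these parameters might yield the more flexible, graded result argued for above, the following example scores an agent against each criterion on a simple scale and reports per-category and overall averages. The 1-5 scale, the exact wording of the criterion strings, and the equal weighting are my own assumptions for illustration only.

```python
# Sketch of turning the Plan/Act/React criteria into a graded scorecard
# rather than a pass/fail verdict. Scale and weighting are assumptions.

CRITERIA = {
    "Plan": [
        "shows strategic/tactical planning",
        "coordinates actions with player/other AI",
        "does not repeat failed plans",
    ],
    "Act": [
        "human-like reaction times and abilities",
    ],
    "React": [
        "responds to player presence and actions",
        "responds to changes in local environment",
        "responds to foes and allies",
    ],
}

def score_agent(ratings):
    """ratings maps each criterion string to a 1-5 judge rating;
    returns per-category averages plus an overall average."""
    report = {}
    for category, items in CRITERIA.items():
        values = [ratings[item] for item in items]
        report[category] = sum(values) / len(values)
    report["overall"] = sum(report[cat] for cat in CRITERIA) / len(CRITERIA)
    return report

# Example: an agent that acts and reacts convincingly but never plans.
example_ratings = {
    "shows strategic/tactical planning": 1,
    "coordinates actions with player/other AI": 2,
    "does not repeat failed plans": 1,
    "human-like reaction times and abilities": 4,
    "responds to player presence and actions": 4,
    "responds to changes in local environment": 3,
    "responds to foes and allies": 4,
}
print(score_agent(example_ratings))
```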
It seems to me that a Turing-style test has many flaws, yet many researchers advocate it so strongly that they appear simply desperate for it to work. Would it be fair to say that the only reason this kind of test is so mainstream is that there is no alternative to replace it? Perhaps researchers should stop looking for a modification that works and start again in the search for a genuinely comprehensive AI test. In conclusion, I feel that a Turing-style test is inappropriate for use in a game context, even as part of a more complete test, because of its problematic testing parameters and the ambiguity of its results.
BUTCHER, C. AND GRIESEMER, J. 2002. The illusion of intelligence: The integration of AI and level design in Halo. Presented at the Game Developers Conference (San Jose, CA, March 21-23, 2002).
LAIRD, J. E. AND VAN LENT, M. 1999. Developing an artificial intelligence engine. In Proceedings of the Game Developers Conference (San Jose, CA, Nov. 3-5, 1999).
MAC NAMEE, B. 2004. Proactive persistent agents: Using situational intelligence to create support characters in character-centric computer games. Ph.D. dissertation, Dept. of Computer Science, University of Dublin, Dublin, Ireland.
MCGLINCHEY, S. AND LIVINGSTONE, D. 2004. What believability testing can tell us. In CGAIDE, Proceedings of the Conference on Game AI, Design and Education (Microsoft Campus, Reading, UK).
TURING, A. M. 1950. Computing machinery and intelligence.
Mind LIX, 236 (1950), 433-460.
WETZEL, B. 2004. Step one: Document the problem. In
Proceedings of the AAAI Workshop on Challenges in Game AI (San Jose, CA, July
25-26, 2004).