Although the Turing Test (Turing, 1950) is known as a measure of human intelligence, it is no longer regarded as comprehensive. Instead it is seen as insufficient to cover all aspects of what we now consider human intelligence to be. Games, however, are not required to actually be intelligent; they only need to present the illusion of intelligence to the player(s). For this reason an AI test in a game setting should not look for evidence of intelligence, but for overall gameplay improvement. A Turing-style test focusing on believability rather than intelligence could therefore be applicable in the game industry, provided it formed part of a more complete test.
Games are currently known for their ‘bad’ AI: a patrolling guard dies loudly without attracting the attention of the guard a few feet away, or a persistent opponent attacks your base in the same place every time despite being defeated on every attempt. These unrealistic situations can pull the player out of the game, and implementing behaviourism could help. Using neural networks to train, or ‘condition’, agents may be one solution: AI opponents could then learn from their mistakes and, for example, attack bases from other angles to avoid troop losses. Behaviourism rests on the idea that only observable behaviour should be analysed, rather than the internal thought process behind it. This suits the game industry because all the player ever sees is the behaviour; we do not necessarily need to know why the AI reacted a certain way, only that it did. Behaviourism is already being used to train conversational AI aimed at beating the Turing Test and winning the Loebner Prize, with performance measured by lexical semantic analysis, so as academic research progresses, game AI may progress with it. Although this technique is not yet widely used in the game industry, it is becoming more common.
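To make the idea of ‘conditioning’ concrete, here is a minimal sketch, assuming a strategy-game opponent choosing which direction to attack from. It uses a simple value-update rule rather than a neural network, but it captures the behaviourist point made above: the agent adjusts purely on the basis of observed outcomes, with no model of the player’s internal reasoning. The attack angles, reward values and exploration rate are illustrative assumptions, not drawn from any of the cited work.

```python
import random

# Sketch of "conditioning" an RTS opponent purely from observed outcomes:
# the agent never models why an attack failed, only how well each choice
# has worked so far. All constants here are invented for illustration.

ATTACK_ANGLES = ["north", "south", "east", "west"]

class ConditionedAttacker:
    def __init__(self, epsilon=0.1, learning_rate=0.2):
        self.epsilon = epsilon              # chance of trying something new
        self.learning_rate = learning_rate  # how quickly outcomes reshape behaviour
        self.value = {angle: 0.0 for angle in ATTACK_ANGLES}

    def choose_angle(self):
        # Mostly exploit what has worked; occasionally explore.
        if random.random() < self.epsilon:
            return random.choice(ATTACK_ANGLES)
        return max(self.value, key=self.value.get)

    def observe_outcome(self, angle, reward):
        # Reward could be damage dealt minus troops lost; only the
        # observable result is used, never the player's "reasoning".
        self.value[angle] += self.learning_rate * (reward - self.value[angle])

# Example: repeated failed attacks from the north push the agent to try,
# and eventually prefer, other approaches.
agent = ConditionedAttacker()
for _ in range(50):
    angle = agent.choose_angle()
    reward = -1.0 if angle == "north" else random.uniform(0.0, 1.0)
    agent.observe_outcome(angle, reward)
print(agent.value)
```

In practice a neural network or a richer learning rule could replace the simple table of values, but the measure of success would remain the observable behaviour: does the opponent stop throwing troops at the same defended wall?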
In theory any game can be subjected to this kind of test, because every game is a form of imitation game. For example, can a player tell the difference between an AI and a human when running around a virtual world? Can a player tell whether the opponent in a strategy game is human or AI? But it is not as simple as that. The problem with testing for believability is that it is subjective; people’s perceptions can differ with their cultural background, as Mac Namee (2004) discovered in the following scenario:
‘In one
test, subjects are shown two versions of a virtual bar populated by AI agents.
In one version the behaviour of the agents is controlled in a deliberate manner
because they try to satisfy long-term goals – buy and drink beer, talk to
friends, or go to the toilet. In the second version the agents randomly pick
new short-term goals every time they complete an action. One Italian subject
felt that having agents return to sit at the same table time after time was
unrealistic, whereas the other (Irish) subjects mostly felt that this behaviour
was more believable. Sadly, further tests to determine whether this – or other
– results are indeed culturally significant have yet to be carried out; the
possibility that such differences exist does appear quite likely, however.’
Mac Namee also points out that players new to a game can experience its AI differently from veterans, simply through lack of exposure to the game environment. Veterans know exactly what they are looking for and have more attention to spare for the AI, because other game-related skills have become second nature.
It is also possible for a player to be concentrating so hard on the game that they do not notice the AI at all, or, in some genres, to get too little interaction with the AI to form a firm judgement. One fix is to make the AI more obvious: Butcher and Griesemer (2002) found that players need exaggerated visual cues even to notice the AI. But surely if the AI is exaggerated, and for example over-emotive, it becomes less realistic and less human? On the other hand, it is possible to overdo things in the opposite direction and sway the judge by exaggerating a mistake that humans commonly make. This is exactly what McGlinchey and Livingstone (2004) found: when their Pong-playing AI accidentally moved the bat in a slow, jerky way, players often mistook it for a human. Some suggest using an observer as the judge to alleviate these issues, but it should be remembered that the aim is to please the player, not the people watching.
Laird and van Lent (1999) ran a test in which observers judged players of mixed skill playing against the Soar Quakebot, and found that the opponent was considered ‘more human’ if it had a slower reaction speed, some inaccuracy when shooting, and tactical reasoning (although none of the observers thought the AI was actually human). The results led them to devise three simple principles (a rough sketch of the first two follows the list):
· AI should have human-like reaction speeds, not super-fast ones
· AI should not have ‘superhuman’ abilities such as overly accurate aiming
· AI should have some kind of strategic reasoning, so it does not just react to each situation
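As a hedged illustration of the first two principles, the following sketch delays a shooter bot’s reaction to a newly spotted enemy and adds noise to its aim. The specific delay range and error spread are assumptions made up for the example; choosing values that read as human, rather than as superhuman or incompetent, is precisely what a believability test would have to judge.

```python
import random

# Sketch: a bot that cannot react or aim with superhuman precision.
# The delay range and aim error below are illustrative assumptions.

HUMAN_REACTION_SECONDS = (0.2, 0.35)   # assumed plausible human reaction window
AIM_ERROR_STD_DEGREES = 3.0            # assumed human-like aiming "shakiness"

class BelievableShooter:
    def __init__(self):
        self.pending_target = None
        self.react_at = None

    def on_enemy_spotted(self, target_angle_deg, now_seconds):
        # Do not respond instantly; queue the reaction for a short,
        # human-like delay after the enemy is first seen.
        self.pending_target = target_angle_deg
        self.react_at = now_seconds + random.uniform(*HUMAN_REACTION_SECONDS)

    def update(self, now_seconds):
        # Once the reaction delay has elapsed, return the angle to shoot
        # at, with Gaussian aiming error added; otherwise return None.
        if self.pending_target is None or now_seconds < self.react_at:
            return None
        aim = self.pending_target + random.gauss(0.0, AIM_ERROR_STD_DEGREES)
        self.pending_target = None
        return aim

# Example: the bot spots an enemy at 90 degrees but only fires a
# slightly-off shot a fraction of a second later.
bot = BelievableShooter()
bot.on_enemy_spotted(target_angle_deg=90.0, now_seconds=0.0)
for t in (0.1, 0.2, 0.3, 0.4):
    shot = bot.update(t)
    if shot is not None:
        print(f"fired at {shot:.1f} degrees at t={t}s")
        break
```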
One of the obvious flaws of the original Turing Test is that it gives a very black-or-white result: the AI either passes or it does not. A creative industry needs more flexibility than that, and there have been attempts to score the AI on a scale across different features. Using the following parameters, formulated by Daniel Livingstone from an analysis of common ‘bad AI’ situations in games, it could be possible to devise a series of believability tests covering different aspects (a small sketch of how such a scorecard might work follows the list):
· Plan
  o demonstrate some degree of strategic/tactical planning
  o be able to coordinate actions with the player/other AI
  o not repeatedly attempt a previous, failed plan or action
· Act
  o act with human-like reaction times and abilities
· React
  o react to the player’s presence and actions appropriately
  o react to changes in the local environment
  o react to the presence of foes and allies
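As a sketch of how these parameters might yield the more flexible, graded result argued for above, the following example scores an agent against each criterion on a simple scale and reports per-category and overall averages. The 1-5 scale, the exact wording of the criterion strings, and the equal weighting are my own assumptions for illustration only.

```python
# Sketch of turning the Plan/Act/React criteria into a graded scorecard
# rather than a pass/fail verdict. Scale and weighting are assumptions.

CRITERIA = {
    "Plan": [
        "shows strategic/tactical planning",
        "coordinates actions with player/other AI",
        "does not repeat failed plans",
    ],
    "Act": [
        "human-like reaction times and abilities",
    ],
    "React": [
        "responds to player presence and actions",
        "responds to changes in local environment",
        "responds to foes and allies",
    ],
}

def score_agent(ratings):
    """ratings maps each criterion string to a 1-5 judge rating;
    returns per-category averages plus an overall average."""
    report = {}
    for category, items in CRITERIA.items():
        values = [ratings[item] for item in items]
        report[category] = sum(values) / len(values)
    report["overall"] = sum(report[cat] for cat in CRITERIA) / len(CRITERIA)
    return report

# Example: an agent that acts and reacts convincingly but never plans.
example_ratings = {
    "shows strategic/tactical planning": 1,
    "coordinates actions with player/other AI": 2,
    "does not repeat failed plans": 1,
    "human-like reaction times and abilities": 4,
    "responds to player presence and actions": 4,
    "responds to changes in local environment": 3,
    "responds to foes and allies": 4,
}
print(score_agent(example_ratings))
```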
It seems to me that a Turing-style test has many flaws, yet many researchers advocate it so strongly that they appear simply desperate for it to work. Would it be fair to say that the only reason this kind of test is so mainstream is that there is no alternative to replace it? Perhaps researchers should stop looking for a modification that works and start again in the search for a genuinely comprehensive AI test. In conclusion, I feel that a Turing-style test is inappropriate for use in a game context, even as part of a more complete test, because of its problematic testing parameters and the ambiguity of its results.
BUTCHER, C. AND GRIESEMER, J. 2002. The illusion of intelligence: The integration of AI and level design in Halo. Presented at the Game Developers Conference (San Jose, CA, March 21-23, 2002).
LAIRD, J. E. AND VAN LENT, M. 1999. Developing an artificial intelligence engine. In Proceedings of the Game Developers Conference (San Jose, CA, Nov. 3-5, 1999).
MAC NAMEE, B. 2004. Proactive persistent agents: Using situational intelligence to create support characters in character-centric computer games. Ph.D. dissertation, Dept. of Computer Science, University of Dublin, Dublin, Ireland.
MCGLINCHEY, S. AND LIVINGSTONE, D. 2004. What believability testing can tell us. In CGAIDE, Proceedings of the Conference on Game AI, Design and Education (Microsoft Campus, Reading, UK).
TURING, A. M. 1950. Computing machinery and intelligence.
Mind LIX, 236 (1950), 433-460.
WETZEL, B. 2004. Step one: Document the problem. In
Proceedings of the AAAI Workshop on Challenges in Game AI (San Jose, CA, July
25-26, 2004).