Fancy a Turing Test challenge? Enter the Chatbot Battles!

Brought to you by Steve Worswick, a frequent medal winner in the now-ended annual Chatterbox Challenge, this Challenge is not, strictly speaking, a Turing Test, as the entrants don’t have to pretend to be human, but they will be scored on how well they carry on a normal conversation. The competition is open to Android apps but not iPhone apps – unless you’d like to donate the man a phone!

Welcome to the first ever Chatbot Battles!

Ladies and gentlemen, grab your ringside seats and welcome to the first ever series of Chatbot Battles. Let’s get ready to rumble!

Chatbot owners from all over the world enter their creations for a knockout league to determine the winner. Each competitor plays every other bot in its league, with the top entrants battling it out in one-on-one duels. Last bot standing wins!

Each match sees both chatbots either being asked the same 5 questions or having a free flowing conversation for 5 minutes. The winner is decided from their responses.

 

• Deadline to enter your chatbot: 1 June 2012
• League stage: 11 June 2012 to 25 June 2012
• Knockout stages: 26 June 2012 to 11 July 2012
• Final result should be announced in July 2012

All bots are welcome to enter and test themselves against others, whether they be web based, downloadable, chatroom bots or even Android apps. Sorry, but we are unable to test iPhone apps – unless you want to send an iPhone along with your entry 🙂

Chatbot Battles are open to various kinds of chatterbots – Website Bots, Downloadable Bots, Chatroom Bots and Android apps – written in any programming language. Your bot only needs to be available so people can talk with it. Botmasters are invited to enter their bot and do battle with opponents from all over the world.

The Details:

Rules for entering are:

  • Only English speaking chatterbots are allowed to enter.
  • Only one chatterbot per person. This means you can’t enter different versions of the same bot.
  • Clones – you are free to enter any AIML, Personality Forge, MyCybertwin, INF.net bots and any other type of bot. However, if during a match, two bots produce exactly the same answer, neither will score a point for that question. This is to stop clones that have had no extra work carried out on them. The only exception to this rule is if the original bot (A.L.I.C.E., Ultra Hal etc) enters the contest. In this case, it will be awarded the point whereas the clone will score zero.
  • Accessibility – your bot must be easily accessible to the judges.
    • We cannot accept bots that are only available on CD ROM. They have to be available to download on the internet.
    • We cannot accept bots that require the judges to download a 3rd party chat program such as mIRC. Most instant messenger bots such as AOL are okay to enter.
    • We cannot accept bots that require additional downloads of various programs that are not already installed on Windows XP unless they are relatively small and easy to install.
    • We cannot accept bots that are not online 24/7. With judges being from all over the world and in different time zones, it’s not possible to schedule a time for judging.
    • If you are running some type of blocking filter on your bot to prevent user abuse, it must be removed during judging or you must provide a method where the judge can re-establish contact instantly. If the judge cannot complete the conversation, your bot will score 0 points for that match. Seeing how a bot reacts to foul language or abuse may be a part of the judging process.
    • In short, you are responsible for providing the judges with a bot that is easy to access or install.
  • There must be a way for the judges to copy and paste the output from each bot. This is especially important in speech-only bots. It is time consuming to judge any contest such as this and helping the judges to accurately record the responses is appreciated.
  • Bots created on Android apps are permitted to enter. These apps must be either free to download or a copy of the paid version must be accessible to us, either without us having to pay or by reimbursing us for the cost. Unfortunately, we have no way of testing iPhone apps. In the case of a phone app, it is not necessary to provide a way to copy the output but the bot’s responses must be visible on-screen and not just spoken.

Contest Calendar:
Deadline to Enter: June 1, 2012
League Stage: June 11, 2012 to June 25, 2012
Knockout stages: June 26, 2012 to July 11, 2012
Announcement of Winners: July 2012

Prizes and Awards:
Unfortunately, as I have no sponsor and am not rolling in money, I can only offer the kudos of beating your peers as a prize and maybe a T-shirt, mousemat or similar gift. If anybody is interested in sponsoring the contest, please contact me.

Judges:
The judges will be selected from the general public to determine the winners in the contest. None of the judges will have a bot in the contest to ensure fairness. The judges will be totally impartial and fair to all entries regardless of the bot type or the owner. The judges will each be allocated matches to oversee and will be free to ask the bots anything they wish. If during a match, the bot asks the judge a question, the judge must answer it unless this goes to more than three questions in a row. Such tactics will be classed as stalling and the judge is then free to proceed to the next question without answering the bot.

Judging – League Matches:
A league match will either consist of a question and answer session or a free flowing conversation.

Question and answer match:
Each judge will pose the same five questions to the two competitors in each match and will award a point to the bot that, in their opinion, gave the better answer for each question. Each question will be spelled and phrased exactly the same way to ensure that each bot gets an equal chance. At the end of the match, the points will be tallied up to find a winner. The maximum score in each match will be 5-0. All judges’ decisions are final.

Free flowing conversation match:
The judges will talk with both bots in the match for 5 minutes each. Any and all subjects may be covered. The judge will follow the lead of the bot when necessary and will not be deliberately obtuse with the bots. Judges will be aware of the current state of AI and will converse accordingly. After the end of the match, the judge will decide which of the two bots gave the more convincing conversation and mark each bot out of 5. The chatbots are NOT expected to believe they are human. We are looking for the ability to chat, not an attempt to fool the judges.

Judging – Knockouts:
The knockout stages will be judged as above but with 10 questions to each bot. This will continue until the semi-finals when there will be 15 questions. The losing semi-finalists will play an extra match with 15 questions to determine 3rd and 4th positions.

Questions:
1) The questions may contain typos to test the bot and internet slang terms like “lol” may also be used.
2) The questions will not be trivia-like but reasonable, in that an average person would be able to answer them. For example: “Who was Michael Jackson?” is good. “What is the population of Russia?” is bad.
3) As this is an international contest, the questions will not favour any particular country. For example, the judges will not ask a question about a certain country that only the people of that country would know, such as “Who is the President of Finland?”.
4) Mathematical, logic, memory and reasoning questions may be employed. Examples of such questions are, “What is 6-2?”, “Can I eat a building?”, “I have a blue shirt. What colour is my shirt?”, “Joe and Jim are twins. Joe is 24 years old, how old is Jim?” and so on.
5) A question may just consist of a statement to see how a bot reacts in ordinary conversation rather than just being a question/answering program. Examples of these are, “I went to the movies yesterday.”, “There is nothing on TV tonight.”
6) The questions will be posed in British English but if the judge deems a bot is having trouble understanding due to spelling, he/she may, at their discretion, pose the question in American English. An example is, “What is your favorite book?” instead of “What is your favourite book?”
7) Basically, anything goes. Each bot will be faced with the same input and so will face the same chance in each match as its opponent.

How the contest works:
Each entry in the competition is placed into a league table with at least three of the other entrants. The number of league tables and entrants in each will depend on the number of contestants. A typical league table will look like this:

Group A           P    W    D    L    F    A    Pts
Chatty Chatbot    0    0    0    0    0    0    0
Testbot 123       0    0    0    0    0    0    0
Chatter Bot       0    0    0    0    0    0    0
Sir Talk-a-lot    0    0    0    0    0    0    0
Blah D Blah       0    0    0    0    0    0    0


Just like the soccer World Cup, each bot will play against another bot in its group until all the entrants in each group have played each other. A match will consist of 5 questions. Both bots will be asked each of these questions with the one who answers the best, in the judge’s opinion, being awarded a point. If the answers are the same from each bot for anything except a general knowledge or a yes/no question, as they may be for A.L.I.C.E clones for example, no points will be awarded for that question. A typical match would go as follows:

Question 1: What is your name?
Bot 1: My name is Bot 1.
Bot 2: I do not know your name.
(Bot 1 scores a point for giving a correct answer. Score so far: 1-0)

Question 2: Do you like steak?
Bot 1: I like ice cream.
Bot 2: I like books.
(Bot 1 scores a point for giving a more relevant answer about food. Score so far: 2-0)

Question 3: What is a dog?
Bot 1: Dog: man’s best friend, a canine.
Bot 2: Dog: man’s best friend, a canine.
(No points scored. Although both bots were correct, they produced exactly the same response. The only exception to this rule would be if the original bot was taking part, as this is the original from which the clone was made. Score so far: 2-0)

Question 4: Jack is taller than Jim. Is Jim taller than Jack?
Bot 1: Who is that?
Bot 2: I do not know.
(Both bots got it wrong and so no points scored. Score so far: 2-0)

Question 5: What is the capital of France?
Bot 1: Paris.
Bot 2: Paris.
(Although they gave the same response, it was a general knowledge question and so both bots score a point. Final score: 3-1)

As in a soccer league table, 3 points are awarded for a win, 1 for a tie and 0 for a loss. The final result of the above match was 3-1 to Bot 1 and so that entry would score 3 points for winning the match. Bot 2 lost and receives no points.
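The scoring in the example match above can be sketched in a few lines of Python. This is only an illustration, not the contest's actual tooling: the names `Question`, `score_match` and `league_points` are made up for this sketch, the judge's opinion is simply passed in as a verdict, and the exception for an original bot facing its own clone is left out.

```python
from dataclasses import dataclass


@dataclass
class Question:
    # Judge's verdict (assumed encoding): "bot1", "bot2", "both" or "neither".
    verdict: str
    # True if both bots produced exactly the same response.
    same_answer: bool = False
    # True for general knowledge or yes/no questions, where identical
    # answers are still allowed to score.
    exempt: bool = False


def score_match(questions):
    """Return (bot1_score, bot2_score) for one question and answer match."""
    bot1 = bot2 = 0
    for q in questions:
        if q.same_answer and not q.exempt:
            continue  # identical answers: no points for this question
        if q.verdict in ("bot1", "both"):
            bot1 += 1
        if q.verdict in ("bot2", "both"):
            bot2 += 1
    return bot1, bot2


def league_points(own_score, opponent_score):
    """3 league points for a win, 1 for a tie, 0 for a loss."""
    if own_score > opponent_score:
        return 3
    if own_score == opponent_score:
        return 1
    return 0


# The five questions from the example match above.
example = [
    Question("bot1"),                                  # What is your name?
    Question("bot1"),                                  # Do you like steak?
    Question("both", same_answer=True),                # What is a dog?
    Question("neither"),                               # Jack is taller than Jim...
    Question("both", same_answer=True, exempt=True),   # Capital of France?
]
print(score_match(example))  # (3, 1)
```

With a 3-1 result, `league_points(3, 1)` gives Bot 1 its 3 league points and `league_points(1, 3)` gives Bot 2 none.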

These matches continue until everyone in each group has played each of the other members in the same group. The final table would then look something like this:

Group A           P    W    D    L    F    A    Pts
Sir Talk-a-lot    4    3    0    1    14   3    9
Chatty Chatbot    4    2    1    1    10   8    7
Chatter Bot       4    2    0    2    13   9    6
Blah D Blah       4    1    2    1    8    12   5
Testbot 123       4    0    1    3    2    16   1

The top two from each league table then go on to the knockout stages. The highest scoring runners up may also be included in the knockout stages if the numbers are not a power of 2 (4, 8, 16, 32 etc). In the event of a tie, the number of points scored in all matches will be taken into consideration. If there is still a tie, the number of points conceded will be counted and if for some reason there is still a tie after that, the tying bots will play a match between themselves to decide a winner.
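The tie-break order described above can be sketched as a simple sort. This is a rough illustration only: the `Row` type, the function names and the sample bots are invented for the sketch, and it assumes that fewer points conceded ranks higher, which the rules imply but do not spell out.

```python
from typing import NamedTuple


class Row(NamedTuple):
    name: str
    pts: int       # league points (3 for a win, 1 for a tie)
    scored: int    # points scored in all matches (column F)
    conceded: int  # points conceded in all matches (column A)


def sort_key(row):
    # Higher league points first, then more points scored,
    # then fewer points conceded (assumption).
    return (-row.pts, -row.scored, row.conceded)


def rank_group(rows):
    """Return the rows ordered from first place to last."""
    return sorted(rows, key=sort_key)


def needs_playoff(a, b):
    """True if two bots are still level after every tie-break."""
    return sort_key(a) == sort_key(b)


# Hypothetical group in which three bots finish level on league points.
group = [
    Row("Bot A", 6, 9, 7),
    Row("Bot B", 6, 11, 8),
    Row("Bot C", 6, 9, 5),
    Row("Bot D", 0, 3, 12),
]
print([row.name for row in rank_group(group)])
# ['Bot B', 'Bot C', 'Bot A', 'Bot D']
```

Bot B takes first place on points scored, and Bot C edges out Bot A by conceding fewer. If `needs_playoff` returns true for two bots, they would play the extra deciding match.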

The knockout stages continue in the same style as above but each match consists of 10 questions instead of 5. The winner will progress to the next knockout stage while the loser is out of the contest. In the event of a tie, further questions will be asked until one bot falters. Once the contest gets down to the final four, 15 questions will be asked in each round. The knockout matches will follow this style:

 

Quarter Finals (10 questions)
  • Match 1: Winner of group A v Runner up of group C
  • Match 2: Winner of group B v Runner up of group D
  • Match 3: Winner of group C v Runner up of group A
  • Match 4: Winner of group D v Runner up of group B

Semi Finals (15 questions)
  • Match 5: Winner of match 1 v Winner of match 2
  • Match 6: Winner of match 3 v Winner of match 4

Final (15 questions)
  • Match 7: Winner of match 5 v Winner of match 6

Contest winner
  • Winner of match 7

There will also be a 3rd and 4th place play-off between the losing semi-finalists with 15 questions. The whole process should take around 1 month depending on the number of contestants but this time frame may increase or decrease.

Note:
The rules and guidelines may be changed at any time to ensure fairness. It is impossible to foresee all the events that may arise during the contest so Chatbot Battles may have to make amendments to the rules to ensure the smooth running of the contest.