Real users, real questions, real time: the TREC LiveQA challenge

Eugene Agichtein, David Carmel, Dan Pelleg, Yuval Pinter – Yahoo Labs, Haifa

Donna Harman, NIST

The Text REtrieval Conference (TREC), sponsored by the National Institute of Standards and Technology (NIST), provides the infrastructure for large-scale evaluation of text retrieval methodologies. The automated question answering (QA) track, one of the most popular tracks in TREC for many years, focused on automatically answering questions posed by humans. The track dealt primarily with factual questions, and participants extracted their answers from a corpus of news articles.

The TREC Live Question Answering track (LiveQA), which we organized this year, focuses on “live” question answering for real-user questions. Questions were drawn from the stream of the most recent questions submitted to the Yahoo Answers (YA) site that had not yet been answered by humans, and were sent to the participants’ QA systems, which had to answer them. There was no restriction on the resources a system could use to answer. The only restrictions were that an answer could not exceed 1000 characters and that the response time was limited to one minute, forcing systems to answer programmatically in (almost) real time.
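
As an illustration only, the two limits above could be checked on the receiving side roughly as follows. This is a minimal sketch and not the track's actual test system: the function name ask_with_limits, the injectable clock, and the decision to discard (rather than truncate) a late or over-long answer are all assumptions made for the example. The sketch also only measures the elapsed time after the fact; it does not interrupt a system that is still running when the minute is up.

```python
import time
from typing import Callable, Optional

MAX_ANSWER_CHARS = 1000       # answer length limit stated in the track rules
MAX_RESPONSE_SECONDS = 60.0   # one-minute response limit stated in the track rules


def ask_with_limits(
    answer_fn: Callable[[str], str],
    question: str,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Call a (hypothetical) QA system and keep its answer only if it meets both limits."""
    start = clock()
    answer = answer_fn(question)
    elapsed = clock() - start

    # Assumption: a late answer is simply dropped.
    if elapsed > MAX_RESPONSE_SECONDS:
        return None
    # Assumption: an empty or over-long answer is dropped, not truncated.
    if not answer or len(answer) > MAX_ANSWER_CHARS:
        return None
    return answer
```

The clock parameter exists only so that the timing rule can be exercised without actually waiting a minute.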

The official run of the track was scheduled for August 31, 2015, for a 24-hour period. A total of 1340 live questions posted on the Yahoo Answers site were submitted to 22 registered systems, built by 14 participating institutions around the globe (from Australia, China, the Middle East, Europe, and North America).

The submitted questions were collected from a few predefined categories (Health, Sport, Computer & Internet, Pets…). In contrast to the factoid questions used in previous QA tracks, YA questions are much more diverse, including opinion, advice, polls, and many other question types, which makes the task far more realistic and challenging. Here are a few examples of the submitted questions:

  • “What’s the best thing to do when depressed? Should I push myself to work?”
  • “I’m a black male with curly soft textured hair versus nappy hair. I want to twist it, is this possible for it to look nice and if so how?”
  • “Is it safe to use diluted clorox to get the stains off tea cups?”
  • “I want to migrate in USA. I am a software engineer from Bangladesh. Can anyone help me?”
  • “How does someone with one eye blink?”

About 25,000 valid answers were collected from participants by our test system. The correctness and quality of the answers will later be judged by NIST assessors. System performance will be announced at the upcoming TREC conference in Washington, D.C., in November 2015. At the conference we will also discuss the different approaches the participants used for LiveQA. The questions, together with the judged answers, will be made available to the public after the conference for further research.

We want to thank all participants for their effort and patience. This is the first time that we have run such an experiment, and we are very happy with the amount of participation and with the quality of the answers. From our perspective, it was a real pleasure, and we all look forward to the results being published. We are committed to running the LiveQA track again next year (if NIST lets us, of course) and expect even more participation and higher answer quality.

