Real users, real questions, real time: the TREC LiveQA challenge

Eugene Agichtein, David Carmel, Dan Pelleg, Yuval Pinter – Yahoo Labs, Haifa

Donna Harman, NIST

The Text REtrieval Conference (TREC), sponsored by the National Institute of Standards and Technology (NIST), provides the infrastructure for large-scale evaluation of text retrieval methodologies. The automated question answering (QA) track, which was one of the most popular tracks in TREC for many years, focused on the task of providing automatic answers to human questions. The track primarily dealt with factual questions, and the answers provided by participants were extracted from a corpus of news articles.

The TREC Live Question Answering track (LiveQA), which we organized this year, focuses on “live” question answering for real-user questions. Real user questions that had not yet been answered by humans were extracted from the stream of the most recent questions submitted to the Yahoo Answers (YA) site and sent to the participants’ QA systems, which had to answer them. There was no restriction on the resources a system could use for question answering. The only restrictions were that answers could not exceed 1000 characters and that the response time was limited to one minute, thus forcing the systems to answer programmatically in (almost) real time.
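The two constraints above can be pictured as a thin wrapper around whatever answering function a system provides. The following is only a minimal sketch, not the track's actual test harness: the function name `answer_within_limits` and the `answer_fn` callback are hypothetical, and truncating an over-long answer (rather than rejecting it) is an assumption made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

MAX_ANSWER_CHARS = 1000   # answer length limit stated for the track
TIME_LIMIT_SECONDS = 60   # one-minute response limit stated for the track


def answer_within_limits(answer_fn, question, time_limit=TIME_LIMIT_SECONDS):
    """Run answer_fn(question); return a trimmed answer, or None on timeout.

    answer_fn is a hypothetical callback standing in for a participant's
    QA system. Trimming over-long answers is an assumption of this sketch.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(answer_fn, question)
    try:
        answer = future.result(timeout=time_limit)
    except FutureTimeout:
        # The system was too slow: no answer is recorded for this question.
        return None
    finally:
        # Do not block the caller while a late worker thread finishes.
        pool.shutdown(wait=False)
    return answer[:MAX_ANSWER_CHARS]
```

A system that answers in time gets its (possibly trimmed) text back, while a system that overruns the time limit simply yields no answer for that question.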

The official run of the track was scheduled for August 31, 2015, and lasted 24 hours. During this period, 1340 live questions that arrived at the Yahoo Answers site were submitted to 22 registered systems, built by 14 participating institutions around the globe (from Australia, China, the Middle East, Europe, and North America).

The submitted questions were collected from a few predefined categories (Health, Sport, Computer & Internet, Pets…). In contrast to factoid questions used in previous QA tracks, YA questions are much more diverse, including opinion, advice, polls, and many other question types, thus making the task far more realistic and challenging. Here are a few examples of the submitted questions:

  • “What’s the best thing to do when depressed? Should I push myself to work?”
  • “I’m a black male with curly soft textured hair versus nappy hair. I want to twist it, is this possible for it to look nice and if so how?”
  • “Is it safe to use diluted clorox to get the stains off tea cups?”
  • “I want to migrate in USA. I am a software engineer from Bangladesh. Can anyone help me?”
  • “How does someone with one eye blink?”

About 25,000 valid answers were collected from participants by our test system. The correctness and quality of the answers will later be judged by NIST assessors. System performance will be announced at the upcoming TREC conference in Washington D.C., in November 2015. At the conference we will also discuss the different approaches used by the participants for LiveQA. The questions and their judged answers will be made available to the public after the conference for further research.

We want to thank all participants for their effort and patience. This is the first time that we have run such an experiment, and we are very happy with the level of participation and with the quality of the answers. From our perspective, it was a real pleasure, and we all look forward to the publication of the results. We are committed to running the LiveQA track again next year (if NIST lets us, of course) and expect even more participation and higher answer quality.

Who are you, Henryka Cohn? (Continued)

A few days ago I received the following email from Ashwin Maini, a close relative of Henryka, which sheds some more light on her. I’m glad to post it as is (with Ashwin’s permission, of course).

Dear David Carmel
I am writing to you because I am the close relative of Henryka Cohn whose comments are quoted by you (as provided to you by Raymond Coffer) in your Blog post “Who are you, Henryka Cohn?“.
Henryka Cohn was my great-aunt (my grandfather’s sister).
I enjoyed your article and am touched that you had hoped that there might somehow be a familial connection, purely because of the surname.  However, I need gently just to let you know that you are quite correct in your conclusion in the article.  There is absolutely no connection whatsoever between your Cohn family and ours!
Henryka’s father and Richard Gerstl’s father knew each other in Vienna.   Henryka Cohn’s father (my great-grandfather) Hugo Cohn was actually from a small village in Prussian Upper Silesia called Schoffschuetz in the Kreis Rosenberg administrative district.  The Cohn family were not from Austrian Galicia (Galizien).  Rather, Hugo’s wife’s family (the Uiberalls) were from there.    Henryka’s parents lived there in the early years of their marriage before moving to Vienna.  That is why Henryka happened to have been born in Rzeszow in Austrian Galicia.
I have been researching our Cohn family for many decades, so I have a huge amount of detailed information about our antecedents.  Furthermore, my 89 year-old mother is still alive and often met her aunt Henryka, both in Vienna and London.
Perhaps you could please add a footnote to your Blog post, updating your article with these factual points?
With very best wishes.
Ashwin Maini

Who are you, Henryka Cohn?


I first met Henryka Cohn in September 2014, at the Leopold Museum in Vienna. The painting “Portrait of Henryka Cohn”, by Richard Gerstl, is displayed at the museum along with many masterpieces by Egon Schiele, Gustav Klimt, Oskar Kokoschka, and other leading Austrian painters who lived and worked in Vienna at the beginning of the 20th century.

The portrait caught my eye immediately. My grandmother, Gizela Cohn-Weinreich, who lived in Vienna at that time, was about the same age as Henryka. Did they know each other? Did they ever meet? Is Henryka related to my family in any way?

The story of my grandmother Gizela is told in my father’s book, That’s how it happened. She was born in Galitzia, a province of the Austrian Empire, to her mother, Hannah. Hannah had come from Galitzia to Vienna in order to study medicine. To support herself, she worked for the Cohn family as a baby-sitter and maid. The Cohns were a very wealthy Sephardic Jewish family that dealt in diamonds. One of the family’s sons, Don, fell in love with Hannah and married her against his parents’ wishes. This caused a huge scandal in the Jewish high society of Vienna. How could the son of a rich, noble Sephardic family marry a poor servant girl from Galitzia? The Cohns disinherited their son and sent the young couple away from their home, cutting off all contact with them. Don returned with Hannah to her parents in Galitzia, where they had two children: Michael, born in 1885, and Gisela, born in 1886. Don, the young father, never got used to the poor and hard life in Galitzia, and died right after Gisela was born. Hannah, a young widow with two small children, decided to return to Vienna. She gave up the idea of a university education, studied to become a certified midwife, and supported her family in this profession.

Gisela graduated from high school with honors, and wanted to study medicine as her mother had years earlier. Because she had to help her family, she could not pursue this dream. Instead of going to the university, she went to a business school. After graduating, she got a job to support the family. Although she worked long hours, she had an active social life and took part in the cultural events that flourished in Vienna during these years. She went to the opera, to concerts, and to plays, and attended social gatherings. The comfortable life of Viennese Jews came to an end with the German Anschluss in 1938. Gizela was murdered in Auschwitz, with her husband Moshe, in 1943.

Not much is known about Henryka. However, we know much more about the painter. Richard Gerstl (1883-1908) was an Austrian painter known for his expressive, psychologically insightful portraits. In 1898, at the age of fifteen, Gerstl was accepted into the Academy of Fine Arts. Around 1907, he began to associate with the famous composer Arnold Schoenberg. During this time, Gerstl painted several portraits of Schoenberg, his family, and his friends. These portraits also included paintings of Schoenberg’s wife, Mathilde, who became extremely close to Gerstl; in the summer of 1908, she left her husband and children to travel with him. Schoenberg was in the midst of composing his Second String Quartet, which he dedicated to her. Mathilde rejoined her husband in October. Distraught by the loss of Mathilde, his isolation from his associates, and his lack of artistic acceptance, Gerstl entered his studio during the night of 4 November 1908 and apparently burned every letter and piece of paper he could find. Although many paintings survived the fire, it is believed that a great deal of his artwork as well as personal papers and letters were destroyed. Other than his paintings, only eight drawings are known to have survived unscathed. Following the burning of his papers, Gerstl hanged himself in front of the studio mirror and somehow managed to stab himself as well.

One of Gerstl’s paintings that survived was the portrait of Henryka Cohn. Following Henryka’s trail, I approached Dr. Raymond Coffer from the University of London, who studied Gerstl’s work as part of his PhD thesis. Coffer kindly let me know that Henryka’s father was Hugo Cohn and her mother, he believes, was Rivka (Regine) Uiberall. She was born on March 11, 1877, and when she was 3 the family moved from Rzeszow in Galicia, now Poland, to Vienna and settled at Esterhazygasse 23 in Vienna’s 6th District of Mariahilfestrasse. She never married and left Vienna on May 13, 1939, for England, where she lived until she died.

During her life in Vienna, Henryka was an artist and piano teacher and a member of Schoenberg’s circle. Coffer cites one of her relatives: “… apart from being a “progressive and independent woman […] brought up in an atmosphere of considerable wealth. In a cultured and assimilated Viennese Jewish family, she was a “highly-regarded […] music teacher”, who taught singing, piano and actors. Henryka was also related to Gustav Mahler “and mixed freely in that society,” which may have accounted for her connection to Schonberg, of whose circle she was an apparently confirmed member and to whom she sent a Christmas/New Year card after the Schonbergs moved to Berlin in 1911. It is highly possible, therefore, that Henryka’s portrait might have been the last initiated by Schonberg. The work can probably be ascribed to June 1908, presumably before Gerstl left Vienna on 27th, for it is doubtful that he would have been welcome into the households of Schonberg after the denouement.

So the puzzle is almost solved. Henryka and Gizela, although sharing the same surname and having a very similar history, were not related after all. When I showed my father Henryka’s portrait, his immediate response was that she reminded him of his cousin Lea Cohn (Michael’s daughter), who was murdered in Yugoslavia in WWII (see picture below). However, he does not remember any relative named Henryka from his childhood in Vienna. My grandmother’s roots belong to the Sephardic Cohn family from Vienna, while Henryka’s roots belong to a Cohn family from Galitzia. Did they know each other? Probably, but we have no evidence of that. Anyway, I’ve already decided to adopt Henryka as my new great-aunt.


Lea Cohn (Yugoslavia, 1941)

That’s How it happened – Memories of Dan Carmel

The book ‘That’s how it happened’, written by Dan Carmel, was published in 1997 in a very limited edition for family and friends only. This year, in honor of Dan’s 90th birthday, we decided that this is the best time to re-publish the book on the Internet, including a new English translation.

Through his personal history, my father tells the story of the establishment of the State of Israel. It begins with Dan’s childhood in Vienna, the capital of Austria, at the beginning of the 20th century. Jews who lived there comfortably and conveniently were deeply involved with the local society, while at the same time strongly influenced by the young Zionist movement born in this city. However, complacency and the comfortable life ended with the German annexation in 1938. Dan refuses to flee to the United States, along with his brothers and uncles, but insists on going to Israel, as he was educated by the Zionist youth movement. He joins a group of pioneers who intend to go to Palestine, leaving Vienna at the last minute, immediately after the outbreak of war. Unfortunately, the group gets stuck in Yugoslavia for two years. Dan narrowly succeeds in escaping at the last minute and safely arrives in Palestine. The people who remain, including his cousin Leah, are murdered after the Germans invade Yugoslavia.

In Palestine Dan sheds the remnants of life in the diaspora. He stops speaking German and changes his surname from Weinreich to Carmel. After graduating high-school he joins the Palmach, the Israeli underground, and takes an active part in the defense operations before the establishment of the State of Israel. He participates in building Kibbutz Hukok in the Galilee, and fights in the Jordan Valley during the War of Independence. After the war, he marries Elka and they build their family at Kibbutz Beit-Hashita, where they live to this day.

In 2012 I visited Vienna, as part of my work, and took the opportunity to visit the house where you were born, Father, at Arnezhofer-Strasse. The house is located in a small alley in the Prater quarter. Is this the alley where the Hitler Youth members marched shouting ‘Death to the Jews’? Is this the alley where you ran away from SA soldiers who wanted to punish you for your insolence? I looked at the building and saw the staircase you used to slide down, annoying your mother. Is this the staircase where, years later, looking for revenge, you ambushed your Nazi neighbor who had humiliated your mother and expelled you from your apartment? Only God and you know how close he was to his death that day.

From your house I went to the beautiful Danube which flows through the city. A new modern bridge crosses the river instead of the old bridge that was destroyed during the war. I stood on the bridge and watched the water flowing calmly. It seems that here you stood, throwing in the service revolver and the certificate of excellence of your father who fought for the imperial army in WWI. You also threw your mother’s gold jewelry into the gray water, to avoid giving them to the hated authorities. I took the tram and watched the travelers. Luckily, there is no special trailer today for ‘Dogs and Jews Only’. Vienna is a cosmopolitan city, contemporary, smiling and cheerful, flooded with tourists and members of different races. The city has completely forgotten its dark past. I’m sure you’re glad of that.

Your book, Father, tells how you confronted evil at its most murderous, and built a beautiful home and family here in Israel. Is it the state you dreamed of, while lying on top of the water tower in Hukok, watching the moon overlooking the Sea of Galilee and the absolute desolation around you? I’m sure that even in your best dream you did not envision the prosperous country in which we live today. On the other hand, I suppose you did not foresee the effects of racism and xenophobia in the Jewish state. You probably ask yourself how people who suffered so much from hate and humiliation can hurt others in the same way? I hope your personal story can contribute to more tolerance among us.

This year we celebrate your 90th birthday, Father. Our wish for you is that you continue enjoying your grandchildren and great-grandchildren. Your book will expose them to the story of your life and to their roots. This is the story of your life, Father, and it will live in our hearts forever.

In conclusion, I would like to thank everyone who helped bring this book into existence. To Adi Carmel, who helped in selecting the appropriate site and in finding the video interview of Dan and Elka talking about their wedding day. To Jenny Carmel for the cover design. To Chani Sacharen for the linguistic editing of the English version. And finally, especially to you, Father, for sharing the story of your life with us, and for the tremendous effort you put into translating the Hebrew manuscript into English.

David Carmel
Haifa, Israel
March 2015


Entity Recognition and Disambiguation Challenge

Call for Participation

We are excited to announce the 2014 Entity Recognition and Disambiguation (ERD) Challenge! The participating teams will have the opportunity to not only win cash prizes in the total amount of US$1,500 but also be invited to publish and present their results at a SIGIR 2014 workshop in Gold Coast, Australia, co-sponsored by Google and Microsoft.

The objective of an ERD system is to recognize mentions of entities in a given text, disambiguate them, and map them to the known entities in a given collection or knowledge base. Building a good ERD system is challenging because

* Entities may appear in different surface forms.
* The context in which a surface form appears often constrains valid entity interpretations.
* An ambiguous surface form may match multiple entity interpretations, especially in short text.
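
The three difficulties listed above can be illustrated with a toy dictionary-based linker. This is only a sketch under stated assumptions, not the Challenge's reference system or any participant's method: the `CANDIDATES` table, the entity identifiers, and the cue words are all made up for illustration, and real systems use a full knowledge base and far richer context models.

```python
# Hypothetical toy knowledge base: surface form -> candidate entities,
# each with a few cue words that suggest that interpretation.
CANDIDATES = {
    "jaguar": {
        "Jaguar_(animal)": {"cat", "jungle", "prey", "habitat"},
        "Jaguar_Cars": {"car", "engine", "dealer", "price"},
    },
    "apple": {
        "Apple_Inc.": {"iphone", "mac", "stock", "price"},
        "Apple_(fruit)": {"pie", "tree", "juice", "eat"},
    },
}


def link_entities(text):
    """Return (surface form, entity or None) pairs for known mentions."""
    tokens = text.lower().split()  # naive tokenization, no punctuation handling
    context = set(tokens)
    links = []
    for token in tokens:
        if token not in CANDIDATES:
            continue
        # Score each candidate by how many of its cue words occur in the text.
        scores = {
            entity: len(cues & context)
            for entity, cues in CANDIDATES[token].items()
        }
        best = max(scores, key=scores.get)
        # Without supporting context the mention stays ambiguous (None),
        # which is the typical situation for very short queries.
        links.append((token, best if scores[best] > 0 else None))
    return links
```

For example, `link_entities("jaguar engine price")` resolves the mention to the hypothetical `Jaguar_Cars` entry, while the one-word query `link_entities("jaguar")` leaves it unresolved, mirroring the short-text ambiguity described in the last bullet.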

The Challenge will have two tracks, one focusing on ERD for long texts (i.e., web documents) and the other on short texts (i.e., web search queries). Each team can elect to participate in either one or both tracks.
The Challenge is open to the general public, and participants are asked to build their systems as publicly accessible web services using whatever resources are at their disposal. The entries to the Challenge are submitted in the form of URLs to the participants’ web services. Participants will have a period of 3 months to test run their systems using development datasets hosted by the Challenge website. The final evaluations and the determination of winners will be performed on held-out datasets that have properties similar to those of the development sets. Please visit the Challenge website listed below for more details.

Important Dates
February 5: Challenge website and discussion forum go live

March 10: Leaderboard and trial submission system online (tentative)
June 10: Trial runs end at 11:59AM PDT; Test begins at noon PDT
June 20: Team results announced
June 27: Workshop paper due
July 11: Workshop at SIGIR-2014, Gold Coast, Australia

David Carmel, Yahoo Labs
Ming-Wei Chang, Microsoft Research
Evgeniy Gabrilovich, Google
Bo-June (Paul) Hsu, Microsoft Research
Kuansan Wang, Microsoft Research

Did IBM Watson Pass the Turing Test?



This post summarizes some talks that I gave on Watson and the Jeopardy challenge.

Recently millions of viewers witnessed computing history being made as IBM’s Watson question answering system defeated human champions in the Jeopardy! TV show. While watching Watson’s outstanding performance in the show, many of us tend to wonder about the “intelligence” of the entity performing in front of us. In this post I raise the question of whether Watson is indeed intelligent and discuss the meaning of machine intelligence in general.

The Turing Test

Artificial intelligence (AI) is the branch of computer science that aims to create intelligent machines. What is the meaning of an intelligent machine? Alan Turing, a founding father of computer science whose centenary year has been celebrated recently around the world, was one of the first people to ask that question. In his seminal paper from 1950, Computing Machinery and Intelligence [5], Turing raised the question: to what extent can a machine think for itself? One of the arguments raised in this work was that since humans are the only “intelligent” entities we are aware of, any intelligent entity should “behave like” a human, i.e., its behavior could not be classified as artificial by an outside observer. Turing suggested a test, now called the Turing test, to examine the intelligence of an unknown entity. An external interrogator observes the behavior of the entity and a human, where both are asked to solve the same set of problems simultaneously. The interrogator is only exposed to the actions taken by the two entities and to the outcome of their actions while solving the problems. When the interrogator is unable to distinguish correctly between the human and the artificial solutions, the machine has passed the test and should be considered intelligent. At that time, Turing anticipated that in about 100 years (forty years from now) artificial entities would be able to pass the test for the majority of the tasks examined.

The Turing test has many shortcomings. One clear limitation is that humans are not ideal problem solvers. The artificial entity can be easily identified and fails the test when it comes to tasks that machines can solve far more effectively than humans.  For example, a calculating machine that is being asked to multiply two eight-digit numbers will perform much quicker and more accurately than any human being; therefore, it will be easily revealed. In order to pass the test, the calculating machine would have to perform sub-optimally, making a few mistakes and performing slowly in order to fool the judge. This does not make any sense from an engineering point of view and raises some doubts about the importance of mimicking the human behavior.

Weak AI

An alternative practical approach to AI, called weak AI, does not bother too much with the precise definition of intelligence, and focuses on building machines that are effective problem solvers for specific tasks. The goal of AI, accordingly, is to advance the technology by identifying tasks for which humans are currently better than machines, and look for solutions to those tasks, probably by observing and learning from humans, to ultimately arrive at an artificial solution that catches up to and even surpasses humans. Some examples of domains in which humans are still superior to machines are game playing, natural language understanding, speech recognition, image analysis, automatic car driving, and more. The artificial solution should not necessarily be identical to the human solution, as our goal is not to mimic humans but rather to achieve human level performance, and eventually outperform humans.

How can we determine that the artificial program outperforms the human solution? An evaluation test bed is needed to compare the performance of the two problem solvers. In contrast to the Turing test, the evaluation criteria should focus only on the effectiveness of the competing solutions, whether artificial or human.

Deep Blue

Chess playing is an ideal domain for exemplifying the weak AI approach. In the early days of AI, many researchers designed and built chess-playing programs. But those programs were not able to challenge master-level human players. For many years, AI experts believed that high-level chess playing would remain infeasible for artificial programs in the near future (while some even argued that it would never be possible). The main argument raised by those experts was that an artificial expert-level playing system needs to be able to formalize the sophisticated heuristics used by the human experts—a task that is most likely infeasible [4].

In 1997, Deep Blue, a chess program developed at IBM Research, defeated Garry Kasparov, the human chess champion at that time, in a six-game match [1]. In contrast to the expectations of specialists, Deep Blue did not apply any sophisticated heuristics to evaluate the board positions. The strength of its playing strategy came mostly from advanced computational machinery and a clever distributed search algorithm that was able to search the game tree many moves ahead, given the limited time allocated per move. This brute-force computational power was sufficient to beat Kasparov, one of the best human chess players of all time. Since then, many new chess programs have been developed with high-level playing performance, even surpassing Deep Blue with far fewer computational resources.

Did Deep Blue pass the Turing test? The answer is: certainly not. Any chess expert can easily identify a game played by an artificial program because such artificial players generally follow a typical strategy that assumes the worst-case scenario, i.e., the opponent is assumed to play perfectly from the player’s perspective. This assumption leads to a typically conservative playing strategy, which is easily revealed. For example, such a strategy will never set traps, since a perfect opponent is not expected to fall for them. However, from a weak AI perspective, chess playing is no longer a challenge. Computer scientists and practitioners might still find some interest in building machines that surpass Deep Blue. This is also true for strong AI researchers and other cognitive scientists who are still interested in modeling human strategies for chess playing. In contrast, weak AI researchers should turn their interests and efforts to new unexplored areas where humans are still superior, for example, the game of Go.
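
The worst-case assumption described above is the heart of plain minimax search. The snippet below is a minimal textbook sketch, not Deep Blue's actual search (which was far more elaborate); the tree encoding and the `TRAP_POSITION` example are hypothetical and chosen only to show why such a player avoids traps.

```python
def minimax(node, maximizing):
    """Value of a game tree assuming both sides play perfectly.

    A leaf is a number (the score from the maximizing player's point of
    view); an internal node is a list of child nodes.
    """
    if not isinstance(node, list):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def best_move(children):
    """Index of the move with the best worst-case outcome for the mover."""
    # After our move it is the opponent's turn, so each child is minimized.
    values = [minimax(child, maximizing=False) for child in children]
    return values.index(max(values))


# Hypothetical position: move 0 is a "trap" that wins big (+9) only if the
# opponent blunders, but loses (-5) against the correct reply; move 1 is a
# quiet move worth +1 whatever the opponent does.
TRAP_POSITION = [[9, -5], [1, 1]]
```

Here `best_move(TRAP_POSITION)` picks the quiet move (index 1): because the opponent is assumed to find the refutation, the trap is valued at -5 and is never chosen, which is exactly the conservative behavior a human expert would recognize.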

IBM Watson and the Jeopardy! Challenge

Natural language understanding (NLU) has always been a challenging domain for AI.  While a lot of progress has been made in this area during the last 50 years, the current technology is still unable to handle a free discussion with humans. Question answering (QA) is a specific NLU sub-task in which questions posed in natural language are automatically answered. QA has been extensively studied over the years and many QA systems were developed that can answer a wide range of question types including: facts, lists, definitions, and others.

Can an artificial QA system outperform humans in answering natural language questions? This is a typical weak AI challenge that was raised recently by a team of IBM researchers. In order to answer such a question we first need an evaluation system in which the performance of humans and artificial systems can be objectively tested and compared. The Jeopardy! TV show was selected as an evaluation platform. Jeopardy! is a well-known television quiz show that has been broadcast in the United States for more than 50 years. It features trivia in a wide variety of topics, including history, language, literature, arts, science, popular culture, geography, and many more. Three human contestants compete against one another in a competition that requires rapid understanding and answering of the clues, with penalties for incorrect answers.

IBM Watson, a QA system developed at IBM Research, was built specifically to play the game of Jeopardy! [2,3]. Watson was designed to answer Jeopardy! questions within a response time of less than three seconds. Watson’s main innovation is its ability to quickly execute more than 100 different language analysis techniques to analyze the question, find and generate candidate answers, and ultimately score and rank them.  Watson’s knowledge-base contains 200 million pages of structured and unstructured content, consuming 4 terabytes of disk storage. The hardware for Watson includes a cluster of 2,880 POWER7 processor cores and 16 terabytes of RAM, with massively parallel processing capability.

In an official Jeopardy! two-game match, broadcast in three episodes during February 2011, Watson beat Brad Rutter, the biggest all-time money winner on Jeopardy!, and Ken Jennings, the record holder for the longest championship streak. The match ended with a clear win for Watson. The first game ended with Jennings at $4,800, Rutter at $10,400, and Watson at $35,734; over the two games Watson finished with $77,147, against $24,000 for Jennings and $21,600 for Rutter.

Did Watson pass the Turing test? As in the case of Deep Blue, the answer is ‘no’. Clearly, there are many questions that are more difficult for humans than for machines, and vice versa. For example, factoid questions about obscure, little-known events in history might be easier for machines, since the human brain is much more limited in remembering such events. In contrast, puzzle questions are much easier for humans. For example, Jeopardy! ‘common bonds’ questions (e.g., “what do butter, carving, and steak have in common?” answer: knife) are much more difficult for machines because the expected answer type cannot be easily formalized.

Given the discrepancy between human and machine capabilities in answering specific types of questions, a smart interrogator can select questions that are difficult for humans while easy for machines, and vice versa. By analyzing the performance of the two entities being tested on these questions, the interrogator will presumably detect the machine (see note 1 below). However, from a weak AI perspective, the research focus should now move to answering more challenging questions (e.g., puzzles) and to other unresolved tasks in the NLU domain.


Watson’s success in winning Jeopardy! was a tremendous advancement in the QA domain. Watson’s technology, based on deep natural language analysis and strong computational power, demonstrated again that tasks considered infeasible for machines can be solved effectively and efficiently, even surpassing human performance, when they are studied by a motivated team that is given appropriate and sufficient resources. The Deep Blue and Watson success stories strengthen our belief that other research areas can benefit from the weak AI approach of challenging human superiority as a platform for advancing computational technology. The question of whether such technologies can pass the Turing test seems to be an interesting thought exercise, but an insignificant one from a technological perspective.

1. Due to concerns of the IBM team that the Jeopardy! show’s writers would exploit Watson’s deficiencies, it was agreed that questions for the match would be randomly selected from a pool of Jeopardy! clues that had never been broadcast.


References

  1. Murray Campbell, A. Joseph Hoane, and Feng-hsiung Hsu. Deep Blue. Artificial Intelligence, Volume 134 (1), 2002, pages 57-83.
  2. Dave A. Ferrucci, Eric Brown, Jennifer Chu-Carroll, James Fan, David Gondek, Aditya A. Kalyanpur, Adam Lally, J. W. Murdock, E. Nyberg, J. Prager, N. Schlaefer, and C. Welty. Building Watson: An overview of the DeepQA project. AI Magazine, Volume 31 (3), 2010, pages 59-79.
  3. Dave A. Ferrucci. Introduction to “This is Watson”. IBM Journal of Research and Development, Volume 56 (3, 4), 2012, pages 1-15.
  4. David Levy and Monty Newborn. How Computers Play Chess. Computer Science Press, 1991.
  5. Alan M. Turing. Computing machinery and intelligence. Mind, Volume 59, October 1950, pages 433-460.