NTCIR-11 Core Task: "Spoken Query and Spoken Document Retrieval (SpokenQuery&Doc)"


The SpokenQuery&Doc task evaluates information retrieval systems that make use of speech technologies in two ways: speech-driven information retrieval and spoken document retrieval.

The current keyboard-typing-based information retrieval framework seems to face a bottleneck in its human interface for drawing out one's information need. The SpokenQuery&Doc task evaluates a near-future IR framework that makes use of a spontaneously spoken query instead of a typed text query. Here, a spontaneously spoken query is one that is not carefully arranged before speaking, is spoken in a free style, and tends to be long. Its spontaneity contrasts with the spoken queries submitted to current voice search systems, which are well arranged before speaking and tend to be short lists of keywords. One advantage of using such spontaneously spoken queries as input to retrieval systems is that they let users easily submit long queries that give systems rich clues for retrieval, because unconstrained speech is common in daily human use and is the most natural and easy way to express one's thoughts.

The target document collection of the SpokenQuery&Doc task is also speech (spoken documents). Following the NTCIR-9 SpokenDoc and NTCIR-10 SpokenDoc-2 tasks, the SpokenQuery&Doc task evaluates two SDR tasks: spoken term detection (STD) and spoken content retrieval (SCR). Common search topics are used for STD and SCR, which enables simultaneous and component-wise evaluations of the two.
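
To give a rough feel for the STD side of this pair (this is an illustrative sketch, not the official evaluation protocol), spoken term detection amounts to locating every occurrence of a query term in a time-aligned transcription. The transcription format below, a list of (word, start, end) tuples, is a hypothetical simplification:

```python
# Minimal spoken term detection (STD) sketch: locate every occurrence of a
# query term in a time-aligned transcription. The (word, start_sec, end_sec)
# format is a hypothetical illustration, not the SpokenQuery&Doc data format.

def detect_term(term_words, transcription):
    """Return the (start, end) times of each exact match of term_words
    (a list of words) in transcription (a list of (word, start, end))."""
    hits = []
    n = len(term_words)
    for i in range(len(transcription) - n + 1):
        window = transcription[i:i + n]
        if [w for w, _, _ in window] == term_words:
            hits.append((window[0][1], window[-1][2]))
    return hits

transcription = [
    ("speech", 0.0, 0.4), ("recognition", 0.4, 1.1),
    ("for", 1.1, 1.3), ("speech", 1.3, 1.7), ("retrieval", 1.7, 2.4),
]
print(detect_term(["speech"], transcription))               # → [(0.0, 0.4), (1.3, 1.7)]
print(detect_term(["speech", "retrieval"], transcription))  # → [(1.3, 2.4)]
```

SCR, by contrast, ranks whole documents or passages by relevance rather than pinpointing individual term occurrences.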

Data Set

The lecture speech data, recordings of the first to seventh annual Spoken Document Processing Workshop (the SDPWS data set), will be used as the target documents in SpokenQuery&Doc. For this speech data, manual and automatic transcriptions (under several ASR conditions) will be provided. These enable researchers who are interested in SDR but have no access to their own ASR system to participate in the tasks.

Task Overview

The main task of SpokenQuery&Doc is searching spoken documents for the content described in a spontaneously spoken query (spoken-query-driven spoken content retrieval: SQ-SCR). Partial tasks of the main task (called sub-tasks) are also conducted: a spoken term detection task for spoken terms included in the given spoken query (SQ-STD), and a spoken content retrieval task based on the search results of SQ-STD (STD-SCR). For these tasks, manual and automatic transcriptions of the spoken queries will also be provided. These enable participants in the previous SpokenDoc tasks to take part using text queries. For the SQ-SCR and STD-SCR tasks, the search unit is either the speech segment spoken within a single presentation slide (slide retrieval task) or a boundary-free speech segment (passage retrieval task).
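
As a rough sketch of the slide retrieval setting (assuming the spoken query and the per-slide speech segments have already been transcribed; the texts below are invented for illustration), SQ-SCR reduces to ranking slide segments against the query text, for example with TF-IDF weighting and cosine similarity:

```python
import math
from collections import Counter

# Rank per-slide speech segments against a transcribed spoken query using
# TF-IDF weights and cosine similarity. The segment texts and the query are
# invented for illustration; a real system works on ASR transcriptions.

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_slides(query_tokens, slide_tokens):
    """Return (score, slide_index) pairs sorted by descending similarity."""
    n = len(slide_tokens)
    df = Counter()                      # document frequency per term
    for d in slide_tokens:
        df.update(set(d))

    def tfidf(tokens):
        tf = Counter(tokens)
        # terms absent from the collection (df == 0) are dropped
        return {t: tf[t] * math.log(n / df[t]) for t in tf if df[t]}

    q = tfidf(query_tokens)
    scores = [(cosine(q, tfidf(d)), i) for i, d in enumerate(slide_tokens)]
    return sorted(scores, reverse=True)

slides = [
    "speech recognition error analysis".split(),
    "spoken document retrieval with word lattices".split(),
    "neural network acoustic models".split(),
]
query = "retrieval of spoken documents".split()
print(rank_slides(query, slides)[0][1])  # best-matching slide index → 1
```

The passage retrieval task differs only in that the candidate units are not bounded by slide transitions, so a system must also decide where each retrieved passage begins and ends.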



Standard STD and SDR methods first transcribe the audio signal into a textual representation using Large Vocabulary Continuous Speech Recognition (LVCSR), followed by text-based retrieval. Participants can use the following three types of transcriptions.

  1. Manual transcription

    This is mainly used for evaluating upper-bound performance.

  2. Reference automatic transcriptions

    The task organizers will provide reference automatic transcriptions for the target speech data. These enable researchers who are interested in SDR but have no access to their own ASR system to participate in the tasks. They also enable comparisons of IR methods based on the same underlying ASR performance.

    The textual representations will be provided both as n-best lists of word or syllable sequences, depending on the two background ASR systems, and as lattices.

    1. Word-based transcription

      Obtained by using a word-based ASR system, i.e. a word n-gram model is used as the language model of the ASR system. Along with the textual representation, the vocabulary list used in the ASR is also provided; it determines the distinction between the in-vocabulary (IV) and out-of-vocabulary (OOV) query terms used in our STD subtask.

    2. Syllable-based transcription

      Obtained by using a syllable-based ASR system. A syllable n-gram model is used as the language model, where the vocabulary consists of all Japanese syllables. Using it avoids the OOV problem of spoken document retrieval. Participants who want to focus on open-vocabulary STD and SDR can use this transcription.

  3. Participant's own transcription

    Participants can use their own ASR systems for the transcription. To share the same IV and OOV conditions, their word-based ASR systems are recommended (but not required) to use the same vocabulary list as our reference transcriptions. When participating with their own transcriptions, participants are encouraged to provide them to the organizers for future SpokenDoc test collections.
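
To make the IV/OOV distinction above concrete (the vocabulary and query terms below are invented; the actual vocabulary list is distributed with the reference transcriptions), a query term is IV if it appears in the word-based ASR vocabulary and OOV otherwise; the OOV terms are the ones that motivate the syllable-based transcription:

```python
# Classify query terms as in-vocabulary (IV) or out-of-vocabulary (OOV)
# with respect to a word-based ASR vocabulary list. The vocabulary and
# query terms here are invented for illustration.

def split_iv_oov(query_terms, vocabulary):
    """Partition query_terms into (iv, oov) lists, preserving order."""
    vocab = set(vocabulary)
    iv = [t for t in query_terms if t in vocab]
    oov = [t for t in query_terms if t not in vocab]
    return iv, oov

vocabulary = ["speech", "recognition", "retrieval", "document", "lattice"]
query_terms = ["speech", "retrieval", "diarization"]

iv, oov = split_iv_oov(query_terms, vocabulary)
print(iv)   # → ['speech', 'retrieval']
print(oov)  # → ['diarization']  (searchable via the syllable-based transcription)
```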

Task Description


Schedule

  • 2013-09-02: NTCIR-11 Kick-off event
  • 2014-03-31: Task registration due
  • 2014-06-27 ~: Dry run
  • 2014-07-18 ~: Formal run


Organizers

  • Tomoyosi Akiba (Toyohashi University of Technology)
  • Hiromitsu Nishizaki (University of Yamanashi)
  • Hiroaki Nanjo (Ryukoku University)
  • Gareth Jones (Dublin City University)


The registration form is available at the official page of NTCIR-11.


Last-modified: 2014-06-25 (Wed) 20:48:47