The annotation and transcription were performed on the four-channel headset audio recordings with the XTrans tool (figure above). The default output file is in ".tdf" format, which is then converted into ".mlf" and ".stm" formats for the release. The strip information used for dataset definition is added to the ".stm" files. Details of each format are given below for those who are not familiar with them.
The .stm file starts with label information lines. This information is read by the NIST scoring toolkit SCTK so that WER is analysed for each category and each label during scoring. Label information lines start with ";;", while the main transcription lines do not.
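The distinction between the two kinds of lines can be sketched in a few lines of Python. This is a minimal illustration rather than part of the release tooling: the function name `split_stm` is hypothetical, and the only rule it relies on is the ";;" prefix described above.

```python
def split_stm(lines):
    """Separate ';;' label information lines from transcription lines.

    `lines` is any iterable of strings, e.g. an open .stm file.
    Returns (label_info_lines, transcription_lines).
    """
    label_info, transcription = [], []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith(";;"):
            label_info.append(line)      # read by SCTK as label information
        else:
            transcription.append(line)   # one annotated utterance per line
    return label_info, transcription
```

A typical use would be `label_info, utterances = split_stm(open(path, encoding="utf-8"))`, where `path` points to one of the released .stm files.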
For the main transcription, i.e. the lines without ";;", each line gives several pieces of information about one annotated speech utterance, in the following order:
It is worth emphasizing that the second column (microphone channel) differs between evaluating ASR output based on individual headset microphone (IHM) recordings and ASR output based on single or multiple distant microphones (SDM/MDM). For SDM and MDM, the value in the second column should be the same for all utterances. The NIST scoring tool accepts many values for this column, as long as the string is identical across all utterances. However, the default Kaldi setup includes a validation script that accepts only "A" or "B" as the value for this column.
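A quick consistency check on the channel column for SDM/MDM scoring could look like the sketch below. It is not the Kaldi validation script itself: the function names are hypothetical, and the only assumptions are that fields are separated by whitespace, that the channel is the second field, and that lines starting with ";;" are skipped.

```python
def channel_values(stm_lines):
    """Return the set of distinct second-column values in the transcription lines."""
    values = set()
    for line in stm_lines:
        if not line.strip() or line.startswith(";;"):
            continue  # skip blank lines and label information lines
        fields = line.split()
        if len(fields) >= 2:
            values.add(fields[1])
    return values


def channel_column_ok(stm_lines, allowed=("A", "B")):
    """True if every utterance shares one channel value and it is in `allowed`."""
    values = channel_values(stm_lines)
    return len(values) == 1 and values <= set(allowed)
```

Passing `allowed=("A", "B")` mirrors the restriction mentioned for the default Kaldi setup; for scoring with SCTK alone, checking `len(channel_values(lines)) == 1` would be enough.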
The label column encloses one or more labels in "< >". This column indicates the recording ID (swc1/swc2/swc3) and the strip ID (A/B/C), both of which are used to decide which dataset an utterance belongs to. Below is a piece of transcription for SWC1 for IHM in .stm format.
The .mlf file is provided for HTK-based systems. The format of each utterance/segment name is
Each segment name is followed by its word transcription, with one word per line. Below is a piece of transcription for SWC1 in .mlf format.
The .tdf format is the default output format of XTrans. It looks similar to the .stm format, but the field separator is a tab ("\t") rather than a space (" "). The .tdf files are included in the release package so that users can conveniently edit and verify the transcription or annotation alongside the audio (.wav files) if needed.
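Because the fields are tab-separated, a .tdf line should be split on tabs only, so that spaces inside a field (for example in the transcript text) are preserved. The following is a minimal sketch under that assumption; the function name `read_tdf_rows` is hypothetical, and no particular column layout is assumed.

```python
def read_tdf_rows(lines):
    """Split each non-blank .tdf line into its tab-separated fields."""
    rows = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        rows.append(line.split("\t"))  # split on tabs only, keep inner spaces
    return rows
```

For instance, `read_tdf_rows(open(path, encoding="utf-8"))` would yield one list of fields per line of the file at `path`.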
Below is a piece of transcription for SWC1 in .tdf format.