Room 4.04 RRST
Room 4.04, Run Run Shaw Tower, Centennial Campus, HKU


Sep 18 2018


3:30 pm


Department of Linguistics

Strength of Forensic Voice Comparison Evidence from Long-term Fundamental Frequency in Chinese Conversations


Phil Rose
Australian National University Emeritus Faculty

School of Criminal Investigation, Southwest University of Political Science & Law, Chongqing
Chongqing Institutes of Higher Education Key Forensic Science Laboratory



In forensic speaker identification, the expert typically compares questioned and known voice samples to help determine whether the questioned voice has come from the known speaker. Usually the questioned sample is from an offender and the known sample from a suspect, and the beneficiary is a fact-finder, e.g. judge or jury, or an investigating authority, e.g. police.

One popular acoustic-phonetic feature in forensic semi-automatic speaker recognition is fundamental frequency (F0), the acoustic reflex of the rate of vibration of the vocal cords. Its mean and standard deviation are reportedly used by 94% and 72%, respectively, of forensic voice comparison experts worldwide. This popularity stems from promising results in early speaker recognition research, and from the fact that F0 is relatively easy to measure and usually abundant in a recording. F0 is also relatively immune to channel distortion.

The most important property of any feature used to help identify voices forensically is its demonstrable effectiveness in discriminating same-speaker speech samples from different-speaker speech samples, and it must have been shown to do so under the conditions of the forensic case in which the feature is being used:

  • “Without actual empirical evidence of the ability of a forensic feature-comparison method to produce conclusions at a level of accuracy appropriate to its intended use under circumstances reasonably related to this use, an examiner’s conclusion that two samples are likely to have come from the same source is completely meaningless.” [E.S. Lander: ‘Response to the ANZFSS council statement on the President’s Council of Advisors on Science and Technology Report.’ Australian J. Forensic Sci. 2017]

F0 has many linguistic uses, encoding tone, intonation and stress, and many non-linguistic factors, such as state of health, are also known to affect it. This multiplicity of factors has an adverse effect on its between- to within-speaker variance ratio by increasing within-speaker variance. Since the inherent strength of forensic speaker recognition features relies primarily on their ratio of between- to within-speaker variance, one would not expect particularly good strength of evidence from global F0 properties, and this has been demonstrated in several studies. These studies have, however, also used material of arguably limited ecological validity, or at least material less likely to occur in real cases, for example contemporaneous recordings or monologues of varying, but atypically long, duration. Testing under such conditions also has the potential to overestimate the strength of evidence of uncontrolled global F0.
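The effect of within-speaker variation on the variance ratio can be illustrated with a minimal Python sketch. The function name `variance_ratio`, the speaker labels, and all F0 values below are invented for illustration and are not taken from the talk or its database; the calculation is a simple population-variance ratio, not the method used in the research.

```python
from statistics import mean, pvariance

def variance_ratio(f0_by_speaker):
    """Between- to within-speaker variance ratio of long-term mean F0.

    f0_by_speaker maps a speaker label to that speaker's mean F0 (Hz)
    in each of several recordings.
    """
    # Between-speaker variance: spread of the per-speaker averages.
    speaker_means = [mean(values) for values in f0_by_speaker.values()]
    between = pvariance(speaker_means)
    # Within-speaker variance: average spread across each speaker's recordings.
    within = mean([pvariance(values) for values in f0_by_speaker.values()])
    return between / within

# Hypothetical values (Hz), invented for illustration only.
stable = {
    "A": [110.0, 112.0, 108.0],
    "B": [130.0, 132.0, 128.0],
    "C": [150.0, 152.0, 148.0],
}
# Same speaker averages, but each speaker varies more between recordings.
variable = {
    "A": [100.0, 120.0, 110.0],
    "B": [120.0, 140.0, 130.0],
    "C": [140.0, 160.0, 150.0],
}

print(variance_ratio(stable))
print(variance_ratio(variable))
```

In this toy case the speakers' averages are identical in both sets, so the between-speaker variance is unchanged; only the larger within-speaker spread lowers the ratio for the second set.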

One relationship between suspect and offender recordings which is commonly found in real-world casework, but which does not seem to have been tested, is mismatch in formality between the conditions under which the suspect and offender voice recordings are obtained. Typically, the suspect’s recordings are taken from a formal police interview, whereas the offender’s recordings are from informal conversational exchanges. The aim of the research described in this talk was to test how well F0 from natural speech performs, in particular under these mismatched conditions. We used a database of non-contemporaneous recordings of informal conversations, simulated police interrogations, and information exchanges from 90 male Chinese speakers of North-Eastern Mandarin. The database was recorded in 2011 for the purpose of aiding forensic voice comparison research and practice. The data were evaluated within the likelihood ratio framework, which was recently endorsed as best practice in forensic automatic and semi-automatic speaker recognition by the Board of the European Network of Forensic Science Institutes, representing 58 laboratories in 33 countries. In the talk I will also explain the likelihood ratio approach and how it is used to estimate the strength, and weight, of evidence.
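For orientation, the likelihood ratio in its general textbook form can be written as follows. The symbols are notation chosen here, not taken from the talk: E stands for the evidence (for example, the observed F0 difference between the questioned and known samples), and H_ss and H_ds for the same-speaker and different-speaker hypotheses.

```latex
\[
\mathrm{LR} = \frac{p(E \mid H_{\mathrm{ss}})}{p(E \mid H_{\mathrm{ds}})},
\qquad
\underbrace{\frac{P(H_{\mathrm{ss}} \mid E)}{P(H_{\mathrm{ds}} \mid E)}}_{\text{posterior odds}}
= \mathrm{LR} \times
\underbrace{\frac{P(H_{\mathrm{ss}})}{P(H_{\mathrm{ds}})}}_{\text{prior odds}}
\]
```

A value above 1 supports the same-speaker hypothesis and a value below 1 supports the different-speaker hypothesis, with the distance from 1 indicating the strength of the evidence.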