Publication date: 2023/12/12
Some modern broadband technologies commonly used in online or mobile communications are less able to transmit female voices than male voices. The findings were made by the team of Professor Jan Holub, who heads the Department of Measurement at the Faculty of Electrical Engineering. The researchers applied a different calculation method from the one commonly used in current tests of transmission technologies. According to the research, the difference in the transmission quality of the female voice is significant. The study resulted in the formulation of a technical recommendation by the European Telecommunications Standards Institute (ETSI).

According to Prof. Holub, the broadband technologies used for the last 20 years or so, as well as the more recent use of surround sound, aim at the most realistic voice recording possible - that is, the listener should hear the caller "as if sitting by his side". Many people use these networks every day, for example in online communication applications, mobile phones and other indispensable technologies of today.

Standardised methods are used to measure the quality of various parameters, including voice transmission. "These methods are both subjective, i.e. listening or conversational tests, and objective, where the tests are performed using algorithms that should give results similar to those of the subjective tests," explained Prof. Holub, who together with his faculty colleagues has been working on the quality of voice transmission in communication networks since the late 1990s. He noted that these procedures are used, for example, when a mobile operator selects a new technology for deployment in its network.

Prof. Holub explained that, to assess the quality of state-of-the-art technologies, listening tests according to the current standard use a set of male and female voices in a specified minimum range and number of samples. "Then, in a gender-balanced way, a statistic is created that builds up generally over all the voices together. But no one has yet officially evaluated this separately - that is, only for male and only for female voices," the researcher said. And that's exactly what his team set out to do. "When evaluated separately in this way, the results show that even with many state-of-the-art technologies, female voices are transmitted with statistically significantly lower quality, which increases the listening effort required on the part of the listener, or even affects the intelligibility of the transmission," said Prof. Holub. "And if we go on a scale of 1 to 5 (the so-called MOS - Mean Opinion Score), it's about a tenth of that scale," the scientist explained. He stressed that even deviations of around 0.2 are important for deciding on the quality of different technologies. "0.5 is already quite a big difference," he added.
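
The separate evaluation described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the procedure from the study or from any standard: the function name `mos_by_group` and the scores are made up for the example, and a real evaluation would also need a significance test on a properly sized set of ratings.

```python
from statistics import mean

def mos_by_group(ratings):
    """Return the pooled MOS and the MOS per talker group.

    `ratings` is an iterable of (talker_gender, score) pairs,
    where score is a listener rating on the 1-5 scale.
    """
    ratings = list(ratings)
    pooled = mean(score for _, score in ratings)
    groups = {}
    for gender, score in ratings:
        groups.setdefault(gender, []).append(score)
    per_group = {gender: mean(scores) for gender, scores in groups.items()}
    return pooled, per_group

# Made-up illustrative scores, NOT measurements from the study.
sample = [
    ("female", 4), ("female", 3), ("female", 4), ("female", 3),
    ("male", 4), ("male", 4), ("male", 4), ("male", 3),
]

pooled, per_group = mos_by_group(sample)
gap = per_group["male"] - per_group["female"]
print(f"pooled MOS: {pooled}, per group: {per_group}, gap: {gap}")
```

The pooled figure alone hides the gap between the two groups, which only becomes visible once the scores are averaged separately.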

How can the study be applied in practice?

The study, the work of prof. Ing. Jan Holub, Ph.D., doc. RNDr. Kateřina Helisová, Ph.D. and postgraduate student Ing. Yann Kowalczuk, resulted in a draft recommendation from the European Telecommunications Standards Institute (ETSI). According to Prof. Holub, the member organizations that supported the FEL team's proposal included, besides a number of companies, the NATO Communications and Information Agency (NCIA).

The draft recommendation was discussed by the ETSI STQ committee, adopted by the delegates after a number of comments had been incorporated, and published on 25 October 2023 as ETSI TR 103 950: Gender-related aspects of listening quality and effort in speech communication systems. This opens the way for a better balance of these aspects in the design of future codecs and transmission systems.

How the problem arises

According to Prof. Jan Holub, the worse transmission of female voices had so far been known from older narrowband connections, for example with amplitude modulation (AM), which is still used for safety reasons in air traffic. "It is typical for narrowband that an average-sounding female voice, which has a higher-pitched fundamental tone and whose energy all lies higher in the spectrum, is frequency clipped. So the information is technically harder to transmit than in the case of deeper male voices. However, this is 'nicely compensated' by the fact that narrowband transmissions tend to propagate in a cluttered environment, which in turn 'masks' male voices lying in a similar spectrum to the hustle and bustle," described Prof. Holub. Paradoxically, female voices are sometimes better understood in real life, even though the results from laboratory measurements show the opposite.

In the case of modern broadband and surround sound technologies, however, this "clipping" no longer occurs, and yet women's voices are often transmitted less well. "The reasons why the difference arises are quite well known. It's always a trade-off between some new criterion and how much data needs to be transmitted per call," Prof. Holub outlined. "One of the criteria in the design of a modern digital encoder is the frame length. The speech signal is divided into overlapping parts. The shorter the frames, the more frames there are, per minute or second. The longer they are, the fewer there are. If each of these sections is encoded through a library of instantaneous spectra into a finite number of bits, then in the final analysis, the longer the section, or the thinner the packetization, the less data is transmitted," the scientist described the process. This, he said, also has implications for savings in the transmission network when, for example, part of the transmission path is leased. "As the female voice lies higher in the spectrum, a lot of the detail in a given time course is sped up. Hence, the larger the frames chosen, the harder it is to encode, as the encoder inside the frame assumes it is a quasi-steady signal. It cannot capture the rapid changes there well," the expert added.
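
The trade-off between frame length and transmitted data can be illustrated with simple arithmetic. The sketch below rests on assumptions that are not taken from the article or from any particular codec: every frame is coded into the same fixed number of bits, `hop_ms` is the time step between the starts of consecutive frames, and the value of 160 bits per frame is purely hypothetical.

```python
def frames_per_second(hop_ms):
    """Number of frames produced per second for a given frame step in ms."""
    return 1000.0 / hop_ms

def payload_bit_rate(hop_ms, bits_per_frame):
    """Payload bit rate in bit/s, assuming a fixed number of bits per frame."""
    return frames_per_second(hop_ms) * bits_per_frame

# Hypothetical codec that spends 160 bits on every frame.
BITS_PER_FRAME = 160

for hop_ms in (10, 20, 40):
    rate = payload_bit_rate(hop_ms, BITS_PER_FRAME)
    print(f"{hop_ms} ms frames -> {rate:.0f} bit/s")
```

Under these assumptions, doubling the frame step halves the bit rate, which is why longer frames are attractive when the bit rate is constrained, at the cost of coarser time resolution inside each frame.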

The first step for improvement, he says, is to use shorter frames. "Which unfortunately has the direct consequence of increasing the required transmission speed. Or conversely, when designers are forced to fit in a given bit rate, one option is to design a sufficient length of speech frame. This is a known fact," said Prof. Holub.

"Then there are the frequency filters that occur en route. They are designed to reduce noise outside the speech spectrum. These filters are historically designed so that they may suppress the higher frequency components - including part of the female voice spectrum. That's an easily fixable thing, but it's worse with packetization because it just costs something," the researcher said. He stressed that the requirement to reduce the statistically significant difference between the transmission of the average male and female voice is justified.

In his words, the female voice was not deliberately omitted. "The packetization has simply evolved historically from narrowbanding, where the frames were even longer or even more poorly coded, and it hasn't caught up yet," Prof. Holub noted. In reality, the current state of affairs can manifest itself in that, for example, messages dictated over the radio link have to be repeated multiple times and the communication takes longer.

Contact person: Mgr. Šárka Loukotová Novotná
E-mail: loukosar@fel.cvut.cz