A Pilot Study on Tone-Dependent Directivity Patterns of Musical Instruments
The work on this page was presented at the 4th AES International Audio for Virtual and Augmented Reality Conference (AVAR 2022, August 15–17) in Redmond, Washington, USA.
Andrea Corcuera Marruffo and Vasileios Chatziioannou. "A Pilot Study on Tone-Dependent Directivity Patterns of Musical Instruments." Audio Engineering Society Conference: AES 2022 International Audio for Virtual and Augmented Reality Conference. Audio Engineering Society, 2022.
Introduction
Accurate representation of the directivity characteristics of sound sources is fundamental to achieving authentic simulations of virtual acoustic environments [1]. Variations in the directivity characteristics of a source can influence perceived localization [2] and auditory distance [3]. Sound sources such as the human voice, loudspeakers, or musical instruments each have a distinctive directivity pattern that varies significantly across the frequency range and can also change depending on other aspects, such as the tone being played.
To better understand the perceptual requirements of musical instrument directivities in virtual acoustic environments, this paper analyzes the differences between tone-dependent directivities and the directivity averaged over all tones. Using multichannel single-note recordings from the Technical University of Berlin (TU Berlin) database, the directivity patterns of several musical instruments were derived. To validate the classification of the instruments into three categories, the patterns were analyzed based on their maximum directivity index and direction of maximum directivity. After selecting a single instrument representative of each group, the spectral differences of each tone were calculated from their 4th-order spherical harmonic representations. Subsequently, a listening test was performed to investigate whether listeners can hear differences between auralizations using averaged directivities and auralizations using tone-specific directivities of the three selected instruments under anechoic conditions.
Sorting of musical instruments based on the TU Berlin database
In order to investigate the differences between time-varying (tone-specific) and static (averaged) directivities, and to reach general conclusions about symphonic instruments, a set of instruments was selected from groups with similar radiation characteristics. The conventional classification divides symphonic musical instruments into four groups or families: strings, woodwinds, brass, and percussion. However, this and other traditional classifications are not based on the radiation of the instruments but on other criteria, such as the morphology of the instruments or the way the sound is generated [16, 17]. Shabtai et al. [15] made a preliminary sorting into three groups depending on how the instruments radiate sound. To validate this classification, this section presents a general analysis of the musical instruments according to their maximum directivity index and direction of maximum directivity.
From the TU Berlin dataset, 38 musical instruments were selected for analysis: all instruments except the timpani, for which no single-note recordings were available, and the singer, which was excluded in order to focus on musical instruments. The recordings of single notes played in fortissimo (ff) were used for the analysis to guarantee a good signal-to-noise ratio.
Figure 1 - Maximum directivity index (DImax) of the instruments in the TU Berlin database, grouped into the three categories suggested by Shabtai et al. in [15].
In this study, the directivity index (DI) is defined as the ratio between the sound power radiated in a certain direction and the power averaged over all measured directions [20]. The DI of a source indicates, as a function of angle and frequency, the extent to which the source's radiation is biased towards a certain direction.
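The paper does not print the formula, but under this definition the DI can be written as follows (the symbol H for the measured transfer function at direction Ω and frequency f is a notational assumption made here):

\[
\mathrm{DI}(\Omega, f) = 10 \log_{10} \frac{|H(\Omega, f)|^{2}}{\frac{1}{N} \sum_{i=1}^{N} |H(\Omega_{i}, f)|^{2}} \;\, \mathrm{dB},
\]

where Ω₁, …, Ω_N are the N measured directions. The maximum directivity index DImax at a given frequency is then the maximum of DI(Ω, f) over all measured directions.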
Figure 1 shows the DImax of all selected instruments, grouped into the three categories proposed by Shabtai et al. In general, brass and many woodwind instruments present a low DImax at low frequencies and a higher DImax as the frequency increases. Musical instruments in Category I (all brass instruments, the English horn, and the basset horn) show the highest DImax, which increases considerably with frequency. However, the English horn, with a constant DImax value over the entire frequency range, does not show the same behavior as the rest of the instruments in this category. It is therefore surprising that this instrument falls into the same category as the brass instruments.
Some woodwinds (the tenor saxophone and the modern and classical clarinets) also exhibit a high DImax in frequency bands above 3000 Hz, but to a lesser extent than the brass instruments. In contrast, strings and some woodwinds, such as the flute, tend to exhibit low DImax values over the entire frequency range, suggesting that their radiation is less concentrated in a single direction than that of the brass instruments. It should also be noted that, although the measurements were made carefully with the instruments pointing in a specific direction (for example, brass instruments often pointing at one specific microphone), the DImax of some instruments may vary slightly, as the measurement point may not coincide with the direction of maximum radiation of the instrument.
Tone-dependent directivity analysis
The listening test and the tone-dependent objective analysis were conducted using the open-access database of spherical harmonic (SH) representations of sound sources provided by [22]. This database contains impulse responses for various notes of several instruments, based on the TU Berlin measurements, representing the directivity of each sound source at a set of given discrete directions.
To obtain the directivity pattern in a particular direction around the instrument in the horizontal plane, the 4th-order spherical harmonic representation of the directivities was first obtained from the data using the toolbox provided by Ahrens in [22]. Then, the magnitude directivity was computed from the SH representation in various directions. One musical instrument representative of each of the three aforementioned categories was used for the tone-dependent analysis and the listening test: a trumpet, an oboe, and a violin.
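As an illustration of this step, the following minimal Python sketch evaluates a 4th-order SH expansion of a directivity on the horizontal plane. The coefficient ordering, variable names, and the random placeholder coefficients are assumptions made here for illustration; the toolbox in [22] has its own data format and conventions.

```python
# Minimal sketch: evaluating a 4th-order spherical harmonic (SH) expansion
# of a directivity at arbitrary directions on the horizontal plane.
import numpy as np
from scipy.special import sph_harm

def sh_matrix(order, azi, col):
    """Complex SH basis in (n, m) ordering, evaluated at the given angles.
    azi: azimuth in rad, col: colatitude in rad (pi/2 = horizontal plane)."""
    Y = []
    for n in range(order + 1):
        for m in range(-n, n + 1):
            # scipy's sph_harm takes (m, n, azimuth, colatitude)
            Y.append(sph_harm(m, n, azi, col))
    return np.stack(Y, axis=-1)  # shape (..., (order + 1)**2)

# coeffs: SH coefficients of one frequency bin, shape ((4 + 1)**2,) = (25,);
# random placeholders standing in for data loaded from the database [22]
rng = np.random.default_rng(0)
coeffs = rng.standard_normal(25) + 1j * rng.standard_normal(25)

# Magnitude directivity sampled every 5 degrees on the horizontal plane
azimuths = np.deg2rad(np.arange(0, 360, 5, dtype=float))
Y = sh_matrix(4, azimuths, np.full_like(azimuths, np.pi / 2))
magnitude = np.abs(Y @ coeffs)
```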
Figure 2 - Balloon plots (N = 4) at F0 (1047 Hz) for various tones of the oboe (TU Berlin database), and the averaged directivity.
Generating stimuli for the listening test
Following [24], stereo signals were approximated by using the directivity at two directions separated by 5 degrees in the horizontal plane (corresponding to an ear-to-ear distance of 18 cm at a distance of 2.1 m from the source), centered at 60 degrees. This direction was chosen based on the results of the spectral analysis, which suggested that differences would be audible at this angle. It should be noted that, since this pilot study evaluates a simplified situation with only direct sound under anechoic conditions, binaural signals are not used in the listening test. However, future investigations should include binaural signals to study tone-dependent directivities in room simulations.
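These values are mutually consistent: two receiver positions 18 cm apart, seen from a source 2.1 m away, subtend an angle of

\[
\Delta\theta = 2 \arctan\!\left(\frac{0.18/2}{2.1}\right) \approx 4.9^{\circ} \approx 5^{\circ}.
\]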
The anechoic recordings used for the listening test were obtained from the denoised versions of the anechoic orchestral recordings provided in [26]. To avoid coloring the spectrum, the directivities were normalized by the directivity in the direction of the microphone used in the dry recordings (azimuth = 0°, elevation = 11°), obtained from SH interpolation. Static-directivity auralizations were obtained by convolving excerpts of anechoic recordings of the three instruments with the averaged directivities; these were calculated for each instrument by averaging the magnitude across all available tones before normalizing by the recording-microphone direction. Following [23], minimum-phase filters were computed from the magnitude spectra.
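The sketch below shows one standard way to construct a minimum-phase impulse response from a magnitude spectrum, via the real cepstrum; whether [23] uses exactly this homomorphic recipe is an assumption, and the input magnitude here is a random placeholder.

```python
# Sketch: minimum-phase FIR filter from a magnitude spectrum via the
# real cepstrum (homomorphic construction).
import numpy as np

def minimum_phase_fir(mag_single_sided):
    # Build the full, Hermitian-symmetric magnitude spectrum
    mag = np.concatenate([mag_single_sided, mag_single_sided[-2:0:-1]])
    n = len(mag)
    # Real cepstrum of the log-magnitude (guard against log of zero)
    cep = np.fft.ifft(np.log(np.maximum(mag, 1e-12))).real
    # Fold the cepstrum: keep c[0], double positive quefrencies, zero the rest
    w = np.zeros(n)
    w[0] = 1.0
    w[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        w[n // 2] = 1.0
    # Exponentiate back to the spectrum and return the impulse response
    return np.fft.ifft(np.exp(np.fft.fft(w * cep))).real

# Placeholder single-sided magnitude (257 bins -> 512-point FFT grid)
mag = np.abs(np.random.default_rng(1).standard_normal(257)) + 0.1
h_min = minimum_phase_fir(mag)
```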
Time-varying auralizations using tone-dependent directivities require knowing which tone is being played at every moment, so that the corresponding directivity pattern can be applied. Therefore, in this study the monophonic pitch tracker CREPE [27] was used to estimate the pitch of the chosen sound excerpts, with a time step of 10 milliseconds. The output of the pitch tracker contains the timestamps, the predicted fundamental frequency in Hz, and a confidence value between 0 and 1. Before this information was used to generate the stimuli, predicted fundamental frequencies with a confidence lower than 0.5 were replaced by the most recent prediction with a higher confidence. Predicted pitches above the highest expected frequency of each instrument were considered outliers and replaced by a lower neighboring value. To avoid misleading results caused by the use of vibrato in the recordings, the estimated pitch of the anechoic excerpts was smoothed with a median filter. The predicted pitch of the excerpts was then manually revised, corrected where necessary, and linked to the corresponding tones and directivity patterns, as sketched below. Finally, the tone-specific stimuli were generated by block-wise, time-variant convolution of the anechoic recordings with the directivity filter of each corresponding tone.
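The pitch post-processing chain can be sketched in Python as follows; the confidence threshold matches the value stated above, while the median-filter length and the note mapping are illustrative assumptions.

```python
# Sketch of the pitch post-processing applied to CREPE output
# (frequency and confidence arrays sampled every 10 ms).
import numpy as np
from scipy.signal import medfilt

def clean_pitch(freq, conf, f_max, conf_min=0.5, medfilt_len=11):
    freq = freq.copy()
    # Low-confidence frames: hold the last confident estimate (earlier
    # low-confidence frames were already overwritten, so this propagates
    # the most recent prediction with higher confidence)
    for i in range(len(freq)):
        if conf[i] < conf_min and i > 0:
            freq[i] = freq[i - 1]
    # Outliers above the instrument's highest expected note: take a lower neighbor
    for i in range(len(freq)):
        if freq[i] > f_max:
            freq[i] = freq[i - 1] if i > 0 else f_max
    # Median filter to suppress vibrato before mapping frames to tones
    return medfilt(freq, medfilt_len)

def to_midi_note(freq_hz):
    # Map the smoothed f0 to the nearest equal-tempered note (A4 = 440 Hz),
    # which then indexes the tone-specific directivity filter
    return np.round(69 + 12 * np.log2(freq_hz / 440.0)).astype(int)
```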
Listening test
On a user interface developed in Matlab, listeners were presented with stimuli A, B, and X and two forced-choice answers: X equals A or X equals B. For each trial, the simulations with the tone-specific and averaged directivities were randomly assigned to the A and B buttons, and one of them was randomly repeated on button X.
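This per-trial randomization can be sketched as follows (a hypothetical illustration, not the authors' Matlab implementation):

```python
# Hypothetical sketch of the ABX randomization described above.
import random

def make_abx_trial(stim_tone_specific, stim_averaged):
    a, b = random.sample([stim_tone_specific, stim_averaged], k=2)  # random A/B order
    x = random.choice([a, b])                                       # X repeats A or B
    return {"A": a, "B": b, "X": x, "correct": "A" if x is a else "B"}
```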
The participants could listen to the sound samples as many times as desired before giving an answer. To ensure that a variety of notes was included in the test, participants listened to three melodies of 2-5 seconds each. The sounds were presented through headphones (Beyerdynamic DT990), with the same playback level for all listeners.
To familiarize themselves with the test procedure and stimuli, participants underwent a training session with 3 conditions (one per instrument) prior to the listening test. After the test, the participants completed a short questionnaire about their musical background (years of experience), their experience with listening tests, and whether they had any hearing impairments. They also described in their own words which auditory cues they had used to differentiate the sounds.
A total of 10 listeners, 4 men and 6 women, aged 20-33 years (mean 24.9 years), participated in the listening test, which lasted about 30 minutes on average. Written informed consent was obtained from all participants at the beginning of the session. All of them reported normal hearing and had at least 12 years of musical experience (mean 17.2 years) or experience with listening tests, and were therefore considered trained listeners. Every participant was presented with a total of 45 test trials (3 instruments × 3 melodies × 5 repetitions). During the test, the order of the instruments was randomized for each participant, whereas in the training the conditions were presented in the same order to all participants.
Figure 3 - Results of the ABX listening test of tone-specific and averaged directivity patterns for all participants and conditions. The height of the bars indicates the number of correct answers. The null hypothesis (listeners cannot hear any difference) is rejected for scores on or above the dotted lines (significance levels of 5% and 1%).
For each test condition, there were a total of 135 answers (9 participants × 15 repetitions). Applying the binomial distribution to the analysis of the results allows the calculation of the probability that a given number of correct answers occurs by chance. If the number of correct answers is above the critical value, the differences between tone-specific and averaged directivities are considered significant. For a 5% significance level with Bonferroni correction, the critical number of correct answers needed to reject the null hypothesis is 81 (detection rate 60%). For a 1% significance level, the number of correct answers must be equal to or higher than 84 (detection rate 62.2%).
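These critical values can be reproduced with a binomial model (chance level p = 0.5, n = 135 answers per condition; assuming the Bonferroni correction is over the three instrument conditions, which reproduces the stated numbers):

```python
# Critical number of correct ABX answers under the null hypothesis.
from scipy.stats import binom

n, p, conditions = 135, 0.5, 3
for alpha in (0.05, 0.01):
    # Smallest k with P(X >= k) <= alpha / conditions under H0
    k = int(binom.ppf(1 - alpha / conditions, n, p)) + 1
    print(f"alpha = {alpha}: critical count = {k} ({k / n:.1%})")
# -> 81 (60.0%) at the 5% level and 84 (62.2%) at the 1% level
```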
As seen in Figure 3, the pooled detection rates for both the violin and the oboe stimuli are significantly above the critical values, indicating that differences between tone-specific and averaged directivity representations are audible. For the trumpet, the difference was significant only at the 5% significance level, not at the 1% level.
After completing the listening test, participants wrote in their own words which auditory cues influenced their decisions. All listeners identified timbre or color as their main cue for distinguishing the sounds (also described as brightness, harmonic content, and how muffled the sounds were). Furthermore, a couple of participants also mentioned audible changes in the onsets of the tones.