Acoustic simulations for virtual environments in Unity

Semester project for the MSc. in Sound and Music Computing

Alex Baldwin and Andrea Corcuera 

(2017) 

This project presents a method for acoustic simulation of virtual environments. The approach, integrated into the Unity3D game engine, models sound wave propagation using the geometric ray-tracing method.

The software is an implementation of geometric ray tracing for modelling sound propagation in a given virtual environment, written in C++ as a standalone spatialiser plug-in for the Unity3D game engine. It was designed to accommodate different input geometries easily and to let the user quickly change parameters such as the number of source rays, the maximum ray length, the maximum number of reflections, and the absorption coefficients of the geometric objects.
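As an illustrative sketch, the tunable parameters listed above might be grouped as follows (the names and default values are hypothetical, not the plug-in's actual interface):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical grouping of the user-tunable parameters named above.
struct SimulationConfig {
    std::size_t numRays        = 10000;   // rays emitted from the source
    float       maxRayLength   = 500.0f;  // metres; longer paths are discarded
    int         maxReflections = 50;      // reflection-order cut-off
};

// Per-object material: one absorption coefficient per frequency sub-band.
struct Material {
    std::vector<float> absorption;        // e.g. {0.02f, 0.03f, 0.05f} for stone
};
```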


Reverberant scene in the shape of a cave used for testing 

The proposed method begins by consolidating the input geometry and storing it in a bounding volume hierarchy (BVH), a binary-tree acceleration structure that speeds up intersection tests. Rays are then emitted from the source position and specular reflections are modelled for each ray; the ray's energy loss due to absorption is calculated during this process. Rays continue to propagate through the scene until they have exceeded the maximum path length or the maximum number of reflections, their energy has decreased below a threshold, or they intersect the listener. Once all propagation paths have been found, the propagation time can be calculated from the length of each path, and the intensity at the listener from the remaining ray energy. This yields an impulse response of the early reflections for a given combination of source position, listener position and geometry. Because geometric methods are computationally expensive, tracing paths with a high number of reflections takes too long for interactive media; late reverberation is therefore estimated by calculating the decay time of the early reflections using a Schroeder decay curve. The early reflections and late reverberation are combined into a single stereo impulse response and then auralised using convolution.
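The propagation loop described above can be sketched as follows. This is a minimal, self-contained illustration: the names, the listener-sphere intersection test and the single-function scene interface are assumptions, not the plug-in's actual code (in the real implementation intersections would go through the BVH).

```cpp
#include <cmath>
#include <functional>

struct Vec3 { float x, y, z; };
static Vec3  add(Vec3 a, Vec3 b)  { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3  mul(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
static float dot(Vec3 a, Vec3 b)  { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct Hit { Vec3 point, normal; float distance, absorption; };

// The scene is a caller-supplied intersection function so the sketch stays
// self-contained; the plug-in would query the BVH here instead.
using Scene = std::function<bool(Vec3 origin, Vec3 dir, Hit&)>;

struct TraceResult { bool reachedListener = false; float energy = 0, pathLength = 0; };

TraceResult traceRay(Vec3 origin, Vec3 dir, const Scene& scene,
                     Vec3 listener, float listenerRadius,
                     float maxLength, int maxReflections, float minEnergy) {
    TraceResult result;
    float energy = 1.0f, length = 0.0f;
    for (int bounce = 0; bounce <= maxReflections; ++bounce) {
        Hit hit;
        bool hitScene = scene(origin, dir, hit);
        // Does this segment pass through the listener sphere before the wall?
        Vec3 toListener = add(listener, mul(origin, -1.0f));
        float t  = dot(toListener, dir);                // closest approach along the ray
        float d2 = dot(toListener, toListener) - t * t; // squared ray-listener distance
        if (t > 0.0f && d2 <= listenerRadius * listenerRadius &&
            (!hitScene || t < hit.distance)) {
            result.reachedListener = true;
            result.energy = energy;
            result.pathLength = length + t;
            return result;
        }
        if (!hitScene) break;                           // ray escapes the scene
        length += hit.distance;
        if (length > maxLength) break;                  // exceeded maximum path length
        energy *= (1.0f - hit.absorption);              // loss at the surface
        if (energy < minEnergy) break;                  // energy below threshold
        origin = hit.point;                             // bounce: specular reflection
        dir = add(dir, mul(hit.normal, -2.0f * dot(dir, hit.normal)));
    }
    return result;                                      // ray never reached the listener
}
```

Each surviving path contributes its remaining energy at a delay given by its total length, which is exactly the information needed to build the early-reflection impulse response.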

Omnidirectional source used for the test.

Modelling the source

We assume an omnidirectional point source, which is therefore modelled as a sphere emitting a fixed number of rays in various directions. To ensure that the rays are distributed homogeneously over the sphere, its surface was divided into a number of equal areas.
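One common way to realise such an equal-area distribution is a Fibonacci spiral, which places N near-uniform points on the unit sphere; this sketch illustrates the concept and is not necessarily the exact subdivision scheme used in the project.

```cpp
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

// Fibonacci-spiral directions: each point gets an equal slice of height on the
// sphere (equal area, by Archimedes' hat-box theorem) and successive points
// are rotated by the golden angle, giving a near-uniform distribution.
std::vector<Vec3> sphereDirections(int n) {
    std::vector<Vec3> dirs;
    dirs.reserve(n);
    const float pi = std::acos(-1.0f);
    const float goldenAngle = pi * (3.0f - std::sqrt(5.0f));
    for (int i = 0; i < n; ++i) {
        float y   = 1.0f - 2.0f * (i + 0.5f) / n;  // equal steps in height
        float r   = std::sqrt(1.0f - y * y);       // radius of that latitude circle
        float phi = goldenAngle * i;               // spiral around the vertical axis
        dirs.push_back({r * std::cos(phi), y, r * std::sin(phi)});
    }
    return dirs;
}
```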

Early reflections and late reverberation

To convert the traced rays into an impulse response, both the power carried by each ray upon arrival at the listener and the time each ray takes to propagate through the scene need to be calculated. The power of each ray at the listener's position is computed per frequency sub-band and then converted to intensity, defined as power per unit area, which gives a more accurate representation of the ray's energy.
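A sketch of that bookkeeping is below. The equal power share per ray, the product of (1 − α) over the surfaces hit, and spherical spreading over the total path length are standard ray-tracing approximations; the function names are illustrative, not the plug-in's API.

```cpp
#include <vector>

// Power carried by one ray after its reflections: an equal share of the
// source power, attenuated by (1 - alpha) at each surface hit.
float rayPower(float sourcePower, int numRays, const std::vector<float>& absorptions) {
    float power = sourcePower / numRays;
    for (float a : absorptions)
        power *= (1.0f - a);
    return power;
}

// Intensity = power per unit area, here spread over a sphere whose radius
// is the total path length travelled by the ray.
float rayIntensity(float power, float pathLength) {
    const float pi = 3.14159265358979f;
    return power / (4.0f * pi * pathLength * pathLength);
}

// Arrival time follows directly from the path length and the speed of sound.
float propagationDelay(float pathLength) {
    return pathLength / 343.0f;   // ~343 m/s in air at 20 degrees Celsius
}
```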

The reverberation decay curve can be obtained using Schroeder's backward integration method [18], which integrates the squared impulse response h(t) backwards in time:

E(t) = ∫_t^∞ h²(τ) dτ
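On a discrete impulse response, Schroeder's backward integration can be sketched as a single reverse pass (the function name is illustrative):

```cpp
#include <vector>

// Schroeder backward integration [18]: the energy decay curve at sample n is
// the tail sum of the squared impulse response, E[n] = sum_{k >= n} h[k]^2.
// Running the sum backwards makes this a single O(N) pass.
std::vector<double> schroederDecay(const std::vector<double>& h) {
    std::vector<double> energy(h.size());
    double tail = 0.0;
    for (int n = static_cast<int>(h.size()) - 1; n >= 0; --n) {
        tail += h[n] * h[n];   // accumulate energy from the end of the response
        energy[n] = tail;
    }
    return energy;
}
```

The decay time can then be read off by converting E[n] to decibels relative to E[0] and fitting the slope of the curve.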


Evaluation

A study with 23 users was conducted to evaluate the effectiveness of our system integrated into different virtual environments.

Two main hypotheses were tested, one concerning the number of rays traced and one concerning the level of detail of the scene geometry:

We generated the audio using the full geometry of the scenario while varying the number of rays shot: three different sound excerpts were played using impulse responses created by shooting 1,000, 10,000 and 30,000 rays. In addition, two sounds were generated by varying the geometry definition of the scenario: the full definition and a simplified version of the environment, a "shoebox" fitted within the original scene.

The experiment was conducted using a computer and a pair of headphones. The listeners were provided with a list of sounds that they could listen to as many times as they wanted and were asked to imagine the room in which each sound was playing. After listening to each sound, the subjects filled in a questionnaire, rating the size, emptiness and material of the room they imagined on a 5-point Likert scale. After answering these questions, the listeners were shown a picture of the scenario (rendered in Unity) used to generate the sound. They had to choose which audio fitted the room best and were asked to describe the differences between the room they had imagined and the one in the picture. There were 10 sounds in total; the audio excerpt used for all the examples was a woman singing an ascending scale.

Results

Significant differences in the perceived size (p = 0.00002), emptiness (p = 0.0002) and material (p = 0.00001) of the reverberant room (the cave) were found using one-way ANOVA. Notably, 78.3% of the participants chose the sound rendered with 50,000 rays, rather than those created with 1,000 and 10,000 rays, as the one that fitted the cave best, and this value rises to 83% if we only count the answers from participants with 3D experience. This is consistent with our first hypothesis. However, in the case of a smaller, less reverberant room, only 34.8% of the subjects picked the sound rendered with the maximum number of rays, and no significant differences were found for the size or the material of the room. Significant differences were found for the sense of emptiness (p = 0.003), which is promising considering that the main difference from the other scenario is the presence of objects in the scene.

As far as the comments about the imagined rooms are concerned, we received very favourable responses in both cases. The subjects stated that they had imagined scenarios quite similar to the ones we showed them. In the first example, the cave, only three listeners commented that the room they imagined was smaller; the rest wrote that they had thought of big empty rooms with reflective materials, such as stone or marble. For the children's room the main comments concerned the emptiness of the room: subjects described ordinary rooms, such as living or dining rooms.

These results, similar to those found in [17], suggest that for reverberant scenarios with low absorption coefficients the number of rays used needs to be large, whereas for rooms whose walls and objects have high absorption the number of rays traced does not need to be high.

Paired t-tests, assuming normally distributed data, were conducted to assess the importance of the geometry definition in the perception of the room enclosure. In the case of the children's room, significant differences were found only for the sense of emptiness, and half of the participants preferred the sound rendered with the shoebox. In the case of the cave, significant differences were found for the perception of size (p = 10^-6) and material (p = 0.001): the simple "shoebox" geometry was rated as bigger and more reflective. This last case was also rated as the one that fitted the cave scenario best, which suggests that for environments with simple geometry and the same material over the whole enclosure, a non-complex definition is enough to render the sound.

We can conclude from our evaluation that the number of rays does matter for environments with reflective walls, whereas a large number of rays is not necessary for rooms whose enclosures and objects have high absorption coefficients. It would therefore be interesting to find the optimal number of rays required to obtain a good sound while keeping the computational cost low.

However, our second hypothesis cannot be confirmed. The "shoebox" approximation of the enclosure may be enough for rendering audio in simple environments, but we cannot confirm that this applies to other geometries as well. Several factors may have influenced these results. Restricting the reflections to specular ones leads to a worse reverberation estimate. In the cave's case, a low convergence ratio of rays to the receiver was obtained, which reduces the number of early reflections represented and leads to a worse representation of the sound. As for the children's room, the picture shown to the subjects may not have been the best one, since its perspective makes the room look bigger. This may have led the subjects to rate a more reverberant sound, rather than a drier one, as the one that fitted best. In addition, the shapes of both enclosures were quite rectangular; for future tests a more complex geometry with different shapes will be used.

 

REFERENCES

[1] Interpolation of Combined Head and Room Impulse Response for Audio Spatialization. IEEE, October 2011.

[2] L. Antani and D. Manocha. Aural proxies and directionally-varying reverberation for interactive sound propagation in virtual environments. IEEE Transactions on Visualization and Computer Graphics, 19(4):567–575, April 2013.

[3] Carl Schissler, Ravish Mehra, and Dinesh Manocha. High-order diffraction and diffuse reflections for interactive sound propagation in large environments. http://gamma.cs.unc.edu/HIGHDIFF/paper.pdf, 2016.

[4] A. Chandak, C. Lauterbach, M. Taylor, Z. Ren, and D. Manocha. Ad-frustum: Adaptive frustum tracing for interactive sound propagation. IEEE Transactions on Visualization and Computer Graphics, 14(6):1707–1722, 2008.

[5] Thomas Funkhouser, Nicolas Tsingos, and Jean-Marc Jot. Survey of methods for modeling sound propagation in interactive virtual environment systems, 2003.

[6] Marko Jankovic, Dejan G. Ciric, and Aleksandar Pantic. Automated estimation of the truncation of room impulse response by applying a nonlinear decay model. The Journal of the Acoustical Society of America, 139(3):1047–1057, 2016.

[7] Matti Karjalainen, Poju Antsalo, and Timo Peltonen. Estimation of modal decay parameters from noisy response measurements. Journal of the Audio Engineering Society, 2002.

[8] Gary S. Kendall. A 3d sound primer: Directional hearing and stereo reproduction. Computer Music Journal, 19(4):23–46, 1995.

[9] A. Krokstad, S. Strom, and S. Sørsdal. Calculating the acoustical room response by the use of a ray tracing technique. Journal of Sound and Vibration, 8(1):118–125, 1968.

[10] Y. W. Lam. A comparison of three diffuse reflection modeling methods used in room acoustics computer models. The Journal of the Acoustical Society of America, 100(4):2181–2192, 1996.

[11] Eric A. Lehmann, Anders M. Johansson, and Sven Nordholm. Reverberation-time prediction method for room impulse responses simulated with the image-source model. 2007.

[12] Hilmar Lehnert. Systematic errors of the ray-tracing algorithm. Applied Acoustics, 38(2):207 – 221, 1993.

[13] R. Mehra, A. Rungta, A. Golas, M. Lin, and D. Manocha. Wave: Interactive wave-based sound propagation for virtual environments. IEEE Transactions on Visualization and Computer Graphics, 21(4):434–442, April 2015.

[14] Micah Taylor, Anish Chandak, Lakulish Antani, and Dinesh Manocha. Interactive geometric sound propagation and rendering. Intel Software Network, 2010.

[15] W. Mueller and F. Ullmann. A scalable system for 3d audio ray tracing. In Multimedia Computing and Systems, 1999. IEEE International Conference on, volume 2, pages 819–823 vol.2, Jul 1999.

[16] T. Möller and B. Trumbore. Fast, minimum storage ray/triangle intersection. Journal of Graphics Tools, 2(1):21–28, 1997.

[17] D. Oliva Elorza. Room Acoustics Modeling Using the Raytracing Method: Implementation and Evaluation. 2005.

[18] M. R. Schroeder. New method of measuring reverberation time. The Journal of the Acoustical Society of America, 37(3):409–412, 1965.

[19] A. Williams, S. Barrus, R. K. Morley, and P. Shirley. An efficient and robust ray–box intersection algorithm. Journal of Graphics Tools, 10(1):49–54, 2005.

[20] Ning Xiang and Gerhard M. Sessler, editors. Acoustics, Information, and Communication: Memorial Volume in Honor of Manfred R. Schroeder. 2014.