Every time I attend a conference call where people dial in from all over the world, my ears usually suffer if there's more than one speaker. There are various reasons for this:
- The volume of different participants varies widely. So while you have to listen very closely to understand some participants, others come through so loudly that your eardrums almost pop when the call swings from one extreme to the other.
- These volume differences arise not only because different telephone networks seem to interconnect at different volume levels. It's also because every participant has a different phone, some use hands-free mode, and, on international calls, some countries use a different voice codec that at some point is converted into the codec used in the country of the phone bridge.
- Add to that some echo when people are not muted, plus background noise such as babies crying, dogs barking and cars passing by, and the perfect storm approaches.
- There are always people on a conference call who are on the move, and their mobile phones often try much too hard to filter out the background noise, resulting in shrill peaks and hard-to-understand participants.
- Automatic announcements that people are leaving and re-joining the conference call due to patchy network coverage every couple of minutes don't make things much easier, either.
Put non-native speakers with sometimes heavy accents on top of all of that, and after an hour your (or at least my) head starts spinning. So what's the solution?
I think it's wideband audio conference calls with heavy pre-processing of the individual call legs. As you can't get wideband audio over the standard telephone network, the system would have to be Internet-based, and that's already a big compromise, with telephone dial-in for those who for one reason or another can't access the Internet (on the move, restrictive company firewall, etc.). I guess anyone who has once experienced the difference between wideband and narrowband speech knows what I am talking about. And on top of that, the conference server or the clients should do some intelligent pre- or post-processing of the signal coming from the different participants. Is it really that hard to have everybody's voice arriving at the same level?
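To illustrate that levelling everyone's voice isn't rocket science, here is a minimal sketch of per-leg loudness normalization, the kind of pre-processing a conference bridge could apply to each participant's audio block before mixing. This is a hypothetical example of mine, not any real bridge's implementation; the function names, the target level and the gain cap are all assumptions for illustration.

```python
import math

# Assumed target RMS level, with full scale at 1.0 (illustrative value).
TARGET_RMS = 0.1

def rms(samples):
    """Root-mean-square level of a block of float audio samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def normalize_leg(samples, target=TARGET_RMS, max_gain=8.0):
    """Scale one participant's audio block toward the target RMS level.

    The gain is capped so that near-silent blocks (pauses, muted legs)
    are not amplified into pure background noise.
    """
    level = rms(samples)
    if level == 0.0:
        return list(samples)
    gain = min(target / level, max_gain)
    return [s * gain for s in samples]

# A loud leg and a quiet leg end up at roughly the same level:
loud = [0.5, -0.5, 0.5, -0.5]
quiet = [0.01, -0.01, 0.01, -0.01]
print(rms(normalize_leg(loud)), rms(normalize_leg(quiet)))
```

A real bridge would of course do this adaptively over time (smoothing the gain, distinguishing speech from noise), but the basic idea of measuring each leg's level and compensating for it is exactly this simple.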
Anyone aware of such a system?