I was asked to give the Closing Keynote for the 2013 Audio Engineering Conference this week at the Javits Center in NYC. The convention drew over 18,000 attendees. For my topic I selected “The Future of Audio Production 2020-2050”.
Added: CNET reported here.
I spoke about the demise of “physical” post-production. The lecture was accompanied by around forty proprietary research graphs, not included here. I showed, with extensive data, that by 2050, and perhaps as early as 2030, most media post-production will be performed in virtuality, where every functional piece of equipment (every knob, fader, switch, and patch point) will be visible and controllable entirely in virtual space. This paradigm will encompass film editing, sound and music editing, game production, mixing, mastering, and just about every type of aural-visual post-production and delivery.
By 2040, we’ll have mostly abandoned the mouse. Physical touchscreens will be largely obsolete. There will be far fewer physical media objects — such as external audio monitors, keyboards, trackballs, personal desktop video monitors, and so forth. Save for a quiet room, a comfortable chair, and innocuous motion trackers, the physical “production studio” will largely be a thing of the past. Certainly, a number of “legacy hardware rooms” will still exist, but they will be dying curiosities.
We are moving from a hand-held culture to a head-worn culture. Physicality will be replaced with increasingly sophisticated head-worn immersion devices. Most of these basic changes will be well in place by 2035. And by 2050, head-worn audio and visual “fully spherical” realism will be nearly indistinguishable from “true space”. Audio will be mixed for a true three-dimensional sound space (in fact, we are doing this now). Visual production will require three axes of reality (this, too, is happening today).
During this transition, perhaps the only remaining piece of CEH (clunky external hardware) will be the subwoofer, whose physical low-frequency energy cannot be emulated with a head-worn device. By 2025-2030, today’s emerging object-oriented 3D audio environments (Atmos, Neo, Auro, etc.) will be commodity delivery formats. By 2025-2035, head motion tracking and hand gestural tracking will also be inexpensive, matured commodities.
A single desktop computer in 2050 will be equivalent to roughly 10 billion human brains working in parallel, so media processing power will no longer be a bottleneck. The 2050 Internet will be carrying roughly 10,000,000,000,000,000,000,000 bits of data per second (10 sextillion bits/s).
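For scale, that written-out figure can be sanity-checked with a line of arithmetic (a trivial sketch; the variable name is mine):

```python
# "10 sextillion" = 10^22 (a sextillion is 10^21), matching the digits above.
bits_per_second = 10_000_000_000_000_000_000_000
assert bits_per_second == 10 ** 22
print(f"{bits_per_second:.1e} bits/s")  # 1.0e+22 bits/s
```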
Production and post-production studios of 2030-2040 will give us our familiar working tools: mixing consoles, outboard equipment, patch bays, audio and visual monitors … or their DAW-based equivalents. The difference is that all of this “equipment” will live in virtual space. When we don our VR headgear, anything we require for media production is there “in front of us” with lifelike realism. It’s a Holodeck on your head. Head-worn reality.
Matured gestural control (2030-2035) will allow us to reach out and control anything in the production chain. Efficiency will be improved with scalable depth-of-field. Haptic touch (emulated physical feedback) will add an extra layer of realism (2030-2040), but it’s probably not necessary for media production emulation. Anything in the virtual room can be changed with one voice or gestural command. Don’t like the sound of that Neve 8068 console? Install the Beatles’ EMI Abbey Road console. A ten-second operation.
But why stop there? Let’s dream bigger. Call up a complete AI symphony orchestra that fills your immersive vision stage. Call up a great concert hall (let’s try the Concertgebouw. Hmm, that’s a little too swimmy. Let’s try Boston Symphony Hall). Add a 200-voice choir. Add Yo-Yo Ma soloing on his carbon-fiber cello. You’re there in front, conducting and refining the orchestra with gestural and voice commands, making refinements to the score and performance, until it becomes exactly what you want. We achieve a complete Virtual Audio Workstation, or, more precisely, a Virtual Media Workstation that can be tailored to fit any creative production goal.
The future of audio, music, filmmaking, game design, TV, and industrial apps, indeed any creative media construction from inception to post-production, becomes truly boundless, limited only by our imagination. Personally, I dream about being able to think music directly into a recording system: a non-invasive brain-machine interface. It turns out that this dream is moving from science fiction to reality (link, link, link). And if we assume a two-year doubling period for cortex-sensing resolution, by the early 22nd century our non-invasive brain interfaces will be roughly 15 orders of magnitude more powerful than today (about 50 doublings over a century).
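The doubling arithmetic can be checked directly (a minimal sketch; the 2013 and 2113 endpoints are my illustrative assumptions, and fifty two-year doublings over a century yields about 10^15, roughly 15 orders of magnitude):

```python
import math

# Compounding growth under an assumed two-year doubling period
# for cortex-sensing resolution. Years are illustrative assumptions.
def growth_factor(start_year: int, end_year: int, doubling_years: float = 2.0) -> float:
    """Multiplicative improvement between two years, given a doubling period."""
    doublings = (end_year - start_year) / doubling_years
    return 2.0 ** doublings

# One century (2013 -> 2113) is 50 doublings:
factor = growth_factor(2013, 2113)
print(f"~10^{math.log10(factor):.0f}x improvement")  # ~10^15x improvement
```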
But will that give us the ability to think music and visual art directly into our computers? Or does it simply blur the line between our brains and our computers, so that the entire paradigm of augmented thinking and collective knowledge is radically shifted? At that point … when we have billions of devices globally networked, and each device is trillions of times smarter than the combined intelligence of all humanity … what will our species become? What will our collective thought processes look like?
Personally, I think these kinds of paradigm-shifting social questions are coming sooner than we may realize. And I think there’s both great promise and great risk in the technologies that are emerging. As my wife reminds me before my lectures: teach them that the heart is always more important than our technology. And as Bryan Stevenson said, “we will not be judged by our technology, intellect, or reason. Ultimately, the character of a society will be judged not by how they treat the powerful, but by how they treat the poor.”
Nevertheless, somewhere in the future, we will create human-to-machine interfaces that respond and adapt directly to our personal imagery and creative ideas, so that one day just about anything we can imagine will become our art.