Develop uncovers the inner workings of Speech Graphics' new lip-syncing tool for devs
Scottish technology outfit Speech Graphics is a thriving service company specialising in audio-driven lip syncing.
A relative newcomer to the global games industry, it has already built quite a name for itself with a run of award nominations and wins, and now it has bold plans to repackage its eponymous technology as a tool.
“We feel a software company is more scalable and ultimately it’s where we want to go,” confirms Speech Graphics CEO Dr Gregor Hofer.
“By offering a tool we can reach more customers with our solution. As to what developers prefer, some say they are happy to outsource to animation services while others prefer the control offered by a tool that can be integrated directly into their production pipeline. We feel there is a place for both models.
“Starting with the service model simply allows us to use interactions with game developers to inform our development program, to ultimately produce a technology that will provide exactly what they want.”
Speech Graphics may not be the first service provider to move into the middleware space, but hearing such a clear-cut plan so early in the life of a games technology company, regardless of discipline or specialty, is certainly unusual.
And just because the team’s long-term plan emphasises fleshing out a tool, it doesn’t mean that Hofer and his colleagues are holding back on developing and offering a fully featured service.
The company’s solution is centred on a model in which clients provide audio and facial models, which Speech Graphics then processes with its in-house software tools to produce synchronised animation.
“We use ten proprietary high-level animation channels to drive bones or blendshapes in the same manner,” explains Hofer.
“These channels are based on our analysis of the way the human face moves during speech. Output is provided for any of the popular 3D animation packages.
“The solution works equally well in any language, which means our service is also ideal for re-animating faces in different languages for localisation. Our animation solution has already been integrated and proven in production pipelines at major game development and motion capture studios.”
According to Speech Graphics’ senior members, the tech behind the service betters equivalent audio-driven facial animation systems by pooling speech recognition and graphics experience.
In doing so, it builds a solution based on muscle dynamics, predicting how human facial muscles move to produce speech sounds. It’s a highly specialised approach, and one its creators insist improves on the established model.
“Previous audio-driven solutions rely on very simple lip-sync algorithms, and quality has not changed much since the 1990s,” claims Speech Graphics’ CTO Michael Berger.
“With current polygon counts and character realism this lack of improvement is increasingly out of place. Speech is one of the most difficult things to animate convincingly, due to that amazing combination of fluidity and accuracy in natural speech.”
For now, that difficulty, and the resulting intricacy of Speech Graphics’ technology, mean that triple-A studios remain the service’s most relevant customers.
Those bigger studios have a greater hunger – and perhaps need – for visual polish. In particular, Hofer sees triple-A games that shower the player with spoken dialogue, such as Skyrim and Star Wars: The Old Republic, as best suited to the company’s abilities.
“However, mobile and browser-based games are fast improving in graphic quality,” offers Hofer.
“Our solution will be ideal for that market as well, especially when we release a software product that can be used by a large number of developers.
“In addition to game developers, we also work with other animation companies that service the games industry.”
Speech Graphics is also forming partnerships with motion capture studios with a view to offering joint services based on their complementary technologies.
THE SOUND OF POTENTIAL
By turning its service into a tool, and by collaborating with companies in related specialties, Speech Graphics aims to bring its facial animation to lower-budget games, an ambition that will chime with those who have long predicted the emergence of triple-A quality on mobile and browser.
And as the convergence between mobile, social, and console games increases, Speech Graphics is striving to keep pace with the technological frontier so as to make sure its services and technology match current needs.
Hofer highlights the ongoing content delivery required for online games as an example of an area closely watched by him and his colleagues, and one the team can plug into by automatically generating animation for incoming speech content.
“We are also following trends in facial modelling, as we believe even the most accurate dynamics are lost on poorly rigged facial models,” says Berger.
“Part of our expertise lies in knowing how the face naturally deforms during speech, so we feel we can have a major impact on the industry by guiding development of facial models that are well suited to speech.”
Despite all the enthusiasm and optimism on Speech Graphics’ part, challenges remain in turning a service into a tool, even for a company so confident in its product.
Because Speech Graphics’ current in-house tech is standalone software, managing import and export to and from the major animation packages in order to interface with clients has proved a trial, demanding a huge effort from the company.
“A bigger obstacle was dealing with client facial models that are not suited to speech, which required a good deal of effort until we arrived at an ideal method for facial model adaptation,” admits Berger.
“This problem must also be handled in a user-friendly way for any software tool we subsequently release, whether standalone or plug-in. In addition, in GUI design for these tools we need to visualise important information about the speech signal that can be used to optimise output.”
Looking forward, there are other challenges, but most are not Speech Graphics’ sole responsibility. A collective, industry-wide effort is underway to address the fact that, according to Hofer, facial animation still lags some way behind other aspects of games technology.
“This is partly because of the complexity of faces, but also because of how we perceive them,” Hofer suggests.
“As humans we innately scrutinise faces and respond on a very deep level to facial expressions. A title like L.A. Noire shows that you can make a successful game where some of the main game elements centre around our innate ability to read faces.”
Berger adds: “However, the push for greater realism in faces – combined with a continued rise in face-centred spoken dialogue – makes it crucial for developers to acquire new technologies for facial modelling and animation pipelines.
“We believe there will be continued convergence between facial modelling, procedural facial animation and the use of data-driven techniques such as 3D or 4D scanning.”
It’s an enchanting vision of the future, and one that Speech Graphics is doing everything it can to be part of, whether as a tool or as a service to artists and animators.