In the past, it was almost impossible to produce high-quality fakes of video or audio material. Dynamic content poses a steep challenge: at least 16,000 data points have to be faked consistently every second. However, AI methods now make this process almost child’s play. Open-source versions of the software required to produce convincing fakes automatically are freely available online.
How exactly is it done? As with comparable machine learning models, deepfake systems are trained on data acquired online. Architectures like Tacotron and Wav2Lip (Wang et al.; Shen et al.; Prajwal et al.) make it possible to build neural networks that can make a target person appear to speak any sentence, with matching facial movements and that person’s typical intonation. It is these deep neural networks to which the “deep” in “deepfakes” refers. Around 30 minutes of suitable audio or video material is all that is required.
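To give a sense of how low the barrier has become, here is a minimal voice-cloning sketch. It assumes the open-source Coqui TTS package and its YourTTS model, neither of which is named in this article, along with a hypothetical reference recording of the target speaker; model names and APIs vary between versions, so treat this as an illustration rather than a recipe.

```python
# Minimal voice-cloning sketch using the open-source Coqui TTS library.
# Assumptions (not from the article): the `TTS` package is installed
# (`pip install TTS`) and `target_voice.wav` is a short, clean recording
# of the target speaker. Model names and APIs differ between versions.
from TTS.api import TTS

# Download and load YourTTS, a multilingual zero-shot voice-cloning model.
tts = TTS(model_name="tts_models/multilingual/multi-dataset/your_tts")

# Synthesize an arbitrary sentence in the target speaker's voice:
# the model conditions on a speaker embedding extracted from the reference.
tts.tts_to_file(
    text="This sentence was never spoken by the target person.",
    speaker_wav="target_voice.wav",  # hypothetical reference recording
    language="en",
    file_path="cloned_output.wav",
)
```

Zero-shot models of this kind can work from only a few seconds of reference audio, although longer, cleaner samples generally yield more convincing clones.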
Deepfakes entail new risks
The risks associated with deepfakes are considerable. In theory, any one of us could have transactions or contracts concluded in our name online using a faked voice or video, provided sufficient audio or video material is available. Companies can also suffer damage if employees are tricked into fraudulent behavior by faked audio messages. Precisely this happened to an energy company based in Great Britain: its CEO transferred a six-figure sum of money, seemingly at the bidding of the chairperson of its German parent company, but in reality at the instruction of a machine-cloned voice (Forbes).
For the media landscape, the ability to manipulate statements made by politicians and influential decision-makers presents a particular challenge. A wealth of audio and video content is often available for such public figures, providing ample AI training material for the creation of deepfakes. As a result, virtually any statement can be put into the mouths of high-ranking politicians, using footage that both looks and sounds authentic.
Beating deepfakes at their own game
Although AI makes deepfakes possible in the first place, it can also be a key tool for exposing manipulated audio and video material. This is where the Fraunhofer Institute for Applied and Integrated Security AISEC comes into play. IT experts in the Cognitive Security Technologies (CST) research department are hard at work creating systems for the reliable, automated recognition of deepfakes. They are also investigating methods to improve the robustness of systems that evaluate video and audio material.
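The article does not describe how these detection systems work internally, but a common baseline in the research literature is to extract spectral features from a recording and train a classifier on labeled real and fake samples. The sketch below is purely illustrative and is not Fraunhofer AISEC’s method; it assumes the librosa and scikit-learn libraries and a hypothetical directory layout of labeled WAV files.

```python
# Illustrative sketch of spectral-feature-based audio-deepfake detection.
# This is NOT Fraunhofer AISEC's system; it only shows the general idea.
# Assumptions: librosa and scikit-learn are installed, and `real/` and
# `fake/` contain labeled WAV files (hypothetical dataset layout).
import glob
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def features(path):
    # Mean MFCCs summarize the spectral envelope of a recording;
    # synthetic speech often leaves subtle spectral artifacts there.
    audio, sr = librosa.load(path, sr=16000)  # 16 kHz, as in the article
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=20)
    return mfcc.mean(axis=1)

# Build the feature matrix: all real recordings first, then all fakes,
# so the label vector below lines up with it.
X = np.array([features(p)
              for label in ("real", "fake")
              for p in sorted(glob.glob(f"{label}/*.wav"))])
y = np.array([0] * len(glob.glob("real/*.wav"))
              + [1] * len(glob.glob("fake/*.wav")))

# Train a simple linear classifier and report held-out accuracy.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"held-out accuracy: {clf.score(X_test, y_test):.2f}")
```

Production-grade detectors go well beyond such a baseline, for example by using deep networks trained end to end on raw audio or spectrograms, but the principle of learning to separate genuine from synthesized material from labeled examples is the same.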