The music world sat up in shock during the COVID-19 lockdown when rapper Jay-Z's company, Roc Nation, filed takedown notices against the YouTuber behind Vocal Synthesis. The YouTube channel had created deepfake videos in which Jay-Z's synthesized voice performs singer Billy Joel's We Didn't Start the Fire and Hamlet's To Be or Not to Be soliloquy. Throughout the music video, Billy Joel's persona had been replaced by Jay-Z using artificial intelligence.
With Jay-Z's deepfake music dispute, a gamut of similar synthetic creations is drawing attention, raising the quintessential copyright question: where does copyright begin, and where does it end? If Jay-Z succeeds in getting the deepfake videos removed from the YouTube channel, a new legal precedent will be set.
In January 2020, it was reported that ByteDance and TikTok had already built their own deepfake maker, which TikTok's app code referred to as Face Swap. Though this technology has yet to be made public, the code suggests the app would ask users to take a multi-angle biometric scan of their face, then choose a video where they want to add their face, and share it.
Speech synthesis is not new to the artificial intelligence world. Powerful machine learning techniques can manipulate or generate visual and audio content with a highly deceptive 'authenticity.' These methods involve training generative neural network architectures such as autoencoders or generative adversarial networks (GANs). Given a training set, such a model can generate new visual or audio content that looks 'almost' authentic, with many realistic characteristics. From chatbots to real-time speech synthesis, these transformations of text to speech, or written content to audio material, are being used for many purposes.
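To make the autoencoder idea concrete, here is a minimal sketch, not any production system: a linear encoder compresses 8-dimensional inputs down to a 2-dimensional latent code, a decoder reconstructs them, and plain gradient descent minimizes the reconstruction error. The synthetic "feature" data, the sizes, and the learning rate are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic low-rank "feature" data: 8-D points that actually live on a
# 2-D subspace, so a 2-D bottleneck can capture them almost perfectly.
latent_true = rng.normal(size=(200, 2))
mix = rng.normal(size=(2, 8))
X = latent_true @ mix                         # shape (200, 8)

d, k, lr = 8, 2, 0.01
W_enc = rng.normal(scale=0.1, size=(d, k))    # encoder weights
W_dec = rng.normal(scale=0.1, size=(k, d))    # decoder weights

def loss(X, W_enc, W_dec):
    recon = X @ W_enc @ W_dec
    return np.mean((recon - X) ** 2)

start = loss(X, W_enc, W_dec)
for _ in range(500):
    Z = X @ W_enc                   # encode to the latent code
    recon = Z @ W_dec               # decode back to input space
    err = recon - X                 # reconstruction error
    # Gradient descent on the squared reconstruction error.
    grad_dec = Z.T @ err / len(X)
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

end = loss(X, W_enc, W_dec)
print(f"reconstruction MSE: {start:.4f} -> {end:.4f}")
```

A real deepfake pipeline works on faces or spectrograms with deep nonlinear networks, but the training loop follows this same shape: encode, decode, measure the error, adjust the weights.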
Deepfake is a portmanteau of 'deep learning' and 'fake': synthetic media in which a person in an existing image or video is replaced with someone else's likeness. Artificial intelligence can re-create or duplicate what already exists, and that capability is the core of deepfake technology. The technology has progressed to the point where both audio and video can be synthesized, or in other words, manipulated almost without limit.
As AI tunes and retunes its machine learning models, the music world is not spared. The technology can already create, compose, and manufacture synthetic music clips. All it needs is a data set to train on, as seen in Jay-Z's situation. The algorithms gather and analyze large collections of an artist's songs, identify patterns in the audio data that humans would associate with that music style, and then use those patterns as training data to generate new audio with someone else's persona as the artist.
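The gather-analyze-generate pipeline above can be shrunk to a toy sketch. Here an "artist's back catalogue" is a handful of made-up note sequences, the pattern-learning step is a first-order Markov model (a stand-in for the far richer models real systems use), and generation samples a new sequence that follows the learned transitions. Every note name and sequence is invented for illustration.

```python
from collections import defaultdict
import random

# A fake corpus standing in for an artist's songs, as note sequences.
corpus = [
    ["C", "E", "G", "E", "C"],
    ["C", "E", "G", "A", "G"],
    ["E", "G", "A", "G", "E"],
]

# Step 1: analyze the collection, counting which note follows which.
transitions = defaultdict(list)
for song in corpus:
    for prev, nxt in zip(song, song[1:]):
        transitions[prev].append(nxt)

# Step 2: generate new material that obeys the learned patterns.
def generate(start, length, seed=0):
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        choices = transitions.get(out[-1])
        if not choices:          # dead end in the model: stop early
            break
        out.append(rng.choice(choices))
    return out

print(generate("C", 8))
```

The output sounds "in the style of" the corpus only because every step it takes was observed somewhere in the training data, which is exactly the property that makes style mimicry, and the copyright question around it, possible.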
The phenomenal growth and progress in AI and machine learning might eventually make it impossible to identify the original. Or it might yield new technology to counter deepfakes, enabling them to be distinguished from originals in real time. Amid this cycle of creation and counter-creation, an open question remains: how do we even detect that a piece of music has been synthesized at all in the digital universe? So far, the popularity of the artist has brought the concern to the fore. What would happen when lesser-known artists' unique musical compositions are uploaded, manipulated, and shared online is something to ponder.

Incidentally, the contentious YouTube channel "Vocal Synthesis" also features videos described as 'entirely computer-generated,' produced by text-to-speech models trained on speech patterns. One featured video contains speeches of six U.S. Presidents, with the same claim of being trained on those Presidents' speech patterns. If these synthesis efforts are taken a step further, it is not impossible to generate a 'potential' message, or even an order, that supposedly comes from the President himself. Fake news was a hurdle in the 2016 U.S. elections, but the extent of the ruckus that today's deepfake news could create is simply left to our imagination. Such synthetic creations are going to disrupt politics in a big way. Finding efficient ways to identify deepfakes would ameliorate the problem, but only after whatever damage is done.
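One family of detection approaches mentioned above looks for statistical fingerprints that synthesis pipelines leave behind. The sketch below is a deliberately simple, invented example of the idea: build a baseline of high-frequency energy from "genuine" clips, then flag a clip whose high-band energy ratio falls far outside that baseline. Real detectors are far more sophisticated; the signals, cutoff, and threshold here are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
sr = 8000  # sample rate in Hz, illustrative

def high_band_ratio(clip, sr, cutoff=2000):
    """Fraction of spectral power above the cutoff frequency."""
    power = np.abs(np.fft.rfft(clip)) ** 2
    freqs = np.fft.rfftfreq(len(clip), d=1 / sr)
    return power[freqs >= cutoff].sum() / power.sum()

# "Genuine" clips: low-frequency tones plus mild noise.
t = np.arange(sr) / sr
genuine = [np.sin(2 * np.pi * f * t) + 0.05 * rng.normal(size=sr)
           for f in (220, 330, 440)]

# A "synthetic" clip carrying extra high-frequency content.
fake = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 3000 * t)

# Baseline statistics from the genuine clips.
ratios = [high_band_ratio(c, sr) for c in genuine]
mean, std = np.mean(ratios), np.std(ratios)

def looks_synthetic(clip, n_sigma=5):
    """Flag a clip whose high-band ratio deviates far from the baseline."""
    return abs(high_band_ratio(clip, sr) - mean) > n_sigma * std
```

A detector this crude is trivially evaded, which is precisely the arms-race dynamic the paragraph above describes: each detection signal invites a synthesis pipeline that suppresses it.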