The world’s top deepfake artist is wrestling with the monster he created

岸边露伴

Published 2019-08-22 17:28 · Source: original post


The world’s top deepfake artist is wrestling with the monster he created (1)

Hao Li has spent his career perfecting digital trickery. Now he’s working to confront the problem of increasingly seamless off-the-shelf deception.

It’s June in Dalian, China, a city on a peninsula that sticks out into the Yellow Sea a few hundred miles from Beijing in one direction and from the North Korean border in the other. Hao Li is standing inside a cavernous, angular building that might easily be a Bond villain’s lair. Outside, the weather is sweltering, and security is tight. The World Economic Forum’s annual conference is in town.

Near Li, politicians and CEOs from around the world take turns stepping into a booth. Inside, they laugh as their face is transformed into that of a famous person: Bruce Lee, Neil Armstrong, or Audrey Hepburn. The trick happens in real time, and it works almost flawlessly.

The remarkable face-swapping machine wasn’t set up merely to divert and amuse the world’s rich and powerful. Li wants these powerful people to consider the consequences that videos doctored with AI—“deepfakes”—could have for them, and for the rest of us.

Misinformation has long been a popular tool of geopolitical sabotage, but social media has injected rocket fuel into the spread of fake news. When fake video footage is as easy to make as fake news articles, it is a virtual guarantee that it will be weaponized. Want to sway an election, ruin the career and reputation of an enemy, or spark ethnic violence? It’s hard to imagine a more effective vehicle than a clip that looks authentic, spreading like wildfire through Facebook, WhatsApp, or Twitter, faster than people can figure out they’ve been duped.

As a pioneer of digital fakery, Li worries that deepfakes are only the beginning. Despite having helped usher in an era when our eyes cannot always be trusted, he wants to use his skills to do something about the looming problem of ubiquitous, near-perfect video deception.

The question is, might it already be too late?

Rewriting reality

Li isn’t your typical deepfaker. He doesn’t lurk on Reddit posting fake porn or reshoots of famous movies modified to star Nicolas Cage. He’s spent his career developing cutting-edge techniques to forge faces more easily and convincingly. He has also messed with some of the most famous faces in the world for modern blockbusters, fooling millions of people into believing in a smile or a wink that was never actually there. Talking over Skype from his office in Los Angeles one afternoon, he casually mentions that Will Smith stopped in recently, for a movie he’s working on. 

Actors often come to Li’s lab at the University of Southern California (USC) to have their likeness digitally scanned. They are put inside a spherical array of lights and machine vision cameras to capture the shape of their face, facial expressions, and skin tone and texture down to the level of individual pores. A special-effects team working on a movie can then manipulate scenes that have already been shot, or even add an actor to a new one in post-production.

Such digital deception is now common in big-budget movies. Backgrounds are often rendered digitally, and it’s common for an actor’s face to be pasted onto a stunt person’s in an action scene. That’s led to some breathtaking moments for moviegoers, as when a teenage Princess Leia briefly appeared at the end of Rogue One: A Star Wars Story, even though the actress who had played Leia, Carrie Fisher, was nearly 60 when the movie was shot.

Making these effects look good normally requires significant expertise and millions of dollars. But thanks to advances in artificial intelligence, it is now almost trivial to swap two faces in a video, using nothing more powerful than a laptop. With a little extra know-how, you can make a politician, a CEO, or a personal enemy say or do anything you want (as in the video at the top of the story, in which Li mapped Elon Musk's likeness onto my face).
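The laptop-grade tools alluded to here typically rest on a simple autoencoder recipe: one shared encoder learns pose, expression, and lighting across two identities; a separate decoder per person reconstructs each face; and the swap consists of decoding person A's frames with person B's decoder. Below is a minimal sketch of that idea, assuming PyTorch; the names and layer sizes are illustrative and do not reproduce any particular tool's implementation.

```python
# Minimal sketch of the shared-encoder / per-identity-decoder recipe behind
# common open-source face-swap tools (illustrative only; sizes are arbitrary).
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 5, stride=2, padding=2), nn.LeakyReLU(0.1),
            nn.Conv2d(64, 128, 5, stride=2, padding=2), nn.LeakyReLU(0.1),
            nn.Conv2d(128, 256, 5, stride=2, padding=2), nn.LeakyReLU(0.1),
            nn.Flatten(),
            nn.Linear(256 * 8 * 8, 512),  # assumes 64x64 face crops
        )

    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(512, 256 * 8 * 8)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.1),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.1),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, z):
        return self.net(self.fc(z).view(-1, 256, 8, 8))

encoder = Encoder()
decoder_a, decoder_b = Decoder(), Decoder()  # one decoder per identity
l1 = nn.L1Loss()

def reconstruction_loss(faces_a, faces_b):
    # Both identities pass through the SAME encoder, forcing it to learn a
    # shared representation of pose, expression, and lighting.
    return (l1(decoder_a(encoder(faces_a)), faces_a)
            + l1(decoder_b(encoder(faces_b)), faces_b))

def swap_a_to_b(faces_a):
    # The swap itself: encode person A, decode with person B's decoder.
    return decoder_b(encoder(faces_a))
```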

A history of trickery

In person, Li looks more cyberpunk than Sunset Strip. His hair is shaved into a Mohawk that flops down on one side, and he often wears a black T-shirt and leather jacket. When speaking, he has an odd habit of blinking in a way that betrays late nights spent in the warm glow of a computer screen. He isn’t shy about touting the brilliance of his tech, or what he has in the works. During conversations, he likes to whip out a smartphone to show you something new. 

Li grew up in Saarbrücken, Germany, the son of Taiwanese immigrants. He attended a French-German high school and learned to speak four languages fluently (French, German, English, and Mandarin). He remembers the moment that he decided to spend his time blurring the line between reality and fantasy. It was 1993, when he saw a huge dinosaur lumber into view in Steven Spielberg’s Jurassic Park. As the actors gawped at the computer-generated beast, Li, then 12, grasped what technology had just made possible. “I realized you could now basically create anything, even things that don’t even exist,” he recalls.

Li got his PhD at ETH Zurich, a prestigious technical university in Switzerland, where one of his advisors remembers him as both a brilliant student and an incorrigible prankster. Videos accompanying academic papers sometimes included less-than-flattering caricatures of his teachers. 

Shortly after joining USC, Li created facial tracking technology used to make a digital version of the late actor Paul Walker for the action movie Furious 7. It was a big achievement, since Walker, who died in a car accident halfway through shooting, had not been scanned beforehand, and his character needed to appear in so many scenes. Li’s technology was used to paste Walker’s face onto the bodies of his two brothers, who took turns acting in his place in more than 200 scenes.

The movie, which grossed $1.5 billion at the box office, was the first to depend so heavily on a digitally re-created star. Li mentions Walker’s virtual role when talking about how good video trickery is becoming. “Even I can’t tell which ones are fake,” he says with a shake of his head. 

The world’s top deepfake artist is wrestling with the monster he created (2)

Virtually you

In 2009, less than a decade before deepfakes emerged, Li developed a way to capture a person’s face in real time and use it to operate a virtual puppet. This involved using the latest depth sensors and new software to map that face, and its expressions, to a mask made of deformable virtual material. 

Most important, the approach worked without the need to add dozens of motion-tracking markers to a person’s face, a standard industry technique for tracking face movement. Li contributed to the development of software called Faceshift, which would later be commercialized as a university spinoff. The company was acquired by Apple in 2015, and its technology was used to create the Animoji software that lets you turn yourself into a unicorn or a talking pile of poop on the latest iPhones.
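Li's 2009 system relied on a depth sensor, but the marker-free principle it helped pioneer is now packaged in commodity libraries. As a rough illustration only, the sketch below uses Google's off-the-shelf MediaPipe Face Mesh (a stand-in chosen for demonstration, not Faceshift or Li's method) to pull a dense set of facial landmarks from an ordinary webcam, with no physical markers.

```python
# Marker-free face tracking with an off-the-shelf model (MediaPipe Face Mesh,
# used here purely as a modern stand-in for the technique described above).
import cv2
import mediapipe as mp

face_mesh = mp.solutions.face_mesh.FaceMesh(
    static_image_mode=False,  # video mode: reuse tracking between frames
    max_num_faces=1,
    refine_landmarks=True,    # adds iris landmarks (478 points total)
)

cap = cv2.VideoCapture(0)     # an ordinary webcam, no depth sensor
for _ in range(300):          # a few hundred frames, then stop
    ok, frame = cap.read()
    if not ok:
        break
    # MediaPipe expects RGB; OpenCV delivers BGR.
    results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_face_landmarks:
        # Normalized (x, y, z) per landmark; streaming these into a
        # deformable face rig is the "virtual puppet" step.
        pts = [(lm.x, lm.y, lm.z)
               for lm in results.multi_face_landmarks[0].landmark]
        print(f"tracked {len(pts)} landmarks")
cap.release()
```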

Li and his students have published dozens of papers on such topics as avatars that mirror whole body movements, highly realistic virtual hair, and simulated skin that stretches the way real skin does. In recent years, his group has drawn on advances in machine learning and especially deep learning, a way of training computers to do things using a large simulated neural network. His research has also been applied to medicine, helping develop ways of tracking tumors inside the body and modeling the properties of bones and tissue.

Today, Li splits his time between teaching, consulting for movie studios, and running a new startup, Pinscreen. The company uses more advanced AI than is behind deepfakes to make virtual avatars. Its app turns a single photo into a photorealistic 3D avatar in a few seconds. It employs machine-learning algorithms that have been trained to map the appearance of a face onto a 3D model using many thousands of still images and corresponding 3D scans. The process is improved using what are known as generative adversarial networks, or GANs (which are not used for most deepfakes). This means having one algorithm produce fake images while another judges whether they are fake, a process that gradually improves the fakery. You can have your avatar perform silly dances and try on different outfits, and you can control the avatar’s facial expressions in real time, using your own face via the camera on your smartphone.
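The adversarial game described here is compact enough to show directly. Below is a deliberately tiny sketch of one GAN training step, assuming PyTorch and toy fully connected networks on vector data (nothing resembling Pinscreen's production models): the generator G turns noise into fake samples, the discriminator D scores real against fake, and each update makes the other network's task harder.

```python
# A toy GAN training step: G fabricates samples, D judges real vs. fake,
# and the two networks improve each other adversarially.
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 64
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def gan_step(real):
    n = real.size(0)
    z = torch.randn(n, latent_dim)

    # Discriminator update: push real samples toward 1, fakes toward 0.
    d_loss = (bce(D(real), torch.ones(n, 1))
              + bce(D(G(z).detach()), torch.zeros(n, 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator update: make D score fresh fakes as real.
    g_loss = bce(D(G(z)), torch.ones(n, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

print(gan_step(torch.randn(32, data_dim)))  # stand-in "real" batch
```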

A former employee, Iman Sadeghi, is suing Pinscreen, alleging it faked a presentation of the technology at the SIGGRAPH conference in 2017. MIT Technology Review has seen letters from several experts and SIGGRAPH organizers dismissing those claims.

Pinscreen is working with several big-name clothing retailers that see its technology as a way to let people try garments on without having to visit a physical store. The technology could also be big for videoconferencing, virtual reality, and gaming. Just imagine a Fortnite character that not only looks like you, but also laughs and dances the same way.

Underneath the digital silliness, though, is an important trend: AI is rapidly making advanced image manipulation the province of the smartphone rather than the desktop. FaceApp, developed by a company in Saint Petersburg, Russia, has drawn millions of users, and recent controversy, by offering a one-click way to change a face on your phone. You can add a smile to a photo, remove blemishes, or mess with your age or gender (or someone else’s). Dozens more apps offer similar manipulations at the click of a button. 

Not everyone is excited about the prospect of this technology becoming ubiquitous. Li and others are “basically trying to make one-image, mobile, and real-time deepfakes,” says Sam Gregory, director of Witness, a nonprofit focused on video and human rights. “That’s the threat level that worries me, when it [becomes] something that’s less easily controlled and more accessible to a range of actors.”

Fortunately, most deepfakes still look a bit off. A flickering face, a wonky eye, or an odd skin tone makes them easy enough to spot. But just as an expert can remove such flaws, advances in AI promise to smooth them out automatically, making the fake videos both simpler to create and harder to detect.

Even as Li races ahead with digital fakery, he is also troubled by the potential for harm. “We’re sitting in front of a problem,” he says. 

Excerpted from MIT Technology Review 08.16

Tags:

  • deepfake
  • face
  • film
