HBM146: Theodora

Computer generated text projected on computer generated waves. Image by Jeff Emtman.

 

How does a computer learn to speak with emotion and conviction? 

Language is hard to express as a set of firm rules.  Every language rule seems to have exceptions, the exceptions have exceptions, and so on.  Typical “if this, then that” approaches to language just don’t work.  There’s too much nuance.

But each generation of algorithms gets closer and closer. Markov chains were invented in the 1800s and rely on nothing more than basic probability.  The idea is simple: look at an input (like a book) and learn the order in which words tend to appear.  With this knowledge, it’s possible to generate new text in the same style as the input, just by looking up which words are likely to follow each other.  It’s simple and sometimes half decent, but not effective for longer outputs, as the approach tends to lack object permanence and generate run-on sentences. Markov models are used today in predictive-text phone keyboards, but can also be used to predict weather, stock prices, and more.
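As a rough illustration (this isn’t code from the episode), a word-level Markov chain fits in a few lines of Python.  Here’s a minimal sketch; the input file book.txt is a placeholder:

```python
import random
from collections import defaultdict

def build_chain(text, order=1):
    """Record which word follows each word (or word pair, etc.) in the input."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain

def generate(chain, order=1, length=50):
    """Walk the chain, sampling each next word by its observed frequency."""
    state = random.choice(list(chain))
    output = list(state)
    for _ in range(length):
        followers = chain.get(tuple(output[-order:]))
        if not followers:          # dead end: this word was never followed
            break
        output.append(random.choice(followers))
    return " ".join(output)

corpus = open("book.txt").read()   # any plain-text input
print(generate(build_chain(corpus)))
```

Because each word is chosen only from what followed the previous word in the training text, the output sounds locally plausible but wanders, which is exactly the run-on problem described above.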

There’ve been plenty of other approaches to language generation (and plenty of mishaps as well).  A notable example is Cleverbot, which chats with humans and heavily references its previous conversations to generate its replies.  Cleverbot’s chatting can sometimes be eerily human, perfectly regurgitating slang, internet abbreviations, and obscure jokes.  But at the end of the day it’s a sly trick, and, as with Markov chains, Cleverbot’s AI still doesn’t always grasp grammar and object permanence.

In the last decade or two, there’s been an explosion in the abilities of a different kind of AI: the artificial neural network.  These “neural nets” are modelled on the way brains work, running stimuli through their “neurons” and reinforcing the paths that yield the best results.

The outputs are chaotic until the network is properly “trained.” But as training approaches its optimal point, a model emerges that can efficiently process incoming data and produce output with the same kinds of nuance, strangeness, and imperfection you expect to see in the natural world.  Like Markov chains, neural nets have a lot of applications outside language too.
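To make “training” concrete, here’s a toy sketch in Python (illustrative only, and nothing like a full language model): a tiny network that starts out guessing randomly at the XOR pattern and, by repeatedly nudging its weights against the error, settles into correct answers:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Four stimuli and the "correct" response for each (the XOR pattern).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 8))   # input -> hidden weights (the "paths")
W2 = rng.normal(size=(8, 1))   # hidden -> output weights

for step in range(5001):
    hidden = sigmoid(X @ W1)    # stimuli flow through the neurons
    out = sigmoid(hidden @ W2)  # the network's current guesses
    error = out - y
    # Backpropagation: strengthen or weaken each path in proportion
    # to how much it contributed to the error.
    d_out = error * out * (1 - out)
    d_hidden = (d_out @ W2.T) * hidden * (1 - hidden)
    W2 -= 0.5 * hidden.T @ d_out
    W1 -= 0.5 * X.T @ d_hidden
    if step % 1000 == 0:
        print(step, out.ravel().round(2))  # chaos early, order later
```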

But these neural networks are complicated, like a brain.  So complicated, in fact, that few try to dissect trained models to see how they actually work.  Tracing a result backwards through the network is difficult, but not impossible.

If we temporarily ignore the real risk that sophisticated AI language models pose for societies attempting to separate truth from fiction, these neural net models allow for some interesting possibilities, namely extracting the language style of a large body of text and using that extracted style to generate new text written in the voice of the original.

In this episode, Jeff creates an AI and names it “Theodora.”  She’s trained to speak like a presenter giving a TED Talk.  The result varies from the believable to the utterly absurd, and causes Jeff to reflect on the continued inability of individuals, AIs, and large nonprofits to distinguish between good ideas and absolute madness.

 

Three bits of raw output from Theodora. These text files were sent to Google Cloud’s TTS service for voicing.

 

On the creation of Theodora:  Jeff used a variety of free tools to create Theodora for the episode.  OpenAI’s Generative Pre-trained Transformer 2 (GPT-2) was wrapped into the Python library gpt-2-simple by Max Woolf, who also created a tutorial demonstrating how to train the model for free using Google Colab.  Jeff used this tutorial to train Theodora on a corpus of about 900 TED Talk transcripts for 5,000 training steps. Jeff then downloaded the model locally and used JupyterLab (Python) to generate new text.  That text was then sent to Google Cloud’s Text-to-Speech (TTS) service, where it was converted to the voice heard on the episode.
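For anyone who wants to retrace those steps, here’s a rough sketch of the pipeline in Python.  The corpus filename and the TTS voice (en-US-Wavenet-F) are placeholders, since the episode doesn’t specify them, and gpt-2-simple depends on TensorFlow 1.x, which is why Max Woolf’s Colab tutorial is the easiest place to run the training step:

```python
# Sketch of the Theodora pipeline: fine-tune GPT-2, generate text, voice it.
import gpt_2_simple as gpt2
from google.cloud import texttospeech

# 1. Fine-tune the small GPT-2 model on the TED Talk corpus.
gpt2.download_gpt2(model_name="124M")
sess = gpt2.start_tf_sess()
gpt2.finetune(sess, "ted_talks.txt", model_name="124M", steps=5000)

# 2. Generate new "TED Talk" text from the fine-tuned model.
text = gpt2.generate(sess, length=500, temperature=0.8,
                     return_as_list=True)[0]

# 3. Send the text to Google Cloud TTS and save the spoken result.
client = texttospeech.TextToSpeechClient()
response = client.synthesize_speech(
    input=texttospeech.SynthesisInput(text=text),
    voice=texttospeech.VoiceSelectionParams(
        language_code="en-US", name="en-US-Wavenet-F"  # placeholder voice
    ),
    audio_config=texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    ),
)
with open("theodora.mp3", "wb") as f:
    f.write(response.audio_content)
```

In the episode’s actual workflow, training happened in Colab and generation happened later in JupyterLab from the downloaded model; the script above compresses those stages into one place for readability.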

Producer: Jeff Emtman
Music: Liance

 
 

James Li, a.k.a. “Liance.” Photo by Alex Kozobolis.

Album art for This Painting Doesn’t Dry.

Sponsor: Liance

Independent musician James Li has just released This Painting Doesn’t Dry, an album about the relationship between personal experiences and the story of humanity as a whole.

James made this album while he anxiously watched his homeland of Hong Kong fall into political crisis.

HBM125: Deepfaking Nixon

Graphic by Jeff Emtman.

 

There’s a beautifully written speech that was never delivered. Written for President Richard Nixon by Bill Safire, the speech elegizes astronauts Buzz Aldrin and Neil Armstrong of Apollo 11, who’d become stuck on the moon and were left to die there.  In reality, Buzz and Neil made it home safely, but this contingency speech was written anyway, just in case. It’s sometimes called The Safire Memo, and sometimes In Event of Moon Disaster.

The latter title shares its name with an installation that’s (as of publish date) on display for the first time at IDFA in the Netherlands.  The project, by Francesca Panetta and Halsey Burgund, explores an alternate past where Aldrin and Armstrong don’t make it home from the moon.  The film portion of the installation heavily features a reading of The Safire Memo by a computer-generated version of President Nixon sitting in the Oval Office, reading from notes, making all the familiar facial expressions, sharing the same vocal tics, presidential timbre, and some of the Nixonian je ne sais quoi that makes the fake nearly believable.

But it’s not Nixon.  And it’s not entirely accurate to say it’s an actor.  It’s a mix of the two: a synthetic Nixon generated by a booming form of artificial intelligence called “deep learning,” which creates mathematical models of complex systems, like speech.  Lewis Wheeler (the actor tasked with providing the voice of Nixon) did not have to imitate Nixon’s voice, only provide proper pacing and intonation.  From there, the artists hired several companies (including Re-Speecher and Vocal ID) to train a computer model that translates Lewis’s voice into Nixon’s.

 

Excerpt from the installation In Event of Moon Disaster by Francesca Panetta and Halsey Burgund. This video is a deepfake.

In Event of Moon Disaster on display at IDFA in The Netherlands. Photo by Francesca Panetta.

 

This kind of deep-learned fakery (known as a “deepfake”) currently falls somewhere in the uncanny valley: the tech is good enough to create a strong impersonation of a voice, but one that still sounds a bit mechanical, or metallic.  This won’t be the case for long, though, as more convincing deepfake voices emerge with each generation of new code.

And on the visual front, current video deepfakes are often good enough to pass the gut check of credibility.  This may have been most famously demonstrated in a BuzzFeed article where comedian Jordan Peele impersonates President Obama’s voice while a video deepfake moves Obama’s face along with the spoken words.

With the 2020 presidential elections looming, it seems almost inevitable that deepfakes will enter the media fray, discrediting political enemies and creating scandals that never happened.  And outside of politics, deepfake pornographers swap the faces of pornographic actresses with those of celebrities, or of female journalists they seek to discredit.

On this episode of Here Be Monsters, Francesca and Halsey tell producer Jeff Emtman that deepfakes aren’t going to rupture society.  We’ve dealt with this before: whether it’s darkroom manipulation or Photoshop, societies eventually learn how to detect deception. But the adjustment period can be rough, and they hope that In Event of Moon Disaster will help educate media consumers on the danger of taking media at face value, whether it’s deepfakes or just old-fashioned photo mis-captioning.

Also on this episode, Ahnjili Zhuparris explains how computers learn to speak, and we listen to audio examples of how computer voices can fail, drawn from the paper Location-Relative Attention Mechanisms for Robust Long-Form Speech Synthesis.  Also heard: a presidential parody deepfake from the user Stable Voices on YouTube.

Producer: Jeff Emtman
Editor: Bethany Denton
Music: The Black Spot

 

HBM117: Grave Oversight

Fire burning near Abyei town, composited with a shadow of a satellite for HBM by Jeff Emtman. Source image by DigitalGlobe.

 

Sudan has been involved in ongoing civil wars since 1983, wars fought over religion, culture, and resources. By 2005, approximately two million civilians had died. In 2011, the southern part of the country voted to secede from the north, creating the new country of South Sudan.  But three regions were still claimed by both north and south: Abyei, Blue Nile, and South Kordofan. These regions are rich in oil and fertile farmland, so politicians and humanitarians predicted violence would follow the secession. Civilians in these regions, mostly farmers and shepherds, would be caught in the middle.

Content Note:
Discussion of genocide

Nathaniel Raymond is a human rights investigator. He was looking into an alleged massacre in Afghanistan when he was introduced to the idea of using satellite imagery for humanitarian purposes. At that time, satellite images were sometimes used to document swelling troop forces and to find the locations of mass graves. But Nathaniel wondered whether satellite imagery could be used proactively: what if he could see an attack coming and sound an alarm before anyone got hurt?

 
 

Nathaniel wasn’t the only one who had this idea. Actor George Clooney had also been researching ways to use satellites as “anti-genocide paparazzi” in Sudan through an organization he co-founded, The Enough Project. The Enough Project, the Harvard Humanitarian Initiative, and others sponsored what became the Satellite Sentinel Project, which partnered with the private satellite imagery company DigitalGlobe. DigitalGlobe gave the SSP permission to point some of its satellites wherever they pleased and take pictures. By December 2010, the Satellite Sentinel Project was in full swing, inventing a new methodology for analyzing satellite imagery of active conflict in real time.

The mission of the Satellite Sentinel Project was threefold:

  1. warn civilians of impending attacks,

  2. document the destruction in order to corroborate witness testimony in later investigations, and

  3. potentially dissuade the governments of both Sudan and South Sudan from returning to war in the first place.

“We wanted to see if being under surveillance would change the calculus… If they knew we were watching, would they not attack?” The Satellite Sentinel Project would release their reports at midnight so that they would be available in time for morning news in East Africa.

Critics of the Satellite Sentinel Project say that South Sudan shouldn’t be a playground for experimental humanitarian efforts bankrolled by a foreign movie star. And Nathaniel says the critiques are valid. “It was always a Hail Mary pass. And, we must be clear, it was always an experiment, which in and of itself is problematic. But… what else are we going to do, sit on our hands?”

The Satellite Sentinel Project released a total of 28 reports over 18 months. The methodology Nathaniel and his team developed is still taught at the Harvard Humanitarian Initiative.

Today, Nathaniel Raymond is a lecturer in Global Affairs at Yale’s Jackson Institute. Special thanks to Ziad al Achkar, one of Nathaniel’s colleagues from the Satellite Sentinel Project, who helped us with this episode.

Producer: Garrett Tiedemann
Editors: Bethany Denton and Jeff Emtman
Music: Garrett Tiedemann

 
Nathaniel Raymond, former Director of Operations at the Satellite Sentinel Project. Photo by Jeff Emtman.


Nathaniel Raymond’s talk on the Satellite Sentinel Project at the 2018 Eyeo Festival.

PBS NewsHour reporting on the Satellite Sentinel Project’s documentation of burned villages in South Sudan.

HBM082: MI5 MI6 KGB CIA

 

John Barner spent his entire childhood fiddling with his dad’s shortwave radio, picking up transmissions from all over the world. He liked the way the sounds crackled, the voices speaking foreign languages, and the eerie whine of transmissions fading in and out of static.

Content Note: Language

One night John got a phone call from one of his friends who also had a shortwave radio. “I think I just found spy stuff,” John’s friend said, “come over.”

John and his friends had found a number station: coded transmissions broadcast on unlicensed frequencies. Number stations are believed to be a form of espionage in which intelligence agencies broadcast encrypted messages to field operatives, though no government has claimed responsibility for their existence.

Number stations come in many forms. Some are beeps or sustained tones. Some are repeated bars of familiar folk songs. The rest are strings of numbers and words from the phonetic alphabet.

 
 

Spectrograms of suspected number stations.

John, like countless other shortwave enthusiasts, has been captivated by the mystery ever since discovering number stations as a teenager. He used to try to crack the coded messages, thinking he’d stumbled onto the X-Files.

Henry Cooke, a technologist and number station enthusiast, believes that it’s the indecipherable code that makes number stations so alluring. He’s found internet forums dedicated to tracking number station broadcasts, and even videos of radio sleuths claiming to have found broadcast locations. Henry sees this as a type of modern folklore: shortwave enthusiasts trading theories about the origins and meanings of number stations are almost like people telling ghost stories around a campfire.

Garrett Tiedemann produced this episode. Garrett also produces the podcast The White Whale. Bethany Denton edited this episode with help from Jeff Emtman and Nick White.

Number Station recordings courtesy of The Conet Project. Full archive can be found here.

Music from John Barner’s new album, Shadow Time.  

 

New HBM Shirts

$18 + Shipping