If you find yourself feeling astounded by AI’s ever-growing capacity, then get ready because we’re about to blow your socks off. Introducing GPT-3 and DALL·E, the latest innovations in natural language processing and neural networks. Before we look at the impact of these technologies, let’s take a quick dive into what they are and how they work.
What is GPT-3?
Simply put, GPT-3 is a relatively new AI innovation, more specifically a language model, that is exceptionally good at creating content that has a language structure. The initials stand for Generative Pre-trained Transformer, and the three at the end denotes that it is the third version created by OpenAI, a research institution co-founded by Elon Musk. GPT-3 generates text using algorithms that are pre-trained. This means that the AI has already been given all the data it needs to carry out the task it’s been asked to do.
More specifically, the algorithms have been provided with around 570 gigabytes of text gathered by crawling the internet, along with texts selected by OpenAI such as articles from Wikipedia. From there, GPT-3 can create anything that has a language structure. It can answer questions, write essays, summarise longer texts and even translate between languages.
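To make the "pre-train first, generate later" idea concrete, here is a deliberately tiny sketch in Python. It is not how GPT-3 works internally (GPT-3 is a transformer with billions of parameters, whereas this is a simple table of which word follows which), and the names `pretrain`, `generate` and the one-line `corpus` are our own illustrative assumptions. The point is only to show the two stages: learn from text up front, then continue a prompt one word at a time.

```python
import random
from collections import defaultdict
from typing import Dict, List


def pretrain(corpus: str) -> Dict[str, List[str]]:
    """'Pre-training' step: record, for every word, the words that followed it."""
    words = corpus.split()
    followers = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return dict(followers)


def generate(model: Dict[str, List[str]], prompt: str,
             max_words: int = 10, seed: int = 0) -> str:
    """Generation step: extend the prompt one word at a time from the learned table."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same text
    output = prompt.split()
    if not output:
        return ""
    for _ in range(max_words):
        options = model.get(output[-1])
        if not options:  # the last word was never followed by anything in training
            break
        output.append(rng.choice(options))
    return " ".join(output)


# Hypothetical one-sentence "training set"; GPT-3 saw hundreds of gigabytes.
corpus = "the fox sat on the mountain and the fox watched the sunrise"
model = pretrain(corpus)
print(generate(model, "the fox"))
```

Every word pair in the output was seen during training, which is why the text sounds vaguely plausible and also why a model like this can only ever echo its training data. GPT-3 follows the same two-stage pattern at a vastly larger scale and with a far more sophisticated model.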
Surprisingly, the content it creates is anything but rigid and computer-like as this article featured in the Guardian illustrates. It does start off a little sticky but by the end, this writer began to feel a little uneasy about what job security might look like in the future. Now, wait until you read about how it has revolutionized image creation with DALL·E.
What is DALL·E?
DALL·E is another AI program developed by OpenAI. Specifically, it is a neural network that has the ability to create images from text captions “for a wide range of concepts expressible in natural language” as the by-line on the website states.
When they say wide they mean wide. The best way to see it is to play with the text captions yourself here. It really is quite incredible and its applications are far from simple. As the article linked above explains, these captions can be as simple as asking it to render images of a sign for a storefront as pictured below:
More complex text captions can ask for things like an emoji of a baby penguin wearing a blue hat, red gloves, a green shirt and yellow pants.
You can play around and change the prompts. Then within seconds, you get a variety of choices for the image you’re looking for. For instance, we chose to look at how it would create a painting of a fox on a mountain at sunrise and we were treated to several types and styles of paintings.
It’s wild and I’m pretty sure all those creatives in the visual arts are beginning to feel as nervous as I was when I learnt about GPT-3. So it seems pertinent to look at what this may do for the creative industry as a whole.
The creative industry is set to change
But not quite yet. DALL·E cannot currently produce the high-quality images expected of professional artwork, especially in advertising and other design disciplines. In fact, its outputs are really more like caricatures, rendered at a resolution of 256×256 pixels, and would ultimately need to be reworked by professional artists or designers. However, it’s an excellent tool for brainstorming and prototyping new concepts.
Looking further into the future, DALL·E will almost certainly disrupt the creative industry in big ways as the technology improves. To date, though, outputs from GPT-3, which has roughly 15x the parameters of DALL·E, still contain many grammatical and factual errors that need to be corrected by a human. So even if DALL·E manages to improve the size and crispness of its images, there will still need to be a human eye watching the production process.
As a creative, you’re also not likely to simply become a proofreader or finishing artist. If you’re smart, you can begin working on some really cool high-concept work. You’ll need to start thinking about things the world has yet to see. While that may sound overwhelming, it’s also pretty exciting.
The possible downside
As DALL·E is trained on large datasets scraped directly from the internet with no attribution, there are concerns around copyright infringement. Some argue that “if it’s on the net, it’s public domain”. However, opinions differ on this, and it seems the only way to know for sure is for the issue to be brought before a court of law. While OpenAI has come out in defence of their technology by saying they are ‘constructing a new dataset of 400 million (image, text) pairs collected from a variety of publicly available sources on the internet’, GPT models are said to sometimes reproduce their training content verbatim. As such, it seems reasonable that creatives are sceptical.
Another concern that plagues all forms of AI is bias in machine learning, which we covered in great detail in our book Working Machines. Like language-generating AI, which has shown elements of both bias and racism, an image generator like DALL·E is vulnerable to creating inappropriate material. In the best situations in the agency space, inappropriate concepts are weeded out during the creative process (most of the time), but with DALL·E you’ll need human intervention to ensure no major faux pas sees the light of day.
The future looks interesting
As an agency that specialises in the development of AI solutions for our clients, we can’t help but be excited about every innovation in this field. However, one of the critical things we’ve realised is that we need to bring our creative teams along for the ride. Ensuring that they’re on board and able to contribute to the new ways of working brought about by AI has helped us become better at what we do and at the approaches we take for our client work.