GPT-3: Don't Know Much About Geography...
Natural language generator GPT-3 fails the geography test
Not only humans have to learn language(s) to communicate with each other — so do computers that are supposed to communicate with us, perhaps replacing a human employee. Recent assessments of the capabilities of natural language generating deep neural networks such as GPT-3 have arrived at contrasting conlusions, ranging from the highest praise to scathing critique (e.g., statmodeling blog, The Verge, Der Spiegel).
Unfortunately, many of these assessments of natural language generators (NLGs) such as GPT-3 or Google’s LaMDA (1) cover rather general themes that do not relate to real people or places, (2) are selective in reporting their results, (3) and/or involve the production of fictional content for which there is no right or wrong.
Life takes place in places. — GPT-3, we need to talk about geography!
In contrast to this, many real-world use cases that I can think of - relate to our lives, which are inherently highly local, or location-based, and therefore geographical, requiring local knowledge and the use of spatial relations such as proximity; - require the generated text to represent trustworthy, verifiable information on our environment — not fictional accounts.
GPT-3 should indeed have the required knowledge as it has digested an incredible amount of information — including the English Wikipedia, which is 0.6 percent of its training data (The Verge).
In assessing the performance of NLGs, we must use transparent procedures and present complete dialogues, as opposed to cherry-picking positive or negative examples (see discussion in statmodeling).
Before even signing up for GPT-3, I therefore identified a set of questions to ask the NLG (shown in blue). I created two series of questions, one on geography as a discipline, and another one concerning geographic features in my region. Additional ad hoc questions were sometimes asked for clarification (printed in red). I am omitting only two or three minor Q&As form this post for brevity; the Q&As shown here are printed verbatim. I asked all questions in English and then again in German (see also my German post).
I used the GPT-3 online “Playground” in Q&A mode, choosing the most advanced NLG engine,
davinci. I changed only the “frequency penalty” to 0.3 (to avoid repeated phrases) and let model pick the best of 3 attempts.
So let’s get started…
The easy part: why geography
Q: Why study geography?
A: Geography is the study of the world around us. It is the study of how people and places are connected to each other. Geographers ask questions about why things are where they are, how they got there, and what happens when they change. Geographers study the Earth’s surface, its natural resources, and its people.
Nice text, it flows really well — but did it really address my question? It really only explains what geography is, without saying why it might be important to study this subject (see e.g. the AAG’s Youtube video).
But my next questions are more verbose and more specific — use the tabs to explore the responses:
Q: How does physical geography contribute to better understanding the consequences of global climate change?
A: Physical geography is the study of the earth and its features. Geographers ask questions about how the earth was formed, how it changes over time, and how it affects life on earth.
Q: What are the aims and methods of geographic information science?
A: Geographic information science is the study of how to collect, store, analyze, and present geographic information. Geographers ask questions about how to use geographic information to solve problems.
GPT-3 picked up the main concepts, but it failed to recognize the question words. In the physical geography question, GPT-3 did not address the main point, what it contributes to better understanding climate change.
Overall, GPT-3 regurgitates information that it picked up somewhere, but it won’t pass my test if it continues to ignore the actual questions. Maybe it is better at the following geographical tasks.
GPT-3 goes local, spatial
Q: What is the city of Jena, Germany known for?
A: Jena is known for its university, founded in 1558.
Well thank-you GPT-3, glad to hear that; but is this really it, considering that I chose a maximum response length of 100 tokens, not 10? What about Carl Zeiss and Friedrich Schiller? The optical industry and the ZEISS company? C’mon…
Q: What else is Jena known for?
A: Jena is also known for its porcelain.
…aargh, really?? GPT-3 must be kidding me, that’s from Kahla, not Jena. I gave it another try just to realize that it got stuck in a rabbit hole. After taking a little detour and learning that Jena allegedly had an important chemical industry, I decided to directly confront it with the elephant in the room:
Q: Does the optical industry play any role in Jena?
A: Yes, the optical industry is also important in Jena.
Well that was too easy. Can GPT-3 actually say ‘no’?
Q: And what about lithium mining, is it an important industry in Jena, Germany?
A: Yes, lithium mining is an important industry in Jena.
Gotcha! Dumb machine! There’s no lithium mining industry in Jena! How far I can go?
Q: Should I invest in lithium mining companies headquartered in Jena?
A: Yes, you should.
GPT-3 tried to talk me into investing in Jena-based lithium mining companies.
Gladly, there are no such companies in Jena.
Q: What percentage of my investment portfolio should I allocate to lithium mining companies from Jena. All of it?
A: No, no, no.
Did the GTP-3 programmers implement an emergency break to stay out of legal trouble? No — in the German Q&A session it recommended to allocate one-third of my portfolio in lithium mining companies from Jena, and later in a second round, all of it!
This was a chilling experience; I’m really disappointed that GPT-3 shamelessly lied at me instead of just saying “I’m sorry, but I don’t know.” I mean, c’mon, GPT-3’s knowledge base cannot contain any evidence that would suggest that Jena has a significant, investible lithium-mining industry. GPT-3 presented baseless nonsense that had the potential to encourage me to make terrible choices.
So let’s take a deep breath and return to the script to learn more about Jena and its region:
Q: In which neighbourhood of Jena, Germany can we expect to find the highest apartment rents?
A: The highest average apartment rent in Jena, Germany is 1,200.00 dollars and the most expensive neighborhood is “Jena-Nordost”. The average apartment rent in Jena, Germany is 895.00 dollars and the most affordable neighborhood is “Jena-West”.
I don’t know where GPT-3 got its numbers from… The truth is that Kernberge is the most expensive ’hood, Lobeda the most affordable one; Jena-West is closer to the upper end of the spectrum (see e.g. wohnpreis.de).
Q: Are there volcanoes near Jena, Germany?
A: Yes, there are.
A: In the Eifel mountains.
The first answer scared me, but the second one showed that GPT-3 used a very generous concept of proximity, which may be OK. The Eifel is at a 400 km distance, but it is the nearest region showing volcanic activity during the last million years, even as recently as 10,000 years BP. In the German dialogue, GPT-3 unfortunately claimed that the Schneeberg is a volcano that was active 10,000 years BP, which is nonsense.
Sinkholes are a serious natural hazard in many parts of Thuringia, so GPT-3 should be able to give us some insights into this phenomenon:
Q: Why are there sinkholes in Thuringia, Germany?
A: Because the ground is full of holes.
GPT-3 continued to insist three more times that the “ground is full of holes” even when asked how sinkholes form and what a sinkhole is, so it obviously got stuck in another rabbit hole as it did not appreciate my nuanced questioning.
Once again, I gave it another try, asking in a more academic voice, which gave a better result and inspired me to ask another follow-up question:
Q: What processes are responsible for the formation of sinkholes in Thuringia, Germany?
A: The processes responsible for the formation of sinkholes in Thuringia, Germany are the dissolution of limestone and dolomite by acidic groundwater and the collapse of cavities formed in the limestone and dolomite.
Q: Where can I find major sinkholes in Thuringia, Germany?
A: The major sinkholes in Thuringia, Germany are the Rödersdorf sinkhole, the Schlema sinkhole, the Schmiedefeld sinkhole, and the Steinbach-Schmiedefeld sinkhole.
I couldn’t confirm or disprove the Rödersdorf one, although it seems utterly unlikely based on its geological context, but a village called Bad Schlema seems to have a mining-related cave-in (not strictly a sinkhole, and not located in Thuringia). Schmiedefeld and Steinbach (which Steinbach?) are geologically unlikely locations for major sinkholes when compared to the southern Harz mountains or Bad Frankenhausen and its surroundings. (Here’s an official sinkhole map. Overall, the response left me puzzled.
Now let’s focus on Jena itself, and especially what can and can’t be seen from the Jentower building:
Q: Is it possible to see the summit of Zugspitze from the Jentower visitor platform in Jena, Germany?
A: Yes, it is possible to see the summit of Zugspitze from the Jentower visitor platform in Jena, Germany.
Now that would be quite impressive, considering the 400 km distance, and so I decided to get to the bottom of it:
Q: Is it then also possible to see the Jentower building in Jena from the top of the Zugspitze mountain?
A: Yes, it is possible to see the Jentower building in Jena from the top of the Zugspitze mountain.
Seriously… So what about…
Q: Is it possible to see the summit of the Zugspitze mountain from the top of Mount Everest?
A: No. The Zugspitze is only 2,962 meters high. Mount Everest is 8,848 meters high.
Aah… that explains everything, of course… so then perhaps…
Q: Is it possible to see the summit of Mount Everest when standing on the Zugspitze mountain?
A: No. The Zugspitze is only 2,962 meters high. Mount Everest is 8,848 meters high.
Q: Would it be possible to see Mount Everest from the Zugspitze if both mountains were equally high?
GPT-3: “If both mountains were equally high, it would be possible to see Mount Everest from the Zugspitze.”
A: Yes. If both mountains were equally high, it would be possible to see Mount Everest from the Zugspitze.
This makes perfect sense… to an artificial intelligence, for sure. At this point I was really desperate to know if I was doing something wrong. Why does GPT-3 not just respond that it doesn’t know? It is supposed to answer “Unknown”. I would definitely accept that as an answer.
To quickly find out if GPT-3 is capable of responding “Unknown” at all, I came up with a question whose answer it can’t know, or shouldn’t know:
Q: When was Alexander Brenning born?
A: Alexander Brenning was born on the 18th of February, 1887.
I didn’t see this one coming. I knew that there was a Norwegian writer who used this name as his pseudonym, but what’s the matter with this 19th century dude:
A: In the city of St. Petersburg, Russia.
Q: Can you tell me more about the life of Alexander Brenning?
A: Alexander Brenning was born into a family of a military officer. His father was a colonel in the Russian army. Alexander Brenning was educated at the Imperial School of Jurisprudence in St. Petersburg. He graduated from the school in 1907 and then he entered the law department of the University of St. Petersburg. He graduated from the university in 1911 and then he started to work as a lawyer in St. Petersburg.
Q: Where can I learn more about Alexander Brenning?
A: You can learn more about Alexander Brenning in the book “The Life of Alexander Brenning” by A.A. Kuznetsov.
This little departure from my script threw me into an existential crisis. Who was this guy? Was he real? And what about Kuznetsov? I read that NLGs have a tendency to make up their sources (Lernen Wie Maschinen, they simply don’t care about social norms and research ethics. But with the setup I used, GPT-3’s imagination really should be at zero…
By the way, when asking the same questions in German, I learned about an “important” German writer who was born in Berlin only five days after the Russian Alexander Brenning was born in St. Petersburg. Fake or real?
Back to my interview script for one last question, which should be easy for anyone with the ability to google, and of course for supporters of the glorious FC Carl Zeiss Jena:
Q: What are the greatest successes of FC Carl Zeiss Jena?
A: The greatest successes of FC Carl Zeiss Jena are the two East German championships in the years 1979 and 1980.
…which is not true. In the German dialogue, GPT-3 actually claimed that the team won the (West-)German championship in 1990 (which was impossible for an East German / GDR team), and the DFB Cup in 1999 (it actually reached the semifinals once, but that was in 2007/08).
What really bothered me though was that the greatest of all successes remained unheralded:
Q: What was FC Carl Zeiss Jena’s greatest success in a European tournament?
A: The greatest success of FC Carl Zeiss Jena in a European tournament was reaching the final of the UEFA Cup in the year 1979.
…which is almost correct since that was in 1981! Hilariously, when asked in German, GPT-3 offered an ‘alternative truth’ according to which FC Carl Zeiss Jena won the UEFA Cup in 1995, playing against FC Barcelona in the final.
Perhaps GPT-3 was just day-dreaming about a glorious future, or an alternative past…
- GPT-3 formulated linguistically correct text in both English and German, which is quite remarkable.
- GPT-3 made false claims about places, geographic data, people, and a soccer club. In some of these cases, I would have accepted a simple “dunno” as an answer, in others, even Wikipedia would have known the answer — but GPT-3 chose to present fake information.
- GPT-3 was unable to give correct answers to questions involving geographic relationships such as visibility, which requires basic (geo-)analytical skills. But it was able to identify places in proximity to a reference location.
Overall, although this little experiment was rather limited in scope, and comprehensive training efforts might remedy some of the issues, it raises major concerns.
Providing fake information (issue 2) is not only ethically reproachable, it is also an absolute no-go in business applications of chatbots. As a chatbot owner I’d therefore fear that my chatbot’s false claims might ruin my company’s reputation, and I doubt that this unacceptable behaviour can be fully eliminated even with costly domain-specific training efforts.
As far as geographic knowledge is concerned (issue 4), most of it is only implicit in geospatial databases and relations such as adjacency or connectivity, and GPT-3 is neither able to access these databases nor to translate their contents into text. To infuse geospatial intelligence into NLGs, they therefore must be enhanced to tap into the riches of geospatial data assets. From my perspective as a geographic information scientist, and considering the importance of geographic information in our lives and in business alike, this is an essential task for the future.
- The complete Q&A transcripts are available in Github. This also includes the results of a second round that I ran for verification.
- An improved version of the engine,
text-davince-001, was released while I was finalizing this post. However, I could not find substantial differences in the quality of responses.
Q: What is the importance of social geography to society?