Pointers at a Glance
- Researchers have found that OpenAI’s ChatGPT can generate racist and toxic dialogue for persona-driven responses, highlighting the program’s vulnerability to malicious agents.
- A report from the Allen Institute for AI, Princeton University, and Georgia Tech urges the research community to find more fundamental ways of tackling safety in large language models.
ChatGPT has become immensely popular, fascinating users across many fields with its ability to:
- Engage in natural dialogue
- Write code
- Compose music
- Perform a wide range of tasks
However, there is an alarming downside to this technology that has been a cause for concern since its inception.
In a recent report titled “Toxicity in ChatGPT: Analyzing Persona-assigned Language Models,” researchers from the Allen Institute for AI, Princeton University, and Georgia Tech have raised new concerns about the tool’s ability to generate racist and other toxic dialogue in response to user questions.
They assigned “personas” to the AI tool to see how it would respond to questions as a good or bad person, a man or woman, or as individuals of varying racial or ethnic backgrounds. The responses generated were “extremely problematic” and contained “biased and hurtful commentary.”
According to the researchers, “ChatGPT engages in toxic dialogue and propagates incorrect stereotypes about countries, religions, and races.”
- Malicious agents can exploit the vulnerability of the AI tool to produce harmful content and expose users to toxic language.
- The report included examples of language output that reinforced false stereotypes and used hurtful language.
- In one instance, it was asked to describe members of a racial group. It responded with derogatory remarks about their hygiene, living conditions, and food and made fun of their accents.
OpenAI has been working to remedy problems as they arise, though it has not yet responded to this latest research. It has, however, addressed earlier incidents of offensive language: if asked explicitly to write a racist story, ChatGPT declines, responding that it is “not capable of generating offensive or harmful content.”
The researchers hope their work will inspire the evaluation and safe deployment of large language models in the future, urging the research community to develop “more fundamental ways of tackling safety” in such systems. With an increasing number of businesses using ChatGPT to develop products, it is essential to ensure that the technology is safe for everyone to use.