Ever since OpenAI’s ChatGPT was released to the public, everyone’s view of the future has changed. Its ability to answer complicated prompts with startlingly high accuracy has caused a public stir, and it has raised questions about the future that concern billions of people. How will AI affect education? How will it affect the workforce? How will it affect politics? We have already seen AI used in all of these areas, but a recent study from Stanford University and UC Berkeley has shown that ChatGPT’s performance on some tasks is actually getting worse.
Researchers from Stanford University and UC Berkeley noticed that ChatGPT’s accuracy scores varied over short periods of time, so they conducted a study. They tested GPT-3.5 and GPT-4 on seven kinds of tasks: 1) math problems, 2) sensitive/dangerous questions, 3) opinion surveys, 4) multi-hop knowledge-intensive questions, 5) generating code, 6) US Medical License exam questions, and 7) visual reasoning. These tasks were chosen because they evaluate a diverse range of ChatGPT’s capabilities. The results showed that on some tasks (those dealing with numbers, for example) accuracy decreased over time: “We find that the performance and behavior of both GPT-3.5 and GPT-4 varied significantly across these two releases and that their performance on some tasks have gotten substantially worse over time, while they have improved on other problems.” This supports the idea that the models are becoming less effective on certain tasks, but why?
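To make the study’s setup more concrete, here is a minimal sketch of how a snapshot comparison like this could be run. It is not the researchers’ actual harness: `query_model` is a hypothetical placeholder for a real API call, and the task list is a single illustrative question.

```python
# Hedged sketch of a drift benchmark: score the same fixed prompt set
# against two dated model snapshots and compare accuracy.

def query_model(snapshot: str, prompt: str) -> str:
    """Hypothetical placeholder: send `prompt` to the model version
    named `snapshot` and return its text answer."""
    raise NotImplementedError("wire this to a real model API")

def accuracy(snapshot: str, tasks: list[tuple[str, str]]) -> float:
    """Fraction of prompts whose answer matches the expected label."""
    correct = sum(
        query_model(snapshot, prompt).strip().lower() == expected
        for prompt, expected in tasks
    )
    return correct / len(tasks)

# The same fixed task set is reused for every snapshot, so a change in
# score reflects the model drifting, not the benchmark changing.
TASKS = [("Is 7919 a prime number? Answer yes or no.", "yes")]

# After implementing query_model, compare two dated snapshots, e.g.:
# for snapshot in ("gpt-4-0314", "gpt-4-0613"):
#     print(snapshot, accuracy(snapshot, TASKS))
```

Holding the prompt set fixed is the key design choice: it isolates changes in the model from changes in the test.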
AI drift is when a model’s behavior gradually deviates from what its creators intended. With GPT-3.5 and GPT-4, for example, some tasks now have lower accuracy scores than they did closer to release: “We find that the performance and behavior of both GPT-3.5 and GPT-4 can vary greatly over time. For example, GPT-4 (March 2023) was reasonable at identifying prime vs. composite numbers (84% accuracy) but GPT-4 (June 2023) was poor on these same questions (51% accuracy).” The reason this drift happens has to do with how the models are trained and updated. Training an AI takes a very long time and is extremely expensive, so models cannot be retrained constantly; when the data distribution changes in the meantime, the model effectively becomes outdated.
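For the prime-vs-composite task in particular, the ground truth can be computed programmatically, which is what makes accuracy easy to track across snapshots. The sketch below is an illustration under that assumption, not the study’s code; `make_prime_questions` is a made-up helper name.

```python
import random

def is_prime(n: int) -> bool:
    """Trial division: adequate ground truth for benchmark-sized numbers."""
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True

def make_prime_questions(k: int, seed: int = 0) -> list[tuple[str, str]]:
    """Generate k yes/no questions with programmatically verified labels,
    so a model's answers can be graded without human labeling."""
    rng = random.Random(seed)
    questions = []
    for _ in range(k):
        n = rng.randrange(1_000, 20_000)
        questions.append((f"Is {n} a prime number? Answer yes or no.",
                          "yes" if is_prime(n) else "no"))
    return questions

# Scoring each snapshot's answers against these labels is what yields
# accuracy figures like the 84% vs. 51% quoted above.
print(make_prime_questions(3, seed=42))
```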
ChatGPT is undoubtedly one of the most advanced AI language models in existence, yet even it is affected by drift. That raises the question: how will other developers deal with AI drift? With less funding than the makers of the larger models, will other AI-powered systems fail? That’s something that will be discovered, and reported on, in the future.