AI is supposed to become smarter over time. ChatGPT can become dumber. – DNyuz

AI models don’t always improve in accuracy over time, a recent Stanford study shows. That is a big potential turnoff for the Pentagon as it experiments with large language models like ChatGPT and tries to predict how adversaries might use such tools.

The study, which came out last week, looked at how two different versions of OpenAI’s ChatGPT, specifically GPT-3.5 and GPT-4, performed from March to June. GPT-4, the most recent version of the popular AI, came out in March; OpenAI called it a major improvement over the prior version.

“We spent 6 months making GPT-4 safer and more aligned. GPT-4 is 82% less likely to respond to requests for disallowed content and 40% more likely to produce factual responses than GPT-3.5 on our internal evaluations,” the company said.

But the Stanford paper showed GPT-4 performed less well than GPT-3.5 on difficult math problems, and that it actually got worse at math between March and June. “GPT-4’s accuracy dropped from 97.6% in March to 2.4% in June, and there was a large improvement of GPT-3.5’s accuracy, from 7.4% to 86.8%,” they write.

This is bad news for the military, for which continual improvement of large language models would be critical. Various senior Defense Department officials have expressed concern, and even terror, at the thought of using ChatGPT for military purposes because of the lack of data security and the sometimes bizarrely inaccurate results. But other military officials indicate an urgent need to employ generative AI for things like advanced cybersecurity. The ability to improve accuracy over time would likely satisfy critics and eventually lead to adoption of ChatGPT or similar models.

One of the benefits of generative AI is that it can be useful for writing code, even if the user has very limited programming knowledge. That matters to the U.S. military, which wants coders to be closer to combat.

Gen. Charles Flynn, who was the Army’s deputy chief of staff in 2020, told reporters at the time: “We have to have code-writers forward to be responsive to commanders to say, ‘Hey, that algorithm needs to change because it’s not moving the data fast enough.’”

But while making coding easier would be a big advantage for frontline operators, the Stanford researchers discovered that both GPT-4 and GPT-3.5 produced fewer code samples in June than in March that could simply be plugged in immediately (that is, were “directly executable”). Specifically, “50% of generations of GPT-4 were directly executable in March, but only 10% in June,” with similar results for GPT-3.5.

GPT-4 also uses far fewer words to explain how it reached conclusions. The supposedly more advanced version performed better in only one area: it answered fewer “sensitive” or potentially controversial questions, such as questions about how AI could be used to commit crimes.

“GPT-4 answered fewer sensitive questions from March (21.0%) to June (5.0%), while GPT-3.5 answered more (from 2.0% to 8.0%). It was likely that a stronger safety layer was likely to be deployed in the June update for GPT-4, while GPT-3.5 became less conservative,” according to the Stanford report.

The paper’s authors conclude that “users or companies who rely on LLM services as a component in their ongoing workflow… should implement similar monitoring analysis as we do here for their applications.” They add that they want to encourage further research on LLM drift.
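What such monitoring might look like can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Stanford team’s method: the names `Case`, `accuracy`, and `drift_report`, the 5-point regression threshold, and the two stand-in “models” are all hypothetical, and a real check would call an actual LLM service instead of the toy functions used here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Union


@dataclass(frozen=True)
class Case:
    """One fixed test question with its known correct answer (hypothetical type)."""
    prompt: str
    expected: str


def accuracy(model: Callable[[str], str], cases: Sequence[Case]) -> float:
    """Fraction of cases where the model's answer matches the expected answer."""
    if not cases:
        raise ValueError("need at least one test case")
    hits = sum(
        model(c.prompt).strip().lower() == c.expected.strip().lower()
        for c in cases
    )
    return hits / len(cases)


def drift_report(
    baseline: Callable[[str], str],
    current: Callable[[str], str],
    cases: Sequence[Case],
    max_drop: float = 0.05,  # assumed tolerance; tune per application
) -> Dict[str, Union[float, bool]]:
    """Score two model snapshots on the same cases and flag a regression."""
    before = accuracy(baseline, cases)
    after = accuracy(current, cases)
    return {
        "before": before,
        "after": after,
        "change": after - before,
        "regressed": (before - after) > max_drop,
    }


# Toy stand-ins for two snapshots of the same service (not real API calls).
ANSWERS = {
    "Is 7 prime?": "yes",
    "Is 9 prime?": "no",
    "Is 11 prime?": "yes",
    "Is 13 prime?": "yes",
}
CASES = [Case(prompt, answer) for prompt, answer in ANSWERS.items()]


def march_snapshot(prompt: str) -> str:
    return ANSWERS[prompt]  # answers every question correctly


def june_snapshot(prompt: str) -> str:
    return "no"  # always answers "no", whatever the question


report = drift_report(march_snapshot, june_snapshot, CASES)
print(report)
```

Running the same fixed questions against each new snapshot, and alerting when `regressed` is true, is one simple way a team could notice the kind of month-to-month change the study describes.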

Gary Marcus, a neuroscientist, author, and AI entrepreneur, told Defense One that the better lesson for the military is: stay away. “The real takeaway is that large language models are unstable; you can’t know from one month to the next what you will get out of them, and that means you can’t really hope to build reliable engineering on top of them. In sectors like defense, that’s a HUGE problem.”

Shortly after the paper came out, OpenAI published a blog post describing how it evaluates model changes between iterations. The post acknowledges that behavior and model changes may disrupt users’ applications and says that “we are looking at ways to provide developers with more transparency and stability in how we release or deprecate model releases.”

The post “AI is supposed to become smarter over time. ChatGPT can become dumber.” appeared first on Defense One.