Generative AI programs like ChatGPT and Google’s Bard have captivated public attention and lawmaker scrutiny, though so far the Pentagon has been reluctant to adopt them. But Thursday, the Defense Department announced a task force to understand how it might use such tools safely, reveal situations where it’s unsafe for the department to use them, and explore how countries like China might use generative AI to harm the United States.
The task force, dubbed Task Force Lima, “will assess, synchronize, and employ generative artificial intelligence (AI) across the Department,” according to a draft press statement viewed by Defense One.
Generative AI refers to “a category of AI algorithms that generate new outputs based on the data they have been trained on,” according to the World Economic Forum. That’s very different from far simpler machine-learning algorithms that take structured data, such as numerical values, and output the statistically most likely next outcome. Public-facing generative AI tools include so-called large language models like ChatGPT that can write new text that is largely indistinguishable from human speech. These tools have already been used to write essays, business plans, and even scientific papers. But because large language models are trained on data corpora as large as the searchable Web, they sometimes lie, or, in the parlance of AI researchers, “hallucinate.” For this reason, Pentagon officials have expressed reluctance to embrace generative AI.
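To make that distinction concrete, here is a minimal, hedged sketch in Python: a conventional regression model maps structured numbers to a single predicted number, while a generative model takes a free-form prompt and produces new text. It assumes the scikit-learn and Hugging Face transformers libraries; the data and the small open gpt2 model are illustrative only, not anything the Pentagon uses.

```python
# Minimal contrast between conventional ML and generative AI.
# Assumes scikit-learn and Hugging Face transformers are installed;
# the data and model choice (gpt2) are illustrative only.
from sklearn.linear_model import LinearRegression
from transformers import pipeline

# Conventional machine learning: structured numeric inputs in,
# a single statistically likely value out.
hours_flown = [[100], [200], [300], [400]]
maintenance_cost = [5.0, 9.8, 15.1, 20.2]
model = LinearRegression().fit(hours_flown, maintenance_cost)
print(model.predict([[250]]))        # -> one predicted number

# Generative AI: free-form text in, newly generated text out.
generator = pipeline("text-generation", model="gpt2")
result = generator("Write a short memo about aircraft maintenance:",
                   max_new_tokens=50)
print(result[0]["generated_text"])   # fluent text, with no guarantee it is factual
```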
The new effort will be led by Craig Martell, the Defense Department’s chief digital and artificial intelligence officer. Martell said that much was still in flux, but that the main goal is to identify use cases within the Defense Department where generative AI can assist with the work while reducing the risks and difficulties of using generative technology. “For example…if I need to do first-draft generation of some document, that’s fine, because I’m going to take full responsibility to edit that document, make sure it’s actually correct before I pass it up the line, because my career is on the line.”
However, there are a number of use cases where the risk of hallucination will be too high to employ a large language model, such as “anything kinetic,” or anything involving lethal weapons, he said.
“Let’s just get clear on what the acceptability conditions are,” Martell said. For instance, if someone in the department needs to quickly summarize a large amount of text into a legal document, an LLM might be a good tool for that purpose, but only if the user can address certain concerns: “How can we mitigate the hallucinations in this text? If there are procedures by which that text can easily be checked, then we’re probably going to be on board with that.”
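As a rough illustration of that summarize-then-verify workflow, the sketch below asks a model to cite a source paragraph for every sentence and flags any sentence without a citation for human review. The call_llm() helper is a hypothetical stand-in for whatever model endpoint might be approved; nothing here reflects an actual Defense Department tool.

```python
import re

def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around an LLM API; returns generated text."""
    raise NotImplementedError("Wire this to an approved model endpoint.")

def summarize_with_review(source_text: str) -> dict:
    # Ask the model for a summary in which every sentence cites its source paragraph.
    summary = call_llm(
        "Summarize the following text for a legal document. "
        "After every sentence, cite the paragraph it came from as [para N].\n\n"
        + source_text
    )
    # Simple hallucination check: flag sentences that carry no citation,
    # so a human reviewer knows exactly what to verify against the source.
    sentences = re.split(r"(?<=[.!?])\s+", summary)
    uncited = [s for s in sentences if s and not re.search(r"\[para \d+\]", s)]
    return {"summary": summary, "needs_human_check": uncited}
```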
Martell came to the Defense Department from Lyft, and he’s publicly warned of the dangers large language models can pose. But, he says, such models “aren’t a Pandora’s Box, either.” The Defense Department has to understand where they can be used safely–and where adversaries might deploy them, he said. “There’s going to be a set of use cases that are going to be new attack vectors, not just to the DOD, but to corporations as well. And we’re going to have to… to figure out diligently how to solve those.”
The task force will also help the Pentagon better understand what it needs to buy to achieve its new AI ambitions. That could mean more cloud-based services, synthetic or real data, new models, or nothing at all. The effort is too young to know exactly what it will mean for industry, Martell said.
“There’s no doubt about it, if we decide to build our foundational model we will be increasing compute power,” he said; building a foundation model requires a great deal of computing. If the department instead buys a model from someone else, or takes an open-source one and fine-tunes it, it may need different kinds of contracts.
Martell said a broader question is whether the Defense Department even has enough use cases where generative AI could be helpful, given a better understanding of the risks. While the department holds a lot of carefully curated internal data, terabytes of jet engine diagnostics or years of drone surveillance footage over the Middle East do not a large language model make. Large language models are inherently unreliable in part because they must be trained on enormous corpora of text, and it is still unclear whether that unreliability can be quantified precisely enough to determine the risk.
“It’s an open question whether we have enough data, with broad enough coverage, that the value of the model could be maintained without that pre-trained data. On the other hand…my hypothesis is, the more pre-trained data, the more likely the hallucinations. This is a question we will have to investigate,” he said. “I don’t know the answer yet. I’m not sure the scientific community does.”
The task force could also be a great help to industry, particularly for companies that want to create products and services that meet Defense Department standards. For instance, Martell said, one reason programs like ChatGPT aren’t suitable for the Defense Department right now is the amount of prompt engineering required to produce suitable results. Lengthy prompt trees are fine for hobbyists, but an operator who has several other complex tasks to perform needs an interface that is intuitive and works well from the start.
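A hedged sketch of that contrast: hobbyist use often means hand-tuning a chain of prompts, while an operator needs that scaffolding hidden behind one simple call. The ask_model() helper below is a hypothetical single-turn LLM wrapper, not a real Defense Department interface.

```python
def ask_model(prompt: str) -> str:
    """Hypothetical single-turn LLM call."""
    raise NotImplementedError("Connect to a model of your choice.")

def hobbyist_prompt_tree(question: str, context: str) -> str:
    # Step 1: restate the task in the model's own words.
    restated = ask_model(f"Restate this request precisely: {question}")
    # Step 2: extract only the relevant facts from the supplied context.
    facts = ask_model(f"List facts from this text relevant to: {restated}\n{context}")
    # Step 3: answer using only those facts, then self-check the answer.
    answer = ask_model(f"Using only these facts, answer: {restated}\nFacts:\n{facts}")
    return ask_model(f"Check this answer against the facts and correct errors:\n{answer}\n{facts}")

def operator_interface(question: str, context: str) -> str:
    # What an operator actually needs: one intuitive call, with the
    # prompt engineering automated away behind the interface.
    return hobbyist_prompt_tree(question, context)
```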
“There’s lots of research that has to be done, not just for the DOD but in the community as a whole, about those two pieces: what [does] automated prompt engineering and context mean, and what [does]…automatic hallucination mitigation look like?” he said, adding that those answers are still unknown.