Using Large Language Models (like ChatGPT) in the Federal Public Service
In recent months, there has been significant growth in the field of large language models (LLMs). Large language models use machine learning algorithms to process vast amounts of data and generate human-like textual responses based on that data. One product that has gained widespread attention is OpenAI's ChatGPT. Because of its early entry into the market, it currently has a strong advantage over its competitors.
Large language models like ChatGPT make predictions based on patterns they observe in the data. Specifically, these models try to predict which words are likely to come next in a conversation or written passage. With enough training data, LLMs can produce responses that seem natural and coherent. Because of the impressive capabilities of this new technology, many companies and organizations around the world are eagerly exploring its potential applications. Governments, including Canada, are also taking notice of LLMs.
What is ChatGPT?
ChatGPT is a commercial application of an LLM. It's a branded product, the way that a "cup of Tim's" is different from a cup of coffee. OpenAI hosts a huge amount of cloud computing resources that actually run their LLM model, and they provide a web interface to access it. They also offer an application programming interface (API) for those who want to build products that use their LLM.
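For the technically curious, a call to that API is quite simple. The sketch below is illustrative only: the model name and prompt are placeholders, and remember that anything sent this way leaves your device for OpenAI's servers.

```python
# A minimal sketch of calling the OpenAI chat completions API over HTTPS.
# Assumes an API key is available in the OPENAI_API_KEY environment variable.
import os
import requests

response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-3.5-turbo",  # placeholder model name
        "messages": [{"role": "user", "content": "Summarize this paragraph: ..."}],
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```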
Concerns with ChatGPT
OpenAI currently offers ChatGPT as a "free-to-use" product, but by default it retains both the prompts users send and the responses they receive. OpenAI also sells access to a more advanced model, GPT-4, which can take certain actions on behalf of users (like web searches).
There are also unresolved ethical concerns about how some of the training data was gathered. Indeed, the Office of the Privacy Commissioner has opened an investigation, which remains ongoing as of this writing.
To be clear, sending information to ChatGPT should be considered as insecure as posting it to a public discussion forum.
What are large language models, more generally?
Large language models (LLMs) work by predicting the next word in a conversation over and over again.
Designers of LLMs collect and use large amounts of data to train the models so they get better at predicting what words will come next in a sequence. This data can come from a wide variety of sources, but most organizations have been extracting and collecting the huge quantities of data required from the internet. They then supplement the scraped data with thousands of human-crafted examples of how to behave when interacting with a user.
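To make "predicting the next word" concrete, here is a deliberately tiny sketch in Python. It uses simple word counts instead of a neural network; real LLMs are vastly more sophisticated, but the core task, predicting the next token from context, is the same.

```python
# A toy illustration of next-word prediction: a bigram model that counts
# which word follows which in its training text, then picks the likeliest.
from collections import Counter, defaultdict

training_text = "the model predicts the next word in the next sentence"
words = training_text.split()

follows = defaultdict(Counter)
for current, nxt in zip(words, words[1:]):
    follows[current][nxt] += 1

def predict_next(word):
    """Return the most frequent word seen after `word` in training."""
    candidates = follows.get(word)
    return candidates.most_common(1)[0][0] if candidates else None

print(predict_next("the"))  # -> "next" (seen twice after "the")
```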
It's very expensive to train and fine-tune a large language model from scratch—typically in the millions of dollars. Recent open-source efforts are bringing down that cost. An LLM is also fairly costly to run (typically pennies per question or dollars per hour), but nowhere close to what it costs to train one.
Their size is measured in "parameters," meaning the number of internal connections that make up the model, typically in the tens of billions. The GPT-3 family of models behind ChatGPT, for example, comes in sizes ranging from 1.3 billion to 175 billion parameters. GPT-4's size hasn't been disclosed by OpenAI, but it is widely estimated to have well over a trillion parameters.
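To see why parameter counts matter in practice, consider a back-of-envelope calculation of the memory needed just to hold a model's weights. The figures below assume 2 bytes per parameter (16-bit precision), which is an illustrative simplification, not a vendor specification.

```python
# Back-of-envelope arithmetic: rough memory needed just to hold a model's
# weights, assuming 2 bytes per parameter (16-bit floating point).
BYTES_PER_PARAM = 2  # fp16

for name, params in [("1.3B", 1.3e9), ("175B", 175e9)]:
    gb = params * BYTES_PER_PARAM / 1e9
    print(f"{name} parameters -> roughly {gb:,.0f} GB for weights alone")
# 1.3B -> ~3 GB (can fit on a single desktop GPU)
# 175B -> ~350 GB (requires many data-centre GPUs)
```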
Newer LLM-based products can take actions on behalf of users, like looking up web content, providing emotional effects to online avatars, moving robots, or taking business actions (with robotic process automation [RPA] or other vendor-specific integrations).
What alternatives are there to ChatGPT?
There has been a flood of LLM alternatives in the last few months. Some of the more interesting options are listed below:
- HuggingChat: This LLM, which follows a free-to-use model similar to ChatGPT's, is hosted by Hugging Face. It has 30 billion parameters. The model is open to inspection, but the training data is not.
- LLaMA: This LLM from Meta, the parent company of Facebook, is designed for research purposes. It comes in several sizes ranging from 7 to 65 billion parameters. Although it is open to inspection, the license has restrictions on how the software can be used.
- Dolly 2.0: This LLM from Databricks is also research oriented and varies in size from 2.8 to 12 billion parameters. It's small enough that it can be self-hosted. Dolly 2.0 is distinct in that it is open to inspection, all its data has been ethically sourced, and the user can control where their data goes.
- Nomic AI's GPT4All: This project gathers a large selection of open-source LLMs of varying sizes and usage licenses and provides a common interface to them. GPT4All is distinct in that the models are optimized well enough to run on ordinary desktops or servers (see the sketch after this list).
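As a small illustration of how approachable self-hosting has become, here is a sketch using GPT4All's Python bindings (pip install gpt4all). The model filename is a placeholder, and the exact interface may vary between versions of the library.

```python
# A minimal sketch of running a local LLM with the GPT4All Python bindings.
# Nothing leaves your machine: the model file is downloaded once and
# inference runs on the local CPU.
from gpt4all import GPT4All

# Placeholder filename; consult the GPT4All model catalogue for current options.
model = GPT4All("orca-mini-3b.ggmlv3.q4_0.bin")
reply = model.generate("Briefly explain what a large language model is.",
                       max_tokens=200)
print(reply)
```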
Large language models exist in a very fast-moving and dynamic space. By the time this article is published, new LLMs will likely have been released and existing ones updated. Take the time to explore various LLMs on your own.
What can LLMs do for us as public servants?
Large language models excel at generating lots of text quickly. They can expand on, summarize, or change the tone of something you’ve already written. They can provide a starting point for drafting a document or brainstorming the content of an email or social media post. They can even write simple source code!
What are some of the general concerns with LLMs?
The fundamental challenge with LLMs is that they are not responsible for the content that they generate. That responsibility remains with you, the user.
And although it can be tempting to assume that LLM-generated content is accurate and correct, there are several less-than-obvious ways that LLMs can fail:
- Timeliness: LLMs stop learning at the moment they are created. This means that they may generate content that does not take into account recent events.
- "Hallucination": When an LLM is asked to generate content on which it had little or no training, it will fill in the gaps with the nearest information it has, but without expressing uncertainty. This manifests as confident-sounding content that is simply not true.
- Bias: LLMs are a product of the data on which they were trained. This means that biases in the underlying data will be reflected in the text they generate.
- Privacy: By the same token, information about identifiable individuals that was in the training data can be propagated out to the generated text. The Privacy Act imposes particularly stringent requirements on the public service around the handling and accuracy of information about identifiable people. If a model "hallucinates" details about an identifiable person, the impacts are even more serious.
- Accuracy: If the training set for an LLM includes misinformation, that can also find its way to the output.
How can the Government of Canada adopt LLM technology?
There are five options available to public servants for accessing LLMs, with different levels of practicality, as follows:
- Use a free option over the internet: This is the easiest option, but it's not suitable for anything sensitive.
- Wrap a commercial application programming interface (API): Some vendors offer "private" access to LLMs, but privacy and security protections still need to be confirmed as part of procurement.
- Self-deploy an open-source LLM: Smaller models can be hosted off the shelf, whether in government clouds or on desktops. Self-deployment can ease many questions and concerns around security and privacy.
- Fine-tune an LLM: Self-deployed LLMs can be further trained on Government of Canada content. This is expensive in terms of effort and computation, but the investment can be justified if the resulting content is audited (a sketch of what this looks like follows this list).
- Build one from scratch: For control of bias, ethical sourcing, and security, it is technically possible to train a new LLM from scratch. However, this is an expensive option.
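For a sense of what fine-tuning involves in practice, the sketch below uses the open-source Hugging Face transformers library. The model name and corpus file are placeholders, and a real project would add evaluation, data governance, and an audit of the results.

```python
# A hedged sketch of fine-tuning a small open-source causal language model
# on departmental text with the Hugging Face `transformers` library.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "EleutherAI/pythia-1b"  # placeholder: any small causal LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Placeholder corpus: one plain-text file of approved departmental content.
dataset = load_dataset("text", data_files={"train": "goc_corpus.txt"})
tokenized = dataset["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-model",
                           per_device_train_batch_size=2,
                           num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```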
What are some of the government-specific challenges around LLMs?
The adoption of LLMs is particularly tricky in a government context. The industry leans heavily toward English-language products, which poses a challenge for official languages. The provenance of the training data for many models is undisclosed, which raises licensing and ethics questions. It is also the user's responsibility to ensure that the content generated by an LLM isn't subject to copyright.
Most importantly, LLMs can't vouch for the truthfulness or logic of what they generate. That responsibility remains with individual public servants.
A word of caution
When a new technology comes to market, there's a temptation to apply it to every possible use case and see what sticks. Large language models are exciting and powerful tools, and when you're handed a shiny new hammer, it's tempting to go looking for something to hit with it. However, it's important to practise restraint.
That being said, by learning about and familiarizing yourself with LLMs, you're adding a new tool to your toolkit. It's a tool that can help you better understand a changing world in which LLMs and generative AI are being adopted globally across societies and industries. Large language models might be a solution, or they might not. It all depends on the context. It's always better to understand what you're trying to achieve before you act. This is something at which Design Thinking, as a practice and as a process, excels.
Resources