“Could we be using GPT-3 to automatically respond to people instead of hiring those new customer service agents?”
If that’s a question you heard around budget time this year, this article is for you. If not, then perhaps you have a broader interest in AI and machine learning and the ways they could assist in delivering better customer service.
GPT-3 has been making news recently, so it’s worth taking a look to understand what it is and how it might help.
What is GPT-3?
GPT-3 is a language model — a way for machines to understand what human languages look like. That model can then be used to generate prose (or even code) that seems like it was written by a real person.
In simple terms, language models help computers estimate the probability of word sequences. You have a language model, too; what is the missing word in this sentence? “Why did the ___ cross the road?”
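To make that concrete, here is a toy sketch of the statistical idea. This is not how GPT-3 works internally (GPT-3 is a huge neural network, not a word-pair counter), but it shows what "estimating the probability of word sequences" means: count which words tend to follow which, then predict the most likely next word.

```python
from collections import Counter, defaultdict

# A toy corpus. GPT-3 trained on hundreds of billions of words instead.
corpus = (
    "why did the chicken cross the road . "
    "the chicken crossed the road to get to the other side ."
).split()

# Count how often each word follows each other word (a "bigram" model).
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def most_likely_next(word):
    """Return the statistically most likely word to follow `word`."""
    return following[word].most_common(1)[0][0]

# "Why did the ___ cross the road?" -- the model fills in the blank
# the same way you just did.
print(most_likely_next("the"))  # → chicken
```

A model like GPT-3 does something analogous at vastly greater scale, conditioning on whole passages rather than a single preceding word.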
One of the most important differences between GPT-3’s generation of tools and earlier machine learning models is that you don’t need to train it on carefully labeled, structured, high-quality data. Instead, GPT-3 ingested an enormous (and broad) amount of public online text and used that to build its model.
That model produces some impressive results across a variety of use cases.
We asked GPT-3 to answer real customer service questions
The potential for customer service usage is clear — could this software read your incoming customer questions and generate accurate, helpful answers? Our own data scientist, Matt Mazur, decided to find out.
The team at OpenAI made GPT-3 available via an API, so Matt signed up. He fed GPT-3 only six examples of real responses from our entirely human (and highly skilled) Help Scout customer service team.
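We don’t know exactly how Matt formatted his prompt, but the general "few-shot" pattern is well known: you paste example question-and-answer pairs into the prompt, append the new question, and let the model continue the text. Here is a minimal sketch with made-up example pairs (the real six answers came from our support team and aren’t reproduced here):

```python
# Hypothetical example pairs, standing in for the six real Help Scout answers.
examples = [
    ("How do I reset my password?",
     "Hi there! You can reset it right from the login page. "
     "Let me know if that doesn't do the trick!"),
    ("Can I export my data?",
     "Absolutely! Head to your settings to request an export. "
     "Happy to help if you get stuck."),
]

def build_prompt(examples, new_question):
    """Assemble a few-shot prompt: example Q&A pairs, then the new question."""
    parts = []
    for question, answer in examples:
        parts.append(f"Customer: {question}\nAgent: {answer}\n")
    # End with an open "Agent:" line for the model to complete.
    parts.append(f"Customer: {new_question}\nAgent:")
    return "\n".join(parts)

prompt = build_prompt(examples, "Why does my email text look purple?")
print(prompt)
```

The assembled text is what gets sent to the GPT-3 API; the model simply continues writing after the final "Agent:", imitating the tone and structure of the examples above it.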
From those six responses, GPT-3 did not learn anything about Help Scout or its products; it only looked at the voice, tone, and structure our team used in providing those answers. That’s why some of the answers sound real, but don’t make much sense.
Next, Matt took some genuine customer questions (different from the six examples) and had GPT-3 generate responses. Please note that we have no plans to actually implement GPT-3 at Help Scout; this was a purely experimental exercise.

Below are four of those questions (real, edited only for privacy), each paired with commentary comparing GPT-3’s response to our actual customer service team’s answer. Remember, GPT-3 is not using any Help Scout-specific information here, other than what it may have absorbed from the open web during that language model creation.
Hi! I just sent out the first few emails using a template but the text looks purple. It didn’t look like that in the preview. How do I avoid this?
GPT-3 offers some convincing, technically correct answers here, but they just don’t apply to this situation. It’s the sort of mistake an inexperienced human agent could easily make.
Our real support pro has spotted a more likely cause and avoids a lot of unnecessary complexity.
I just downgraded our account from the Standard Plan ($20 per user per month) to the Basic Plan ($12 per user per month), but I am being billed $15 per user per month.
Kristi, from our Customers team, noted: “The customer needs clarity around annual vs. monthly billing, but they don’t realize it — something GPT-3 can’t know.”
This answer from GPT-3 is the type of polite yet completely unhelpful response that is particularly irritating.
Hi, Hope you’re well. We already have a Help Scout account but we are keen to attend the webinar on 15th October on “How to Create Customer Flow with Messaging.” The only thing is that it’s on at 4AM our time. Will you be sending out a recording?
There is an explicit question here, but GPT-3 predicts a very generic answer. With more examples to work from, it could likely produce a better answer to a pretty simple question.
I have tried multiple times to create an account but keep getting this error: (screenshot removed).
Info I submitted, Tacomia is company name and
firstname.lastname@example.org is the email.
Not sure what the problem is. I’m trying to set up an account for a new company I started. I already have HelpScout for an existing business. Please reply to
Again, GPT-3 is unfailingly polite, as our real team members are, but without the behind-the-scenes context, it doesn’t have a real chance to divine an answer.
Disha, from our Customers team, said: “[some of] the answers were incorrect or incomplete and the AI sounded dismissive by confidently providing short/wrong/vague/incomplete answers. It didn’t leave any room for the possibility of being wrong, and didn’t ask for clarifying info which we would have done.”
All in all, GPT-3’s answers sound very real in many cases, but they are also over-confident and unhelpful. As one team member put it, “This could be me on days where I am sleep deprived and my reading comprehension is non-existent.”
What is GPT-3 actually doing?
How should we understand the answers that GPT-3 is providing? Let’s start by being clear on what it is not doing.
GPT-3 is not:
- Searching a knowledge base or reading help documents to find the “right” answer.
- Understanding anything about Help Scout the company or its products.
- Judging whether an answer is correct or helpful.
What GPT-3 is doing is predicting, based on what it knows about how the English language works, what the response text is statistically most likely to be. It then uses the six example Help Scout answers it has seen as a model for the tone of voice and structure of the final response. Note: the limitations of our GPT-3 access meant we set up quite an unreasonable test scenario.
Given all of that, it is impressive how close it comes to creating plausibly real responses, without ever understanding the context of the individual customer, their goals, or the products. It reveals just how much of our human interaction is almost formulaic — we have routines and phrases that we fit into the right situations over and over.
There’s nothing wrong with that at all — often that is exactly what our customers need. And human support people aren’t immune from giving a practiced response without having noticed a key detail in the question that changes everything. Sometimes, though, that’s going to be insufficient or actively unhelpful.
GPT-3 is probably going to be really good at producing the sort of mostly fine (as long as you aren’t an edge case) answers we have come to expect from … companies I don’t need to name for you. You’re aiming higher than that for your company, though, as we do at Help Scout.
The verdict: GPT-3 for customer service
We must remember that GPT-3 doesn’t really know anything. A language model has no understanding of people or why they behave in certain ways. It just looks at how people have written in the past and uses that to predict what they would write in a given situation. It won’t judge its own answers, no matter how sneaky, funny, or even racist they might be.
OpenAI is working on a fine-tuning API for GPT-3 that would allow us to feed it more specific Help Scout information, giving it additional knowledge to draw on. That might produce better results.
Still, AI isn’t yet able to directly give the sort of nuanced, thoughtful service that helps companies stand out. It falls into a sort of uncanny valley, sounding convincingly human much of the time, but then being unsettlingly close-to-but-not-quite real.
However, there are plenty of other ways this technology really could help: offering a “best guess” suggestion for human staff, producing an automatic summary of a conversation, or providing writing assistance tools, for example.
Recently, Google trained a trillion-parameter AI language model that should deliver even more impressive capabilities. The technology will continue to improve rapidly.
Should a small or medium-sized team be looking to engage with AI customer service tools today? Yes, but only if you have already done the work to understand what good customer service looks like in your company and how you can give your existing team the best chance of success.
AI tools can’t substitute for a customer-centric mindset, and they can’t compensate for leadership that doesn’t value customer service. If you try, you’ll only be delivering mediocre service more quickly.