The hard part isn't talking

How we built Oyster's voice agent: voice was easy, but driving real WooCommerce checkouts across retailers with idempotency and live sessions was hard.

Q
May 1, 2026
14 min read
Engineering
Tech

The Oyster widget is the little assistant that pops up on a site and lets you do a skin scan, chat about products, or book a consultation with a dermatologist. You can see it on Oyster's own website and on our partner retailers' websites. Same widget either way; the experience just shifts based on whose store you're in.

Now you can also open the widget on a retailer's site, hit the call button, ask for "something for dark spots," talk through a few products, and a few minutes later there's an actual Paystack link sitting in your chat. For a real order, on the retailer's account, with the right shipping rate, headed to the address you have on file.

Here's roughly what that looks like:

Customer: I'm looking for something for dark spots.
Agent: I found 3 products. The top pick is the Vitamin C Brightening Serum at ₦12,500. It's great for evening out skin tone. Want me to add any of these?
Customer: Yeah, the vitamin C one.
Agent: Got it. Should I ship to your address on file at 15 Admiralty Way, Lekki?
Customer: Yes.
Agent: Shipping is ₦3,500 with GIG, 3 to 5 days. Total comes to ₦17,200 with tax. Place the order?
Customer: Yes.
Agent: Order placed. I'm sending you a payment link in chat now.

A few months ago, this didn't exist. Today it ships in the widget on every retailer running Oyster.

This post is about how the talking part isn't actually the hard part.

From Pearl to a voice concierge

The voice agent didn't start from scratch. It started from Pearl, formerly known as "Ask Oyster," which is our chat assistant. Pearl is the clinical brain: it interprets routines, recommends products for a concern, walks a customer through their analysis history. It basically answers like a dermatologist's apprentice.

The voice agent is built from Pearl, but it isn't Pearl. It's a leaner sibling, tuned for a different job. Model providers do this when they ship a smaller, faster version of a flagship: same family, less depth, optimized for a specific surface. Pearl stays the careful clinician. The voice agent is the front-of-store concierge: faster, more transactional, less clinically deep. It can give advice when asked, but it's not as cautious; for anything serious, it defers back to Pearl.

What they share: the product catalog, the recommendation logic, the LLM stack, and the memory layer that remembers what a customer tried and what didn't work. The voice agent gets all of that for free. (That memory architecture for Pearl deserves its own post, I'll put you on later.)

What the voice agent adds: voice, and transactions. The personality is warmer, and the agent gains a credit-card-shaped hand. Most of the engineering went into that hand, not the voice.

The hard part isn't talking

Voice plumbing (speech-to-text, the model, text-to-speech) is mostly a solved problem if you're willing to pay for it. We run on Oyster's LLM as the primary, with ElevenLabs as a fallback runtime. Both pipe through tools we control. Either way, "the agent can hold a conversation" is not where engineering time went.

Engineering time went into the part where the conversation has to do something. Place a real order. On the right retailer. Against real inventory. With real shipping rates. Charged to the right Paystack account. With idempotency you can defend if the customer says "yes" twice.

Most "AI shopping" demos can't actually transact. They show product cards. We needed to send people Paystack links that work.

That's the rest of this post.

We designed the agent like a clerk, not a chatbot

The first instinct with a voice agent is to start writing system prompts. We started with UX principles. Three of them did most of the work.

Never ask for what you can look up. A customer says "where's my order?" The wrong move is to ask for an order number. The right move is to look up their active orders and answer. The customer almost always means the most recent one or the one that hasn't arrived yet. Filter aggressively, present briefly, ask only if there's genuine ambiguity. This is what separates feeling-like-a-person from feeling-like-an-IVR.

Identify lazily, not eagerly. Browsing products doesn't need an account. Asking about ingredients doesn't need an account. We don't identify the customer until they say something that requires it, usually "I'll buy this product." At that point we already have their email from our widget, so we look them up silently. They never feel identified. They're just recognized.

Present, don't dump. One active order? Just tell them. Three? Brief summary, ask which. Twenty? "Your most recent is X. That one?" The agent never reads back lists. It never recites status codes. It speaks the way a good employee would speak.

These principles matter because they directly shape tool design. "Never ask for what you can look up" means our get_order_history tool has to filter by status server-side, not return everything and let the agent figure it out. "Identify lazily" means the identify_customer tool runs silently at session start with whatever context the widget already has. The principles dictate the tools.
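To make that concrete, here's a minimal sketch of what server-side filtering looks like for a get_order_history-style tool. The statuses, fields, and handler shape are illustrative, not our production API; the point is that the filtering and ordering happen before the model ever sees the data.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Order:
    order_id: str
    status: str       # e.g. "processing", "shipped", "delivered"
    placed_on: date

# Hypothetical statuses that count as "active" for "where's my order?"
ACTIVE_STATUSES = {"processing", "shipped"}

def get_order_history(orders: list[Order], active_only: bool = True) -> list[Order]:
    # Filter server-side and sort most-recent-first, so the agent can answer
    # "where's my order?" with the top result instead of asking for a number.
    result = [o for o in orders if not active_only or o.status in ACTIVE_STATUSES]
    return sorted(result, key=lambda o: o.placed_on, reverse=True)

orders = [
    Order("A1", "delivered",  date(2026, 3, 1)),
    Order("A2", "shipped",    date(2026, 4, 20)),
    Order("A3", "processing", date(2026, 4, 28)),
]
active = get_order_history(orders)
# The agent answers with active[0] and only asks if there's genuine ambiguity.
```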

Two surfaces, one agent

The agent runs on two kinds of widget, and the behavior shifts a little between them.

When a retailer embeds Oyster's widget on their own site, the agent represents them. The catalog is scoped to that retailer's products. Checkout goes through Oyster's native flow: the vendor's Paystack subaccount, the vendor's shipping config, the vendor's commission split. Clean, single-tenant per session.

When the same agent runs on Oyster's own widget, it does Oyster's native checkout AND something more interesting: it can shop external partner retailers on the customer's behalf. If a customer says "actually I want that from [some other store]," the agent doesn't apologize and bow out. It goes and buys it. Through that store's actual WooCommerce or Shopify site.

The external retailer flow is the harder one, and the rest of this post is mostly about it.

The transaction stack

Every "magic moment" in the voice agent is a tool call. The agent itself is dumb plumbing. The intelligence lives in the actual tool calls.

Here's the external retailer flow, six tools in order:

  1. identify_customer fires silently at session start using the email the widget already has. If we recognize them, we know them. If not, we defer the question until it matters.
  2. search_products takes a retailer_id and returns products from that external retailer's catalog. The same tool takes a vendor_id for native vendor products. One or the other, never mixed in the same response.
  3. get_addresses pulls the customer's saved addresses. The address book is the same regardless of which store they're buying from. If they need to add a new one, we validate the input and return candidates rather than a single best guess. We made this deliberate. Lagos addresses collide constantly: same street name across different estates, inconsistent plot numbering. An LLM handed one "best match" will confidently pick the wrong one. A list forces the agent to confirm: "I found two matches for that address, the one in Lekki Phase 1 or the one off Admiralty Way?"
  4. get_retailer_shipping_rates quotes shipping against the retailer's actual checkout (more on what that actually means in a moment). Auto-selects the cheapest courier by default; the agent presents the total breakdown.
  5. place_retailer_order is the confirmation step. The agent recaps the full breakdown out loud, gets a yes, then calls this. It's idempotent: if the customer says yes twice, we don't double-place.
  6. generate_retailer_payment_link produces a Paystack link that points at Oyster's account, not the retailer's. We act as middleman here: collect from the customer, then forward the retailer's cut after settlement.

The whole flow takes about 2 minutes for a returning customer with a saved address, including the human pace of conversation. The agent can't skip steps (confirmation before order, payment after order, no exceptions), but it runs them quickly because it's not gathering information the system already has. The native vendor flow has the same shape but uses get_shipping_rates, create_order, and generate_payment_link instead, and the money goes straight to the vendor's Paystack subaccount.

Two kinds of tools: server tools and display tools

There's a split in the architecture worth calling out.

Server tools are the ones above. search_products, place_retailer_order, anything that hits our API. The agent calls them, gets data back, decides what to say.

Client display tools are different. display_products doesn't return data; it pushes a card into the chat UI on the retailer's widget. display_payment_link drops a clickable button. display_order_summary shows the breakdown. This split is how the widget version holds the customer's attention: voice describes, chat shows. The customer hears "the Vitamin C Brightening Serum at ₦12,500" and simultaneously sees the product card with the image. They can tap to pay. They don't have to remember anything.

The phone-only version (coming later) won't have a chat to display into. Phone customers will get payment links over WhatsApp, order confirmations over WhatsApp, ticket references over WhatsApp. Same agent, different output channel. The split between server tools and display tools is what makes that swap clean: only the output side changes.
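One way to picture that seam, as a hedged sketch with invented names: the transactional flow talks to an output-channel interface, and only the channel implementation changes per surface.

```python
from typing import Protocol

class OutputChannel(Protocol):
    def deliver_payment_link(self, url: str) -> str: ...

class WidgetChat:
    def deliver_payment_link(self, url: str) -> str:
        # On the widget surface, the artifact becomes a display-tool card in chat.
        return f"display_payment_link({url})"

class WhatsApp:
    def deliver_payment_link(self, url: str) -> str:
        # On the phone surface, the same artifact goes out as a WhatsApp message.
        return f"whatsapp_send('Pay here: {url}')"

def finish_checkout(channel: OutputChannel, url: str) -> str:
    # The server-tool side never changes; only the delivery does.
    return channel.deliver_payment_link(url)

chat_out = finish_checkout(WidgetChat(), "https://pay.example/x")
phone_out = finish_checkout(WhatsApp(), "https://pay.example/x")
```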

Why retailer checkout is its own code path

If you read the tool list closely you'll have noticed the action names: get_retailer_shipping_rates, place_retailer_order, generate_retailer_payment_link, track_retailer_order. There's a parallel set without "retailer" in the name. Those are for Oyster's native catalog.

That duplication is deliberate. Each retailer on Oyster has their own Paystack subaccount, their own shipping config (their carriers, their rates, their delivery zones), their own commission split, and their own catalog scope (the agent should never offer a product the retailer doesn't sell). Sharing one checkout path with Oyster's native flow means every transactional decision needs an if (retailer) branch. We tried that. It didn't survive the second retailer.

Splitting them was the right call. The voice agent looks like one assistant from the customer's perspective (same voice, same personality, same clinical brain), but underneath it's transacting against whichever retailer's stack it happens to be embedded on. Multi-tenant where it has to be, single-tenant where it doesn't.

So, that's the architectural reason. The implementation reason is A LOT messier.

The retailers don't have a clean API. We drive their checkout.

When you read "we placed an order on the retailer," it's tempting to imagine a clean REST endpoint somewhere on the retailer's side that takes a JSON body and returns an order ID. There isn't one.

What's there is a real WooCommerce or Shopify site. The kind a customer would normally visit, click through, and check out on. Our retailer-checkout tools talk to that. As a headless client.

Concretely, when get_retailer_shipping_rates runs, we:

  • Spin up a cart on the retailer's actual store and hold a cart_token.
  • Resume the cart session for every subsequent step (auto-logging in if the retailer requires auth).
  • Extract fresh checkout nonces (review_nonce, wfacp_id) every time, because nonces are tied to session cookies and they expire.
  • For ShipBubble-powered retailers, post the right shipbubble_* form fields back to update_order_review to commit the courier selection to the WooCommerce session.
  • Parse the cart totals out of the response that comes back.

Then place_retailer_order resumes the same session, finalizes, and reads back the totals. Then we apply Oyster's service fee, set the order to AWAITING_PAYMENT, and hand the customer a Paystack link that points at our account, not the retailer's. (We forward the retailer's cut after settlement.)
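Two pieces of that dance can be sketched offline. This is an illustrative reconstruction, not our actual client: the field names review_nonce and the shipbubble_* form fields come from the steps above, but the HTML shape, regex, and payload layout are assumptions about what a WooCommerce checkout page might render.

```python
import re

def extract_review_nonce(html: str) -> str:
    # Nonces are tied to session cookies and expire, so extraction runs
    # fresh on every step, never cached across requests.
    match = re.search(r'name="review_nonce"\s+value="([^"]+)"', html)
    if match is None:
        raise ValueError("nonce not found; session likely expired, resume the cart first")
    return match.group(1)

def build_order_review_payload(nonce: str, courier_code: str) -> dict[str, str]:
    # Hypothetical form fields posted back to update_order_review to commit
    # the courier selection into the WooCommerce session.
    return {
        "review_nonce": nonce,
        "shipbubble_courier": courier_code,
    }

page = '<input type="hidden" name="review_nonce" value="a1b2c3" />'
payload = build_order_review_payload(extract_review_nonce(page), "gig-standard")
```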

This is the inverse of what most "AI commerce" demos assume. The demos assume a clean API layer between the AI and the store. We don't have that. Retailers run real e-commerce sites with real WooCommerce checkouts and real anti-bot quirks. So we built a client that pretends to be a real user checking out, with the cart-and-session-and-nonce dance and all.

It's the most non-obvious technical fact in the whole system. It's also the part another engineer wouldn't predict and would find interesting, so you're welcome.

The reliability tricks

The transactional layer is where models go wrong in costly ways. Three patterns hold the line.

Tools return instructions for the model, not just data. Every tool that has a chained next-step puts an _instruction field in its response, written for the LLM. After place_retailer_order succeeds, the response includes something like: "You MUST now call generate_retailer_payment_link with external_order_id={X} to get the payment link. Tell the customer: 'Your total is X naira. Let me get the payment link for you.'" The system prompt sets the rules in general; the tool response sets them at the moment they're needed. This survives prompt drift better than relying on the prompt alone.
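A minimal sketch of that pattern, with an illustrative response shape (the production fields beyond _instruction are assumptions):

```python
def place_retailer_order_response(external_order_id: str, total: int) -> dict:
    return {
        "status": "created",
        "external_order_id": external_order_id,
        "total": total,
        # Written for the LLM and injected at the exact moment it's needed,
        # which survives prompt drift better than the system prompt alone.
        "_instruction": (
            f"You MUST now call generate_retailer_payment_link with "
            f"external_order_id={external_order_id} to get the payment link. "
            f"Tell the customer: 'Your total is {total} naira. "
            f"Let me get the payment link for you.'"
        ),
    }

resp = place_retailer_order_response("RO-AB12", 17200)
```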

The widget renders defensively. The model is supposed to call display_products after search_products succeeds, display_payment_link after generate_retailer_payment_link returns, and so on. Sometimes it doesn't. The customer hears "I found 3 products" but sees nothing in the chat. So the widget watches every server-tool response and renders the matching UI automatically when it spots a recognizable shape. If the model later does call the display tool, we dedupe. The UI is correct regardless of which side acted.
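The dedupe logic is simple enough to sketch. This is an illustrative model of the widget behavior, not the widget code itself: both the server-tool response and the model's display call route to the same render-once guard, so the UI is drawn exactly once no matter which side acts.

```python
class DefensiveRenderer:
    def __init__(self) -> None:
        self._rendered: set[str] = set()
        self.cards: list[str] = []

    def _render_once(self, key: str, card: str) -> None:
        if key in self._rendered:
            return                       # already drawn from the other path
        self._rendered.add(key)
        self.cards.append(card)

    def on_server_tool_response(self, tool: str, payload: dict) -> None:
        # Widget-side watcher: render automatically on a recognizable shape.
        if tool == "generate_retailer_payment_link" and "url" in payload:
            self._render_once(payload["url"], f"payment_button:{payload['url']}")

    def on_display_tool_call(self, tool: str, payload: dict) -> None:
        # Model-side call: same key, so a late display call dedupes.
        if tool == "display_payment_link":
            self._render_once(payload["url"], f"payment_button:{payload['url']}")

r = DefensiveRenderer()
r.on_server_tool_response("generate_retailer_payment_link", {"url": "https://pay.example/x"})
r.on_display_tool_call("display_payment_link", {"url": "https://pay.example/x"})
# One button, regardless of which side acted.
```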

Confirm before you act, every time. No order is created without a verbal yes on the full breakdown (subtotal + shipping + tax = total). No payment link is generated without the order existing first. The agent reads it back, waits, then calls the tool. This is partly UX (customer trust) and partly defensive (LLMs hallucinate quantities sometimes; the verbal confirmation is the last line of defense before money moves).
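Those two ordering constraints (no order without a yes, no payment link without an order) amount to a tiny state machine. A sketch under invented names:

```python
class CheckoutGate:
    def __init__(self) -> None:
        self.confirmed = False
        self.order_placed = False

    def record_confirmation(self) -> None:
        # Set only after the agent reads the full breakdown and hears a yes.
        self.confirmed = True

    def place_order(self) -> None:
        if not self.confirmed:
            raise PermissionError("no order without a verbal yes on the breakdown")
        self.order_placed = True

    def generate_payment_link(self) -> str:
        if not self.order_placed:
            raise PermissionError("no payment link before the order exists")
        return "https://paystack.example/pay/abc"

gate = CheckoutGate()
try:
    gate.place_order()           # skipping confirmation is blocked, not warned
    skipped_blocked = False
except PermissionError:
    skipped_blocked = True
gate.record_confirmation()
gate.place_order()
link = gate.generate_payment_link()
```

Enforcing the order in code, not in the prompt, is what makes "no exceptions" true even when the model misbehaves.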

None of these are clever in isolation. Together they're the difference between a voice agent that demos well and one that's safe to leave running on a retailer's checkout.

More than a checkout

The post so far makes the agent sound like a buying machine. It is, but that's not all it is. The same agent handles two other things: order tracking and support tickets. Both follow the same "never ask for what you can look up" principle as the rest of the agent.

Order tracking is the obvious one. get_order_history returns the customer's recent orders (native Oyster ones and external retailer ones together, with a discriminator), filtered by status server-side. If a customer asks "where's my order?" the agent answers without asking which one. track_retailer_order hits the partner retailer's system for live status when needed. The agent never reads back tracking IDs. It says "your order shipped Tuesday and should arrive by Friday," not "reference RO-AB12CD34EF56 is in transit."

Support tickets are more interesting. create_support_ticket opens a ticket and auto-categorizes it from conversation context: product complaint, order issue, delivery issue, refund request, general inquiry. The agent doesn't ask "what category is this?" It infers. It also auto-derives the subject and description from what the customer just said.

The non-obvious part: support ticketing is also the agent's graceful failure path. If a tool call fails repeatedly, or the customer is clearly frustrated, or the agent hits something it can't resolve itself (a delivery dispute, a refund request, anything that needs a human), it opens a ticket and tells the customer a human will follow up. The ticket carries the conversation context with it, so whoever picks it up doesn't start from zero.

That's the thing about scoping the agent like a clerk. A good clerk doesn't pretend they can solve everything. They know when to escalate. The agent inherits that. When it can't close the loop itself, it makes sure someone else can.

What's next

The widget version is shipped, but "shipped" needs an asterisk. Two retailers are live on the external checkout path right now, as a proof of concept. The architecture is built to scale to many; we just haven't pointed it at many yet. We'll integrate more as the agent gains traction and we see how it holds up under actual usage, not staged demos. The phone version (same agent, dialed via Twilio, payment links over WhatsApp instead of in-chat) is next. There's also a phone-bridge handling some interesting raw-audio work to make Twilio talk cleanly to the agent runtime. That'll be its own post.

For now, the thing I'd remember: voice was the easy part. The hard part was building checkout that holds up across many retailers, identifies customers without making them feel identified, drives real WooCommerce sessions instead of clean APIs, and confirms before it acts. That's the part another engineer would find non-obvious. That's the part that actually shipped…and scene. Thank you.