Many Siris

Brian Irace:

If the Lyft app is installed on your iPhone, you can ask Phone Siri to order you a car. But you can’t ask Mac Siri to do the same, because she doesn’t know what Lyft is. Compare and contrast this with the SDKs for Alexa and the Google Assistant – they each run third-party software server-side, such that installing the Lyft Alexa “skill” once gives Alexa the ability to summon a ride regardless of if you’re talking to her on an Echo in your bedroom, a different Echo in your living room, or via the Alexa app on your phone.

This is a major difference between Alexa and Google Assistant extensions, which both run server-side, and Siri extensions, which run client-side. In a nutshell, Siri’s approach allows for a more custom but, at the same time, more limited experience, using communication and negotiation between devices to work out what’s what.

Currently, this communication seems limited to which Siri should respond to a request. If you lift your wrist and say “Hey, Siri”, your Apple Watch gets priority. If your HomePod Siri is enabled and your wrist is down, HomePod gets priority. You get the idea.
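
To make that priority rule concrete, here is a rough Python sketch. The two rules come from the behavior described above. The `Listener` class, the `pick_responder` function, and the fallback to whichever device is left are my own assumptions for illustration, not anything Apple has documented.

```python
from dataclasses import dataclass


@dataclass
class Listener:
    """Hypothetical stand-in for a device that may have heard "Hey, Siri"."""
    name: str
    heard_request: bool
    wrist_raised: bool = False  # only meaningful for a watch


def pick_responder(listeners):
    """Return the device that should answer, or None if nobody heard."""
    candidates = [d for d in listeners if d.heard_request]

    # Rule 1 (from the post): a raised wrist gives the Apple Watch priority.
    for d in candidates:
        if d.name == "Apple Watch" and d.wrist_raised:
            return d

    # Rule 2 (from the post): with the wrist down, HomePod gets priority.
    for d in candidates:
        if d.name == "HomePod":
            return d

    # Assumption: otherwise fall back to the first device that heard us.
    return candidates[0] if candidates else None


watch = Listener("Apple Watch", heard_request=True, wrist_raised=True)
homepod = Listener("HomePod", heard_request=True)
iphone = Listener("iPhone", heard_request=True)
print(pick_responder([iphone, homepod, watch]).name)
```

Note that nothing here looks at what the request actually is. The devices only settle who answers, which is the limitation discussed next.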

What’s missing is an intelligent mesh of negotiation and handoff. For example, if HomePod gets a request to make a phone call, that request should be handed off to your iPhone, perhaps verifying the handoff with a “Would you like me to make that call on your iPhone?” first.

There are permissions issues to deal with in this kind of scheme, but it certainly seems a logical need. If I ask HomePod to call a Lyft and HomePod doesn’t have that capability, it seems logical for HomePod to hand that task off to another device that can order one for me.
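
The handoff idea can be sketched in a few lines of Python. This is only a sketch under my own assumptions: `Device`, `route`, the capability names, and the `confirm` callback are all hypothetical, and real permission checks are left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical device with a set of request types it can carry out."""
    name: str
    capabilities: set = field(default_factory=set)


def route(request, receiver, others, confirm):
    """Return the name of the device that should run the request, or None.

    receiver: the device that heard the request.
    others:   the user's other devices, polled in order.
    confirm:  callback that asks the user a yes/no question.
    """
    # The device that heard the request handles it if it can.
    if request in receiver.capabilities:
        return receiver.name

    # Otherwise poll the other devices before giving up.
    for device in others:
        if request in device.capabilities:
            question = f"Would you like me to do that on your {device.name}?"
            return device.name if confirm(question) else None

    # Nobody can do it.
    return None


homepod = Device("HomePod", {"play_music"})
iphone = Device("iPhone", {"phone_call", "order_lyft"})

# Auto-approve the confirmation prompt for the demo.
print(route("order_lyft", homepod, [iphone], confirm=lambda q: True))
```

The design choice worth noticing is that the receiving device only says "I can't do that" after every other device has been asked, and the user still gets a confirmation prompt before anything is handed off.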

All that said, I can only imagine that Apple is hard at work on a solution for this Siri mesh issue.

UPDATE: I left the word extensions out of the original writeup. Siri is server side, but the extensions are client side. I’ve not actually built a Siri extension, so I’m on shaky understanding here, but I believe this is correct.

  • Brad Fortin

    I imagine it’s more difficult than most people would think. Maybe they can use a method similar to what they’re doing with Messages In iCloud.

  • Meaux

    This seems odd because Siri does all of its voice recognition server side. That is why Siri doesn’t work when you don’t have an internet connection.

    • Bill Krueger

      Since I had to train Siri on my iPhone to recognize my voice I don’t think that’s true. Also, Apple says nothing goes out to the internet until “Hey Siri” is said. But after that it goes anonymously to their servers.

      • “Hey Siri” recognition is done locally. Calling that voice recognition is a stretch, but sure. It’s the exception to “all” voice recognition being done server side.

        Everything after Hey Siri is recognized? Server side. I believe the response comes back to the device as something like p-code: “Hey Siri, turn on the lights.” “SAY ok, TURN ON lights” or somesuch.

        • Meaux

          By that logic Echo and Google Home are done locally too. Saying “Alexa” or “OK Google” is figured out by the local hardware.

          • It’s almost as if you read only my first sentence.

          • Meaux

            I was actually agreeing with your general thrust and meant it wouldn’t count as a differentiator.

        • Bill Krueger

          I think you’re right and I agree, the processing of the request, including recognizing what was said, is server side. I was confused by the wording where you said “voice recognition” which I read to be Siri recognizing my voice instead of someone else’s.

          • I find the way people are using “voice recognition” in the context of a HomePod pretty slimy. “It can’t recognize two voices!” Of course it can. What it can’t do is distinguish between them.

            It might sound like a nitpick, but a device only able to recognize one voice would be pretty much useless.

            I think people are generally understanding the intent, though, so it doesn’t matter much.

      • Meaux

        Everything aside from “Hey Siri” is server side. Go into airplane mode and tell Siri to set a 5 minute timer. She’ll tell you to connect to the internet first, even though executing the request does not require anything from the internet.

    • Dave Mark

      Apologies. I meant that Siri extensions are client side. Google and Alexa extensions are server side.

      I’m no expert here, so please do correct me if I’ve got this wrong.

      — Dave

  • “… and Siri, which is client-side…”

    Spit take. What, Dave?

    • Janak Parekh

      Yeah, I think Dave is missing something here, which weakens his argument. Most of Siri is server-side; try disconnecting from the network on your iPhone and you’ll see.

    • Meaux

      Reading the linked article it makes sense. The service integrations for Siri are client side, while for Google and Amazon, they are server side. Which means, for Google and Amazon, one integration per account and any device can leverage the link. For Apple it means the link is limited to the single device, no matter what connections you may have established on other devices with the same account.

      • I think you’d be better off saying that some Siri integrations are done client side. There’s nothing that precludes server side, and some of them are done that way.

        On the other hand, Google and Amazon can only do server side, I think? I welcome correction there, though.

  • Bill Krueger

    Excellent article on the differences, the first I’ve seen. Thanks.

  • GG

    That’s an interesting contextual challenge. If you have 4 people with iPhones in the house, 2 with LTE Apple Watches, one of which was left in the same room as HomePod, which device should make that call? They’d all need to listen and know who is who, who is here, and probably still ask for confirmation. It looks easy on sci-fi TV.

  • Colin Mattson

    Brian must be an Echo user himself, because despite his lumping Alexa and Google together as server-side SDKs, Google gets consistency at least as wrong as (if not more so than) Apple does.

    Built-in Google Assistant features don’t even work consistently (or do the same thing) across Assistant devices, and device discrimination pretty much never works, to say nothing of trying to access third-party SDK apps.

    Even Alexa’s competently uniform behavior is tainted by Amazon’s loose reins on third-party Alexa devices. That benefit falls apart fast when suddenly the wrong device(s) starts answering misheard queries from another room.

    tl;dr: It’s an emerging market and all voice assistants suck.

  • rick gregory

    I think the problem is that the assistant is presented as Siri everywhere but it’s more or less capable depending on context and that’s not what we expect from an assistant. Imagine a human assistant who could summon you a Lyft if you called him on the phone but didn’t know what you were talking about if you used an intercom (HomePod). Or who sometimes would only respond if you texted her (macOS) but responded to vocal requests (iOS, Watch, HomePod).

    All of that would make you think your human assistant was playing games or just plain weird but that’s what we have with Siri. Yes, I’m sure there are hard technical issues behind resolving this, but people don’t really care. Right now, the burden of tracking what Siri can do in which context is forced on the person invoking her. That’s the wrong way around.

  • The Cappy

    If Apple wants to segregate functions, and the various devices can communicate with each other (both of which seem to be true) then before the currently chosen device says “I can’t do that”, it should poll your other devices to see if they can.