Social Robots need to see you express yourself!
Last week I attended the first edition of HLTH Europe held in Amsterdam.
I grew up close to the city, so I took the opportunity to reconnect with an old friend for lunch. We arranged to meet near a metro station, and as she came up the stairs we both broke into immense grins and kept our gaze locked. It was amazing to meet again, and our smiles betrayed just how much we had been looking forward to our conversation. Over a delicious Italian panini from Mercado Feduzzi (I had the Vitello, go and check them out, they’re amazing!) our conversation continued and got ever more animated. We discussed fun things, serious things, and shared a few jokes. At times we had to turn our attention to the waiter mid-sentence, and then we picked up exactly where we had left off.
Why am I recalling this meeting?
Because I want to point out just how important our facial social signals are for communication. You probably don’t realise it, but you use your face, your gaze, and your body constantly to express not only how you are feeling, but also that you want to interrupt, that you (dis)agree, or that you’re processing what someone is saying. At one point during our conversation, the waiter arrived to take our order, and I urgently signalled to my friend to pause for a moment, as I was really hungry and food is really important to me 🙂.
But what if the waiter had been a robot?
This is not as far-fetched as it may sound. At CES this year, Richtech Robotics demonstrated their robot barista ADAM, and these “social robots” have many other emerging, and I would argue more important, use cases. These include education and health and social care, for example assisting elderly people at home or in hospital.
The introduction of large language models is a major capability upgrade for social robots, be they physical robots or embodied virtual agents, allowing them to hold long conversations with you. But these robots can’t see you, the user, nor the people next to you. That means the robot can’t tell whether you’re joking, whether you are enjoying the conversation, or even whether you’re addressing the robot or your friend across the table.
To be truly useful, social robots need to be able to identify your expressed behaviour and interpret your social signals. If they can’t, they will be much less effective and much more annoying. Think about your smart speaker. Most people I know regularly complain about how un-smart they are, and more often than not, what they’re really complaining about is the device’s lack of social skills.
It doesn’t have to be like this.
Expressed emotion, social gaze signals, head actions, and visual voice activity detection can now all be recognised efficiently on the edge with tools such as B-Social. B-Social is a software development kit (SDK) that comes with easy-to-use APIs and documentation, making integration into your social robot as easy as 1-2-3.
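To give a flavour of what that integration could look like, here is a minimal sketch of a perception loop in Python. The `bsocial` module, the `SignalAnalyzer` class, and the result fields it returns are placeholder names I have invented for illustration; the real B-Social API will differ, so treat this as a shape-of-the-solution sketch rather than working integration code. The camera capture uses standard OpenCV.

```python
# Hypothetical sketch of an on-edge social-signal perception loop.
# NOTE: `bsocial`, `SignalAnalyzer`, and the result fields used below are
# invented placeholder names for illustration, not the real B-Social API.

import cv2        # standard OpenCV, used here for camera capture
import bsocial    # placeholder import for an assumed Python binding of the SDK


def pause_speaking() -> None:
    """Stand-in for telling the robot's dialogue system to yield the floor."""
    print("robot: pausing speech output")


def resume_speaking() -> None:
    """Stand-in for resuming the robot's turn in the conversation."""
    print("robot: resuming speech output")


def main() -> None:
    analyzer = bsocial.SignalAnalyzer()   # assumed to run fully on-device
    cap = cv2.VideoCapture(0)             # the robot's head camera

    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break

        # Assumed per-frame call returning expression, gaze, head action,
        # and visual voice-activity signals.
        signals = analyzer.process(frame)

        # Illustrative turn-taking logic: stop talking if the user starts
        # speaking or looks away from the robot towards someone else.
        if signals.voice_activity or not signals.gaze_on_robot:
            pause_speaking()
        else:
            resume_speaking()

    cap.release()


if __name__ == "__main__":
    main()
```

The point of the sketch is the shape of the loop: perception runs locally, frame by frame, and the robot’s dialogue behaviour is conditioned on the user’s social signals rather than on speech alone.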
As my friend and I parted ways again after lunch, I had a bittersweet feeling. It was great catching up, but I’d be flying back to the UK in just a few days and we probably wouldn’t meet for a good few months. Did she read my mind? Did she know exactly how I felt? Of course not; mind-reading isn’t possible. But I’m pretty sure we could both tell the other had had a lovely, pleasant lunch. It’s not mind-reading, it’s what humans do: we signal and we interpret, and on the whole we get it right far more often than not.
So, in my techno-optimist view, AI for social and emotional signal recognition is here to stay and will make a big, positive commercial and social impact!