Making Friends and Influencing People: Building Rapport in Humans and Virtual Humans
From Anita Borg Institute Wiki
Professor Justine Cassell, Technology & Social Behavior, Northwestern University
Embodied Conversational Agents are useful for creating rapport to enhance human-computer interaction. Professor Cassell gave an overview of her research methodology in modeling, simulating and evaluating rapport using interactive avatars. Key projects discussed were: an overview of the FEMBOT Architecture, REA - a Realtor Avatar that tries to create common ground via smalltalk, SAM - a physical-to-virtual magic castle interface and agent that engages small children in storytelling, studies done on models of conversation and rapport-building over time, and using SAM to teach building rapport to autistic children.
Background: M.A. in Literature; M.A. in Linguistics; Ph.D. in Linguistics & Psychology from Univ. of Chicago; MIT Media Lab - Gesture & Narrative Language Research Group
Director of a new research center, The ArticuLab, and of a new Ph.D. program, Technology & Social Behavior at Northwestern: a dual program in Computer Science and Communication Studies that joins social scientists from the humanities with technologists. Looking for new students and junior faculty.
Why she started studying non-verbal behavior: She was thrown out of a restaurant. While trying to describe Niagara Falls, she accidentally hit her plate, catapulting a fork into the air, which by chance landed, along with her spaghetti, in the shirt button of a man sitting at another table. The restaurant owners thought it was on purpose, although she said it was an accident. That launched a career studying non-verbal behavior.
In the past, "I have been told I'm not a real computer scientist" but real computer science is much broader and more diverse
A talk on Embodied Conversational Agents (ECAs)
Methodology = scaffolding
A key aspect of human-human interaction is RAPPORT. How to simulate rapport in human-computer interaction?
rapport - the warm fuzzy feeling at an event or dance, feeling better about yourself and your life. Psychologists believe part of that comes from seeing yourself in others and forming an instantaneous bond with people who seem to be like yourself. Rapport is good for mental health, for learning, and for efficient use of technologies in the world around you.
How to make people feel they have this bond with a computer they're trying to use?
Methodology
start by looking at the social world around us and how we try to make things more humanized
eg. how does interaction with a realtor illustrate fundamental aspects of HCI?
STUDY (acquire data) => MODEL(rules, formal predictive model - given preconditions, what are expectations) => BUILD (implement technology on basis of model) => TEST (evaluate people actually using the technology) => RE-STUDY (rinse & repeat!)
quick overview of phenomena she has looked at:
- how people use gross body shifts - people shift their hips when changing topics
- hand gestures aligned with speech phonemes, videotaped to analyze how they are synchronized
- others: eye gaze, eyebrow raises, intonation
- multi-party interaction
- people engaged in collaborative task - getting from pt.A to B using a map
- previous embodied conversational agents (ECAs) - eg. virtual grandchild who engages old ppl in telling the story of their lives - called reminiscence therapy - seniors are more likely to do this more effectively with a virtual child/avatar than with just a video camera in an empty room (as is done in senior homes today)
- spatial maps - how to improve GPS devices as a function of how humans actually give directions, what is the most effective way to give directions
- Avatars of 3D chat world:
- Laura - helping people keep to an exercise routine
- Sam - physical-to-virtual interactive story-telling interface that encourages proto-literacy skills
- Rea - Virtual Realtor - "Wouldn't it be cool if we build a project and called it 'Virtual Realty'?"
- hire animators to use data on Human-to-Human interaction models - automatically generates non-verbal behavior
Each step is subject to evaluation - there are at least 3 kinds of evaluation:
- Is our model of human behavior an accurate model of human behavior?
- Is our system an accurate instantiation of that model?
- Is evaluation designed to test the model and not other features?
FEMBOT Architecture
FEMBOT
- F stands for: propositional & interactional Functions
- M is for Multimodal
- B is for Behavior - separation of function and behavior - there are some really bad avatars like Oz or the Palace - beautiful graphics that would automatically generate behaviors - like checking their watch - NOT a good model
- T is for Time and synchronicity
We cannot NOT communicate - everything we do communicates something. Every piece of tech we build that represents humans in any way gives rise to those inferences, so we must be careful in designing it.
- Separate behaviors
- Tease and leverage aspect of time into conversations - response time during a conversation changes people's interpretation
- Multiple strands always going on
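One way to picture the separation of function and behavior is a lookup from a communicative function to timed, multimodal behaviors. The sketch below is illustrative only and is not the FEMBOT implementation: the function names (`give_turn`, `acknowledge`), the two behavior tables, and the millisecond offsets are all assumptions made up for this example.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass(frozen=True)
class Behavior:
    modality: str   # e.g. "gaze", "head", "speech" (hypothetical labels)
    action: str     # surface behavior to render
    offset_ms: int  # start time relative to utterance onset (the "T" in FEMBOT)

# Hypothetical behavior tables: the same function maps to different behaviors.
BEHAVIOR_TABLES = {
    "default": {
        "give_turn": [Behavior("speech", "pause", 200),
                      Behavior("gaze", "look_at_listener", 0)],
        "acknowledge": [Behavior("head", "nod", 0)],
    },
    "overlap_friendly": {
        "give_turn": [Behavior("gaze", "look_at_listener", 0)],
        "acknowledge": [Behavior("speech", "short_overlap", 0),
                        Behavior("head", "nod", 100)],
    },
}

def realize(function: str, table: str = "default") -> list[Behavior]:
    """Map a communicative function to behaviors, ordered by start time."""
    behaviors = BEHAVIOR_TABLES[table].get(function, [])
    return sorted(behaviors, key=lambda b: b.offset_ms)

if __name__ == "__main__":
    for b in realize("acknowledge", table="overlap_friendly"):
        print(b.offset_ms, b.modality, b.action)
```

Because callers only ever name the function (`acknowledge`), swapping the table changes the surface behavior without touching the conversational logic, which is the point of keeping the two apart.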
REA - a realtor avatar
At first the avatars were not compelling, because the researchers always threw out the beginning and end of the videotaped conversations between realtors and clients. People would just be chatting, and it hadn't seemed important, but they realized that this chat is omnipresent, so they decided to look at smalltalk. Small talk exists in every dialogue and is not random.
Realtors = really good at smalltalk - chose a really successful (live person) realtor named Diane to have dialogues with house-hunting participants, with genuine real estate advice, to study their conversational behavior.
[During the video excerpt, Diane the realtor asked about the gender of the couple's unborn child and other seemingly non-essential info.]
They designed a formal relational model based on this study
What role does smalltalk serve in human interaction?
Results:
- builds common ground
- reciprocal appreciation - "nice weather. yeah, nice weather. nice weather" - simple physical coordination important in every conversation
- coordination
- familiarity
- solidarity - being like another person
- to build trust (the bottom line)
- to avoid "face threats"
- making another person feel that she is not autonomous in the world and has done something wrong
- all societies try to manage face threat, eg. in the US asking someone her salary is face-threatening
- Diane the realtor would engage people in smalltalk so that they had established a relationship before a face threatening topic like income came up
Relationship architecture:
- familiarity, solidarity, power, affect => a discourse planner
- conversational moves
- agent goals are part of discourse planner
How to achieve familiarity in the model?
- came up with an activation net-based natural language generation system
- using formal model => computational architecture => computational linguistics
- get as familiar as you can then move on
- within familiarity assess where they want to live
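The "get as familiar as you can, then move on" idea can be pictured with a toy planner. This is a deliberately simplified threshold rule, not the activation-network generator described in the talk: the `face_threat` scores, the fixed `gain` per small-talk move, and the function name `plan_moves` are all assumptions for illustration.

```python
def plan_moves(task_topics, small_talk, familiarity=0.0, gain=0.25):
    """Interleave small talk with task topics.

    task_topics: list of (topic, face_threat) pairs, face_threat in [0, 1].
    small_talk:  list of small-talk openers, consumed in order.
    A task topic is raised only once familiarity >= its face_threat,
    or when no small talk is left to use.
    """
    plan = []
    chat = list(small_talk)  # copy so the caller's list is not modified
    for topic, face_threat in task_topics:
        while familiarity < face_threat and chat:
            plan.append(("small_talk", chat.pop(0)))
            familiarity += gain  # hypothetical: each move adds fixed familiarity
        plan.append(("task", topic))
    return plan

if __name__ == "__main__":
    topics = [("location", 0.2), ("income", 0.7)]
    for move in plan_moves(topics, ["weather", "the move", "family"]):
        print(move)
```

In this sketch a low-threat topic such as location needs only one small-talk move, while a face-threatening topic such as income is held back until more familiarity has accumulated.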
Result: Relational REA (an animated virtual realtor agent)
- includes vision system to keep track of person
- a bit slow in interacting - lots of communication among modules - in 2001 needed 5 networked machines to run REA - now she runs on a single PC
- focused on small talk - "are you tired, etc...."
- uses classic strategies to generate trust:
- self-disclosure "I'm sorry about my voice"
- humor - tries to make jokes
If they trust her would they be more willing to work with her as opposed to an ECA that got right down to business?
Evaluation of REA:
Is a social and chatty realtor better, or is a completely task-focused one more effective?
method - selected 2 sets of opposing conditions to compare REA with a completely task-focused realtor ECA:
- introverted user participants vs. extroverted user participants
- over the phone vs. embodied, i.e., face-to-face with ECAs on a large screen
extroverts said:
- REA is more trustworthy
- knew REA better and REA knew them better
- interaction was successful and satisfying
introverts said:
- no difference between REA with small talk and task-focused ECA talk
- introverts felt better known than if they didn't engage in small talk
Results:
face-to-face vs. non-face-to-face (i.e., over the phone):
- face-to-face: more tedious and less friendly
- phone: more friendly, likeable, warmer, etc.
The reason phone interactions received much more positive feedback than in-person:
- she wasn't very warm-looking
- stiff - they had forgotten to consider the non-verbal correlates of small-talk in designing her
- if there is a mismatch between verbal and non-verbal behavior afterwards people feel bad about interaction
SAM: collaborative fantasy play and storytelling with small children
After REA, they went back to the drawing board and looked at small children
- small children build rapport quickly, use rapport a lot, it's important to them
- children learn better from teachers who they believe like them
- children learn better when their friends are sitting next to them
- learning is more effective when they sit next to their peers than with adults.
- small children will also get right in each other's faces, look each other in the eye, engage rapport with their whole bodies
recorded videos of kids collaborating
Built formal models of how kids collaborate and take turns. See "Co-authoring, Corroborating, Criticizing: Collaborative Storytelling between Virtual and Real Children" (paper)
Result: Sam (gender ambiguous on purpose)
- magical toy castle that extended from physical world into Sam's virtual world so kids would be able to tell story with both
- learning storytelling as way of becoming better at reading and writing
- never an adult in the room when kids would play with Sam (kids act differently with adults in the room)
- Sam gets responses wrong a number of times - doesn't bother the children
- kids instantly formed a rapport with Sam and were more likely to use a particular kind of speech that predicts future literacy, eg. they would use quotes in speech, and use more sophisticated temporal and spatial information in a story
- Sam forces kids to come out of their heads - to explain the story to someone that is like them but also different, more and more explicitly than playing by themselves
- "Sam, do you pee?" was the most popular question from kids - it shows what makes children think something is alive
- children who played with Sam over a 3-week period scored higher on the TELD, a test for literacy readiness
Rapport - state of understanding and connection, feeling of deepening understanding, must be co-constructed but hard to do this computationally
How to model and generate this kind of rapport that small children naturally build in an embodied conversational agent?
Studies that model conversation & rapport-building over time:
- they found that it's always important to pay attention to friends, but less important to be nice to them if you've known them for some time
- conversational coordination increases over time - learning another person's conversational turn-taking mechanisms, language, body language, etc.
Next iteration in scaffolding an avatar:
- friends vs. strangers - how do we find out how friends get there?
- 4 situations:
- 2 friends face to face vs. with a wall between them
- 2 strangers face to face vs. with a wall between them
look at 3 subsequent interactions:
- proxy for building relationship
- already existing rapport vs. lack of a rapport
- model of other - interests, speech patterns, non-verbal
Variables:
- Dialogue Acts
- Lexical entrainment
- Interrupts and overlaps - friends interrupt a lot more than strangers do
- Silence, lulls
- Eyegaze, headnods, posture
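Some of these variables can be counted mechanically once turns have been annotated with start and end times. The sketch below assumes a hypothetical annotation format (speaker label plus times in seconds) and a hypothetical 1-second threshold for a lull. It is not the coding scheme used in the studies, and it counts overlaps without judging whether an overlap is an interruption.

```python
from typing import NamedTuple

class Turn(NamedTuple):
    speaker: str
    start: float  # seconds from the start of the recording
    end: float

def count_overlaps(turns):
    """Count pairs of turns by different speakers that overlap in time."""
    ordered = sorted(turns, key=lambda t: t.start)
    overlaps = 0
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.start >= a.end:
                break  # every later turn starts after a has ended
            if a.speaker != b.speaker:
                overlaps += 1
    return overlaps

def silences(turns, min_gap=1.0):
    """Return (start, end) gaps of at least min_gap seconds with no speech."""
    ordered = sorted(turns, key=lambda t: t.start)
    gaps = []
    latest_end = ordered[0].end if ordered else 0.0
    for t in ordered[1:]:
        if t.start - latest_end >= min_gap:
            gaps.append((latest_end, t.start))
        latest_end = max(latest_end, t.end)
    return gaps

if __name__ == "__main__":
    demo = [Turn("A", 0.0, 2.0), Turn("B", 1.5, 3.0), Turn("A", 5.0, 6.0)]
    print(count_overlaps(demo), silences(demo))
```

Comparing these counts across friend and stranger pairs is one simple way to check an observation such as "friends interrupt a lot more than strangers do" against annotated data.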
Data Analysis frame by frame - very time consuming but interesting
Models of Conversation
- Grounding - how to ensure they share a common ground?
- When they don't take an utterance up - need to make sure you both understand each other
- rapport - grounding over a lifetime
- How to possibly generate rapport-building with a computer
How to Teach Rapport?
Who needs to learn rapport? Children with autism. High-functioning Asperger's children: higher IQ, verbal, but no ability to recognize or generate rapport, no imagination, no capacity for reciprocal interaction. Can a virtual child model the other mind?
Build a system to engage these children in turn-taking and encourage them to tell stories and teach relational behavior
Book: From Barbie to Mortal Kombat. One idea is the "theory of underdetermined design": for marginalized populations, designers need to be careful not to stereotype, i.e., "let's build girls a technology that's about what girls do" (assuming it's things like playing with Barbie). Need to overturn that model of building for a stereotype and go beyond even participatory design: children need to implement it themselves.
Eg. A high-functioning autistic girl who doesn't do reciprocal interaction with anybody but does with Sam
Gave her a virtual child whose interaction she could control herself: the girl uses a touchpad (the WOZ panel) to change the story and other things like personality, while watching video of Sam telling the story to an adult researcher who is using Sam in the next room. She mastered the knack of it.
Script Queue interface - WOZ panel - choice of stories, collaborative utterances
One of Justine's courses on Discourse
If interested consider a Ph.D. in Technology and Social Behavior or contact for questions:
justine [at] northwestern [dot] edu
Discussion:
Modeling animal behavior?
Bruce Blumberg at MIT Media Lab does work on people interacting with animals
What about different cultures and differences even within culture?
Architecture separates behavior from function - everybody coordinates with somebody else and has social lubricants - but graphics engine has a culture model you can put in.
Deborah Tannen's dissertation about a Thanksgiving dinner party with NY Jews thrown together with Californians of Scandinavian heritage:
- Jewish culture: talking at the same time = shows interest
- Scandinavian-Californians: listening quietly then nodding carefully for 9-10 seconds to show interest
Their inferences are different - architectures need to be careful NOT to hardcode behavior
turn-taking, grounding - cannot do same thing for different cultures
Good online avatars?
Second Life - good personalizability of avatars, but not good conversationally. Anyone know of any avatars with good voice and good graphics?
"I have 2 children who have some Asperger's tendencies, can they use this?"
She has gotten comments like "oh so you've built a great tool for MIT students..." She is currently planning to collaborate with a Rush Medical Center research group to bring a social reciprocal interaction training tool into schools across the U.S.
Any modules for reflecting ppl's behavior back to them?
Sony - system with nonverbal mimicry - grunts when you grunt - ppl seem to like it
Studies on using eyegaze (via eyetracker) as an input to the conversation - when can you go on b/c you know they understand, or when can you tell they need an elaboration