On Emotionally Intelligent AI (with Chris Gagne, Hume AI)
Hey there, today I'm having on Chris Gagné, an AI researcher and friend. Chris manages AI research at Hume, which just released an expressive text-to-speech model in a super impressive demo. They also just announced $50 million in their latest round of funding.
So that's pretty cool. Hume is kind of the only company out there focused on AI for emotional understanding, which is pretty cool. Chris did his PhD in cognitive neuroscience at UC Berkeley and postdoctoral research at the Max Planck Institute for Biological Cybernetics.
Doesn't that sound cool? I want to talk to Chris about AI and emotions. I want to hear from him about the implications of AI understanding emotion. What's cool about it? What's scary? What are the risks and opportunities?
And I'm going to really press him on whether he thinks that AI can really understand emotion and whether that's a good thing. Chris wants me to say that all the views he's going to share are his own, not those of his employer, which is good because it means that he can be real with us.
 All  right.  Let's  dive  in.  Chris,  let's  start  with  the  easy  question.  Can  AI  understand  human  emotions?  I  think  it's  getting  there.  I  think  LLMs  already  have  a  decent  understanding  of  human  emotions.
 I  think  if  you  ask  an  LLM,  for  instance,  to  read  your  writing  and  describe  how  someone  might  emotionally  react  to  that,  I  think  it  actually  can  do  a  pretty  good  job,  especially  GPT -4  right  now.
 It  can  do  a  good  job  at  kind  of  guessing  what  sort  of  emotions  people  might  experience  with  this  writing.  And  I  think  as  they  become  more  multimodal,  this  understanding  is  going  to  grow  and  it's  going  to  extend  beyond  this  purely  linguistic  understanding.
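As a concrete illustration of the kind of query Chris describes, here is a minimal sketch, assuming the OpenAI Python client and an API key; the model name, prompt wording, and sample passage are my own illustrative choices, not anything from the conversation.

```python
# Hedged sketch: ask a chat model to read a passage and describe likely
# emotional reactions to it. Model, prompt, and passage are assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

passage = (
    "I finally handed in my resignation today. "
    "My hands were shaking the whole time I typed the email."
)

response = client.chat.completions.create(
    model="gpt-4",  # the model mentioned above; any capable chat model works
    messages=[
        {
            "role": "user",
            "content": (
                "Read the following passage and describe how the writer, and a "
                "typical reader, might emotionally react to it, and why:\n\n" + passage
            ),
        }
    ],
)
print(response.choices[0].message.content)
```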
But yeah, I think there's some degree of emotional understanding right now. Okay, so obviously this is like controversial, right? A lot of people might turn back to you and say that makes no sense. Emotional understanding for humans is generally something much more substrate dependent, right? You have emotions in your brain, they're reflected in you: you have empathy, you feel an emotion, and that's how you understand it. AI presumably can't feel emotions. So how could it possibly understand emotions? - Yeah,
 I  think  this  is  a  good  distinction  that  we'll  probably  keep  for  the  full  conversation,  but  I  think  there's  sort  of  like  a  linguistic  and  cognitive  understanding  of  emotions,  sort  of  how  can  you  describe  the  different  emotions?
 Can  you  describe  the  expressions  that  are  associated  with  them?  Can  you  give  a  sense  of  what  might  have  caused  those,  what  situations  in  the  past,  what  situations...  immediately  would  have  caused  those  emotions,  what  might  lead  to  the  resolution  of  those  emotions,
 all  this  sort  of  like  high  level  linguistic  understanding.  And  I  think  that's  quite  separate  from  sort  of  like  vicariously  experiencing  a  particular  emotion,  the  feelings  that  you  might  have.  I  think  AI  can  very  much  do  and  will  be  able  to  do  the  first  one.
And the experiencing of the emotion in the same way that we experience it, I think is something that, you know, we'll see as AI develops more, you know, awareness or consciousness or whatever we want to call it. But at the current moment, I think it's very far from that sort of emotional understanding.
 So  I,  you  know,  I'm  on  the  side  of,  I  think  there's  a  good  amount  of  emotional  understanding  they  can  have  without  this  sort  of  feeling,  just,  you  know,  the  way  that  you  might  sort  of  write  your  own  experiences  down  and  you  might  read  it  later  and  say,  oh,
that was a good emotional understanding of the situation. I mean, I guess there's something weird here then to analyze about humans, that we empathize at all as a route to understanding, right? Like I could see you crying and intellectually understand what might be causing it. I mean, that's not really how our brains work, right? Like, is it possible that it's actually more efficient to experience empathy, like to actually have some essentially like simulation of the emotion?
Yeah, I think it might be. And I also think that that's how it works for humans, at least going from childhood to adulthood. I think we learn empathy in that direction, where we very much automatically can experience the emotions that other people are experiencing at that same time. But then later we learn to attribute the right words to that and describe the situation in more detail, verbalize it, sort of detach ourselves a little bit from maybe the feeling of the emotion. But I think therapists do this really well.
 They've  detached  themselves  in  some  situations  from  fully  experiencing  what  their  patients  are  feeling,  and  yet  they're  able  to  verbalize  this  and  describe  it.  And  I  think  AI  is  going  the  opposite  direction,  where  it's  starting  with  this  more  verbal  understanding  and  then  maybe  we'll  see  in  the  far  future  whether  they  can  have  anything  sort  of  like  the  feelings  that  we  experience.
I mean, why would we possibly want it to have these experiences, like the emotional experience? It seems like I would imagine we'd only want that if it were necessary, like if we found it really hard to get AI to understand emotion without feeling, basically. Yeah, I agree. But if anything you could say the same about humans, right? Like I think this point is made in some books on the evolutionary psychology side: why would we want humans to experience anything? Like, wouldn't it be great if humans could just understand, even your own emotion? Like, doesn't it kind of suck that when you get angry, you get angry? Like, wouldn't it be awesome, evolutionarily speaking, if humans could just intellectually understand that they should feel angry and then act rationally in accordance with that?
Like, do you have any thoughts on, like, why the hell do humans actually experience emotion? - I don't have any strong thoughts on that, other than I think those signals were more primal and sort of what came first in evolution, and then we sort of learned to, you know, turn those into more verbal thoughts later. - And yet, somehow we're expecting that the opposite is true with AI. That it's more efficient for an AI to understand emotion intellectually than to do something analogous to experiencing it.
 - Yeah,  maybe  not  more  efficient,  but  certainly  what  it  has  access  to  right  now,  and  that  the  way  that  AI  path  is  developing,  in  that  it's  starting  with  language,  and  then  going  back  towards  the  core  modality.  - So  in  terms  of  like,
I just wanna ask: your research is, if I understand correctly, not on emotion technically, right, it's on prosody? - Yeah, it's on giving text-to-speech models the ability to sound more human, sound more expressive, sound more emotive. That's part of it. So they're also adding that information to language models so that they can take advantage of some of these expressive signals while they're interacting with humans. So is there a difference?
Prosody is a big word that I had never heard of until you explained this to me, but can you just explain, what is prosody? Is there actually a meaningful distinction between that and emotion? Should we care? Yeah. I think prosody is all the external signals, much more closely related to the acoustic qualities of the sound, that allow you to infer the emotions. So I think there's a lot of expressive signals that we give off in our faces and our voices. And those are easily observable for most people. And then we use those to infer the emotional state that the person is experiencing. And so I think allowing machines to read at least these external signals that we give off to one another as part of the conversation, which convey so much more information than if we're texting or something like that, would help improve the communication. Okay, so it's like a person giving a picture of their emotional state that they essentially choose to share, or subconsciously choose to share. But generally, I show you I'm happy by smiling and I show you I'm frustrated by maybe making certain sounds or changing the speed with which I talk, things like that. Yeah, exactly. Yeah, so to go back to prosody, to answer your question more fully, it has things like the speed at which you talk, your intonation: for questions, the pitch might rise at the end to signify a question, even if you don't otherwise signal it, which might be hard to pick up when someone's speaking, and so on. So my imagination is that like there are two ways that you would potentially try to understand emotion, right,
 whether  you're  human  or  an  AI,  there's  kind  of  like  the  more  natural  way  which  you  know  might  correspond  more  with  like  annotation,  like  I  say  that  sounds  angry,  but  then  there's  kind  of  like  I  know  you're  angry  or  I  know  you're  getting  angry  because  like  minutes  later  you  say  I'm  angry  or  you  start  yelling  at  me  or  you  do  something  essentially  like  it  becomes  far  more  obvious  in  the  future  that  you're  angry.
 So  is  there  a  difference  between  like,  you  know,  the  approaches  in  emotion  of  kind  of  like  trying  to  understand  the  implicit  signals  that  are  like  your  behavior  corresponds  with  anger  and  therefore  you  must  be  angry  versus  something  like  you  sound  angry.
Is there any distinction there? What I mean to say is sort of like, if I'm right now talking in a very monotone voice and then like a minute later, I go, you know, Chris, I hate you. And I think, you know, the way that you've been talking is horrible.
And generally you're just a horrible person, right? Like, I don't sound angry in terms of my voice. Obviously the content of what I'm saying sounds very angry. And you could probably infer that I am angry, but I definitely don't sound angry from like a prosody perspective. Like that's kind of the distinction I'm trying to go for. Can you think about like the phenomena separate from like tone and stuff? Yeah, I think so. I mean, I think a lot of what we're also trying to do is separate those two components of signals and then bring them back together so that the language model or the ultimate agent can choose to listen to one or the other.
And I think, I mean, sarcasm is famously like saying something in a different tone than what you're actually saying. So I think, you know, separating the signals and having them stream into another system is important. And I think humans do this all the time. We can pick up on, you know, you saying something potentially in an angry voice, even if you're talking about, you know, something totally neutral, or vice versa. And it might mean something different; it's an interesting scenario, talking about something very angry but not sounding angry. And that conveys something very different than if you said it in an angry voice. So sarcasm captures that contradiction, when I say something in deliberately the wrong tone. Yeah, that's one aspect of it.
I mean, I think, you know, saying something very angry in a neutral tone can be something very interesting too, depending on the context, I think. So it could actually just be something deeper, like it's not even just like anger or lack thereof. It's like this third thing, which is like, you were saying something angry in a neutral tone. That tells me something on its own. Yeah. Fascinating. I think part of this then goes into, I don't want to go too far on the sci-fi stuff, but there is like the emotional manipulation thing,
which is like, if we think of emotions as trajectories, if we think of like, a word doesn't have an emotion, a sound doesn't have an emotion. It might emit something else, like tone, but really the emotion is mine. It's in the human. It can't change that quickly. Which means that really, if you're like, modeling like, "Oh, that sounds like you're angry," you might be modeling that you're, like, pre-angry, right? You might be able to like predict and understand pre-emotions essentially.
Once we start talking about that, it sounds like AI could potentially emotionally manipulate. Like, what are your ethical worries here? Yeah, we're certainly worried about that. I don't think this goes that far beyond language models and AI just having the ability to manipulate people in general, which I think just ties in with the broader alignment research of making sure that language model agents have the right objectives. There's a discussion we might want to have about, like, open source versus, you know, proprietary language models to keep sort of the gates on these things. And then there is this risk of giving language models additional tools that they could use to kind of steer the conversation in a way that's misaligned with the objectives that you want them to have. But I think most of this has to do with choosing the right objectives for the language models, and making sure that they are, you know, as much as possible, following those objectives, and also monitoring, you know, monitoring their abilities to support those objectives and use things like emotional signals to maybe steer the interaction in one direction or another.
 - So  there  is  the  open  source,  closed  source  side.  So  from  there,  you're  just  kind  of  thinking  through  like,  are  there  risks  that  we  just  want  to  control  in  terms  of  how  people  choose  to  use  these  models?  - Yeah,
I think there's a lot of use cases that would be beneficial for society. I think, you know, in your case of AI therapists, and other sorts of customer interactions, if done in the right way, using emotion and expressive signals to sort of enhance the quality of those interactions, I think it's useful. Obviously, for any sort of applications where there's this risk of deception or manipulation, we would have to be quite careful in how we allow those use cases. What about the other way? Like, how do you feel about humans deceiving AI? Like, is it going to get harder with this kind of stuff? I think so. Do you have a particular use case in mind where we would want to deceive AI? I mean, hopefully. So last night I was reading Ready Player Two, the sequel to Ready Player One, and in the book there's a chapter where the main character goes to an AI therapist. They make a really big thing of like, he's the therapist from Good Will Hunting played by Robin Williams. Anyway, it's very funny. But one of the things he says is, the therapist asks him something like, you know, how have you been doing with whatever? And he doesn't want to talk about it, so he says, "I'm doing totally fine." And then he writes in the book, like, in his narration, "I was obviously lying." And there's something really interesting about, like, I was thinking about this because it was literally an AI therapist and the whole thing. The author, I guess, hadn't considered the idea that the AI would know he's lying,
right? But humans must lie to their therapists all the time, right? Like, there's sort of, like, some control that we all get from the ability to lie. And I do wonder if it would have, like, really fundamental ramifications if we no longer could lie, because those who we're interacting with would actually know. Like, do we almost lose some agency over telling the truth if we can't lie? You know? Yeah, I think that's fascinating. I think we'll definitely have to monitor that ability of these multimodal language models, to see the degree to which we can still get away with these white lies. I think it'll
depend on the application whether we want that or not. So I wonder if we can, in some sense... I mean, I'm imagining, AGI aside, we'll have different forms of multimodal language models depending on the application, some of which it might be very beneficial to have tuned into the, you know, facial expressions and voice prosody, and some of which we may very much not want to be tuned into this, especially if they're sort of an intermediary to another person that we might want to keep some sort of distance from, you know, maybe in a negotiation or something like that. We may not want to have an AI that's reading our every sort of signal. I think that would be... - Yeah, that makes a lot of sense. Like there might be just use case by use case,
we want to control, go out of our way to say... I mean, like, what do you think of the school or work examples? Like, what if my boss could tell I'm bored in a meeting? - Yeah, I think these are already cases that have been sort of flagged by the AI Act in the EU as being things we have to watch out for. So I think those are definitely things we wanna... - Yeah, you can kind of see it both ways, right? Like, it's interesting 'cause the narrative is usually the opposite. It's like, how come the AI can't tell my tone? You know, I've heard people say that to their Google Home things: they say "thanks a lot," and it goes "you're welcome," and you're like, oh, you really should have heard that, I was obviously being sarcastic. Yeah, so I guess maybe the prosody dynamic is something along the lines of, like, what we want in the first instance is, if I'm trying to communicate something with my voice,
it's almost like it can be converted to words, which is like, I am being sarcastic. You want to communicate that; you are clearly intentionally communicating it. It's almost like you're giving a voice command of, like, I'm being sarcastic, and you want to make sure the AI picks it up. And that's pretty separate from, perhaps, the: I hear from the tone of your voice that you're beginning to get frustrated and in about five minutes you're going to start yelling at me. Yeah, I think that's very true. And there's other things that might not be as describable, where, like, you just maybe start talking more quickly, or you sort of, I don't know, you get a little bit more of a bored tone in your voice, and the AI, let's say you're doing something like an interactive podcast with the AI, where you're asking it to describe the news to you or something. And you clearly, you know, you've already heard this story or something like that. Yeah, it could potentially pick up on the fact that,
 you  know,  you're  ready  for  the  next  topic,  just  based  on  the  way  you're  interacting  with  it.  Yeah,  I  mean,  I  have  to,  I  got  to  say,  though,  I  personally  do  worry  a  little  bit  about  the  agency  side  of  like,  I  almost  do  want  to,  you  know,  I  want  to  be  able  to  say  like,
you want to have like the agency when you interact with AI to some extent, right? Like there is that question of what agency is lost, but I guess there's still that level of like, just turn it on or off depending on whether or not you want it. Like it doesn't have to be on all the time just because it exists.
Yeah, I was going to say, I think what's going to be interesting, as a broader sort of societal question about LLMs, is who sets the objective functions. And it could be, you know, application specific. But I think in a lot of cases, you know, we're of the mind that a lot of these applications, at least a lot of the applications that we're interested in pursuing, are, you know, aimed at improving human well-being in the long run. Which is saying a lot, but in terms of their emotional well-being, things that you would describe like, "Here are the states I want to be in in my life. I want to experience love, joy, and happiness," and all those things, and here are the things that undermine that. If the AI is in general trying to nudge you towards those states that you, in sort of a reflective state, would want to be in, then I think those are sort of objectives that would be good to have. But it's an interesting question of like, for more specific applications, like your AI that's reading you a podcast or something, do we want it, like, can we set its objective function so that it's almost just more neutral?
 We  just  want,  you  know,  like  it  to  not  pick  up  on  these  tones  of  voice  and  like  really  try  to  optimize  what  we're  listening  for  or  like,  you  know  what  I'm  saying?  Yeah.  I  mean,  this  is  fascinating  from  like  an  RL  perspective.  It's  almost  like,
 what  if  every  application  had  some  slight  objective  of  like,  make  me  happy  more  and  sad  less?  And  then  like,  instead  of  having  to  pick  up  very  explicit  signals,  you  are  just,  you  know,  the  back  of  your  mind,  you're  like,  if  there's  something  that  I  could  do  that's  really  small,
that'll make the user smile. Maybe it's a good idea. If it makes them frown, maybe do it less. On the flip side, obviously this is a terrifying scenario too, right? Yeah. Well, you don't want to optimize for that at a micro level and just make them smile.
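To make the objective-function worry concrete, here is a toy sketch, entirely my own construction rather than anything Hume or anyone else actually trains on, contrasting a per-turn "make them smile" reward with a longer-horizon well-being objective; the Turn fields, signals, and weights are all hypothetical.

```python
# Toy sketch of the timescale issue: optimizing each turn for an immediate
# smile versus weighting a slower, reflective well-being signal.
# All fields, signals, and weights here are hypothetical illustrations.
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    user_smiled: bool   # hypothetical per-turn expression signal
    task_helped: bool   # did the turn actually move the user's goal forward?


def myopic_reward(turn: Turn) -> float:
    """Rewards an immediate smile: the micro-level optimization worried about above."""
    return 1.0 if turn.user_smiled else -1.0


def long_horizon_reward(session: List[Turn], reflective_wellbeing: float) -> float:
    """Weights a slower self-reported signal (e.g. a reflective end-of-month
    rating in [-1, 1]) far more heavily than momentary expressions."""
    helpfulness = sum(t.task_helped for t in session) / max(len(session), 1)
    return 0.2 * helpfulness + 0.8 * reflective_wellbeing


# A session full of jokes that makes the user smile but never helps them:
session = [Turn(user_smiled=True, task_helped=False)] * 5
print(sum(myopic_reward(t) for t in session))                   # 5.0: looks great
print(long_horizon_reward(session, reflective_wellbeing=-0.5))  # -0.4: looks bad
```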
One interesting thing I had read recently was, there was a study where a population was asked something very simple. It was like a mental health study. They just asked a group of people, how do you feel, once a day. I believe it was like once a day for a few weeks, it was like, how do you feel? And it was literally just like happy, sad, you know, whatever, some set of things that they labeled. And the control group just wasn't asked this. And that was like the whole thing. That was the entire study. And what they found is that on average, after a period of time, just asking people daily, how do you feel, the population asked daily felt worse. And so the theory, we don't know exactly why this was, but the theory is, when you're asked at random points in the day, daily, how do you feel, and you reflect, you're probably not happy, because most people aren't happy most of the time. Not that you're sad. Yeah, you're just engaged, right, or, yeah, engaged or like in flow. I don't know about you, but I code a lot, and when I'm coding, sometimes I'm happy, but a lot of the time, if someone asked me how are you feeling, I'd be like, shut up, I'm working, you know. And I wonder, is there actually this scary side of like, we want to optimize now for happiness or something, like for emotions,
 It's  sort  of  an  absence  of  emotion  to  some  extent.  You  know,  I'm  focusing  on  someone  else's  well -being.  I'm  trying  to  help  someone  else.  I'm  doing  charity  work.  Like  if  we  are  focused  on  emotion,  is  there  the  chance  that  like  our  objective  function  is  now  like  skewed  in  a  bad  way?
Yeah. I mean, I think we probably have to have a, you know, a deep thought about how to choose that objective function at the right time scale. Because I do think that you do want it more at the months or years sort of timescale of optimizing for emotional well-being. And that can look like, you know, a certain amount of flow states that may not look like happiness, but that lead to sort of this self-reported satisfaction with life later on. And then, what's really interesting about that, as you're talking about this, is I think in the therapy domain,
 I  think  we've  long  wanted  the  ability  to  sort  of  like  have  these  non -invasive  readouts  of  people's  emotional  states  throughout  the  course  of  the  day  for  like  long  periods  of  time.  If  you're  suffering  from  depression  or  anxiety  or  PTSD,
 it's,  you  know,  if  you  just  go  to  the  therapist  once  a  week,  describe  how  you're  feeling,  it's  not  a  very  good  snapshot  into  your  life.  And  I've  talked  to  therapists  who  would  have  loved  to  have  these  sort  of  non -invasive  abilities  to  like  with  the  person's  permission,
 obviously,  to  get  a  sense  of  how  their  emotions  fluctuate  throughout  the  day.  And  that  gives  the  therapist  a  bigger,  better  picture  and  understanding  of,  you  know,  what  the  person's  going  through  and  how  to  sort  of  nudge  that  in  the  direction  that  they  want  to  go  in.  I  mean,
I totally see that. Like, if even asking the question "how do you feel" is already invasive, then you're right, it seems way less invasive to listen to the tone of your voice or something, right? Like it's almost like the less it's exposed to the end user, the better in some ways. It's sort of, one of the comments I heard, so just talking about Hume's demo, I guess you guys worked on this, you built an end-to-end LLM speech-to-speech use case,
 right?  Where  the  AI  actually  understands  your  prosody  from  your  voice  and  then  responds  appropriately.  So  hopefully  when  you  speak  in  a  happy  voice,  it  speaks  in  a  happy  voice,  right?  Am  I  getting  this  right?
Depending on the context, yeah. Or if you're frustrated, it now tries to pick up on that and steer the conversation in light of that frustration. Which is really cool. I think one of the big pieces of feedback, you were talking about this before, that I got from someone on Zoom was like, I noticed, in a non-invasive way on the side of the screen, what my emotional state was, or I noticed what I was communicating with my voice. And yeah, there is something already interesting just about, like, you weren't really expecting an AI to hear these things. Now, suddenly you get to actually see it. Like, you have to imagine that this sort of trend is not gonna end with Hume, and it's probably gonna change how people actually interact with computers more generally.
Yeah, I think people will learn to pick up on the fact that, well, one, I think it's nice for people to be aware that their tone of voice, for instance, does carry all this information. So I think even just raising that awareness is useful. But then I do think people will, you know, learn to interact with AI in a different way. I think we've learned to interact with Siri in a particular way, and we've learned to interact obviously with Google and, you know, just even the search bar and typing on a computer. And I think we're of the mind, or I'm of the mind, that in a lot of applications we probably want to interact with language models and AI in the way that we interact with humans. I mean, potentially the alternative is to learn a new form of interaction that doesn't do this. But I think it'd be more efficient, and it will leverage so much of what we already communicate, if we tailor this as a sort of human-to-human conversation, as long as we're aware that we're interacting with a language model and not, you know, another human. Well, I want to push back here. It's really funny. One thing, when I use ChatGPT, I'm never nice to ChatGPT personally, because I tend to think that the model performs worse when I am.
Right. So if I'm very like, "please, if you don't mind, could you," then it responds in the same way as it would respond to someone who's hesitant like that. And if I just say something like, you know, "do this," it does it. What's interesting by contrast is, when my mom interacts with ChatGPT, she does all the "please" stuff, you know. You don't know how often she'll end the conversation with "thank you." And I'm like, that's not even, like, that's the whole message. She would just say, like, "thank you." And of course it's not doing anything. She's obviously personified the AI in her mind. One ramification of this is that I think she'll get lower quality responses. Like, obviously we could tune the LLM accordingly, but she is personifying it, imagining it to be a human. I mean, there are a lot of levels to dive into on this, but I think this is actually the crux of the discussion. Like, we're changing how people interact with computers. We're making it more human.
 We're  making  it  closer.  We're  moving  from  like  programming  and  clicking  a  mouse  and  a  button  to  like  interact  like  you  interact  with  the  human.  There  are  a  lot  of  benefits.  It's  easier.  There  are  a  lot  of  downsides,  right?  And  I  think  I  do  worry  about  the  idea  of  people  interacting  with  computers  the  way  they  interact  with  humans,
personifying, especially when computers don't have all the capabilities humans do. Yeah. So I think there's two, yeah, there's two parts of that. So one is that I do really think there needs to be a distance. Like, people need to be aware that they're interacting with an AI and not another human, and it doesn't share all the same, you know, as we talked about in the beginning, it doesn't have this sort of vicarious feeling or experiencing of exactly what we're experiencing. And so I do think that even awareness of this matters, that they are interacting with something else and not a human. And then there's this sort of, can we take advantage of all the natural things we do when interacting with each other, just to speed up the conversation and make it more fluid and allow us to think more naturally and communicate more naturally, being aware that we're not communicating with another human that's going to experience things the same way that we're experiencing things. So I think it's a fine line to walk, but I think that would be... I get you.
It's more like, what can we add into the dimensionality of the space? Like instead of having a mouse where you move it around in two dimensions and click, what else can we add in that we've already been trained to do that could just add to the richness of the interaction? I can get that. If an AI could hear I'm frustrated, then I don't have to tell it I'm frustrated, because it can hear I'm frustrated. On the flip side, I think there is the counter argument, which is like, maybe there is a benefit to commands. Maybe in the long term,
 you  want  to  have  a  Hume  AI  where  you  have  to  say,  "Hey,  Hume,"  and  you  have  to  say  it  with  every  message  to  remind  you  that  if  you  don't  say  it,  it  won't  hear  you  because  it's  a  computer,  not  a  human.  Yeah,  I  think  a  lot  of  people  are  in  the  process  of  figuring  this  out.
I think, you know, one of the things the big language model companies did was, you know, make sure that it always says, I'm an AI language model, and, you know, I don't experience emotions, these kinds of things. And I think having these sorts of disclaimers is useful; the right amount of them, I think, we'll have to figure out, and whether we need sort of additional, you know, interactive features that constantly remind the user that it is this language model on the back end and not another human. I think that's something that we'll figure out as,
 you  know,  the  whole  industry.  - Yeah,  that's  a  good  point.  It's  probably  going  to  be  everywhere.  Can  we  talk  about  uncanny  valley?  So  uncanny  valley,  more  generally  the  idea  of  uncanny  valley,  if  I  remember  correctly,  I  think  it  comes  from  like  the  animation  space,
which was just kind of like, when we have very simple animations, like, you know, Rick and Morty or The Simpsons or whatever, it doesn't look like you're trying to be a person, but there's like a scale where you get more and more photorealistic, looking more and more like humans. And people like it more and more, the more realistic you get, until you hit this weird part called the uncanny valley, where people start disliking the animation. They'd rather it be less realistic because it just feels wrong.
 It  feels  uncanny.  It  feels  like  it's  almost  there.  It's  almost  real,  but  it's  not  quite  real.  So  I  guess  the  question  is,  do  you  think  there's  an  uncanny  valley  in  the  speech  domain?  I  do  think  there  is,  but  maybe  it  doesn't  seem  as  extreme  as  in  other  domains,
because there's certain, I mean, speech synthesis has not been great for a long time, and yet we've been fine having AI speech. And I think we've sort of gotten to the point now where, you know, there's these issues of voice cloning, where you can get very, very naturalistic speech. And yet you can't really point to a point in between, where it sounds almost that natural, where people are like, oh, I can't listen to that, I don't want to interact with that kind of voice. And so I wonder if it's a little bit of a more shallow valley in the case of speech,
but I'm not sure what your thoughts are. I think so. So some of the specific things are like, right now, Hume's AI... well, if you start with, like, ChatGPT: ChatGPT has a voice-to-voice setting, and the way it works is, on the screen it shows you whether it's listening or not, when it starts listening and when it stops listening, and then it's like processing, and then it gives you one block response. It is a pretty realistic voice, and that already, like, reaches people in a lot of ways. But there's a lot that the AI doesn't understand. It doesn't understand the tone of your voice, the speed. It doesn't know if it mistranscribes something. It can't be interrupted. All the naturalistic things you can do with humans: it doesn't know if your voice sounds more masculine or feminine, or anything about your age. And then incrementally, there's that scale of adding more in.
 So  with  Hume,  you  add  one  more  dimension,  which  is  now  we  also  know  if  you're  frustrated.  We  also  know  if  you  sound  down.  But  unless  I'm  mistaken,  Hume's  AI  doesn't  know  the  age  of  my  voice,
right? No. I mean, none of that is explicit that we're reading out. - Yeah, but it could be. Like you guys could add that tomorrow if you wanted to. Who knows? There's probably a lot of dimensions you could add in. - Yeah, I'm sure there are a lot of dimensions that,
 yeah.  - So  one  thing,  I  mean,  I'm  curious  if  you've  even  heard  this  from  feedback  on  the  product,  but  like,  are  there  people  interacting  that  are  like  surprised  that  the  AI  is  quote  unquote  ignoring  things  from  their  voice  because  it  literally  just  doesn't  know  it's  there?
- Yeah, I haven't seen any of this feedback yet. I mean, I'm certainly not the only person looking at a lot of this feedback. So I think some people on the team, you know, are probably aware of this, but I haven't seen any direct feedback that's been like, "Hey, can you pick up on these other signals in the voice and loop that into the conversation?" I think there's already so much that it's doing that I think
 people  are,  you  know,  surprised  with  this  amount  already,  so.  - Yeah,  so  some  of  the  things  that  we  see,  I  think  we  were  talking  about,  we  were  talking  about  this  before  we  got  on  today,  but  both  of  us,  I  think,  have  had  the  experience  of  like,
you create a version of the AI, you ship it, people use it, they give feedback, then they think you shipped a new version, but you actually haven't. Nothing's changed. They use it again, and then they comment on all the things that you've changed in the AI.
 Do  you  think  that's  related  to  this  topic?  I  think  a  little  bit.  I  think  people  will  read  into  the  capabilities  of  these  from  just  interacting  with  it.  I  think  part  of  what  we're  trying  to  do  is  also  just  be  clear  as  much  as  we  can  about  what  the  current  state  is  of  the  system  and  what  it  is  and  isn't  capable  of.
So for us, just some examples, I don't know if you've seen any of these, but like, one thing that our AI has done, very random: at some point, someone asked, like, could you set a reminder on my phone or something? And it was like, yeah, totally, what time do you want me to set it for? For instance, like, 8 PM. And it was like, OK, cool, I just set it for you. None of this is true. It has no access to your phone. And the feedback we got from that conversation was, I love that you guys introduced this new feature where it can set reminders on my phone. I know, it's definitely a broader problem about how to prompt language models, and how to make sure that they interact, and have the conversation go, in a way where the language model is being as truthful as possible about what its capabilities are.
 This  is  just  a  bigger  problem  with  using  language  models  in  the  back  end  of  any  application.  So  it's  a  really  interesting  problem,  I  think  for  all  of  us.  >>  So  that  one's  really  explicit.  Another  one  we've  had  was  because  of  the  topic  I  brought  up,
the AI started speaking really quickly all of a sudden. >> Yeah, that's fascinating. I mean, part of the reason why we're using explicit signals of different emotions is to gain a little bit more control over how the AI responds. Speech rate is something I'm, you know, personally looking into, and it is interesting, because these are all, you know, models that are built similarly to language models. They have some flexibility in the way they're going to respond, and you don't always have control over that. And so things like speeding up for an exciting topic is sometimes a desirable feature, but other times it'll be the case where, you know, you don't want it to speed up, because that's going to convey the wrong signal to the user. And I think, you know, having larger language models sort of drive those characteristics of the speech is ultimately what we're, you know, trying to do. But it's quite funny and interesting when it does this sort of, you know, something that a human would never do in this case, yet it's like trying to, you know, change the speed of its voice to match the situation. Yeah,
we're, but in this case, I mean, like, I imagine pretty often your AI is going to change speed for no particular reason. But like, if you haven't implemented it, for example, like we hadn't implemented it, I don't know if you've implemented the speed thing. Was that there? Like, can it change its speed? It's partially implemented. So I just mean there could be, like, you know, "oh, I said something embarrassing, so it started speaking quickly." And you're like, no, it didn't speak quickly because you said something embarrassing; it can't detect embarrassment, and it can't control its speed. I'm making this up, I don't know if you can actually check this. But people experience it anyway. And then some of the frustration... So another one for you guys, right? Like occasionally the voice is just inconsistent.
Yeah. And then interruptions are another one. By the way, like, do you allow for interruptions? Like, user interrupts Ash, uh, sorry, user interrupts the AI, AI interrupts the user. Yeah, we do allow that right now. It's something that we're working on. A lot of different groups are working on this. It's a surprisingly difficult problem. I don't really focus on this; we have great engineers that work on it, but it is, you know, surprisingly difficult to know when to interrupt. You probably don't know when to interrupt me right now based on, you know, when I'm going to stop speaking. So. And then on the flip side, when the user interrupts the AI, you know, how do you train a language model to be okay with being interrupted? Right? Yeah, that's tricky. I think the language models as they are right now, you know, the most capable language models, do have the ability to pick up on their previous train of thought and continue the conversation, but it is something where, you know, careful prompting and training have to be done. So what do you even train on? If you train a model with an interruption, where it just stops speaking halfway through a sentence, wouldn't it then learn that it's okay to stop speaking halfway through a sentence? Yeah, if you were doing supervised fine-tuning on those types of transcripts, but I think there are probably other ways to get around that. Like prompting stuff? Yeah. Yeah, I guess it's interesting, because I'm trying to go through this set of behaviors that in my head are like,
these are the ones where people would feel weird. And I guess part of your response is like, yeah, but each one of these we could tackle one at a time. Yeah, I think so. And I think a lot of people are out there trying to tackle that. And do you think that voice-to-voice interaction with language models is the next thing for the next six months for a lot of these companies? Do you interact with AI emotively when you personally speak to a Google Home or Alexa, ChatGPT, et cetera? I don't really, but only in so much as I'm not trying to,
 but  I  imagine  I  do.  I  don't  really  alter  the  way  I  --  well,  with  the  original  ones.  Yeah.  Yeah.  Siri,  for  instance,  I,  I  definitely  just  interact  in  a  more  neutral  way,  but  we've  learned  those  behaviors.  And  so  I  think,  you  know,  as  we  have  more  expressive  agents,
then I think, you know, we'll relax back to the way we might communicate with each other over the phone or something. That's fascinating. I guess I never thought of it as a learned behavior, because we really noticed, like, when people talk to our AI, they'll often talk in a way that they would never talk to a human, where they say, like, "I was walking down the street and I saw a house," and they speak in this way because they know that they can, basically.
 And  I  guess  there  is  something  here  that  is  just  the  learned  behavior.  Is  that  a  bad  thing?  Like,  you  know,  what  are  the,  if  you  were  to  make  the  case  against  yourself  right  now,  which  was  just  like,  here's  why  we  want  to  be  able  to  interact  with  AI  in  a  command  format  with,
 you  know,  like  where  you  want  it  to  be  like,  you're  in  control,  the  AI  doesn't,  you  don't  have  to  actually  like  speak  the  way  you  speak  to  a  human.  What  would  be  the  case  for  that?  Yeah.  I  think,  I  mean,  humans  are  really  adaptable.
 And  I  think  there  could  be,  I  can't  imagine  exactly  how  this  would  work,  but  I  could  imagine  developing  a  different  interface  where  you  could,  you  know,  almost  learn,  let's  say  use  your  eyes,  if  you're  interacting  with  a  screen  or  something,  use  your  eyes  or  other  aspects  of  your  voice  to  kind  of  trigger  more  information  that  you  wouldn't  do  with  another  human  that  could  make  the  conversation  extremely  efficient,
but in a totally new way. I think humans could maybe learn to do this. I guess it's what we'll see in the future, if people develop these applications. I think that it's easier to leverage our natural way of communicating, so we don't have to switch between talking to humans and talking to our LLM applications. But yeah. I do tend to think, like, it's hard for me to square this image that we use emotion in our voice for the sake of efficiency.
Like that statement seems to hit me in a way where I'm like, I don't feel like the way I'm talking to you right now is about conveying information efficiently. I feel like it's just me interacting. And part of the way that you receive it is very feeling-centric, right? Not that you actually got a bunch more information; like, to convey the same information as what's in my voice probably wouldn't take that many bits, but it wouldn't hit you the same way, you being another human, you know? Yeah. I don't know.
I think it's something we definitely should look into on the research side of things. But I get the sense that, compared to just writing or emailing or texting, there's so much more that I can convey when I just pick up the phone and call my friend. And a lot of it is, I don't even really need to say the content sometimes; it's just the way I'm saying it, and then he'll pick up on, you know, what I'm actually trying to say. And so I think some of that maybe is just, you know, about close friendships. But I think some of that could also be information efficiency, right? Like, you just convey a lot more information in a five second phone call than you do otherwise. Like, five seconds is a long time to convey a lot of information. Yeah, but you're literally saying more words on the call. People are like, it's so much easier for me to communicate this on a call, and yeah, you also said way more words on that call. Yeah, exactly. Part of the reason I think voice interfaces are useful in general, whether or not, you know, whatever the degree to which prosody itself contains additional information (I have the intuition that it does), is that to some degree it's just allowing for an easier voice interface, and then allowing, you know, these other characteristics of interruption and those kinds of things, which I think are gonna go a long way to just improve the efficiency of LLM-human interactions.
 - So  it  is  interesting,  I  guess,  like  there's  this  dimensionality  thing.  There's  like,  you  can  use  your  voice,  now  you  can  add  prosody,  you  could  also  have  a  mouse  where  you  like  show  in  a  two  dimensional  plane  what  you're  looking  at.  You  can  click  to  indicate  that  you  wanna  select  something,
 like  maybe  you're  using  your  eyes  now  to  indicate  something.  So  prosody  kind  of  just  adds  a  dimension  and  happens  to  be  a  dimension  that  we  already  use  every  day  in  our  voice.  It's  interesting.  I  guess  what  we'll  find  out  is  like,
 is  it  more  natural  for  people  to  just  show  they're  frustrated  than  to  say  I'm  frustrated?  We'll  see.  I  don't  know.  I  have  mixed  feelings  here.  It's  interesting  'cause  like  we're  obviously  working  on  an  AI  therapist,  which  means  that  these  questions  come  to  us  all  the  time  too.
 Like,  do  you  want  to  analyze  prosody  and  voice?  And  I  have  such  mixed  feelings  'cause  I  get  why  people  would  want  it.  But  at  the  same  time,  like,  I  also  kind  of  think  there  should  be  a  wall  where,  like,  when  you  are  relating  to  a  computer,
you know, it's not the same feeling, or at least for a while it won't be the same feeling, that you get back as the feeling you get talking to another human. And I almost do feel like having those walls is valuable: knowing, like, when you talk to an AI therapist about an issue, you're going to get back computer advice, and you're not going to be able to get the same connection as you will with a human. Hopefully you can still gain something, but I worry about, like, I think there is, from my perspective, some reason to worry. You're not building an AI therapist, so it's a little bit less worrisome for you. But for us, like, we want to make sure that you don't think that the AI can feel your emotions, you know? Yeah. Because we want you to find humans that do. Yeah, exactly. I think it's very application specific. And there's a degree to which, I think, as we were talking about earlier, different applications should have the ability to turn on or off the different signals that these language models can utilize in their interactions. And so I think, in certain situations where we do want a wall, having like a different sort of interface with language models might be beneficial, in your case, or, you know,
 sure  there's  a  lot  of  other  cases  where  it  might  also  be  true.  - Yeah,  I  agree  with  that.  All  right,  so  last  thing  just  to  go  through,  I  guess  we  talked  a  lot  about  emotion,  a  little  bit  about  empathy.  I  don't  think  we  really  need  to  dive  too  far  down  the  empathy  path  though.
I think it is really interesting. One thing I'd wonder is, like, there's a big debate in academic ML research which is basically like, do models actually gain instrumental skills, right? So on the one hand, the optimistic side, there's this idea that models gain instrumental skills: if you teach, you know, reasoning, if you show enough math, it actually learns how addition works, multiplication, division, it doesn't just memorize. And there's like a flip side argument,
 which  is  like,  no,  it  actually  doesn't  even  learn  those  things.  All  it  learns  to  do  is  to  memorize.  Do  you  think  that  there's  any  legitimacy  to  the  memorize  argument?  And  if  so,  does  that  change  anything  for  AI  learning  emotion?
So I think it does more than just memorize, but there's an interesting middle ground of having the entire training data set and interpolating in between it. It's an interesting question how much of human knowledge, of mathematical knowledge and emotional knowledge, is just this interpolation between examples in some high-dimensional space, versus a truly novel construction that branches out from existing knowledge, which would be more like extrapolation. And because we can't really imagine how much information is on the internet, it's hard to know how much of this mathematical ability can just be stitched together by interpolating existing examples, and how much of it needs to be a genuinely new kind of knowledge construction. For a lot of emotional understanding, there's been so much discussion on the internet that I think you can get a lot of the way there not by memorizing examples, but by interpolating between existing ones.
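A minimal toy sketch of the interpolation-versus-extrapolation distinction being drawn here; the two-dimensional "embeddings" are made up for illustration and are not anything from the conversation itself.

```python
# Toy sketch of interpolation vs. extrapolation in an embedding space.
# The vectors are made up; real model embeddings are far higher-dimensional.
import numpy as np

example_a = np.array([1.0, 0.0])  # hypothetical embedding of one training example
example_b = np.array([0.0, 1.0])  # hypothetical embedding of another

# Interpolation: a convex combination stays "between" the training examples.
alpha = 0.3
interpolated = alpha * example_a + (1 - alpha) * example_b

# Extrapolation: weights outside [0, 1] step outside the region the examples span.
extrapolated = 1.5 * example_a - 0.5 * example_b

print("interpolated:", interpolated)   # [0.3 0.7]
print("extrapolated:", extrapolated)   # [ 1.5 -0.5]
```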
And then it's an open question how much it needs to actively, interactively engage with the environment to learn more, or go through some sort of interactive training with experts in these kinds of domains. It'll be interesting to see where we go beyond pre-training: as language models reason and get true-or-false feedback from the environment, how much is that going to build on top of the corpus they've been trained on? Yeah. I guess on the understanding side of something like emotion, it makes sense that if you're hearing my voice,
even as a human, you've probably heard enough voices that you're interpolating; you're thinking, this sounds like a thing I've heard a million times in similar situations. On the other side, if you're generating with a language model or a voice model, where you're injecting something, that's where I wonder: there is something, I think, to emotional reasoning. And that's where I'd wonder, is there a risk, or is there something we could even, say, find out the factual answer to here: are models doing emotional reasoning, or are they, quote unquote, cheating? Are they just able to say, "It sounds like you feel sad," or do they actually emotionally reason: what is it that might have made you feel sad, and how can I change what I say next to make sure you no longer feel sad?
You know? Yeah, I think good evaluations on this, developing an actual benchmark for this kind of thing, would be really useful. A lot of what we test right now is surface-level features: if you gave this audio clip to a human, how would they describe it, essentially? And that seems to me much more like interpolation; there's no reasoning behind it, just that single audio clip. In a larger context, then yeah, I think it's a big open question whether the prediction about how a human is expressing themselves right now is driven just by the acoustic characteristics in that moment, or by the whole context of the conversation. And this is something that I'm really interested in figuring out and evaluating, but it's very much an open problem.
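One way to make that question concrete, as a rough sketch rather than an existing benchmark: compare the model's emotion prediction for the same clip with and without the surrounding conversation. The predict_emotion function and the example set below are placeholders, not a real API.

```python
# Hypothetical context-ablation check: how often does the emotion prediction change
# when the conversational context is hidden? predict_emotion is a placeholder.
def context_ablation_rate(examples, predict_emotion):
    """examples: iterable of (audio_clip, preceding_conversation) pairs."""
    flips = 0
    total = 0
    for clip, context in examples:
        with_context = predict_emotion(clip, context=context)
        without_context = predict_emotion(clip, context=None)
        flips += int(with_context != without_context)
        total += 1
    # Near 0: predictions seem driven by the clip's acoustics alone.
    # Higher: the surrounding conversation genuinely changes the prediction.
    return flips / total if total else 0.0
```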
So we should expect to see the answers really soon. In this space, what are the big developments that you're most excited to see happen soon?
Well, I'm really excited, obviously, about the capabilities of multimodal language models. GPT-4 and Gemini have amazing capabilities with images. I think the audio domain is lagging a bit behind, but I'm really curious to see where that goes; that's one of the most interesting things to me. You and I are also both interested in language model reasoning, so we'll see where that goes. I think actually, a long time back,
our first conversation, I believe it was, is when I asked you, "When do you think we'll get to speech-to-speech models that are just end-to-end, Whisper-style: enter speech, get speech back?" Yeah, I mean, that's
so hard to estimate when that's going to happen. I think ultimately something like that will happen; end-to-end models are just the way things move. It's just that right now it's much more of a hierarchical, pieced-together system that everyone's using. And I think that's mostly just practical: audio information has such a high bit rate, and you just can't have a traditional language model operate at that level yet. So I would give it a few years, I would say.
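To put a rough number on that bit-rate gap, here's a back-of-envelope comparison; the figures are my own ballpark assumptions, not numbers from the conversation.

```python
# Back-of-envelope: raw speech audio vs. text, in bits per second (ballpark figures).
sample_rate_hz = 16_000           # common sample rate for speech audio
bits_per_sample = 16
audio_bits_per_sec = sample_rate_hz * bits_per_sample        # 256,000 bit/s

words_per_minute = 150            # typical speaking rate
bits_per_word = 16                # rough figure for tokenized text
text_bits_per_sec = words_per_minute / 60 * bits_per_word    # ~40 bit/s

print(f"audio: {audio_bits_per_sec:,} bit/s   text: {text_bits_per_sec:.0f} bit/s")
```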
Although, I'm thinking OpenAI released Whisper not that long ago, and one thing Whisper can literally do is take in speech as audio and output a translation into another language. Yeah, so it is like an end-to-end model that not only receives speech, it actually understands it well enough that it can translate it into another language and output it: speak in Portuguese, get English text back.
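For the curious, that workflow looks roughly like this with the open-source openai-whisper package; the model size and audio filename here are just placeholders.

```python
# Minimal sketch using the open-source openai-whisper package:
# transcribe Portuguese speech and translate it to English text in one call.
import whisper

model = whisper.load_model("base")                            # small multilingual checkpoint
result = model.transcribe("speech_pt.wav", task="translate")  # hypothetical audio file
print(result["text"])                                         # English translation of the speech
```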
That is so cool. It is, yeah, it makes me think exactly that. So it can be done in a local context for sure, for a very short context. I am a little bit skeptical that it can be scaled up to full paragraphs, or not even paragraphs, but full conversation-level context. But we'll see; a lot of people are working on these things. So it could be a long way away. It also could be a bitter-lesson moment where we're all a little sad about all the work we did. And then, just the last question: any books or papers that you'd recommend for people to read if they want to learn more
about this space? Well, I would suggest people actually go to our website and look at all the scientific papers that a lot of my colleagues have published in this space. Before moving into AI, a lot of them were doing research on just making emotion science more rigorous and quantitative. I think a lot of those papers are a great foundation. Awesome. Well, thanks so much for joining us, Chris. Yeah, thanks for having me. All right, that was Chris Gagné on AI and emotion. That was a lot of fun.
Like Chris said, if you want to check out more from Hume, you can visit their website; they have some research there that Chris and his colleagues have shared. Also, we'd love to hear any feedback you have, so feel free to reach out with any ideas or notes at daniel at slingshot dot xyz.