Demo videos:

  • Interview Example (YouTube link)
  • Teacher Example (YouTube link)
  • Clinical Client Example (YouTube link)

Inspiration

In some fields, professionals routinely roleplay with colleagues to train for difficult situations, be it a doctor delivering a diagnosis or a mental health professional handling an ethical dilemma.

Back in 2020, I developed an application in coordination with the Portuguese Psychologists Association and HEI-Lab to help train junior psychologists in ethical decision making (link). While the initial feedback was quite positive, the recent boom in AR and conversational AI models vastly outshines decision-tree scenarios, as these technologies can redefine how roleplaying sessions are implemented.

I took this dev jam as an opportunity to venture solo beyond research laboratories and into the real world, since research projects are often not made available to the general public given their restricted scope. My hope is to eventually deliver something accessible that both professionals and casual users can benefit from, as roleplaying is helpful in a vast number of situations.

What it does

RolePlaiAR is a sandbox application for training communication skills with AI conversational agents in a close-to-real setting. To do this, it brings together several components:

  1. AR environment usage with room calibration to see and position the digital character (NPC) and lighting in the real world (includes optional use of HDRIs);
  2. Naturalistic interaction based on the PICO 4 sensors (microphone and hand tracking cameras);
  3. NPC randomized appearance with extensive customization options (age, size, gender, hair, clothing);
  4. Base instruction choice to provide a roleplay background for conversational models;
  5. Tandem use of online speech-to-text (STT), generative text, and text-to-speech (TTS) models (a pipeline sketch follows this list);
  6. Procedural lipsync animation, randomized blinking, and clamped upper spine/head follow of the user's head position (also sketched below).
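
In the app itself this chain runs through Unreal plugins (AzSpeech, VaRest, and friends), but the data flow of step 5 is easy to illustrate. Below is a minimal Python sketch of one conversational turn against the OpenAI APIs; the model names, voice, and file paths are assumptions for illustration, not the app's exact configuration:

```python
# Minimal sketch of one STT -> generative model -> TTS turn.
# The actual app chains these via Unreal plugins; this only
# illustrates the data flow. Model names are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

ROLE_INSTRUCTIONS = (  # the "base instruction" that frames the roleplay
    "You are a patient attending a first therapy session. "
    "Stay in character and answer briefly, as in spoken conversation."
)

def roleplay_turn(mic_recording_path: str, history: list[dict]) -> str:
    # 1. STT: transcribe the user's recorded speech.
    with open(mic_recording_path, "rb") as audio:
        transcript = client.audio.transcriptions.create(
            model="whisper-1", file=audio
        ).text

    # 2. Generative model: produce the NPC's in-character reply.
    history.append({"role": "user", "content": transcript})
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "system", "content": ROLE_INSTRUCTIONS}, *history],
    ).choices[0].message.content
    history.append({"role": "assistant", "content": reply})

    # 3. TTS: synthesize the reply for playback (this audio then
    # drives the procedural lipsync).
    speech = client.audio.speech.create(
        model="tts-1", voice="alloy", input=reply
    )
    speech.write_to_file("npc_reply.mp3")
    return reply
```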
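
The clamped head follow in step 6 likewise boils down to simple geometry: rotate the NPC's head (and upper spine) toward the user's head position, but only within natural limits. A small sketch, where the angle limits and coordinates are illustrative rather than the values used in the app:

```python
# Sketch of a clamped look-at: the NPC head yaws/pitches toward the
# user's head position, but only within anatomical limits, so the
# character never swivels unnaturally. Limits are illustrative.
import math

MAX_YAW_DEG, MAX_PITCH_DEG = 70.0, 35.0

def clamped_look_at(npc_pos, npc_forward_yaw_deg, user_pos):
    dx, dy, dz = (u - n for u, n in zip(user_pos, npc_pos))
    # Desired rotation toward the user's head, relative to the body.
    yaw = math.degrees(math.atan2(dy, dx)) - npc_forward_yaw_deg
    yaw = (yaw + 180.0) % 360.0 - 180.0          # wrap to [-180, 180]
    pitch = math.degrees(math.atan2(dz, math.hypot(dx, dy)))
    # Clamp so the head and upper spine stay within natural range.
    yaw = max(-MAX_YAW_DEG, min(MAX_YAW_DEG, yaw))
    pitch = max(-MAX_PITCH_DEG, min(MAX_PITCH_DEG, pitch))
    return yaw, pitch

# e.g. NPC at origin facing +X, user behind its shoulder and above:
print(clamped_look_at((0, 0, 0), 0.0, (-1.0, 2.0, 0.5)))  # yaw clamps to 70
```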

How I built it

To make this all work, there are a number of systems I had to integrate: from AR and input functionality to high-quality graphics and AI models, all on a standalone Android device. As such, the base concept was heavily shaped by technical constraints, as ideas don't always correspond to feasible implementations.

Considering I don't (yet) own a PICO 4, the XR systems were built on UE5's core OpenXR support and bare-minimum Quest functionality (specifically scene understanding and the SDK for deployment). All other functionality was implemented to be compatible across multiple devices (e.g., overlap events for hand interaction, base Android functionality for device and web access).

A minimum level of polish was applied to provide a smoother experience overall, from animated material instances to sound feedback on actions. To avoid depth-perception problems and overlapping collisions, menus were made transparent and placed as far apart as possible.

With accessibility in mind, the application follows an icon-driven UI with subtle guidance (e.g., icons light up in a specific order, a beep prompts users to speak). This also matches the multilingual capabilities of the chosen AI models, as the application is by no means limited to English.

The TTS, STT, and generative models are the best online multilingual models currently available with a good quality-to-price ratio. I initially attempted to run local models on the device through llama.cpp and whisper.cpp, but soon faced a significant drop in output quality coupled with performance issues.

Plugins and Asset Packs: Audio Analyzer Plugin, AzSpeech, Runtime Audio Importer, Runtime Speech Recognizer, VaRest, Twinmotion Content, Oculus Lipsync, Character Customizer, HDRI Backdrop, MetaXR

Challenges I ran into

The first major challenge was AR on XR, as documentation within the Unreal community is still relatively poor and limited. Shout-out to PICO's in-depth AR documentation; it would've spared me more than a week of blind troubleshooting if I could have used that platform.

The second was plugin compatibility. As a social sciences person with superficial programming knowledge, solving versioning issues in outdated C++ plugins soon became a recurring nightmare. The fun part was that they mostly surfaced when packaging the project, so on-device testing became mandatory after every significant change.

There is still noticeable latency between speaking and receiving a reply, and the characters still need more lifelike, dynamic animations. The ever-present complexity of REST communication, ongoing performance issues, and the Windows-to-Android migration (hair, HDRIs, and LODs are still a problem) will make for some sleepless nights.

Accomplishments that we're proud of

Learning how to work with AR felt just like working with VR a couple of years ago. The documentation and tools are still in a rough state, so making an actual, semi-functional app is a great dopamine rush.

However, the greatest achievement might still be keeping up with development and managing to integrate AI with 3D characters. While many within psychology still theorize about this as the future, having an actual verbal conversation with a GPT-4 avatar just by putting on the headset was a fascinating experience.

What we learned

The transition from Unity to Unreal caused some relapses whenever dealing with engine-specific problems, but ultimately I managed to punch through with quite a few new skills. Integrating different services for the same tasks helped me better understand how to improve inter-component communication (though the visual code could have been vastly simplified with a few macros).

The other part was working with AR. It felt great not having to deal with cybersickness or space restrictions. The possibilities are truly endless, and while the hardware is maturing quite fast, software hasn't really caught up as much.

What's next for RolePlaiAR

I intend to keep working on this project, as at the bare minimum it gives me a great personal tool for training my own professional skills. Since I'm likely not the only one who would use it, I believe commercialization is a strong option.

My primary concern is privacy above all else. PICO has so far shown GDPR awareness and the possibility of HIPAA compliance, as has happened with other companies such as Psious.

As for the AI models, the safest option is always to run them locally on a nearby computer. Nonetheless, OpenAI's first dev event on the 6th of November revealed their investment in privacy, security, quality, and affordability. Allowing all three AI services (TTS, GPT, STT) under the same key is a welcome convenience, and would make it much easier for users to provide a personal key instead of relying on a subscription to use RolePlaiAR.

Another key feature is usability for research and education, as I'll very likely put this app through the fire once it's more feature-complete and polished.

Some general tasks to be done:

  • Performance optimization
  • Bug fixes for LODs, HDRI loading, and possible crashes during REST or lipsync cook moments
  • STT, TTS, and generative models running locally on a computer and streamed to the device (see the relay sketch after this list)
  • Standing and sitting animation variety and procedural upper-limb movements
  • Detailed character customization menus complementary to random selection
  • Virtual keyboard for in-game instruction template editing and entering a personal OpenAI key (opens the option for a one-time purchase)
  • Saving and encrypting local keys and templates (especially useful for research purposes; a minimal sketch follows)
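
On the local-models item above, one plausible setup (an assumption, not the current implementation) is a small relay on a nearby computer that the headset reaches over the LAN. Text generation could be served by llama.cpp's bundled server, which exposes an OpenAI-compatible endpoint; the host, port, and route names below are illustrative:

```python
# Hypothetical LAN relay: the headset POSTs the transcribed user line,
# the relay queries a local llama.cpp server (llama-server exposes an
# OpenAI-compatible /v1 endpoint), and the reply text goes back to the
# device. Host, port, and route names are assumptions.
import requests
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
LLAMA_URL = "http://localhost:8080/v1/chat/completions"

class Turn(BaseModel):
    instructions: str   # roleplay background chosen in the app
    user_text: str      # output of the STT step

@app.post("/reply")
def reply(turn: Turn) -> dict:
    resp = requests.post(LLAMA_URL, json={
        "messages": [
            {"role": "system", "content": turn.instructions},
            {"role": "user", "content": turn.user_text},
        ],
        "temperature": 0.7,
    }, timeout=60)
    resp.raise_for_status()
    return {"reply": resp.json()["choices"][0]["message"]["content"]}

# Run with: uvicorn relay:app --host 0.0.0.0 --port 8000
```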
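
And for saving and encrypting local keys, a minimal sketch using the `cryptography` package's Fernet (file names are illustrative); in a real deployment the Fernet key itself would still need to live in platform-protected storage:

```python
# Minimal sketch of encrypting a stored API key with Fernet
# (symmetric, authenticated encryption). File names are illustrative;
# the Fernet key itself must be kept in protected storage.
from cryptography.fernet import Fernet

def save_api_key(api_key: str, key_file: str = "fernet.key",
                 out_file: str = "openai.key.enc") -> None:
    fernet_key = Fernet.generate_key()          # one-time secret
    with open(key_file, "wb") as f:
        f.write(fernet_key)
    with open(out_file, "wb") as f:
        f.write(Fernet(fernet_key).encrypt(api_key.encode()))

def load_api_key(key_file: str = "fernet.key",
                 enc_file: str = "openai.key.enc") -> str:
    with open(key_file, "rb") as f:
        fernet = Fernet(f.read())
    with open(enc_file, "rb") as f:
        return fernet.decrypt(f.read()).decode()
```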
