Inspiration
I've always wanted a cat but can't have one due to personal circumstances. At the same time, I often find myself working for hours in mixed reality with large virtual screens—and I know I'm not alone. VR fatigue is real: we lose track of time, skip breaks, and end up exhausted.
One day it clicked: what if I could combine these two desires? An AR cat companion that lives in my actual room, understands my real furniture, and reminds me to take breaks. Not a notification that I'd dismiss, but a living presence that walks up to me and meows for attention.
The Immersive Web SDK made this vision possible. The best part? On Meta Quest, web apps can run in the background while you work on other things. Even when I'm deep in a remote desktop session, I can still hear my cat meowing—a gentle nudge from another reality that it's time to step away.
What it does
Paws & Pause is a mixed reality cat companion built entirely for the web using Meta's Immersive Web SDK. The cat:
Lives in Your Real Space
- Uses Scene Understanding to detect your floors, walls, and furniture
- Walks on your actual floor, avoids real walls, and jumps onto real furniture
- Casts real-time shadows that ground it in your environment
- Gets occluded by real-world objects for true AR immersion
Responds to Three Interaction Modes
- Command Mode: Point at any surface and pinch to direct the cat there. It will walk or jump to reach the destination.
- Laser Mode: Hold your pinch to project a laser pointer. Watch your cat crouch, stalk, chase, and pounce—complete with attack animations.
- Feeding Mode: Fill a virtual food bowl. The cat will notice and walk over to eat.
Reminds You to Take Breaks
- Configurable timer: 30 seconds (for demos), 30 minutes, 1 hour, or 2 hours
- When time's up, the cat walks toward you, meows, and displays a speech bubble to remind you to take a break
- Dismiss the reminder by pinching the speech bubble, feeding the cat, or letting it auto-dismiss after 30 seconds
- Any interaction with the cat resets the timer, and the cycle continues
Runs in the Background
- Thanks to the Immersive Web, the app persists while you use other Quest apps
- The cat meows even when you're in Remote Desktop mode
- True ambient companion experience
How I built it
Technical Foundation:
- Immersive Web SDK (IWSDK) for WebXR with TypeScript and Three.js
- Entity-Component-System (ECS) architecture for clean behavior management (see the sketch after this list)
- Scene Understanding API for real-time room mapping (floors, walls, furniture)
- Hand Tracking for natural pinch-based interactions
- GitHub Pages for instant deployment and sharing
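To give a feel for the ECS structure, here's a minimal, framework-agnostic sketch of the pattern; the component shapes and system function are my own illustration, not IWSDK's actual API:

```ts
// Hypothetical ECS sketch (not IWSDK's actual API): components are plain
// data keyed by entity ID; systems iterate only the entities they care about.
type Entity = number;

interface Transform { x: number; y: number; z: number; }
interface CatBehavior { state: string; stateTime: number; }

const transforms = new Map<Entity, Transform>();
const behaviors = new Map<Entity, CatBehavior>();

// Runs every frame: advances each cat's state clock and applies
// state-specific movement to its transform.
function behaviorSystem(dt: number): void {
  for (const [entity, behavior] of behaviors) {
    behavior.stateTime += dt;
    const t = transforms.get(entity);
    if (t && behavior.state === "Roam") {
      t.z += 0.3 * dt; // assumed roaming speed of 0.3 m/s
    }
  }
}
```

The win is that each behavior (roaming, laser chasing, break reminders) lives in its own system instead of one tangled update function.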
Cat AI & Animation System:
- 14-state behavior state machine (Idle, Roam, Rest, Sitting, Jumping, ChaseLaser, Attack, Eating, and more)
- 30+ animations with smooth crossfade transitions (see the sketch after this list)
- Natural movement with cycle-aligned locomotion (no foot sliding)
- Procedural neck bone rotation for head tracking during laser chase
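To show how state changes drive the animation layer, here's a trimmed sketch; the state names are a subset of the real fourteen, and the wrapper class is illustrative rather than a library API:

```ts
import * as THREE from "three";

// A few of the 14 behavior states (subset for illustration).
enum CatState { Idle = "Idle", Roam = "Roam", ChaseLaser = "ChaseLaser", Attack = "Attack" }

class CatAnimator {
  private current?: THREE.AnimationAction;

  constructor(
    private mixer: THREE.AnimationMixer,
    private clips: Map<CatState, THREE.AnimationClip>,
  ) {}

  // Crossfade from the current clip into the clip for the new state.
  transition(state: CatState, fadeSeconds = 0.25): void {
    const clip = this.clips.get(state);
    if (!clip) return;
    const next = this.mixer.clipAction(clip);
    next.reset().play();
    if (this.current && this.current !== next) {
      this.current.crossFadeTo(next, fadeSeconds, false);
    }
    this.current = next;
  }

  update(dt: number): void {
    this.mixer.update(dt); // advance and blend all active actions
  }
}
```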
Navigation System:
- Three-layer collision system: FLOOR_LAYER, WALL_LAYER, FURNITURE_LAYER
- Ground vs. Elevated mode detection for furniture-aware navigation
- Edge detection to prevent falling off tables
- Parabolic jump trajectories that scale with distance (see the sketch after this list)
- Whisker-based wall avoidance with a configurable detection range
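The jump arc itself is a plain parabola whose apex scales with horizontal distance; the constants below are assumed tuning values rather than the exact shipped ones:

```ts
import * as THREE from "three";

// Sample a parabolic jump between two points at normalized progress t in [0, 1].
// The apex height grows with distance, clamped so short hops stay subtle
// and long jumps stay believable.
function jumpPosition(start: THREE.Vector3, end: THREE.Vector3, t: number): THREE.Vector3 {
  const pos = new THREE.Vector3().lerpVectors(start, end, t);
  const dist = start.distanceTo(end);
  const apex = THREE.MathUtils.clamp(dist * 0.4, 0.15, 0.6); // meters (assumed tuning)
  pos.y += apex * 4 * t * (1 - t); // zero at both ends, peak of `apex` at t = 0.5
  return pos;
}
```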
Visual Realism:
- Real-time shadow casting using DirectionalLight + ShadowMaterial (see the sketch after this list)
- Mesh-based depth occlusion from Scene Understanding
- WebXR Depth Sensing research (documented limitations with Quest 3's texture-array format)
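The shadow catcher is the standard Three.js recipe: an invisible plane at the detected floor height that renders only the shadows cast onto it. A minimal sketch, assuming renderer.shadowMap.enabled is set and with placeholder sizes:

```ts
import * as THREE from "three";

// Add a directional light plus a transparent "shadow catcher" plane so the
// cat's shadow lands on the real floor without drawing a virtual floor.
function addShadowCatcher(scene: THREE.Scene, floorY: number): void {
  const light = new THREE.DirectionalLight(0xffffff, 1);
  light.position.set(1, 3, 1);
  light.castShadow = true;
  light.shadow.mapSize.set(1024, 1024);
  scene.add(light);

  const catcher = new THREE.Mesh(
    new THREE.PlaneGeometry(5, 5),
    new THREE.ShadowMaterial({ opacity: 0.35 }), // renders shadows only
  );
  catcher.rotation.x = -Math.PI / 2; // lay the plane flat
  catcher.position.y = floorY;       // snap to the detected floor height
  catcher.receiveShadow = true;
  scene.add(catcher);
}
```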
UI System:
- UIKitML-based floating menu with mode selection and timer configuration
- Live countdown display (HH:MM:SS format)
- Manual raycast-based interaction for Quest 3 compatibility
- Speech bubble using PNG billboard sprites
Challenges I ran into
WebXR Depth API Limitations:
I spent significant time trying to implement real-time soft occlusion using the WebXR Depth Sensing API. I discovered that Quest 3 uses TEXTURE_2D_ARRAY format for stereo depth, requiring sampler2DArray in GLSL 3.0—incompatible with Three.js's standard onBeforeCompile approach. I documented my findings extensively in SOFT-OCCLUSION-RESEARCH.md for the community's benefit. For now, I use mesh-based occlusion from Scene Understanding, which works but has blockier edges.
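For anyone hitting the same wall, this is roughly what the lookup has to become: a custom GLSL3 ShaderMaterial that samples the texture array by view index. Uniform names are mine, and this is the shape from my research notes, not working occlusion:

```ts
import * as THREE from "three";

// Quest 3 delivers stereo depth as a TEXTURE_2D_ARRAY (one layer per eye),
// which needs sampler2DArray in GLSL3 -- exactly what onBeforeCompile
// patching of built-in materials can't express.
const depthDebugMaterial = new THREE.ShaderMaterial({
  glslVersion: THREE.GLSL3,
  uniforms: {
    uDepthTexture: { value: null }, // bind the XR depth texture array here
    uViewIndex: { value: 0 },       // 0 = left eye, 1 = right eye
  },
  vertexShader: /* glsl */ `
    out vec2 vUv;
    void main() {
      vUv = uv;
      gl_Position = projectionMatrix * modelViewMatrix * vec4(position, 1.0);
    }
  `,
  fragmentShader: /* glsl */ `
    uniform highp sampler2DArray uDepthTexture;
    uniform int uViewIndex;
    in vec2 vUv;
    void main() {
      float depth = texture(uDepthTexture, vec3(vUv, float(uViewIndex))).r;
      gl_FragColor = vec4(vec3(depth), 1.0); // visualize depth for debugging
    }
  `,
});
```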
Cat Getting Trapped in Furniture:
Scene Understanding creates green floor planes INSIDE blue furniture bounding boxes (like couch seats). Without careful handling, the cat would get trapped inside furniture. I solved this with a hybrid ground-floor detection system: primary detection uses wall attachment (floors touch walls, furniture doesn't), with a Y-height fallback during early scanning.
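Boiled down to code, the classifier looks something like this (field names and thresholds are illustrative):

```ts
// Hybrid floor detection as described above: wall attachment is the primary
// signal, with a Y-height fallback while the room scan is still incomplete.
interface ScenePlane {
  minY: number;         // plane height above the origin, in meters
  touchesWall: boolean; // derived from Scene Understanding adjacency
}

function isWalkableFloor(plane: ScenePlane, scanComplete: boolean): boolean {
  if (scanComplete) {
    // Real floors reach the walls; couch-seat planes don't.
    return plane.touchesWall;
  }
  // Early-scan fallback: treat near-zero height as floor.
  return plane.minY < 0.15; // 15 cm threshold (assumed tuning)
}
```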
XR Frame Access in Event Handlers:
IWSDK's session.select events don't provide XR frame access, breaking hand-position raycasting. I implemented a pattern of storing ray origin/direction each frame in the update loop, then referencing stored values in event callbacks.
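The pattern, with illustrative names:

```ts
import * as THREE from "three";

// Cache the hand ray every frame (where the XRFrame is available), then
// reuse the cached values inside select handlers that get no frame.
const lastRay = {
  origin: new THREE.Vector3(),
  direction: new THREE.Vector3(0, 0, -1),
};

// Called from the per-frame update loop with the current hand pose.
function updateHandRay(origin: THREE.Vector3, direction: THREE.Vector3): void {
  lastRay.origin.copy(origin);
  lastRay.direction.copy(direction).normalize();
}

// Called from a session.select handler: raycast with the cached values.
function onSelect(scene: THREE.Scene): THREE.Intersection[] {
  const raycaster = new THREE.Raycaster(lastRay.origin, lastRay.direction);
  return raycaster.intersectObjects(scene.children, true);
}
```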
UIKit Interaction on Quest 3:
The built-in UIKit click events didn't work reliably on Quest 3 hardware. I implemented manual raycast-based button detection, zone-based hit testing, and programmatic style updates via setProperties().
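A simplified version of the zone-based hit test (zone names and coordinates are made up for illustration; the matched button then gets its hover/press styling applied programmatically via setProperties()):

```ts
import * as THREE from "three";

// Raycast against the menu panel mesh, then map the local-space hit point
// to a named button zone instead of relying on UIKit's click events.
interface ButtonZone { name: string; xMin: number; xMax: number; yMin: number; yMax: number; }

const zones: ButtonZone[] = [
  { name: "commandMode", xMin: -0.15, xMax: -0.05, yMin: 0.02, yMax: 0.06 },
  { name: "laserMode",   xMin: -0.04, xMax:  0.06, yMin: 0.02, yMax: 0.06 },
];

function hitTestMenu(raycaster: THREE.Raycaster, panel: THREE.Mesh): string | null {
  const hit = raycaster.intersectObject(panel, false)[0];
  if (!hit) return null;
  const local = panel.worldToLocal(hit.point.clone()); // panel-space coordinates
  const zone = zones.find(
    (z) => local.x >= z.xMin && local.x <= z.xMax && local.y >= z.yMin && local.y <= z.yMax,
  );
  return zone ? zone.name : null;
}
```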
Animation Foot Sliding:
Cycle-aligned locomotion required careful tuning. I measure animation clip duration, calculate distance per cycle, and round down to complete cycles—ensuring feet "stick" to the ground naturally.
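The arithmetic is simple once each clip is measured; stride length per cycle is a tuned, per-clip constant:

```ts
// Plan a walk as a whole number of animation cycles so the feet plant
// exactly where the clip expects them to (no sliding).
function planWalk(
  distanceToTarget: number, // meters to the destination
  clipDuration: number,     // seconds per walk cycle
  stridePerCycle: number,   // meters covered by one cycle (measured per clip)
): { cycles: number; durationSeconds: number; distanceMeters: number } {
  const cycles = Math.max(1, Math.floor(distanceToTarget / stridePerCycle));
  return {
    cycles,
    durationSeconds: cycles * clipDuration,  // play exactly N full cycles
    distanceMeters: cycles * stridePerCycle, // translate exactly N strides
  };
}
```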
Accomplishments that I'm proud of
Complete Behavior System: My cat has personality. It roams naturally (preferring forward movement, respecting turn-angle limits), sits down occasionally (with enter/loop/exit phases), grooms itself, and responds dynamically to user interaction. The laser chase sequence—notice, stalk, crouch, track with head, chase at 2x speed, pounce—feels genuinely cat-like.
Furniture-Aware Navigation: The cat truly understands your room. It walks on your floor, avoids your walls, jumps onto your couch, and knows the difference between being on the ground versus elevated on furniture. This required a sophisticated three-layer raycasting system with mode-based collision detection.
Production-Ready Polish: Real-time shadows, smooth animation blending, natural movement curves, and a clean UI that works reliably on Quest 3 hardware. The app maintains 80-90 FPS consistently.
Comprehensive Documentation: Over 3,000 lines of design documents capturing architectural decisions, implementation details, and lessons learned. This isn't throwaway hackathon code—it's a foundation for continued development.
True Background Companion: Thanks to the Immersive Web's ability to run in the background on Quest, the cat can meow at you even while you're working in other apps. This transforms it from a toy into a genuine ambient companion.
What I learned
WebXR Depth Sensing is Nascent: The Depth API exists but practical integration with Three.js shaders on Quest's stereo texture-array format remains challenging. I documented a complete research spike for future developers.
Scene Understanding is Powerful: IWSDK's Scene Understanding provides rich spatial data. The key is building robust systems on top: layer-based filtering, mode-aware navigation, and hybrid detection methods.
Animation is Everything: A character's believability comes from animation quality and appropriate state transitions. My 14-state machine with 30+ animations makes the cat feel alive. Shortcuts here would have killed the experience.
UIKit Needs Manual Handling: For robust Quest 3 interaction, expect to implement manual raycasting and programmatic style updates alongside the declarative UIKitML system.
Plan for Complexity: My navigation system went through multiple iterations—from simple raycasting to three-layer mode-based collision with edge detection and jump accessibility validation. Document architectural decisions; you'll need them.
What's next for Paws & Pause
Soft Occlusion (When Tools Mature):
I have a complete design for WebXR Depth API integration using custom GLSL3 ShaderMaterial. When Three.js or IWSDK adds better TEXTURE_2D_ARRAY support, I'm ready.
Petting Interaction:
Hand-mesh collision detection to trigger caress animations when users pet the cat. The animations exist (Caress_idle, Caress_sit, Caress_lie)—I just need the detection.
Multiple Cats & Customization:
Spawn multiple cats with different personalities (lazy vs. active, bold vs. cautious). Let users customize appearance.
Social Features:
Shared AR experiences where friends can see and interact with each other's cats.
Health & Wellness Integration:
Connect to health APIs to suggest breaks based on actual usage patterns, not just timers.
Autonomous Jumping:
Currently, jumping is user-directed. I want the cat to occasionally explore furniture on its own, making the space feel more lived-in.
Built With
- iwsdk
- three.js
- typescript
- vite