Photo and voice become one meal.
The video shows the spread of ingredients. VoCal combines what the camera sees with what you say is hidden, mixed in, or customized.
VoCal result
742 kcal
38g protein (+22g vs photo)
Why VoCal
Manual logging takes 2-3 minutes. CalAI-style photo logging is about 90 seconds. VoCal captures the meal in roughly 20 seconds, while voice adds the hidden details photos miss.
20s
VoCal log time
4.5x
faster than photo logging (90 seconds vs 20)
Average meal logging time
7.5x faster than manual
Compared with a 150-second manual midpoint.
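The 4.5x and 7.5x figures follow directly from the logging times quoted above. A minimal sketch of that arithmetic, assuming the "2-3 minutes" manual range is read as 120 to 180 seconds (the constant and function names here are illustrative, not part of VoCal):

```python
# Logging times in seconds, taken from the figures quoted above.
MANUAL_RANGE_S = (120, 180)  # "2-3 minutes" of manual logging (assumed reading)
PHOTO_S = 90                 # CalAI-style photo logging
VOCAL_S = 20                 # VoCal photo plus voice


def speedup(baseline_s: float, new_s: float) -> float:
    """Return how many times shorter new_s is than baseline_s."""
    return baseline_s / new_s


# Midpoint of the manual range, used as the manual baseline.
manual_midpoint_s = sum(MANUAL_RANGE_S) / 2

vs_photo = speedup(PHOTO_S, VOCAL_S)
vs_manual = speedup(manual_midpoint_s, VOCAL_S)

print(f"Manual midpoint: {manual_midpoint_s:.0f}s")
print(f"vs photo logging: {vs_photo:.1f}x")
print(f"vs manual logging: {vs_manual:.1f}x")
```

Both multipliers are ratios of logging time, so they describe how much shorter a log takes, not a percentage change.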
More accurate context
Voice captures buried, mixed, off-frame, and customized ingredients.
Why VoCal
Talk like a normal person. VoCal turns messy voice notes into structured food, portions, modifiers, and confidence signals.
Voice note
"Half a Cava bowl, extra chicken, dressing on the side."
Name the place or dish and VoCal can search restaurant foods, compare likely menu items, and anchor the estimate to a real meal.
Best match
Chicken burrito bowl
If the camera and voice still leave ambiguity, VoCal asks one or two targeted questions before locking the log.
VoCal asks
Was the chicken grilled or fried, and did you use all of the sauce?
You answer
Grilled chicken, about half the sauce, and rice underneath.
Weight, protein consistency, calories, and check-ins sit together so the user sees trend, not just today's log.
Weight trend
181.4 -> 178.9
Progress photos can estimate body-fat direction over time, while the interface keeps it framed as guidance, not a diagnosis.
Estimate range
17-19%
trend confidence rising
Apple Watch, Whoop, Apple Health, and other fitness trackers help VoCal adjust coaching to recovery and output.
Three seconds
The photo gives VoCal a starting point. Your voice fills in what the lens cannot know, then the app turns both into a cleaner nutrition record.
VoCal identifies what is visible on the surface: rice, greens, sauce, drink, package, or restaurant dish.
Say what is underneath, mixed in, customized, or from a restaurant so the estimate is not limited to the photo.
If something is uncertain, VoCal asks about the exact thing that changes the estimate: sauce amount, prep method, hidden base, or portion size.
Pricing
VoCal Pro
Most loved
$4.99/month
Unlimited history, nutrition coach, progress imaging, wrist app, full export.
Go Pro
Questions
A photo can only see the surface. VoCal uses voice to add hidden ingredients, restaurant details, cooking methods, and rough portions, then reconciles that context with the image.
Yes. Capture works fully offline and syncs when you reconnect. The Apple Watch flow is voice-first, so you can add context mid-walk without your phone.
Audio is transcribed on-device and discarded immediately — only the structured meal is stored. Nothing is sold or used for advertising.
Free on iOS and Android. Snap the meal, say what the camera missed, and keep your progress moving.