Nutri-Extract is a zero-install, single-file web application that leverages Google’s Gemini 2.5 Flash to perform Optical Character Recognition (OCR) on nutrition labels. It transforms a simple image URL of a nutrition table into a structured data table without requiring a dedicated backend server.
- Serverless Architecture: Runs entirely in the browser using ES modules.
- Gemini 2.5 Flash Integration: Uses the latest high-efficiency model for rapid extraction.
- Structured JSON Output: Automatically maps messy label text to clean keys like `energy-kcal`, `proteins`, and `fat`.
The app uses an importmap to load the @google/generative-ai library directly from a CDN (esm.run). This allows the script to remain a single file without needing npm install or a build step.
When a URL is provided, the app performs the following steps:
- Fetch: It attempts to download the image as a Blob.
- Conversion: Uses `FileReader` to encode the blob into a Base64 string, which is the format Gemini’s `inlineData` requires.
- Transmission: Sends the Base64 data and a specific prompt to the Gemini API.
The system uses a strict prompt to force Gemini to return only a JSON object. It explicitly instructs the AI to:
- Extract values specifically for 100g/100ml.
- Ignore units (e.g., return `450` instead of `450kcal`).
- Handle missing data by returning `null`.
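A prompt can request this shape but cannot guarantee it, so the parsed reply may be worth checking before display. The following is a minimal sketch, not the app's actual code: `normalizeNutrition` and `EXPECTED_KEYS` are hypothetical names, and the key list reuses only the keys named in this README because the app's full list is not shown here.

```javascript
// Hypothetical key list: only the keys named in this README.
const EXPECTED_KEYS = ["energy-kcal", "proteins", "carbohydrates", "fat"];

// Hypothetical helper: coerce each expected key to a number or null.
function normalizeNutrition(raw) {
  const clean = {};
  for (const key of EXPECTED_KEYS) {
    const value = raw?.[key];
    // parseFloat drops a trailing unit such as "kcal" if the model kept it;
    // the comma replacement tolerates decimal commas like "12,5".
    const num = typeof value === "number"
      ? value
      : parseFloat(String(value ?? "").replace(",", "."));
    clean[key] = Number.isFinite(num) ? num : null;
  }
  return clean;
}
```

Keys the model omitted come back as `null`, and keys outside the expected list are dropped, so the table always has the same rows.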
```html
<script type="importmap">
{
  "imports": {
    "@google/generative-ai": "https://esm.run/@google/generative-ai"
  }
}
</script>
<script type="module">
  import { GoogleGenerativeAI } from "@google/generative-ai";
</script>
```
- The Statement: The `importmap` acts like a "phone book" for the browser.
- The Detail: Usually, to use Google's AI library, you would need to use `npm` and a bundler. By using `esm.run`, you are telling the browser to download the library as a modern ES Module. The `type="module"` in the second script tag is mandatory; without it, the `import` statement would cause a syntax error.
Because the Gemini API doesn't "visit" the URL you provide, the browser has to download the image and package it for transport.
```javascript
const blob = await fetch(imgUrl).then(res => res.blob());
```
- The Statement: This fetches the image and converts it into a `Blob` (Binary Large Object).
- The Detail: A Blob is the raw, unencoded data of the file. We need it because its `type` property carries the MIME type (e.g., `image/jpeg`), which is passed as `mimeType` to tell Gemini what it is looking at.
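The one-line fetch does not check the HTTP status, so a 404 error page would be converted to a Blob just like an image. Below is a hedged sketch of a stricter version; `checkImageResponse` and `fetchImageBlob` are hypothetical names, and the content-type check assumes the image server labels its responses correctly.

```javascript
// Hypothetical check: throws unless the response is a successful image response.
function checkImageResponse(res) {
  if (!res.ok) {
    throw new Error(`Image request failed with HTTP ${res.status}`);
  }
  const type = res.headers.get("content-type") || "";
  if (!type.startsWith("image/")) {
    throw new Error(`Expected an image but got "${type}"`);
  }
  return res;
}

// Hypothetical wrapper around the original one-liner.
async function fetchImageBlob(imgUrl) {
  let res;
  try {
    res = await fetch(imgUrl);
  } catch (err) {
    // fetch rejects on network failures and on requests blocked by CORS.
    throw new Error(`Could not fetch ${imgUrl} (network error or blocked by CORS)`, { cause: err });
  }
  return checkImageResponse(res).blob();
}
```

With this in place, `const blob = await fetchImageBlob(imgUrl);` could stand in for the original line and surface a readable message in the UI.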
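```javascript
const base64Data = await new Promise((resolve, reject) => {
  const reader = new FileReader();
  reader.onload = () => resolve(reader.result.split(',')[1]); // keep only the Base64 payload
  reader.onerror = () => reject(reader.error);
  reader.readAsDataURL(blob);
});
```
- The Statement: This converts the binary Blob into a Base64 string. The `Promise` wrapper supplies the `resolve` that the callback calls, so the result can be awaited.
- The Detail: `readAsDataURL` creates a string that starts with `data:image/png;base64,...`. Gemini only wants the raw Base64 payload after the comma. The `.split(',')[1]` is a surgical strike to remove that header, leaving only the pure data.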
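The `split(',')[1]` call quietly returns `undefined` if the string is not a data URL. A slightly more defensive sketch is shown here; `splitDataUrl` is a hypothetical helper, not part of the app.

```javascript
// Hypothetical helper: split a Base64 data URL into its MIME type and payload.
function splitDataUrl(dataUrl) {
  // Matches "data:<mime>[;param...];base64,<payload>".
  const match = /^data:([^;,]+)(?:;[^;,]+)*;base64,(.*)$/s.exec(dataUrl);
  if (!match) {
    throw new Error("Not a Base64 data URL");
  }
  return { mimeType: match[1], data: match[2] };
}
```

The two returned fields line up with the `mimeType` and `data` properties that `inlineData` expects.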
```javascript
const prompt = `Extract nutritional values per 100g.
Return ONLY a JSON object with these keys:
"energy-kcal", "proteins", "carbohydrates"...`;
```
- The Statement: This plays the role of a "System Instruction," although it is sent as ordinary prompt text alongside the image.
- The Detail: You aren't just asking for text; you are defining the Schema. By listing the exact keys you want, you make it far less likely that the AI gives you "Protein" one time and "Proteins" (plural) the next. This consistency is what makes the automatic table-filling possible later.
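The SDK also lets a caller request JSON at the API level, which could complement the prompt wording. The sketch below assumes the `responseMimeType` generation option is supported by the version of `@google/generative-ai` that the CDN serves; the `modelParams` name, the `"gemini-2.5-flash"` model id string, the `temperature` value, and the `apiKey` variable are illustrative assumptions rather than values taken from the app.

```javascript
// Illustrative settings object shaped like the argument to getGenerativeModel().
const modelParams = {
  model: "gemini-2.5-flash", // assumed id string for Gemini 2.5 Flash
  generationConfig: {
    responseMimeType: "application/json", // ask for raw JSON rather than Markdown
    temperature: 0,                        // assumption: favor repeatable output
  },
};

// In the page this would be handed to the SDK (not executed here):
// const model = new GoogleGenerativeAI(apiKey).getGenerativeModel(modelParams);
```

Even with this option set, validating the parsed reply is still sensible, since the option constrains the format and not the keys.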
```javascript
const result = await model.generateContent([
  prompt,
  { inlineData: { data: base64Data, mimeType: blob.type } }
]);
```
- The Statement: This sends the text and the image bytes in one single array.
- The Detail: This is the definition of Multimodal. The AI doesn't "read" the label and then "answer" the prompt separately. It uses the prompt as a lens through which it views the image data.
```javascript
const text = result.response.text().replace(/```json|```/g, "").trim();
const data = JSON.parse(text);
```
- The Statement: This strips out formatting marks and turns text into a JavaScript Object.
- The Detail: LLMs are trained to be chatty. Even when told "ONLY JSON," they often wrap the answer in "code blocks" (backticks).
  - The regex `/```json|```/g` looks for those backticks and removes them globally.
  - `JSON.parse()` takes the cleaned string and turns it into a live object that code can interact with.
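Stripping backticks covers the common case, but `JSON.parse` still throws if the model adds a sentence before or after the object. One possible fallback is sketched here under the assumption that the reply contains a single JSON object; `extractJson` is a hypothetical helper.

```javascript
// Hypothetical helper: pull the outermost {...} span out of a reply and parse it.
function extractJson(replyText) {
  const start = replyText.indexOf("{");
  const end = replyText.lastIndexOf("}");
  if (start === -1 || end <= start) {
    return null; // no object-like span found
  }
  try {
    return JSON.parse(replyText.slice(start, end + 1));
  } catch {
    return null; // braces were present but the span was not valid JSON
  }
}
```

Returning `null` instead of throwing lets the caller show a "could not read the label" message rather than leaving the page stuck on an uncaught error.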
```javascript
function displayTable(data) {
  for (const [key, value] of Object.entries(data)) {
    html += `<tr>
      <td style="text-transform: capitalize;">${key.replaceAll('-', ' ')}</td>
      <td>${value ?? 'N/A'}</td>
    </tr>`;
  }
  // ...
}
```
What it does: This is a simple loop. It takes the "Keys" (like `energy-kcal`) and "Values" (like `450`) from the JSON and creates HTML rows on the fly. It also replaces hyphens with spaces to make it look cleaner for the user.
- The Statement: This iterates through every piece of data Gemini found.
- The Detail:
  - `Object.entries(data)` turns your JSON into a list of pairs.
  - `${key.replaceAll('-', ' ')}` takes a technical key like `saturated-fat` and makes it human-readable as `saturated fat`.
  - `${value ?? 'N/A'}` is a "fallback." If the AI returned `null` for a missing value, it displays "N/A" instead of an empty box. Using `??` rather than `||` keeps a genuine `0` (for example, 0 g of fat) from being shown as "N/A".
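Because the keys and values come from model output, it may be safer to escape them before placing them in HTML. The sketch below is one way to do that as a pure function; `escapeHtml` and `buildRows` are hypothetical names, and the variant uses `??` so that a real `0` is displayed rather than treated as missing.

```javascript
// Hypothetical helper: escape characters that are special in HTML.
function escapeHtml(text) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(text).replace(/[&<>"']/g, (ch) => map[ch]);
}

// Hypothetical pure variant of the row loop: returns all rows as one string.
function buildRows(data) {
  return Object.entries(data)
    .map(([key, value]) => {
      const label = escapeHtml(key.replaceAll("-", " "));
      const cell = escapeHtml(value ?? "N/A"); // null/undefined become "N/A", 0 stays 0
      return `<tr><td>${label}</td><td>${cell}</td></tr>`;
    })
    .join("");
}
```

Returning a string instead of appending to an outer variable also makes the function easy to check without a browser.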
What separates this from standard OCR lies in Steps 4 and 5. Standard OCR just gives you a pile of words. These steps are where the app links the word "Protein" to the number "5g" and turns the pair into usable data.
The most "fragile" line in the code is:
```javascript
const blob = await fetch(imgUrl).then(res => res.blob());
```
This line is the "CORS Trigger." If a user tries an image from a site like Amazon or Walmart, the browser's security policy will stop the code right there. Adding a "File Upload" button (using <input type="file">) would bypass this because the file comes from the user's hard drive, not a foreign server.
Browsers block cross-origin `fetch` reads unless the image's server opts in with CORS headers such as `Access-Control-Allow-Origin`, and most sites do not. If you try to use an image from a site like Amazon or a news outlet, the browser will likely throw a CORS error.
- Solution: This app works best when images are hosted on a service with open CORS (like Imgur) or are served from the same origin as the page, for example by the same local development server.
This app handles your GEMINI_API_KEY in plain text on the client side.
- Warning: Never host this publicly with your key hardcoded. Only use this for local development or private tools. If you share the link to a hosted version that contains your key, anyone who inspects the page source or its network requests can read the key and use it.
The current code relies on a URL. For a better user experience on GitHub, consider adding an `<input type="file">` so users can upload photos directly from their phones, which bypasses the CORS issue entirely.
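A rough sketch of how such an upload path might start is shown below. Everything named here is an assumption for illustration: the `label-file` element id, the `MAX_BYTES` cap, and the `validateImageFile` helper do not come from the app.

```javascript
// Hypothetical size cap; pick whatever suits the app.
const MAX_BYTES = 4 * 1024 * 1024;

// Hypothetical check: returns an error message, or null if the file looks usable.
function validateImageFile(file) {
  if (!file) return "No file selected.";
  if (!file.type.startsWith("image/")) return `Unsupported type: ${file.type || "unknown"}`;
  if (file.size > MAX_BYTES) return "Image is larger than 4 MB.";
  return null;
}

// Browser-only wiring, skipped when no DOM is available.
// Assumes markup like: <input type="file" accept="image/*" id="label-file">
if (typeof document !== "undefined") {
  const input = document.getElementById("label-file");
  input?.addEventListener("change", () => {
    const file = input.files[0];
    const problem = validateImageFile(file);
    if (problem) {
      alert(problem);
      return;
    }
    // A File is a Blob, so it can go through the same FileReader
    // Base64 step that the fetched image uses.
  });
}
```

Since the file never crosses origins, no `fetch` call is involved and the CORS check does not apply.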
- Clone the repository or download `nutriextract.html`.
- Open the file in any modern web browser.
- Generate an API Key at Google AI Studio.
- Paste your key, provide a CORS-compliant image URL, and click Extract.