Feature Request: multiple llama-server WebUI FRs #16839

@ABJ4403

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

  • "Continue response" button (with an icon along the lines of [>>]), shown when inference has been stopped
  • WebUI offline caching (similar to how hexed.it does it, for example)
  • (Somewhat related to the 2nd item) While the model is loading, instead of showing only the "Model is loading" text, load the homepage as usual. There is a phase where the WebUI waits for the server (I forgot what it's called); the server could send its "model is still loading" error response there, and the error handler that catches it would show "Loading the model" on the homepage and recheck every 5 seconds
  • Multiple languages (assuming it's not too hard or costly in size to implement; I don't know whether this is already implemented, because I don't see a language toggle)
  • For mobile users, pressing Enter should not send the query; the user can press the submit button instead
  • After inference finishes, run another inference to generate a short summary to use as the chat title
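
To make the "recheck every 5 seconds" idea concrete, here is a minimal sketch of the polling loop. All names (`classifyStatus`, `waitForModel`, `PollOptions`) are hypothetical and not taken from the WebUI source. It also assumes the server answers 200 once the model is ready and 503 while it is still loading; that should be checked against the actual server behavior.

```typescript
// Hypothetical sketch; none of these names come from the llama.cpp WebUI.
type ServerState = "ready" | "loading" | "error";

// Assumption: 200 means the model is ready, 503 means it is still loading.
function classifyStatus(httpStatus: number): ServerState {
  if (httpStatus === 200) return "ready";
  if (httpStatus === 503) return "loading";
  return "error";
}

interface PollOptions {
  probe: () => Promise<number>;           // resolves to an HTTP status code
  onLoading: () => void;                  // e.g. show "Loading the model" on the homepage
  sleep?: (ms: number) => Promise<void>;  // injectable so the loop can be tested without waiting
  intervalMs?: number;                    // defaults to 5 seconds, as proposed above
  maxAttempts?: number;                   // safety cap so the loop cannot run forever
}

async function waitForModel(opts: PollOptions): Promise<ServerState> {
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const intervalMs = opts.intervalMs ?? 5000;
  const maxAttempts = opts.maxAttempts ?? 120;

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    let state: ServerState;
    try {
      state = classifyStatus(await opts.probe());
    } catch {
      state = "error"; // network failure: the server is not reachable at all
    }
    if (state !== "loading") return state; // ready or hard error: stop polling
    opts.onLoading();
    await sleep(intervalMs);
  }
  return "loading"; // still loading after the last attempt
}
```

In a browser, `probe` could be something like `() => fetch("/health").then((r) => r.status)`, assuming the server exposes a health endpoint at that path.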

Motivation

  • When the user stops inference, or the output limit (not the context limit) is reached, the user currently has to tell the AI to "continue", which is likely to break the response. A "continue response" button (maybe next to the "regenerate response" button) would continue the inference when pressed. I've seen this somewhere but forgot where; maybe DeepSeek has it?
  • Faster loading times
  • Cleaner UI while the model is loading
  • Not everyone understands English, and some AI frontends support multiple languages
  • On almost all mobile virtual keyboards, the "Shift" key only switches between letters and symbols; it can't send a literal Shift key, so Shift+Enter, for example, is not possible
  • Makes the chat title readable, but could be slow to generate, especially when long texts are involved, so this should be an optional toggle. So far I've only seen this in ChatGPT
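
The Enter-key behavior for mobile users could be decided by a small pure function like the sketch below. `KeyLike` and `enterAction` are hypothetical names for illustration, and how a device gets classified as "touch" is left to the caller.

```typescript
// Hypothetical sketch; not the WebUI's actual key handler.
// KeyLike mirrors the KeyboardEvent fields that the decision needs.
interface KeyLike {
  key: string;
  shiftKey: boolean;
  isComposing?: boolean;
}

type EnterAction = "submit" | "newline" | "ignore";

function enterAction(ev: KeyLike, isTouchDevice: boolean): EnterAction {
  if (ev.key !== "Enter") return "ignore";
  if (ev.isComposing) return "ignore";  // an IME is mid-composition; leave the key to the browser
  if (isTouchDevice) return "newline";  // on mobile, only the submit button sends the query
  return ev.shiftKey ? "newline" : "submit"; // desktop: Enter sends, Shift+Enter breaks the line
}
```

A `keydown` handler would call `preventDefault()` and send the message only when the result is `"submit"`. One possible way to set `isTouchDevice` is `window.matchMedia("(pointer: coarse)").matches`, though an explicit user setting may be more reliable.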

Possible Implementation

No response
