Local API or Gradio Client Support focus. #3

@waefrebeorn

Gradio clients that run local language models, such as Oobabooga, and expose API support should be a major consideration in the roadmap process. Usable model swapping with cache functionality is feasible. I made an example chart months ago when I saw the potential of the Min-P greedy sampling that Kalomaze worked on: its token accuracy could help with memory-driven task recall.
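For reference, here is a minimal sketch of the Min-P idea mentioned above: keep only tokens whose probability is at least some fraction (`min_p`) of the top token's probability, then renormalize. The function name and example logits are illustrative, not from any particular implementation.

```python
import numpy as np

def min_p_filter(logits, min_p=0.1):
    """Min-P filtering sketch: discard tokens whose probability falls
    below min_p * (probability of the most likely token), then
    renormalize the survivors."""
    probs = np.exp(logits - logits.max())   # stable softmax
    probs /= probs.sum()
    threshold = min_p * probs.max()         # dynamic cutoff scales with confidence
    filtered = np.where(probs >= threshold, probs, 0.0)
    return filtered / filtered.sum()

# Illustrative logits: two strong candidates, two weak ones.
logits = np.array([5.0, 4.5, 1.0, 0.2])
probs = min_p_filter(logits, min_p=0.2)     # weak tokens are zeroed out
```

Because the cutoff scales with the model's confidence, the filter stays greedy-like when one token dominates but permits diversity when the distribution is flat.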

Please note that current projects like MemoryGPT allow API usage, but no widespread application allows for effective model swapping or multi-system offloading. It's also worth noting that a side-server "chain" of cheaper machines, or a GGML-focused network solution, could enable more garage labs.
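The model-swapping-with-cache idea could be as simple as an LRU cache over loaded models: keep a small number resident, evict the least-recently-used one when a new model is requested. This is a sketch only; `loader` stands in for whatever backend call actually loads a model.

```python
from collections import OrderedDict

class ModelCache:
    """LRU model-swapping sketch: at most `capacity` models stay
    resident; requesting a new one evicts the least-recently-used.
    `loader` is a hypothetical callable that loads a model by name."""

    def __init__(self, loader, capacity=2):
        self.loader = loader
        self.capacity = capacity
        self.cache = OrderedDict()          # insertion order = recency order

    def get(self, name):
        if name in self.cache:
            self.cache.move_to_end(name)    # mark as most recently used
            return self.cache[name]
        if len(self.cache) >= self.capacity:
            self.cache.popitem(last=False)  # evict the LRU model
        model = self.loader(name)           # load (or swap in) the model
        self.cache[name] = model
        return model
```

In a real system, eviction would also need to free VRAM/RAM before loading the replacement, which is where the multi-system offloading question comes in.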

Current roadblocks are memory management, non-useful hallucinations (effective hallucinations could generate better idea tokens in an agent context), and the lack of open-source inter-model conversation solutions suitable for system-prompt-style implementation.

The most feasible multi-model solution is to allow most elements to be CPU-offloaded, while features like live training (one model fine-tuning another via RLHF) remain a "drop-in" option requiring a GPU with enough VRAM for training, unless a traditional RAM-based training solution becomes usable with a current model base such as Mistral.
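The CPU-offload split above can be sketched as a greedy placement plan: fill the GPU's VRAM budget layer by layer and push the remainder to CPU RAM. The layer sizes and budget here are made-up numbers for illustration.

```python
def plan_offload(layer_sizes_mb, vram_budget_mb):
    """Greedy offload sketch: assign layers to the GPU in order until
    the VRAM budget is exhausted, then place the rest on CPU."""
    placement, used = {}, 0
    for i, size in enumerate(layer_sizes_mb):
        if used + size <= vram_budget_mb:
            placement[i] = "cuda"   # fits in remaining VRAM
            used += size
        else:
            placement[i] = "cpu"    # offload to system RAM
    return placement

# Hypothetical example: ten 100 MB layers against a 350 MB VRAM budget.
plan = plan_offload([100] * 10, 350)
```

A drop-in training feature would simply claim the whole VRAM budget for itself, forcing inference layers onto the CPU for the duration.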

To summarize, a focus on API solutions such as ChatGPT or Claude will stagnate research on local language model feasibility. Creating a feasible framework for agent structures, plus LoRA-based live tuning for memory-retention elements on a version-based task list, is most likely the best course.
