Add option to use Compute Horde by slawomir-gorawski-reef · Pull Request #68 · deval-core/De-Val

slawomir-gorawski-reef · 2025-02-27T13:18:25Z

Adds flag --neuron.use_compute_horde for the validator, to perform computations on the Compute Horde instead of locally.

Warning

Compute Horde is not fully production-ready yet; do not set this flag on real validators.
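
For illustration, here is a minimal sketch of how a boolean flag with this dotted name could be declared and read with plain `argparse`. This is an assumption about the mechanism, not the code from this PR; `build_parser` and `use_compute_horde` are hypothetical helper names.

```python
import argparse
from typing import Sequence


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that only knows the flag discussed above (illustrative)."""
    parser = argparse.ArgumentParser(description="Validator flag sketch")
    parser.add_argument(
        "--neuron.use_compute_horde",
        action="store_true",
        default=False,
        help="Run miner model validation on the Compute Horde instead of locally.",
    )
    return parser


def use_compute_horde(argv: Sequence[str]) -> bool:
    """Return True when the flag is present in argv."""
    args = build_parser().parse_args(list(argv))
    # argparse keeps the dot in the destination name, so getattr is required.
    return getattr(args, "neuron.use_compute_horde")
```

The flag defaults to off, which matches the warning above: local validation stays the default unless the operator opts in.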

Usage

Requires some new COMPUTE_HORDE_ settings in .env:

COMPUTE_HORDE_VALIDATOR_HOTKEY=<your hotkey>
# optional, defaults to production
# COMPUTE_HORDE_FACILITATOR_URL=
# optional, defaults to LLM A6000
# COMPUTE_HORDE_EXECUTOR_CLASS=
# This is this repo's Docker image, so you can docker build . -t <your image> and use that
COMPUTE_HORDE_JOB_DOCKER_IMAGE=backenddevelopersltd/slawek-test:v0-latest
COMPUTE_HORDE_JOB_NAMESPACE=SN15.0
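
As a rough sketch of how these settings could be read and checked at startup: the class and function names below are hypothetical, and treating the hotkey, Docker image, and namespace as required is an assumption based on them not being marked optional above. Optional values are left as `None` so that whatever default applies downstream is used.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ComputeHordeSettings:
    """Hypothetical container for the COMPUTE_HORDE_ settings listed above."""
    validator_hotkey: str
    job_docker_image: str
    job_namespace: str
    # None means "not set here; let the default apply".
    facilitator_url: Optional[str] = None
    executor_class: Optional[str] = None


def load_compute_horde_settings(
    env: Mapping[str, str] = os.environ,
) -> ComputeHordeSettings:
    """Read the settings from a mapping (the process environment by default)."""

    def required(name: str) -> str:
        value = env.get(name, "").strip()
        if not value:
            raise ValueError(f"{name} must be set when using Compute Horde")
        return value

    def optional(name: str) -> Optional[str]:
        # Empty or missing values are normalized to None.
        return env.get(name, "").strip() or None

    return ComputeHordeSettings(
        validator_hotkey=required("COMPUTE_HORDE_VALIDATOR_HOTKEY"),
        job_docker_image=required("COMPUTE_HORDE_JOB_DOCKER_IMAGE"),
        job_namespace=required("COMPUTE_HORDE_JOB_NAMESPACE"),
        facilitator_url=optional("COMPUTE_HORDE_FACILITATOR_URL"),
        executor_class=optional("COMPUTE_HORDE_EXECUTOR_CLASS"),
    )
```

Failing early on a missing required value avoids discovering the problem only when the first job is submitted.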

To run the validator using Compute Horde, register as a validator on SN15 (see the README for validators) and run:

poetry run python neurons/validator.py \
    --netuid 15 \
    --subtensor.network finney \
    --wallet.name default \
    --wallet.hotkey default \
    --logging.debug \
    --logging.trace \
    --axon.port 3000 \
    --neuron.model_ids 'gpt-4o,gpt-4o-mini,mistral-7b,Claude-3.5,command-r-plus' \
    --neuron.axon_off \
    --neuron.num_task_examples=1 \
    --neuron.disable_set_weights \
    --neuron.use_compute_horde

You can also use pm2 as recommended.

To run a single job (miner model) on the Compute Horde without starting a validator loop, use the script below. This does not require registration and acts as a "sanity check":

poetry run python scripts/compute_horde_e2e_test.py --hf-id=AltEinstein/emc9 --hotkey=5G9BymEWsPac2CXZsar9DpScZAjacr69SgPRNpSDb1fPza8M --coldkey=5H4MtMvFdK2UKdwXMwpSc6h95K7MsK5n6BttYw7Gen93Ck4e

(replace the flag values with any model you want to validate)

Implementation

The flow looks roughly like this:

1. A Docker image of this codebase is built.
2. The validator loop runs as usual.
3. When a miner model is to be validated, instead of doing that locally, a job is sent to the Compute Horde with the Docker image from step 1 and the miner model as input.
4. Validation is performed on the Compute Horde (it runs the same code as it would locally, but with a different entrypoint, neurons/compute_horde_entrypoint.py).
5. Results are returned to the validator loop (step 2) and processed further as usual (ranking, setting weights).
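
The flow above can be sketched as a small dispatch function. This is a simplified illustration, not the code from this PR: `MinerModel`, `run_epoch`, and the other names are hypothetical, and `run_job` is a stand-in for the real Compute Horde call, which is not shown here.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional

# Stand-in signature: (docker_image, hf_id) -> score. The real job API differs.
JobRunner = Callable[[str, str], Awaitable[float]]


@dataclass
class MinerModel:
    hf_id: str
    score: Optional[float] = None


async def validate_locally(model: MinerModel) -> MinerModel:
    """Placeholder for the existing local validation path."""
    model.score = 0.5  # dummy value, for illustration only
    return model


async def validate_on_compute_horde(
    model: MinerModel, docker_image: str, run_job: JobRunner
) -> MinerModel:
    """Steps 3 to 5: send a job with the image and model, then collect the result."""
    model.score = await run_job(docker_image, model.hf_id)
    return model


async def run_epoch(
    model: MinerModel,
    use_compute_horde: bool,
    docker_image: str,
    run_job: JobRunner,
) -> MinerModel:
    """Pick the validation path for one miner model inside the validator loop."""
    if use_compute_horde:
        return await validate_on_compute_horde(model, docker_image, run_job)
    return await validate_locally(model)


async def fake_run_job(docker_image: str, hf_id: str) -> float:
    """Test double that pretends the remote job returned a score."""
    return 0.9


if __name__ == "__main__":
    result = asyncio.run(
        run_epoch(MinerModel("AltEinstein/emc9"), True, "example/image:tag", fake_run_job)
    )
    print(result)
```

Passing the job runner in as a parameter keeps the loop testable without access to the Compute Horde; ranking and weight setting would consume the returned model in both paths.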

TODO

Install compute-horde-sdk from PyPI rather than git

deval-core · 2025-03-04T11:25:25Z

            return False

-        if miner_state.chain_model_hash != model_hash:
+        if model_hash is not None and miner_state.chain_model_hash != model_hash:


These changes seem unnecessary? They reduce the number of code lines but obfuscate the log (i.e., we can no longer tell whether a value is missing or mismatched).

I did it this way so that some of the checks can run before we have the model hash or miner coldkey from Docker (to avoid downloading the model here and again later in the Compute Horde, as in my other comment). I changed the caller code to account for that: https://github.com/deval-core/De-Val/pull/68/files/6e8da07badffeac501b426c2f19285ff18af1f5d#diff-274d3bc59fd308b41d1dcd439b1385875eb9dbf11c1dfe915c3f596c3907cb15R185-R192

If you'd like it done a different way, I'll see what I can do.
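
One possible way to keep the optional check while addressing the logging concern is to log the "not available yet" and "mismatch" cases separately. The sketch below is hypothetical and is not the code in this PR; `check_model_hash` and the logger name are made up for illustration.

```python
import logging
from typing import Optional

logger = logging.getLogger("validate_model_sketch")


def check_model_hash(chain_model_hash: str, model_hash: Optional[str]) -> bool:
    """Return False only on a real mismatch; skip the check when no hash is known yet."""
    if model_hash is None:
        # Hash is not available at this stage (e.g. the model was not downloaded yet).
        logger.debug("Model hash not available yet; deferring the hash check")
        return True
    if chain_model_hash != model_hash:
        logger.warning(
            "Model hash mismatch: chain=%s computed=%s", chain_model_hash, model_hash
        )
        return False
    return True
```

With distinct messages, a reader of the log can tell whether the check was skipped or actually failed.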

deval-core · 2025-03-04T11:47:16Z


+    async def run_epoch_on_compute_horde(self, miner_state: ModelState) -> ModelState:
+        # Local validation that does not require Docker container.
+        is_valid = self.contest.validate_model(miner_state, None, None, 0, constants.max_model_size_gbs + 2)


This seems like it bypasses many of the validation checks we put in place to prevent cheating? Why would we need a separate validation step?

Yes, this does bypass some of the checks that were done here, but they are done later in the Compute Horde job: https://github.com/deval-core/De-Val/pull/68/files#diff-2c0eaf30c9dd4bcfecbb25b71de8aa26a5661f2bf60a057af68ad719a73485b6R43-R58 (all of them except the Docker container size).

The checks were split this way because of two technical limitations:

Doing everything in the De-Val validator would require downloading the miner model twice (here and then again in the Compute Horde).

Doing everything in the Compute Horde is not possible because it does not allow network connections.

slawomir-gorawski-reef marked this pull request as ready for review February 27, 2025 13:23

deval-core reviewed Mar 4, 2025

Add option to use Compute Horde

dcc480f

slawomir-gorawski-reef force-pushed the compute-horde branch from 6e8da07 to dcc480f March 5, 2025 14:27

slawomir-gorawski-reef changed the base branch from main to v1.5.0_staging March 5, 2025 14:27

ap-choji merged commit 1b18f4a into deval-core:v1.5.0_staging Mar 5, 2025

ap-choji merged 1 commit into deval-core:v1.5.0_staging from backend-developers-ltd:compute-horde
