Add option to use Compute Horde #68

Merged: ap-choji merged 1 commit into deval-core:v1.5.0_staging from backend-developers-ltd:compute-horde on Mar 5, 2025
Conversation

@slawomir-gorawski-reef slawomir-gorawski-reef commented Feb 27, 2025

Adds a --neuron.use_compute_horde flag to the validator, which makes it perform computations on the Compute Horde instead of locally.

Warning

Compute Horde is not yet fully production-ready; don't set this flag on real validators.

Usage

Requires some new COMPUTE_HORDE_ settings in .env:

COMPUTE_HORDE_VALIDATOR_HOTKEY=<your hotkey>
# optional, defaults to production
# COMPUTE_HORDE_FACILITATOR_URL=
# optional, defaults to LLM A6000
# COMPUTE_HORDE_EXECUTOR_CLASS=
# This image is built from this repo; to use your own, run docker build . -t <your image> and set it here
COMPUTE_HORDE_JOB_DOCKER_IMAGE=backenddevelopersltd/slawek-test:v0-latest
COMPUTE_HORDE_JOB_NAMESPACE=SN15.0
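
For illustration, settings like these could be read in Python along the following lines. This is a minimal sketch, not the code from this PR: `ComputeHordeSettings` and `from_env` are hypothetical names, treating the three uncommented variables as required is an assumption, and the two optional values are left as `None` so that whatever consumes them can apply its own defaults.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

PREFIX = "COMPUTE_HORDE_"


@dataclass(frozen=True)
class ComputeHordeSettings:
    """Hypothetical container for the COMPUTE_HORDE_ variables listed above."""

    validator_hotkey: str
    job_docker_image: str
    job_namespace: str
    facilitator_url: Optional[str] = None  # None means "use the default"
    executor_class: Optional[str] = None   # None means "use the default"

    @classmethod
    def from_env(cls, env: Mapping[str, str]) -> "ComputeHordeSettings":
        def required(name: str) -> str:
            # Assumption: a missing or empty required variable is an error.
            value = env.get(PREFIX + name, "").strip()
            if not value:
                raise ValueError(f"{PREFIX}{name} must be set in .env")
            return value

        def optional(name: str) -> Optional[str]:
            # Empty or absent optional variables collapse to None.
            return env.get(PREFIX + name, "").strip() or None

        return cls(
            validator_hotkey=required("VALIDATOR_HOTKEY"),
            job_docker_image=required("JOB_DOCKER_IMAGE"),
            job_namespace=required("JOB_NAMESPACE"),
            facilitator_url=optional("FACILITATOR_URL"),
            executor_class=optional("EXECUTOR_CLASS"),
        )
```

Calling `ComputeHordeSettings.from_env(os.environ)` after the .env file has been loaded would fail early with a clear message if a required variable is missing.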

To run the validator using Compute Horde, register as a validator on SN15 (see the README for validators) and run:

poetry run python neurons/validator.py \
    --netuid 15 \
    --subtensor.network finney \
    --wallet.name default \
    --wallet.hotkey default \
    --logging.debug \
    --logging.trace \
    --axon.port 3000 \
    --neuron.model_ids 'gpt-4o,gpt-4o-mini,mistral-7b,Claude-3.5,command-r-plus' \
    --neuron.axon_off \
    --neuron.num_task_examples=1 \
    --neuron.disable_set_weights \
    --neuron.use_compute_horde

You can also use pm2 as recommended.

To run a single job (one miner model) on the Compute Horde without starting a validator loop, use the script below. This doesn't require registration and acts as a "sanity check":

poetry run python scripts/compute_horde_e2e_test.py --hf-id=AltEinstein/emc9 --hotkey=5G9BymEWsPac2CXZsar9DpScZAjacr69SgPRNpSDb1fPza8M --coldkey=5H4MtMvFdK2UKdwXMwpSc6h95K7MsK5n6BttYw7Gen93Ck4e

(replace the flag values with any model you want to validate)

Implementation

The flow looks roughly like this:

  1. A Docker image is built from this codebase
  2. The validator loop runs as usual
  3. When a miner model is to be validated, instead of validating it locally, the validator sends a Job to the Compute Horde with the Docker image from step 1 and the miner model as input
  4. Validation is performed on the Compute Horde (it runs the same code as it would locally, but through a different entrypoint, neurons/compute_horde_entrypoint.py)
  5. Results are returned to the validator loop (step 2) and processed further as usual (ranking, setting weights)
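
The dispatch in steps 3 to 5 can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `HordeJob`, `validate_miner_model`, `run_locally` and `run_on_horde` are hypothetical names, and the real job submission goes through compute-horde-sdk, whose API is not shown here.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict

Scores = Dict[str, float]


@dataclass(frozen=True)
class HordeJob:
    """Hypothetical job description: the image from step 1 plus the miner model."""

    docker_image: str
    miner_model_id: str


async def validate_miner_model(
    miner_model_id: str,
    *,
    use_compute_horde: bool,
    docker_image: str,
    run_locally: Callable[[str], Awaitable[Scores]],
    run_on_horde: Callable[[HordeJob], Awaitable[Scores]],
) -> Scores:
    """Validate one miner model either locally or as a Compute Horde job."""
    if use_compute_horde:
        # Step 3: package the image and the model into a job and send it off.
        job = HordeJob(docker_image=docker_image, miner_model_id=miner_model_id)
        # Steps 4 and 5: the job runs remotely and its results come back here.
        return await run_on_horde(job)
    # Flag not set: validate on this machine, as before.
    return await run_locally(miner_model_id)
```

Either way the caller receives the same kind of result, so ranking and weight setting do not need to know where validation ran.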

TODO

  • Install compute-horde-sdk from PyPI rather than git

@slawomir-gorawski-reef slawomir-gorawski-reef marked this pull request as ready for review February 27, 2025 13:23
Comment thread deval/contest.py
return False

- if miner_state.chain_model_hash != model_hash:
+ if model_hash is not None and miner_state.chain_model_hash != model_hash:

Owner

These changes seem unnecessary? They reduce the number of code lines but obfuscate the log (i.e., we can no longer tell whether a value is missing or mismatched).

@slawomir-gorawski-reef slawomir-gorawski-reef Mar 4, 2025

Author

I did it this way so that some of the checks can run before we have the model hash or miner coldkey from Docker (to avoid downloading the model both here and later in Compute Horde, as described in my other comment). I changed the caller code to account for that: https://github.com/deval-core/De-Val/pull/68/files/6e8da07badffeac501b426c2f19285ff18af1f5d#diff-274d3bc59fd308b41d1dcd439b1385875eb9dbf11c1dfe915c3f596c3907cb15R185-R192

If you'd like it done a different way, I'll see what I can do.
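
One way to read the `model_hash is not None` guard discussed in this thread, while keeping the "missing" and "mismatch" cases distinguishable in the log, is sketched below. This is an assumption-laden illustration rather than the code in deval/contest.py: `hash_check_passes` is a hypothetical helper and the log messages are invented.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def hash_check_passes(chain_model_hash: str, model_hash: Optional[str]) -> bool:
    """Return False only when a hash is available and disagrees with the chain."""
    if model_hash is None:
        # The hash is not known yet (for example, the model has not been
        # downloaded), so the check is deferred instead of failed.
        logger.debug("model hash not available yet, deferring hash check")
        return True
    if chain_model_hash != model_hash:
        # A hash is available and it differs: this is a real mismatch.
        logger.warning(
            "model hash mismatch: chain=%s actual=%s", chain_model_hash, model_hash
        )
        return False
    return True
```

Logging the two branches separately means a deferred check and a genuine mismatch leave different traces.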

Comment thread deval/validator.py

async def run_epoch_on_compute_horde(self, miner_state: ModelState) -> ModelState:
    # Local validation that does not require Docker container.
    is_valid = self.contest.validate_model(miner_state, None, None, 0, constants.max_model_size_gbs + 2)

Owner

This seems like it bypasses many of the validation checks we put in place to prevent cheating? Why would we need a separate validation step?

Author

Yes, this bypasses some of the checks that used to run here, but they run later in the Compute Horde job: https://github.com/deval-core/De-Val/pull/68/files#diff-2c0eaf30c9dd4bcfecbb25b71de8aa26a5661f2bf60a057af68ad719a73485b6R43-R58 (all of them except the Docker container size check)

The checks are split this way because of two technical limitations:

  • doing everything in the deval validator would require downloading the miner model twice (here and then again in the Compute Horde)
  • doing everything in the Compute Horde is not possible because it does not allow network connections
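
The constraint described above amounts to assigning each check to the place that has what the check needs. A rough sketch follows, with hypothetical names throughout (`Needs`, `Check`, `split_checks`, and the example check names are not from the codebase):

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Iterable, List, Tuple


class Needs(Enum):
    NETWORK = auto()      # only available in the validator process
    MODEL_FILES = auto()  # only available inside the Compute Horde job


@dataclass(frozen=True)
class Check:
    name: str
    needs: Needs


def split_checks(checks: Iterable[Check]) -> Tuple[List[Check], List[Check]]:
    """Partition checks into (run on the validator, run inside the job)."""
    on_validator: List[Check] = []
    in_job: List[Check] = []
    for check in checks:
        if check.needs is Needs.NETWORK:
            on_validator.append(check)
        else:
            in_job.append(check)
    return on_validator, in_job


# Example check names are placeholders for illustration only.
EXAMPLE_CHECKS = [
    Check("lookup_requiring_network", Needs.NETWORK),
    Check("model_hash_matches_chain", Needs.MODEL_FILES),
    Check("coldkey_matches_chain", Needs.MODEL_FILES),
]
```

With this split, the model is downloaded only once (inside the job), and nothing inside the job needs a network connection.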

@slawomir-gorawski-reef slawomir-gorawski-reef changed the base branch from main to v1.5.0_staging March 5, 2025 14:27
@ap-choji ap-choji merged commit 1b18f4a into deval-core:v1.5.0_staging Mar 5, 2025
3 participants