Skip to content

Feature/sid migration task#87

Merged
dpetran merged 10 commits intomainfrom
feature/sid-migration-task
Aug 19, 2024
Merged

Feature/sid migration task#87
dpetran merged 10 commits intomainfrom
feature/sid-migration-task

Conversation

@dpetran
Copy link
Contributor

@dpetran dpetran commented Aug 10, 2024

This introduces the concept of a task.

You can supply a config with a task key whose value is a map that identifies a task and any arguments it needs. The task is then dispatched on id with a connection and any arguments supplied. A task config will not run the http server or consensus, though that will need to be revisited once we add back support for raft consensus.

The sid migration takes a list of ledger aliases and migrates them one at a time.

There is an example task config in the resources directory.

You would run the sid migration by starting server like this:

<regular jar invocation start> -p task -c <path-to-task-config>

This introduces the concept of a task. You can supply a config with a task key whose
value is a map that identifies a task and any arguments it needs. The task is then
dispatched on id with a connection and any arguments supplied. Tasks do not allow the
http server to be running, though once we support consensus again that decision will
need to be revisited. There is an example task config in the resources directory.

The sid migration takes a list of ledger aliases and migrates them one at a time.
Can override the skipping by supplying a "force" true argument in the task config.
@dpetran
Copy link
Contributor Author

dpetran commented Aug 12, 2024

Depends on fluree/db#859

@dpetran dpetran requested a review from a team August 12, 2024 22:24
read-file
(finalize profile))))
(cond->
(= profile :task) (task-config profile)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This overloads the concept of a profile. A profile is meant to override values in a pre-defined configuration, and a task is a wholly new configuration. Is there an issue with a user supplying a separate configuration file when they want to run a migration?

Comment on lines +17 to +21
"task": {
"id": "migrate/sid",
"ledgers": ["ledger1", "ledger2"],
"force": false
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be clearer if it was just:

Suggested change
"task": {
"id": "migrate/sid",
"ledgers": ["ledger1", "ledger2"],
"force": false
}
"sidMigration": {
"ledgers": ["ledger1", "ledger2"],
"force": false
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually started with that design, but if we add more tasks in the future then we'll have a proliferation of config schemas, whereas by nesting it under the "task" key we only need to adapt that one schema. If you feel strongly about it I can change it back, but I thought this would lead to a more stable abstraction.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's a difference, but I could be wrong. The way it is now, we have to check specific arguments for each task with a different schema as we add more tasks, and we dispatch on the "id" key to decide which schema to use. I don't think we can expect to know the precise arguments for each new task type before they are defined.

Instead, I'm proposing that we configure specific modules in each system in their own sub-map. For example, what if people want to run multiple tasks at once? What about multiple tasks on different sets of ledgers?

Unless I'm missing something, each new task will add a new schema. The only difference is whether that new schema is validated directly or if all of the "task" schemas are grouped into a multi schema dispatched on "id", but there will be at least one schema for each task either way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These I think are the important parts of a task abstraction:

  • no need to add new integrant handlers
  • localized schema updates
  • defined order of execution

I like nesting the task config underneath the task keyword because it lets us reuse the integrant handler for every task.

I think both approaches are rather localized for schema updates, though if you want to define multiple tasks we'd either need a vector of task-maps in the style that I wrote or a vector of task-ids in the style you wrote. Or we could just elide that entirely and say you can only run one task at time and that will force sequential execution.

I do think we should try to get the abstraction right so users don't have to learn multiple incompatible approaches to running.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have a separate key for the migration, and any other commands like this we'll run because the configuration is clearer. "reusing the integrant handler" doesn't make any sense to me. Each migration, or "task" as you put it, will have an entry point. Either there is a case statement that we'll have to modify each time we need to add something, or we define a new method for a new key. There is no difference.

These "tasks" are independent components in the system, so they should get independent configurations, with independent initializers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I just rip out the whole task abstraction then? I think it's useful but if you don't we can do it your way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took it out, and I see what you mean - the difference is dispatching on ig/init-key vs fluree.server.task/run-task, so they're basically the same. No task abstraction means slightly more config but a shallower call stack.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the task abstraction actually had more config. With tasks, you had a task config which included an id along side the config for that id. The config is flatter how it is now, and you just have the config for each task that lives in its own key; no extra task key, no extra id key, and no need for an extra dispatch in the config parser.

[fluree.db.util.log :as log]
[malli.core :as m]))

(def MigrateSidTask
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should do all validation up front, and not scatter malli throughout the codebase. This schema should live in the fluree.server.config namespace.

@@ -0,0 +1,21 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this file should have a different name now.


(def coerce
(m/coercer ::config transform/string-transformer {:registry registry}))
(m/coercer [:or ::config ::sid-migration-config] transform/string-transformer {:registry registry}))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these configs would be more extensible going forward as a multi schema instead of ::or. The initial dispatch function could just be this:

(fn [scm]
  (if (contains? scm "sidMigration")
    ::sid-migration-config
    ::config))

We could modify that function (probably by replacing the if with a cond and adding more contains? checks) as we add more ways of running the system and their corresponding config types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the multi-schema add that :or doesn't provide? I started doing this style of indirection before but I didn't want to invent a new name for regular runtime configs and the :or seemed to deliver all the functionality I needed without any additional fuss.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functionality wise in this situation? Not much. I was more thinking about extensibility and clarifying intent, but I'm fine with it if you prefer it this way.

Comment on lines +59 to +62
(defmethod ig/expand-key ::config/sid-migration
[_ sid-migration]
{:fluree/sid-migration {:sid-migration sid-migration
:conn (ig/ref :fluree/connection)}})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is minor, but the resulting config map would be easier to deal with if it was flattened didn't have this extra ::sid-migragion key inside of it. I think this would be better:

Suggested change
(defmethod ig/expand-key ::config/sid-migration
[_ sid-migration]
{:fluree/sid-migration {:sid-migration sid-migration
:conn (ig/ref :fluree/connection)}})
(defmethod ig/expand-key ::config/sid-migration
[_ config]
(let [config* (assoc config :conn (ig/ref :fluree/connection))]
{:fluree/sid-migration config*}))


(defmethod ig/init-key :fluree/sid-migration
[_ {:keys [conn sid-migration]}]
(let [{:keys [ledgers force]} sid-migration]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this is why I think my previous suggestion 👆🏾 would be clearer

Copy link
Contributor

@zonotope zonotope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩰

@dpetran dpetran merged commit 39b9891 into main Aug 19, 2024
@dpetran dpetran deleted the feature/sid-migration-task branch August 19, 2024 21:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants