Feature/sid migration task by dpetran · Pull Request #87 · fluree/server

dpetran · 2024-08-10T14:11:10Z

This introduces the concept of a task.

You can supply a config with a task key whose value is a map that identifies a task and any arguments it needs. The task is then dispatched on id with a connection and any arguments supplied. A task config will not run the http server or consensus, though that will need to be revisited once we add back support for raft consensus.

The sid migration takes a list of ledger aliases and migrates them one at a time.

There is an example task config in the resources directory.

You would run the sid migration by starting server like this:

<regular jar invocation start> -p task -c <path-to-task-config>

This introduces the concept of a task. You can supply a config with a task key whose value is a map that identifies a task and any arguments it needs. The task is then dispatched on id with a connection and any arguments supplied. Tasks do not allow the http server to be running, though once we support consensus again that decision will need to be revisited. There is an example task config in the resources directory. The sid migration takes a list of ledger aliases and migrates them one at a time.

Can override the skipping by supplying a "force" true argument in the task config.

dpetran · 2024-08-12T22:23:54Z

Depends on fluree/db#859

zonotope · 2024-08-14T12:27:45Z

src/fluree/server/config.clj

       read-file
-       (finalize profile))))
+       (cond->
+        (= profile :task) (task-config profile)


This overloads the concept of a profile. A profile is meant to override values in a pre-defined configuration, and a task is a wholly new configuration. Is there an issue with a user supplying a separate configuration file when they want to run a migration?

zonotope · 2024-08-14T12:32:57Z

resources/config-task.json

+  "task": {
+    "id": "migrate/sid",
+    "ledgers": ["ledger1", "ledger2"],
+    "force": false
+  }


I think this would be clearer if it was just:

Suggested change

"task": {

"id": "migrate/sid",

"ledgers": ["ledger1", "ledger2"],

"force": false

}

"sidMigration": {

"ledgers": ["ledger1", "ledger2"],

"force": false

}

I actually started with that design, but if we add more tasks in the future then we'll have a proliferation of config schemas, whereas by nesting it under the "task" key we only need to adapt that one schema. If you feel strongly about it I can change it back, but I thought this would lead to a more stable abstraction.

I don't think there's a difference, but I could be wrong. The way it is now, we have to check specific arguments for each task with a different schema as we add more tasks, and we dispatch on the "id" key to decide which schema to use. I don't think we can expect to know the precise arguments for each new task type before they are defined.

Instead, I'm proposing that we configure specific modules in each system in their own sub-map. For example, what if people want to run multiple tasks at once? What about multiple tasks on different sets of ledgers?

Unless I'm missing something, each new task will add a new schema. The only difference is whether that new schema is validated directly or if all of the "task" schemas are grouped into a multi schema dispatched on "id", but there will be at least one schema for each task either way.

These I think are the important parts of a task abstraction:

no need to add new integrant handlers

localized schema updates

defined order of execution

I like nesting the task config underneath the task keyword because it lets us reuse the integrant handler for every task.

I think both approaches are rather localized for schema updates, though if you want to define multiple tasks we'd either need a vector of task-maps in the style that I wrote or a vector of task-ids in the style you wrote. Or we could just elide that entirely and say you can only run one task at time and that will force sequential execution.

I do think we should try to get the abstraction right so users don't have to learn multiple incompatible approaches to running.

I think we should have a separate key for the migration, and any other commands like this we'll run because the configuration is clearer. "reusing the integrant handler" doesn't make any sense to me. Each migration, or "task" as you put it, will have an entry point. Either there is a case statement that we'll have to modify each time we need to add something, or we define a new method for a new key. There is no difference.

These "tasks" are independent components in the system, so they should get independent configurations, with independent initializers.

Should I just rip out the whole task abstraction then? I think it's useful but if you don't we can do it your way.

I took it out, and I see what you mean - the difference is dispatching on ig/init-key vs fluree.server.task/run-task, so they're basically the same. No task abstraction means slightly more config but a shallower call stack.

I think the task abstraction actually had more config. With tasks, you had a task config which included an id along side the config for that id. The config is flatter how it is now, and you just have the config for each task that lives in its own key; no extra task key, no extra id key, and no need for an extra dispatch in the config parser.

zonotope · 2024-08-14T12:37:00Z

src/fluree/server/task/migrate_sid.clj

+            [fluree.db.util.log :as log]
+            [malli.core :as m]))
+
+(def MigrateSidTask


I think we should do all validation up front, and not scatter malli throughout the codebase. This schema should live in the fluree.server.config namespace.

merged the sid migration branch

Also removed profile overloading.

zonotope · 2024-08-19T13:26:02Z

resources/config-task.json

@@ -0,0 +1,21 @@
+{


I think this file should have a different name now.

zonotope · 2024-08-19T13:35:15Z

src/fluree/server/config.clj


 (def coerce
-  (m/coercer ::config transform/string-transformer {:registry registry}))
+  (m/coercer [:or ::config ::sid-migration-config] transform/string-transformer {:registry registry}))


I think these configs would be more extensible going forward as a multi schema instead of ::or. The initial dispatch function could just be this:

(fn [scm] (if (contains? scm "sidMigration") ::sid-migration-config ::config))

We could modify that function (probably by replacing the if with a cond and adding more contains? checks) as we add more ways of running the system and their corresponding config types.

What does the multi-schema add that :or doesn't provide? I started doing this style of indirection before but I didn't want to invent a new name for regular runtime configs and the :or seemed to deliver all the functionality I needed without any additional fuss.

Functionality wise in this situation? Not much. I was more thinking about extensibility and clarifying intent, but I'm fine with it if you prefer it this way.

zonotope · 2024-08-19T13:49:25Z

src/fluree/server/system.clj

+(defmethod ig/expand-key ::config/sid-migration
+  [_ sid-migration]
+  {:fluree/sid-migration {:sid-migration sid-migration
+                          :conn (ig/ref :fluree/connection)}})


This is minor, but the resulting config map would be easier to deal with if it was flattened didn't have this extra ::sid-migragion key inside of it. I think this would be better:

Suggested change

(defmethod ig/expand-key ::config/sid-migration

[_ sid-migration]

{:fluree/sid-migration {:sid-migration sid-migration

:conn (ig/ref :fluree/connection)}})

(defmethod ig/expand-key ::config/sid-migration

[_ config]

(let [config* (assoc config :conn (ig/ref :fluree/connection))]

{:fluree/sid-migration config*}))

zonotope · 2024-08-19T13:50:31Z

src/fluree/server/system.clj


+(defmethod ig/init-key :fluree/sid-migration
+  [_ {:keys [conn sid-migration]}]
+  (let [{:keys [ledgers force]} sid-migration]


And this is why I think my previous suggestion 👆🏾 would be clearer

zonotope

🩰

dpetran added 4 commits August 10, 2024 08:43

add support for skipping already-migrated ledgers

ac5e3d5

Can override the skipping by supplying a "force" true argument in the task config.

fix formatting

3e535ad

update db dep

5276dd9

dpetran requested a review from a team August 12, 2024 22:24

zonotope reviewed Aug 14, 2024

View reviewed changes

dpetran added 4 commits August 14, 2024 09:54

update db dep to main

bc84c4b

merged the sid migration branch

consolidate config validation

ee75182

Also removed profile overloading.

dispatch migration from integrant instead of run-task

824882b

remove task abstraction

ce18cc8

zonotope reviewed Aug 19, 2024

View reviewed changes

dpetran added 2 commits August 19, 2024 13:22

rename example migration config file

253f893

flatten sid migration config

a684d29

zonotope approved these changes Aug 19, 2024

View reviewed changes

dpetran merged commit 39b9891 into main Aug 19, 2024

dpetran deleted the feature/sid-migration-task branch August 19, 2024 21:48

Conversation

dpetran commented Aug 10, 2024

Uh oh!

dpetran commented Aug 12, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

zonotope left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants