Conversation
This introduces the concept of a task. You can supply a config with a task key whose value is a map that identifies a task and any arguments it needs. The task is then dispatched on id with a connection and any arguments supplied. Tasks do not allow the http server to be running, though once we support consensus again that decision will need to be revisited. There is an example task config in the resources directory. The sid migration takes a list of ledger aliases and migrates them one at a time.
Can override the skipping by supplying a "force" true argument in the task config.
|
Depends on fluree/db#859 |
src/fluree/server/config.clj
Outdated
| read-file | ||
| (finalize profile)))) | ||
| (cond-> | ||
| (= profile :task) (task-config profile) |
There was a problem hiding this comment.
This overloads the concept of a profile. A profile is meant to override values in a pre-defined configuration, and a task is a wholly new configuration. Is there an issue with a user supplying a separate configuration file when they want to run a migration?
resources/config-task.json
Outdated
| "task": { | ||
| "id": "migrate/sid", | ||
| "ledgers": ["ledger1", "ledger2"], | ||
| "force": false | ||
| } |
There was a problem hiding this comment.
I think this would be clearer if it was just:
| "task": { | |
| "id": "migrate/sid", | |
| "ledgers": ["ledger1", "ledger2"], | |
| "force": false | |
| } | |
| "sidMigration": { | |
| "ledgers": ["ledger1", "ledger2"], | |
| "force": false | |
| } |
There was a problem hiding this comment.
I actually started with that design, but if we add more tasks in the future then we'll have a proliferation of config schemas, whereas by nesting it under the "task" key we only need to adapt that one schema. If you feel strongly about it I can change it back, but I thought this would lead to a more stable abstraction.
There was a problem hiding this comment.
I don't think there's a difference, but I could be wrong. The way it is now, we have to check specific arguments for each task with a different schema as we add more tasks, and we dispatch on the "id" key to decide which schema to use. I don't think we can expect to know the precise arguments for each new task type before they are defined.
Instead, I'm proposing that we configure specific modules in each system in their own sub-map. For example, what if people want to run multiple tasks at once? What about multiple tasks on different sets of ledgers?
Unless I'm missing something, each new task will add a new schema. The only difference is whether that new schema is validated directly or if all of the "task" schemas are grouped into a multi schema dispatched on "id", but there will be at least one schema for each task either way.
There was a problem hiding this comment.
These I think are the important parts of a task abstraction:
- no need to add new integrant handlers
- localized schema updates
- defined order of execution
I like nesting the task config underneath the task keyword because it lets us reuse the integrant handler for every task.
I think both approaches are rather localized for schema updates, though if you want to define multiple tasks we'd either need a vector of task-maps in the style that I wrote or a vector of task-ids in the style you wrote. Or we could just elide that entirely and say you can only run one task at time and that will force sequential execution.
I do think we should try to get the abstraction right so users don't have to learn multiple incompatible approaches to running.
There was a problem hiding this comment.
I think we should have a separate key for the migration, and any other commands like this we'll run because the configuration is clearer. "reusing the integrant handler" doesn't make any sense to me. Each migration, or "task" as you put it, will have an entry point. Either there is a case statement that we'll have to modify each time we need to add something, or we define a new method for a new key. There is no difference.
These "tasks" are independent components in the system, so they should get independent configurations, with independent initializers.
There was a problem hiding this comment.
Should I just rip out the whole task abstraction then? I think it's useful but if you don't we can do it your way.
There was a problem hiding this comment.
I took it out, and I see what you mean - the difference is dispatching on ig/init-key vs fluree.server.task/run-task, so they're basically the same. No task abstraction means slightly more config but a shallower call stack.
There was a problem hiding this comment.
I think the task abstraction actually had more config. With tasks, you had a task config which included an id along side the config for that id. The config is flatter how it is now, and you just have the config for each task that lives in its own key; no extra task key, no extra id key, and no need for an extra dispatch in the config parser.
| [fluree.db.util.log :as log] | ||
| [malli.core :as m])) | ||
|
|
||
| (def MigrateSidTask |
There was a problem hiding this comment.
I think we should do all validation up front, and not scatter malli throughout the codebase. This schema should live in the fluree.server.config namespace.
merged the sid migration branch
Also removed profile overloading.
| @@ -0,0 +1,21 @@ | |||
| { | |||
There was a problem hiding this comment.
I think this file should have a different name now.
|
|
||
| (def coerce | ||
| (m/coercer ::config transform/string-transformer {:registry registry})) | ||
| (m/coercer [:or ::config ::sid-migration-config] transform/string-transformer {:registry registry})) |
There was a problem hiding this comment.
I think these configs would be more extensible going forward as a multi schema instead of ::or. The initial dispatch function could just be this:
(fn [scm]
(if (contains? scm "sidMigration")
::sid-migration-config
::config))We could modify that function (probably by replacing the if with a cond and adding more contains? checks) as we add more ways of running the system and their corresponding config types.
There was a problem hiding this comment.
What does the multi-schema add that :or doesn't provide? I started doing this style of indirection before but I didn't want to invent a new name for regular runtime configs and the :or seemed to deliver all the functionality I needed without any additional fuss.
There was a problem hiding this comment.
Functionality wise in this situation? Not much. I was more thinking about extensibility and clarifying intent, but I'm fine with it if you prefer it this way.
src/fluree/server/system.clj
Outdated
| (defmethod ig/expand-key ::config/sid-migration | ||
| [_ sid-migration] | ||
| {:fluree/sid-migration {:sid-migration sid-migration | ||
| :conn (ig/ref :fluree/connection)}}) |
There was a problem hiding this comment.
This is minor, but the resulting config map would be easier to deal with if it was flattened didn't have this extra ::sid-migragion key inside of it. I think this would be better:
| (defmethod ig/expand-key ::config/sid-migration | |
| [_ sid-migration] | |
| {:fluree/sid-migration {:sid-migration sid-migration | |
| :conn (ig/ref :fluree/connection)}}) | |
| (defmethod ig/expand-key ::config/sid-migration | |
| [_ config] | |
| (let [config* (assoc config :conn (ig/ref :fluree/connection))] | |
| {:fluree/sid-migration config*})) |
src/fluree/server/system.clj
Outdated
|
|
||
| (defmethod ig/init-key :fluree/sid-migration | ||
| [_ {:keys [conn sid-migration]}] | ||
| (let [{:keys [ledgers force]} sid-migration] |
There was a problem hiding this comment.
And this is why I think my previous suggestion 👆🏾 would be clearer
This introduces the concept of a task.
You can supply a config with a task key whose value is a map that identifies a task and any arguments it needs. The task is then dispatched on id with a connection and any arguments supplied. A task config will not run the http server or consensus, though that will need to be revisited once we add back support for raft consensus.
The sid migration takes a list of ledger aliases and migrates them one at a time.
There is an example task config in the resources directory.
You would run the sid migration by starting server like this:
<regular jar invocation start> -p task -c <path-to-task-config>