Allow environment variables in connection configs#943
Conversation
| #?(:clj (or (System/getenv v) | ||
| (throw (ex-info (str "Environment variable " v " unset.") | ||
| {:status 400 :error :db/missing-env-var}))) | ||
| :cljs nil) ; TODO: Support environment variable overrides in cljs |
There was a problem hiding this comment.
I think we should throw in this case rather than emit nil just to be on the safe side.
dpetran
left a comment
There was a problem hiding this comment.
Nice work! When this issue came up I figured it would be really tough to specify env vars for a configuration graph, but the @type "envVar" approach is really elegant.
♻️
yeah, implementing this the way i was trying to do it at first was really tough. the old solution was far less flexible both for the user and the implementation, and much less powerful. thankfully, something higher priority came up (vector search) so i laid this down for a bit to work on that. that gave me some unintentional hammock time on this problem that allowed the back of my brain to marinate on it for a while. when the priority for this bumped back up this idea came to me pretty quickly after i started working on it again. i think it's better than my previous solution in every way. |
|
Suggestion on syntax. I think instead of these holding a custom data type, they should hold instead a node... this IMO aligns much better with RDF So the node would simply look like: Which you can image, actually looks more like below once expanded/inferred... but that step doesn't need to happen... just better illustrates what the node would really look like 'virtually': So to rewrite your example above, it would instead look like below. Where the value for e.g. {
"@context": {
"@base": "https://ns.flur.ee/dev/config/",
"@vocab": "https://ns.flur.ee/system#"
},
"@id": "testSystem",
"@graph": [
...
{
"@id": "testConnection",
"@type": "Connection",
"parallelism": {
"envVar": "FLUREE_CONNECTION_PARALLELISM",
},
"cacheMaxMb": {
"envVar": "FLUREE_CACHE_MAX_MB"
}
}
]
} |
|
🤔
|
|
I'm curious if you could leverage integrant to do the resolution |
|
Any thoughts about ability to specifying default values? |
|
Also curious about supporting java properties in addition to env vars? The reason is that I can use System/setProperty in the repl and define values in deps.edn via jvmOpts in the profile. Env vars are readonly and I'm not sure of an easy way to set env vars when starting the repl form vscode. The best workaround I can think of to do this is sourcing an env file in bash and starting repl using command line and then attaching to vscode. Ability to use properties would be more convenient in repl and easier to write unit tests, if env and property resolution code paths are very similar than unit tests for properties could cover most of the env resolution as well. |
|
Also perhaps not directly related to this PR, but I'm trying to understand how profiles work |
|
I also find the parsed integrant config map somewhat difficult to read with the angle bracket substitutions ... |
Integrant would only work for the resolution if the configuration value being initialized/loaded was a subject node with an explicit Brian is proposing that we use an implicit subject node to represent environment variables instead of an |
Yes, that would be nice. I can't immediately think of a nice way to represent this using the I think it's another point for Brian's suggestion. We could do something like this if the env vars are represented as implicit subjects instead: {
"@context": {
"@base": "https://ns.flur.ee/dev/config/",
"@vocab": "https://ns.flur.ee/system#"
},
"@id": "testSystem",
"@graph": [
...
{
"@id": "testConnection",
"@type": "Connection",
"parallelism": {
"envVar": "FLUREE_CONNECTION_PARALLELISM",
"defaultVal": 32
},
"cacheMaxMb": {
"envVar": "FLUREE_CACHE_MAX_MB",
"defaultVal": 512
}
}
]
} |
For this particular use case, I think it would be best to just use a different "development" config file with explicit values that don't use environment variables or system properties at all. The config itself is just data, so you'd be able to change the explicit scalars in the config maps before passing it off to starting the system. That will always be simpler than relying on environment variables or system properties. In general though, I think we should eventually support Java properties because it's another way people will try to manage system configuration in production. We could and should support Java properties the same way we support env vars because there are lots of parallels between the two. I think either an Another option could be a modified subject node approach similar to Brian's suggestion, but we would think of the implied/inferred type of the subject node to be {
"@context": {
"@base": "https://ns.flur.ee/dev/config/",
"@vocab": "https://ns.flur.ee/system#"
},
"@id": "testSystem",
"@graph": [
...
{
"@id": "testConnection",
"@type": "Connection",
"parallelism": {
"envVar": "FLUREE_CONNECTION_PARALLELISM",
"javaProp": "connectionParallelism",
"defaultVal": 32
},
"cacheMaxMb": {
"envVar": "FLUREE_CACHE_MAX_MB",
"javaProp": "cachMaxMb",
"defaultVal": 512
}
}
]
} |
profiles used to work in the old configuration system, but they're as yet unimplemented here in the new JSON-LD based configuration. That's why you see profile arguments scattered throughout the code unused. They are meant to allow users to use the same config in different environments. Say in development, staging, and production. You would define a base config, and then you'd add a profiles map where the key was the environment and value was overrides that would be merged on top of the base config when you started in that environment. Then you would do When it's (re)implemented, a config map with profiles would look something like this: {
"@context": {
"@base": "https://ns.flur.ee/config/main/",
"@vocab": "https://ns.flur.ee/system#",
"profiles": {
"@container": [ "@graph", "@index" ]
}
},
"@id": "standaloneServer",
"@graph": [
{
"@id": "localDiskStorage",
"@type": "Storage",
"filePath": "/opt/fluree-server/data"
},
{
"@id": "connection",
"@type": "Connection",
"parallelism": 4,
"cacheMaxMb": 1000,
"commitStorage": {
"@id": "localDiskStorage"
},
"indexStorage": {
"@id": "localDiskStorage"
},
"primaryPublisher": {
"@type": "Publisher",
"storage": {
"@id": "localDiskStorage"
}
}
},
{
"@id": "consensus",
"@type": "Consensus",
"consensusProtocol": "standalone",
"maxPendingTxns": 512,
"connection": {
"@id": "connection"
}
},
{
"@id": "http",
"@type": "API",
"httpPort": 8090,
"maxTxnWaitMs": 120000
}
],
"profiles": {
"dev": [
{
"@id": "localDiskStorage",
"filePath": "dev/data"
},
{
"@id": "connection",
"cacheMaxMb": 200
},
{
"@id": "consensus",
"maxPendingTxns": 16
}
],
"staging": [
{
"@id": "connection",
"cacheMaxMb": 512
},
{
"@id": "consensus",
"maxPendingTxns": 64
}
]
}
} |
No. Integrant initializes components identified by keyword keys specifically because it uses Clojure's built in keyword hierarchy along with the
This is what the
Here are the official rules for what can appear in a Clojure symbol. The rules for keywords are the same as those for symbols, with some additional rules around colons. The reader itself is much more permissive in what you can get away with (now) because there is no validation when creating keywords/symbols, but we shouldn't rely on keywords that explicitly break the documented rules because that could cause subtle bugs later on. There is also the possibility that the reader could be made more stringent in the future and validate it's input, which would mean that our code would break if we relied on undocumented behavior. Furthermore, slashes, colons, and dots have special meaning in a keyword which are not intended in these config symbols, which is why I also decided to escape those as well. I opted for this bit of complexity to allow us to use Integrant for initialization. It's a feature rich library for initializing and halting complex component hierarchies, which is exactly what we need here. There needed to be some massaging to transform the JSON-LD config format into an integrant config map in exchange for being able to use all the features integrant provides. The alternative would have been to reimplement what's in integrant ourselves. Also, the compiled integrant config map is an internal representation that users (or even developers) should not have to interact with in practice after everything settles down. |
|
I like the below, because it allows for something that has bothered me:
We've had issues where people have a misspelling in env var, and we happily assign a default which in some circumstances could be a significant issue. I like a default being explicit, and if not default and no envVar, I think we should throw/exit - as it probably is an environment variable not coming through properly. |
|
Oh I really like that @bplatz, sensible or at least explicit defaults can make a big difference for usability. |
|
@fluree/core I've re-worked this based on the discussion above. Instead of using a value node to encode environment variables, I went with an approach inspired by @bplatz's suggestion for using subject nodes with specific properties and, informed by @kurtharriger's questions about default values and java system properties, I've included properties so users can reference environment variables, java system properties, and default values in the same node. I've set up a hierarchy for these. First, we check the environment variable if that property is specified, then, if the environment variable isn't set or the env var property isn't specified, we check if the java system property property is specified. If that property isn't set ore the java property property isn't specified, we fall back to the default value. if that isn't set, then we throw an exception. Here is an example json-ld config using this new scheme: {
"@context": {
"@base": "https://ns.flur.ee/dev/config/",
"@vocab": "https://ns.flur.ee/system#"
},
"@id": "testSystem",
"@graph": [
...
{
"@id": "testConnection",
"@type": "Connection",
"parallelism": {
"envVar": "FLUREE_CONNECTION_PARALLELISM",
"javaProp": "connectionParallelism",
"defaultVal": 32
},
"cacheMaxMb": {
"envVar": "FLUREE_CACHE_MAX_MB",
"javaProp": "cachMaxMb",
"defaultVal": 512
}
}
]
}EDIT: |
|
I would suggest that if both env and java system property is specified the system property should have priority. For 2 reasons: |
I think you bring up a good point about the scope arguments, so I'll switch the priority of system properties and environment variables. As far as the tests go however, we would again just use a different config that wouldn't reference any environment variables or system properties for tests and development. Env vars and java properties really only make sense for production and production like deployments where some configuration values may be sensitive so they should be stored separately from the configuration and executable of the system and injected outside of band at startup. I don't think any of our credentials should be sensitive for development or test environments, so we should use specific configs that contain all the information the development or test systems need to run. |
This patch allows a user to specify environment variables in json-ld connection configs. The config system will then load the actual config values from the specified environment variable. Users specify the environment variable by setting the
"@type"of a value map to "https://ns.flur.ee/system#envVar", as in the following example{ "@context": { "@base": "https://ns.flur.ee/dev/config/", "@vocab": "https://ns.flur.ee/system#" }, "@id": "testSystem", "@graph": [ ... { "@id": "testConnection", "@type": "Connection", "parallelism": { "@value": "FLUREE_CONNECTION_PARALLELISM", "@type": "envVar" }, "cacheMaxMb": { "@value": "FLUREE_CACHE_MAX_MB", "@type": "envVar" } } ] }