Conversation
@dmehra @demmer @dmajda @mstemm fixing the situation identified by issue #6; I'll update the documentation here in this PR once the initial review for the change goes through. I'm thinking that if we set the limit for the only existing write format to 100K right now, then in a follow-up PR we can add the ability to
(force-pushed 095a42e to 6bd1558)
@demmer here's an initial approach using the scheduler. Doing this on a 1s interval seems a little "harsh", so I'm open to better suggestions, but if we use a larger interval then the program would have to wait up to N additional seconds after finishing to actually complete and flush the data.
lib/adapters/file/write.js
It is also possible to use `this.limit = options.limit || 100000`, which is somewhat shorter.
We're missing a check that `limit` is non-negative.
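A minimal sketch of how both review points could be combined (the option name `limit` and the 100000 default come from the discussion; `parseLimit` itself is a hypothetical helper, not the adapter's actual code). Note that the shorter `options.limit || 100000` form would silently turn an explicit `limit: 0` into the default, which is one reason to compare against `undefined` instead:

```javascript
const DEFAULT_LIMIT = 100000;

function parseLimit(options) {
    // `options.limit || DEFAULT_LIMIT` would replace an explicit 0 with the
    // default, so test for undefined explicitly.
    const limit = options.limit === undefined ? DEFAULT_LIMIT : options.limit;
    if (typeof limit !== 'number' || limit < 0) {
        throw new Error('limit must be a non-negative number, got: ' + limit);
    }
    return limit;
}
```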
Nice! So no more OOMs? IIUC, we still read & parse the whole file...
+10k - I agree with you that this is the way to go.
@bkutil taking care of the various suggestions, all good catches. As for OOMing because an existing 1GB JSON file is already on disk, I'm not sure how we'd fix that at this point. Another simple limitation could be to fail to open files bigger than 200MB, i.e. impose some hard limit on the size of the file? @demmer @dmajda what do you think of this? I guess we'd also have to expose some way of overriding the limitation, but at least this way we could avoid blowing things up in a nasty way.
lib/adapters/file/write.js
It seems better to me to have `eof` trigger an immediate flush instead of just setting `isdone` and waiting.
This should be trivial to handle now since all the fs operations are async (as @bkutil pointed out, they should be changed to be async), but you'll need to make sure you don't race between the scheduler callback and the immediate flush from eof. I think it's easy to handle by setting `isdone` immediately after calling flush so that you know a subsequent callback won't race.
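A minimal sketch of that guard (the names `isdone`, `flush`, and `eof` come from the comment; the `Writer` class and the synchronous flush are simplifications, since the real adapter's flush is async and driven by the scheduler):

```javascript
class Writer {
    constructor() {
        this.queue = [];
        this.isdone = false;
        this.flushed = [];   // stand-in for data synced to disk
    }
    flush() {
        // async in the real adapter; synchronous here for brevity
        this.flushed = this.flushed.concat(this.queue);
        this.queue = [];
    }
    onSchedulerTick() {
        // guard: if eof already flushed and finished, do nothing
        if (this.isdone) { return; }
        this.flush();
    }
    eof() {
        // flush immediately, then set isdone right away so a scheduler
        // callback arriving afterwards cannot race with the final flush
        this.flush();
        this.isdone = true;
    }
}
```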
As for the limit on the size of the file, I guess you'll need to have two options:
But I'll document that `maxSize` is the file size while `-limit` is the number of points in said file.
(force-pushed 6bd1558 to 5372fef)
lib/adapters/file/write.js
Looks like this won't ever call done() if there aren't any points buffered in the queue when the EOF arrives.
You should fix this by always entering the promise chain even if the queue is empty, and add a test case for it.
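The point above can be sketched as follows (all names here are illustrative, not the adapter's actual API): starting the chain from an already-resolved promise guarantees `done()` fires even when no points were buffered at EOF.

```javascript
function flushOnEof(queue, writeBatch, done) {
    // Start from a resolved promise so the chain (and therefore done())
    // runs even when queue.length === 0.
    return Promise.resolve()
        .then(() => (queue.length > 0 ? writeBatch(queue) : undefined))
        .then(() => done());
}
```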
(force-pushed 5372fef to 6c4f0ed)
@demmer @bkutil take another look at the current solution. I opted for not having a
Things to finish up:
(force-pushed a9643e7 to b99cbbd)
(force-pushed b99cbbd to 9f4ca85)
Overall this looks good to me now, since the above comments are just a few minor suggestions. The split also makes it easier to further develop features on the file adapter.
(force-pushed 9f4ca85 to 58dce8e)
OK, merging on green as I've taken care of all of @demmer's suggestions and I think this is ready for prime time.
(force-pushed 58dce8e to 3815d6b)
@dmehra quick review on documentation additions: https://github.com/juttle/juttle/pull/11/files#diff-8cca8547a0a58a3918b9eb089cdda8a6
Doc review comment: how about a rephrase so defaults go into the "Required?" column, and don't mention |
The fix for issue #6 is currently to set a write limit, so that we simply write out no more than 100K JSON points in the currently supported writing format of JSON. While fixing this issue I've also moved the writes to happen asynchronously, using the scheduler to schedule the syncing of data to and from disk. Future work could be to include streamable writing formats such as CSV or JSONL (jsonlines: http://jsonlines.org/).

Made all filesystem interactions async and cleaned up any outstanding issues with limit. With this, added a new parameter -limit to dictate the maximum number of points that can be handled when writing to a JSON file, and also added -maxFilesize, which dictates the maximum size in bytes of a JSON file that we can handle when writing.

In order to avoid running out of memory, decided on using a -flushFrequency option to specify how many points to wait before flushing, opting against time-based flushing since 1) the code was getting way more complicated managing the constant async callback flushes along with the final flush from eof(), and 2) it's odd to keep doing async flushes every N seconds when nothing is there to flush, with all the guards needed to avoid reading/writing when there is no data to actually flush.
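The count-based -flushFrequency behavior described above can be sketched like this (the `CountingWriter` class and `sink` callback are illustrative stand-ins for the adapter's internals; only the option name comes from the PR):

```javascript
class CountingWriter {
    constructor(flushFrequency, sink) {
        this.flushFrequency = flushFrequency;  // points to buffer before flushing
        this.sink = sink;                      // e.g. an async write-to-disk function
        this.buffer = [];
    }
    write(point) {
        this.buffer.push(point);
        // flush by point count, so nothing fires when no data is arriving
        if (this.buffer.length >= this.flushFrequency) {
            this.flush();
        }
    }
    flush() {
        if (this.buffer.length === 0) { return; }
        // splice(0) hands the whole batch to the sink and empties the buffer
        this.sink(this.buffer.splice(0));
    }
}
```

Because flushing is driven by arriving points rather than a timer, there is no periodic callback to guard against when the stream is idle, and the final `flush()` at EOF covers any partial batch.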
(force-pushed 3815d6b to 956e7be)
@dmehra fixed!
…iles Fix file adapter to handle large files
this PR reworks the file adapter so it's easier to add more features in
the future by splitting the read and write up into individual modules.
the fix for issue #6 is currently to set a write limit so that we
simply limit writing out no more than 100K JSON points in the currently
supported writing format of JSON.
future work could be to include streamable writing formats such as CSV,
or JSONL (jsonlines: http://jsonlines.org/)