Any ideas for scaling? #53
|
Hi @mitom That's a pretty good question, and I must admit that so far I haven't thought about this issue.
This is exactly what I was thinking. Gaufrette is quite helpful considering this use case.
Well, even if it is ineffective, it sounds like a reasonable way to go, assuming that Gaufrette is able to move files directly between two filesystems. Actually, I'm pretty sure someone once told me that this is not the case. If it isn't, the chunk directory must be on the same connected filesystem as the final place for uploaded files. Another solution - even if it seems more like a hack to me - would be to maintain a list of chunks on some kind of shared service, along with their dedicated server (and of course offset and other metadata). This could be a Redis instance, memcached or the like. I'm not a big fan of this because it sounds like it would be pretty prone to failure. I'll take a look at Gaufrette's source code one of these days. But maybe there are other possible solutions? I'm open to anything. |
If I remember correctly it does not move files, only reads and writes them, so moving things with Gaufrette carries a bit of overhead. I'm not certain about this, though; I'll also take a look at the source tomorrow to see if there are other options. On the other hand, we could use the same method for writing the chunks as you used for uploading them: simply read the current chunk and append it to the end of the already-assembled chunks-file (we should agree on naming these things for clarity...). The only problem would be if someone used an uploader that uploads multiple parts at once - I have no idea for this scenario. I think assembling all the chunks at once already makes this an inefficient approach, but still... I think introducing a cache to store the location of the chunks adds a lot of dependencies - including having to maintain the database itself - so I would agree to avoid that if there are other ways. At the moment I have no other ideas, but I'll give it another thought tomorrow. |
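The append-as-you-go idea discussed above can be sketched with plain PHP streams. This is only an illustrative sketch, not the bundle's actual code; all file names and the directory layout are made up:

```php
<?php
// Sketch: instead of assembling every chunk at the end of the upload,
// append each incoming chunk to a single assembled file as it arrives.

function appendChunk(string $assembled, string $chunk): void
{
    $out = fopen($assembled, 'ab');   // append mode creates the file if missing
    $in  = fopen($chunk, 'rb');
    stream_copy_to_stream($in, $out); // streamed copy keeps big chunks off the heap
    fclose($in);
    fclose($out);
}

// Simulate three sequential chunk uploads.
$dir = sys_get_temp_dir();
@unlink("$dir/assembled");
foreach (['foo', 'bar', 'baz'] as $i => $content) {
    file_put_contents("$dir/chunk_$i", $content);
    appendChunk("$dir/assembled", "$dir/chunk_$i");
}

echo file_get_contents("$dir/assembled"), PHP_EOL; // foobarbaz
```

Note the caveat raised above: with a Gaufrette backend that only supports read/write (no streaming append), each "append" would mean reading and rewriting the whole file, which is exactly the overhead being discussed.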
This one I can already answer for myself: only the Local backend implements StreamFactory; otherwise the InMemory adapter is used, which would mean moving the file back and forth and into memory for every chunk. I guess that wouldn't work with bigger files. On the other hand, if it were a The above is obviously the most optimal case; if someone would store their files on some other backend, the So, all in all, if someone stored their files on e.g. S3, this would be quite the overhead for them I guess, but I still see no way around it other than solving it at the load-balancing level, which I think could be risky. Still, I think you'll agree this behaviour should be configurable; if they disable it and go another way, the changes won't affect them. Another option that came to my mind, which might fall out of the scope of this bundle and would only improve things for those with other than Let me know your thoughts on these, or if you have a better idea. |
|
Any remarks on my propositions? I'd be happy to help you out (though I am quite busy these days). Let me know your opinion, and then I'll try to find some time for it and make a PR on the Gaufrette tmp part. The other one is less important to me, but if I get more free time I'd be happy to work on that as well. |
|
Hi @mitom I'm sorry it took so long, but I'm unfortunately quite busy too. Apparently a load balancer can be configured to use sticky sessions. This would be the simplest and, unfortunately, the worst solution. You mentioned that in your initial comment:
I think you correctly summarized the problem (as well as the pitfalls) of using Gaufrette for the temporary chunk directory. There is, however, a limitation you already mentioned: using Gaufrette with two different storage layers (one for the chunk directory, one for the final storage) is a terribly bad solution, as it would eat up way too much memory and time. I'm not sure if this should be prevented or just mentioned in the documentation, because I think someone should be free to risk the consequences.
Is this really possible? I guess the only thing we could check is whether the provided services match. That should be enough, however. :)
Recently I came across an idea regarding this issue. Someone mentioned that uploading the final file to a backend like S3 should take place asynchronously; otherwise a client could face a timeout after having uploaded the last chunk (after which the actual move to the final destination takes place). How about introducing a configurable option for whether fully uploaded files should be moved asynchronously, performed by a command (run as a cronjob or the like)? This way, a user could still choose to upload the chunks to a Local backend and move them to his S3/whatever afterwards. The downside is that the user won't be able to return a file path to the frontend to immediately address the file. Actually, this option is something that does not fully belong to this issue, which is why I made it a separate point in the list of the Release-1.0 pull request. To summarize the above:
Is this more or less what you meant? Thanks again for putting such effort into this! |
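The asynchronous move proposed above could be sketched roughly as follows. This is a simplified illustration with made-up directory names, not the bundle's actual implementation; a local directory stands in for the remote backend:

```php
<?php
// Hypothetical cron-driven mover: pick up fully assembled files from a
// "finished" directory and push them to the final storage. A real S3
// mover would stream the file up and delete the local copy afterwards.

function moveFinishedUploads(string $finishedDir, string $finalDir): int
{
    $moved = 0;
    foreach (glob($finishedDir . '/*') as $file) {
        if (!is_file($file)) {
            continue;
        }
        // rename() is atomic on the same filesystem.
        rename($file, $finalDir . '/' . basename($file));
        $moved++;
    }
    return $moved;
}

// Demonstrate with two temporary directories.
$finished = sys_get_temp_dir() . '/finished_demo';
$final    = sys_get_temp_dir() . '/final_demo';
@mkdir($finished);
@mkdir($final);
file_put_contents($finished . '/upload.jpg', 'data');

echo moveFinishedUploads($finished, $final), PHP_EOL;                     // 1
echo file_exists($final . '/upload.jpg') ? 'moved' : 'missing', PHP_EOL;  // moved
```

Run periodically (e.g. from cron), this decouples the client's last-chunk request from the potentially slow upload to the remote backend, at the cost of the file not being immediately addressable.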
I think that was me in my previous comment, only I had a Resque job in mind. But you are right, this should be a separate issue ^^
You summed up my proposition correctly, I'll start working on this in a couple of days. |
Ha, sorry, I guess I missed the last paragraph. :) |
|
Definitely not a merge-ready PR; I just want to hear your opinion, @sheeep, on whether it's going in the right direction. edit: oh, and I forgot to say: regardless of all the possible disasters it can cause at the moment, and the quickfixes, it works. ^^ |
|
Whoa. @mitom Thanks for that PR! It seems awesome; just some little things I stumbled upon. But you have to tell me: what kind of sorcery is that? How did you manage to transform this issue into a pull request? =D I'll take a more thorough look at it tomorrow morning. |
|
This is an extremely sophisticated pull request, I'd love to merge it!
As you seem to know what you are doing when contributing to projects on GitHub: do you know if it is possible for me to push into this pull request? Or should I open a pull request on your branch for that?
As I said, I didn't mean for this to be merged yet, so please don't even think about it until I declare it ready from my side.
I asked GitHub's API to do it for me, as described here. I usually use curl, as the second answer says.
You should PR to my branch, and after I merge it, it will appear here. But if you don't mind waiting a bit, please don't start working on it until tomorrow; I intend to refactor this a bit more today and fix things up. Depending on what you intended to do, it might be hard to merge the two.
Sorry I didn't bother with the tests; it was late when I finished making it work. I was intending to write tests too (and will do so when I finish refactoring), I simply didn't have the time and energy for it. That's why I stated that it was not to be merged yet. I opened the PR so you could take a look at whether the direction is good. |
|
@sheeep I've finished refactoring, I think, but it is still not merge-ready. There are 4 things that remain to do:
In the end, this PR also includes several fixes for the Scrutinizer. Whatever you wanted to PR is now welcome, especially if you wanted to write tests :P |
|
Ha!! Look at that!! A passing version!! |
|
Wow, quite amazing work again. I just scanned it briefly and will have a deeper look at it tomorrow morning.
I never thought about this, but if Symfony 2.1 is not able to resolve
Ahh, writing test cases. My secret hobby =D To be honest, it's a good idea, because it forces me to get a deep understanding of the changes.
Aaaamazing! ^^ |
We can do that together, actually; honestly, I had more trouble understanding how the tests work than the functionality itself ^^ I'll try to write some docs tomorrow about the new configs and their effects to make it a bit easier for you, but I guess that's the trivial part. I probably also won't make big commits anymore, unless the orphanage is broken heavily, so things should be easy to merge. I would also welcome your insight on the changes, as there are quite a few mission-critical things in there, and I did these while taking a break from my job, so there's no guarantee that my logic is spot on. |
|
Hi @mitom A few questions arose.
Okay, I checked all your changes but haven't tested them yet. More feedback is yet to come. :) |
|
Good morning @sheeep
Yeah, it should be
I had trouble with the interfaces there... The truth is, the design is a bit faulty now, as Also, the storages should implement |
|
@sheeep I've written some docs; they won't be much help, but at least they are there. |
|
Okay @sheeep, I've also written tests and extended the docs too (it's raining here ☔). They went surprisingly easily; I didn't realise my changes were so compatible with the existing code. So, what this pull request does:
I think I'm done with what I wanted to do. Feedback, questions and suggestions/changes are still welcome. Please make sure you go through everything before you merge it, though. Lastly, I'd like to thank you for this bundle, and for your efforts on my suggestions and propositions. |
Must be a terrible weather. ^^
What do you think? Should we also drop general support for Symfony 2.1? See #53 (comment).
It's a fairly large PR. I need some time to look through it fully, but I'm optimistic that it will be merge-ready by Sunday. Thanks for all your contributions and effort! |
|
Hmm. Strange thing happening on my local machine. The Travis build of mitom@71d6118 seems to pass. However, locally I get 2 errors. Any idea what this could be? // edit: composer updated, latest commit pulled |
I removed the Travis test because it would fail all the time on one of the methods of UploadedFile (I can't recall which), so yes, I think it would make sense to raise the deps.
That is strange. I just tried it with a fresh checkout, and everything went smoothly. I can't tell what's wrong on your side, as Travis seems to be able to work with it too. What PHP version do you have? (I don't really think it matters, tbh.) What you could try is to add a sleep or exit or whatever before the for loop in one of those methods, dump out what it is going to loop through, and then check manually whether the files are in place:

```php
public function testUpload()
{
    var_dump($this->payloads);
    exit();

    for ($i = 0; $i < $this->numberOfPayloads; $i ++) {
        $this->orphanage->upload($this->payloads[$i], $i . 'notsogrumpyanymore.jpeg');
    }
}
```

edit: I just noticed that it says /private/chunks/somefile is missing. That is strange, as it should start with /chunks/somefile.

```php
// gaufrette needs the key relative to its root
$fileKey = str_replace($this->realDirectory, '', $file);
echo "original: $file - parsed -> $fileKey" . PHP_EOL;
$this->payloads[] = new GaufretteFile(new File($fileKey, $filesystem), $filesystem);
```

that should give you something similar to this: |
Documentation fixes.
Raised requirement of symfony/framework-bundle to >=2.2.
|
Hi @mitom Without having changed anything, the tests suddenly pass. No idea what that could have been, but it seems that my computer choked on something. I've tested your changes and have found some errors (I guess). It could easily be that I'm doing something wrong. First of all, I tried to use a Gaufrette filesystem for all the test cases. It will lead to an error. Take a look at https://github.com/sheeep/OneupUploaderBundle/commit/4e36e2b014200a131b84e67e6db13cab71aa7602 for the changes made. Changing the I also tried to use it in a minimal application. Given the following configuration, it throws a Gaufrette error after the last chunk was uploaded, because the file was

```php
$this->filesystem->rename($path.$target, $path.$name);
```

config.yml:

```yaml
knp_gaufrette:
    adapters:
        gallery:
            local:
                directory: %kernel.root_dir%/../web/uploads
                create: true
        chunks:
            local:
                directory: %kernel.root_dir%/../web/chunks
                create: true
    filesystems:
        gallery:
            adapter: gallery
        chunks:
            adapter: chunks

oneup_uploader:
    chunks:
        maxage: 86400
        storage:
            type: gaufrette
            filesystem: gaufrette.chunks_filesystem
    mappings:
        gallery:
            frontend: fineuploader
            storage:
                type: gaufrette
                filesystem: gaufrette.gallery_filesystem
            sync_buffer_size: 1M
```
|
I looked into it. You did something 'wrong' (only if you want to do it in the most effective way) by using different Gaufrette filesystems for the chunk and file storage. However, this is also behaviour that is expected to work, so in the end you did something good by trying it out. I could reproduce the error only on the second attempt to upload a file. The cause was that the file wasn't deleted from the chunk storage after it was uploaded. 47c9e32 takes care of that; hopefully it will work correctly now (or at least I can't reproduce the error anymore). |
|
About the other one: I noticed that in the configuration you defined the chunk storage as a Gaufrette storage and the file storage as a local directory. This is not supported (at least not by me), and SHOULD not be either. There is no valid use case (or tell me one) where you would need to upload the chunks to a remote directory and then download and store the whole file in a local one. This applies to all the other mappings too. I did try changing all of them, but that ended up in lots of failing validation tests; honestly, I haven't checked them. I know that some of them are because of mime types, which could be helped if the I personally think that those tests should rather be unit tests with mocked files, as in the current case you can either test a Gaufrette chunk storage or a non-Gaufrette one. If the assembling and validating tests were unit tests, then those functional tests could be way lighter and could be limited to testing only local files. Sorry to say, that's something I'm not willing to do; that would be test-writing hell :D I think you should either leave those alone or rewrite that part of the tests before merging this; if you PR me, I'll continue to merge them into this one. Then again, I'm terribly sorry for not working on that part, but I really, really hate writing tests :( |
Nice. I can confirm that it works now.
You're right. There is no valid use case. I just thought it would be the easiest way to test this feature, but never mind. I'll put this fact into the documentation for future users.
Nah. We'll leave it this way. :) Maybe if I'm facing times of major boredom in the near future, I'll give it a shot. |
For me it correctly ends up as |
|
Hi @mitom I tested your changes and it seems to work fine. Just pushed some small fixes to your branch (https://github.com/mitom/OneupUploaderBundle/pull/3).
What do you think? Is this merge-ready? Is there something left that I missed? |
Documentation / Test / Little Improvement
|
I added a check to eliminate the possibility of setting it up wrong, one less thing people can duck up :P
Well, I couldn't break it further, so if you couldn't either, and you agree with my conclusions and the theories on which the changes and limitations are based, then I'd say it is ready. Others also didn't report any problems with the proposed changes, so I think they either don't care or agree. I hope that if there is anything wrong with it, it will turn up while it's in a WIP branch. So, if you agree, I'd say it can be merged. |
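The kind of configuration sanity check mentioned here could look roughly like the following. This is a simplified, self-contained sketch under assumed storage type names; the actual check lives in the bundle's configuration handling and may differ:

```php
<?php
// Sketch of the check described above: chunks on a remote Gaufrette
// filesystem combined with a plain local file storage makes no sense
// (you would download everything just to assemble it locally), so
// reject that combination early. Not the bundle's actual code.

function assertStorageCombinationIsSane(string $chunkStorage, string $fileStorage): void
{
    if ($chunkStorage === 'gaufrette' && $fileStorage !== 'gaufrette') {
        throw new InvalidArgumentException(
            'Gaufrette chunk storage requires a Gaufrette file storage.'
        );
    }
}

// Local chunks with a Gaufrette final storage is a supported setup.
assertStorageCombinationIsSane('filesystem', 'gaufrette');
echo 'local chunks + gaufrette storage: ok', PHP_EOL;

// Remote chunks with a local final storage is rejected.
try {
    assertStorageCombinationIsSane('gaufrette', 'filesystem');
} catch (InvalidArgumentException $e) {
    echo 'rejected: ', $e->getMessage(), PHP_EOL;
}
```

Failing fast here means a misconfigured mapping surfaces at container build time instead of as a confusing Gaufrette error after the last chunk is uploaded.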
Correct, just a sec. |
This reverts commit 5e789d8.
Very well. Let the adventure begin. (Will have to manually merge.) |
Sorry ^^ |
|
That went easy! Yey, we made it! 🎉 |
I was wondering if you have any ideas about putting this behind a load balancer (given that we avoid trying to route requests from the same client to the same machine).
What I am wondering about is that chunked uploads would only work if the temporary place where the chunks ('chunks' since one of our previous discussions) are stored is a shared place.
But then again, it would mean the latest chunk is uploaded to the server, the remote file is read, the latest chunk is written to it, and it is saved to the shared place again so the next request can access it from a possibly different server, and then moved to its final place. Sounds pretty ineffective to me.
Do you have any ideas for how we could improve this, or am I missing something?