G-API: Adding a Python script to download onnx models#831
Conversation
Legal/copyright information is required about the used image. |
Until downloaded these LFS files EXIST and store lfs metadata. Git repository itself should go into dedicated "download_cache" directory. What is about repository commit revision? |
| # and "lfs" folder will exist during download | ||
| lfs_cache_path = CACHE_DIR + '.git/lfs' | ||
| if os.path.exists(lfs_cache_path): | ||
| rmdir_with_data(lfs_cache_path) |
There was a problem hiding this comment.
@alalek, it is questionable step. Cache directory gets some data after calling git lfs pull. I remove this folder and don't know. Should cache directory have constant size (180 MB) or should contain received data (500 mb)? This (without deleting lfs folder) avoids downloading models after deleting onnx_models folder.
|
@smirnov-alexey, can you review this? |
OrestChura
left a comment
There was a problem hiding this comment.
Some comments, most are just questions to double-check
OrestChura
left a comment
There was a problem hiding this comment.
Some comment cavils last, but I'm all good with the script. Thanks!
|
@alalek, does script have correct behavior (check description)? If yes, can we merge it? |
|
@alalek can we merge this thing? I believe it is required for a more correct ONNX models coverage we do now. |
| def clear_pulled_files(): | ||
| # "lfs" directory contains some data | ||
| # Remove this folder then "onnx_models_cache" will have standard size - 180 MB | ||
| # If we remove "onnx_models" folder then models will be downloaded again | ||
| # and "lfs" folder will exist during download | ||
| lfs_cache_path = CACHE_DIR + '.git/lfs' | ||
| if os.path.exists(lfs_cache_path): | ||
| rmdir_with_data(lfs_cache_path) |
There was a problem hiding this comment.
Do we really need to clean '.git' internals? Does it break anything?
There was a problem hiding this comment.
No, but without cleaning models won't be downloaded again and will be moved from cache. And for models update you should remove cache directory.
It is main question about behavior. Do we need update models from GH when we remove models directory (NOT cache)?
There was a problem hiding this comment.
Don't clear the cache now.
| os.system('git clone --recursive https://github.com/onnx/models.git onnx_models_cache') | ||
| else: |
There was a problem hiding this comment.
git clone
Need to fetch+checkout the latest commits from the upstream repository if folder already exists.
There was a problem hiding this comment.
Added git pull of master branch.
|
|
||
| CUR_DIR = os.getcwd() | ||
| # This directory contains result of "git clone" | ||
| CACHE_DIR = CUR_DIR + '/onnx_models_cache/' |
There was a problem hiding this comment.
'/onnx_models_cache/'
Please use '/.cache/onnx_models/'
8287c09 to
cf77691
Compare
cf77691 to
4e325ce
Compare
|
@alalek, Could this be merged if all comments are applied correctly? |
Added python script which downloads models from https://github.com/onnx/models
Behavior (UPDATED):
onnx_models_cacheandonnx_models;onnx_models_cachecontains result of git clone (small files);onnx_modelscontains large ONNX models which will be used for something;onnx_modelsfolder and skip download if model already exists and pass sha check;onnx_modelsand call script again, then models will be downloaded again;onnx_models_cacheand removes lfs folder fromonnx_models_cachefor that;failedModels.Also you can call
download_onnx_models.pywithmodel_name/print,model_namewill be downloaded in this case;Added new directories: