SMIT: A Simple Modality Integration Tool
The current workflow leads to a certain amount of catastrophic forgetting: the base model, [abacaj/phi-2-super](https://huggingface.co/abacaj/phi-2-super), reaches an average of $62.13$ on the [open_llm_leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard), while the resulting model [Thytu/phi-2-audio-super](https://huggingface.co/Thytu/phi-2-audio-super) falls...
In order to allow users to better find and filter their runs, each run should be named after the models used. One option could be `{decoder_name}_{speech_encoder_name}` (e.g. `abacaj/phi-2-super_facebook/hubert-large-ls960-ft`). However...
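A minimal sketch of such a naming helper (hypothetical, not part of SMIT's current code; note the `/` in Hugging Face repo ids needs escaping if the run name is also used as a directory):

```python
def run_name(decoder_name: str, speech_encoder_name: str) -> str:
    """Build a filesystem-safe run name from the two model repo ids.

    Hugging Face ids such as "abacaj/phi-2-super" contain "/", which is
    replaced with "-" so the result can double as a directory name.
    """
    def safe(repo_id: str) -> str:
        return repo_id.replace("/", "-")

    return f"{safe(decoder_name)}_{safe(speech_encoder_name)}"


print(run_name("abacaj/phi-2-super", "facebook/hubert-large-ls960-ft"))
# abacaj-phi-2-super_facebook-hubert-large-ls960-ft
```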
Currently, the only metric available during evaluation is the model loss; however, this does not provide enough granularity about the model's performance on each of the modalities it is trained on....
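One way to get that granularity would be to tag each eval batch with its modality and aggregate losses per tag. A sketch under that assumption (the `(modality, loss)` pairs and modality names are hypothetical; SMIT's actual eval loop may differ):

```python
from collections import defaultdict


def per_modality_loss(batch_losses):
    """Return the mean eval loss per modality instead of one scalar.

    `batch_losses` is a list of (modality, loss) pairs, e.g. produced
    by tagging each evaluation batch with "audio", "text", etc.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for modality, loss in batch_losses:
        sums[modality] += loss
        counts[modality] += 1
    return {m: sums[m] / counts[m] for m in sums}


metrics = per_modality_loss([("audio", 2.0), ("audio", 4.0), ("text", 1.0)])
print(metrics)  # {'audio': 3.0, 'text': 1.0}
```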
In the current situation there is no way to detect a bug introduced by a PR, as showcased by #8. SMIT should integrate a test suite.
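The shape such a suite could take, as a pytest-style sketch (the function under test is a stand-in; SMIT does not currently expose a `build_prompt` helper):

```python
def build_prompt(transcript: str) -> str:
    """Stand-in for a real SMIT function a regression test would cover."""
    return f"<audio>\n{transcript}"


def test_build_prompt_prefixes_audio_token():
    # A CI run (e.g. pytest on every PR) would catch regressions like #8.
    assert build_prompt("hello").startswith("<audio>")
```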
Currently, SMIT will only use either the `cpu` when no GPU is available, or `cuda:0` when at least one GPU is found. This considerably limits the size and performance of SMIT when...
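The intended extension, sketched as a pure device-selection function (a minimal sketch of the policy, not SMIT's actual code; real sharding would additionally need e.g. `torch.nn.parallel` or `accelerate`):

```python
def pick_devices(gpu_count: int) -> list[str]:
    """Choose training devices from the number of visible GPUs.

    Current behaviour is cpu-or-cuda:0 only; the intended behaviour is
    to shard across every available GPU.
    """
    if gpu_count == 0:
        return ["cpu"]
    return [f"cuda:{i}" for i in range(gpu_count)]


print(pick_devices(0))  # ['cpu']
print(pick_devices(4))  # ['cuda:0', 'cuda:1', 'cuda:2', 'cuda:3']
```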
In its current state, SMIT will save the whole model (encoder + projector + decoder) during pre-training. As only the `linear_projector` is trained during pre-training, this unnecessarily consumes disk...
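A fix could filter the checkpoint down to trainable parameters only. A sketch using stand-in parameter objects (the parameter names below are illustrative; with PyTorch the pairs would come from `model.named_parameters()` and each parameter carries a `requires_grad` flag):

```python
from types import SimpleNamespace


def trainable_state_dict(named_params):
    """Keep only parameters that require grad.

    During pre-training only the `linear_projector` is trainable, so the
    saved checkpoint shrinks from encoder + projector + decoder to the
    projector alone.
    """
    return {name: p for name, p in named_params if p.requires_grad}


# Stand-ins mimicking model.named_parameters() during pre-training:
params = [
    ("encoder.weight", SimpleNamespace(requires_grad=False)),
    ("linear_projector.weight", SimpleNamespace(requires_grad=True)),
    ("decoder.weight", SimpleNamespace(requires_grad=False)),
]
print(list(trainable_state_dict(params)))  # ['linear_projector.weight']
```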