These are the papers mentioned at least once in the codebase.

- [x] https://huggingface.co/papers/1707.06347 #5085
- [x] https://huggingface.co/papers/1909.08593 (only mentioned in a notebook, no need to have in paper index)
- [x] https://huggingface.co/papers/1910.02054 #4551
- [x] https://huggingface.co/papers/1910.10683 #5084
- [x] https://huggingface.co/papers/2106.09685 #4441
- [x] https://huggingface.co/papers/2211.14275 #5083
- [x] https://huggingface.co/papers/2305.10425 #3990
- [x] https://huggingface.co/papers/2305.18290 #3937
- [x] https://huggingface.co/papers/2306.13649 #5082
- [x] https://huggingface.co/papers/2307.09288 #4094
- [x] https://huggingface.co/papers/2309.06657 #4441
- [x] https://huggingface.co/papers/2309.16240 #3906
- [x] https://huggingface.co/papers/2310.12036 #3990
- [x] https://huggingface.co/papers/2312.00886 #4860
- [x] https://huggingface.co/papers/2312.09244 #4094
- [x] https://huggingface.co/papers/2401.08417 #5081
- [x] https://huggingface.co/papers/2402.00856 #3990
- [x] https://huggingface.co/papers/2402.01306 #4440
- [x] https://huggingface.co/papers/2402.03300 #4441
- [x] https://huggingface.co/papers/2402.04792 #5037
- [x] https://huggingface.co/papers/2402.05369 #3990
- [x] https://huggingface.co/papers/2402.09353 #4892
- [x] https://huggingface.co/papers/2402.14740 #3801
- [x] https://huggingface.co/papers/2403.00409 #3990
- [x] https://huggingface.co/papers/2403.07691 #5080
- [x] https://huggingface.co/papers/2403.17031 (these are implementation details, no need to have in paper index)
- [x] https://huggingface.co/papers/2404.04656 #3990
- [x] https://huggingface.co/papers/2404.09656 #5078
- [x] https://huggingface.co/papers/2404.19733 #3906
- [x] https://huggingface.co/papers/2405.00675 #3990
- [x] https://huggingface.co/papers/2405.14734 #5071
- [x] https://huggingface.co/papers/2405.16436 #5070
- [x] https://huggingface.co/papers/2405.21046 #5068
- [x] https://huggingface.co/papers/2406.05882 #3990
- [x] https://huggingface.co/papers/2406.08414 #3990
- [x] https://huggingface.co/papers/2406.11827 #3906
- [x] https://huggingface.co/papers/2407.21783 (LLaMA 3 paper, no need to have in paper index)
- [x] https://huggingface.co/papers/2408.06266 #3990
- [x] https://huggingface.co/papers/2409.06411 #3906
- [x] https://huggingface.co/papers/2409.20370 #5002
- [x] https://huggingface.co/papers/2411.10442 #5089
- [x] https://huggingface.co/papers/2501.03262 #5062
- [x] https://huggingface.co/papers/2501.03884 #3824
- [x] https://huggingface.co/papers/2501.12599 (Kimi 1.5 paper mentioned in an example, no need to have in paper index)
- [x] https://huggingface.co/papers/2501.12948 #5053
- [x] https://huggingface.co/papers/2503.14476 #3937
- [x] https://huggingface.co/papers/2503.20783 #3937
- [x] https://huggingface.co/papers/2503.24290 (link to justify beta=0 in the doc, no need to have in paper index)
- [x] https://huggingface.co/papers/2505.07291 #5061
- [x] https://huggingface.co/papers/2506.01939 #4580
- [x] https://huggingface.co/papers/2507.18071 #3775
- [x] https://huggingface.co/papers/2508.00180 #3855
- [x] https://huggingface.co/papers/2508.05629 #4042
- [x] https://huggingface.co/papers/2508.08221 #3935
- [x] https://huggingface.co/papers/2508.09726 #3989
`device_map=None` for DeepSpeed and add ZeRO paper (1910.02054) to Paper Index #4551