* softcapping
* soft cap before the mask
* style
* ...
* super nit
* update
* fixes
* update
* small issue with modular
* fix modular imports
* update
* fixup
* simplify a hell lot
* simplify cleaning imports
* finish fixing
* update our design
* nits
* use a deprecation cycle
* updates
* Fix modular (recursive deps need to always be computed after merges!)
* push
* fix
* update
* fix modular order
* make fix-copies
* updates
* update
* ?
* don't compile for now
* ?
* fix some stuff
* donc!
* fix copies
* update
* fixup
* ?
* fix two tests
* fix?
* for now, don't use head info
* eager when output attentoin and sdpa or flash as it's the simplest behaviour (for our tests as well :))
* fix-copies
* revert sdpa check
* Apply suggestions from code review
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
* rebase, fix-copies and push
* add a slow integration test
* update the test
* fix left padding issue
* fix test
* remove duplicate scaling
* quality
* add a small test and make sure it works
* 2b
---------
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
* doc: Trainer.hyperparameter_search docstring discrepancy solved
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
19d58d31f has introduced a context manager to manage subtests of
test_training_gradient_checkpointing. However, test body was not
moved under "with" statement. Thus, while tests are correctly
marked as skipped, test bodies were still executed. In some cases,
as with llama this caused attribute errors.
Fixes: #34722
Fixes: 19d58d31f ("Add MLLama (#33703)")
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
* Add model skeletion with transformers-cli add-new-model-like
* Convert config to modular, add rms_norm_eps, delete clip_qkv
* Convert model to modular, add RMSNorm
* Add flash attention with qk norm and no qkv clipping
* Add decoder layer with RMSNorm after attention/feedforward layers
* Add base and causal model
* Add converter improvements from OLMo repo
* Update weight loading in OLMo to HF converter
* Set correct default for rms_norm_eps
* Set correct pipeline_model_mapping in test
* Run make fixup
* Fix model type
* Re-run modular conversion
* Manually set config docs to fix build errors
* Convert olmo-1124 to olmo_1124 to fix flash attention docs errors
* Start updating tests
* Update tests
* Copy upstream test_eager_matches_sdpa_inference_1_bfloat16 changes to olmo_1124
* Rename input_layernorm and post_attention_layernorm to reflect their ops better
* Use correct tokenizer
* Remove test unsupported by GPT2 tokenizer
* Create GenerationConfig outside of from_pretrained call
* Use simpler init file structure
* Add explicit __all__ to support simplified init
* Make safetensor serialization the default
* Update OLMo November 2024 docs
* remove v4.44 deprecations
* PR comments
* deprecations scheduled for v4.50
* hub version update
* make fiuxp
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Remove FSDP wrapping from sub-models.
* solve conflict trainer.py
* make fixup
* add unit test for fsdp_auto_wrap_policy when using auto_find_batch_size
* put back extract_model_from_parallel
* use transformers unwrap_model
* Retain newlines in chat template when
* Add try/except
* Add regression test
* Simplify test
* Apply suggestions from code review
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
---------
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* add XPU path
* use accelerate API
* Update docs/source/en/tasks/semantic_segmentation.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* update more places with accelerate API
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Add docs/source/ar/torchscript.md to Add_docs_source_ar_torchscript.md
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Merge troubleshooting.md with this Branch
* Update _toctree.yml
* Update torchscript.md
* Update troubleshooting.md
---------
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update llm_engine.py
- Added support for optional token and max_tokens parameters in the constructor.
- Provided usage examples and detailed documentation for each method.
* Add docs/source/ar/trainer.md to Add_docs_source_ar_trainer.md
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update trainer.md
* Update trainer.md
* Update trainer.md
* Create _toctree.yml
* Delete docs/source/ar/_toctree.yml
* Update _toctree.yml - add trainer
* Update _toctree.yml
* merge serialization.md into this branch
* merge sagemaker.md into this PR
* Update _toctree.yml
* Update docs/source/ar/trainer.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>