* universal checkpoint
* Update docs/source/en/deepspeed.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/deepspeed.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/deepspeed.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Ensure that add_prefix_space is propagated to backend_tokenizer.pre_tokenizer
in PreTrainedTokenizerFast, rather than relying on subclasses to take care of this.
* Simplify setting self.add_prefix_space, ensure pre_tok exists
* Wrap in try-except to catch 'Custom PreTokenizer cannot be serialized'
862d1a346a/bindings/python/src/pre_tokenizers.rs (L672) produces the Exception. They're triggered by the roformer tests, as the RoFormerTokenizerFast uses a custom PreTokenizer.
* Propagate add_prefix_space in T5TokenizerFast to superclass
* look-ahead negation
* re add examples by default
* Fix the bug in topological sort
* Update create_dependency_mapping.py
* start adding test
* finalize test
* more tests
* style
* style
* update modular_modernbert -- add inputs_embeds param to ModernBertModel
* Fix implementation issues; extend to other classes; docstring
First of all, the inputs_embeds shouldn't fully replace `self.embeddings(input_ids)`, because this call also does layer normalization and dropout. So, now both input_ids and inputs_embeds is passed to the ModernBertEmbeddings, much like how BertEmbeddings is implemented.
I also added `inputs_embeds` to the docstring, and propagated the changes to the other model classes.
I also introduced an error if input_ids and input_embeds are both or neither provided.
Lastly, I fixed an issue with device being based solely on input_ids with attention_mask.
* Propagate inputs_embeds to ModernBertForMaskedLM correctly
Also reintroduce inputs_embeds test
---------
Co-authored-by: Tom Aarsen <Cubiegamedev@gmail.com>
* setup loss_type in config at model init time
ensures no additional graph break introduced when torch.compile'ed
fixes#34615
Signed-off-by: ChanderG <mail@chandergovind.org>
* lookup loss mapping at init time instead of manual setup
Signed-off-by: ChanderG <mail@chandergovind.org>
* remove redundant lookup at loss_function time
* overwride losstype at init time
---------
Signed-off-by: ChanderG <mail@chandergovind.org>
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
* إضافة الترجمة العربية: multiple_choice.md
* Update multiple_choice.md
* Update docs/source/ar/tasks/multiple_choice.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/multiple_choice.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/multiple_choice.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/multiple_choice.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/multiple_choice.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/multiple_choice.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/multiple_choice.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/multiple_choice.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/multiple_choice.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/multiple_choice.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/multiple_choice.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/multiple_choice.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/multiple_choice.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/multiple_choice.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/multiple_choice.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/multiple_choice.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/multiple_choice.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/multiple_choice.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/multiple_choice.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update _toctree.yml
* Add files via upload
* Update _toctree.yml
---------
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* update codecarbon
* replace directly-specified-test-dirs with tmp_dir
* pass tmp_dir to all get_regression_trainer
* test_trainer.py: Use tmp_dir consistently for all output_dir arguments
* fix some with...as tmp_dir blocks
* reflect the comments to improve test_trainer.py
* refresh .gitignore
* update conversion script
* update for bias again
* remove pdv
* use my dir
* Update how we initialize the tokenizer
* Convert in bfloat16
* Undo that one again
* fix config dump
* .to() was broken for BatchMixFeature
* quick debug breakpoint
* put the breakpoint in the right place
* Add a config flag for the multimodal projector bias
* Add a config flag for the multimodal projector bias
* Conversion script can load chat templates
* Indent config for comparison
* Stop clobbering the config
* Re-enable the config clobber
* Get rid of the config manual save - it has no effect!
* Handle adapter bias correctly
* Default vision transformer activation to silu
* Remove legacy processing path
* One commit with all the debug breakpoints before I delete them all, in case I need to revert
* Update conversion
* Remove vLLM debugging instrumentation
* Drop xformers
* Remove debug enumerates
* make fixup
* make fixup
* Break copied from in pixtral
* Propagate multimodal_projector_bias change
* Propagate multimodal_projector_bias change
* Remove debug device .to()
* Restore attention weights output
* Fix Pixtral test
* Drop image_seq_length
* Drop image_seq_length
* Put the legacy processing code back
* Add the bias option to the llava_next_video config
* Add the bias option to the llava_next_video config
* Make certain args required in converter
* Make certain args required in converter
* typo
* make fixup
* Reverting some dtype changes since it seems to work without them
---------
Co-authored-by: arthur@huggingface.co <arthur@ip-26-0-166-244.ec2.internal>
Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Updated docstring for _determine_best_metric.
* Updated docstring for metric_for_best_model.
* Added test case for save strategy.
* Updated incorrect test case.
* Changed eval_strategy to match save_strategy.
* Separated test cases for metric.
* Allow load_best_model when save_strategy == "best".
* Updated docstring for metric_for_best_model.
* fix: processing odd number of frames
* feat: add test case
* update: test one frame
* feat: support custom patch size
* fix: test with videos
* revert: change on patch repeat
* fix: much wow
* update: fixups
* fixup pls
* ruff fixup
* fix typo at least
* Correctly list the chat template file in the saved files list
* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add save file checking to test
* make fixup
* better filename handling
* make fixup
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add audio_token attribute to proc
* expand input_ids
* and legacy and expanded input_ids
* test update
* split lines
* add possibility not to provide eos and bos audio tokens
* raise errors
* test incorrect number of audio tokens
* add example
* fmt
* typo