* potential bug fix for drop path
* variable name change
* forgot to rename the variables
* back to original
* modify dpr properly
* check_copies auto fix
* corresponding swin2 changes
* auto fix
* linting
* default value for drop_path_rate as 0.0
* Update src/transformers/models/glm/modeling_glm.py
* maskformer fix
* ruff format
* changes made to tf code as well
* lint
---------
Co-authored-by: abhijit deo <167164474+deo-abhijit@users.noreply.github.com>
* Separator in regex
* Standardize separator for relative path in auto generated message
* open() encoding
* Replace `\` on `os.path.abspath`
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* feat: Added int conversion and unwrapping
* test: added tests for post_process_keypoint_detection of SuperPointImageProcessor
* docs: changed docs to include post_process_keypoint_detection method and switched from opencv to matplotlib
* test: changed test to not depend on SuperPointModel forward
* test: added missing require_torch decorator
* docs: changed pyplot parameters for the keypoints to be more visible in the example
* tests: changed import torch location to make test_flax and test_tf
* Revert "tests: changed import torch location to make test_flax and test_tf"
This reverts commit 39b32a2f69.
* tests: fixed import
* chore: applied suggestions from code review
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* tests: fixed import
* tests: fixed import (bis)
* tests: fixed import (ter)
* feat: added choice of type for target_size and changed tests accordingly
* docs: updated code snippet to reflect the addition of target size type choice in post process method
* tests: fixed imports (...)
* tests: fixed imports (...)
* style: formatting file
* docs: fixed typo from image[0] to image.size[0]
* docs: added output image and fixed some tests
* Update docs/source/en/model_doc/superpoint.md
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* fix: included SuperPointKeypointDescriptionOutput in TYPE_CHECKING if statement and changed test results to reflect the change in SuperPoint from absolute to relative keypoint coordinates
* docs: changed SuperPoint's docs to print output instead of just accessing
* style: applied make style
* docs: added missing output type and precision in docstring of post_process_keypoint_detection
* perf: deleted loop to perform keypoint conversion in one statement
* fix: moved keypoint conversion at the end of model forward
* docs: changed SuperPointInterestPointDecoder to SuperPointKeypointDecoder class name and added relative (x, y) coordinates information to its method
* fix: changed type hint
* refactor: removed unnecessary brackets
* revert: SuperPointKeypointDecoder to SuperPointInterestPointDecoder
* Update docs/source/en/model_doc/superpoint.md
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
---------
Co-authored-by: Steven Bucaille <steven.bucaille@buawei.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* enable average tokens across devices
* reduce earlier in case model needs it
* simplify if statement
* reformat code to make ruff happy
* add doc for argument: average_tokens_across_devices
* cannot find world size when pytorch is unavailable
* format code
---------
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add docs/source/ar/fast_tokenizers.md to Add_docs_source_ar_fast_tokenizers.md
* Update _toctree.yml
* Update _toctree.yml
* Update docs/source/ar/_toctree.yml
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/fast_tokenizers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/fast_tokenizers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/fast_tokenizers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/fast_tokenizers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/fast_tokenizers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/fast_tokenizers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/fast_tokenizers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/fast_tokenizers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/fast_tokenizers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/fast_tokenizers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
---------
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* [docs] update input documentation for MAMBA2 and MISTRAL models to include cache_position and attention_mask details
* [docs] correct input documentation for MISTRAL model to reference `input_ids` instead of `decoder_input_ids`
* [docs] clarify cache_position description in MISTRAL model documentation
* Add _determine_best_metric and new saving logic.
1. Logic to determine the best metric was separated out from
`_save_checkpoint`.
2. In `_maybe_log_save_evaluate`, whether or not a new best metric was
achieved is determined after each evaluation, and if the save strategy
is "best" then the TrainerControl is updated accordingly.
* Added SaveStrategy.
Same as IntervalStrategy, but with a new attribute called BEST.
* IntervalStrategy -> SaveStrategy
* IntervalStrategy -> SaveStrategy for save_strat.
* Interval -> Save in docstring.
* Updated docstring for save_strategy.
* Added SaveStrategy and made corresponding changes.
`save_strategy` previously followed `IntervalStrategy` but now follows
`SaveStrategy`.
The code and the docstring were changed accordingly.
* Changes from `make fixup`.
* Removed redundant metrics argument.
* Added new test_save_best_checkpoint test.
1. Checks for both cases where `metric_for_best_model` is explicitly
provided and when it's not provided.
2. The first case should have two checkpoints saved, whereas the second
should have three saved.
* Changed should_training_end saving logic.
The Trainer saves a checkpoint at the end of training by default as
long as `save_strategy != SaveStrategy.NO`. This condition was modified
to also exclude `SaveStrategy.BEST`, because it would be counterintuitive
to ask for only the best checkpoint and have the last one saved as
well.
* `args.metric_for_best_model` defaults to loss.
* Undo metric_for_best_model update.
* Remove checking metric_for_best_model.
* Added test cases for loss and no metric.
* Added error for metric and changed default best_metric.
* Removed unused import.
* `new_best_metric` -> `is_new_best_metric`
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Applied `is_new_best_metric` to all.
Changes were made for consistency and also to fix a potential bug.
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
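The "best" save strategy described above can be sketched roughly as follows. This is a minimal illustration, not the Trainer's actual code: the `SaveStrategy` enum here is a stand-in with assumed members, and `is_new_best_metric` and `checkpoints_saved` are hypothetical helpers that only mirror the idea of checking for a new best metric after each evaluation and saving when one is found.

```python
from enum import Enum
from typing import List, Optional, Sequence


class SaveStrategy(str, Enum):
    # Stand-in for the enum described above (assumed members).
    NO = "no"
    STEPS = "steps"
    EPOCH = "epoch"
    BEST = "best"


def is_new_best_metric(value: float, best_so_far: Optional[float], greater_is_better: bool) -> bool:
    """Return True when `value` improves on `best_so_far` (None means no evaluation yet)."""
    if best_so_far is None:
        return True
    return value > best_so_far if greater_is_better else value < best_so_far


def checkpoints_saved(
    eval_losses: Sequence[float], save_strategy: SaveStrategy = SaveStrategy.BEST
) -> List[int]:
    """Simulate which evaluation rounds would trigger a save under the BEST strategy."""
    best: Optional[float] = None
    saved_at: List[int] = []
    for step, loss in enumerate(eval_losses):
        # Loss is a "lower is better" metric, hence greater_is_better=False.
        if is_new_best_metric(loss, best, greater_is_better=False):
            best = loss
            if save_strategy == SaveStrategy.BEST:
                saved_at.append(step)
    return saved_at
```

With evaluation losses `[0.9, 0.7, 0.8, 0.6]`, only the rounds that improve on the running best (the first, second, and fourth) would trigger a save in this sketch.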
* exclude fsdp from delay_optimizer_creation
* add test case for trainer: FSDP mode and fp8 as mixed precision
* rearrange imports
* ruff formatted
* adapt _init_fsdp to fp8
* use _init_fsdp only when resume_from_checkpoint
* In case of FSDP, self.layer will be a CheckpointWrapper, which has no len() method
* delete _init_fsdp
* solve conflict
* fix conflict
* make fixup
* Fix batch size handling in prediction_loop for DataLoaderShard
Updated the prediction_loop method in the Trainer class to correctly handle batch size when using DataLoaderShard. The batch size is now retrieved from total_batch_size in distributed training scenarios, preventing a NoneType-related TypeError during evaluation.
* Update src/transformers/trainer.py
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* Applied the fix to remove unused imports
---------
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
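The batch size fallback described above can be illustrated with a small sketch. It assumes a dataloader whose `batch_size` is None and which exposes `total_batch_size` instead; `FakeShardedLoader` and `resolve_batch_size` are hypothetical names used for illustration only and are not the code in `trainer.py`.

```python
from typing import Optional


class FakeShardedLoader:
    """Hypothetical stand-in for a sharded dataloader whose `batch_size` may be None."""

    def __init__(self, batch_size: Optional[int], total_batch_size: Optional[int] = None):
        self.batch_size = batch_size
        self.total_batch_size = total_batch_size


def resolve_batch_size(dataloader) -> int:
    """Prefer `batch_size`; fall back to `total_batch_size` when it is None."""
    batch_size = getattr(dataloader, "batch_size", None)
    if batch_size is None:
        # Assumed fallback for the distributed case described in the commit message.
        batch_size = getattr(dataloader, "total_batch_size", None)
    if batch_size is None:
        raise ValueError("Could not determine a batch size from the dataloader.")
    return batch_size
```

Resolving the value once up front avoids arithmetic on None later in the loop, which is the kind of TypeError the commit message refers to.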
* Correct the new defaults
* CIs
* add check
* Update utils.py
* Update utils.py
* Add the max_length in generate test checking shape without passing length
* style
* CIs
* fix fx CI issue
When loading a LoRA adapter, so far, there was only a warning when there
were unexpected keys in the checkpoint. Now, there is also a warning
when there are missing keys.
This change is consistent with
https://github.com/huggingface/peft/pull/2118 in PEFT and the planned PR
https://github.com/huggingface/diffusers/pull/9622 in diffusers.
Apart from this change, the error message for unexpected keys was
slightly altered for consistency (it should be more readable now). Also,
besides adding a test for the missing keys warning, a test for the
unexpected keys warning was also added, as it was missing so far.
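
The missing/unexpected key check described above can be sketched as a set comparison. This is a minimal illustration under assumptions: `warn_on_key_mismatch` is a hypothetical helper, and the warning texts are made up for the example rather than taken from the library.

```python
import warnings
from typing import Iterable, List, Tuple


def warn_on_key_mismatch(
    expected_keys: Iterable[str], loaded_keys: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Warn about keys that are missing from, or unexpected in, a loaded checkpoint."""
    expected, loaded = set(expected_keys), set(loaded_keys)
    missing = sorted(expected - loaded)      # expected by the model, absent in the checkpoint
    unexpected = sorted(loaded - expected)   # present in the checkpoint, unknown to the model
    if unexpected:
        warnings.warn(f"Loading adapter weights led to unexpected keys: {', '.join(unexpected)}.")
    if missing:
        warnings.warn(f"Loading adapter weights led to missing keys: {', '.join(missing)}.")
    return missing, unexpected
```

Returning both lists alongside the warnings makes the behavior easy to assert on in tests, one for each kind of mismatch.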
* translated gguf.md into Chinese
* Apply suggestions from code review
I have updated the PR accordingly. Thank you very much for the detailed guidance, and I'll pay more attention to the details next time.
Co-authored-by: Isotr0py <2037008807@qq.com>
* Apply suggestions from code review
Co-authored-by: Isotr0py <2037008807@qq.com>
---------
Co-authored-by: Isotr0py <2037008807@qq.com>