* Add initial implementation of `spectrogram_batch`
* Format the initial implementation
* Add test suite for the `spectrogram_batch`
* Update `spectrogram_batch` to ensure compatibility with test suite
* Update `spectrogram_batch` to include pre and post-processing
* Add `amplitude_to_db_batch` function and associated tests
* Add `power_to_db_batch` function and associated tests
* Reimplement the test suite for `spectrogram_batch`
* Fix errors in `spectrogram_batch`
* Add the function annotation for `spectrogram_batch`
* Address code quality
* Re-add `test_chroma_equivalence` function
* Update src/transformers/audio_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/audio_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
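For context on the two dB helpers added above, a minimal sketch of the standard conversions they are named after. This is not the code from `audio_utils.py`; the function names, parameters and default floors below are assumptions made for illustration.

```python
import math


def amplitude_to_db_sketch(amplitude, reference=1.0, min_value=1e-5):
    # Hypothetical helper: amplitude ratios use 20 * log10, because power
    # scales with the square of the amplitude.
    return 20.0 * math.log10(max(amplitude, min_value) / reference)


def power_to_db_sketch(power, reference=1.0, min_value=1e-10):
    # Hypothetical helper: power ratios use 10 * log10. The floor avoids
    # taking the logarithm of zero.
    return 10.0 * math.log10(max(power, min_value) / reference)


def to_db_batch_sketch(batch, convert):
    # Apply one conversion to every value of every item in a batch
    # (here a plain list of lists stands in for a list of spectrograms).
    return [[convert(value) for value in item] for item in batch]


amplitude_db = to_db_batch_sketch([[1.0, 10.0]], amplitude_to_db_sketch)
power_db = to_db_batch_sketch([[1.0, 100.0]], power_to_db_sketch)
```

An amplitude ratio of 10 and a power ratio of 100 describe the same level change, so both map to the same dB value.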
* Pass datasets trust_remote_code
* Pass trust_remote_code in more tests
* Add trust_remote_dataset_code arg to some tests
* Revert "Temporarily pin datasets upper version to fix CI"
This reverts commit b7672826ca.
* Pass trust_remote_code in librispeech_asr_dummy docstrings
* Revert "Pin datasets<2.20.0 for examples"
This reverts commit 833fc17a3e.
* Pass trust_remote_code to all examples
* Revert "Add trust_remote_dataset_code arg to some tests" to research_projects
* Pass trust_remote_code to tests
* Pass trust_remote_code to docstrings
* Fix flax examples tests requirements
* Pass trust_remote_dataset_code arg to tests
* Replace trust_remote_dataset_code with trust_remote_code in one example
* Fix duplicate trust_remote_code
* Replace args.trust_remote_dataset_code with args.trust_remote_code
* Replace trust_remote_dataset_code with trust_remote_code in parser
* Replace trust_remote_dataset_code with trust_remote_code in dataclasses
* Replace trust_remote_dataset_code with trust_remote_code arg
* First draft, still missing automatic function conversion
* First draft of the automatic schema generator
* Lots of small fixes
* the walrus has betrayed me
* please stop committing your debug breakpoints
* Lots of cleanup and edge cases, looking better now
* Comments and bugfixes for the type hint parser
* More cleanup
* Add tests, update schema generator
* Update tests, proper handling of return values
* Small docstring change
* More doc updates
* More doc updates
* Add json_schema decorator
* Clean up the TODOs and finish the docs
* self.maxDiff = None to see the whole diff for the nested list test
* add import for add_json_schema
* Quick test fix
* Fix something that was bugging me in the chat template docstring
* Less "anyOf" when unnecessary
* Support return types for the templates that need them
* Proper return type tests
* Switch to Google format docstrings
* Update chat templating docs to match new format
* Stop putting the return type in with the other parameters
* Add Tuple support
* No more decorator - we just do it implicitly!
* Add enum support to get_json_schema
* Update docstring
* Add copyright header
* Update src/transformers/tokenization_utils_base.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/chat_templating.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/utils/chat_template_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/utils/chat_template_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Add copyright header
* make fixup
* Fix indentation
* Reformat chat_template_utils
* Correct return value
* Make regexes module-level
* Support more complex, multi-line arg docstrings
* Update error message for ...
* Update ruff
* Add document type validation
* Refactor docs
* Refactor docs
* Refactor docs
* Clean up Tuple error
* Add an extra test for very complex defs and docstrings and clean everything up for it
* Document enum block
* Quick test fixes
* Stop supporting type hints in docstring to fix bugs and simplify the regex
* Update docs for the regex change
* Clean up enum regex
* Wrap functions in {"type": "function", "function": ...}
* Update src/transformers/utils/chat_template_utils.py
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
* Temporary tool calling commit
* Add type hints to chat template utils, partially update docs (incomplete!)
* Code cleanup based on @molbap's suggestion
* Add comments to explain regexes
* Fix up type parsing for unions and lists
* Add custom exception types and adjust tests to look for them
* Update docs with a demo!
* Docs cleanup
* Pass content as string
* Update tool call formatting
* Update docs with new function format
* Update docs
* Update docs with a second tool to show the model choosing correctly
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
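A rough sketch of the idea behind the schema generator in this change: read a function's type hints and Google-format docstring, then wrap the result in `{"type": "function", "function": ...}`. This is not the code in `chat_template_utils.py`; `sketch_json_schema`, its type table and its regex are hypothetical and cover only simple scalar types.

```python
import inspect
import re
from typing import get_type_hints

# Hypothetical mapping from Python types to JSON schema type names.
_TYPE_NAMES = {int: "integer", float: "number", str: "string", bool: "boolean"}

# Matches indented Google-format argument lines such as "    name: description".
_ARG_RE = re.compile(r"^[ \t]+(\w+):[ \t]*(.+)$", re.MULTILINE)


def sketch_json_schema(func):
    """Build a tool description from type hints and a Google-format docstring."""
    doc = inspect.getdoc(func) or ""
    summary, _, rest = doc.partition("Args:")
    # Ignore anything after a "Returns:" block so it is not parsed as an argument.
    descriptions = dict(_ARG_RE.findall(rest.partition("Returns:")[0]))

    hints = get_type_hints(func)
    hints.pop("return", None)
    properties = {
        name: {"type": _TYPE_NAMES[hint], "description": descriptions[name]}
        for name, hint in hints.items()
    }
    # Arguments without a default value are required.
    required = [
        name
        for name, param in inspect.signature(func).parameters.items()
        if param.default is inspect.Parameter.empty
    ]
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": summary.strip(),
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
    }


def multiply(a: float, b: float = 1.0) -> float:
    """Multiply two numbers.

    Args:
        a: The first number.
        b: The second number.
    """
    return a * b


schema = sketch_json_schema(multiply)
```

The sketch raises `KeyError` for an undocumented argument or an unsupported type, which loosely mirrors the stricter validation mentioned in the commits above.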
* Fix has_file in offline mode
* harmonize env variable for offline mode
* Switch to HF_HUB_OFFLINE
* fix test
* revert test_offline to test TRANSFORMERS_OFFLINE
* Add new offline test
* merge conflicts
* docs
* Bookmark, initial implementation. Need to test
* Clean
* Working fully, woop woop
* I think working version now, testing
* Fin!
* rm cast, could keep None
* Fix typing issue
* rm typehint
* Add test
* Add tests and make more rigid
* Add test for parse_json_file
* Change Path to PathLike
* Fix `Import block is un-sorted or un-formatted`
* revert parse_json_file
* Fix ruff format
* Add parse_json_file test
* Port core files + ESM (because ESM code is odd)
* Search-replace in modelling code
* Fix up transfo_xl as well
* Fix other core files + tests (still need to add correct import to tests)
* Fix cookiecutter
* make fixup, fix imports in some more core files
* Auto-add imports to tests
* Cleanup, add imports to sagemaker tests
* Use correct exception for importing tf_keras
* Fixes in modeling_tf_utils
* make fixup
* Correct version parsing code
* Ensure the pipeline tests correctly revert to float32 after each test
* Ensure the pipeline tests correctly revert to float32 after each test
* More tf.keras -> keras
* Add dtype cast
* Better imports of tf_keras
* Add a cast for tf.assign, just in case
* Fix callback imports
* Enable instantiating model with pretrained backbone weights
* Update tests so backbone checkpoint isn't passed in
* Remove doc updates until changes made in modeling code
* Clarify pretrained import
* Update configs - docs and validation check
* Update src/transformers/utils/backbone_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Clarify exception message
* Update config init in tests
* Add test for when use_timm_backbone=True
* Small test updates
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Iterate over out_features instead of stage_names
* Update for all backbones
* Add tests
* Fix
* Align timm backbone behaviour with other backbones
* Fix tests
* Stricter checks on set out_features and out_indices
* Revert back stage selection logic
* Remove out-of-order logic
* Document restriction in docstrings
* add sdpa
* wip
* cleaning
* add ref
* yet more cleaning
* and more :)
* wip llama
* working llama
* add output_attentions=True support
* bigcode sdpa support
* fixes
* gpt-bigcode support, require torch>=2.1.1
* add falcon support
* fix conflicts falcon
* style
* fix attention_mask definition
* remove output_attentions from attnmaskconverter
* support whisper without removing any Copied from statement
* fix mbart default to eager renaming
* fix typo in falcon
* fix is_causal in SDPA
* check is_flash_attn_2_available in the models init as well in case the model is not initialized through from_pretrained
* add warnings when falling back on the manual implementation
* precise doc
* wip replace _flash_attn_enabled by config.attn_implementation
* fix typo
* add tests
* style
* add a copy.deepcopy on the config in from_pretrained, as we do not want to modify it inplace
* obey config.attn_implementation if a config is passed in from_pretrained
* fix is_torch_sdpa_available when torch is not installed
* remove dead code
* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/bart/modeling_bart.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* remove duplicate pretraining_tp code
* add dropout in llama
* precise comment on attn_mask
* add fmt: off for _unmask_unattended docstring
* precise num_masks comment
* nuke pretraining_tp in LlamaSDPAAttention following Arthur's suggestion
* cleanup modeling_utils
* backward compatibility
* fix style as requested
* style
* improve documentation
* test pass
* style
* add _unmask_unattended tests
* skip meaningless tests for idefics
* hard_check SDPA requirements when specifically requested
* standardize the use of XXX_ATTENTION_CLASSES
* fix SDPA bug with mem-efficient backend on CUDA when using fp32
* fix test
* rely on SDPA is_causal parameter to handle the causal mask in some cases
* fix FALCON_ATTENTION_CLASSES
* remove _flash_attn_2_enabled occurrences
* fix test
* add OPT to the list of supported flash models
* improve test
* properly test on different SDPA backends, on different dtypes & properly handle separately the pad tokens in the test
* remove remaining _flash_attn_2_enabled occurrence
* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/perf_infer_gpu_one.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* remove use_attn_implementation
* fix docstring & slight bug
* make attn_implementation internal (_attn_implementation)
* typos
* fix tests
* deprecate use_flash_attention_2=True
* fix test
* add back llama that was removed by mistake
* fix tests
* remove _flash_attn_2_enabled occurrences bis
* add check & test that passed attn_implementation is valid
* fix falcon torchscript export
* fix device of mask in tests
* add tip about torch.jit.trace and move bt doc below sdpa
* fix parameterized.expand order
* move tests from test_modeling_attn_mask_utils to test_modeling_utils as a relevant test class is already there
* update sdpaattention class with the new cache
* Update src/transformers/configuration_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/bark/modeling_bark.py
* address review comments
* WIP torch.jit.trace fix. left: test both eager & sdpa
* add test for torch.jit.trace for both eager/sdpa
* fix falcon with torch==2.0 that needs to use sdpa
* fix doc
* hopefully last fix
* fix key_value_length that has no default now in mask converter
* is it flaky?
* fix speculative decoding bug
* tests do pass
* fix following #27907
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Safetensors serialization by default
* First pass on the tests
* Second pass on the tests
* Third pass on the tests
* Fix TF weight loading from TF-format safetensors
* Specific encoder-decoder fixes for weight crossloading
* Add VisionEncoderDecoder fixes for TF too
* Change filename test for pt-to-tf
* One missing fix for TFVisionEncoderDecoder
* Fix the other crossload test
* Support for flax + updated tests
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Sanchit's comments
* Sanchit's comments 2
* Nico's comments
* Fix tests
* cleanup
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Register ModelOutput as supported torch pytree nodes
* Test ModelOutput as supported torch pytree nodes
* Update type hints for pytree unflatten functions
* add kaldi fbank
* make style
* add herz_to_mel_kaldi tests
* add mel to hertz kaldi test
* integration tests
* correct test and remove comment
* make style
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* change parameter name
* Apply suggestions from Arthur review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update remove_dc_offset description
* fix bug + make style
* fix error in using np.exp instead of np.power
* make style
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add @dataclass to MaskFormerPixelDecoderOutput
* Add dataclass check if subclass of ModelOutput
* Use unittest assertRaises rather than pytest per contribution doc
* Update src/transformers/utils/generic.py per suggested change
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Register ModelOutput subclasses as supported torch.utils._pytree nodes
Fixes #25357 where DDP with static_graph=True does not sync gradients when calling backward() over tensors contained in ModelOutput subclasses
* Add test for torch pytree ModelOutput serialization and deserialization
* Fix saved_model_creation_extended
* Skip the BLIP model creation test for now
* Fix TF SAM test
* Fix longformer tests
* Fix Wav2Vec2
* Add a skip for XLNet
* make fixup
* make fix-copies
* Add comments
* Add test for proper input signatures
* No more signature pruning
* Test the dummy inputs are valid too
* fine-tine -> fine-tune
* Fix indent in test_dataset_conversion
* Stop storing references to bound methods in tf.functions
* Remove the gc.collect calls now that we resolved the underlying problem
* Remove the default signature from model.serving entirely, big cleanup
* Remove _prune_signature as self.input_signature can prune itself
* Restore serving docstring
* Update int support test to check the input signature
* Make sure other tests also use model.input_signature and not serving.input_signature
* Restore _prune_signature
* Remove the doctest GC now it's no longer needed
* Correct core tests to use the pruned sig
* order lines correctly in core tests
* Add eager_serving back with a deprecation warning
* Rework TF type hints to use | None instead of Optional[] for tf.Tensor
* Rework TF type hints to use | None instead of Optional[] for tf.Tensor
* Don't forget the imports
* Add the imports to tests too
* make fixup
* Refactor tests that depended on get_type_hints
* Better test refactor
* Fix an old hidden bug in the test_keras_fit input creation code
* Fix for the Deit tests
* Adding Llama FastTokenizer support.
- Requires https://github.com/huggingface/tokenizers/pull/1183 version
- Only support byte_fallback for llama, raise otherwise (safety net).
- Lots of open questions about special tokens
How to test:
```python
from transformers.convert_slow_tokenizer import convert_slow_tokenizer
from transformers import AutoTokenizer
from tokenizers import Tokenizer
tokenizer = AutoTokenizer.from_pretrained("huggingface/llama-7b")
if False:
    new_tokenizer = Tokenizer.from_file("tok.json")
else:
    new_tokenizer = convert_slow_tokenizer(tokenizer)
    new_tokenizer.save("tok.json")
strings = [
    "This is a test",
    "生活的真谛是",
    "生活的真谛是[MASK]。",
    # XXX: This one is problematic because of special tokens
    # "<s> Something something",
]
for string in strings:
    encoded = tokenizer(string)["input_ids"]
    encoded2 = new_tokenizer.encode(string).ids
    assert encoded == encoded2, f"{encoded} != {encoded2}"
    decoded = tokenizer.decode(encoded)
    decoded2 = new_tokenizer.decode(encoded2)
    assert decoded.strip() == decoded2, f"{repr(decoded)} != {repr(decoded2)}"
```
The converter + some test script.
The test script.
Tmp save.
Adding Fast tokenizer + tests.
Adding the tokenization tests.
Correct combination.
Small fix.
Fixing tests.
Fixing with latest update.
Rebased.
fix copies + normalized added tokens + copies.
Adding doc.
TMP.
Doc + split files.
Doc.
Versions + try import.
Fix Camembert + warnings -> Error.
Fix by ArthurZucker.
Not a decorator.
* Fixing comments.
* Adding more to docstring.
* Doc rewriting.
* Result of black 23.1
* Update target to Python 3.7
* Switch flake8 to ruff
* Configure isort
* Configure isort
* Apply isort with line limit
* Put the right black version
* adapt black in check copies
* Fix copies
* Fixed test_saved_model_extended
* Fix TFGPT2 tests
* make fixup
* Make sure keras-nlp utils are available for type hinting too
* Update src/transformers/testing_utils.py
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* make fixup
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* Add a test to ensure int dummy inputs are int64
* Move the test into the existing int64 test and update a lot of existing dummies
* Fix remaining dummies
* Fix remaining dummies
* Test for int64 serving sigs as well
* Update core tests to use tf.int64
* Add better messages to the assertions
* Update all serving sigs to int64
* More sneaky hiding tf.int32s
* Add an optional int32 signature in save_pretrained
* make fixup
* Add Amy's suggestions
* Switch all serving sigs back to tf.int32
* Switch all dummies to tf.int32
* Adjust tests to check for tf.int32 instead of tf.int64
* Fix base dummy_inputs dtype
* Start casting to tf.int32 in input_processing
* Change dtype for unpack_inputs test
* Add proper tf.int32 test
* Make the alternate serving signature int64
* Adapt FE methods to transforms library
* Mixin for saving the image processor
* Base processor skeleton
* BatchFeature for packaging image processor outputs
* Initial image processor for GLPN
* Remove accidental import
* Fixup and docs
* Mixin for saving the image processor
* Fixup and docs
* Import BatchFeature from feature_extraction_utils
* Fixup and docs
* Fixup and docs
* Fixup and docs
* Fixup and docs
* BatchFeature for packaging image processor outputs
* Import BatchFeature from feature_extraction_utils
* Import BatchFeature from feature_extraction_utils
* Fixup and docs
* Fixup and docs
* BatchFeature for packaging image processor outputs
* Import BatchFeature from feature_extraction_utils
* Fixup and docs
* Mixin for saving the image processor
* Fixup and docs
* Add rescale back and remove ImageType
* fix import mistake
* Fix enum var reference
* Can transform and specify image data format
* Remove redundant function
* Update reference
* Data format flag for rescale
* Fix typo
* Fix dimension check
* Fixes to make IP and FE outputs match
* Add tests for transforms
* Add test for utils
* Update some docstrings
* Make sure in channels last before converting to PIL
* Remove default to numpy batching
* Fix up
* Add docstring and model_input_types
* Use feature processor config from hub
* Alias GLPN feature extractor to image processor
* Alias feature extractor mixin
* Add return_numpy=False flag for resize
* Fix up
* Fix up
* Use different frameworks safely
* Safely import PIL
* Call function checking if PIL available
* Only import if vision available
* Address Sylvain PR comments
Co-authored-by: Sylvain.gugger@gmail.com
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/image_transforms.py
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
* Update src/transformers/models/glpn/feature_extraction_glpn.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Add in docstrings
* Fix TFSwinSelfAttention to have relative position index as non-trainable weight (#18226)
Signed-off-by: Seunghwan Hong <seunghwan@scatterlab.co.kr>
* Refactor `TFSwinLayer` to increase serving compatibility (#18352)
* Refactor `TFSwinLayer` to increase serving compatibility
Signed-off-by: Seunghwan Hong <seunghwan@scatterlab.co.kr>
* Fix missed parameters while refactoring
Signed-off-by: Seunghwan Hong <seunghwan@scatterlab.co.kr>
* Fix window_reverse to calculate batch size
Signed-off-by: Seunghwan Hong <harrydrippin@gmail.com>
Co-Authored-By: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Add TF prefix to TF-Res test class (#18481)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Remove py.typed (#18485)
* Fix pipeline tests (#18487)
* Fix pipeline tests
* Make sure all pipelines tests run with init changes
* Use new huggingface_hub tools for download models (#18438)
* Draft new cached_file
* Initial draft for config and model
* Small fixes
* Fix first batch of tests
* Look in cache when internet is down
* Fix last tests
* Bad black, not fixing all quality errors
* Make diff less
* Implement change for TF and Flax models
* Add tokenizer and feature extractor
* For compatibility with main
* Add utils to move the cache and auto-do it at first use.
* Quality
* Deal with empty commit shas
* Deal with empty etag
* Address review comments
* Fix `test_dbmdz_english` by updating expected values (#18482)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Move cache folder to huggingface/hub for consistency with hf_hub (#18492)
* Move cache folder to just huggingface
* Thank you VsCode for this needless import
* Move to hub
* Forgot one
* Update some expected values in `quicktour.mdx` for `resampy 0.3.0` (#18484)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Forgot one new_ for cache migration
* disable Onnx test for google/long-t5-tglobal-base (#18454)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Typo reported by Joel Grus on TWTR (#18493)
* Just re-reading the whole doc every couple of months 😬 (#18489)
* Delete valohai.yaml
* NLP => ML
* typo
* website supports https
* datasets
* 60k + modalities
* unrelated link fixing for accelerate
* Ok those links were actually broken
* Fix link
* Make `AutoTokenizer` auto-link
* wording tweak
* add at least one non-nlp task
* `transformers-cli login` => `huggingface-cli login` (#18490)
* zero chance anyone's using that constant no?
* `transformers-cli login` => `huggingface-cli login`
* `transformers-cli repo create` => `huggingface-cli repo create`
* `make style`
* Add seed setting to image classification example (#18519)
* [DX fix] Fixing QA pipeline streaming a dataset. (#18516)
* [DX fix] Fixing QA pipeline streaming a dataset.
QuestionAnsweringArgumentHandler would iterate over the whole dataset
effectively killing all properties of the pipeline.
This restores nice properties when using `Dataset` or `Generator` since
those are meant to be consumed lazily.
* Handling TF better.
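A plain-Python sketch of the difference described in the commit above: an eager handler reads the whole dataset before returning anything, while a lazy one only advances the source as far as the consumer asks. The names are hypothetical and this is not the pipeline's argument handler.

```python
from itertools import islice


def normalize(example):
    # Stand-in for argument normalisation.
    return {"question": example["question"].strip(), "context": example["context"].strip()}


def eager_handler(examples):
    # Materialises every example up front, so a streamed dataset is read
    # to the end before the first result is available.
    return [normalize(example) for example in examples]


def lazy_handler(examples):
    # Yields one normalised example at a time.
    for example in examples:
        yield normalize(example)


consumed = []


def stream():
    # Simulated streaming dataset that records how many items were read.
    for i in range(1000):
        consumed.append(i)
        yield {"question": f" q{i} ", "context": f" c{i} "}


# Only the two requested examples are pulled from the stream.
first_two = list(islice(lazy_handler(stream()), 2))
```

Passing the same `stream()` to `eager_handler` would read all 1000 items before returning.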
* Clean up hub (#18497)
* Clean up utils.hub
* Remove imports
* More fixes
* Last fix
* update fsdp docs (#18521)
* updating fsdp documentation
* typo fix
* Fix compatibility with 1.12 (#17925)
* Fix compatibility with 1.12
* Remove pin from examples requirements
* Update torch scatter version
* Fix compatibility with 1.12
* Remove pin from examples requirements
* Update torch scatter version
* fix torch.onnx.symbolic_opset12 import
* Reject bad version
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Remove debug statement
* Specify en in doc-builder README example (#18526)
Co-authored-by: Ankur Goyal <ankur@impira.com>
* New cache fixes: add safeguard before looking in folders (#18522)
* unpin resampy (#18527)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* ✨ update to use interlibrary links instead of Markdown (#18500)
* Add example of multimodal usage to pipeline tutorial (#18498)
* 📝 add example of multimodal usage to pipeline tutorial
* 🖍 apply feedbacks
* 🖍 apply niels feedback
* [VideoMAE] Add model to doc tests (#18523)
* Add videomae to doc tests
* Add pip install decord
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* Update perf_train_gpu_one.mdx (#18532)
* Update no_trainer.py scripts to include accelerate gradient accumulation wrapper (#18473)
* Added accelerate gradient accumulation wrapper to run_image_classification_no_trainer.py example script
* make fixup changes
* PR comments
* changed input to Accelerator based on PR comment, ran make fixup
* Added comment explaining the sync_gradients statement
* Fixed lr scheduler max steps
* Changed run_clm_no_trainer.py script to use accelerate gradient accum wrapper
* Fixed all scripts except wav2vec2 pretraining to use accelerate gradient accum wrapper
* Added accelerate gradient accum wrapper for wav2vec2_pretraining_no_trainer.py script
* make fixup and lr_scheduler step inserted back into run_qa_beam_search_no_trainer.py
* removed changes to run_wav2vec2_pretraining_no_trainer.py script and fixed using wrong constant in qa_beam_search_no_trainer.py script
* Add Spanish translation of converting_tensorflow_models.mdx (#18512)
* Add file in spanish docs to be translated
* Finish translation to Spanish
* Improve Spanish wording
* Add suggested changes from review
* Spanish translation of summarization.mdx (#15947) (#18477)
* Add Spanish translation of summarization.mdx
* Apply suggestions from code review
Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
* Let's not cast them all (#18471)
* add correct dtypes when checking for params dtype
* forward contrib credits
* Update src/transformers/modeling_utils.py
Co-authored-by: Thomas Wang <24695242+thomasw21@users.noreply.github.com>
* more comments
- added more comments on why we cast only floating point parameters
* Update src/transformers/modeling_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: sgugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Thomas Wang <24695242+thomasw21@users.noreply.github.com>
* fix: data2vec-vision Onnx ready-made configuration. (#18427)
* feat: add the data2vec conf that are missing https://huggingface.co/docs/transformers/serialization
* fix: wrong config
* Add mt5 onnx config (#18394)
* update features
* MT5OnnxConfig added, with updated tests and docs
* fix imports
* fix onnx_config_cls for mt5
Co-authored-by: Thomas Chaigneau <thomas.deeptools.ai>
* Minor update of `run_call_with_unpacked_inputs` (#18541)
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* BART - Fix attention mask device issue on copied models (#18540)
* attempt to fix attn mask device
* fix bart `_prepare_decoder_attention_mask`
- add correct device
- run `make fix-copies` to propagate the fix
* Adding a new `align_to_words` param to qa pipeline. (#18010)
* Adding a new `align_to_words` param to qa pipeline.
* Update src/transformers/pipelines/question_answering.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Import protection.
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* 📝 update metric with evaluate (#18535)
* Restore _init_weights value in no_init_weights (#18504)
* Recover _init_weights value in no_init_weights
For potential nested use.
In addition, users might modify private no_init_weights as well.
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Remove private variable change check
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Clean up comment
* 📝 update documentation build section (#18548)
* `bitsandbytes` - `Linear8bitLt` integration into `transformers` models (#17901)
* first commit
* correct replace function
* add final changes
- works like charm!
- cannot implement tests yet
- tested
* clean up a bit
* add bitsandbytes dependencies
* working version
- added import function
- added bitsandbytes utils file
* small fix
* small fix
- fix import issue
* fix import issues
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* refactor a bit
- move bitsandbytes utils to utils
- change comments on functions
* reformat docstring
- reformat docstring on init_empty_weights_8bit
* Update src/transformers/__init__.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* revert bad formatting
* change to bitsandbytes
* refactor a bit
- remove init8bit since it is useless
* more refactoring
- fixed init empty weights issue
- added threshold param
* small hack to make it work
* Update src/transformers/modeling_utils.py
* Update src/transformers/modeling_utils.py
* remove the small hack
* modify utils file
* make style + refactor a bit
* create correctly device map
* add correct dtype for device map creation
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* apply suggestions
- remove with torch.grad
- do not rely on Python bool magic!
* add docstring
- add docstring for new kwargs
* add docstring
- comment `replace_8bit_linear` function
- fix weird formatting
* - added more documentation
- added new utility function for memory footprint tracking
- colab demo to add
* few modifs
- typo doc
- force cast into float16 when load_in_8bit is enabled
* added colab link
* add test architecture + docstring a bit
* refactor a bit testing class
* make style + refactor a bit
* enhance checks
- add more checks
- start writing saving test
* clean up a bit
* make style
* add more details on doc
* add more tests
- still needs to fix 2 tests
* replace by "or"
- could not fix it from GitHub GUI
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* refactor a bit testing code + add readme
* make style
* fix import issue
* Update src/transformers/modeling_utils.py
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
* add few comments
* add more docstring + make style
* more docstring
* raise error when loaded in 8bit
* make style
* add warning if loaded on CPU
* add small sanity check
* fix small comment
* add bitsandbytes on dockerfile
* Improve documentation
- improve documentation from comments
* add few comments
* slow tests pass on the VM but not on the CI VM
* Fix merge conflict
* make style
* another test should pass on a multi gpu setup
* fix bad import in testing file
* Fix slow tests
- remove dummy batches
- no more CUDA illegal memory errors
* modify dockerfile
* Update docs/source/en/main_classes/model.mdx
* Update Dockerfile
* Update model.mdx
* Update Dockerfile
* Apply suggestions from code review
* few modifications
- lm head can stay on disk/cpu
- change model name so that test pass
* change test value
- change test value to the correct output
- torch bmm changed to baddbmm in bloom modeling when merging
* modify installation guidelines
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* replace `n` by `name`
* merge `load_in_8bit` and `low_cpu_mem_usage`
* first try - keep the lm head in full precision
* better check
- check the attribute `base_model_prefix` instead of computing the number of parameters
* added more tests
* Update src/transformers/utils/bitsandbytes.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Merge branch 'integration-8bit' of https://github.com/younesbelkada/transformers into integration-8bit
* improve documentation
- fix typos for installation
- change title in the documentation
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
* TF: XLA-trainable DeBERTa v2 (#18546)
* fix deberta issues
* add different code paths for gpu and tpu
* shorter gpu take along axis
* Stable Dropout without tf cond
* variable must be float
* Preserve hub-related kwargs in AutoModel.from_pretrained (#18545)
* Preserve hub-related kwargs in AutoModel.from_pretrained
* Fix tests
* Remove debug statement
* TF Examples Rewrite (#18451)
* Finished QA example
* Dodge a merge conflict
* Update text classification and LM examples
* Update NER example
* New Keras metrics WIP, fix NER example
* Update NER example
* Update MC, summarization and translation examples
* Add XLA warnings when shapes are variable
* Make sure batch_size is consistently scaled by num_replicas
* Add PushToHubCallback to all models
* Add docs links for KerasMetricCallback
* Add docs links for prepare_tf_dataset and jit_compile
* Correct inferred model names
* Don't assume the dataset has 'lang'
* Don't assume the dataset has 'lang'
* Write metrics in text classification
* Add 'framework' to TrainingArguments and TFTrainingArguments
* Export metrics in all examples and add tests
* Fix training args for Flax
* Update command line args for translation test
* make fixup
* Fix accidentally running other tests in fp16
* Remove do_train/do_eval from run_clm.py
* Remove do_train/do_eval from run_mlm.py
* Add tensorflow tests to circleci
* Fix circleci
* Update examples/tensorflow/language-modeling/run_mlm.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update examples/tensorflow/test_tensorflow_examples.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update examples/tensorflow/translation/run_translation.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update examples/tensorflow/token-classification/run_ner.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Fix save path for tests
* Fix some model card kwargs
* Explain the magical -1000
* Actually enable tests this time
* Skip text classification PR until we fix shape inference
* make fixup
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Use commit hash to look in cache instead of calling head (#18534)
* Use commit hash to look in cache instead of calling head
* Add tests
* Add attr for local configs too
* Stupid typos
* Fix tests
* Update src/transformers/utils/hub.py
Co-authored-by: Julien Chaumond <julien@huggingface.co>
* Address Julien's comments
Co-authored-by: Julien Chaumond <julien@huggingface.co>
* `pipeline` support for `device="mps"` (or any other string) (#18494)
* `pipeline` support for `device="mps"` (or any other string)
* Simplify `if` nesting
* Update src/transformers/pipelines/base.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix? @sgugger
* passing `attr=None` is not the same as not passing `attr` 🤯
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
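The distinction noted above, that passing `attr=None` differs from not passing `attr` at all, is usually handled with a sentinel default. This is a minimal sketch of that general pattern, not the code from this PR; `_MISSING` and `describe` are hypothetical names used only for illustration.

```python
# Hypothetical sentinel: a unique object that no caller can pass by accident.
_MISSING = object()


def describe(attr=_MISSING):
    """Report whether `attr` was omitted, passed as None, or given a value."""
    if attr is _MISSING:
        # The caller did not pass the argument at all.
        return "not passed"
    if attr is None:
        # The caller explicitly passed None.
        return "explicit None"
    return f"value: {attr}"


print(describe())
print(describe(None))
print(describe("mps"))
```

With a plain `attr=None` default the first two calls would be indistinguishable, which is the pitfall the commit subject points at.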
* Update philosophy to include other preprocessing classes (#18550)
* 📝 update philosophy to include other preprocessing classes
* 🖍 apply feedbacks
* Properly move cache when it is not in default path (#18563)
* Adds CLIP to models exportable with ONNX (#18515)
* onnx config for clip
* default opset as 14
* changes from the original repo
* input values order fix
* outputs fix
* remove unused import
* ran make fix-copies
* black format
* review comments: forward ref, import fix, model change revert, .to cleanup
* make style
* formatting fixes
* revert groupvit
* comment for cast to int32
* comment fix
* make .T as .t() for onnx conversion
* ran make fix-copies
* remove unneeded comment
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix copies
* remove comment
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* raise atol for MT5OnnxConfig (#18560)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* fix string (#18568)
* Segformer TF: fix output size in documentation (#18572)
* Segformer TF: fix output size in doc
* Segformer pytorch: fix output size in doc
Co-authored-by: Maxime Gardoni <maxime.gardoni@ecorobotix.com>
* Fix resizing bug in OWL-ViT (#18573)
* Fixes resizing bug in OWL-ViT
* Defaults to square resize if size is set to an int
* Sets do_center_crop default value to False
* Fix LayoutLMv3 documentation (#17932)
* fix typos
* fix sequence_length docs of LayoutLMv3Model
* delete trailing white spaces
* fix layoutlmv3 docs more
* apply make fixup & quality
* change to two versions of input docstring
* apply make fixup & quality
* Skip broken tests
* Change BartLearnedPositionalEmbedding's forward method signature to support Opacus training (#18486)
* changing BartLearnedPositionalEmbedding forward signature and references to it
* removing debugging dead code (thanks style checker)
* blackened modeling_bart file
* removing copy inconsistencies via make fix-copies
* changing references to copied signatures in Bart variants
* make fix-copies once more
* using expand over repeat (thanks @michaelbenayoun)
* expand instead of repeat for all model copies
Co-authored-by: Daniel Jones <jonesdaniel@microsoft.com>
* german docs translation (#18544)
* Create _config.py
* Create _toctree.yml
* Create index.mdx
not sure about "du / ihr" or "sie"
* Create quicktour.mdx
* Update _toctree.yml
* Update build_documentation.yml
* Update build_pr_documentation.yml
* fix build
* Update index.mdx
* Update quicktour.mdx
* Create installation.mdx
* Update _toctree.yml
* Deberta V2: Fix critical trace warnings to allow ONNX export (#18272)
* Fix critical trace warnings to allow ONNX export
* Force input to `sqrt` to be float type
* Cleanup code
* Remove unused import statement
* Update model sew
* Small refactor
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
* Use broadcasting instead of repeat
* Implement suggestion
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
* Match deberta v2 changes in sew_d
* Improve code quality
* Update code quality
* Consistency of small refactor
* Match changes in sew_d
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
* [FX] _generate_dummy_input supports audio-classification models for labels (#18580)
* Support audio classification architectures for label generation, and provide a flag to print warnings or not
* Use ENV_VARS_TRUE_VALUES
* Fix docstrings with last version of hf-doc-builder styler (#18581)
* Fix docstrings with last version of hf-doc-builder styler
* Remove empty Parameter block
* Bump nbconvert from 6.0.1 to 6.3.0 in /examples/research_projects/lxmert (#18565)
Bumps [nbconvert](https://github.com/jupyter/nbconvert) from 6.0.1 to 6.3.0.
- [Release notes](https://github.com/jupyter/nbconvert/releases)
- [Commits](https://github.com/jupyter/nbconvert/compare/6.0.1...6.3.0)
---
updated-dependencies:
- dependency-name: nbconvert
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump nbconvert in /examples/research_projects/visual_bert (#18566)
Bumps [nbconvert](https://github.com/jupyter/nbconvert) from 6.0.1 to 6.3.0.
- [Release notes](https://github.com/jupyter/nbconvert/releases)
- [Commits](https://github.com/jupyter/nbconvert/compare/6.0.1...6.3.0)
---
updated-dependencies:
- dependency-name: nbconvert
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* fix owlvit tests, update docstring examples (#18586)
* Return the permuted hidden states if return_dict=True (#18578)
* Load sharded pt to flax (#18419)
* initial commit
* add small test
* add cross pt tf flag to test
* fix quality
* style
* update test with new repo
* fix failing test
* update
* fix wrong param ordering
* style
* update based on review
* update related to recent new caching mechanism
* quality
* Update based on review
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
* quality and style
* Update src/transformers/modeling_flax_utils.py
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add type hints for ViLT models (#18577)
* Add type hints for Vilt models
* Add missing return type for TokenClassification class
* update doc for perf_train_cpu_many, add intel mpi introduction (#18576)
* update doc for perf_train_cpu_many, add mpi introduction
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* Update docs/source/en/perf_train_cpu_many.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update docs/source/en/perf_train_cpu_many.mdx
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* typos (#18594)
* FSDP bug fix for `load_state_dict` (#18596)
* Add `TFAutoModelForSemanticSegmentation` to the main `__init__.py` (#18600)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Generate: validate `model_kwargs` (and catch typos in generate arguments) (#18261)
* validate generate model_kwargs
* generate tests -- not all models have an attn mask
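One way to catch misspelled keyword arguments before they are silently ignored is to compare them against the signature of the function that will receive them. The sketch below only illustrates that idea under stated assumptions; `validate_kwargs` and `forward` are hypothetical names, and this is not the validation code added to `generate` in this PR.

```python
import inspect


def validate_kwargs(fn, kwargs):
    """Raise ValueError if `kwargs` contains names that `fn` does not accept."""
    params = inspect.signature(fn).parameters
    # If fn takes **kwargs, any name is acceptable and nothing can be checked.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return
    unused = [name for name in kwargs if name not in params]
    if unused:
        raise ValueError(f"Unused keyword arguments: {unused}")


# Hypothetical stand-in for a model's forward method.
def forward(input_ids, attention_mask=None):
    return input_ids


validate_kwargs(forward, {"attention_mask": [1, 1]})  # accepted
try:
    validate_kwargs(forward, {"atention_mask": [1, 1]})  # typo
except ValueError as err:
    print(err)
```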
* Supporting seq2seq models for `bitsandbytes` integration (#18579)
* Supporting seq2seq models for `bitsandbytes` integration
- `bitsandbytes` integration now supports seq2seq models
- check if a model has tied weights as an additional check
* small modification
- tie the weights before looking at tied weights!
* Add Donut (#18488)
* First draft
* Improve script
* Update script
* Make conversion work
* Add final_layer_norm attribute to Swin's config
* Add DonutProcessor
* Convert more models
* Improve feature extractor and convert base models
* Fix bug
* Improve integration tests
* Improve integration tests and add model to README
* Add doc test
* Add feature extractor to docs
* Fix integration tests
* Remove register_buffer
* Fix toctree and add missing attribute
* Add DonutSwin
* Make conversion script work
* Improve conversion script
* Address comment
* Fix bug
* Fix another bug
* Remove deprecated method from docs
* Make Swin and Swinv2 untouched
* Fix code examples
* Fix processor
* Update model_type to donut-swin
* Add feature extractor tests, add token2json method, improve feature extractor
* Fix failing tests, remove integration test
* Add do_thumbnail for consistency
* Improve code examples
* Add code example for document parsing
* Add DonutSwin to MODEL_NAMES_MAPPING
* Add model to appropriate place in toctree
* Update namespace to appropriate organization
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* Fix URLs (#18604)
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* Update BLOOM parameter counts (#18531)
* Update BLOOM parameter counts
* Update BLOOM parameter counts
* [doc] fix anchors (#18591)
the manual anchors end up being duplicated with automatically added anchors and no longer work.
* [fsmt] deal with -100 indices in decoder ids (#18592)
* [fsmt] deal with -100 indices in decoder ids
Fixes: https://github.com/huggingface/transformers/issues/17945
decoder ids get the default index -100, which breaks the model. As in t5 and many other models, add a fix that replaces -100 with the correct pad index.
For some reason this use case hasn't been used with this model until recently - so this issue was there since the beginning it seems.
Any suggestions to how to add a simple test here? or perhaps we have something similar already? user's script is quite massive.
* style
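The fix described for fsmt, replacing the ignore index -100 with the pad index before ids reach the decoder, can be sketched in plain Python. This is only an illustration on nested lists, not the tensor code from the PR; `replace_ignore_index` and the pad id `1` are assumptions made for the example.

```python
IGNORE_INDEX = -100  # index used to mark label positions the loss should ignore


def replace_ignore_index(ids, pad_token_id):
    """Return a copy of `ids` with every IGNORE_INDEX replaced by `pad_token_id`."""
    return [
        [pad_token_id if token == IGNORE_INDEX else token for token in row]
        for row in ids
    ]


labels = [[5, 8, -100, -100], [7, -100, -100, -100]]
# pad_token_id=1 is an arbitrary value chosen for this example.
decoder_input_ids = replace_ignore_index(labels, pad_token_id=1)
print(decoder_input_ids)
```

The original `labels` list is left untouched, so the -100 markers are still available for the loss computation.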
* small change (#18584)
* Flax Remat for LongT5 (#17994)
* [Flax] Add remat (gradient checkpointing)
* fix variable naming in test
* flip: checkpoint using a method
* fix naming
* fix class naming
* apply PVP's suggestions from code review
* add gradient_checkpointing to examples
* Add gradient_checkpointing to run_mlm_flax
* Add remat to longt5
* Add gradient checkpointing test longt5
* Fix args errors
* Fix remaining tests
* Make fixup & quality fixes
* replace kwargs
* remove unnecessary kwargs
* Make fixup changes
* revert long_t5_flax changes
* Remove return_dict and copy to LongT5
* Remove test_gradient_checkpointing
Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
* mac m1 `mps` integration (#18598)
* mac m1 `mps` integration
* Update docs/source/en/main_classes/trainer.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* addressing comments
* Apply suggestions from code review
Co-authored-by: Dan Saattrup Nielsen <47701536+saattrupdan@users.noreply.github.com>
* resolve comment
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Dan Saattrup Nielsen <47701536+saattrupdan@users.noreply.github.com>
* Change scheduled CIs to use torch 1.12.1 (#18644)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Add checks for some workflow jobs (#18583)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* TF: Fix generation repetition penalty with XLA (#18648)
* Update longt5.mdx (#18634)
* Update run_translation_no_trainer.py (#18637)
* Update run_translation_no_trainer.py
found an error in selecting `no_decay` parameters and some small modifications when the user continues to train from a checkpoint
* fixes `no_decay` and `resume_step` issue
1. change `no_decay` list
2. if the user continues to train their model from a provided checkpoint, the `resume_step` will not be initialized properly if `args.gradient_accumulation_steps != 1`
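The `resume_step` problem can be illustrated with a small calculation. The sketch assumes that checkpoints are named `step_<n>`, where `n` counts optimizer updates, and that the training loop skips dataloader batches rather than updates; both are assumptions for this example, and `batches_to_skip` is a hypothetical helper, not the script's code.

```python
def batches_to_skip(checkpoint_name, gradient_accumulation_steps):
    """Convert a `step_<n>` checkpoint name into a number of batches to skip.

    Assumes `n` counts optimizer updates, each of which consumed
    `gradient_accumulation_steps` batches.
    """
    completed_updates = int(checkpoint_name.replace("step_", ""))
    return completed_updates * gradient_accumulation_steps


# With accumulation over 4 batches, 100 updates correspond to 400 batches.
print(batches_to_skip("step_100", gradient_accumulation_steps=4))
```

When `gradient_accumulation_steps` is 1 the two counts coincide, which is why the mismatch only shows up for other values.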
* [bnb] Minor modifications (#18631)
* bnb minor modifications
- refactor documentation
- add troubleshooting README
- add PyPi library on DockerFile
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Apply suggestions from code review
* Apply suggestions from code review
* Apply suggestions from code review
* put in one block
- put bash instructions in one block
* update readme
- refactor a bit hardware requirements
* change text a bit
* Apply suggestions from code review
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* apply suggestions
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* add link to paper
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Update tests/mixed_int8/README.md
* Apply suggestions from code review
* refactor a bit
* add instructions Turing & Ampere
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* add A6000
* clarify a bit
* remove small part
* Update tests/mixed_int8/README.md
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* Examples: add Bloom support for token classification (#18632)
* examples: add Bloom support for token classification (FLAX, PyTorch and TensorFlow)
* examples: remove support for Bloom in token classification (FLAX and TensorFlow currently have no support for it)
* Fix Yolos ONNX export test (#18606)
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Fixup
* Fix up
* Move PIL default arguments inside function for safe imports
* Add image utils to toctree
* Update `rescale` method to reflect changes in #18677
* Update docs/source/en/internal/image_processing_utils.mdx
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Address Niels PR comments
* Add normalize method to transforms library
* Apply suggestions from code review - remove defaults to None
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix docstrings and revert to PIL.Image.XXX resampling
Use PIL.Image.XXX resampling values instead of the PIL.Image.Resampling.XXX enum, as the enum is only available in recent versions (>= 9.1.0), the version is not yet pinned, and support for older versions is not yet deprecated
* Some more docstrings and PIL.Image tidy up
* Reorganise arguments so flags by modifiers
* Few last docstring fixes
* Add normalize to docs
* Accept PIL.Image inputs with deprecation warning
* Update src/transformers/image_transforms.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update warning to include version
* Trigger CI - hash clash on doc build
Signed-off-by: Seunghwan Hong <seunghwan@scatterlab.co.kr>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: Amy Roberts <amyeroberts@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Seunghwan Hong <harrydrippin@gmail.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
Co-authored-by: Ankur Goyal <ankrgyl@gmail.com>
Co-authored-by: Ankur Goyal <ankur@impira.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
Co-authored-by: Mishig Davaadorj <dmishig@gmail.com>
Co-authored-by: Rasmus Arpe Fogh Jensen <Rasmus.arpe@gmail.com>
Co-authored-by: Ian Castillo <7807897+donelianc@users.noreply.github.com>
Co-authored-by: AguilaCudicio <aguila.cudicio@gmail.com>
Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Thomas Wang <24695242+thomasw21@users.noreply.github.com>
Co-authored-by: Niklas Hansson <niklas.sven.hansson@gmail.com>
Co-authored-by: Thomas Chaigneau <t.chaigneau.tc@gmail.com>
Co-authored-by: YouJiacheng <1503679330@qq.com>
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Dhruv Karan <k4r4n.dhruv@gmail.com>
Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>
Co-authored-by: Maxime G <joihn@users.noreply.github.com>
Co-authored-by: Maxime Gardoni <maxime.gardoni@ecorobotix.com>
Co-authored-by: Wonseok Lee (Jack) <rollerkid02@snu.ac.kr>
Co-authored-by: Dan Jones <dan.j.jones2@gmail.com>
Co-authored-by: Daniel Jones <jonesdaniel@microsoft.com>
Co-authored-by: flozi00 <flozi00.fz@gmail.com>
Co-authored-by: iiLaurens <iiLaurens@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Wang, Yi <yi.a.wang@intel.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
Co-authored-by: Karim Foda <35491698+KMFODA@users.noreply.github.com>
Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
Co-authored-by: Dan Saattrup Nielsen <47701536+saattrupdan@users.noreply.github.com>
Co-authored-by: zhoutang776 <47708118+zhoutang776@users.noreply.github.com>
Co-authored-by: Stefan Schweter <stefan@schweter.it>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Adapt FE methods to transforms library
* Mixin for saving the image processor
* Base processor skeleton
* BatchFeature for packaging image processor outputs
* Initial image processor for GLPN
* Remove accidental import
* Fixup and docs
* Mixin for saving the image processor
* Fixup and docs
* Import BatchFeature from feature_extraction_utils
* Fixup and docs
* Fixup and docs
* Fixup and docs
* Fixup and docs
* BatchFeature for packaging image processor outputs
* Import BatchFeature from feature_extraction_utils
* Import BatchFeature from feature_extraction_utils
* Fixup and docs
* Fixup and docs
* BatchFeature for packaging image processor outputs
* Import BatchFeature from feature_extraction_utils
* Fixup and docs
* Mixin for saving the image processor
* Fixup and docs
* Add rescale back and remove ImageType
* fix import mistake
* Fix enum var reference
* Can transform and specify image data format
* Remove redundant function
* Update reference
* Data format flag for rescale
* Fix typo
* Fix dimension check
* Fixes to make IP and FE outputs match
* Add tests for transforms
* Add test for utils
* Update some docstrings
* Make sure in channels last before converting to PIL
* Remove default to numpy batching
* Fix up
* Add docstring and model_input_types
* Use feature processor config from hub
* Alias GLPN feature extractor to image processor
* Alias feature extractor mixin
* Add return_numpy=False flag for resize
* Fix up
* Fix up
* Use different frameworks safely
* Safely import PIL
* Call function checking if PIL available
* Only import if vision available
* Address Sylvain PR comments
Co-authored-by: Sylvain.gugger@gmail.com
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/image_transforms.py
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
* Update src/transformers/models/glpn/feature_extraction_glpn.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Add in docstrings
* Fix TFSwinSelfAttention to have relative position index as non-trainable weight (#18226)
Signed-off-by: Seunghwan Hong <seunghwan@scatterlab.co.kr>
* Refactor `TFSwinLayer` to increase serving compatibility (#18352)
* Refactor `TFSwinLayer` to increase serving compatibility
Signed-off-by: Seunghwan Hong <seunghwan@scatterlab.co.kr>
* Fix missed parameters while refactoring
Signed-off-by: Seunghwan Hong <seunghwan@scatterlab.co.kr>
* Fix window_reverse to calculate batch size
Signed-off-by: Seunghwan Hong <harrydrippin@gmail.com>
Co-Authored-By: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Add TF prefix to TF-Res test class (#18481)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Remove py.typed (#18485)
* Fix pipeline tests (#18487)
* Fix pipeline tests
* Make sure all pipelines tests run with init changes
* Use new huggingface_hub tools for download models (#18438)
* Draft new cached_file
* Initial draft for config and model
* Small fixes
* Fix first batch of tests
* Look in cache when internet is down
* Fix last tests
* Bad black, not fixing all quality errors
* Make diff less
* Implement change for TF and Flax models
* Add tokenizer and feature extractor
* For compatibility with main
* Add utils to move the cache and auto-do it at first use.
* Quality
* Deal with empty commit shas
* Deal with empty etag
* Address review comments
* Fix `test_dbmdz_english` by updating expected values (#18482)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Move cache folder to huggingface/hub for consistency with hf_hub (#18492)
* Move cache folder to just huggingface
* Thank you VsCode for this needless import
* Move to hub
* Forgot one
* Update some expected values in `quicktour.mdx` for `resampy 0.3.0` (#18484)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Forgot one new_ for cache migration
* disable Onnx test for google/long-t5-tglobal-base (#18454)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Typo reported by Joel Grus on TWTR (#18493)
* Just re-reading the whole doc every couple of months 😬 (#18489)
* Delete valohai.yaml
* NLP => ML
* typo
* website supports https
* datasets
* 60k + modalities
* unrelated link fixing for accelerate
* Ok those links were actually broken
* Fix link
* Make `AutoTokenizer` auto-link
* wording tweak
* add at least one non-nlp task
* `transformers-cli login` => `huggingface-cli login` (#18490)
* zero chance anyone's using that constant no?
* `transformers-cli login` => `huggingface-cli login`
* `transformers-cli repo create` => `huggingface-cli repo create`
* `make style`
* Add seed setting to image classification example (#18519)
* [DX fix] Fixing QA pipeline streaming a dataset. (#18516)
* [DX fix] Fixing QA pipeline streaming a dataset.
QuestionAnsweringArgumentHandler would iterate over the whole dataset
effectively killing all properties of the pipeline.
This restores nice properties when using `Dataset` or `Generator` since
those are meant to be consumed lazily.
* Handling TF better.
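The streaming issue described above comes down to eager versus lazy consumption of an iterable. The sketch below shows the difference with plain generators; `normalize_eager`, `normalize_lazy`, and `stream` are hypothetical names, and this is not the `QuestionAnsweringArgumentHandler` code.

```python
consumed = []  # records which items have been pulled from the source


def stream():
    """Hypothetical lazily produced dataset of question/context pairs."""
    for i in range(1000):
        consumed.append(i)
        yield {"question": f"q{i}", "context": "c"}


def normalize_eager(examples):
    # Building a list forces the whole dataset to be read up front.
    return [dict(example) for example in examples]


def normalize_lazy(examples):
    # A generator yields one item at a time, so streaming is preserved.
    for example in examples:
        yield dict(example)


lazy = normalize_lazy(stream())
first = next(lazy)
print(first["question"], len(consumed))
```

Only one item has been read from `stream()` after the first `next`, whereas `normalize_eager(stream())` would read all 1000 before returning anything.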
* Clean up hub (#18497)
* Clean up utils.hub
* Remove imports
* More fixes
* Last fix
* update fsdp docs (#18521)
* updating fsdp documentation
* typo fix
* Fix compatibility with 1.12 (#17925)
* Fix compatibility with 1.12
* Remove pin from examples requirements
* Update torch scatter version
* Fix compatibility with 1.12
* Remove pin from examples requirements
* Update torch scatter version
* fix torch.onnx.symbolic_opset12 import
* Reject bad version
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Remove debug statement
* Specify en in doc-builder README example (#18526)
Co-authored-by: Ankur Goyal <ankur@impira.com>
* New cache fixes: add safeguard before looking in folders (#18522)
* unpin resampy (#18527)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* ✨ update to use interlibrary links instead of Markdown (#18500)
* Add example of multimodal usage to pipeline tutorial (#18498)
* 📝 add example of multimodal usage to pipeline tutorial
* 🖍 apply feedbacks
* 🖍 apply niels feedback
* [VideoMAE] Add model to doc tests (#18523)
* Add videomae to doc tests
* Add pip install decord
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* Update perf_train_gpu_one.mdx (#18532)
* Update no_trainer.py scripts to include accelerate gradient accumulation wrapper (#18473)
* Added accelerate gradient accumulation wrapper to run_image_classification_no_trainer.py example script
* make fixup changes
* PR comments
* changed input to Accelerator based on PR comment, ran make fixup
* Added comment explaining the sync_gradients statement
* Fixed lr scheduler max steps
* Changed run_clm_no_trainer.py script to use accelerate gradient accum wrapper
* Fixed all scripts except wav2vec2 pretraining to use accelerate gradient accum wrapper
* Added accelerate gradient accum wrapper for wav2vec2_pretraining_no_trainer.py script
* make fixup and lr_scheduler step inserted back into run_qa_beam_search_no_trainer.py
* removed changes to run_wav2vec2_pretraining_no_trainer.py script and fixed using wrong constant in qa_beam_search_no_trainer.py script
* Add Spanish translation of converting_tensorflow_models.mdx (#18512)
* Add file in spanish docs to be translated
* Finish translation to Spanish
* Improve Spanish wording
* Add suggested changes from review
* Spanish translation of summarization.mdx (#15947) (#18477)
* Add Spanish translation of summarization.mdx
* Apply suggestions from code review
Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
* Let's not cast them all (#18471)
* add correct dtypes when checking for params dtype
* forward contrib credits
* Update src/transformers/modeling_utils.py
Co-authored-by: Thomas Wang <24695242+thomasw21@users.noreply.github.com>
* more comments
- added more comments on why we cast only floating point parameters
* Update src/transformers/modeling_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: sgugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Thomas Wang <24695242+thomasw21@users.noreply.github.com>
* fix: data2vec-vision Onnx ready-made configuration. (#18427)
* feat: add the data2vec conf that are missing https://huggingface.co/docs/transformers/serialization
* fix: wrong config
* Add mt5 onnx config (#18394)
* update features
* MT5OnnxConfig added with updated with tests and docs
* fix imports
* fix onnc_config_cls for mt5
Co-authored-by: Thomas Chaigneau <thomas.deeptools.ai>
* Minor update of `run_call_with_unpacked_inputs` (#18541)
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* BART - Fix attention mask device issue on copied models (#18540)
* attempt to fix attn mask device
* fix bart `_prepare_decoder_attention_mask`
- add correct device
- run `make fix-copies` to propagate the fix
* Adding a new `align_to_words` param to qa pipeline. (#18010)
* Adding a new `align_to_words` param to qa pipeline.
* Update src/transformers/pipelines/question_answering.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Import protection.
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* 📝 update metric with evaluate (#18535)
* Restore _init_weights value in no_init_weights (#18504)
* Recover _init_weights value in no_init_weights
For potential nested use.
In addition, users might modify private no_init_weights as well.
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Remove private variable change check
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Clean up comment
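Restoring the previous flag value on exit, rather than resetting it to a fixed value, is what makes a context manager safe to nest. The following is a minimal sketch of that pattern with a module-level flag; it mirrors the names `_init_weights` and `no_init_weights` from the commit subject but is not the library's implementation, and `get_flag` is a hypothetical helper.

```python
from contextlib import contextmanager

_init_weights = True  # module-level flag, as in the pattern described above


def get_flag():
    return _init_weights


@contextmanager
def no_init_weights():
    """Temporarily disable the flag, then restore whatever value it had before."""
    global _init_weights
    old_value = _init_weights
    _init_weights = False
    try:
        yield
    finally:
        # Restoring old_value (not True) keeps an outer context in effect.
        _init_weights = old_value


with no_init_weights():
    with no_init_weights():
        pass
    print(get_flag())  # still disabled inside the outer context
print(get_flag())
```

If the `finally` block set the flag to `True` unconditionally, leaving the inner context would re-enable initialization while the outer one was still active.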
* 📝 update documentation build section (#18548)
* `bitsandbytes` - `Linear8bitLt` integration into `transformers` models (#17901)
* first commit
* correct replace function
* add final changes
- works like a charm!
- cannot implement tests yet
- tested
* clean up a bit
* add bitsandbytes dependencies
* working version
- added import function
- added bitsandbytes utils file
* small fix
* small fix
- fix import issue
* fix import issues
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* refactor a bit
- move bitsandbytes utils to utils
- change comments on functions
* reformat docstring
- reformat docstring on init_empty_weights_8bit
* Update src/transformers/__init__.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* revert bad formatting
* change to bitsandbytes
* refactor a bit
- remove init8bit since it is useless
* more refactoring
- fixed init empty weights issue
- added threshold param
* small hack to make it work
* Update src/transformers/modeling_utils.py
* Update src/transformers/modeling_utils.py
* remove the small hack
* modify utils file
* make style + refactor a bit
* create correctly device map
* add correct dtype for device map creation
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* apply suggestions
- remove with torch.grad
- do not rely on Python bool magic!
* add docstring
- add docstring for new kwargs
* add docstring
- comment `replace_8bit_linear` function
- fix weird formatting
* added more documentation
- added new utility function for memory footprint tracking
- colab demo to add
* few modifs
- typo doc
- force cast into float16 when load_in_8bit is enabled
* added colab link
* add test architecture + docstring a bit
* refactor a bit testing class
* make style + refactor a bit
* enhance checks
- add more checks
- start writing saving test
* clean up a bit
* make style
* add more details on doc
* add more tests
- still needs to fix 2 tests
* replace by "or"
- could not fix it from GitHub GUI
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* refactor a bit testing code + add readme
* make style
* fix import issue
* Update src/transformers/modeling_utils.py
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
* add few comments
* add more docstring + make style
* more docstring
* raise error when loaded in 8bit
* make style
* add warning if loaded on CPU
* add small sanity check
* fix small comment
* add bitsandbytes on dockerfile
* Improve documentation
- improve documentation from comments
* add few comments
* slow tests pass on the VM but not on the CI VM
* Fix merge conflict
* make style
* another test should pass on a multi gpu setup
* fix bad import in testing file
* Fix slow tests
- remove dummy batches
- no more CUDA illegal memory errors
* modify dockerfile
* Update docs/source/en/main_classes/model.mdx
* Update Dockerfile
* Update model.mdx
* Update Dockerfile
* Apply suggestions from code review
* few modifications
- lm head can stay on disk/cpu
- change model name so that test pass
* change test value
- change test value to the correct output
- torch bmm changed to baddmm in bloom modeling when merging
* modify installation guidelines
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* replace `n` by `name`
* merge `load_in_8bit` and `low_cpu_mem_usage`
* first try - keep the lm head in full precision
* better check
- check the attribute `base_model_prefix` instead of computing the number of parameters
* added more tests
* Update src/transformers/utils/bitsandbytes.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Merge branch 'integration-8bit' of https://github.com/younesbelkada/transformers into integration-8bit
* improve documentation
- fix typos for installation
- change title in the documentation
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
* TF: XLA-trainable DeBERTa v2 (#18546)
* fix deberta issues
* add different code paths for gpu and tpu
* shorter gpu take along axis
* Stable Dropout without tf cond
* variable must be float
* Preserve hub-related kwargs in AutoModel.from_pretrained (#18545)
* Preserve hub-related kwargs in AutoModel.from_pretrained
* Fix tests
* Remove debug statement
* TF Examples Rewrite (#18451)
* Finished QA example
* Dodge a merge conflict
* Update text classification and LM examples
* Update NER example
* New Keras metrics WIP, fix NER example
* Update NER example
* Update MC, summarization and translation examples
* Add XLA warnings when shapes are variable
* Make sure batch_size is consistently scaled by num_replicas
* Add PushToHubCallback to all models
* Add docs links for KerasMetricCallback
* Add docs links for prepare_tf_dataset and jit_compile
* Correct inferred model names
* Don't assume the dataset has 'lang'
* Don't assume the dataset has 'lang'
* Write metrics in text classification
* Add 'framework' to TrainingArguments and TFTrainingArguments
* Export metrics in all examples and add tests
* Fix training args for Flax
* Update command line args for translation test
* make fixup
* Fix accidentally running other tests in fp16
* Remove do_train/do_eval from run_clm.py
* Remove do_train/do_eval from run_mlm.py
* Add tensorflow tests to circleci
* Fix circleci
* Update examples/tensorflow/language-modeling/run_mlm.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update examples/tensorflow/test_tensorflow_examples.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update examples/tensorflow/translation/run_translation.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update examples/tensorflow/token-classification/run_ner.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Fix save path for tests
* Fix some model card kwargs
* Explain the magical -1000
* Actually enable tests this time
* Skip text classification PR until we fix shape inference
* make fixup
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Use commit hash to look in cache instead of calling head (#18534)
* Use commit hash to look in cache instead of calling head
* Add tests
* Add attr for local configs too
* Stupid typos
* Fix tests
* Update src/transformers/utils/hub.py
Co-authored-by: Julien Chaumond <julien@huggingface.co>
* Address Julien's comments
Co-authored-by: Julien Chaumond <julien@huggingface.co>
* `pipeline` support for `device="mps"` (or any other string) (#18494)
* `pipeline` support for `device="mps"` (or any other string)
* Simplify `if` nesting
* Update src/transformers/pipelines/base.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix? @sgugger
* passing `attr=None` is not the same as not passing `attr` 🤯
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update philosophy to include other preprocessing classes (#18550)
* 📝 update philosophy to include other preprocessing classes
* 🖍 apply feedback
* Properly move cache when it is not in default path (#18563)
* Adds CLIP to models exportable with ONNX (#18515)
* onnx config for clip
* default opset as 14
* changes from the original repo
* input values order fix
* outputs fix
* remove unused import
* ran make fix-copies
* black format
* review comments: forward ref, import fix, model change revert, .to cleanup
* make style
* formatting fixes
* revert groupvit
* comment for cast to int32
* comment fix
* make .T as .t() for onnx conversion
* ran make fix-copies
* remove unneeded comment
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix copies
* remove comment
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* raise atol for MT5OnnxConfig (#18560)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* fix string (#18568)
* Segformer TF: fix output size in documentation (#18572)
* Segformer TF: fix output size in doc
* Segformer pytorch: fix output size in doc
Co-authored-by: Maxime Gardoni <maxime.gardoni@ecorobotix.com>
* Fix resizing bug in OWL-ViT (#18573)
* Fixes resizing bug in OWL-ViT
* Defaults to square resize if size is set to an int
* Sets do_center_crop default value to False
* Fix LayoutLMv3 documentation (#17932)
* fix typos
* fix sequence_length docs of LayoutLMv3Model
* delete trailing white spaces
* fix layoutlmv3 docs more
* apply make fixup & quality
* change to two versions of input docstring
* apply make fixup & quality
* Skip broken tests
* Change BartLearnedPositionalEmbedding's forward method signature to support Opacus training (#18486)
* changing BartLearnedPositionalEmbedding forward signature and references to it
* removing debugging dead code (thanks style checker)
* blackened modeling_bart file
* removing copy inconsistencies via make fix-copies
* changing references to copied signatures in Bart variants
* make fix-copies once more
* using expand over repeat (thanks @michaelbenayoun)
* expand instead of repeat for all model copies
Co-authored-by: Daniel Jones <jonesdaniel@microsoft.com>
* german docs translation (#18544)
* Create _config.py
* Create _toctree.yml
* Create index.mdx
not sure about "du / ihr" or "sie"
* Create quicktour.mdx
* Update _toctree.yml
* Update build_documentation.yml
* Update build_pr_documentation.yml
* fix build
* Update index.mdx
* Update quicktour.mdx
* Create installation.mdx
* Update _toctree.yml
* Deberta V2: Fix critical trace warnings to allow ONNX export (#18272)
* Fix critical trace warnings to allow ONNX export
* Force input to `sqrt` to be float type
* Cleanup code
* Remove unused import statement
* Update model sew
* Small refactor
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
* Use broadcasting instead of repeat
* Implement suggestion
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
* Match deberta v2 changes in sew_d
* Improve code quality
* Update code quality
* Consistency of small refactor
* Match changes in sew_d
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
* [FX] _generate_dummy_input supports audio-classification models for labels (#18580)
* Support audio classification architectures for labels generation, and provide a flag to print warnings or not
* Use ENV_VARS_TRUE_VALUES
* Fix docstrings with last version of hf-doc-builder styler (#18581)
* Fix docstrings with last version of hf-doc-builder styler
* Remove empty Parameter block
* Bump nbconvert from 6.0.1 to 6.3.0 in /examples/research_projects/lxmert (#18565)
Bumps [nbconvert](https://github.com/jupyter/nbconvert) from 6.0.1 to 6.3.0.
- [Release notes](https://github.com/jupyter/nbconvert/releases)
- [Commits](https://github.com/jupyter/nbconvert/compare/6.0.1...6.3.0)
---
updated-dependencies:
- dependency-name: nbconvert
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump nbconvert in /examples/research_projects/visual_bert (#18566)
Bumps [nbconvert](https://github.com/jupyter/nbconvert) from 6.0.1 to 6.3.0.
- [Release notes](https://github.com/jupyter/nbconvert/releases)
- [Commits](https://github.com/jupyter/nbconvert/compare/6.0.1...6.3.0)
---
updated-dependencies:
- dependency-name: nbconvert
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* fix owlvit tests, update docstring examples (#18586)
* Return the permuted hidden states if return_dict=True (#18578)
* Load sharded pt to flax (#18419)
* initial commit
* add small test
* add cross pt tf flag to test
* fix quality
* style
* update test with new repo
* fix failing test
* update
* fix wrong param ordering
* style
* update based on review
* update related to recent new caching mechanism
* quality
* Update based on review
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
* quality and style
* Update src/transformers/modeling_flax_utils.py
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add type hints for ViLT models (#18577)
* Add type hints for Vilt models
* Add missing return type for TokenClassification class
* update doc for perf_train_cpu_many, add intel mpi introduction (#18576)
* update doc for perf_train_cpu_many, add mpi introduction
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* Update docs/source/en/perf_train_cpu_many.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update docs/source/en/perf_train_cpu_many.mdx
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* typos (#18594)
* FSDP bug fix for `load_state_dict` (#18596)
* Add `TFAutoModelForSemanticSegmentation` to the main `__init__.py` (#18600)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Generate: validate `model_kwargs` (and catch typos in generate arguments) (#18261)
* validate generate model_kwargs
* generate tests -- not all models have an attn mask
* Supporting seq2seq models for `bitsandbytes` integration (#18579)
* Supporting seq2seq models for `bitsandbytes` integration
- `bitsandbytes` integration now supports seq2seq models
- check if a model has tied weights as an additional check
* small modification
- tie the weights before looking at tied weights!
* Add Donut (#18488)
* First draft
* Improve script
* Update script
* Make conversion work
* Add final_layer_norm attribute to Swin's config
* Add DonutProcessor
* Convert more models
* Improve feature extractor and convert base models
* Fix bug
* Improve integration tests
* Improve integration tests and add model to README
* Add doc test
* Add feature extractor to docs
* Fix integration tests
* Remove register_buffer
* Fix toctree and add missing attribute
* Add DonutSwin
* Make conversion script work
* Improve conversion script
* Address comment
* Fix bug
* Fix another bug
* Remove deprecated method from docs
* Make Swin and Swinv2 untouched
* Fix code examples
* Fix processor
* Update model_type to donut-swin
* Add feature extractor tests, add token2json method, improve feature extractor
* Fix failing tests, remove integration test
* Add do_thumbnail for consistency
* Improve code examples
* Add code example for document parsing
* Add DonutSwin to MODEL_NAMES_MAPPING
* Add model to appropriate place in toctree
* Update namespace to appropriate organization
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* Fix URLs (#18604)
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* Update BLOOM parameter counts (#18531)
* Update BLOOM parameter counts
* Update BLOOM parameter counts
* [doc] fix anchors (#18591)
the manual anchors end up being duplicated with automatically added anchors and no longer work.
* [fsmt] deal with -100 indices in decoder ids (#18592)
* [fsmt] deal with -100 indices in decoder ids
Fixes: https://github.com/huggingface/transformers/issues/17945
Decoder ids get the default index -100, which breaks the model. As in t5 and many other models, add a fix that replaces -100 with the correct pad index.
For some reason this use case hadn't been exercised with this model until recently, so it seems the issue has been there since the beginning.
Any suggestions to how to add a simple test here? or perhaps we have something similar already? user's script is quite massive.
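The replacement described above can be illustrated with a minimal sketch. It is not the code from this PR: `replace_ignore_index` is a hypothetical helper, plain nested lists stand in for tensors, and the pad id of 1 is an arbitrary value chosen for the example.

```python
IGNORE_INDEX = -100  # label value that loss functions are told to ignore


def replace_ignore_index(decoder_ids, pad_token_id):
    """Return a copy of `decoder_ids` with every -100 swapped for the pad id.

    Hypothetical helper for illustration; the model code works on tensors.
    """
    return [
        [pad_token_id if tok == IGNORE_INDEX else tok for tok in row]
        for row in decoder_ids
    ]


# Two label rows padded with -100, as they might arrive from a data collator.
batch = [[5, 8, 2, -100, -100], [7, 2, -100, -100, -100]]
cleaned = replace_ignore_index(batch, pad_token_id=1)
print(cleaned)
```

A check of this shape (feed ids containing -100, assert that none remain) may also be one way to approach the simple test asked about above.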
* style
* small change (#18584)
* Flax Remat for LongT5 (#17994)
* [Flax] Add remat (gradient checkpointing)
* fix variable naming in test
* flip: checkpoint using a method
* fix naming
* fix class naming
* apply PVP's suggestions from code review
* add gradient_checkpointing to examples
* Add gradient_checkpointing to run_mlm_flax
* Add remat to longt5
* Add gradient checkpointing test longt5
* Fix args errors
* Fix remaining tests
* Make fixup & quality fixes
* replace kwargs
* remove unnecessary kwargs
* Make fixup changes
* revert long_t5_flax changes
* Remove return_dict and copy to LongT5
* Remove test_gradient_checkpointing
Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
* mac m1 `mps` integration (#18598)
* mac m1 `mps` integration
* Update docs/source/en/main_classes/trainer.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* addressing comments
* Apply suggestions from code review
Co-authored-by: Dan Saattrup Nielsen <47701536+saattrupdan@users.noreply.github.com>
* resolve comment
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Dan Saattrup Nielsen <47701536+saattrupdan@users.noreply.github.com>
* Change scheduled CIs to use torch 1.12.1 (#18644)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Add checks for some workflow jobs (#18583)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* TF: Fix generation repetition penalty with XLA (#18648)
* Update longt5.mdx (#18634)
* Update run_translation_no_trainer.py (#18637)
* Update run_translation_no_trainer.py
found an error in selecting `no_decay` parameters and some small modifications when the user continues to train from a checkpoint
* fix `no_decay` and `resume_step` issue
1. change `no_decay` list
2. if users continue training their model from a provided checkpoint, `resume_step` will not be initialized properly when `args.gradient_accumulation_steps != 1`
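The second point can be sketched as follows. This is an assumption about the underlying arithmetic, not the script's actual code: it supposes the checkpoint's step counter counts optimizer updates, so the number of dataloader batches already consumed is that counter times the accumulation factor. `resume_position` and its parameter names are hypothetical.

```python
def resume_position(completed_updates, gradient_accumulation_steps, batches_per_epoch):
    """Translate a saved optimizer-update count into (epoch, batches to skip).

    Assumes one optimizer update consumes `gradient_accumulation_steps` batches.
    """
    batches_seen = completed_updates * gradient_accumulation_steps
    starting_epoch = batches_seen // batches_per_epoch
    batches_to_skip = batches_seen % batches_per_epoch
    return starting_epoch, batches_to_skip


# With accumulation over 4 batches, 10 updates mean 40 batches were consumed.
print(resume_position(10, 4, 25))
# Without accumulation the update count and the batch count coincide.
print(resume_position(10, 1, 25))
```

Ignoring the accumulation factor would make training resume from an earlier batch than the one actually reached, which is the kind of mismatch the note above describes.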
* [bnb] Minor modifications (#18631)
* bnb minor modifications
- refactor documentation
- add troubleshooting README
- add PyPi library on DockerFile
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Apply suggestions from code review
* Apply suggestions from code review
* Apply suggestions from code review
* put in one block
- put bash instructions in one block
* update readme
- refactor a bit hardware requirements
* change text a bit
* Apply suggestions from code review
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* apply suggestions
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* add link to paper
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Update tests/mixed_int8/README.md
* Apply suggestions from code review
* refactor a bit
* add instructions Turing & Ampere
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* add A6000
* clarify a bit
* remove small part
* Update tests/mixed_int8/README.md
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* Examples: add Bloom support for token classification (#18632)
* examples: add Bloom support for token classification (FLAX, PyTorch and TensorFlow)
* examples: remove support for Bloom in token classification (FLAX and TensorFlow currently have no support for it)
* Fix Yolos ONNX export test (#18606)
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Fixup
* Fix up
* Move PIL default arguments inside function for safe imports
* Add image utils to toctree
* Update `rescale` method to reflect changes in #18677
* Update docs/source/en/internal/image_processing_utils.mdx
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Address Niels PR comments
* Apply suggestions from code review - remove defaults to None
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix docstrings and revert to PIL.Image.XXX resampling
Use PIL.Image.XXX resampling values instead of the PIL.Image.Resampling.XXX enum, as the enum only exists in recent versions (>= 9.1.0), the version is not yet pinned, and support for older versions is not yet deprecated
* Some more docstrings and PIL.Image tidy up
* Reorganise arguments so flags by modifiers
* Few last docstring fixes
Signed-off-by: Seunghwan Hong <seunghwan@scatterlab.co.kr>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: Amy Roberts <amyeroberts@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Seunghwan Hong <harrydrippin@gmail.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
Co-authored-by: Ankur Goyal <ankrgyl@gmail.com>
Co-authored-by: Ankur Goyal <ankur@impira.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
Co-authored-by: Mishig Davaadorj <dmishig@gmail.com>
Co-authored-by: Rasmus Arpe Fogh Jensen <Rasmus.arpe@gmail.com>
Co-authored-by: Ian Castillo <7807897+donelianc@users.noreply.github.com>
Co-authored-by: AguilaCudicio <aguila.cudicio@gmail.com>
Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Thomas Wang <24695242+thomasw21@users.noreply.github.com>
Co-authored-by: Niklas Hansson <niklas.sven.hansson@gmail.com>
Co-authored-by: Thomas Chaigneau <t.chaigneau.tc@gmail.com>
Co-authored-by: YouJiacheng <1503679330@qq.com>
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Dhruv Karan <k4r4n.dhruv@gmail.com>
Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>
Co-authored-by: Maxime G <joihn@users.noreply.github.com>
Co-authored-by: Maxime Gardoni <maxime.gardoni@ecorobotix.com>
Co-authored-by: Wonseok Lee (Jack) <rollerkid02@snu.ac.kr>
Co-authored-by: Dan Jones <dan.j.jones2@gmail.com>
Co-authored-by: Daniel Jones <jonesdaniel@microsoft.com>
Co-authored-by: flozi00 <flozi00.fz@gmail.com>
Co-authored-by: iiLaurens <iiLaurens@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Wang, Yi <yi.a.wang@intel.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
Co-authored-by: Karim Foda <35491698+KMFODA@users.noreply.github.com>
Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
Co-authored-by: Dan Saattrup Nielsen <47701536+saattrupdan@users.noreply.github.com>
Co-authored-by: zhoutang776 <47708118+zhoutang776@users.noreply.github.com>
Co-authored-by: Stefan Schweter <stefan@schweter.it>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update methods to optionally rescale
This is necessary to allow for casting our images / videos to numpy arrays within the feature extractors' call. We want to do this to make sure the behaviour is as expected when certain flags are False. If some transformations aren't applied, the output type could otherwise be unexpected, e.g. a list of PIL images instead of numpy arrays.
* Cast images to numpy arrays in call to enable consistent behaviour with different configs
* Remove accidental clip changes
* Update tests to reflect the scaling logic
We write a generic function to handle rescaling of our arrays. In order for the API to be intuitive, we take some factor c and rescale the image values by that. This means the rescaling done in normalize and to_numpy_array is now done with array * (1/255) instead of array / 255. This leads to small differences in the resulting image. When testing, these were on the order of 1e-8, and so deemed OK.
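The effect described above can be reproduced with a minimal sketch. The `rescale` helper below is hypothetical and is not the library function; plain Python floats (double precision) stand in for the image arrays, so the gap it prints is expected to be smaller than the figure reported for the real arrays.

```python
def rescale(values, scale):
    """Multiply every pixel value by `scale` (hypothetical helper for illustration)."""
    return [v * scale for v in values]


pixels = list(range(256))  # every possible 8-bit pixel value

by_division = [v / 255 for v in pixels]  # old style: array / 255
by_factor = rescale(pixels, 1 / 255)     # new style: array * (1/255)

# 1/255 is rounded once before the multiplication, so the two results agree
# up to floating-point rounding rather than necessarily bit for bit.
max_diff = max(abs(a - b) for a, b in zip(by_division, by_factor))
print(f"largest difference: {max_diff:.3e}")
```

Tests that compare against values produced by division therefore need a tolerance rather than exact equality.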
* Add serving_output and serving methods to some vision models
* Add serving outputs for DeiT
* Don't convert hidden states - differing shapes
* Make saveable
* Fix up
* Make swin saveable
* Add in tests
* Fix funnel tests (can't convert to tensor)
* Fix numpy call
* Tidy up a bit
* Add in hidden states - resnet
* Remove numpy
* Fix failing tests - tensor shape and skipping tests
* Remove duplicated function
* PR comments - formatting and var names
* PR comments
Add suggestions made by Joao Gante:
* Use tf.shape instead of shape_list
* Use @tooslow decorator on tests
* Simplify some of the logic
* PR comments
Address Yih-Dar Shieh's comments - making tensor names consistent and making types float
* Types consistent with docs; disable test on swin (slow)
* CI trigger
* Change input_features to float32
* Add serving_output for segformer
* Fixup
Co-authored-by: Amy Roberts <amyeroberts@users.noreply.github.com>
* Copy inputs to train and test step before modifying them, as this breaks things
* Add XLA tests, fix our loss functions to be XLA-compatible
* make fixup
* Update loss computation test to expect vector of per-sample losses
* Patch loss for TFLED
* Patch loss for TFAlbert
* Add a tf_legacy_loss config flag that enables old loss functions
* Stop using config.get() because it's not a dict
* Skip loss computation test for RAG because its loss is very strange and I'm afraid to rewrite it
* make fixup
* Add XLA-compatible RAG loss
* Fix dtype of loss mask for TFAlbert
* Fix test for XLNet too because it overrides the default one
* make fixup
* Fix config test
* No more depending on GPU NaN behaviour
* Add test, avoid potential zero division
* Fix test item assignment
* Fix loss computation masking test
* make fixup
* Fix dtype bugs
* Raise RepoNotFoundError in case of 401
* Include changes from revert-17646-skip_repo_not_found
* Add a comment
* 💄 Code quality
* 💚 Update `get_from_cache` test
* 💚 Code quality & skip failing test
* add support for MLFLOW_FLATTEN_PARAMS
* ensure key is str
* fix style and update warning msg
* Empty commit to trigger CI
* fix bug in check_inits.py
* add unittest for flatten_dict utils
* fix 'NoneType' object is not callable on __del__
* add generic flatten_dict unittest to SPECIAL_MODULE_TO_TEST_MAP
* fix style
* update proto sentencepiece model
* Revert "update proto sentencepiece model"
This reverts commit b07f671747.
* add check
* add test
* Revert "Revert "update proto sentencepiece model""
This reverts commit 46108257b8.
* test for log level
* test for log level 2
* warning at the warning level
* clean
* format
* add explanation in docstring
* Add utility to find model labels
* Use it in the Trainer
* Update src/transformers/utils/generic.py
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Quality
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Make Transformers use cache files when hf.co is down
* Fix tests
* Was there a random circleCI failure?
* Isolate patches
* Style
* Comment out the failure since it doesn't fail anymore
* Better comment
* Split file_utils in several submodules
* Fixes
* Add back more objects
* More fixes
* Who exactly decided to import that from there?
* Second suggestion to code with code review
* Revert wrong move
* Fix imports
* Adapt all imports
* Adapt all imports everywhere
* Revert this import, will fix in a separate commit
* Updates the default branch from master to main
* Links from `master` to `main`
* Typo
* Update examples/flax/README.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>