* added test
* correct embedding init
* some changes in blenderbot (incomplete)
* update blenderbot (diff to be used as reference)
* update blenderbot_small
* update LED
* update marian
* update T5 and remove TFWrappedEmbeddings
* nullcontext() -> ContextManagers()
* fix embedding init
* Override save() to use the serving signature as the default
* Replace int32 with int64 in all our serving signatures
* Remember one very important line so as not to break every test at once
* Dtype fix for TFLED
* dtype fix for shift_tokens_right in general
* Dtype fixes in mBART and RAG
* Fix dtypes for test_unpack_inputs
* More dtype fixes
* Yet more mBART + RAG dtype fixes
* Yet more mBART + RAG dtype fixes
* Add a check that the model actually has a serving method
Comparisons like
version.parse(torch.__version__) > version.parse("1.6")
are True for torch==1.6.0+cu101 or torch==1.6.0+cpu
version.parse(version.parse(torch.__version__).base_version) are preferred (and available in pytorch_utils.py
* [Flax] Add remat (gradient checkpointing)
* fix variable naming in test
* flip: checkpoint using a method
* fix naming
* fix class naming
* apply PVP's suggestions from code review
* make fix-copies
* fix big-bird, electra, roberta
* cookie-cutter
* fix flax big-bird
* move test to common
* Use torch.finfo(self.dtype).min
* for GPTNeoX
* for Albert
* For Splinter
* Update src/transformers/models/data2vec/modeling_data2vec_audio.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix -inf used in Bart-like models
* Fix a few remaining -inf
* more fix
* clean up
* For CLIP
* For FSMT
* clean up
* fix test
* Add dtype argument and use it for LayoutLMv3
* update FlaxLongT5Attention
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* begin do_init
* add params_shape_tree
* raise error if params are accessed when do_init is False
* don't allow do_init=False when keys are missing
* make shape tree a property
* assign self._params at the end
* add test for do_init
* add do_init arg to all flax models
* fix param setting
* disbale do_init for composite models
* update test
* add do_init in FlaxBigBirdForMultipleChoice
* better names and errors
* improve test
* style
* add a warning when do_init=False
* remove extra if
* set params after _required_params
* add test for from_pretrained
* do_init => _do_init
* chage warning to info
* fix typo
* add params in init_weights
* add params to gpt neo init
* add params to init_weights
* update do_init test
* Trigger CI
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* update template
* trigger CI
* style
* style
* fix template
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Adding new train_step logic to make things less confusing for users
* DO NOT ASK WHY WE NEED THAT SUBCLASS
* Metrics now working, at least for single-output models with type annotations!
* Updates and TODOs for the new train_step
* Make fixup
* Temporary test workaround until T5 has types
* Temporary test workaround until T5 has types
* I think this actually works! Needs a lot of tests though
* MAke style/quality
* Revert changes to T5 tests
* Deleting the aforementioned unmentionable subclass
* Deleting the aforementioned unmentionable subclass
* Adding a Keras API test
* Style fixes
* Removing unneeded TODO and comments
* Update test_step too
* Stop trying to compute metrics with the dummy_loss, patch up test
* Make style
* make fixup
* Docstring cleanup
* make fixup
* make fixup
* Stop expanding 1D input tensors when using dummy loss
* Adjust T5 test given the new compile()
* make fixup
* Skipping test for convnext
* Removing old T5-specific Keras test now that we have a common one
* make fixup
* make fixup
* Only skip convnext test on CPU
* Update src/transformers/modeling_tf_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/modeling_tf_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Avoiding TF import issues
* make fixup
* Update compile() to support TF 2.3
* Skipping model.fit() on template classes for now
* Skipping model.fit() on template class tests for now
* Replace ad-hoc solution with find_labels
* make fixup
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Split file_utils in several submodules
* Fixes
* Add back more objects
* More fixes
* Who exactly decided to import that from there?
* Second suggestion to code with code review
* Revert wront move
* Fix imports
* Adapt all imports
* Adapt all imports everywhere
* Revert this import, will fix in a separate commit
* Updates the default branch from master to main
* Links from `master` to `main`
* Typo
* Update examples/flax/README.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* undo black autoformat
* minor fix to rembert forward with default
* make fix-copies, make quality
* Adding types to template model
* Removing List from the template types
* Remove `Optional` from a couple of types that don't accept `None`
Co-authored-by: matt <rocketknight1@gmail.com>
* added type hints for BART model
* make fixup, adding imports to copied files
* Adding some missing types to cookiecutter
* Adding some missing types to cookiecutter
* Adding some missing types to cookiecutter
Co-authored-by: matt <rocketknight1@gmail.com>
* Do not change the output from tuple to list - to match PT's version
* Fix the same issues for 5 other models and the template
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* fix wrong method name tf.concatenate
* add tests related to causal LM / decoder
* make style and quality
* clean-up
* Fix TFBertModel's extended_attention_mask when past_key_values is provided
* Fix tests
* fix copies
* More tf.int8 -> tf.int32 in TF test template
* clean-up
* Update TF test template
* revert the previous commit + update the TF test template
* Fix TF template extended_attention_mask when past_key_values is provided
* Fix some styles manually
* clean-up
* Fix ValueError: too many values to unpack in the test
* Fix more: too many values to unpack in the test
* Add a comment for extended_attention_mask when there is past_key_values
* Fix TFElectra extended_attention_mask when past_key_values is provided
* Add tests to other TF models
* Fix for TF Electra test: add prepare_config_and_inputs_for_decoder
* Fix not passing training arg to lm_head in TFRobertaForCausalLM
* Fix tests (with past) for TF Roberta
* add testing for pask_key_values for TFElectra model
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* TF generate start refactor
* Add tf tests for sample generate
* re-organize
* boom boom
* Apply suggestions from code review
* re-add
* add all code
* make random greedy pass
* make encoder-decoder random work
* further improvements
* delete bogus file
* make gpt2 and t5 tests work
* finish logits tests
* correct logits processors
* correct past / encoder_outputs drama
* refactor some methods
* another fix
* refactor shape_list
* fix more shape list
* import shape
_list
* finish docs
* fix imports
* make style
* correct tf utils
* Fix TFRag as well
* Apply Lysandre's and Sylvais suggestions
* Update tests/test_generation_tf_logits_process.py
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Update src/transformers/tf_utils.py
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* remove cpu according to gante
* correct logit processor
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* use_cache = False for PT models if labels is passed
* Fix for BigBirdPegasusForConditionalGeneration
* add warning if users specify use_cache=True
* Use logger.warning instead of warnings.warn
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* add new test
* update test
* remove `tokenizer_file` from `additional_files_names` in `tokenization_utils_base.py`
* add `tokenizer_file` for the fast only tokenizer
* change global variables layoutxml
* remove `"tokenizer_file"` from DPR tokenizer's Global variables
* remove `tokenizer_file` from herbert slow tokenizer init
* `"tokenizer_file"` from LED tokenizer's Global variables
* remove `tokenizer_file` from mbart slow tokenizer init
* remove `tokenizer_file` from slow tokenizer template
* adapt to versioning
* adapt the `test_tokenizer_mismatch_warning` test
* clean test
* clarify `VOCAB_FILES_NAMES` in tokenization_utils_fast.py
* Revert "remove `tokenizer_file` from mbart slow tokenizer init"
This reverts commit 0dbb723fa9.
* Revert "`"tokenizer_file"` from LED tokenizer's Global variables"
This reverts commit 5a3f879bdd.
* Revert "remove `tokenizer_file` from herbert slow tokenizer init"
This reverts commit f5e10007b7.
* Revert "remove `"tokenizer_file"` from DPR tokenizer's Global variables"
This reverts commit da0895330b.
* set `tokenizer_file` in super `__init__` of mbart
* Fix loss calculation in TFFunnelForTokenClassification
* revert the change in TFFunnelForTokenClassification
* fix FunnelForTokenClassification loss
* fix other TokenClassification loss
* fix more
* fix more
* add num_labels to ElectraForTokenClassification
* revert the change to research projects
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Add missing __spec__ for transformers.models.auto
* Moves the __spec__-test to the UnitTest class
* Adds module_spec to all instances of _LazyModule
* Refactors an old test from pytest to unittest
* Rename compute_loss to hf_compute_loss to avoid conflicts with the new Keras method
* make style
* Adding deprecation warning to `compute_loss`
* Fix sneaky reference to compute_loss
* Replace logger.warning with warnings.warn
* Clarifying warning and deprecation timeline
* Convert docstrings of all configurations and tokenizers
* Processors and fixes
* Last modeling files and fixes to models
* Pipeline modules
* Utils files
* Data submodule
* All the other files
* Style
* Missing examples
* Style again
* Fix copies
* Say bye bye to rst docstrings forever
* Convert file_utils docstrings to Markdown
* Test on BERT
* Return block indent
* Temporarily disable doc styler
* Remove from quality checks as well
* Remove doc styler mess
* Remove check from circleCI
* Fix typo
* Convert file_utils docstrings to Markdown
* Test on BERT
* Return block indent
* Temporarily disable doc styler
* Remove from quality checks as well
* Remove doc styler mess
* Remove check from circleCI
* Fix typo
* Let's go on all other model files
* Add templates too
* Styling and quality
* Implement head_mask for Flax BERT and other models copied from BERT
* Remove `from jax._src.nn.functions import sigmoid`
Remove `from jax._src.nn.functions import sigmoid` unintentionally added by IDE
* Remove no more valid copy statement
* Apply patil-suraj's suggestions from code review
* Apply suggestions from the code review
* Update Flax template
* Fix a typo
* Also update template for CausalLM modules
* Add cross attentions to TFGPT2Model
* Add TFEncoderDecoderModel
* Add TFBaseModelOutputWithPoolingAndCrossAttentions
* Add cross attentions to TFBertModel
* Fix past or past_key_values argument issue
* Fix generation
* Fix save and load
* Add some checks and comments
* Clean the code that deals with past keys/values
* Add kwargs to processing_inputs
* Add serving_output to TFEncoderDecoderModel
* Some cleaning + fix use_cache value issue
* Fix tests + add bert2bert/bert2gpt2 tests
* Fix more tests
* Ignore crossattention.bias when loading GPT2 weights into TFGPT2
* Fix return_dict_in_generate in tf generation
* Fix is_token_logit_eos_token bug in tf generation
* Finalize the tests after fixing some bugs
* Fix another is_token_logit_eos_token bug in tf generation
* Add/Update docs
* Add TFBertEncoderDecoderModelTest
* Clean test script
* Add TFEncoderDecoderModel to the library
* Add cross attentions to TFRobertaModel
* Add TFRobertaEncoderDecoderModelTest
* make style
* Change the way of position_ids computation
* bug fix
* Fix copies in tf_albert
* Remove some copied from and apply some fix-copies
* Remove some copied
* Add cross attentions to some other TF models
* Remove encoder_hidden_states from TFLayoutLMModel.call for now
* Make style
* Fix TFRemBertForCausalLM
* Revert the change to longformer + Remove copies
* Revert the change to albert and convbert + Remove copies
* make quality
* make style
* Add TFRembertEncoderDecoderModelTest
* make quality and fix-copies
* test TFRobertaForCausalLM
* Fixes for failed tests
* Fixes for failed tests
* fix more tests
* Fixes for failed tests
* Fix Auto mapping order
* Fix TFRemBertEncoder return value
* fix tf_rembert
* Check copies are OK
* Fix missing TFBaseModelOutputWithPastAndCrossAttentions is not defined
* Add TFEncoderDecoderModelSaveLoadTests
* fix tf weight loading
* check the change of use_cache
* Revert the change
* Add missing test_for_causal_lm for TFRobertaModelTest
* Try cleaning past
* fix _reorder_cache
* Revert some files to original versions
* Keep as many copies as possible
* Apply suggested changes - Use raise ValueError instead of assert
* Move import to top
* Fix wrong require_torch
* Replace more assert by raise ValueError
* Add test_pt_tf_model_equivalence (the test won't pass for now)
* add test for loading/saving
* finish
* finish
* Remove test_pt_tf_model_equivalence
* Update tf modeling template
* Remove pooling, added in the prev. commit, from MainLayer
* Update tf modeling test template
* Move inputs["use_cache"] = False to modeling_tf_utils.py
* Fix torch.Tensor in the comment
* fix use_cache
* Fix missing use_cache in ElectraConfig
* Add a note to from_pretrained
* Fix style
* Change test_encoder_decoder_save_load_from_encoder_decoder_from_pt
* Fix TFMLP (in TFGPT2) activation issue
* Fix None past_key_values value in serving_output
* Don't call get_encoderdecoder_model in TFEncoderDecoderModelTest.test_configuration_tie until we have a TF checkpoint on Hub
* Apply review suggestions - style for cross_attns in serving_output
* Apply review suggestions - change assert + docstrings
* break the error message to respect the char limit
* deprecate the argument past
* fix docstring style
* Update the encoder-decoder rst file
* fix Unknown interpreted text role "method"
* fix typo
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
We use `torch.unique` here only to check whether every elements have
the same value.
Therefore, we can use `torch.unique_consecutive` here.
This function eliminates all but the first element from every consecutive
group of equivalent elements.
Like, if we apply this function to `[1, 2, 2, 1]`, it will result in
`[1, 2, 1]`.
As you could see, this is enough for checking whether every elements
have the same value.
Since `torch.unique_consecutive` do less thing, it is much more faster.
On my computer, it is 25x faster on GPU and 15x faster on CPU.
* Make gradient_checkpointing a training argument
* Update src/transformers/modeling_utils.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Update src/transformers/configuration_utils.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Fix tests
* Style
* document Gradient Checkpointing as a performance feature
* Small rename
* PoC for not using the config
* Adapt BC to new PoC
* Forgot to save
* Rollout changes to all other models
* Fix typo
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>
* refactor GPT Config to allow dyn. properties
* make attribute_map a class attribute
* remove old code
* update unit test to test config: Add test for common properties setter
* update unit test to test config: Add test for common properties passed as parameters to __init__
* update to black code format
* Allow that setters are not defined for certain config classes
* update config classes to implement attribute_map
* bugfix lxmert config - id2labels was not defined when num_labels was set
* update broken configs - add attribute_maps
* update bart config
* update black codestyle
* update documentation on common config attributes
* update GPTJ config to new attribute map
* update docs on common attributes
* gptj config: add max_position_embeddings
* gptj config: format with black
* update speech to text 2 config
* format doc file to max_len 119
* update config template
* Add option to add flax
* Add flax template for __init__.py
* Add flax template for .rst
* Copy TF modeling template
* Add a missing line in modeling_tf_... template
* Update first half of modeling_flax_..
* Update encoder flax template
* Copy test_modeling_tf... as test_modeling_flax...
* Replace some TF to Flax in test_modeling_flax_...
* Replace tf to np
some function might not work, like _assert_tensors_equal
* Replace remaining tf to np (might not work)
* Fix cookiecutter
* Add Flax in to_replace_... template
* Update transformers-cli add-new-model
* Save generate_flax in configuration.json
This will be read by transformers-cli
* Fix to_replace_... and cli
* Fix replace cli
* Fix cookiecutter name
* Move docstring earlier to avoid not defined error
* Fix a missing Module
* Add encoder-decoder flax template from bart
* Fix flax test
* Make style
* Fix endif
* Fix replace all "utf-8 -> unp-8"
* Update comment
* Fix flax template (add missing ..._DOCSTRING)
* Use flax_bart imports in template (was t5)
* Fix unp
* Update templates/adding_a_new_model/tests
* Revert "Fix unp"
This reverts commit dc9002a41d.
* Remove one line of copied from to suppress CI error
* Use generate_tensorflow_pytorch_and_flax
* Add a missing part
* fix typo
* fix flax config
* add examples for flax
* small rename
* correct modeling imports
* correct auto loading
* corrects some flax tests
* correct small typo
* correct as type
* finish modif
* correct more templates
* final fixes
* add file testers
* up
* make sure tests match template regex
* correct pytorch
* correct tf
* correct more tf
* correct imports
* minor error
* minor error
* correct init
* more fixes
* correct more flax tests
* correct flax test
* more fixes
* correct docs
* update
* fix
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Initial work
* All auto models
* All tf auto models
* All flax auto models
* Tokenizers
* Add feature extractors
* Fix typos
* Fix other typo
* Use the right config
* Remove old mapping names and update logic in AutoTokenizer
* Update check_table
* Fix copies and check_repo script
* Fix last test
* Add back name
* clean up
* Update template
* Update template
* Forgot a )
* Use alternative to fixup
* Fix TF model template
* Address review comments
* Address review comments
* Style
* registering a buffer for token_type_ids, to pass the error of device-id getting hardcoded when tracing
* sytle format
* adding persistent flag to the resgitered buffers that prevent from adding them to the state_dict and addresses the Backward compatibility issue
* adding the try catch to the fix as persistent flag is only available from PT >1.6
* adding version check
* added the condition to only use the token_type_ids buffer when its autogenerated not passed by user
* adding comments and making the conidtion where token_type_ids are None to use the registered buffer
* taking out position-embeddding from the if block
* adding comments
* handling the case if buffer for position_ids was not registered
* reverted the changes on position_ids, fix the issue with size of token_type_ids buffer, moved the modification for generated token_type_ids to Bertmodel, instead of Embeddings
* reverting the token_type_ids in case of None to the previous version
* reverting changes on position_ids adding back the if block
* changes added by running make fix-copies
* changes added by running make fix-copies and added the import version as it was getting used
* changes added by running make fix-copies
* changes added by running make fix-copies
* fixing the import format
* fixing the import format
* modified to use temp tensor for trimed and expanded token_type_ids buffer
* changes made by fix-copies after temp tensor modifications
* changes made by fix-copies after temp tensor modifications
* changes made by fix-copies after temp tensor modifications
* clean up
* clean up
* clean up
* clean up
* Nit
* Nit
* Nit
* modified according to support device conversion on traced models
* modified according to support device conversion on traced models
* modified according to support device conversion on traced models
* modified according to support device conversion on traced models
* changes based on latest in master
* Adapt templates
* Add version import
Co-authored-by: Ubuntu <ubuntu@ip-172-31-32-81.us-west-2.compute.internal>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* Fixing bug that appears when using distilation (and potentially other uses).
During backward pass Pytorch complains with:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
This happens because the QA model code modifies the start_positions and end_positions input tensors, using clamp_ function: as a consequence the teacher and the student both modifies the inputs, and backward pass fails.
* Fixing all models QA clamp_ bug.
* Add cross_attn_head_mask to BART
* Fix cross_attentions in TFBart-like models
* This commit enables returning of `cross_attentions`
for TFBart-like models
* It also fixes attention head masking in cross-attenion module
* Update TF model templates
* Fix missing , in TF model templates
* Fix typo: congig -> config
* Fix cross-attention head mask for Torch BART models
* Fix head masking for cross-attention module for the following
models: BART, Blenderbot, Blenderbot_small, M2M_100, Marian, MBart,
Pegasus
* Enable test_headmasking for M2M_100 model
* Fix cross_head_mask for FSMT, LED and T5
* This commit fixes `head_mask` for cross-attention modules
in the following models: FSMT, LED, T5
* It also contains some smaller changes in doc so that
it is be perfectly clear the shape of `cross_head_mask`
is the same as of `decoder_head_mask`
* Update template
* Fix template for BartForCausalLM
* Fix cross_head_mask for Speech2Text models
* Fix cross_head_mask in templates
* Fix args order in BartForCausalLM template
* Fix doc in BART templates
* Make more explicit naming
* `cross_head_mask` -> `cross_attn_head_mask`
* `cross_layer_head_mask` -> `cross_attn_layer_head_mask`
* Fix doc
* make style quality
* Fix speech2text docstring
* Assumption of padding_idx <2 might not stand
* Use offset instead of 2
* Fix with black
* Change behavior to warning instead for backward compatibility.
* Fix with black
* Remove warning
* Make padding_idx non-required
* padding_idx fix for blenderbot
* padding_idx fix for blenderbot_small
* padding_idx fix for led
* padding_idx fix for mbart
* Remove extra whitespaces
* padding_idx fix for template
* Fix padding_idx passed to nn.Embedding mistake
* Fixed padding_idx passed to positional embedding in template
* Remove padding_idx from pytorch learned positional embeddings
* Remove accidentally added quotes
* Remove padding_idx from tf learned positional embeddings
* Remove zeroing of weights in __init__
Co-authored-by: Wang Ming Rui <mingrui.wang@C02CJTUYMD6M.local>