* Add serving_output and serving methods to some vision models
* Add serving outputs for DeiT
* Don't convert hidden states - differing shapes
* Make saveable
* Fix up
* Make swin saveable
* Add in tests
* Fix funnel tests (can't convert to tensor)
* Fix numpy call
* Tidy up a bit
* Add in hidden states - resnet
* Remove numpy
* Fix failing tests - tensor shape and skipping tests
* Remove duplicated function
* PR comments - formatting and var names
* PR comments
Add suggestions made by Joao Gante:
* Use tf.shape instead of shape_list
* Use @tooslow decorator on tests
* Simplify some of the logic
* PR comments
Address Yih-Dar Shieh's comments - make tensor names consistent and make types float
* Types consistent with docs; disable test on swin (slow)
* CI trigger
* Change input_features to float32
* Add serving_output for segformer
* Fixup
Co-authored-by: Amy Roberts <amyeroberts@users.noreply.github.com>
* Return scalar losses instead of per-sample means
* Make loss shape (1,) instead of scalar
* Allow scalar losses in test_loss_computation
* Allow scalar losses in test_loss_computation
* Allow scalar losses in test_loss_computation
* Remove XLA loss function for RAG
* Copy inputs to train and test step before modifying them, as this breaks things
* Add XLA tests, fix our loss functions to be XLA-compatible
* make fixup
* Update loss computation test to expect vector of per-sample losses
* Patch loss for TFLED
* Patch loss for TFAlbert
* Add a tf_legacy_loss config flag that enables old loss functions
* Stop using config.get() because it's not a dict
* Skip loss computation test for RAG because its loss is very strange and I'm afraid to rewrite it
* make fixup
* Add XLA-compatible RAG loss
* Fix dtype of loss mask for TFAlbert
* Fix test for XLNet too because it overrides the default one
* make fixup
* Fix config test
* No more depending on GPU NaN behaviour
* Add test, avoid potential zero division
* Fix test item assignment
* Fix loss computation masking test
* make fixup
* Fix dtype bugs
* sharded conversion; add flag to control max hidden error
* better hidden name matching
* Add test: load TF from PT shards
* fix test (PT data must be local)
* Fix tests that broke when models used batchnorm
* Initializing the model twice does not actually...
...give you the same weights each time.
I am good at machine learning.
* Fix speed regression
* Prepare CI for v0.8.0
* pin hfh (revert before merge)
* Revert "pin hfh (revert before merge)"
This reverts commit a0103140e1.
* Test rc3
* Test latest rc
* Unpin to the RC
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
* Use shape_list to safely get shapes
* Add relevant test
* Tidy and add metrics
* Resolve dynamic shaping issues and move test
* Tidy up and all samples in batch
* Formatting
* Add method to call to_tf_dataset() with column inference
* Add test for dataset creation
* Add a default arg for data collator
* Fix test
* Fix call with non-dev version of datasets
* Test correct column removal too
* make fixup
* More tests to make sure we remove unwanted columns
* Fix test to avoid predicting on unbuilt models
* Fix test to avoid predicting on unbuilt models
* Fix test to remove unwanted head mask columns from inputs
* Stop pushing your debug breakpoints to the main repo of the $2bn company you work for
* Skip the test in convnext because no grouped conv support
* Drop bools from the dataset dict
* Make style
* Skip the training test for models whose input dicts don't give us labels
* Skip transformerXL in the test because it doesn't return a simple loss
* Skip TFTapas because of some odd NaN losses
* make style
* make fixup
* Add docstring
* fixup
* Update src/transformers/modeling_tf_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/modeling_tf_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/modeling_tf_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/modeling_tf_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/modeling_tf_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Remove breakpoint from tests
* Fix assert, add requires_backends
* Protect tokenizer import with if TYPE_CHECKING
* make fixup
* Add noqa, more fixup
* More rearranging for ~* aesthetics *~
* Adding defaults for shuffle and batch_size to match to_tf_dataset()
* Update src/transformers/modeling_tf_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Initial commit
* Better label renaming
* Remove breakpoint before pushing (this is your job)
* Test a lot more in the Keras fit() test
* make fixup
* Clarify the case where we flatten y dicts into tensors
* Clarify the case where we flatten y dicts into tensors
* Extract label name remapping to a method
* Add test to ensure models can take int64 inputs
* is_integer is an attribute, not a method
* Fix test when some inputs aren't tensors
* Add casts to blenderbot and blenderbot-small
* Add casts to the other failing models
* Adding new train_step logic to make things less confusing for users
* DO NOT ASK WHY WE NEED THAT SUBCLASS
* Metrics now working, at least for single-output models with type annotations!
* Updates and TODOs for the new train_step
* Make fixup
* Temporary test workaround until T5 has types
* Temporary test workaround until T5 has types
* I think this actually works! Needs a lot of tests though
* Make style/quality
* Revert changes to T5 tests
* Deleting the aforementioned unmentionable subclass
* Deleting the aforementioned unmentionable subclass
* Adding a Keras API test
* Style fixes
* Removing unneeded TODO and comments
* Update test_step too
* Stop trying to compute metrics with the dummy_loss, patch up test
* Make style
* make fixup
* Docstring cleanup
* make fixup
* make fixup
* Stop expanding 1D input tensors when using dummy loss
* Adjust T5 test given the new compile()
* make fixup
* Skipping test for convnext
* Removing old T5-specific Keras test now that we have a common one
* make fixup
* make fixup
* Only skip convnext test on CPU
* Update src/transformers/modeling_tf_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/modeling_tf_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Avoiding TF import issues
* make fixup
* Update compile() to support TF 2.3
* Skipping model.fit() on template classes for now
* Skipping model.fit() on template class tests for now
* Replace ad-hoc solution with find_labels
* make fixup
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Make Transformers use cache files when hf.co is down
* Fix tests
* Was there a random circleCI failure?
* Isolate patches
* Style
* Comment out the failure since it doesn't fail anymore
* Better comment
* Make TF pt-tf equivalence test more aggressive
* Fix for TFConvNextModelTest and TFTransfoXLModelTest
* fix kwargs for outputs
* clean-up
* Add docstring for check_outputs()
* remove: need to rename encoder-decoder
* clean-up
* send PyTorch things to the correct device
* Add back the accidentally removed test case in test_pt_tf_model_equivalence()
* Fix: change to tuple before calling check_outputs()
* Fix: tfo could be a list
* use to_tuple()
* allow tfo only to be tuple or tensor
* allow tfo to be list or tuple for now + style change
* minor fix
* remove np.copy and update comments
* tfo -> tf_output, same for pt
* Add more detailed comment
* remove the incorrect comment
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Add TF logits wrappers
* Add sample method
* add tests for TF logit wrappers
* TF generate sample tests now run on CPU
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* feat: initial implementation of convnext in tensorflow.
* fix: sample code for the classification model.
* chore: added checked for from the classification model.
* chore: set bias initializer in the classification head.
* chore: updated license terms.
* chore: removed unused imports
* feat: enabled argument during using drop_path.
* chore: replaced tf.identity with layers.Activation(linear).
* chore: edited default checkpoint.
* fix: minor bugs in the initializations.
* partial-fix: tf model errors for loading pretrained pt weights.
* partial-fix: call method updated
* partial-fix: cross loading of weights (4x3 variables to be matched)
* chore: removed unneeded comment.
* removed playground.py
* rebasing
* rebasing and removing playground.py.
* fix: renaming TFConvNextStage conv and layer norm layers
* chore: added initializers and other minor additions.
* chore: added initializers and other minor additions.
* add: tests for convnext.
* fix: integration tester class.
* fix: issues mentioned in pr feedback (round 1).
* fix: how output_hidden_states arg is propagated inside the network.
* feat: handling of arg for pure cnn models.
* chore: added a note on equal contribution in model docs.
* rebasing
* rebasing and removing playground.py.
* feat: encapsulation for the convnext trunk.
* Fix variable naming; Test-related corrections; Run make fixup
* chore: added Joao as a contributor to convnext.
* rebasing
* rebasing and removing playground.py.
* rebasing
* rebasing and removing playground.py.
* chore: corrected copyright year and added comment on NHWC.
* chore: fixed the black version and ran formatting.
* chore: ran make style.
* chore: removed from_pt argument from test, ran make style.
* rebasing
* rebasing and removing playground.py.
* rebasing
* rebasing and removing playground.py.
* fix: tests in the convnext subclass, ran make style.
* rebasing
* rebasing and removing playground.py.
* rebasing
* rebasing and removing playground.py.
* chore: moved convnext test to the correct location
* fix: locations for the test file of convnext.
* fix: convnext tests.
* chore: applied sgugger's suggestion for dealing w/ output_attentions.
* chore: added comments.
* chore: applied updated quality environment style.
* chore: applied formatting with quality environment.
* chore: revert to the previous tests/test_modeling_common.py.
* chore: revert to the original test_modeling_common.py
* chore: revert to previous states for test_modeling_tf_common.py and modeling_tf_utils.py
* fix: tests for convnext.
* chore: removed output_attentions argument from convnext config.
* chore: revert to the earlier tf utils.
* fix: output shapes of the hidden states
* chore: removed unnecessary comment
* chore: reverting to the right test_modeling_tf_common.py.
* Styling nits
Co-authored-by: ariG23498 <aritra.born2fly@gmail.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
* TF generate start refactor
* Add tf tests for sample generate
* re-organize
* boom boom
* Apply suggestions from code review
* re-add
* add all code
* make random greedy pass
* make encoder-decoder random work
* further improvements
* delete bogus file
* make gpt2 and t5 tests work
* finish logits tests
* correct logits processors
* correct past / encoder_outputs drama
* refactor some methods
* another fix
* refactor shape_list
* fix more shape list
* import shape_list
* finish docs
* fix imports
* make style
* correct tf utils
* Fix TFRag as well
* Apply Lysandre's and Sylvain's suggestions
* Update tests/test_generation_tf_logits_process.py
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Update src/transformers/tf_utils.py
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* remove cpu according to gante
* correct logit processor
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Add wrapper classes
* convert inner layers to tf
* Add TF Encoder and Decoder layers
* TFSpeech2Text models
* Loadable model
* TF model with same outputs as PT model
* test skeleton
* correct tests and run the fixup
* correct attention expansion
* TFSpeech2Text past_key_values with TF format
* Rename compute_loss to hf_compute_loss to avoid conflicts with the new Keras method
* make style
* Adding deprecation warning to `compute_loss`
* Fix sneaky reference to compute_loss
* Replace logger.warning with warnings.warn
* Clarifying warning and deprecation timeline
* Add a main_input_name attribute to all models
* Fix tests
* Wtf Vs Code?
* Update src/transformers/models/imagegpt/modeling_imagegpt.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Style
* Fix copies
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Initial commit for Keras model cards
* Revert accidental change
* make style
* make style
* make style
* Fix PR comments
* Move repo creation to __init__
* Fixes to README.md creation
* Partial progress for proper card creation on `push_to_hub`
* Proper card creation from `push_to_hub` plus fixes for malformed model cards
* Fixes for model card creation outside the callback
* Adding a model card creation test
* Putting the model card creation test in the right file.
Good job, Matt.
* make style
* Fix model card test temp dir usage
* Fix model card creation when no optimizer present
* Fixes for when training history not present
* Fix accidental edit to test_modeling_common
* test: make sure model configs are jsonifiable
* fix: return python dict instead of config object
* fix: accept pretrained config and use correct class
* Re-enabling slow tests and applying them to core models only
* Re-enabling slow tests and applying them to core models only
* Add new test file to fetcher
* Remove tooslow tests from test_modeling_tf_common.py
* make style
* Style fixes
* Style fixes
* Style fixes
* Style fixes
* Adding core tests to GPT2 and BART
* Removing unused imports
Co-authored-by: niklas.fruehauf <niklas.fruehauf@sovanta.com>
Co-authored-by: matt <rocketknight1@gmail.com>
* Start the work for TFViTModel
* Convert to TF code - need to check in the follow up commits
* Clean up model code
* Expose TFViTModel
* make style
* make quality
* Add test
* make style & quality
* Fix some imports
* fix wrong usage - *kwargs => **kwargs
* Fix Conv2D weight loading (PT->TF) issue
* Add tests for images with different sizes + fix model
* Fix some common tests for TFViTModel
* Use inputs instead of input_ids in test_compile_tf_model
* Add a comment about transpose and Conv2D in convert_tf_weight_name_to_pt_weight_name
* Avoid transpose in TFViT call
* Fix Conv2D issue in load_tf2_weights_in_pytorch_model
* Use tf.keras.layers.Conv2D instead of tf.nn.conv2d
* Using simpler heuristic to detect Conv2D layer
* Change convert_tf_weight_name_to_pt_weight_name to return TransposeType
* Check tf_weight_shape is not None before using it
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix missing comma
* fix input dtype
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add cross attentions to TFGPT2Model
* Add TFEncoderDecoderModel
* Add TFBaseModelOutputWithPoolingAndCrossAttentions
* Add cross attentions to TFBertModel
* Fix past or past_key_values argument issue
* Fix generation
* Fix save and load
* Add some checks and comments
* Clean the code that deals with past keys/values
* Add kwargs to processing_inputs
* Add serving_output to TFEncoderDecoderModel
* Some cleaning + fix use_cache value issue
* Fix tests + add bert2bert/bert2gpt2 tests
* Fix more tests
* Ignore crossattention.bias when loading GPT2 weights into TFGPT2
* Fix return_dict_in_generate in tf generation
* Fix is_token_logit_eos_token bug in tf generation
* Finalize the tests after fixing some bugs
* Fix another is_token_logit_eos_token bug in tf generation
* Add/Update docs
* Add TFBertEncoderDecoderModelTest
* Clean test script
* Add TFEncoderDecoderModel to the library
* Add cross attentions to TFRobertaModel
* Add TFRobertaEncoderDecoderModelTest
* make style
* Change the way of position_ids computation
* bug fix
* Fix copies in tf_albert
* Remove some copied from and apply some fix-copies
* Remove some copied
* Add cross attentions to some other TF models
* Remove encoder_hidden_states from TFLayoutLMModel.call for now
* Make style
* Fix TFRemBertForCausalLM
* Revert the change to longformer + Remove copies
* Revert the change to albert and convbert + Remove copies
* make quality
* make style
* Add TFRembertEncoderDecoderModelTest
* make quality and fix-copies
* test TFRobertaForCausalLM
* Fixes for failed tests
* Fixes for failed tests
* fix more tests
* Fixes for failed tests
* Fix Auto mapping order
* Fix TFRemBertEncoder return value
* fix tf_rembert
* Check copies are OK
* Fix "TFBaseModelOutputWithPastAndCrossAttentions is not defined" error
* Add TFEncoderDecoderModelSaveLoadTests
* fix tf weight loading
* check the change of use_cache
* Revert the change
* Add missing test_for_causal_lm for TFRobertaModelTest
* Try cleaning past
* fix _reorder_cache
* Revert some files to original versions
* Keep as many copies as possible
* Apply suggested changes - Use raise ValueError instead of assert
* Move import to top
* Fix wrong require_torch
* Replace more assert by raise ValueError
* Add test_pt_tf_model_equivalence (the test won't pass for now)
* add test for loading/saving
* finish
* finish
* Remove test_pt_tf_model_equivalence
* Update tf modeling template
* Remove pooling, added in the prev. commit, from MainLayer
* Update tf modeling test template
* Move inputs["use_cache"] = False to modeling_tf_utils.py
* Fix torch.Tensor in the comment
* fix use_cache
* Fix missing use_cache in ElectraConfig
* Add a note to from_pretrained
* Fix style
* Change test_encoder_decoder_save_load_from_encoder_decoder_from_pt
* Fix TFMLP (in TFGPT2) activation issue
* Fix None past_key_values value in serving_output
* Don't call get_encoderdecoder_model in TFEncoderDecoderModelTest.test_configuration_tie until we have a TF checkpoint on Hub
* Apply review suggestions - style for cross_attns in serving_output
* Apply review suggestions - change assert + docstrings
* break the error message to respect the char limit
* deprecate the argument past
* fix docstring style
* Update the encoder-decoder rst file
* fix Unknown interpreted text role "method"
* fix typo
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>