* Add a main_input_name attribute to all models
* Fix tests
* Wtf Vs Code?
* Update src/transformers/models/imagegpt/modeling_imagegpt.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Style
* Fix copies
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Fix gradient_checkpointing backward compatibility
* Remove needless line
* make sure mask prob is big enough and length small enough
* Fix tests
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
* Start the work for TFViTModel
* Convert to TF code - need to check in the follow up commits
* Clean up model code
* Expose TFViTModel
* make style
* make quality
* Add test
* make style & quality
* Fix some imports
* fix wrong usage - *kwargs => ** kwargs
* Fix Conv2D weight loading (PT->TF) issue
* Add tests for images with different sizes + fix model
* Fix some common tests for TFViTModel
* Use inputs instead of input_ids in test_compile_tf_model
* Add a comment about transpose and Conv2D in convert_tf_weight_name_to_pt_weight_name
* Avoid transpose in TFViT call
* Fix Conv2D issue in load_tf2_weights_in_pytorch_model
* Use tf.keras.layers.Conv2D instead of tf.nn.conv2d
* Using simpler heuristic to detect Conv2D layer
* Change convert_tf_weight_name_to_pt_weight_name to return TransposeType
* Check tf_weight_shape is not None before using it
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix missing comma
* fix input dtype
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add first draft
* Make forward pass work
* Improve conversion script
* Add notebook that checks if it works
* Add BeitForSemanticSegmentation to the tests
* More improvements
* Make BeitForSemanticSegmentation consistent with Segformer
* Small bug fix
* Add BeitForSemanticSegmentation to docs
* Make sure model doesn't output hidden states when the user doesn't want to
* Make it possible to convert the large model
* Fix issue
* Fix conversion script for large model
* Add auxiliary_head option to semantic segmentation model
* Apply suggestions from @sgugger's review
* Apply suggestions from code review
* Fix failing test
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* Symbolic trace dynamic axes support for BERT like models (albert, bert, distilbert, mobilebert, electra, megatron-bert)
* Sanity checks before tracing that make sure the model to trace is supported
* Adapted to PyTorch 1.9
Co-authored-by: Michael Benayoun <michael@huggingface.co>
* Make gradient_checkpointing a training argument
* Update src/transformers/modeling_utils.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Update src/transformers/configuration_utils.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Fix tests
* Style
* document Gradient Checkpointing as a performance feature
* Small rename
* PoC for not using the config
* Adapt BC to new PoC
* Forgot to save
* Rollout changes to all other models
* Fix typo
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>
* fix AutoModel.from_pretrained(..., torch_dtype=...)
* fix to_diff_dict
* add better test
* torch is not always available when a model has self.torch_dtype
* Add option to load a pretrained model with mismatched shapes
* Fail at loading when mismatched shapes in Flax
* Fix tests
* Update src/transformers/modeling_flax_utils.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Address review comments
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Clean push to hub API
* Create working dir if it does not exist
* Different tweak
* New API + all models + test Flax
* Adds the Trainer clean up
* Update src/transformers/file_utils.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Address review comments
* (nit) output types
* No need to set clone_from when folder exists
* Update src/transformers/trainer.py
Co-authored-by: Julien Chaumond <julien@huggingface.co>
* Add generated_from_trainer tag
* Update to new version
* Fixes
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* Squash all commits of modeling_detr_v7 branch into one
* Improve docs
* Fix tests
* Style
* Improve docs some more and fix most tests
* Fix slow tests of ViT, DeiT and DETR
* Improve replacement of batch norm
* Restructure timm backbone forward
* Make DetrForSegmentation support any timm backbone
* Fix name of output
* Address most comments by @LysandreJik
* Give better names for variables
* Conditional imports + timm in setup.py
* Address additional comments by @sgugger
* Make style, add require_timm and require_vision to testsé
* Remove train_backbone attribute of DetrConfig, add methods to freeze/unfreeze backbone
* Add png files to fixtures
* Fix type hint
* Add timm to workflows
* Add `BatchNorm2d` to the weight initialization
* Fix retain_grad test
* Replace model checkpoints by Facebook namespace
* Fix name of checkpoint in test
* Add user-friendly message when scipy is not available
* Address most comments by @patrickvonplaten
* Remove return_intermediate_layers attribute of DetrConfig and simplify Joiner
* Better initialization
* Scipy is necessary to get sklearn metrics
* Rename TimmBackbone to DetrTimmConvEncoder and rename DetrJoiner to DetrConvModel
* Make style
* Improve docs and add 2 community notebooks
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Cleaner and more scalable implementation of symbolic tracing with torch.fx, and provides support for new architectures:
- ALBERT
- DistilBERT
- MobileBERT
- MegatronBERT
- GPT2
- GPT Neo
Co-authored-by: Michael Benayoun <michael@huggingface.co>
* Fix cross-attention head mask for Torch BART models
* Fix head masking for cross-attention module for the following
models: BART, Blenderbot, Blenderbot_small, M2M_100, Marian, MBart,
Pegasus
* Enable test_headmasking for M2M_100 model
* Fix cross_head_mask for FSMT, LED and T5
* This commit fixes `head_mask` for cross-attention modules
in the following models: FSMT, LED, T5
* It also contains some smaller changes in doc so that
it is be perfectly clear the shape of `cross_head_mask`
is the same as of `decoder_head_mask`
* Update template
* Fix template for BartForCausalLM
* Fix cross_head_mask for Speech2Text models
* Fix cross_head_mask in templates
* Fix args order in BartForCausalLM template
* Fix doc in BART templates
* Make more explicit naming
* `cross_head_mask` -> `cross_attn_head_mask`
* `cross_layer_head_mask` -> `cross_attn_layer_head_mask`
* Fix doc
* make style quality
* Fix speech2text docstring
* Initial support for upload to hub
* push -> upload
* Fixes + examples
* Fix torchhub test
* Torchhub test I hate you
* push_model_to_hub -> push_to_hub
* Apply mixin to other pretrained models
* Remove ABC inheritance
* Add tests
* Typo
* Run tests
* Install git-lfs
* Change approach
* Add push_to_hub to all
* Staging test suite
* Typo
* Maybe like this?
* More deps
* Cache
* Adapt name
* Quality
* MOAR tests
* Put it in testing_utils
* Docs + torchhub last hope
* Styling
* Wrong method
* Typos
* Update src/transformers/file_utils.py
Co-authored-by: Julien Chaumond <julien@huggingface.co>
* Address review comments
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Squash all commits into one
* Update ViTFeatureExtractor to use image_utils instead of torchvision
* Remove torchvision and add Pillow
* Small docs improvement
* Address most comments by @sgugger
* Fix tests
* Clean up conversion script
* Pooler first draft
* Fix quality
* Improve conversion script
* Make style and quality
* Make fix-copies
* Minor docs improvements
* Should use fix-copies instead of manual handling
* Revert "Should use fix-copies instead of manual handling"
This reverts commit fd4e591bce.
* Place ViT in alphabetical order
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Added check to ensure model name passed to from_pretrained and model are the same
* Added test to check from_pretrained throws assert error when passed an incompatiable model name
* Modified assert in from_pretrained with f-strings. Modified test to ensure desired assert message is being generated
* Added check to ensure config and model has model_type
* Fix FlauBERT heads
Co-authored-by: vimarsh chaturvedi <vimarsh chaturvedi>
Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* Add {decoder_,}head_mask to LED
* Fix create_custom_forward signatue in encoder
* Add head_mask to longformer
* Add head_mask to longformer to fix dependencies
of LED on Longformer.
* Not working yet
* Add mising one input in longofrmer_modeling.py
* make fix-copies