* config and tokenization(fast too) changed and ErnieEncoder added
* Slow Tokenization Added
* Tokenizer(slow) is now working and Fast Tokenizer removed
* Added Config code
* Added Base Model and utils
* ErnieMModel is now working
* All added except tests
* All tests passed except ErnieUIEM
* All tests passed
* all fixes done
* all fixes done
* fixed MAP
* fixed check_code_quality
* fixed Build PR Documentation issue
* Added changes(comments) and also updated to the latest upstream/main
* Added fixup
* Added # Copied comments
* Added fixup
* Added more comments and some nits
* Added fixup
* Fixed README_hd.md
* Added more fixes
* ErnieMTokenizer (being sentencepiece) protected and other docs edited
* Added code_quality fix
* Fixed for
* Added more fix
* modified AZ
* ernie-m tokenization test added!
* attention mask part fixed(with 0->self.config.pad_token_id)
* applied make fixup
* First draft
* More improvements
* More improvements
* Improve conversion script
* Convert all weights
* Make forward pass work
* Make logits match
* More improvements
* More improvements
* More improvements
* Use get_input_embeddings
* Improve some more
* Improve model tests
* Improve model tests
* More improvements
* Fix processor
* Update files
* Update prepare_inputs_for_generation
* More improvements
* Fix copies
* More fixes
* Make fixup
* More improvements
* Add support for seq2seq language model
* More improvements
* Fix test
* More improvements
* Improve conversion script
* Remove some todo's
* Fix README's
* Improve conversion script
* Fix generation
* Fix style and remove Blip2Model
* Fix model outputs
* More improvements
* Set eos_token_id in config
* Fix quality
* Small improvements
* Add processor tests
* More improvements
* Apply suggestions
* Apply suggestions
* Add integration test
* Update image URL
* Add integration test
* Fix model_type
* Update style
* Improve docs
* Add doc tests
* Fix copies
* Remove tests which are passing
* Improve some more
* Add tests for seq2seq language models
* Minor fix
* Convert more checkpoints
* finalize CI
* Fix blip and blip2 processors
* add `accelerate` support for `blip2`
* clean up
* make style
* Update conversion script
* Update conversion script some more
* Update organization
* revert toc file
* add blip-2 to toc file
* Some more improvements
* Fix docstring
* Improve docs
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
* Result of black 23.1
* Update target to Python 3.7
* Switch flake8 to ruff
* Configure isort
* Configure isort
* Apply isort with line limit
* Put the right black version
* adapt black in check copies
* Fix copies
* make SpeechT5 model by copying Wav2Vec2
* add paper to docs
* whoops added docs in wrong file
* remove SpeechT5Tokenizer + put CTC back in the name
* remove deprecated class
* remove unused docstring
* delete SpeechT5FeatureExtractor, use Wav2Vec2FeatureExtractor instead
* remove classes we don't need right now
* initial stab at speech encoder prenet
* add more speech encoder prenet stuff
* improve SpeechEncoderPrenet
* add encoder (not finished yet)
* add relative position bias to self-attention
* add encoder CTC layers
* fix formatting
* add decoder from BART, doesn't work yet
* make it work with generate loop
* wrap the encoder into a speech encoder class
* wrap the decoder in a text decoder class
* changed my mind
* changed my mind again ;-)
* load decoder weights, make it work
* add weights for text decoder postnet
* add SpeechT5ForCTC model that uses only the encoder
* clean up EncoderLayer and DecoderLayer
* implement _init_weights in SpeechT5PreTrainedModel
* cleanup config + Encoder and Decoder
* add head + cross attention masks
* improve doc comments
* fixup
* more cleanup
* more fixup
* TextDecoderPrenet works now, thanks Kendall
* add CTC loss
* add placeholders for other pre/postnets
* add type annotation
* fix freeze_feature_encoder
* set padding tokens to 0 in decoder attention mask
* encoder attention mask downsampling
* remove features_pen calculation
* disable the padding tokens thing again
* fixup
* more fixup
* code review fixes
* rename encoder/decoder wrapper classes
* allow checkpoints to be loaded into SpeechT5Model
* put encoder into wrapper for CTC model
* clean up conversion script
* add encoder for TTS model
* add speech decoder prenet
* add speech decoder post-net
* attempt to reconstruct the generation loop
* add speech generation loop
* clean up generate_speech
* small tweaks
* fix forward pass
* enable always dropout on speech decoder prenet
* sort declaration
* rename models
* fixup
* fix copies
* more fixup
* make consistency checker happy
* add Seq2SeqSpectrogramOutput class
* doc comments
* quick note about loss and labels
* add HiFi-GAN implementation (from Speech2Speech PR)
* rename file
* add vocoder to TTS model
* improve vocoder
* working on tokenizer
* more better tokenizer
* add CTC tokenizer
* fix decode and batch_code in CTC tokenizer
* fix processor
* two processors and feature extractors
* use SpeechT5WaveformFeatureExtractor instead of Wav2Vec2
* cleanup
* more cleanup
* even more fixup
* notebooks
* fix log-mel spectrograms
* support reduction factor
* fixup
* shift spectrograms to right to create decoder inputs
* return correct labels
* add labels for stop token prediction
* fix doc comments
* fixup
* remove SpeechT5ForPreTraining
* more fixup
* update copyright headers
* add usage examples
* add SpeechT5ProcessorForCTC
* fixup
* push unofficial checkpoints to hub
* initial version of tokenizer unit tests
* add slow test
* fix failing tests
* tests for CTC tokenizer
* finish CTC tokenizer tests
* processor tests
* initial test for feature extractors
* tests for spectrogram feature extractor
* fixup
* more fixup
* add decorators
* require speech for tests
* modeling tests
* more tests for ASR model
* fix imports
* add fake tests for the other models
* fixup
* remove jupyter notebooks
* add missing SpeechT5Model tests
* add missing tests for SpeechT5ForCTC
* add missing tests for SpeechT5ForTextToSpeech
* sort tests by name
* fix Hi-Fi GAN tests
* fixup
* add speech-to-speech model
* refactor duplicate speech generation code
* add processor for SpeechToSpeech model
* add usage example
* add tests for speech-to-speech model
* fixup
* enable gradient checkpointing for SpeechT5FeatureEncoder
* code review
* push_to_hub now takes repo_id
* improve doc comments for HiFi-GAN config
* add missing test
* add integration tests
* make number of layers in speech decoder prenet configurable
* rename variable
* rename variables
* add auto classes for TTS and S2S
* REMOVE CTC!!!
* S2S processor does not support save/load_pretrained
* fixup
* these models are now in an auto mapping
* fix doc links
* rename HiFiGAN to HifiGan, remove separate config file
* REMOVE auto classes
* there can be only one
* fixup
* replace assert
* reformat
* feature extractor can process input and target at same time
* update checkpoint names
* fix commit hash
* [FT] First commit for graphormer architecture.
The model has no tokenizer, as it uses a collator and preprocessing function for its input management.
Architecture to be tested against original one.
The arch might need to be changed to fit the checkpoint, but a revert to the original arch will make the code less nice to read.
TODO: doc
* [FIX] removed test model
* [FIX] import error
* [FIX] black and flake
* [DOC] added paper refs
* [FIX] [DOC]
* [FIX] black
* [DOC] Updated READMEs
* [FIX] Order of imports + rm Tokenizer calls
* [FIX] Moved assert in class to prevent doc build failure
* [FIX] make fix-copies
* [Doc] update from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [FIX] Removed Graphormer from Sequence classification model list
* [DOC] Added HF copyright to Cython file
* [DOC] Fixed comments
* [FIX] typos in class doc + removed config classes.
Todo: update doc from paper definitions
* [FIX] Removed dependency to fairseq, and replaced all asserts with Exception management
* [FIX] Homogeneized initialization of weights to pretrained constructor
* [FIX] [CP] Updated multi_hop parameter to get same results as in original implementation
* [DOC] Relevant parameter description in the configuration file
* [DOC] Updated doc and comments in main graphormer file
* [FIX] make style and quality checks
* [DOC] Fix doc format
* [FIX] [WIP] Updated part of the tests, though still a wip
* [FIX] [WIP]
* [FIX] repo consistency
* [FIX] Changed input names for more understandability
* [FIX] [BUG] updated num_classes params for propagation in the model
* simplified collator
* [FIX] Updated tests to follow new naming pattern
* [TESTS] Updated test suite along with model
* |FIX] rm tokenizer import
* [DOC] add link to graphormerdoc
* Changed section in doc from text model to graph model
* Apply suggestions from code review
Spacing, inits
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [DOC] Explain algos_graphormer functions
* Cython soft import protection
* Rm call to Callable in configuration graphormer
* [FIX] replaced asserts with Exceptions
* Add org to graphormer checkpoints
* Prefixed classes with Graphormer
* Management of init functions
* format
* fixes
* fix length file
* update indent
* relaunching ci
* Errors for missing cython imports
* fix style
* fix style doc
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Copy RoBERTa
* formatting
* implement RoBERTa with prelayer normalization
* update test expectations
* add documentation
* add convertion script for DinkyTrain weights
* update checkpoint repo
Unfortunately the original checkpoints assumes a hacked roberta model
* add to RoBERTa-PreLayerNorm docs to toc
* run utils/check_copies.py
* lint files
* remove unused import
* fix check_repo reporting wrongly a test is missing
* fix import error, caused by rebase
* run make fix-copies
* add RobertaPreLayerNormConfig to ROBERTA_EMBEDDING_ADJUSMENT_CONFIGS
* Fix documentation <Facebook> -> Facebook
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fixup: Fix documentation <Facebook> -> Facebook
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add missing Flax header
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* expected_slice -> EXPECTED_SLICE
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* update copies after rebase
* add missing copied from statements
* make fix-copies
* make prelayernorm explicit in code
* fix checkpoint path for the original implementation
* add flax integration tests
* improve docs
* update utils/documentation_tests.txt
* lint files
* Remove Copyright notice
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* make fix-copies
* Remove EXPECTED_SLICE calculation comments
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add templates for gpt-sw3
* Add templates for gpt-sw3
* Added sentencepiece tokenizer
* intermediate commit with many changes
* fixed conflicts
* Init commit for tokenization port
* Tokenization progress
* Remove fast tokenizer
* Clean up and rename spm.model -> spiece.model
* Remove TF -> PT conversion script template, Clean up Megatron -> PT script
* Optimize encode & decode performance
* added new attention
* added new attention
* attention for gpt-sw3 working
* attention good
* Cache is now working
* fixed attention mask so that it works with causal attention
* fixed badbmm bug for cpu and caching
* updated config with correct parameters
* Refactor and leave optimizations as separate functions to avoid breaking expected functionality
* Fix special tokens mapping for both tokenizers
* cleaning up of code and comments
* HF compatible attention outputs
* Tokenizer now passing tests, add documentation
* Update documentation
* reverted back to base implementation after checking that it is identical to pretrained model
* updated gpt-sw3 config
* updated conversion script
* aligned parameters with gpt-sw3 config
* changed default scale_attn_by_inverse_layer_idx to true
* removed flag from conversion script
* added temporary model path
* reverted back to functioning convert script
* small changes to default config
* updated tests for gpt-sw3
* make style, make quality, minor cleanup
* Change local paths to testing online repository
* Change name: GptSw3 -> GPTSw3
* Remove GPTSw3TokenizerFast references
* Use official model repository and add more model sizes
* Added reference to 6.7b model
* Add GPTSw3DoubleHeadsModel to IGNORE_NON_AUTO_CONFIGURED, like GPT2DoubleHeadsModel
* Remove pointers to non-existing TFGPTSw3
* Add GPTSw3 to docs/_toctree.yml
* Remove TF artifacts from GPTSw3 in __init__ files
* Update README:s with 'make fix-copies'
* Add 20b model to archive list
* Add documentation for GPT-Sw3
* Fix typo in documentation for GPT-Sw3
* Do 'make fix-copies' again after having updated docs
* Fix some typos in docs
* Update src/transformers/models/gpt_sw3/configuration_gpt_sw3.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/configuration_gpt_sw3.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/__init__.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/__init__.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/convert_megatron_to_pytorch.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/modeling_gpt_sw3.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update tests/models/gpt_sw3/test_tokenization_gpt_sw3.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/modeling_gpt_sw3.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/modeling_gpt_sw3.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Resolve comments from PR feedback
* Resolve more comments from PR feedback, also set use_cache=True in convert script
* Add '# Copied from' comments for GPTSw3 modeling
* Set 'is_parallelizable = False'
* Remove '# Copied from' where code was modified and add 'with x->y' when appropriate
* Remove parallelize in mdx
* make style, make quality
* Update GPTSw3Config default values and corresponding documentation
* Update src/transformers/models/gpt_sw3/tokenization_gpt_sw3.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/__init__.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Clean up and protect GPTSw3Tokenizer imports with is_sentencepiece_available
* Make style, make quality
* Add dummy object for GPTSw3Tokenizer via 'make fix-copies'
* make fix-copies
* Remove GPTSw3 modeling classes
* make style, make quality
* Add GPTSw3 auto-mappings for other GPT2 heads
* Update docs/source/en/model_doc/gpt-sw3.mdx
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/convert_megatron_to_pytorch.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/tokenization_gpt_sw3.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Remove old TODO-comment
* Add example usage to GPTSw3Tokenizer docstring
* make style, make quality
* Add implementation details and example usage to gpt-sw3.mdx
Co-authored-by: JoeyOhman <joeyoh@kth.se>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* First draft
* Fix backwards compatibility
* More fixes
* More fixes
* Make backbone more general
* Improve backbone
* Improve test
* Fix config checkpoint
* Address comments
* Use model_type
* Address more comments
* Fix special model names
* Remove MaskFormerSwinModel and MaskFormerSwinPreTrainedModel from main init
* Fix typo
* Update backbone
* Apply suggestion
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* Add hidden states and attentions to backbone outputs
* Update ResNet
* Fix more tests
* Debug test
* Fix test_determinism
* Fix test_save_load
* Remove file
* Disable fx tests
* Test
* Add fx support for backbones
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* Add ResNetBackbone
* Define channels and strides as property
* Remove file
* Add test for backbone
* Update BackboneOutput class
* Remove strides property
* Fix docstring
* Add backbones to SHOULD_HAVE_THEIR_OWN_PAGE
* Fix auto mapping name
* Add sanity check for out_features
* Set stage names based on depths
* Update to tuple
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* allow loading projection in text and vision model
* begin tests
* finish test for CLIPTextModelTest
* style
* add slow tests
* add new classes for projection heads
* remove with_projection
* add in init
* add in doc
* fix tests
* fix some more tests
* fix copies
* fix docs
* remove leftover from fix-copies
* add the head models in IGNORE_NON_AUTO_CONFIGURED
* fix docstr
* fix tests
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add docstr for models
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Add first draft
* Update conversion script
* Improve conversion script
* Improve conversion script some more
* Add conditional embeddings
* Add initial decoder
* Fix activation function of decoder
* Make decoder outputs match original implementation
* Make decoder outputs match original implementation
* Add more copied from statements
* Improve model outputs
* Fix auto tokenizer file
* Fix more tests
* Add test
* Improve README and docs, improve conditional embeddings
* Fix more tests
* Remove print statements
* Remove initial embeddings
* Improve conversion script
* Add interpolation of position embeddings
* Finish addition of interpolation of position embeddings
* Add support for refined checkpoint
* Fix refined checkpoint
* Remove unused parameter
* Improve conversion script
* Add support for training
* Fix conversion script
* Add CLIPSegFeatureExtractor
* Fix processor
* Fix CLIPSegProcessor
* Fix conversion script
* Fix most tests
* Fix equivalence test
* Fix README
* Add model to doc tests
* Use better variable name
* Convert other checkpoint as well
* Update config, add link to paper
* Add docs
* Update organization
* Replace base_model_prefix with clip
* Fix base_model_prefix
* Fix checkpoint of config
* Fix config checkpoint
* Remove file
* Use logits for output
* Fix tests
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* initial commit
* First draft that gets outputs without crashing!
* Add all the ported openfold dependencies
* testing
* Restructure config files for ESMFold
* Debugging to find output discrepancies
* Mainly style
* Make model runnable without extra deps
* Remove utils and merge them to the modeling file
* Use correct gelu and remove some debug prints
* More cleanup
* Update esm docs
* Update conversion script to support ESMFold properly
* Port some top-level changes from ESMFold repo
* Expand EsmFold docstrings
* Make attention_mask optional (default to all 1s)
* Add inference test for ESMFold
* Use config and not n kwargs
* Add modeling output class
* Remove einops
* Remove chunking in ESM FFN
* Update tests for ESMFold
* Quality
* REpo consistency
* Remove tree dependency from ESMFold
* make fixup
* Add an error in case my structure map function breaks later
* Remove needless code
* Stop auto-casting the LM to float16 so CPU tests pass
* Stop auto-casting the LM to float16 so CPU tests pass
* Final test updates
* Split test file
* Copyright and quality
* Unpin PyTorch to see built doc
* Fix config file to_dict() method
* Add some docstrings to the output
* Skip TF checkpoint tests for ESM until we reupload those
* make fixup
* More docstrings
* Unpin to get even with main
* Flag example to write
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
* chore: initial commit
* chore: adding util methods
yet to work on the nn.functional.interpolate port with align_corener=True
* chore: refactor the utils
* used tf.compat.v1.image.resize to align the F.interpolate function
* added type hints to the method signatures
* added references to the gists where one 2 one alignment of torch and tf has been shown
* chore: adding the layers
* chore: porting all the layers from torch to tf
This is the initial draft, nothing is tested yet.
* chore: aligning the layers with reference to tf clip
* chore: aligning the modules
* added demaraction comments
* added copied and adapted from comments
* chore: aligning with CLIP
* chore: wrangling the layers to keep it tf compatible
* chore: aligning the names of the layers for porting
* chore: style changes
* chore: adding docs and inits
* chore: adding tfp dependencis
the code is taken from TAPAS
* chore: initial commit for testing
* chore: aligning the vision embeddings with the vit implementatino
* chore: changing model prefix
* chore: fixing the name of the model and the layer normalization test case
* chore: every test passes but the slow ones
* chore: fix style and integration test
* chore: moving comments below decorators
* chore: make fixup and fix-copies changes
* chore: adding the Vision and Text Model to check_repo
* chore: modifying the prefix name to align it with the torch implementation
* chore: fix typo in configuration
* choer: changing the name of the model variable
* chore: adding segmentation flag
* chore: gante's review
* chore: style refactor
* chore: amy review
* chore: adding shape_list to parts that have been copied from other snippets
* chore: init batchnorm with torch defaults
* chore: adding shape_list to pass the tests
* test fix: adding seed as 0
* set seed
* chore: changing the straight through trick to fix -ve dimensinos
* chore: adding a dimension to the loss
* chore: adding reviewers and contributors names to the docs
* chore: added changes after review
* chore: code quality fixup
* chore: fixing the segmentation snippet
* chore: adding to the layer calls
* chore: changing int32 to int64 for inputs of serving
* chore: review changes
* chore: style changes
* chore: remove from_pt=True
* fix: repo consistency
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* First draft
* More improvements
* Improve model, add custom CUDA code
* Import torch before
* Add script that imports custom layer
* Add everything in new ops directory
* Import custom layer in modeling file
* Fix ARCHIVE_MAP typo
* Creating the custom kernel on the fly.
* Import custom layer in modeling file
* More improvements
* Fix CUDA loading
* More improvements
* Improve conversion script
* Improve conversion script
* Make it work until encoder_outputs
* Make forward pass work
* More improvements
* Make logits match original implementation
* Make implementation also support single_scale model
* Add support for single_scale and dilation checkpoint
* Add support for with_box_refine model
* Support also two stage model
* Improve tests
* Fix more tests
* Make more tests pass
* Upload all models to the hub
* Clean up some code
* Improve decoder outputs
* Rename intermediate hidden states and reference points
* Improve model outputs
* Move tests to dedicated folder
* Improve model outputs
* Fix retain_grad test
* Improve docs
* Clean up and make test_initialization pass
* Improve variable names
* Add copied from statements
* Improve docs
* Fix style
* Improve docs
* Improve docs, move tests to model folder
* Fix rebase
* Remove DetrForSegmentation from auto mapping
* Apply suggestions from code review
* Improve variable names and docstrings
* Apply some more suggestions from code review
* Apply suggestion from code review
* better docs and variables names
* hint to num_queries and two_stage confusion
* remove asserts and code refactor
* add exception if two_stage is True and with_box_refine is False
* use f-strings
* Improve docs and variable names
* Fix code quality
* Fix rebase
* Add require_torch_gpu decorator
* Add pip install ninja to CI jobs
* Apply suggestion of @sgugger
* Remove DeformableDetrForObjectDetection from auto mapping
* Remove DeformableDetrModel from auto mapping
* Add model to toctree
* Add model back to mappings, skip model in pipeline tests
* Apply @sgugger's suggestion
* Fix imports in the init
* Fix copies
* Add CPU implementation
* Comment out GPU function
* Undo previous change
* Apply more suggestions
* Remove require_torch_gpu annotator
* Fix quality
* Add logger.info
* Fix logger
* Fix variable names
* Fix initializaztion
* Add missing initialization
* Update checkpoint name
* Add model to doc tests
* Add CPU/GPU equivalence test
* Add Deformable DETR to pipeline tests
* Skip model for object detection pipeline
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
* First draft
* Improve conversion script
* Make vision encoder work
* More improvements
* Improve conversion script
* Fix quality
* Add MultiframeIntegrationTransformer
* More improvements
* Make MiT output work
* Fix quality
* Add prompts generator
* Add tests
* Fix some tests
* Fix some more tests
* Fix more tests
* Improve conversion script
* Fix model outputs
* Fix more tests
* Add XClipProcessor
* Use processor in conversion script
* Fix integration test
* Update README, fix docs
* Fix all tests
* Add MIT output to XClipOutput
* Create better variable names
* Rename XClip to XCLIP
* Extend conversion script
* Add support for large models
* Add support for 16 frame models
* Add another model'
* Fix module issue
* Apply suggestions from code review
* Add figure to docs
* Fix CLIPProcessor issue
* Apply suggestions from code review
* Delete file
* Convert more checkpoints
* Convert last checkpoint
* Update nielsr to microsoft
* add: segformer utils and img. classification.
* add: segmentation layer.
* feat: working implementation of segformer.
* chore: remove unused variable.
* add test, remaining modifications.
* remove: unnecessary files.
* add: rest of the files.
Co-authored-by: matt <rocketknight1@gmail.com>
* chore: remove ModuleList comment.
* chore: apply make style.
* chore: apply make fixup-copies.
* add to check_repo.py
* add decode head to IGNORE_NON_TESTED
* chore: run make style.
* chore: PR comments.
* chore: minor changes to model doc.
* tests: reduction across samples.
* add a note on the space.
* sort importats.
* fix: reduction in loss computation.
* chore: align loss function with that of NER.
* chore: correct utils/documentation_tests.txt
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* chore: simplify the interpolation of logits in loss computation.
* chore: return transposed logits when return_dict=False.
* chore: add link to the tf fine-tuning repo.
* address pr comments.
* address niels's comments.
* remove from_pt=True since tf weights are in.
* remove comment from pt model.
* address niels's comments.
Co-authored-by: matt <rocketknight1@gmail.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Initial commit
* Make some fixes
* Make PT model full forward pass
* Drop TF & Flax implementation, fix copies etc
* Add Flax model and update some corresponding stuff
* Drop some TF things
* Update config and flax local attn
* Add encoder_attention_type to config
* .
* Update docs
* Do some cleansing
* Fix some issues -> make style; add some docs
* Fix position_bias + mask addition + Update tests
* Fix repo consistency
* Fix model consistency by removing flax operation over attn_mask
* [WIP] Add PT TGlobal LongT5
* .
* [WIP] Add flax tglobal model
* [WIP] Update flax model to use the right attention type in the encoder
* Fix flax tglobal model forward pass
* Make the use of global_relative_attention_bias
* Add test suites for TGlobal model
* Fix minor bugs, clean code
* Fix pt-flax equivalence though not convinced with correctness
* Fix LocalAttn implementation to match the original impl. + update READMEs
* Few updates
* Update: [Flax] improve large model init and loading #16148
* Add ckpt conversion script accoring to #16853 + handle torch device placement
* Minor updates to conversion script.
* Typo: AutoModelForSeq2SeqLM -> FlaxAutoModelForSeq2SeqLM
* gpu support + dtype fix
* Apply some suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* * Remove (de)parallelize stuff
* Edit shape comments
* Update README.md
* make fix-copies
* Remove caching logic for local & tglobal attention
* Apply another batch of suggestions from code review
* Add missing checkpoints
* Format converting scripts
* Drop (de)parallelize links from longT5 mdx
* Fix converting script + revert config file change
* Revert "Remove caching logic for local & tglobal attention"
This reverts commit 2a619828f6ddc3e65bd9bb1725a12b77fa883a46.
* Stash caching logic in Flax model
* Make side relative bias used always
* Drop caching logic in PT model
* Return side bias as it was
* Drop all remaining model parallel logic
* Remove clamp statements
* Move test files to the proper place
* Update docs with new version of hf-doc-builder
* Fix test imports
* Make some minor improvements
* Add missing checkpoints to docs
* Make TGlobal model compatible with torch.onnx.export
* Replace some np.ndarray with jnp.ndarray
* Fix TGlobal for ONNX conversion + update docs
* fix _make_global_fixed_block_ids and masked neg value
* update flax model
* style and quality
* fix imports
* remove load_tf_weights_in_longt5 from init and fix copies
* add slow test for TGlobal model
* typo fix
* Drop obsolete is_parallelizable and one warning
* Update __init__ files to fix repo-consistency
* fix pipeline test
* Fix some device placements
* [wip]: Update tests -- need to generate summaries to update expected_summary
* Fix quality
* Update LongT5 model card
* Update (slow) summarization tests
* make style
* rename checkpoitns
* finish
* fix flax tests
Co-authored-by: phungvanduy <pvduy23@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: patil-suraj <surajp815@gmail.com>
* added cbs to notebooks, made copy-paste error fix in generation_utils
* initial push for mctc model
* mctc feature extractor done
* added processor, tokenizer and their tests for MCTC. Have added an MCTC modeling test, adjusting model code accordingly.
* added processor, tokenizer and their tests for MCTC. Have added an MCTC modeling test, adjusting model code accordingly.
* passing attention, now struggling to figure out how attention masks make sense here
* works when excluding attention masks. ask later how one would integrate attention maskshere
* bizarre configuration error (model prefix comes first in config dict json and messes up the order)
* all passing but bizzarre config dict ordering issue when to_dict
* passing all major tests
* feature extraction, processor, tokenizer added & tests passing
* style & consistency & other logistical fixes
* copy paste fix
* model after feature extraction working
* commiting final feature extraction results; need to fix normalization
* feature extraction passing tests; probably should add tests on the specific flashlight-copied functions?
* delete print ; format code a bit
* fixing tests
* passing major tests
* fixing styles
* completed tokenization test with real example; not sure if these values are entirely correct.
* last test fixes from local
* reverting accidentally included custom setup configs
* remove load tf weights; fix config error
* testing couldnt import featureextractor
* fix docs
* fix docs
* resolving comments
* style fixes
* style fixes
* Update to MCTCConv1dSubSampler
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* relposemb fixes
* conv1d name issue; expecting config fail with paraentheses
* fix config issue
* fix config issue
* fix config issue
* change everything to MCTCT
* fixing naming change errors
* archive list
* copyrights and docs
* copyrights and docs
* copyrights and docs
* merge resolution
* move tests, fix to changed optionaldependency structure
* test directories changed
* fixing tests
* how to avoid tf tests?
* how to avoid tf tests?
* tests passing locally
* allow mctctprocessor imported any env
* allow mctctprocessor imported any env
* fixed second round of feedback, need to fix docs
* doc changes not being applied
* all fixed
* style fix
* feedback fixes
* fix copies and feature extraction style fix
* Update tests/models/visual_bert/test_modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* copy paste huggingface:main visual bert
* added eof newline to visual bert; all tests are passing otherwise
* fix slow tests by adding attention mask
* change model id to speechbrain
* make fix-copies
* fix readme unwanted deletes
* fixing readmes, make fix-copies
* consistent M-CTC-T naming
* Update src/transformers/models/mctct/__init__.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* all fixed but variable naming
* adjust double quotes
* fixed variable names
* copyright and mr quilter
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* correct slow tests
* make fix-copies
* Update src/transformers/models/mctct/configuration_mctct.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/mctct/configuration_mctct.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* m-ctc-t not mctct
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* First version - OPT model
* Final changes
- putting use cache to False
* few changes
- remove commented block
* few changes
- remove unecessary files
* fix style issues
* few changes
- remove a test file
- added the logits test
* Update src/transformers/models/auto/tokenization_auto.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add gen tests
* few changes
- rm mask filling example on docstring
* few changes
- remove useless args
* some changes
- more tests should pass now
- needs to clean more
- documentation still needs to be done
* fix code quality
* major changes
- change attention architecture to BART-like
- modify some tests
- style fix
* rm useless classes
- remove opt for:
- QA
- cond generation
- seq classif
* Removed autodoc calls to non-existant classes
TOkenizers are not implemented
* Update src/transformers/__init__.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/__init__.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/auto/modeling_tf_auto.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Replaced OPTTokeniser with GPT2 tokenizer
* added GPT2Tokenizer.from_pretrained("patrickvonplaten/opt_gpt2_tokenizer")
* Removed OPTTokenizer
* make style
* Make style replaces
``` ...).unsqueeze(```
by
``` >>>).unsqueeze(```
* make repo consistency
* Removed PretrainedOPTModel
* fix opt.mdx removed other heads
* fix init, removed 3 heads
* removed heads
* finished cleaning head
* removed seauence classif and question answering
* removed unused imports
* removed useless dummy object for QA, SC and CG
* removed tests for removed useless dummy object for QA, SC and CG
* Removed head_mask using encoder layers which don't exist
* fixed test
* fix line
* added OPT to toctree
* Updated model path with pushed weigths
* fix model path
* fixed code quality
* fixed embeddings and generation tests
* update paths
* clean comments
* removed OPTClassificationHead for sentence classification
* renamed hidden layer
* renamed num layers to standard num_hidden_layers
* num_attention_heads fix
* changes for 125m
* add first version for 125m
* add first version - flax
* add new version
* causal LM output
* replace output type with BaseModelOutputWithPastAndCrossAttentions
* revert working config from 150m to 350m
* clean
* removed decoder input ids
* fixed embed dim
* more embed_dim issues
* make style + removed enc_dec test
* update falx model
* removed troublesome copy
* added is_encoder_decoder=False to config
* added set_input emb fuinction to model class
* requires torch on embed test
* use head mask instead of decoder head mask input param solves a test
* 8 test remaining, update
* Updated create_and_check_decoder_model_past_large_inputs
* Make style
* update op tokenizer with condition
* make style
* See if I can push
* some clean up
* remove linear head hack
* save intermediate
* save correct attention
* add copied from from bart
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix part of the reviewss
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* same changes in naming / conversion
* correct mask
* more fixes
* delete FlaxOPT and TfOPT
* clean traces of Flax and Tf
* fix mask
* fixed positionnal embedding length when past key value is provoded
* get 125m, 6.7b to work
* Added do_layer_norm
* solved mismatch in load dictionnary
* clean up preapre opt input dict
* fixed past key value as bool
* fix previus
* fixed return dict False tuple issue
* All tests are passing
* Make style
* Ignore OPTDecoder non tested
* make fix-copies
* make repo consistency
* small fix
* removed uselss @torch.no_grad decorator
* make styl;e
* fix previous opt test
* style
* make style
* added opt documentation
* update OPT_PRETRAINED_MODEL_ARCHIVE_LIST
* up
* more fixes
* model & config work
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* added comment on padding hack (+2)
* cleaup
* review update
* docstring for missing arg
* Update docs/source/en/model_doc/opt.mdx
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update docs/source/en/model_doc/opt.mdx
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update docs/source/en/model_doc/opt.mdx
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/opt/__init__.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* update pretrained map
* update path and tests
* make style
* styling
* make consistency
* add gpt2 tok new
* more tok fixes
* Update src/transformers/models/auto/tokenization_auto.py
* Update docs/source/en/model_doc/opt.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update docs/source/en/model_doc/opt.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update docs/source/en/model_doc/opt.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update tests/models/opt/test_modeling_opt.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update based on reviews
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* make style
* make tokenizer auto tests pass
* apply Lysandre suggestion
* finish tests
* add some good tokenizer tests
* improve docs slighly
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* [WIP] Add FLAVA model
This PR aims to add [FLAVA](ihttps://arxiv.org/abs/2112.04482) model to the transformers repo.
Following checklist delineates the list of things to be done for this PR
to be complete:
[x] Flava init
[x] Flava base models
[x] Flava layers
[x] Flava Configs
[x] Flava encoders
[x] Flava pretraining models
[ ] Flava classification/retrieval models (To be added in a separate PR)
[x] Documentation updates
[x] Imports updates
[x] Argstring updates
[x] Flava pretrained checkpoints
[x] Flava tests
[x] Flava processors
[x] Sanity check
[x] Lint
* Created the Decision Transformer Modle
* updating tests, copy to other machine
* Added last hidden size to Decision Transformer modelling outputs
* Removed copy of original DT file
* made a temporary change to gpt2 to have it conform with the Decision Transformer version
* Updated tests
* Ignoring a file used to test the DT model
* added comments to config file
* added comments and argument descriptions to decision transformer file
* Updated doc
* Ran "make style"
* Remove old model imports
* Removed unused imports, cleaned up init file
* Update docs/source/model_doc/decision_transformer.mdx
added my username
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Reverted changes made to gpt2
* Removed datasets submodule
* Update the modeling outputs to include gpt2 attentions, hidden states and last hidden states
* Added support for return of hidden states, attentions and return dict of gpt2 model.
* Updated tests to include many of the ModelTesterMixin tests.
The following tests are skipped: test_generate_without_input_ids, test_pruning, test_resize_embeddings, test_head_masking, test_attention_outputs, test_hidden_states_output, test_inputs_embeds, test_model_common_attributes
* Added missing line to the end of gpt2 file
* Added an integration test for the Decision Transformer
Test performs and autoregressive evaluation for two time steps
* Set done and info to _ to fix failing test
* Updated integration test to be deterministic and check expected outputs
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Removed unnecessary config options
* Cleaned up commented code and old comments.
* Cleaned up commented code.
* Changed DecisionTransformer to Decision Transformer
* Added Decision Transformer to the main README file
* Added copy of GTP2 called DecisionTranformerGPT2Model
* isorted imports
* isorted imports
* Added model to non-English README files
* Ran make fix-copies and corrected some cases.
* Updated index file to include Decision Transformer
* Added gpt2 model as copy inside the Decision Transformer model file
* Added the unit test file to the list of TEST_FILES_WITH_NO_COMMON_TESTS
* Deleted redundant checkpoint files (I don't know how these got committed)
* Removed testing files. (These should have never been committed)
* Removed accidentally committed files
* Moved the Decision Transformer test to its own directory
* Add type hints for Pegasus (#16324)
* Funnel type hints (#16323)
* add pt funnel type hints
* add tf funnel type hints
* Add type hints for ProphetNet PyTorch (#16272)
* [GLPN] Improve docs (#16331)
* Add link to notebook
* Add link
* Fix bug
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* Added type hints for Pytorch Marian calls (#16200)
* Added type hinting for forward functions in pytorch marian
* typo correction
* Removed type hints on functions from BART per Suraj Patil request
* fix import pb
* fix typo
* corrected tuple call
* ran black
* after fix-copies
Some optional tags on primitives were removed, past_key_values in MarianForCausalLM changed from Tuple of Tuple to List
* Fixing copies to roformer and pegasus
Co-authored-by: Clementine Fourrier <cfourrie@inria.fr>
Co-authored-by: matt <rocketknight1@gmail.com>
* Moved DecisionTransformOutput to modeling_decision_transformer
* Moved the example usage to research project and cleaned comments
* Made tests ignore the copy of gpt2 in Decision Transformer
* Added module output to modelling decision transformer
* removed copied gpt2 model from list of transformers models
* Updated tests and created __init__ file for new test location
* Update README.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/decision_transformer/configuration_decision_transformer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Removed unneeded summary type from config file
* Fixed copies
* Updated pretrained config map to refer to hopper-medium checkpoint
* done (#16340)
* Added Decision transformer to model docs
* Update src/transformers/models/decision_transformer/modeling_decision_transformer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/decision_transformer/modeling_decision_transformer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/decision_transformer/configuration_decision_transformer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add type annotations for Rembert/Splinter and copies (#16338)
* undo black autoformat
* minor fix to rembert forward with default
* make fix-copies, make quality
* Adding types to template model
* Removing List from the template types
* Remove `Optional` from a couple of types that don't accept `None`
Co-authored-by: matt <rocketknight1@gmail.com>
* [Bug template] Shift responsibilities for long-range (#16344)
* Fix code repetition in serialization guide (#16346)
* Adopt framework-specific blocks for content (#16342)
* ✨ refactor code samples with framework-specific blocks
* ✨ update training.mdx
* 🖍 apply feedback
* Updates the default branch from master to main (#16326)
* Updates the default branch from master to main
* Links from `master` to `main`
* Typo
* Update examples/flax/README.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Updated model with custom docstring example
* Created the Decision Transformer Modle
* updating tests, copy to other machine
* Added last hidden size to Decision Transformer modelling outputs
* Removed copy of original DT file
* made a temporary change to gpt2 to have it conform with the Decision Transformer version
* Updated tests
* Ignoring a file used to test the DT model
* added comments to config file
* added comments and argument descriptions to decision transformer file
* Updated doc
* Ran "make style"
* Remove old model imports
* Removed unused imports, cleaned up init file
* Update docs/source/model_doc/decision_transformer.mdx
added my username
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Reverted changes made to gpt2
* Removed datasets submodule
* Update the modeling outputs to include gpt2 attentions, hidden states and last hidden states
* Added support for return of hidden states, attentions and return dict of gpt2 model.
* Updated tests to include many of the ModelTesterMixin tests.
The following tests are skipped: test_generate_without_input_ids, test_pruning, test_resize_embeddings, test_head_masking, test_attention_outputs, test_hidden_states_output, test_inputs_embeds, test_model_common_attributes
* Added missing line to the end of gpt2 file
* Added an integration test for the Decision Transformer
Test performs and autoregressive evaluation for two time steps
* Set done and info to _ to fix failing test
* Updated integration test to be deterministic and check expected outputs
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Removed unnecessary config options
* Cleaned up commented code and old comments.
* Cleaned up commented code.
* Changed DecisionTransformer to Decision Transformer
* Added Decision Transformer to the main README file
* Added copy of GTP2 called DecisionTranformerGPT2Model
* isorted imports
* isorted imports
* Added model to non-English README files
* Ran make fix-copies and corrected some cases.
* Updated index file to include Decision Transformer
* Added gpt2 model as copy inside the Decision Transformer model file
* Added the unit test file to the list of TEST_FILES_WITH_NO_COMMON_TESTS
* Deleted redundant checkpoint files (I don't know how these got committed)
* Removed testing files. (These should have never been committed)
* Removed accidentally committed files
* Moved the Decision Transformer test to its own directory
* Moved DecisionTransformOutput to modeling_decision_transformer
* Moved the example usage to research project and cleaned comments
* Made tests ignore the copy of gpt2 in Decision Transformer
* Added module output to modelling decision transformer
* removed copied gpt2 model from list of transformers models
* Updated tests and created __init__ file for new test location
* Update README.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/decision_transformer/configuration_decision_transformer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Removed unneeded summary type from config file
* Fixed copies
* Updated pretrained config map to refer to hopper-medium checkpoint
* Added Decision transformer to model docs
* Update src/transformers/models/decision_transformer/modeling_decision_transformer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/decision_transformer/modeling_decision_transformer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/decision_transformer/configuration_decision_transformer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Updated model with custom docstring example
* Updated copies, config auto, and readme files.
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Dan Tegzes <48134725+Tegzes@users.noreply.github.com>
Co-authored-by: Adam Montgomerie <adam@avanssion.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
Co-authored-by: Clémentine Fourrier <22726840+clefourrier@users.noreply.github.com>
Co-authored-by: Clementine Fourrier <cfourrie@inria.fr>
Co-authored-by: matt <rocketknight1@gmail.com>
Co-authored-by: Francesco Saverio Zuppichini <francesco.zuppichini@gmail.com>
Co-authored-by: Jacob Dineen <54680234+jacobdineen@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
* Split file_utils in several submodules
* Fixes
* Add back more objects
* More fixes
* Who exactly decided to import that from there?
* Second suggestion to code with code review
* Revert wront move
* Fix imports
* Adapt all imports
* Adapt all imports everywhere
* Revert this import, will fix in a separate commit
* maskformer
* conflicts
* conflicts
* minor fixes
* feature extractor test fix
refactor MaskFormerLoss following conversation
MaskFormer related types should not trigger a module time import error
missed one
removed all the types that are not used
update config mapping
minor updates in the doc
resolved conversation that doesn't need a discussion
minor changes
resolved conversations
fixed DetrDecoder
* minor changes
minor changes
fixed mdx file
test feature_extractor return types
functional losses -> classes
removed the return type test for the feature extractor
minor changes + style + quality
* conflicts?
* rebase master
* readme
* added missing files
* deleded poolformers test that where in the wrong palce
* CI
* minor changes
* Apply suggestions from code review
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* resolved conversations
* minor changes
* conversations
[Unispeech] Fix slow tests (#15818)
* remove soundfile old way of loading audio
* Adapt slow test
[Barthez Tokenizer] Fix saving (#15815)
[TFXLNet] Correct tf xlnet generate (#15822)
* [TFXLNet] Correct tf xlnet
* adapt test comment
Fix the push run (#15807)
Fix semantic segmentation pipeline test (#15826)
Fix dummy_inputs() to dummy_inputs in symbolic_trace doc (#15776)
Add model specific output classes to PoolFormer model docs (#15746)
* Added model specific output classes to poolformer docs
* Fixed Segformer typo in Poolformer docs
Adding the option to return_timestamps on pure CTC ASR models. (#15792)
* Adding the option to return_timestamps on pure CTC ASR models.
* Remove `math.prod` which was introduced in Python 3.8
* int are not floats.
* Reworking the PR to support "char" vs "word" output.
* Fixup!
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Quality.
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
HFTracer.trace should use/return self.graph to be compatible with torch.fx.Tracer (#15824)
Fix tf.concatenate + test past_key_values for TF models (#15774)
* fix wrong method name tf.concatenate
* add tests related to causal LM / decoder
* make style and quality
* clean-up
* Fix TFBertModel's extended_attention_mask when past_key_values is provided
* Fix tests
* fix copies
* More tf.int8 -> tf.int32 in TF test template
* clean-up
* Update TF test template
* revert the previous commit + update the TF test template
* Fix TF template extended_attention_mask when past_key_values is provided
* Fix some styles manually
* clean-up
* Fix ValueError: too many values to unpack in the test
* Fix more: too many values to unpack in the test
* Add a comment for extended_attention_mask when there is past_key_values
* Fix TFElectra extended_attention_mask when past_key_values is provided
* Add tests to other TF models
* Fix for TF Electra test: add prepare_config_and_inputs_for_decoder
* Fix not passing training arg to lm_head in TFRobertaForCausalLM
* Fix tests (with past) for TF Roberta
* add testing for pask_key_values for TFElectra model
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
[examples/summarization and translation] fix readme (#15833)
Add ONNX Runtime quantization for text classification notebook (#15817)
Re-enable doctests for the quicktour (#15828)
* Re-enable doctests for the quicktour
* Re-enable doctests for task_summary (#15830)
* Remove &
Framework split model report (#15825)
Add TFConvNextModel (#15750)
* feat: initial implementation of convnext in tensorflow.
* fix: sample code for the classification model.
* chore: added checked for from the classification model.
* chore: set bias initializer in the classification head.
* chore: updated license terms.
* chore: removed ununsed imports
* feat: enabled argument during using drop_path.
* chore: replaced tf.identity with layers.Activation(linear).
* chore: edited default checkpoint.
* fix: minor bugs in the initializations.
* partial-fix: tf model errors for loading pretrained pt weights.
* partial-fix: call method updated
* partial-fix: cross loading of weights (4x3 variables to be matched)
* chore: removed unneeded comment.
* removed playground.py
* rebasing
* rebasing and removing playground.py.
* fix: renaming TFConvNextStage conv and layer norm layers
* chore: added initializers and other minor additions.
* chore: added initializers and other minor additions.
* add: tests for convnext.
* fix: integration tester class.
* fix: issues mentioned in pr feedback (round 1).
* fix: how output_hidden_states arg is propoagated inside the network.
* feat: handling of arg for pure cnn models.
* chore: added a note on equal contribution in model docs.
* rebasing
* rebasing and removing playground.py.
* feat: encapsulation for the convnext trunk.
* Fix variable naming; Test-related corrections; Run make fixup
* chore: added Joao as a contributor to convnext.
* rebasing
* rebasing and removing playground.py.
* rebasing
* rebasing and removing playground.py.
* chore: corrected copyright year and added comment on NHWC.
* chore: fixed the black version and ran formatting.
* chore: ran make style.
* chore: removed from_pt argument from test, ran make style.
* rebasing
* rebasing and removing playground.py.
* rebasing
* rebasing and removing playground.py.
* fix: tests in the convnext subclass, ran make style.
* rebasing
* rebasing and removing playground.py.
* rebasing
* rebasing and removing playground.py.
* chore: moved convnext test to the correct location
* fix: locations for the test file of convnext.
* fix: convnext tests.
* chore: applied sgugger's suggestion for dealing w/ output_attentions.
* chore: added comments.
* chore: applied updated quality enviornment style.
* chore: applied formatting with quality enviornment.
* chore: revert to the previous tests/test_modeling_common.py.
* chore: revert to the original test_modeling_common.py
* chore: revert to previous states for test_modeling_tf_common.py and modeling_tf_utils.py
* fix: tests for convnext.
* chore: removed output_attentions argument from convnext config.
* chore: revert to the earlier tf utils.
* fix: output shapes of the hidden states
* chore: removed unnecessary comment
* chore: reverting to the right test_modeling_tf_common.py.
* Styling nits
Co-authored-by: ariG23498 <aritra.born2fly@gmail.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
* minor changes
* doc fix in feature extractor
* doc
* typose
* removed detr logic from config
* removed detr logic from config
* removed num_labels
* small fix in the config
* auxilary -> auxiliary
* make style
* some test is failing
* fix a weird char in config prevending doc-builder
* retry to fix the doc-builder issue
* make style
* new try to fix the doc builder
* CI
* change weights to facebook
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: ariG23498 <aritra.born2fly@gmail.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
* Add data2vec model cloned from roberta
* Add checkpoint conversion script
* Fix copies
* Update docs
* Add checkpoint conversion script
* Remove fairseq data2vec_text script and fix format
* Add comment on where to get data2vec_text.py
* Remove mock implementation cheat.py and fix style
* Fix copies
* Remove TF and Flax classes from init
* Add back copy from fairseq data2vec_text.py and fix style
* Update model name in docs/source/index.mdx to be CamelCase
* Revert model name in table to lower-case to get check_table test to pass
* Update src/transformers/models/data2vec/__init__.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/convert_data2vec_original_pytorch_checkpoint_to_pytorch.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update docs/source/model_doc/data2vec.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update docs/source/model_doc/data2vec.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/auto/configuration_auto.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/configuration_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update tests/test_modeling_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/configuration_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update documentation
* Copy-paste Data2VecConfig from BertConfig
* Update config checkpoint to point to edugp/data2vec-nlp-base. Fix style and repo-consistency
* Update config special tokens to match RoBERTa
* Split multiple assertions and add individual error messages
* Rename Data2VecModel to Data2VecForTextModel
* Add Data2Vec to _toctree.yml
* Rename Data2VecEmbeddings to Data2VecForTextEmbeddings
* Add initial Data2VecForAudio model (unfinished). Only matching fairseq's implementation up to the feature encoder (before positional encoding).
* finish audio model
* finish audio file
* Update names and fix style, quality and repo consistency
* Remove Data2VecAudioForPretraining. Add tests for Data2VecAudio, mimicking the Wav2Vec2 test suite. Fix bias initilization in positional conv layers. Move back configurations for audio and text to separate files.
* add inputs to logits to data2vec'
* correct autio models
* correct config auto
* correct tok auto
* Update utils/tests_fetcher.py
* delete unnecessary files
* delete unnecessary files
* further renaming
* make all tests pass
* finish
* remove useless test file
* Update tests/test_modeling_common.py
* Update utils/check_repo.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/modeling_data2vec_text.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Fix copies
* Update docs
* Remove fairseq data2vec_text script and fix format
* Add comment on where to get data2vec_text.py
* Remove mock implementation cheat.py and fix style
* Fix copies
* Remove TF and Flax classes from init
* Add back copy from fairseq data2vec_text.py and fix style
* Update model name in docs/source/index.mdx to be CamelCase
* Revert model name in table to lower-case to get check_table test to pass
* Update documentation
* Update src/transformers/models/data2vec/__init__.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/convert_data2vec_original_pytorch_checkpoint_to_pytorch.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/auto/configuration_auto.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/configuration_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update tests/test_modeling_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/configuration_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Copy-paste Data2VecConfig from BertConfig
* Update config checkpoint to point to edugp/data2vec-nlp-base. Fix style and repo-consistency
* Update config special tokens to match RoBERTa
* Split multiple assertions and add individual error messages
* Rename Data2VecModel to Data2VecForTextModel
* Add Data2Vec to _toctree.yml
* Rename Data2VecEmbeddings to Data2VecForTextEmbeddings
* Add initial Data2VecForAudio model (unfinished). Only matching fairseq's implementation up to the feature encoder (before positional encoding).
* finish audio model
* finish audio file
* add inputs to logits to data2vec'
* Update names and fix style, quality and repo consistency
* Remove Data2VecAudioForPretraining. Add tests for Data2VecAudio, mimicking the Wav2Vec2 test suite. Fix bias initilization in positional conv layers. Move back configurations for audio and text to separate files.
* correct autio models
* correct config auto
* correct tok auto
* delete unnecessary files
* delete unnecessary files
* Update utils/tests_fetcher.py
* further renaming
* make all tests pass
* finish
* remove useless test file
* Update tests/test_modeling_common.py
* Update utils/check_repo.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/modeling_data2vec_text.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Move data2vec tests to new structure
* Fix test imports for text tests
* Remove fairseq files
* Change paper link to arxiv
* Modify Data2Vec documentation to reflect that the encoder is not shared across the audio and text models in the current implementation.
* Update text model checkpoint to be facebook/data2vec-text-base
* Add 'Copy from' statements and update paper links and docs
* fix copy from statements
* improve copied from
* correct more copied from statements
* finish copied from stuff
* make style
* add model to README
* add to master
Co-authored-by: Eduardo Gonzalez Ponferrada <eduardo@ferrumhealth.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* rebase
* Delete shift tokens func
* downsample decoder input seq len for init
* correct attention mask
* add tests
* pt flax cross test
* make fixup
* init file for import
* change pt-flax cross test threshold
* pt-flax test logits only
* move tests
* make repo-consistency
* consistent indentation
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* First commit
* Add conversion script
* Make conversion script work for base model
* More improvements
* Update conversion script, works for vqa
* Add indexing argument to meshgrid
* Make conversion script work for ViltForPreTraining
* Add ViltForPreTraining to docs
* Fix device issue
* Add processor
* Add MinMaxResize to feature extractor
* Implement call method of ViltProcessor
* Fix tests
* Add integration test
* Add loss calculation for VQA
* Improve tests
* Improve some more tests
* Debug tests
* Small improvements
* Add support for attention_mask
* Remove mask_it
* Add pixel_mask
* Add tests for ViltFeatureExtractor
* Improve tests
* Add ViltForNaturalLanguageVisualReasoning
* Add ViltForNaturalLanguageVisualReasoning to conversion script
* Minor fixes
* Add support for image_embeds, update docstrings to markdown
* Update docs to markdown
* Improve conversion script
* Rename ViltForPreTraining to ViltForMaskedLM
* Improve conversion script
* Convert docstrings to markdown
* Fix code example of retrieval model
* Properly convert masked language model
* Add integration test for nlvr
* Fix code quality
* Apply suggestions from code review
* Add copied from statements
* Fix pretrained_config_archive_map
* Fix docs
* Add model to README
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply more suggestions from code review
* Make code more readable
* Add ViltForNaturalLanguageVisualReasoning to the tests
* Rename ViltForVisualQuestionAnswering to ViltForQuestionAnswering
* Replace pixel_values_2 by single tensor
* Add hidden_states and attentions
* Fix one more test
* Fix all tests
* Update year
* Fix rebase issues
* Fix another rebase issue
* Remove ViltForPreTraining from auto mapping
* Rename ViltForImageRetrievalTextRetrieval to ViltForImageAndTextRetrieval
* Make it possible to use BertTokenizerFast in the processor
* Use BertTokenizerFast by default
* Rename ViltForNaturalLanguageVisualReasoning, define custom model output
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Start the work on TFVisionEncoderDecoderModel
* Expose TFVisionEncoderDecoderModel
* fix import
* Add modeling_tf_vision_encoder_decoder to _ignore_modules in get_model_modules()
* reorder
* Apply the fix for checkpoint loading as in #14016
* remove attention_mask + fix VISION_DUMMY_INPUTS
* A minimal change to make TF generate() work for vision models as encoder in encoder-decoder setting
* fix wrong condition: shape_list(input_ids) == 2
* add tests
* use personal TFViTModel checkpoint (for now)
* Add equivalence tests + projection layer
* style
* make sure projection layer can run
* Add examples
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Clean comments (need to work on TODOs for PyTorch models)
* Remove TF -> PT in check_pt_tf_equivalence for TFVisionEncoderDecoderModel
* fixes
* Revert changes in PT code.
* Update tests/test_modeling_tf_vision_encoder_decoder.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Add test_inference_coco_en for TF test
* fix quality
* fix name
* build doc
* add main_input_name
* Fix ckpt name in test
* fix diff between master and this PR
* fix doc
* fix style and quality
* fix missing doc
* fix labels handling
* Delete auto.rst
* Add the changes done in #14016
* fix prefix
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* make style
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Convert docstrings of all configurations and tokenizers
* Processors and fixes
* Last modeling files and fixes to models
* Pipeline modules
* Utils files
* Data submodule
* All the other files
* Style
* Missing examples
* Style again
* Fix copies
* Say bye bye to rst docstrings forever
* First draft
* Style and remove mlm
* Make forward pass work
* More improvements
* More improvements
* Fix bug
* More improvements
* More improvements
* Add PerceiverTokenizer first draft
* Improve conversion script
* More improvements
* Make conversion script work for the encoder
* Make conversion script work with local pickle files
* Style & quality, fix-copies
* Add dummy input to conversion script
* Add absolute position embeddings to TextPreProcessor
* Make forward pass of encoder work
* More improvements
* Move text preprocessor to separate script
* More improvements
* More improvements
* Add post processor
* Make MLM model work
* Style
* Add PerceiverForMaskedLM
* Add PerceiverImagePreprocessor
* Make style
* Make PerceiverForImageClassification work
* More improvements
* More improvements
* Use tokenizer in conversion script
* Use PerceiverForMaskedLM in conversion script
* Define custom PerceiverModelOutput
* Improve PerceiverAttention to make it work for both MLM and image classification
* More improvements
* More improvements
* More improvements to the conversion script
* Make conversion script work for both MLM and image classification
* Add PerceiverFeatureExtractor
* More improvements
* Style and quality
* Add center cropping
* Fix bug
* Small fix
* Add print statement
* Fix bug in image preprocessor
* Fix bug with conversion script
* Make output position embeddings an nn.Parameter layer instead of nn.Embedding
* Comment out print statements
* Add position encoding classes
* More improvements
* Use position_encoding_kwargs
* Add PerceiverForImageClassificationFourier
* Make style & quality
* Add PerceiverForImageClassificationConvProcessing
* Style & quality
* Add flow model
* Move processors to modeling file
* Make position encodings modular
* Make basic decoder use modular position encodings
* Add PerceiverForOpticalFlow to conversion script
* Add AudioPreprocessor
* Make it possible for the basic decoder to use Fourier position embeddings
* Add PerceiverForMultimodalAutoencoding
* Improve model for optical flow
* Improve _build_network_inputs method
* Add print statement
* Fix device issue
* Fix device of Fourier embeddings
* Add print statements for debugging
* Add another print statement
* Add another print statement
* Add another print statement
* Add another print statement
* Improve PerceiverAudioPreprocessor
* Improve conversion script for multimodal modal
* More improvements
* More improvements
* Improve multimodal model
* Make forward pass multimodal model work
* More improvements
* Improve tests
* Fix some more tests
* Add output dataclasses
* Make more tests pass
* Add print statements for debuggin
* Add tests for image classification
* Add PerceiverClassifierOutput
* More improvements
* Make more tests pass for the optical flow model
* Make style & quality
* Small improvements
* Don't support training for optical flow model for now
* Fix _prepare_for_class for tests
* Make more tests pass, add some docs
* Add multimodal model to tests
* Minor fixes
* Fix tests
* Improve conversion script
* Make fixup
* Remove pos_dim argument
* Fix device issue
* Potential fix for OOM
* Revert previous commit
* Fix test_initialization
* Add print statements for debugging
* Fix print statement
* Add print statement
* Add print statement
* Add print statement
* Add print statement
* Add print statement
* Add print statement
* Remove need for output_shape
* Comment out output_shape
* Remove unnecessary code
* Improve docs
* Fix make fixup
* Remove PerceiverTextProcessor from init
* Improve docs
* Small improvement
* Apply first batch of suggestions from code review
* Apply more suggestions from code review
* Update docstrings
* Define dicts beforehand for readability
* Rename task to architecture in conversion script, include PerceiverModel in tests
* Add print statements for debugging
* Fix tests on GPU
* Remove preprocessors, postprocessors and decoders from main init
* Add integration test
* Fix docs
* Replace einops by torch
* Update for new docs frontend
* Rename PerceiverForImageClassification
* Improve docs
* Improve docs
* Improve docs of PerceiverModel
* Fix some more tests
* Improve center_crop
* Add PerceiverForSequenceClassification
* Small improvements
* Fix tests
* Add integration test for optical flow model
* Clean up
* Add tests for tokenizer
* Fix tokenizer by adding special tokens properly
* Fix CI
* implement MLukeTokenizer and LukeForMaskedLM
* update tests
* update docs
* add LukeForMaskedLM to check_repo.py
* update README
* fix test and specify the entity pad id in tokenization_(m)luke
* fix EntityPredictionHeadTransform
* Add first draft
* Make forward pass work
* Improve conversion script
* Add notebook that checks if it works
* Add BeitForSemanticSegmentation to the tests
* More improvements
* Make BeitForSemanticSegmentation consistent with Segformer
* Small bug fix
* Add BeitForSemanticSegmentation to docs
* Make sure model doesn't output hidden states when the user doesn't want to
* Make it possible to convert the large model
* Fix issue
* Fix conversion script for large model
* Add auxiliary_head option to semantic segmentation model
* Apply suggestions from @sgugger's review
* Apply suggestions from code review
* Fix failing test
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* First draft
* Make style & quality
* Improve conversion script
* Add print statement to see actual slice
* Make absolute tolerance smaller
* Fix image classification models
* Add post_process_semantic method
* Disable padding
* Improve conversion script
* Rename to ForSemanticSegmentation, add integration test, remove post_process methods
* Improve docs
* Fix code quality
* Fix feature extractor tests
* Fix tests for image classification model
* Delete file
* Add is_torch_available to feature extractor
* Improve documentation of feature extractor methods
* Apply suggestions from @sgugger's code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply some more suggestions of code review
* Rebase with master
* Fix rebase issues
* Make sure model only outputs hidden states when the user wants to
* Apply suggestions from code review
* Add pad method
* Support padding of 2d images
* Add print statement
* Add print statement
* Move padding method to SegformerFeatureExtractor
* Fix issue
* Add casting of segmentation maps
* Add test for padding
* Add small note about padding
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add cross attentions to TFGPT2Model
* Add TFEncoderDecoderModel
* Add TFBaseModelOutputWithPoolingAndCrossAttentions
* Add cross attentions to TFBertModel
* Fix past or past_key_values argument issue
* Fix generation
* Fix save and load
* Add some checks and comments
* Clean the code that deals with past keys/values
* Add kwargs to processing_inputs
* Add serving_output to TFEncoderDecoderModel
* Some cleaning + fix use_cache value issue
* Fix tests + add bert2bert/bert2gpt2 tests
* Fix more tests
* Ignore crossattention.bias when loading GPT2 weights into TFGPT2
* Fix return_dict_in_generate in tf generation
* Fix is_token_logit_eos_token bug in tf generation
* Finalize the tests after fixing some bugs
* Fix another is_token_logit_eos_token bug in tf generation
* Add/Update docs
* Add TFBertEncoderDecoderModelTest
* Clean test script
* Add TFEncoderDecoderModel to the library
* Add cross attentions to TFRobertaModel
* Add TFRobertaEncoderDecoderModelTest
* make style
* Change the way of position_ids computation
* bug fix
* Fix copies in tf_albert
* Remove some copied from and apply some fix-copies
* Remove some copied
* Add cross attentions to some other TF models
* Remove encoder_hidden_states from TFLayoutLMModel.call for now
* Make style
* Fix TFRemBertForCausalLM
* Revert the change to longformer + Remove copies
* Revert the change to albert and convbert + Remove copies
* make quality
* make style
* Add TFRembertEncoderDecoderModelTest
* make quality and fix-copies
* test TFRobertaForCausalLM
* Fixes for failed tests
* Fixes for failed tests
* fix more tests
* Fixes for failed tests
* Fix Auto mapping order
* Fix TFRemBertEncoder return value
* fix tf_rembert
* Check copies are OK
* Fix missing TFBaseModelOutputWithPastAndCrossAttentions is not defined
* Add TFEncoderDecoderModelSaveLoadTests
* fix tf weight loading
* check the change of use_cache
* Revert the change
* Add missing test_for_causal_lm for TFRobertaModelTest
* Try cleaning past
* fix _reorder_cache
* Revert some files to original versions
* Keep as many copies as possible
* Apply suggested changes - Use raise ValueError instead of assert
* Move import to top
* Fix wrong require_torch
* Replace more assert by raise ValueError
* Add test_pt_tf_model_equivalence (the test won't pass for now)
* add test for loading/saving
* finish
* finish
* Remove test_pt_tf_model_equivalence
* Update tf modeling template
* Remove pooling, added in the prev. commit, from MainLayer
* Update tf modeling test template
* Move inputs["use_cache"] = False to modeling_tf_utils.py
* Fix torch.Tensor in the comment
* fix use_cache
* Fix missing use_cache in ElectraConfig
* Add a note to from_pretrained
* Fix style
* Change test_encoder_decoder_save_load_from_encoder_decoder_from_pt
* Fix TFMLP (in TFGPT2) activation issue
* Fix None past_key_values value in serving_output
* Don't call get_encoderdecoder_model in TFEncoderDecoderModelTest.test_configuration_tie until we have a TF checkpoint on Hub
* Apply review suggestions - style for cross_attns in serving_output
* Apply review suggestions - change assert + docstrings
* break the error message to respect the char limit
* deprecate the argument past
* fix docstring style
* Update the encoder-decoder rst file
* fix Unknown interpreted text role "method"
* fix typo
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* beit-flax
* updated FLAX_BEIT_MLM_DOCSTRING
* removed bool_masked_pos from classification
* updated Copyright
* code refactoring: x -> embeddings
* updated test: rm from_pt
* Update docs/source/model_doc/beit.rst
* model code dtype updates and
other changes according to review
* relative_position_bias
revert back to pytorch design
* Add the audio classification pipeline
* Remove autoconfig exception
* Mark ffmpeg test as slow
* Rearrange pipeline tests
* Add small test
* Replace asserts with ValueError
* Add hubert classifier + tests
* Add hubert classifier + tests
* Dummies for all classification tests
* Wav2Vec2 classifier + ER test
* Fix hubert integration tests
* Add hubert IC
* Pass tests for all classification tasks on Hubert
* Pass all tests + copies
* Move models to the SUPERB org
* make flax gpt2 working with cross attention
* Remove encoder->decoder projection layer
* A draft (incomplete) for FlaxEncoderDecoderModel
* Add the method from_encoder_decoder_pretrained + the docstrings
* Fix the mistakes of using EncoderDecoderModel
* Fix style
* Add FlaxEncoderDecoderModel to the library
* Fix cyclic imports
* Add FlaxEncoderDecoderModel to modeling_flax_auto.py
* Remove question comments
* add tests for FlaxEncoderDecoderModel
* add flax_encoder_decoder to the lists of ignored entries in check_repo.py
* fix missing required positional arguments
* Remove **kwargs when creating FlaxEncoderDecoderModel in from_encoder_decoder_pretrained()
Also fix generation eos/pad tokens issue
* Fix: Use sequences from the generated_output
* Change a check from assert to raise ValueError
* Fix examples and token ids issues
* Fix missing all_cross_attentions when outputting tuple in modeling_gpt2
* Remove the changes in configuration docstrings.
* allow for bert 2 gpt2
* make fix-copies
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Change remaining examples to bert2gpt2
* Change the test to Bert2GPT2
* Fix examples
* Fix import
* Fix unpack bug
* Rename to FlaxEncoderDecoderModelTest and change the test to bert2gpt2
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Fix: NotImplentedError -> NotImplementedError
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* up
* finalize
Co-authored-by: ydshieh <ydshieh@user.noreply>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Initial work
* All auto models
* All tf auto models
* All flax auto models
* Tokenizers
* Add feature extractors
* Fix typos
* Fix other typo
* Use the right config
* Remove old mapping names and update logic in AutoTokenizer
* Update check_table
* Fix copies and check_repo script
* Fix last test
* Add back name
* clean up
* Update template
* Update template
* Forgot a )
* Use alternative to fixup
* Fix TF model template
* Address review comments
* Address review comments
* Style
* First pass
* Make conversion script work
* Improve conversion script
* Fix bug, conversion script working
* Improve conversion script, implement BEiTFeatureExtractor
* Make conversion script work based on URL
* Improve conversion script
* Add tests, add documentation
* Fix bug in conversion script
* Fix another bug
* Add support for converting masked image modeling model
* Add support for converting masked image modeling
* Fix bug
* Add print statement for debugging
* Fix another bug
* Make conversion script finally work for masked image modeling models
* Move id2label for datasets to JSON files on the hub
* Make sure id's are read in as integers
* Add integration tests
* Make style & quality
* Fix test, add BEiT to README
* Apply suggestions from @sgugger's review
* Apply suggestions from code review
* Make quality
* Replace nielsr by microsoft in tests, add docs
* Rename BEiT to Beit
* Minor fix
* Fix docs of BeitForMaskedImageModeling
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [WIP] Add TFWav2Vec2Model
Work in progress for adding a tensorflow version of Wav2Vec2
* feedback changes
* small fix
* Test Feedback Round 1
* Add SpecAugment and CTC Loss
* correct spec augment mask creation
* docstring and correct copyright
* correct bugs
* remove bogus file
* finish tests correction
* del unnecessary layers
* Update src/transformers/models/wav2vec2/modeling_tf_wav2vec2.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* make style
* correct final bug
* Feedback Changes
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Squash all commits of modeling_detr_v7 branch into one
* Improve docs
* Fix tests
* Style
* Improve docs some more and fix most tests
* Fix slow tests of ViT, DeiT and DETR
* Improve replacement of batch norm
* Restructure timm backbone forward
* Make DetrForSegmentation support any timm backbone
* Fix name of output
* Address most comments by @LysandreJik
* Give better names for variables
* Conditional imports + timm in setup.py
* Address additional comments by @sgugger
* Make style, add require_timm and require_vision to testsé
* Remove train_backbone attribute of DetrConfig, add methods to freeze/unfreeze backbone
* Add png files to fixtures
* Fix type hint
* Add timm to workflows
* Add `BatchNorm2d` to the weight initialization
* Fix retain_grad test
* Replace model checkpoints by Facebook namespace
* Fix name of checkpoint in test
* Add user-friendly message when scipy is not available
* Address most comments by @patrickvonplaten
* Remove return_intermediate_layers attribute of DetrConfig and simplify Joiner
* Better initialization
* Scipy is necessary to get sklearn metrics
* Rename TimmBackbone to DetrTimmConvEncoder and rename DetrJoiner to DetrConvModel
* Make style
* Improve docs and add 2 community notebooks
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* Make quality scripts work when one backend is missing.
* Check env variable is properly set
* Add default
* With print statements
* Fix typo
* Set env variable
* Remove debug code
* Rebase with master
* Minor bug fix in docs
* Copy files from adding_luke_v2 and improve docs
* change the default value of use_entity_aware_attention to True
* remove word_hidden_states
* fix head models
* fix tests
* fix the conversion script
* add integration tests for the pretrained large model
* improve docstring
* Improve docs, make style
* fix _init_weights for pytorch 1.8
* improve docs
* fix tokenizer to construct entity sequence with [MASK] entity when entities=None
* Make fix-copies
* Make style & quality
* Bug fixes
* Add LukeTokenizer to init
* Address most comments by @patil-suraj and @LysandreJik
* rename _compute_extended_attention_mask to get_extended_attention_mask
* add comments to LukeSelfAttention
* fix the documentation of the tokenizer
* address comments by @patil-suraj, @LysandreJik, and @sgugger
* improve docs
* Make style, quality and fix-copies
* Improve docs
* fix docs
* add "entity_span_classification" task
* update example code for LukeForEntitySpanClassification
* improve docs
* improve docs
* improve the code example in luke.rst
* rename the classification layer in LukeForEntityClassification from typing to classifier
* add bias to the classifier in LukeForEntitySpanClassification
* update docs to use fine-tuned hub models in code examples of the head models
* update the example sentences
* Make style & quality
* Add require_torch to tokenizer tests
* Add require_torch to tokenizer tests
* Address comments by @sgugger and add community notebooks
* Make fix-copies
Co-authored-by: Ikuya Yamada <ikuya@ikuya.net>
* Create modeling_tf_dpr.py
* Add TFDPR
* Add back TFPegasus, TFMarian, TFMBart, TFBlenderBot
last commit accidentally deleted these 4 lines, so I recover them back
* Add TFDPR
* Add TFDPR
* clean up some comments, add TF input-style doc string
* Add TFDPR
* Make return_dict=False as default
* Fix return_dict bug (in .from_pretrained)
* Add get_input_embeddings()
* Create test_modeling_tf_dpr.py
The current version is already passed all 27 tests!
Please see the test run at :
https://colab.research.google.com/drive/1czS_m9zy5k-iSJbzA_DP1k1xAAC_sdkf?usp=sharing
* fix quality
* delete init weights
* run fix copies
* fix repo consis
* del config_class, load_tf_weights
They shoud be 'pytorch only'
* add config_class back
after removing it, test failed ... so totally only removing "use_tf_weights = None" on Lysandre suggestion
* newline after .. note::
* import tf, np (Necessary for ModelIntegrationTest)
* slow_test from_pretrained with from_pt=True
At the moment we don't have TF weights (since we don't have official official TF model)
Previously, I did not run slow test, so I missed this bug
* Add simple TFDPRModelIntegrationTest
Note that this is just a test that TF and Pytorch gives approx. the same output.
However, I could not test with the official DPR repo's output yet
* upload correct tf model
* remove position_ids as missing keys
* create modeling_tf_rag
* add tests for tf
* add tf tests
* revert wrong pt commit
* further refactor
* further refactor
* refactor
* Update modeling_tf_rag.py
- input_processing
- fix prepare_input_for_generation (mostly fix generate bug)
- bring back from_pretrained hack in order to test generate
* delete colab pieces of code
* Show case of greedy "generate"
Temporarily change from beam_search test to greedy_search test to show case that TF and PT do get equivalent output.
* cosmetic update
* correct typos
* update
* push some progress
* make easy check
* fix rag save from pretrained
* Update src/transformers/modeling_tf_utils.py
* remove commented out lines
* delete unnecessary lines
* add simple test case for nq_checkpoint
Add nq_checkpoint test to show that current version without hack still fails
* temporarily put ugly hack back again
* Add TFRagSequenceForGeneration!!
* __init__.py , import TFRagSequenceForGeneration
* Add TFRagSequence tests!
* rag init.py - add TFRagSequenceForGeneration
* fix from_pretrained
* fix prepare_inputs_for_generation
* Beam search for RagToken!
* minor clean up
* add tf.cast in TFRagModel
* More tf.cast
* Add all remaining tests (still have issues)
* delete all T5 related
* make style
* fix load weight prefix
* fix bart
* fix return_dict for tf_rag
make all tests pass .. Hooray
* fix some tests
* fix code quality
* fix qualtiy check
* finish tests tf rag
* add tf rag to docs
* remove TFT5 from docstring
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* remove TFT5 from docstring
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Delete outdated comments
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* improve doc strings
* add generative model classes
* fix adjust token logic
* refactor generate for TFRag
* using shape_list, not _get_shape
Co-authored-by: Julien Plu <plu.julien@gmail.com>
* axis=[1]->axis=1
* delete NEED_HELP comment
* improve readability
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* improve readability
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* improve readability
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Indicating model is in a developing state in docstrings
As suggested by Julien
* small last changes
* apply sylvains suggestions
* finish tf rag
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: patrickvonplaten <patrick@huggingface.co>
Co-authored-by: Julien Plu <plu.julien@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Main init work
* Add version
* Change from absolute to relative imports
* Fix imports
* One more typo
* More typos
* Styling
* Make quality script pass
* Add necessary replace in template
* Fix typos
* Spaces are ignored in replace for some reason
* Forgot one models.
* Fixes for import
Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
* Add documentation
* Styling
Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
* first commit
* change phobert to phoBERT as per author in overview
* v3 and v4 both runs on same code hence there is no need to differentiate them
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* create model
* add integration
* save current state
* make integration tests pass
* add one more test
* add explanation to tests
* remove from bart
* add padding
* remove unnecessary test
* make all tests pass
* re-add cookie cutter tests
* finish PyTorch
* fix attention test
* Update tests/test_modeling_common.py
* revert change
* remove unused file
* add string to doc
* save intermediate
* make tf integration tests pass
* finish tf
* fix doc
* fix docs again
* add led to doctree
* add to auto tokenizer
* added tips for led
* make style
* apply jplus statements
* correct tf longformer
* apply lysandres suggestions
* apply sylvains suggestions
* Apply suggestions from code review
* remove make on the fly linear embedding
* start refactor
* big first refactor
* save intermediate
* save intermediat
* correct mask issue
* save tests
* refactor padding masks
* make all tests pass
* further refactor
* make pegasus test pass
* fix bool if
* fix leftover tests
* continue
* bart renaming
* delete torchscript test hack
* fix imports in tests
* correct shift
* fix docs and repo cons
* re-add fix for FSTM
* typo in test
* fix typo
* fix another typo
* continue
* hot fix 2 for tf
* small fixes
* refactor types linting
* continue
* finish refactor
* fix import in tests
* better bart names
* further refactor and add test
* delete hack
* apply sylvains and lysandres commens
* small perf improv
* further perf improv
* improv perf
* fix typo
* make style
* small perf improv