* Add test for SentencePiece not adding special tokens to strings
* Add SentencePieceStringConversionMixin to fix issue 15003
* Fix conversion from tokens to string for most SentencePiece tokenizers
Tokenizers fixed:
- AlbertTokenizer
- BarthezTokenizer
- CamembertTokenizer
- FNetTokenizer
- M2M100Tokenizer
- MBart50Tokenizer
- PegasusTokenizer
- Speech2TextTokenizer
* Fix MarianTokenizer, adjust SentencePiece test to accomodate vocab
* Fix DebertaV2Tokenizer
* Ignore LayoutXLMTokenizer in SentencePiece string conversion test
* Run 'make style' and 'make quality'
* Clean convert_tokens_to_string test
Instead of explicitly ignoring LayoutXLMTokenizer in the test,
override the test in LayoutLMTokenizationTest and do nothing in it.
* Remove commented out code
* Improve robustness of convert_tokens_to_string test
Instead of comparing lengths of re-tokenized text and input_ids,
check that converting all special tokens to string yields a string
with all special tokens.
* Inline and remove SentencePieceStringConversionMixin
The convert_tokens_to_string method is now implemented
in each relevant SentencePiece tokenizer.
* Run 'make style' and 'make quality'
* Revert removal of space in convert_tokens_to_string
* Remove redundant import
* Revert test text to original
* Uncomment the lowercasing of the reverse_text variable
* Mimic Rust tokenizer behavior for tokenizers
- Albert
- Barthez
- Camembert
- MBart50
- T5
* Fix accidentally skipping test in wrong tokenizer
* Add test for equivalent Rust and slow tokenizer behavior
* Override _decode in BigBirdTokenizer to mimic Rust behavior
* Override _decode in FNetTokenizer to mimic Rust behavior
* Override _decode in XLNetTokenizer to mimic Rust behavior
* Remove unused 're' import
* Update DebertaV2Tokenizer to mimic Rust tokenizer
* Deberta tokenizer now behaves like Albert and its `convert_tokens_to_string` is not tested.
* Ignore problematic tests in Deberta V2
* Add comment on why the Deberta V2 tests are skipped
* initial commit
* First draft that gets outputs without crashing!
* Add all the ported openfold dependencies
* testing
* Restructure config files for ESMFold
* Debugging to find output discrepancies
* Mainly style
* Make model runnable without extra deps
* Remove utils and merge them to the modeling file
* Use correct gelu and remove some debug prints
* More cleanup
* Update esm docs
* Update conversion script to support ESMFold properly
* Port some top-level changes from ESMFold repo
* Expand EsmFold docstrings
* Make attention_mask optional (default to all 1s)
* Add inference test for ESMFold
* Use config and not n kwargs
* Add modeling output class
* Remove einops
* Remove chunking in ESM FFN
* Update tests for ESMFold
* Quality
* REpo consistency
* Remove tree dependency from ESMFold
* make fixup
* Add an error in case my structure map function breaks later
* Remove needless code
* Stop auto-casting the LM to float16 so CPU tests pass
* Stop auto-casting the LM to float16 so CPU tests pass
* Final test updates
* Split test file
* Copyright and quality
* Unpin PyTorch to see built doc
* Fix config file to_dict() method
* Add some docstrings to the output
* Skip TF checkpoint tests for ESM until we reupload those
* make fixup
* More docstrings
* Unpin to get even with main
* Flag example to write
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
* Support segformer fx
* Add fx_compatible attribute to test_modeling_segformer.py
* Update glpn model (fx support)
glpn model was copied from segformer.
* Update utils/fx.py | add semantic-segmentation
for SegformerForSemanticSegmentation model
* Fix minor import order(isort)
* Add random input generation for segformer fx
Co-authored-by: noelbird <lduldu00228@gmail.com>
* support sentencepiece for bertjapanesetokenizer
* add test vocab file for sentencepiece, bertjapanesetokenizer
* make BasicTokenizer be identical to transformers.models.bert.tokenization_bert.BasicTokenizer
* fix missing of \n in comment
* fix init argument missing in tests
* make spm_file be optional, exclude spiece.model from tests/fixtures, and add description comments
* make comment length less than 119
* apply doc style check
* First step of PT->TF for composite models
* Update the tests
* For VisionEncoderDecoderModel
* Fix
* Fix
* Add comment
* Fix
* clean up import
* Save memory
* For (TF)EncoderDecoderModel
* For (TF)EncoderDecoderModel
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Clean up deprecation warnings
Notes:
Changed some strings in tests to raw strings, which will change the literal content of the strings as they are fed into whatever machine handles them.
Test cases for past in the past/past_key_values switch changed/removed due to warning of impending removal
* Add PILImageResampling abstraction for PIL.Image.Resampling
* Partial TF port for ESM model
* Add ESM-TF tests
* Add the various imports for TF-ESM
* TF weight conversion almost ready
* Stop ignoring the decoder weights in PT
* Add tests and lots of fixes
* fix-copies
* Fix imports, add model docs
* Add get_vocab() to tokenizer
* Fix vocab links for pretrained files
* Allow multiple inputs with a sep
* Use EOS as SEP token because ESM vocab lacks SEP
* Correctly return special tokens mask from ESM tokenizer
* make fixup
* Stop testing unsupported embedding resizing
* Handle TF bias correctly
* Skip all models with slow tokenizers in the token classification test
* Fixing the batch/unbatcher of pipelines to accomodate the `None` being
passed around.
* Fixing pipeline bug caused by slow tokenizer being different.
* Update src/transformers/models/esm/modeling_tf_esm.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/models/esm/modeling_tf_esm.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/models/esm/modeling_tf_esm.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update set_input_embeddings and the copyright notices
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* add suport for non fast tf bert tokenizer
* add tests for non fast tf bert tokenizer
* fix fast bert tf tokenizer flag
* double tokenizers list on tf tokenizers test to aovid breaking zip on test output equivalence
* reformat code with black to comply with code quality checks
* trigger ci
* return None to avoid recursive call
* Give error
* Give error
* Add test
* More tests
* Quality
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* First draft
* Fix more things
* Improve more things
* Remove some head models
* Fix more things
* Add missing layers
* Remove tokenizer
* Fix more things
* Fix copied from statements
* Make all tests pass
* Remove print statements
* Remove files
* Fix README and docs
* Add integration test and fix organization
* Add tips
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Make tests faster, improve docs
* Fix doc tests
* Add model to toctree
* Add docs
* Add note about creating new checkpoint
* Remove is_decoder
* Make tests smaller, add docs
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* implemented TFCvtModel and TFCvtForImageClassification and modified relevant files, added an exception in convert_tf_weight_name_to_pt_weight_name, added quick testing file to compare with pytorch model
* added docstring + testing file in transformers testing suite
* added test in testing file, modified docs to pass repo-consistency, passed formatting test
* refactoring + passing all test
* small refacto, removing unwanted comments
* improved testing config
* corrected import error
* modified acces to pretrained model archive list, to pass tf_test
* corrected import structure in init files
* modified testing for keras_fit with cpu
* correcting PR issues + Refactoring
* Refactoring : improving readability and reducing the number of permutations
* corrected momentum value + cls_token initialization
* removed from_pt as weights were added to the hub
* Update tests/models/cvt/test_modeling_tf_cvt.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* added test
* correct embedding init
* some changes in blenderbot (incomplete)
* update blenderbot (diff to be used as reference)
* update blenderbot_small
* update LED
* update marian
* update T5 and remove TFWrappedEmbeddings
* nullcontext() -> ContextManagers()
* fix embedding init
* Add `OPTForQuestionAnswering`
- added `OPTForQuestionAnswering` class based on `BloomForQuestionAnswering`
- added `OPTForQuestionAnswering` in common tests
- all common tests pass
- make fixup done
* added docstrings for OPTForQuestionAnswering
* Fix docstrings for OPTForQuestionAnswering
Ensures post_process_instance_segmentation and post_process_panoptic_segmentation methods return a tensor of shape (target_height, target_width) filled with -1 values if no segment with score > threshold is found.
* add sudachipy and jumanpp tokenizers for bert_japanese
* use ImportError instead of ModuleNotFoundError in SudachiTokenizer and JumanppTokenizer
* put test cases of test_tokenization_bert_japanese in one line
* add require_sudachi and require_jumanpp decorator for testing
* add sudachi and pyknp(jumanpp) to dependencies
* remove sudachi_dict_small and sudachi_dict_full from dependencies
* empty commit for ci
- Improves MaskFormer docs, corrects minor typos
- Restructures MaskFormerFeatureExtractor.post_process_panoptic_segmentation for better readability, adds target_sizes argument for optional resizing
- Adds post_process_semantic_segmentation and post_process_instance_segmentation methods.
- Adds a deprecation warning to post_process_segmentation method in favour of post_process_instance_segmentation
* add bloom for question answering
- attempt to add Bloom for question answering
- adapted from `GPTJForQuestionAnswering`
- Fixed `num_labels` to `2` for common tests
- Added a bit of docstring
- All common tests pass
* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* revert changes related to `num_labels`
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Rebase ESM PR and update all file formats
* Fix test relative imports
* Add __init__.py to the test dir
* Disable gradient checkpointing
* Remove references to TFESM... FOR NOW >:|
* Remove completed TODOs from tests
* Convert docstrings to mdx, fix-copies from BERT
* fix-copies for the README and index
* Update ESM's __init__.py to the modern format
* Add to _toctree.yml
* Ensure we correctly copy the pad_token_id from the original ESM model
* Ensure we correctly copy the pad_token_id from the original ESM model
* Tiny grammar nitpicks
* Make the layer norm after embeddings an optional flag
* Make the layer norm after embeddings an optional flag
* Update the conversion script to handle other model classes
* Remove token_type_ids entirely, fix attention_masking and add checks to convert_esm.py
* Break the copied from link from BertModel.forward to remove token_type_ids
* Remove debug array saves
* Begin ESM-2 porting
* Add a hacky workaround for the precision issue in original repo
* Code cleanup
* Remove unused checkpoint conversion code
* Remove unused checkpoint conversion code
* Fix copyright notices
* Get rid of all references to the TF weights conversion
* Remove token_type_ids from the tests
* Fix test code
* Update src/transformers/__init__.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/__init__.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update README.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add credit
* Remove _ args and __ kwargs in rotary embedding
* Assertively remove asserts
* Replace einsum with torch.outer()
* Fix docstring formatting
* Remove assertions in tokenization
* Add paper citation to ESMModel docstring
* Move vocab list to single line
* Remove ESMLayer from init
* Add Facebook copyrights
* Clean up RotaryEmbedding docstring
* Fix docstring formatting
* Fix docstring for config object
* Add explanation for new config methods
* make fix-copies
* Rename all the ESM- classes to Esm-
* Update conversion script to allow pushing to hub
* Update tests to point at my repo for now
* Set config properly for tests
* Remove the gross hack that forced loss of precision in inv_freq and instead copy the data from the model being converted
* make fixup
* Update expected values for slow tests
* make fixup
* Remove EsmForCausalLM for now
* Remove EsmForCausalLM for now
* Fix padding idx test
* Updated README and docs with ESM-1b and ESM-2 separately (#19221)
* Updated README and docs with ESM-1b and ESM-2 separately
* Update READMEs, longer entry with 3 citations
* make fix-copies
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Tom Sercu <tsercu@fb.com>
Co-authored-by: Your Name <you@example.com>
* chore: initial commit
* chore: adding util methods
yet to work on the nn.functional.interpolate port with align_corener=True
* chore: refactor the utils
* used tf.compat.v1.image.resize to align the F.interpolate function
* added type hints to the method signatures
* added references to the gists where one 2 one alignment of torch and tf has been shown
* chore: adding the layers
* chore: porting all the layers from torch to tf
This is the initial draft, nothing is tested yet.
* chore: aligning the layers with reference to tf clip
* chore: aligning the modules
* added demaraction comments
* added copied and adapted from comments
* chore: aligning with CLIP
* chore: wrangling the layers to keep it tf compatible
* chore: aligning the names of the layers for porting
* chore: style changes
* chore: adding docs and inits
* chore: adding tfp dependencis
the code is taken from TAPAS
* chore: initial commit for testing
* chore: aligning the vision embeddings with the vit implementatino
* chore: changing model prefix
* chore: fixing the name of the model and the layer normalization test case
* chore: every test passes but the slow ones
* chore: fix style and integration test
* chore: moving comments below decorators
* chore: make fixup and fix-copies changes
* chore: adding the Vision and Text Model to check_repo
* chore: modifying the prefix name to align it with the torch implementation
* chore: fix typo in configuration
* choer: changing the name of the model variable
* chore: adding segmentation flag
* chore: gante's review
* chore: style refactor
* chore: amy review
* chore: adding shape_list to parts that have been copied from other snippets
* chore: init batchnorm with torch defaults
* chore: adding shape_list to pass the tests
* test fix: adding seed as 0
* set seed
* chore: changing the straight through trick to fix -ve dimensinos
* chore: adding a dimension to the loss
* chore: adding reviewers and contributors names to the docs
* chore: added changes after review
* chore: code quality fixup
* chore: fixing the segmentation snippet
* chore: adding to the layer calls
* chore: changing int32 to int64 for inputs of serving
* chore: review changes
* chore: style changes
* chore: remove from_pt=True
* fix: repo consistency
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Add DeformableDetrFeatureExtractor
* Fix post_process
* Fix name
* Add tests for feature extractor
* Fix doc tests
* Fix name
* Address comments
* Apply same fix to DETR and YOLOS as well
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* add gpt-neox-japanese model and tokenizer as new model
* Correction to PR's comment for GPT NeoX Japanese
- Fix to be able to use gpu
- Add comment # Copied... at the top of RotaryEmbedding
- Implement nn.Linear instead of original linear class
- Add generation test under @slow
* fix bias treatment for gpt-neox-japanese
* Modidy gpt-neox-japanese following PR
- add doc for bias_dropout_add
- style change following a PR comment
* add document for gpt-neox-japanese
* remove unused import from gpt-neox-japanese
* fix README for gpt-neox-japanese
* First draft
* More improvements
* Improve model, add custom CUDA code
* Import torch before
* Add script that imports custom layer
* Add everything in new ops directory
* Import custom layer in modeling file
* Fix ARCHIVE_MAP typo
* Creating the custom kernel on the fly.
* Import custom layer in modeling file
* More improvements
* Fix CUDA loading
* More improvements
* Improve conversion script
* Improve conversion script
* Make it work until encoder_outputs
* Make forward pass work
* More improvements
* Make logits match original implementation
* Make implementation also support single_scale model
* Add support for single_scale and dilation checkpoint
* Add support for with_box_refine model
* Support also two stage model
* Improve tests
* Fix more tests
* Make more tests pass
* Upload all models to the hub
* Clean up some code
* Improve decoder outputs
* Rename intermediate hidden states and reference points
* Improve model outputs
* Move tests to dedicated folder
* Improve model outputs
* Fix retain_grad test
* Improve docs
* Clean up and make test_initialization pass
* Improve variable names
* Add copied from statements
* Improve docs
* Fix style
* Improve docs
* Improve docs, move tests to model folder
* Fix rebase
* Remove DetrForSegmentation from auto mapping
* Apply suggestions from code review
* Improve variable names and docstrings
* Apply some more suggestions from code review
* Apply suggestion from code review
* better docs and variables names
* hint to num_queries and two_stage confusion
* remove asserts and code refactor
* add exception if two_stage is True and with_box_refine is False
* use f-strings
* Improve docs and variable names
* Fix code quality
* Fix rebase
* Add require_torch_gpu decorator
* Add pip install ninja to CI jobs
* Apply suggestion of @sgugger
* Remove DeformableDetrForObjectDetection from auto mapping
* Remove DeformableDetrModel from auto mapping
* Add model to toctree
* Add model back to mappings, skip model in pipeline tests
* Apply @sgugger's suggestion
* Fix imports in the init
* Fix copies
* Add CPU implementation
* Comment out GPU function
* Undo previous change
* Apply more suggestions
* Remove require_torch_gpu annotator
* Fix quality
* Add logger.info
* Fix logger
* Fix variable names
* Fix initializaztion
* Add missing initialization
* Update checkpoint name
* Add model to doc tests
* Add CPU/GPU equivalence test
* Add Deformable DETR to pipeline tests
* Skip model for object detection pipeline
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
* Use int64 throughout TFLongFormer
* make style
* Do some more fixed casting in TFLongFormer
* Fix some wonky "is None" conditionals
* Cast all the dtypes, salt the earth
* Fix copies to TFLED as well and do some casting there
* dtype fix in TFLongformer test
* Make fixup
* Expand tolerances on the LED tests too (I think this is a TF32 thing)
* Expand test tolerances for LED a tiny bit (probably a Tensorfloat thing again)
* Fix train_step and test_step, correctly enable CLIP fit test
* Stop using get_args on older Python versions
* Don't use get_origin either
* UnionType is actually even newer, don't use that either
* Apply the same fix to test_loss_computation
* Just realized I was accidentally skipping a bunch of tests!
* Fix test_loss_computation for models without separable labels
* Fix scalar losses in test_step and train_step
* Stop committing your breakpoints
* Fix Swin loss shape
* Fix Tapas loss shape
* Shape fixes for TAPAS, DeIT, HuBERT and ViTMAE
* Add loss computation to TFMobileBertForPreTraining
* make fixup and move copied from statement
* make fixup and move copied from statement
* Correct copied from
* Add labels and next_sentence_label inputs to TFMobileBERT
* Make sure total_loss is always defined
* Update tests/test_modeling_tf_common.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Fix copied from
* Ensure CTC models get labels in tests
* Ensure CTC models get labels in tests
* Fix tests for vit_mae
* Fix tests for vit_mae
* Fix tests for vit_mae
* Reduce batch size for wav2vec2 testing because it was causing OOM
* Skip some TAPAS tests that are failing
* Skip a failing HuBERT test
* make style
* Fix mobilebertforpretraining test
* Skip Wav2Vec2 tests that use huge amounts of mem
* Skip keras_fit for Wav2Vec2 as well
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* First draft
* Improve conversion script
* Make vision encoder work
* More improvements
* Improve conversion script
* Fix quality
* Add MultiframeIntegrationTransformer
* More improvements
* Make MiT output work
* Fix quality
* Add prompts generator
* Add tests
* Fix some tests
* Fix some more tests
* Fix more tests
* Improve conversion script
* Fix model outputs
* Fix more tests
* Add XClipProcessor
* Use processor in conversion script
* Fix integration test
* Update README, fix docs
* Fix all tests
* Add MIT output to XClipOutput
* Create better variable names
* Rename XClip to XCLIP
* Extend conversion script
* Add support for large models
* Add support for 16 frame models
* Add another model'
* Fix module issue
* Apply suggestions from code review
* Add figure to docs
* Fix CLIPProcessor issue
* Apply suggestions from code review
* Delete file
* Convert more checkpoints
* Convert last checkpoint
* Update nielsr to microsoft