* Add first draft
* Update conversion script
* Improve conversion script
* Improve conversion script some more
* Add conditional embeddings
* Add initial decoder
* Fix activation function of decoder
* Make decoder outputs match original implementation
* Make decoder outputs match original implementation
* Add more copied from statements
* Improve model outputs
* Fix auto tokenizer file
* Fix more tests
* Add test
* Improve README and docs, improve conditional embeddings
* Fix more tests
* Remove print statements
* Remove initial embeddings
* Improve conversion script
* Add interpolation of position embeddings
* Finish addition of interpolation of position embeddings
* Add support for refined checkpoint
* Fix refined checkpoint
* Remove unused parameter
* Improve conversion script
* Add support for training
* Fix conversion script
* Add CLIPSegFeatureExtractor
* Fix processor
* Fix CLIPSegProcessor
* Fix conversion script
* Fix most tests
* Fix equivalence test
* Fix README
* Add model to doc tests
* Use better variable name
* Convert other checkpoint as well
* Update config, add link to paper
* Add docs
* Update organization
* Replace base_model_prefix with clip
* Fix base_model_prefix
* Fix checkpoint of config
* Fix config checkpoint
* Remove file
* Use logits for output
* Fix tests
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* Add test for SentencePiece not adding special tokens to strings
* Add SentencePieceStringConversionMixin to fix issue 15003
* Fix conversion from tokens to string for most SentencePiece tokenizers
Tokenizers fixed:
- AlbertTokenizer
- BarthezTokenizer
- CamembertTokenizer
- FNetTokenizer
- M2M100Tokenizer
- MBart50Tokenizer
- PegasusTokenizer
- Speech2TextTokenizer
* Fix MarianTokenizer, adjust SentencePiece test to accomodate vocab
* Fix DebertaV2Tokenizer
* Ignore LayoutXLMTokenizer in SentencePiece string conversion test
* Run 'make style' and 'make quality'
* Clean convert_tokens_to_string test
Instead of explicitly ignoring LayoutXLMTokenizer in the test,
override the test in LayoutLMTokenizationTest and do nothing in it.
* Remove commented out code
* Improve robustness of convert_tokens_to_string test
Instead of comparing lengths of re-tokenized text and input_ids,
check that converting all special tokens to string yields a string
with all special tokens.
* Inline and remove SentencePieceStringConversionMixin
The convert_tokens_to_string method is now implemented
in each relevant SentencePiece tokenizer.
* Run 'make style' and 'make quality'
* Revert removal of space in convert_tokens_to_string
* Remove redundant import
* Revert test text to original
* Uncomment the lowercasing of the reverse_text variable
* Mimic Rust tokenizer behavior for tokenizers
- Albert
- Barthez
- Camembert
- MBart50
- T5
* Fix accidentally skipping test in wrong tokenizer
* Add test for equivalent Rust and slow tokenizer behavior
* Override _decode in BigBirdTokenizer to mimic Rust behavior
* Override _decode in FNetTokenizer to mimic Rust behavior
* Override _decode in XLNetTokenizer to mimic Rust behavior
* Remove unused 're' import
* Update DebertaV2Tokenizer to mimic Rust tokenizer
* Deberta tokenizer now behaves like Albert and its `convert_tokens_to_string` is not tested.
* Ignore problematic tests in Deberta V2
* Add comment on why the Deberta V2 tests are skipped
* initial commit
* First draft that gets outputs without crashing!
* Add all the ported openfold dependencies
* testing
* Restructure config files for ESMFold
* Debugging to find output discrepancies
* Mainly style
* Make model runnable without extra deps
* Remove utils and merge them to the modeling file
* Use correct gelu and remove some debug prints
* More cleanup
* Update esm docs
* Update conversion script to support ESMFold properly
* Port some top-level changes from ESMFold repo
* Expand EsmFold docstrings
* Make attention_mask optional (default to all 1s)
* Add inference test for ESMFold
* Use config and not n kwargs
* Add modeling output class
* Remove einops
* Remove chunking in ESM FFN
* Update tests for ESMFold
* Quality
* REpo consistency
* Remove tree dependency from ESMFold
* make fixup
* Add an error in case my structure map function breaks later
* Remove needless code
* Stop auto-casting the LM to float16 so CPU tests pass
* Stop auto-casting the LM to float16 so CPU tests pass
* Final test updates
* Split test file
* Copyright and quality
* Unpin PyTorch to see built doc
* Fix config file to_dict() method
* Add some docstrings to the output
* Skip TF checkpoint tests for ESM until we reupload those
* make fixup
* More docstrings
* Unpin to get even with main
* Flag example to write
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
* Support segformer fx
* Add fx_compatible attribute to test_modeling_segformer.py
* Update glpn model (fx support)
glpn model was copied from segformer.
* Update utils/fx.py | add semantic-segmentation
for SegformerForSemanticSegmentation model
* Fix minor import order(isort)
* Add random input generation for segformer fx
Co-authored-by: noelbird <lduldu00228@gmail.com>
* Wip
* Add safetensors support for TensorFlow
* First tests
* Add final test for now
* Retrigger CI like this
* Update src/transformers/modeling_tf_utils.py
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
* Factored out some code in the image-segmentation pipeline
Re-enable `small_model_pt`.
Re-enable `small_model_pt`.
Enabling the current test with the current values.
Debugging the values on the CI.
More logs ? Printing doesn't work ?
Using the CI values instead. Seems to be a Pillow sensitivity.
Added a test showcasing that models not supporting some tasks get a
clear error.
Factored out code.
Further factor out.
Fixup.
Bad rebase.
Put `panoptic` before `instance` as it should be a superset.
* Fixing tests.
* Adding subtasks tests
+ Fixes `instance` segmentation which was broken due to default and
non kwargs arguments.
* Fix bad replace.
* support sentencepiece for bertjapanesetokenizer
* add test vocab file for sentencepiece, bertjapanesetokenizer
* make BasicTokenizer be identical to transformers.models.bert.tokenization_bert.BasicTokenizer
* fix missing of \n in comment
* fix init argument missing in tests
* make spm_file be optional, exclude spiece.model from tests/fixtures, and add description comments
* make comment length less than 119
* apply doc style check
* First step of PT->TF for composite models
* Update the tests
* For VisionEncoderDecoderModel
* Fix
* Fix
* Add comment
* Fix
* clean up import
* Save memory
* For (TF)EncoderDecoderModel
* For (TF)EncoderDecoderModel
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Re-enable `small_model_pt`.
Re-enable `small_model_pt`.
Enabling the current test with the current values.
Debugging the values on the CI.
More logs ? Printing doesn't work ?
Using the CI values instead. Seems to be a Pillow sensitivity.
* Update src/transformers/pipelines/image_segmentation.py
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
* add: the contrastive search for generaton_utils
* add: testing scripts for contrastive search under examples/text-generation
* update the quality of codes
* revise the docstring; make the generation_contrastive_search.py scripts;
* revise the examples/pytorch/text-generation/run_generation_contrastive_search.py to the auto-APIs format
* revise the necessary documents
* fix: revise the docstring of generation_contrastive_search.py
* Fix the code indentation
* fix: revise the nits and examples in contrastive_search docstring.
* fix the copyright
* delete generation_contrastive_search.py
* revise the logic in contrastive_search
* update the intergration test and the docstring
* run the tests over
* add the slow decorate to the contrastive_search intergrate test
* add more test
* do the style, quality, consistency checks
* Clean up deprecation warnings
Notes:
Changed some strings in tests to raw strings, which will change the literal content of the strings as they are fed into whatever machine handles them.
Test cases for past in the past/past_key_values switch changed/removed due to warning of impending removal
* Add PILImageResampling abstraction for PIL.Image.Resampling