* [FT] First commit for graphormer architecture.
The model has no tokenizer, as it uses a collator and preprocessing function for its input management.
Architecture to be tested against original one.
The arch might need to be changed to fit the checkpoint, but a revert to the original arch will make the code less nice to read.
TODO: doc
* [FIX] removed test model
* [FIX] import error
* [FIX] black and flake
* [DOC] added paper refs
* [FIX] [DOC]
* [FIX] black
* [DOC] Updated READMEs
* [FIX] Order of imports + rm Tokenizer calls
* [FIX] Moved assert in class to prevent doc build failure
* [FIX] make fix-copies
* [DOC] update from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [FIX] Removed Graphormer from Sequence classification model list
* [DOC] Added HF copyright to Cython file
* [DOC] Fixed comments
* [FIX] typos in class doc + removed config classes.
Todo: update doc from paper definitions
* [FIX] Removed dependency to fairseq, and replaced all asserts with Exception management
* [FIX] Homogenized initialization of weights to pretrained constructor
* [FIX] [CP] Updated multi_hop parameter to get same results as in original implementation
* [DOC] Relevant parameter description in the configuration file
* [DOC] Updated doc and comments in main graphormer file
* [FIX] make style and quality checks
* [DOC] Fix doc format
* [FIX] [WIP] Updated part of the tests, though still a wip
* [FIX] [WIP]
* [FIX] repo consistency
* [FIX] Renamed inputs for clarity
* [FIX] [BUG] updated num_classes params for propagation in the model
* simplified collator
* [FIX] Updated tests to follow new naming pattern
* [TESTS] Updated test suite along with model
* [FIX] rm tokenizer import
* [DOC] add link to graphormerdoc
* Changed section in doc from text model to graph model
* Apply suggestions from code review
Spacing, inits
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [DOC] Explain algos_graphormer functions
* Cython soft import protection
* Rm call to Callable in configuration graphormer
* [FIX] replaced asserts with Exceptions
* Add org to graphormer checkpoints
* Prefixed classes with Graphormer
* Management of init functions
* format
* fixes
* fix length file
* update indent
* relaunching ci
* Errors for missing cython imports
* fix style
* fix style doc
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Extended the CV preprocessing section with more details and refactored the example
* added padding to the CV section, though it is a special case
* Added a tip about post processing methods
* make style
* link update
* Apply suggestions from review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* review feedback
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* `blip` support for training
* remove labels creation
* remove unneeded `decoder_input_ids` creation
* final changes
- add colab link to documentation
- reduction = mean for loss
* fix nits
* update link
* clearer error message
* initial commit, refactoring the text generation api reference
* removed repetitive code examples
* Refactoring the text generation docs to reduce repetition
* make style
* Part of the "text generation" rework: adding a high-level overview of the text generation strategies
* code samples update via make style
* fixed a few formatting issues
* Apply suggestions from review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fixed spaces, and switched two links to markdown
* Apply Steven's suggestions from review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* new lines after headers to fix link rendering
* review feedback addressed. added links to image captioning and audio transcription examples
* minor capitalization fix
* addressed the review feedback
* Apply suggestions from review
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Applied review suggestions
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Added TF example for image classification
* Code style polishing
* code style polishing
* minor polishing
* fixed a link in a tip, and a typo in the inference TF content
* Apply Amy's suggestions from review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/tasks/image_classification.mdx
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* review feedback addressed
* make style
* added PushToHubCallback with save_strategy="no"
* minor polishing
* added PushToHubCallback with save_strategy=no
* minor polishing
* Update docs/source/en/tasks/image_classification.mdx
* added data augmentation
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* make style
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* torch.jit._state
* Fix past CI
* Fix for perceiver
* Fix REALM
* Fix for Bloom
* Fix for SwinMode
* Fix for TrajectoryTransformerModel
* Fix for test_wav2vec2_with_lm
* make style
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Copy RoBERTa
* formatting
* implement RoBERTa with prelayer normalization
* update test expectations
* add documentation
* add conversion script for DinkyTrain weights
* update checkpoint repo
Unfortunately, the original checkpoints assume a hacked roberta model
* add RoBERTa-PreLayerNorm docs to toc
* run utils/check_copies.py
* lint files
* remove unused import
* fix check_repo reporting wrongly a test is missing
* fix import error, caused by rebase
* run make fix-copies
* add RobertaPreLayerNormConfig to ROBERTA_EMBEDDING_ADJUSMENT_CONFIGS
* Fix documentation <Facebook> -> Facebook
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fixup: Fix documentation <Facebook> -> Facebook
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add missing Flax header
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* expected_slice -> EXPECTED_SLICE
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* update copies after rebase
* add missing copied from statements
* make fix-copies
* make prelayernorm explicit in code
* fix checkpoint path for the original implementation
* add flax integration tests
* improve docs
* update utils/documentation_tests.txt
* lint files
* Remove Copyright notice
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* make fix-copies
* Remove EXPECTED_SLICE calculation comments
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* generate from config mvp
* fix failing tests
* max_time test
* Load default gen config at model load time; Update docs
* further documentation; add tests
* adapt rag to the new structure
* handle models not instantiated with from_pretrained (like in tests)
* better default generation config
* add can_generate fn
* handle legacy use case of ad hoc model config changes
* initialize gen config from config in individual methods, if gen config is none
* fix _get_decoder_start_token_id when called outside GenerationMixin
* correct model config load order (set attr > model config > decoder config)
* update rag to match latest changes
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* load gen config from model config in model.from_pretrained
* fix can_generate fn
* handle generate calls without a previous from_pretrained (e.g. tests)
* add legacy behavior (and a warning)
* lower logger severity
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add templates for gpt-sw3
* Add templates for gpt-sw3
* Added sentencepiece tokenizer
* intermediate commit with many changes
* fixed conflicts
* Init commit for tokenization port
* Tokenization progress
* Remove fast tokenizer
* Clean up and rename spm.model -> spiece.model
* Remove TF -> PT conversion script template, Clean up Megatron -> PT script
* Optimize encode & decode performance
* added new attention
* added new attention
* attention for gpt-sw3 working
* attention good
* Cache is now working
* fixed attention mask so that it works with causal attention
* fixed baddbmm bug for cpu and caching
* updated config with correct parameters
* Refactor and leave optimizations as separate functions to avoid breaking expected functionality
* Fix special tokens mapping for both tokenizers
* cleaning up of code and comments
* HF compatible attention outputs
* Tokenizer now passing tests, add documentation
* Update documentation
* reverted back to base implementation after checking that it is identical to pretrained model
* updated gpt-sw3 config
* updated conversion script
* aligned parameters with gpt-sw3 config
* changed default scale_attn_by_inverse_layer_idx to true
* removed flag from conversion script
* added temporary model path
* reverted back to functioning convert script
* small changes to default config
* updated tests for gpt-sw3
* make style, make quality, minor cleanup
* Change local paths to testing online repository
* Change name: GptSw3 -> GPTSw3
* Remove GPTSw3TokenizerFast references
* Use official model repository and add more model sizes
* Added reference to 6.7b model
* Add GPTSw3DoubleHeadsModel to IGNORE_NON_AUTO_CONFIGURED, like GPT2DoubleHeadsModel
* Remove pointers to non-existing TFGPTSw3
* Add GPTSw3 to docs/_toctree.yml
* Remove TF artifacts from GPTSw3 in __init__ files
* Update READMEs with 'make fix-copies'
* Add 20b model to archive list
* Add documentation for GPT-Sw3
* Fix typo in documentation for GPT-Sw3
* Do 'make fix-copies' again after having updated docs
* Fix some typos in docs
* Update src/transformers/models/gpt_sw3/configuration_gpt_sw3.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/configuration_gpt_sw3.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/__init__.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/__init__.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/convert_megatron_to_pytorch.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/modeling_gpt_sw3.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update tests/models/gpt_sw3/test_tokenization_gpt_sw3.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/modeling_gpt_sw3.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/modeling_gpt_sw3.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Resolve comments from PR feedback
* Resolve more comments from PR feedback, also set use_cache=True in convert script
* Add '# Copied from' comments for GPTSw3 modeling
* Set 'is_parallelizable = False'
* Remove '# Copied from' where code was modified and add 'with x->y' when appropriate
* Remove parallelize in mdx
* make style, make quality
* Update GPTSw3Config default values and corresponding documentation
* Update src/transformers/models/gpt_sw3/tokenization_gpt_sw3.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/__init__.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Clean up and protect GPTSw3Tokenizer imports with is_sentencepiece_available
* Make style, make quality
* Add dummy object for GPTSw3Tokenizer via 'make fix-copies'
* make fix-copies
* Remove GPTSw3 modeling classes
* make style, make quality
* Add GPTSw3 auto-mappings for other GPT2 heads
* Update docs/source/en/model_doc/gpt-sw3.mdx
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/convert_megatron_to_pytorch.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/tokenization_gpt_sw3.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Remove old TODO-comment
* Add example usage to GPTSw3Tokenizer docstring
* make style, make quality
* Add implementation details and example usage to gpt-sw3.mdx
Co-authored-by: JoeyOhman <joeyoh@kth.se>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* read to load
* base functionality
* revert init
* fix dummy data
* moving right along
* moving right along
* finally
* cleanup
* pull out comment
* add test
* update docstring for main class
* flake comments and rewriting copies from `make repo-consistency`
* remove irrelevant differences/accidental spaces
* put copies back after space removals
* mid
* final test pass
* stray comment
* update test file
* update test file
* fixup
* black
* missed
* black missed one more
* style
* add doc update
* fix order of output class
* comment
* Revert "comment"
This reverts commit 03f86b6948.
* remove redundant function, and redundant reshape
* move change out of common
* style
* put common spaces back
* reorder kwargs in output
* doc style
* [WIP] Rework the pipeline tutorial
- Switch to `asr` instead of another NLP task.
- Its results are also simpler to understand.
- Added a section with interaction with `datasets`.
- Added a section with writing a simple webserver.
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Addressing comments.
* Links.
* Fixing docs format.
* Adding pipeline_webserver to _toctree.
* Warning -> Tip warnings={true}.
* Fix link ?
* Links ?
* Fixing link, adding chunk batching.
* Oops.
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/pipeline_tutorial.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* biogpt initial commit
* updated init
* fix faster decoding with use_cache
* 1. fix input_ids and input_embeds with correct device
2. added _keys_to_ignore_on_load_missing
3. updated prepare_inputs_for_generation
* add activation_dropout and scale_embedding
* replace fsmt attention with bart attention
* added test
* run make fix-copies
* doc init and fix build
* updated README with proper information
* 1. added tips to docs
2. updated BioGptTokenizer func
* 1. added tokenizer test
2. refactor tokenizer
* make fixup
* add biogpt fairseq to hf converter
* updated layer names to be more similar to original checkpoints
* config update doc string and set defaults
* added "#copied" from bart model and
updated doc strings
* enable model_input_names in tokenizer
* 1. positionalembedding depending on attention_mask
2. added attention mask to prepare for generation
* added test to verify past and generation
* BioGptLMHeadModel -> BioGptForCausalLM
* fix typo
* tokenization and test
Copyright and updated assertion
* updated Copyright and
one func at time in line
* Copyright updates and
minor doc fix
* replace assertion with ValueError
* rm extra space
* added code syntax
* revert cmnt position change
* add tokenizer to auto
* updated doc string
* tokenizer doc string update
* biogpt hub model update to microsoft/biogpt
* make fixup
* rm cmnt to fix flake8 5.0.4 vs 6 error
* add minimal working gpt2 tokenizer
* graph mode and output equivalence tests working
* not today tensorflow. serialization test passing!
* fix style, documentation, docstrings and all that jazz
* passing consistency checks
* move keras nlp to tf dependencies
* fix tf modeling utils and gpt2 attention to enable compiling
* fix (I hope) keras nlp dependencies
* revert changes on generation
* remove debug prints
* remove redundant tf dummy objects
* add from config, get config and max length settings to address review
* let flake ignore the error on distillation you are welcome
* test from config
* add padding test
* address sgugger review
* Add Donut image processor
* Update src/transformers/image_transforms.py
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
* Fix docstrings
* Full var names in docstring
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
* First draft
* Fix backwards compatibility
* More fixes
* More fixes
* Make backbone more general
* Improve backbone
* Improve test
* Fix config checkpoint
* Address comments
* Use model_type
* Address more comments
* Fix special model names
* Remove MaskFormerSwinModel and MaskFormerSwinPreTrainedModel from main init
* Fix typo
* Update backbone
* Apply suggestion
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* First draft
* Make conversion script work
* Add id2label mapping, run code quality
* Fix copies
* Add first draft of feature extractor
* Update conversion script to use feature extractor
* Make more tests pass
* Add docs
* update input_features to input_values + pad by default to max length
* Fix doc tests
* Add feature extractor tests
* Add proper padding/truncation to feature extractor
* Add support for conversion of all audioset checkpoints
* Improve docs and extend conversion script
* Fix README
* Rename spectogram to spectrogram
* Fix copies
* Add integration test
* Remove dummy conv
* Update to ast
* Update organization
* Fix init
* Rename model to AST
* Add require_torchaudio annotator
* Move import of ASTFeatureExtractor under an is_speech_available check
* Fix rebase
* Add pipeline config
* Update name of classifier head
* Rename time_dimension and frequency_dimension for clarity
* Remove print statement
* Fix pipeline test
* Fix pipeline test
* Fix index table
* Fix init
* Fix conversion script
* Rename to ForAudioClassification
* Fix index table
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* add model files etc for MobileNetV2
rename files for MobileNetV1
initial implementation of MobileNetV1
fix conversion script
cleanup
write docs
tweaks
fix conversion script
extract hidden states
fix test cases
make fixup
fixup it all
remove main from doc link
fixes
fix tests
fix up
use google org
fix weird assert
* fixup
* use google organization for checkpoints
* Update _toctree and clone original content
* Translate first three sections
* Add more translated chapters. Only 3 more left.
* Finish translation
* Run style from doc-builder
* Address recommended changes from reviewer
* Add DiNAT
* Adds DiNAT + tests
* Minor fixes
* Added HF model
* Add natten to dependencies.
* Cleanup
* Minor fixup
* Reformat
* Optional NATTEN import.
* Reformat & add doc to _toctree
* Reformat (finally)
* Dummy objects for DiNAT
* Add NAT + minor changes
Adds NAT as its own independent model + docs, tests
Adds NATTEN to ext deps to ensure ci picks it up.
* Remove natten from `all` and `dev-torch` deps, add manual pip install to ci tests
* Minor fixes.
* Fix READMEs.
* Requested changes to docs + minor fixes.
* Requested changes.
* Add NAT/DiNAT tests to layoutlm_job
* Correction to Dinat doc.
* Requested changes.
* Add resources of OpenAI GPT
* Delete Deploy section and add .
* Add scripts
* Update docs/source/en/model_doc/openai-gpt.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Delete causal-language-modeling section
* Add TFOpenAIGPTLMHeadModel
* Add resources from community
* Delete a link
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Adds an image-guided object detection method to the OwlViTForObjectDetection class, as described in the original paper. One-shot/image-guided object detection lets users use a query image to search for similar objects in the input image.
Co-Authored-By: Dhruv Karan <k4r4n.dhruv@gmail.com>
* WIP: Added CLIP resources from HuggingFace blog
* ADD: Notebooks documentation to clip
* Add link straight to notebook
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Change notebook links to colab
Co-authored-by: Ambuj Pawar <your_email@abc.example>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* allow loading projection in text and vision model
* begin tests
* finish test for CLIPTextModelTest
* style
* add slow tests
* add new classes for projection heads
* remove with_projection
* add in init
* add in doc
* fix tests
* fix some more tests
* fix copies
* fix docs
* remove leftover from fix-copies
* add the head models in IGNORE_NON_AUTO_CONFIGURED
* fix docstr
* fix tests
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add docstr for models
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* docs: fix: set overflowing image width to auto-scale
* docs: fix: new language Korean is also affected
* docs: fix: unnecessary line break in index page
docs: i18n: first draft of index page
docs: fix: first revision of index page
docs: i18n: missed section - supported frameworks
docs: fix: second revision of index page
review by @ArthurZucker
refactor: remove untranslated files from korean
docs: fix: remove untranslated references from toctree.yml
feat: enable korean docs in gh actions
docs: feat: add in_translation page as placeholder
docs: bug: testing if internal toc needs alphabet chars
docs: fix: custom english anchor for non-alphanumeric headings
review by @sgugger
docs: i18n: translate comments on install methods in _config.py
docs: refactor: more concise wording for translations
* add model files etc for MobileNetV2
* rename files for MobileNetV1
* initial implementation of MobileNetV1
* fix conversion script
* cleanup
* write docs
* tweaks
* fix conversion script
* extract hidden states
* fix test cases
* make fixup
* fixup it all
* rename V1 to V2
* fix checkpoints
* fixup
* implement first block + weight conversion
* add remaining layers
* add output stride and dilation
* fixup
* add tests
* add deeplabv3+ head
* a bit of fixup
* finish deeplab conversion
* add link to doc
* fix issue with JIT trace
in_height and in_width would be Tensor objects during JIT trace, which caused Core ML conversion to fail on the remainder op. By making them ints, the result of the padding calculation becomes a constant value.
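A minimal sketch of the idea, assuming TensorFlow-style "SAME" padding along one axis. `same_padding` is a hypothetical helper written for illustration, not the function in the model code. Casting the size to a plain int up front keeps the remainder in Python arithmetic, so the padding amounts are constants rather than traced ops:

```python
from typing import Tuple


def same_padding(in_size, kernel_size: int, stride: int) -> Tuple[int, int]:
    """Return (pad_before, pad_after) for "SAME"-style padding along one axis."""
    # Cast first: during tracing the size may arrive as a tensor-like object,
    # and the remainder below would then be recorded as an op in the graph.
    in_size = int(in_size)
    if in_size % stride == 0:
        total = max(kernel_size - stride, 0)
    else:
        total = max(kernel_size - (in_size % stride), 0)
    pad_before = total // 2
    return pad_before, total - pad_before


pad_h = same_padding(224, kernel_size=3, stride=2)
pad_w = same_padding(225, kernel_size=3, stride=2)
```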
* cleanup
* fix order of models
* fix rebase error
* remove main from doc link
* add image processor
* remove old feature extractor
* fix converter + other issues
* fixup
* fix unit test
* add to onnx tests (but these appear broken now)
* add post_process_semantic_segmentation
* use google org
* remove unused imports
* move args
* replace weird assert
* move generation_*.py src files into generation/*.py
* populate generation.__init__ with lazy loading
* move imports and references from generation.xxx.object to generation.object
* Add first draft
* Update conversion script
* Improve conversion script
* Improve conversion script some more
* Add conditional embeddings
* Add initial decoder
* Fix activation function of decoder
* Make decoder outputs match original implementation
* Make decoder outputs match original implementation
* Add more copied from statements
* Improve model outputs
* Fix auto tokenizer file
* Fix more tests
* Add test
* Improve README and docs, improve conditional embeddings
* Fix more tests
* Remove print statements
* Remove initial embeddings
* Improve conversion script
* Add interpolation of position embeddings
* Finish addition of interpolation of position embeddings
* Add support for refined checkpoint
* Fix refined checkpoint
* Remove unused parameter
* Improve conversion script
* Add support for training
* Fix conversion script
* Add CLIPSegFeatureExtractor
* Fix processor
* Fix CLIPSegProcessor
* Fix conversion script
* Fix most tests
* Fix equivalence test
* Fix README
* Add model to doc tests
* Use better variable name
* Convert other checkpoint as well
* Update config, add link to paper
* Add docs
* Update organization
* Replace base_model_prefix with clip
* Fix base_model_prefix
* Fix checkpoint of config
* Fix config checkpoint
* Remove file
* Use logits for output
* Fix tests
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* docs: Fix typo in ONNX parser help: 'tolerence' => 'tolerance'
* docs: Resolve many typos in the English docs
Typos found via 'codespell ./docs/source/en'
* fix jit trace error for classification usecase, update related doc
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* add implementation in torch 1.14.0
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* update_doc
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* update_doc
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* initial commit
* First draft that gets outputs without crashing!
* Add all the ported openfold dependencies
* testing
* Restructure config files for ESMFold
* Debugging to find output discrepancies
* Mainly style
* Make model runnable without extra deps
* Remove utils and merge them to the modeling file
* Use correct gelu and remove some debug prints
* More cleanup
* Update esm docs
* Update conversion script to support ESMFold properly
* Port some top-level changes from ESMFold repo
* Expand EsmFold docstrings
* Make attention_mask optional (default to all 1s)
* Add inference test for ESMFold
* Use config and not n kwargs
* Add modeling output class
* Remove einops
* Remove chunking in ESM FFN
* Update tests for ESMFold
* Quality
* Repo consistency
* Remove tree dependency from ESMFold
* make fixup
* Add an error in case my structure map function breaks later
* Remove needless code
* Stop auto-casting the LM to float16 so CPU tests pass
* Stop auto-casting the LM to float16 so CPU tests pass
* Final test updates
* Split test file
* Copyright and quality
* Unpin PyTorch to see built doc
* Fix config file to_dict() method
* Add some docstrings to the output
* Skip TF checkpoint tests for ESM until we reupload those
* make fixup
* More docstrings
* Unpin to get even with main
* Flag example to write
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
* Translated multiple_choice.mdx, question_answering.mdx. Added them to _toctree.yml
* Added translation for a missed line.
* Update _toctree.yml as per Omar's suggestions
* Update multiple_choice.mdx as per Omar's comments
* Update question_answering.mdx as per Omar's comments
* [ custom_models.mdx ] - Translated to Portuguese the custom models tutorial.
* [ run_scripts.mdx ] - Translated to Portuguese the run scripts tutorial.
* add: the contrastive search for generation_utils
* add: testing scripts for contrastive search under examples/text-generation
* improve code quality
* revise the docstring; make the generation_contrastive_search.py scripts;
* revise the examples/pytorch/text-generation/run_generation_contrastive_search.py to the auto-APIs format
* revise the necessary documents
* fix: revise the docstring of generation_contrastive_search.py
* Fix the code indentation
* fix: revise the nits and examples in contrastive_search docstring.
* fix the copyright
* delete generation_contrastive_search.py
* revise the logic in contrastive_search
* update the integration test and the docstring
* rerun the tests
* add the slow decorator to the contrastive_search integration test
* add more tests
* do the style, quality, consistency checks
* Adapt FE methods to transforms library
* Mixin for saving the image processor
* Base processor skeleton
* BatchFeature for packaging image processor outputs
* Initial image processor for GLPN
* Remove accidental import
* Fixup and docs
* Mixin for saving the image processor
* Fixup and docs
* Import BatchFeature from feature_extraction_utils
* Fixup and docs
* Fixup and docs
* Fixup and docs
* Fixup and docs
* BatchFeature for packaging image processor outputs
* Import BatchFeature from feature_extraction_utils
* Import BatchFeature from feature_extraction_utils
* Fixup and docs
* Fixup and docs
* BatchFeature for packaging image processor outputs
* Import BatchFeature from feature_extraction_utils
* Fixup and docs
* Mixin for saving the image processor
* Fixup and docs
* Add rescale back and remove ImageType
* fix import mistake
* Fix enum var reference
* Can transform and specify image data format
* Remove redundant function
* Update reference
* Data format flag for rescale
* Fix typo
* Fix dimension check
* Fixes to make IP and FE outputs match
* Add tests for transforms
* Add test for utils
* Update some docstrings
* Make sure in channels last before converting to PIL
* Remove default to numpy batching
* Fix up
* Add docstring and model_input_types
* Use feature processor config from hub
* Alias GLPN feature extractor to image processor
* Alias feature extractor mixin
* Add return_numpy=False flag for resize
* Fix up
* Fix up
* Use different frameworks safely
* Safely import PIL
* Call function checking if PIL available
* Only import if vision available
* Address Sylvain PR comments
Co-authored-by: Sylvain.gugger@gmail.com
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/image_transforms.py
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
* Update src/transformers/models/glpn/feature_extraction_glpn.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Add in docstrings
* Fix TFSwinSelfAttention to have relative position index as non-trainable weight (#18226)
Signed-off-by: Seunghwan Hong <seunghwan@scatterlab.co.kr>
* Refactor `TFSwinLayer` to increase serving compatibility (#18352)
* Refactor `TFSwinLayer` to increase serving compatibility
Signed-off-by: Seunghwan Hong <seunghwan@scatterlab.co.kr>
* Fix missed parameters while refactoring
Signed-off-by: Seunghwan Hong <seunghwan@scatterlab.co.kr>
* Fix window_reverse to calculate batch size
Signed-off-by: Seunghwan Hong <harrydrippin@gmail.com>
Co-Authored-By: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Add TF prefix to TF-Res test class (#18481)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Remove py.typed (#18485)
* Fix pipeline tests (#18487)
* Fix pipeline tests
* Make sure all pipelines tests run with init changes
* Use new huggingface_hub tools for download models (#18438)
* Draft new cached_file
* Initial draft for config and model
* Small fixes
* Fix first batch of tests
* Look in cache when internet is down
* Fix last tests
* Bad black, not fixing all quality errors
* Make diff less
* Implement change for TF and Flax models
* Add tokenizer and feature extractor
* For compatibility with main
* Add utils to move the cache and auto-do it at first use.
* Quality
* Deal with empty commit shas
* Deal with empty etag
* Address review comments
* Fix `test_dbmdz_english` by updating expected values (#18482)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Move cache folder to huggingface/hub for consistency with hf_hub (#18492)
* Move cache folder to just huggingface
* Thank you VsCode for this needless import
* Move to hub
* Forgot one
* Update some expected values in `quicktour.mdx` for `resampy 0.3.0` (#18484)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Forgot one new_ for cache migration
* disable Onnx test for google/long-t5-tglobal-base (#18454)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Typo reported by Joel Grus on TWTR (#18493)
* Just re-reading the whole doc every couple of months 😬 (#18489)
* Delete valohai.yaml
* NLP => ML
* typo
* website supports https
* datasets
* 60k + modalities
* unrelated link fixing for accelerate
* Ok those links were actually broken
* Fix link
* Make `AutoTokenizer` auto-link
* wording tweak
* add at least one non-nlp task
* `transformers-cli login` => `huggingface-cli login` (#18490)
* zero chance anyone's using that constant no?
* `transformers-cli login` => `huggingface-cli login`
* `transformers-cli repo create` => `huggingface-cli repo create`
* `make style`
* Add seed setting to image classification example (#18519)
* [DX fix] Fixing QA pipeline streaming a dataset. (#18516)
* [DX fix] Fixing QA pipeline streaming a dataset.
QuestionAnsweringArgumentHandler would iterate over the whole dataset
effectively killing all properties of the pipeline.
This restores nice properties when using `Dataset` or `Generator` since
those are meant to be consumed lazily.
* Handling TF better.
* Clean up hub (#18497)
* Clean up utils.hub
* Remove imports
* More fixes
* Last fix
* update fsdp docs (#18521)
* updating fsdp documentation
* typo fix
* Fix compatibility with 1.12 (#17925)
* Fix compatibility with 1.12
* Remove pin from examples requirements
* Update torch scatter version
* fix torch.onnx.symbolic_opset12 import
* Reject bad version
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Remove debug statement
* Specify en in doc-builder README example (#18526)
Co-authored-by: Ankur Goyal <ankur@impira.com>
* New cache fixes: add safeguard before looking in folders (#18522)
* unpin resampy (#18527)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* ✨ update to use interlibrary links instead of Markdown (#18500)
* Add example of multimodal usage to pipeline tutorial (#18498)
* 📝 add example of multimodal usage to pipeline tutorial
* 🖍 apply feedbacks
* 🖍 apply niels feedback
* [VideoMAE] Add model to doc tests (#18523)
* Add videomae to doc tests
* Add pip install decord
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* Update perf_train_gpu_one.mdx (#18532)
* Update no_trainer.py scripts to include accelerate gradient accumulation wrapper (#18473)
* Added accelerate gradient accumulation wrapper to run_image_classification_no_trainer.py example script
* make fixup changes
* PR comments
* changed input to Accelerator based on PR comment, ran make fixup
* Added comment explaining the sync_gradients statement
* Fixed lr scheduler max steps
* Changed run_clm_no_trainer.py script to use accelerate gradient accum wrapper
* Fixed all scripts except wav2vec2 pretraining to use accelerate gradient accum wrapper
* Added accelerate gradient accum wrapper for wav2vec2_pretraining_no_trainer.py script
* make fixup and lr_scheduler step inserted back into run_qa_beam_search_no_trainer.py
* removed changes to run_wav2vec2_pretraining_no_trainer.py script and fixed using wrong constant in qa_beam_search_no_trainer.py script
* Add Spanish translation of converting_tensorflow_models.mdx (#18512)
* Add file in spanish docs to be translated
* Finish translation to Spanish
* Improve Spanish wording
* Add suggested changes from review
* Spanish translation of summarization.mdx (#15947) (#18477)
* Add Spanish translation of summarization.mdx
* Apply suggestions from code review
Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
* Let's not cast them all (#18471)
* add correct dtypes when checking for params dtype
* forward contrib credits
* Update src/transformers/modeling_utils.py
Co-authored-by: Thomas Wang <24695242+thomasw21@users.noreply.github.com>
* more comments
- added more comments on why we cast only floating point parameters
* Update src/transformers/modeling_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: sgugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Thomas Wang <24695242+thomasw21@users.noreply.github.com>
* fix: data2vec-vision Onnx ready-made configuration. (#18427)
* feat: add the data2vec conf that are missing https://huggingface.co/docs/transformers/serialization
* fix: wrong config
* Add mt5 onnx config (#18394)
* update features
* MT5OnnxConfig added with updated tests and docs
* fix imports
* fix onnx_config_cls for mt5
Co-authored-by: Thomas Chaigneau <thomas.deeptools.ai>
* Minor update of `run_call_with_unpacked_inputs` (#18541)
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* BART - Fix attention mask device issue on copied models (#18540)
* attempt to fix attn mask device
* fix bart `_prepare_decoder_attention_mask`
- add correct device
- run `make fix-copies` to propagate the fix
* Adding a new `align_to_words` param to qa pipeline. (#18010)
* Adding a new `align_to_words` param to qa pipeline.
* Update src/transformers/pipelines/question_answering.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Import protection.
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* 📝 update metric with evaluate (#18535)
* Restore _init_weights value in no_init_weights (#18504)
* Recover _init_weights value in no_init_weights
For potential nested use.
In addition, users might modify private no_init_weights as well.
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Remove private variable change check
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Clean up comment
* 📝 update documentation build section (#18548)
* `bitsandbytes` - `Linear8bitLt` integration into `transformers` models (#17901)
* first commit
* correct replace function
* add final changes
- works like charm!
- cannot implement tests yet
- tested
* clean up a bit
* add bitsandbytes dependencies
* working version
- added import function
- added bitsandbytes utils file
* small fix
* small fix
- fix import issue
* fix import issues
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* refactor a bit
- move bitsandbytes utils to utils
- change comments on functions
* reformat docstring
- reformat docstring on init_empty_weights_8bit
* Update src/transformers/__init__.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* revert bad formatting
* change to bitsandbytes
* refactor a bit
- remove init8bit since it is useless
* more refactoring
- fixed init empty weights issue
- added threshold param
* small hack to make it work
* Update src/transformers/modeling_utils.py
* Update src/transformers/modeling_utils.py
* remove the small hack
* modify utils file
* make style + refactor a bit
* create correctly device map
* add correct dtype for device map creation
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* apply suggestions
- remove with torch.grad
- do not rely on Python bool magic!
* add docstring
- add docstring for new kwargs
* add docstring
- comment `replace_8bit_linear` function
- fix weird formatting
* - added more documentation
- added new utility function for memory footprint tracking
- colab demo to add
* few modifs
- typo doc
- force cast into float16 when load_in_8bit is enabled
* added colab link
* add test architecture + docstring a bit
* refactor a bit testing class
* make style + refactor a bit
* enhance checks
- add more checks
- start writing saving test
* clean up a bit
* make style
* add more details on doc
* add more tests
- still needs to fix 2 tests
* replace by "or"
- could not fix it from GitHub GUI
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* refactor a bit testing code + add readme
* make style
* fix import issue
* Update src/transformers/modeling_utils.py
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
* add few comments
* add more docstring + make style
* more docstring
* raise error when loaded in 8bit
* make style
* add warning if loaded on CPU
* add small sanity check
* fix small comment
* add bitsandbytes on dockerfile
* Improve documentation
- improve documentation from comments
* add few comments
* slow tests pass on the VM but not on the CI VM
* Fix merge conflict
* make style
* another test should pass on a multi gpu setup
* fix bad import in testing file
* Fix slow tests
- remove dummy batches
- no more CUDA illegal memory errors
* modify dockerfile
* Update docs/source/en/main_classes/model.mdx
* Update Dockerfile
* Update model.mdx
* Update Dockerfile
* Apply suggestions from code review
* few modifications
- lm head can stay on disk/cpu
- change model name so that test pass
* change test value
- change test value to the correct output
- torch bmm changed to baddmm in bloom modeling when merging
* modify installation guidelines
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* replace `n` by `name`
* merge `load_in_8bit` and `low_cpu_mem_usage`
* first try - keep the lm head in full precision
* better check
- check the attribute `base_model_prefix` instead of computing the number of parameters
* added more tests
* Update src/transformers/utils/bitsandbytes.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Merge branch 'integration-8bit' of https://github.com/younesbelkada/transformers into integration-8bit
* improve documentation
- fix typos for installation
- change title in the documentation
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
* TF: XLA-trainable DeBERTa v2 (#18546)
* fix deberta issues
* add different code paths for gpu and tpu
* shorter gpu take along axis
* Stable Dropout without tf cond
* variable must be float
* Preserve hub-related kwargs in AutoModel.from_pretrained (#18545)
* Preserve hub-related kwargs in AutoModel.from_pretrained
* Fix tests
* Remove debug statement
* TF Examples Rewrite (#18451)
* Finished QA example
* Dodge a merge conflict
* Update text classification and LM examples
* Update NER example
* New Keras metrics WIP, fix NER example
* Update NER example
* Update MC, summarization and translation examples
* Add XLA warnings when shapes are variable
* Make sure batch_size is consistently scaled by num_replicas
* Add PushToHubCallback to all models
* Add docs links for KerasMetricCallback
* Add docs links for prepare_tf_dataset and jit_compile
* Correct inferred model names
* Don't assume the dataset has 'lang'
* Write metrics in text classification
* Add 'framework' to TrainingArguments and TFTrainingArguments
* Export metrics in all examples and add tests
* Fix training args for Flax
* Update command line args for translation test
* make fixup
* Fix accidentally running other tests in fp16
* Remove do_train/do_eval from run_clm.py
* Remove do_train/do_eval from run_mlm.py
* Add tensorflow tests to circleci
* Fix circleci
* Update examples/tensorflow/language-modeling/run_mlm.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update examples/tensorflow/test_tensorflow_examples.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update examples/tensorflow/translation/run_translation.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update examples/tensorflow/token-classification/run_ner.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Fix save path for tests
* Fix some model card kwargs
* Explain the magical -1000
* Actually enable tests this time
* Skip text classification PR until we fix shape inference
* make fixup
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Use commit hash to look in cache instead of calling head (#18534)
* Use commit hash to look in cache instead of calling head
* Add tests
* Add attr for local configs too
* Stupid typos
* Fix tests
* Update src/transformers/utils/hub.py
Co-authored-by: Julien Chaumond <julien@huggingface.co>
* Address Julien's comments
Co-authored-by: Julien Chaumond <julien@huggingface.co>
* `pipeline` support for `device="mps"` (or any other string) (#18494)
* `pipeline` support for `device="mps"` (or any other string)
* Simplify `if` nesting
* Update src/transformers/pipelines/base.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix? @sgugger
* passing `attr=None` is not the same as not passing `attr` 🤯
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update philosophy to include other preprocessing classes (#18550)
* 📝 update philosophy to include other preprocessing classes
* 🖍 apply feedbacks
* Properly move cache when it is not in default path (#18563)
* Adds CLIP to models exportable with ONNX (#18515)
* onnx config for clip
* default opset as 14
* changes from the original repo
* input values order fix
* outputs fix
* remove unused import
* ran make fix-copies
* black format
* review comments: forward ref, import fix, model change revert, .to cleanup
* make style
* formatting fixes
* revert groupvit
* comment for cast to int32
* comment fix
* make .T as .t() for onnx conversion
* ran make fix-copies
* remove unneeded comment
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix copies
* remove comment
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* raise atol for MT5OnnxConfig (#18560)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* fix string (#18568)
* Segformer TF: fix output size in documentation (#18572)
* Segformer TF: fix output size in doc
* Segformer pytorch: fix output size in doc
Co-authored-by: Maxime Gardoni <maxime.gardoni@ecorobotix.com>
* Fix resizing bug in OWL-ViT (#18573)
* Fixes resizing bug in OWL-ViT
* Defaults to square resize if size is set to an int
* Sets do_center_crop default value to False
* Fix LayoutLMv3 documentation (#17932)
* fix typos
* fix sequence_length docs of LayoutLMv3Model
* delete trailing white spaces
* fix layoutlmv3 docs more
* apply make fixup & quality
* change to two versions of input docstring
* apply make fixup & quality
* Skip broken tests
* Change BartLearnedPositionalEmbedding's forward method signature to support Opacus training (#18486)
* changing BartLearnedPositionalEmbedding forward signature and references to it
* removing debugging dead code (thanks style checker)
* blackened modeling_bart file
* removing copy inconsistencies via make fix-copies
* changing references to copied signatures in Bart variants
* make fix-copies once more
* using expand over repeat (thanks @michaelbenayoun)
* expand instead of repeat for all model copies
Co-authored-by: Daniel Jones <jonesdaniel@microsoft.com>
* german docs translation (#18544)
* Create _config.py
* Create _toctree.yml
* Create index.mdx
not sure about "du / ihr" or "sie"
* Create quicktour.mdx
* Update _toctree.yml
* Update build_documentation.yml
* Update build_pr_documentation.yml
* fix build
* Update index.mdx
* Update quicktour.mdx
* Create installation.mdx
* Update _toctree.yml
* Deberta V2: Fix critical trace warnings to allow ONNX export (#18272)
* Fix critical trace warnings to allow ONNX export
* Force input to `sqrt` to be float type
* Cleanup code
* Remove unused import statement
* Update model sew
* Small refactor
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
* Use broadcasting instead of repeat
* Implement suggestion
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
* Match deberta v2 changes in sew_d
* Improve code quality
* Update code quality
* Consistency of small refactor
* Match changes in sew_d
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
* [FX] _generate_dummy_input supports audio-classification models for labels (#18580)
* Support audio classification architectures for labels generation, and provide a flag to print warnings or not
* Use ENV_VARS_TRUE_VALUES
* Fix docstrings with last version of hf-doc-builder styler (#18581)
* Fix docstrings with last version of hf-doc-builder styler
* Remove empty Parameter block
* Bump nbconvert from 6.0.1 to 6.3.0 in /examples/research_projects/lxmert (#18565)
Bumps [nbconvert](https://github.com/jupyter/nbconvert) from 6.0.1 to 6.3.0.
- [Release notes](https://github.com/jupyter/nbconvert/releases)
- [Commits](https://github.com/jupyter/nbconvert/compare/6.0.1...6.3.0)
---
updated-dependencies:
- dependency-name: nbconvert
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump nbconvert in /examples/research_projects/visual_bert (#18566)
Bumps [nbconvert](https://github.com/jupyter/nbconvert) from 6.0.1 to 6.3.0.
- [Release notes](https://github.com/jupyter/nbconvert/releases)
- [Commits](https://github.com/jupyter/nbconvert/compare/6.0.1...6.3.0)
---
updated-dependencies:
- dependency-name: nbconvert
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* fix owlvit tests, update docstring examples (#18586)
* Return the permuted hidden states if return_dict=True (#18578)
* Load sharded pt to flax (#18419)
* initial commit
* add small test
* add cross pt tf flag to test
* fix quality
* style
* update test with new repo
* fix failing test
* update
* fix wrong param ordering
* style
* update based on review
* update related to recent new caching mechanism
* quality
* Update based on review
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
* quality and style
* Update src/transformers/modeling_flax_utils.py
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add type hints for ViLT models (#18577)
* Add type hints for Vilt models
* Add missing return type for TokenClassification class
* update doc for perf_train_cpu_many, add intel mpi introduction (#18576)
* update doc for perf_train_cpu_many, add mpi introduction
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* Update docs/source/en/perf_train_cpu_many.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update docs/source/en/perf_train_cpu_many.mdx
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* typos (#18594)
* FSDP bug fix for `load_state_dict` (#18596)
* Add `TFAutoModelForSemanticSegmentation` to the main `__init__.py` (#18600)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Generate: validate `model_kwargs` (and catch typos in generate arguments) (#18261)
* validate generate model_kwargs
* generate tests -- not all models have an attn mask
* Supporting seq2seq models for `bitsandbytes` integration (#18579)
* Supporting seq2seq models for `bitsandbytes` integration
- `bitsandbytes` integration now supports seq2seq models
- check if a model has tied weights as an additional check
* small modification
- tie the weights before looking at tied weights!
* Add Donut (#18488)
* First draft
* Improve script
* Update script
* Make conversion work
* Add final_layer_norm attribute to Swin's config
* Add DonutProcessor
* Convert more models
* Improve feature extractor and convert base models
* Fix bug
* Improve integration tests
* Improve integration tests and add model to README
* Add doc test
* Add feature extractor to docs
* Fix integration tests
* Remove register_buffer
* Fix toctree and add missing attribute
* Add DonutSwin
* Make conversion script work
* Improve conversion script
* Address comment
* Fix bug
* Fix another bug
* Remove deprecated method from docs
* Make Swin and Swinv2 untouched
* Fix code examples
* Fix processor
* Update model_type to donut-swin
* Add feature extractor tests, add token2json method, improve feature extractor
* Fix failing tests, remove integration test
* Add do_thumbnail for consistency
* Improve code examples
* Add code example for document parsing
* Add DonutSwin to MODEL_NAMES_MAPPING
* Add model to appropriate place in toctree
* Update namespace to appropriate organization
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* Fix URLs (#18604)
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* Update BLOOM parameter counts (#18531)
* Update BLOOM parameter counts
* [doc] fix anchors (#18591)
the manual anchors end up being duplicated with automatically added anchors and no longer work.
* [fsmt] deal with -100 indices in decoder ids (#18592)
* [fsmt] deal with -100 indices in decoder ids
Fixes: https://github.com/huggingface/transformers/issues/17945
Decoder ids get the default index -100, which breaks the model. As in t5 and many other models, add a fix that replaces -100 with the correct pad index.
For some reason this use case hasn't been exercised with this model until recently, so it seems this issue has been there since the beginning.
Any suggestions on how to add a simple test here? Or perhaps we have something similar already? The user's script is quite massive.
* style
* small change (#18584)
* Flax Remat for LongT5 (#17994)
* [Flax] Add remat (gradient checkpointing)
* fix variable naming in test
* flip: checkpoint using a method
* fix naming
* fix class naming
* apply PVP's suggestions from code review
* add gradient_checkpointing to examples
* Add gradient_checkpointing to run_mlm_flax
* Add remat to longt5
* Add gradient checkpointing test longt5
* Fix args errors
* Fix remaining tests
* Make fixup & quality fixes
* replace kwargs
* remove unnecessary kwargs
* Make fixup changes
* revert long_t5_flax changes
* Remove return_dict and copy to LongT5
* Remove test_gradient_checkpointing
Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
* mac m1 `mps` integration (#18598)
* mac m1 `mps` integration
* Update docs/source/en/main_classes/trainer.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* addressing comments
* Apply suggestions from code review
Co-authored-by: Dan Saattrup Nielsen <47701536+saattrupdan@users.noreply.github.com>
* resolve comment
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Dan Saattrup Nielsen <47701536+saattrupdan@users.noreply.github.com>
* Change scheduled CIs to use torch 1.12.1 (#18644)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Add checks for some workflow jobs (#18583)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* TF: Fix generation repetition penalty with XLA (#18648)
* Update longt5.mdx (#18634)
* Update run_translation_no_trainer.py (#18637)
* Update run_translation_no_trainer.py
Found an error in selecting `no_decay` parameters, plus some small modifications for when the user continues training from a checkpoint
* fix `no_decay` and `resume_step` issue
1. change `no_decay` list
2. if users continue training their model from a provided checkpoint, `resume_step` will not be initialized properly if `args.gradient_accumulation_steps != 1`
* [bnb] Minor modifications (#18631)
* bnb minor modifications
- refactor documentation
- add troubleshooting README
- add PyPi library on DockerFile
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Apply suggestions from code review
* Apply suggestions from code review
* Apply suggestions from code review
* put in one block
- put bash instructions in one block
* update readme
- refactor a bit hardware requirements
* change text a bit
* Apply suggestions from code review
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* apply suggestions
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* add link to paper
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Update tests/mixed_int8/README.md
* Apply suggestions from code review
* refactor a bit
* add instructions for Turing & Ampere
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* add A6000
* clarify a bit
* remove small part
* Update tests/mixed_int8/README.md
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* Examples: add Bloom support for token classification (#18632)
* examples: add Bloom support for token classification (FLAX, PyTorch and TensorFlow)
* examples: remove support for Bloom in token classification (FLAX and TensorFlow currently have no support for it)
* Fix Yolos ONNX export test (#18606)
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Fixup
* Fix up
* Move PIL default arguments inside function for safe imports
* Add image utils to toctree
* Update `rescale` method to reflect changes in #18677
* Update docs/source/en/internal/image_processing_utils.mdx
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Address Niels PR comments
* Add normalize method to transforms library
* Apply suggestions from code review - remove defaults to None
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix docstrings and revert to PIL.Image.XXX resampling
Use PIL.Image.XXX resampling values instead of the PIL.Image.Resampling.XXX enum, as the enum only exists in recent versions (>= 9.1.0), the version is not yet pinned, and support for older versions has not been deprecated
* Some more docstrings and PIL.Image tidy up
* Reorganise arguments so flags by modifiers
* Few last docstring fixes
* Add normalize to docs
* Accept PIL.Image inputs with deprecation warning
* Update src/transformers/image_transforms.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update warning to include version
* Trigger CI - hash clash on doc build
Signed-off-by: Seunghwan Hong <seunghwan@scatterlab.co.kr>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: Amy Roberts <amyeroberts@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Seunghwan Hong <harrydrippin@gmail.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
Co-authored-by: Ankur Goyal <ankrgyl@gmail.com>
Co-authored-by: Ankur Goyal <ankur@impira.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
Co-authored-by: Mishig Davaadorj <dmishig@gmail.com>
Co-authored-by: Rasmus Arpe Fogh Jensen <Rasmus.arpe@gmail.com>
Co-authored-by: Ian Castillo <7807897+donelianc@users.noreply.github.com>
Co-authored-by: AguilaCudicio <aguila.cudicio@gmail.com>
Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Thomas Wang <24695242+thomasw21@users.noreply.github.com>
Co-authored-by: Niklas Hansson <niklas.sven.hansson@gmail.com>
Co-authored-by: Thomas Chaigneau <t.chaigneau.tc@gmail.com>
Co-authored-by: YouJiacheng <1503679330@qq.com>
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Dhruv Karan <k4r4n.dhruv@gmail.com>
Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>
Co-authored-by: Maxime G <joihn@users.noreply.github.com>
Co-authored-by: Maxime Gardoni <maxime.gardoni@ecorobotix.com>
Co-authored-by: Wonseok Lee (Jack) <rollerkid02@snu.ac.kr>
Co-authored-by: Dan Jones <dan.j.jones2@gmail.com>
Co-authored-by: Daniel Jones <jonesdaniel@microsoft.com>
Co-authored-by: flozi00 <flozi00.fz@gmail.com>
Co-authored-by: iiLaurens <iiLaurens@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Wang, Yi <yi.a.wang@intel.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
Co-authored-by: Karim Foda <35491698+KMFODA@users.noreply.github.com>
Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
Co-authored-by: Dan Saattrup Nielsen <47701536+saattrupdan@users.noreply.github.com>
Co-authored-by: zhoutang776 <47708118+zhoutang776@users.noreply.github.com>
Co-authored-by: Stefan Schweter <stefan@schweter.it>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Partial TF port for ESM model
* Add ESM-TF tests
* Add the various imports for TF-ESM
* TF weight conversion almost ready
* Stop ignoring the decoder weights in PT
* Add tests and lots of fixes
* fix-copies
* Fix imports, add model docs
* Add get_vocab() to tokenizer
* Fix vocab links for pretrained files
* Allow multiple inputs with a sep
* Use EOS as SEP token because ESM vocab lacks SEP
* Correctly return special tokens mask from ESM tokenizer
* make fixup
* Stop testing unsupported embedding resizing
* Handle TF bias correctly
* Skip all models with slow tokenizers in the token classification test
* Fixing the batch/unbatcher of pipelines to accommodate the `None` being
passed around.
* Fixing pipeline bug caused by slow tokenizer being different.
* Update src/transformers/models/esm/modeling_tf_esm.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/models/esm/modeling_tf_esm.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/models/esm/modeling_tf_esm.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update set_input_embeddings and the copyright notices
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Adapt FE methods to transforms library
* Mixin for saving the image processor
* Base processor skeleton
* BatchFeature for packaging image processor outputs
* Initial image processor for GLPN
* Remove accidental import
* Fixup and docs
* Import BatchFeature from feature_extraction_utils
* Add rescale back and remove ImageType
* fix import mistake
* Fix enum var reference
* Can transform and specify image data format
* Remove redundant function
* Update reference
* Data format flag for rescale
* Fix typo
* Fix dimension check
* Fixes to make IP and FE outputs match
* Add tests for transforms
* Add test for utils
* Update some docstrings
* Make sure in channels last before converting to PIL
* Remove default to numpy batching
* Fix up
* Add docstring and model_input_types
* Use feature processor config from hub
* Alias GLPN feature extractor to image processor
* Alias feature extractor mixin
* Add return_numpy=False flag for resize
* Fix up
* Fix up
* Use different frameworks safely
* Safely import PIL
* Call function checking if PIL available
* Only import if vision available
* Address Sylvain PR comments
Co-authored-by: Sylvain.gugger@gmail.com
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/image_transforms.py
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
* Update src/transformers/models/glpn/feature_extraction_glpn.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Add in docstrings
* Fix TFSwinSelfAttention to have relative position index as non-trainable weight (#18226)
Signed-off-by: Seunghwan Hong <seunghwan@scatterlab.co.kr>
* Refactor `TFSwinLayer` to increase serving compatibility (#18352)
* Refactor `TFSwinLayer` to increase serving compatibility
Signed-off-by: Seunghwan Hong <seunghwan@scatterlab.co.kr>
* Fix missed parameters while refactoring
Signed-off-by: Seunghwan Hong <seunghwan@scatterlab.co.kr>
* Fix window_reverse to calculate batch size
Signed-off-by: Seunghwan Hong <harrydrippin@gmail.com>
Co-Authored-By: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Add TF prefix to TF-Res test class (#18481)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Remove py.typed (#18485)
* Fix pipeline tests (#18487)
* Fix pipeline tests
* Make sure all pipelines tests run with init changes
* Use new huggingface_hub tools for download models (#18438)
* Draft new cached_file
* Initial draft for config and model
* Small fixes
* Fix first batch of tests
* Look in cache when internet is down
* Fix last tests
* Bad black, not fixing all quality errors
* Make diff less
* Implement change for TF and Flax models
* Add tokenizer and feature extractor
* For compatibility with main
* Add utils to move the cache and auto-do it at first use.
* Quality
* Deal with empty commit shas
* Deal with empty etag
* Address review comments
* Fix `test_dbmdz_english` by updating expected values (#18482)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Move cache folder to huggingface/hub for consistency with hf_hub (#18492)
* Move cache folder to just huggingface
* Thank you VsCode for this needless import
* Move to hub
* Forgot one
* Update some expected values in `quicktour.mdx` for `resampy 0.3.0` (#18484)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Forgot one new_ for cache migration
* disable Onnx test for google/long-t5-tglobal-base (#18454)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Typo reported by Joel Grus on TWTR (#18493)
* Just re-reading the whole doc every couple of months 😬 (#18489)
* Delete valohai.yaml
* NLP => ML
* typo
* website supports https
* datasets
* 60k + modalities
* unrelated link fixing for accelerate
* Ok those links were actually broken
* Fix link
* Make `AutoTokenizer` auto-link
* wording tweak
* add at least one non-nlp task
* `transformers-cli login` => `huggingface-cli login` (#18490)
* zero chance anyone's using that constant no?
* `transformers-cli login` => `huggingface-cli login`
* `transformers-cli repo create` => `huggingface-cli repo create`
* `make style`
* Add seed setting to image classification example (#18519)
* [DX fix] Fixing QA pipeline streaming a dataset. (#18516)
* [DX fix] Fixing QA pipeline streaming a dataset.
QuestionAnsweringArgumentHandler would iterate over the whole dataset
effectively killing all properties of the pipeline.
This restores nice properties when using `Dataset` or `Generator` since
those are meant to be consumed lazily.
* Handling TF better.
* Clean up hub (#18497)
* Clean up utils.hub
* Remove imports
* More fixes
* Last fix
* update fsdp docs (#18521)
* updating fsdp documentation
* typo fix
* Fix compatibility with 1.12 (#17925)
* Fix compatibility with 1.12
* Remove pin from examples requirements
* Update torch scatter version
* Fix compatibility with 1.12
* Remove pin from examples requirements
* Update torch scatter version
* fix torch.onnx.symbolic_opset12 import
* Reject bad version
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Remove debug statement
* Specify en in doc-builder README example (#18526)
Co-authored-by: Ankur Goyal <ankur@impira.com>
* New cache fixes: add safeguard before looking in folders (#18522)
* unpin resampy (#18527)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* ✨ update to use interlibrary links instead of Markdown (#18500)
* Add example of multimodal usage to pipeline tutorial (#18498)
* 📝 add example of multimodal usage to pipeline tutorial
* 🖍 apply feedbacks
* 🖍 apply niels feedback
* [VideoMAE] Add model to doc tests (#18523)
* Add videomae to doc tests
* Add pip install decord
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* Update perf_train_gpu_one.mdx (#18532)
* Update no_trainer.py scripts to include accelerate gradient accumulation wrapper (#18473)
* Added accelerate gradient accumulation wrapper to run_image_classification_no_trainer.py example script
* make fixup changes
* PR comments
* changed input to Accelerator based on PR comment, ran make fixup
* Added comment explaining the sync_gradients statement
* Fixed lr scheduler max steps
* Changed run_clm_no_trainer.py script to use accelerate gradient accum wrapper
* Fixed all scripts except wav2vec2 pretraining to use accelerate gradient accum wrapper
* Added accelerate gradient accum wrapper for wav2vec2_pretraining_no_trainer.py script
* make fixup and lr_scheduler step inserted back into run_qa_beam_search_no_trainer.py
* removed changes to run_wav2vec2_pretraining_no_trainer.py script and fixed using wrong constant in qa_beam_search_no_trainer.py script
* Add Spanish translation of converting_tensorflow_models.mdx (#18512)
* Add file in spanish docs to be translated
* Finish translation to Spanish
* Improve Spanish wording
* Add suggested changes from review
* Spanish translation of summarization.mdx (#15947) (#18477)
* Add Spanish translation of summarization.mdx
* Apply suggestions from code review
Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
* Let's not cast them all (#18471)
* add correct dtypes when checking for params dtype
* forward contrib credits
* Update src/transformers/modeling_utils.py
Co-authored-by: Thomas Wang <24695242+thomasw21@users.noreply.github.com>
* more comments
- added more comments on why we cast only floating point parameters
* Update src/transformers/modeling_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: sgugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Thomas Wang <24695242+thomasw21@users.noreply.github.com>
* fix: data2vec-vision Onnx ready-made configuration. (#18427)
* feat: add the data2vec conf that are missing https://huggingface.co/docs/transformers/serialization
* fix: wrong config
* Add mt5 onnx config (#18394)
* update features
* MT5OnnxConfig added, with updated tests and docs
* fix imports
* fix onnx_config_cls for mt5
Co-authored-by: Thomas Chaigneau <thomas.deeptools.ai>
* Minor update of `run_call_with_unpacked_inputs` (#18541)
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* BART - Fix attention mask device issue on copied models (#18540)
* attempt to fix attn mask device
* fix bart `_prepare_decoder_attention_mask`
- add correct device
- run `make fix-copies` to propagate the fix
* Adding a new `align_to_words` param to qa pipeline. (#18010)
* Adding a new `align_to_words` param to qa pipeline.
* Update src/transformers/pipelines/question_answering.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Import protection.
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* 📝 update metric with evaluate (#18535)
* Restore _init_weights value in no_init_weights (#18504)
* Recover _init_weights value in no_init_weights
For potential nested use.
In addition, users might modify private no_init_weights as well.
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Remove private variable change check
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Clean up comment
* 📝 update documentation build section (#18548)
* `bitsandbytes` - `Linear8bitLt` integration into `transformers` models (#17901)
* first commit
* correct replace function
* add final changes
- works like a charm!
- cannot implement tests yet
- tested
* clean up a bit
* add bitsandbytes dependencies
* working version
- added import function
- added bitsandbytes utils file
* small fix
* small fix
- fix import issue
* fix import issues
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* refactor a bit
- move bitsandbytes utils to utils
- change comments on functions
* reformat docstring
- reformat docstring on init_empty_weights_8bit
* Update src/transformers/__init__.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* revert bad formatting
* change to bitsandbytes
* refactor a bit
- remove init8bit since it is useless
* more refactoring
- fixed init empty weights issue
- added threshold param
* small hack to make it work
* Update src/transformers/modeling_utils.py
* Update src/transformers/modeling_utils.py
* remove the small hack
* modify utils file
* make style + refactor a bit
* create correctly device map
* add correct dtype for device map creation
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* apply suggestions
- remove with torch.grad
- do not rely on Python bool magic!
* add docstring
- add docstring for new kwargs
* add docstring
- comment `replace_8bit_linear` function
- fix weird formatting
* - added more documentation
- added new utility function for memory footprint tracking
- colab demo to add
* few modifs
- typo doc
- force cast into float16 when load_in_8bit is enabled
* added colab link
* add test architecture + docstring a bit
* refactor a bit testing class
* make style + refactor a bit
* enhance checks
- add more checks
- start writing saving test
* clean up a bit
* make style
* add more details on doc
* add more tests
- still needs to fix 2 tests
* replace by "or"
- could not fix it from GitHub GUI
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* refactor a bit testing code + add readme
* make style
* fix import issue
* Update src/transformers/modeling_utils.py
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
* add few comments
* add more docstring + make style
* more docstring
* raise error when loaded in 8bit
* make style
* add warning if loaded on CPU
* add small sanity check
* fix small comment
* add bitsandbytes on dockerfile
* Improve documentation
- improve documentation from comments
* add few comments
* slow tests pass on the VM but not on the CI VM
* Fix merge conflict
* make style
* another test should pass on a multi gpu setup
* fix bad import in testing file
* Fix slow tests
- remove dummy batches
- no more CUDA illegal memory errors
* modify dockerfile
* Update docs/source/en/main_classes/model.mdx
* Update Dockerfile
* Update model.mdx
* Update Dockerfile
* Apply suggestions from code review
* few modifications
- lm head can stay on disk/cpu
- change model name so that tests pass
* change test value
- change test value to the correct output
- torch bmm changed to baddbmm in bloom modeling when merging
* modify installation guidelines
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* replace `n` by `name`
* merge `load_in_8bit` and `low_cpu_mem_usage`
* first try - keep the lm head in full precision
* better check
- check the attribute `base_model_prefix` instead of computing the number of parameters
* added more tests
* Update src/transformers/utils/bitsandbytes.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Merge branch 'integration-8bit' of https://github.com/younesbelkada/transformers into integration-8bit
* improve documentation
- fix typos for installation
- change title in the documentation
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
* TF: XLA-trainable DeBERTa v2 (#18546)
* fix deberta issues
* add different code paths for gpu and tpu
* shorter gpu take along axis
* Stable Dropout without tf cond
* variable must be float
* Preserve hub-related kwargs in AutoModel.from_pretrained (#18545)
* Preserve hub-related kwargs in AutoModel.from_pretrained
* Fix tests
* Remove debug statement
* TF Examples Rewrite (#18451)
* Finished QA example
* Dodge a merge conflict
* Update text classification and LM examples
* Update NER example
* New Keras metrics WIP, fix NER example
* Update NER example
* Update MC, summarization and translation examples
* Add XLA warnings when shapes are variable
* Make sure batch_size is consistently scaled by num_replicas
* Add PushToHubCallback to all models
* Add docs links for KerasMetricCallback
* Add docs links for prepare_tf_dataset and jit_compile
* Correct inferred model names
* Don't assume the dataset has 'lang'
* Don't assume the dataset has 'lang'
* Write metrics in text classification
* Add 'framework' to TrainingArguments and TFTrainingArguments
* Export metrics in all examples and add tests
* Fix training args for Flax
* Update command line args for translation test
* make fixup
* Fix accidentally running other tests in fp16
* Remove do_train/do_eval from run_clm.py
* Remove do_train/do_eval from run_mlm.py
* Add tensorflow tests to circleci
* Fix circleci
* Update examples/tensorflow/language-modeling/run_mlm.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update examples/tensorflow/test_tensorflow_examples.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update examples/tensorflow/translation/run_translation.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update examples/tensorflow/token-classification/run_ner.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Fix save path for tests
* Fix some model card kwargs
* Explain the magical -1000
* Actually enable tests this time
* Skip text classification PR until we fix shape inference
* make fixup
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Use commit hash to look in cache instead of calling head (#18534)
* Use commit hash to look in cache instead of calling head
* Add tests
* Add attr for local configs too
* Stupid typos
* Fix tests
* Update src/transformers/utils/hub.py
Co-authored-by: Julien Chaumond <julien@huggingface.co>
* Address Julien's comments
Co-authored-by: Julien Chaumond <julien@huggingface.co>
* `pipeline` support for `device="mps"` (or any other string) (#18494)
* `pipeline` support for `device="mps"` (or any other string)
* Simplify `if` nesting
* Update src/transformers/pipelines/base.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix? @sgugger
* passing `attr=None` is not the same as not passing `attr` 🤯
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update philosophy to include other preprocessing classes (#18550)
* 📝 update philosophy to include other preprocessing classes
* 🖍 apply feedbacks
* Properly move cache when it is not in default path (#18563)
* Adds CLIP to models exportable with ONNX (#18515)
* onnx config for clip
* default opset as 14
* changes from the original repo
* input values order fix
* outputs fix
* remove unused import
* ran make fix-copies
* black format
* review comments: forward ref, import fix, model change revert, .to cleanup
* make style
* formatting fixes
* revert groupvit
* comment for cast to int32
* comment fix
* make .T as .t() for onnx conversion
* ran make fix-copies
* remove unneeded comment
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix copies
* remove comment
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* raise atol for MT5OnnxConfig (#18560)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* fix string (#18568)
* Segformer TF: fix output size in documentation (#18572)
* Segformer TF: fix output size in doc
* Segformer pytorch: fix output size in doc
Co-authored-by: Maxime Gardoni <maxime.gardoni@ecorobotix.com>
* Fix resizing bug in OWL-ViT (#18573)
* Fixes resizing bug in OWL-ViT
* Defaults to square resize if size is set to an int
* Sets do_center_crop default value to False
* Fix LayoutLMv3 documentation (#17932)
* fix typos
* fix sequence_length docs of LayoutLMv3Model
* delete trailing white spaces
* fix layoutlmv3 docs more
* apply make fixup & quality
* change to two versions of input docstring
* apply make fixup & quality
* Skip broken tests
* Change BartLearnedPositionalEmbedding's forward method signature to support Opacus training (#18486)
* changing BartLearnedPositionalEmbedding forward signature and references to it
* removing debugging dead code (thanks style checker)
* blackened modeling_bart file
* removing copy inconsistencies via make fix-copies
* changing references to copied signatures in Bart variants
* make fix-copies once more
* using expand over repeat (thanks @michaelbenayoun)
* expand instead of repeat for all model copies
Co-authored-by: Daniel Jones <jonesdaniel@microsoft.com>
* german docs translation (#18544)
* Create _config.py
* Create _toctree.yml
* Create index.mdx
not sure about "du / ihr" or "sie"
* Create quicktour.mdx
* Update _toctree.yml
* Update build_documentation.yml
* Update build_pr_documentation.yml
* fix build
* Update index.mdx
* Update quicktour.mdx
* Create installation.mdx
* Update _toctree.yml
* Deberta V2: Fix critical trace warnings to allow ONNX export (#18272)
* Fix critical trace warnings to allow ONNX export
* Force input to `sqrt` to be float type
* Cleanup code
* Remove unused import statement
* Update model sew
* Small refactor
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
* Use broadcasting instead of repeat
* Implement suggestion
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
* Match deberta v2 changes in sew_d
* Improve code quality
* Update code quality
* Consistency of small refactor
* Match changes in sew_d
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
* [FX] _generate_dummy_input supports audio-classification models for labels (#18580)
* Support audio classification architectures for labels generation, and provide a flag to print warnings or not
* Use ENV_VARS_TRUE_VALUES
* Fix docstrings with last version of hf-doc-builder styler (#18581)
* Fix docstrings with last version of hf-doc-builder styler
* Remove empty Parameter block
* Bump nbconvert from 6.0.1 to 6.3.0 in /examples/research_projects/lxmert (#18565)
Bumps [nbconvert](https://github.com/jupyter/nbconvert) from 6.0.1 to 6.3.0.
- [Release notes](https://github.com/jupyter/nbconvert/releases)
- [Commits](https://github.com/jupyter/nbconvert/compare/6.0.1...6.3.0)
---
updated-dependencies:
- dependency-name: nbconvert
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump nbconvert in /examples/research_projects/visual_bert (#18566)
Bumps [nbconvert](https://github.com/jupyter/nbconvert) from 6.0.1 to 6.3.0.
- [Release notes](https://github.com/jupyter/nbconvert/releases)
- [Commits](https://github.com/jupyter/nbconvert/compare/6.0.1...6.3.0)
---
updated-dependencies:
- dependency-name: nbconvert
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* fix owlvit tests, update docstring examples (#18586)
* Return the permuted hidden states if return_dict=True (#18578)
* Load sharded pt to flax (#18419)
* initial commit
* add small test
* add cross pt tf flag to test
* fix quality
* style
* update test with new repo
* fix failing test
* update
* fix wrong param ordering
* style
* update based on review
* update related to recent new caching mechanism
* quality
* Update based on review
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
* quality and style
* Update src/transformers/modeling_flax_utils.py
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add type hints for ViLT models (#18577)
* Add type hints for Vilt models
* Add missing return type for TokenClassification class
* update doc for perf_train_cpu_many, add intel mpi introduction (#18576)
* update doc for perf_train_cpu_many, add mpi introduction
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* Update docs/source/en/perf_train_cpu_many.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update docs/source/en/perf_train_cpu_many.mdx
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* typos (#18594)
* FSDP bug fix for `load_state_dict` (#18596)
* Add `TFAutoModelForSemanticSegmentation` to the main `__init__.py` (#18600)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Generate: validate `model_kwargs` (and catch typos in generate arguments) (#18261)
* validate generate model_kwargs
* generate tests -- not all models have an attn mask
* Supporting seq2seq models for `bitsandbytes` integration (#18579)
* Supporting seq2seq models for `bitsandbytes` integration
- `bitsandbytes` integration now supports seq2seq models
- check if a model has tied weights as an additional check
* small modification
- tie the weights before looking at tied weights!
* Add Donut (#18488)
* First draft
* Improve script
* Update script
* Make conversion work
* Add final_layer_norm attribute to Swin's config
* Add DonutProcessor
* Convert more models
* Improve feature extractor and convert base models
* Fix bug
* Improve integration tests
* Improve integration tests and add model to README
* Add doc test
* Add feature extractor to docs
* Fix integration tests
* Remove register_buffer
* Fix toctree and add missing attribute
* Add DonutSwin
* Make conversion script work
* Improve conversion script
* Address comment
* Fix bug
* Fix another bug
* Remove deprecated method from docs
* Make Swin and Swinv2 untouched
* Fix code examples
* Fix processor
* Update model_type to donut-swin
* Add feature extractor tests, add token2json method, improve feature extractor
* Fix failing tests, remove integration test
* Add do_thumbnail for consistency
* Improve code examples
* Add code example for document parsing
* Add DonutSwin to MODEL_NAMES_MAPPING
* Add model to appropriate place in toctree
* Update namespace to appropriate organization
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* Fix URLs (#18604)
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* Update BLOOM parameter counts (#18531)
* Update BLOOM parameter counts
* Update BLOOM parameter counts
* [doc] fix anchors (#18591)
the manual anchors end up being duplicated with automatically added anchors and no longer work.
* [fsmt] deal with -100 indices in decoder ids (#18592)
* [fsmt] deal with -100 indices in decoder ids
Fixes: https://github.com/huggingface/transformers/issues/17945
Decoder ids get the default index -100, which breaks the model. As in t5 and many other models, add a fix that replaces -100 with the correct pad index.
For some reason this use case hadn't been exercised with this model until recently, so the issue seems to have been there since the beginning.
Any suggestions on how to add a simple test here? Or perhaps we have something similar already? The user's script is quite massive.
* style
* small change (#18584)
* Flax Remat for LongT5 (#17994)
* [Flax] Add remat (gradient checkpointing)
* fix variable naming in test
* flip: checkpoint using a method
* fix naming
* fix class naming
* apply PVP's suggestions from code review
* add gradient_checkpointing to examples
* Add gradient_checkpointing to run_mlm_flax
* Add remat to longt5
* Add gradient checkpointing test longt5
* Fix args errors
* Fix remaining tests
* Make fixup & quality fixes
* replace kwargs
* remove unnecessary kwargs
* Make fixup changes
* revert long_t5_flax changes
* Remove return_dict and copy to LongT5
* Remove test_gradient_checkpointing
Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
* mac m1 `mps` integration (#18598)
* mac m1 `mps` integration
* Update docs/source/en/main_classes/trainer.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* addressing comments
* Apply suggestions from code review
Co-authored-by: Dan Saattrup Nielsen <47701536+saattrupdan@users.noreply.github.com>
* resolve comment
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Dan Saattrup Nielsen <47701536+saattrupdan@users.noreply.github.com>
* Change scheduled CIs to use torch 1.12.1 (#18644)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Add checks for some workflow jobs (#18583)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* TF: Fix generation repetition penalty with XLA (#18648)
* Update longt5.mdx (#18634)
* Update run_translation_no_trainer.py (#18637)
* Update run_translation_no_trainer.py
Found an error in selecting `no_decay` parameters, and made some small modifications for when the user continues training from a checkpoint
* fix `no_decay` and `resume_step` issue
1. change `no_decay` list
2. if users continue to train their model from a provided checkpoint, the `resume_step` will not be initialized properly if `args.gradient_accumulation_steps != 1`
* [bnb] Minor modifications (#18631)
* bnb minor modifications
- refactor documentation
- add troubleshooting README
- add PyPI library on Dockerfile
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Apply suggestions from code review
* Apply suggestions from code review
* Apply suggestions from code review
* put in one block
- put bash instructions in one block
* update readme
- refactor a bit hardware requirements
* change text a bit
* Apply suggestions from code review
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* apply suggestions
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* add link to paper
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Update tests/mixed_int8/README.md
* Apply suggestions from code review
* refactor a bit
* add instructions Turing & Ampere
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* add A6000
* clarify a bit
* remove small part
* Update tests/mixed_int8/README.md
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* Examples: add Bloom support for token classification (#18632)
* examples: add Bloom support for token classification (FLAX, PyTorch and TensorFlow)
* examples: remove support for Bloom in token classification (FLAX and TensorFlow currently have no support for it)
* Fix Yolos ONNX export test (#18606)
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Fixup
* Fix up
* Move PIL default arguments inside function for safe imports
* Add image utils to toctree
* Update `rescale` method to reflect changes in #18677
* Update docs/source/en/internal/image_processing_utils.mdx
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Address Niels PR comments
* Apply suggestions from code review - remove defaults to None
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix docstrings and revert to PIL.Image.XXX resampling
Use PIL.Image.XXX resampling values instead of the PIL.Image.Resampling.XXX enum, as the enum only exists in recent Pillow versions (>= 9.1.0), the version is not yet pinned, and support for older versions is not yet deprecated
* Some more docstrings and PIL.Image tidy up
* Reorganise arguments so flags are by modifiers
* Few last docstring fixes
Signed-off-by: Seunghwan Hong <seunghwan@scatterlab.co.kr>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: Amy Roberts <amyeroberts@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Seunghwan Hong <harrydrippin@gmail.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
Co-authored-by: Ankur Goyal <ankrgyl@gmail.com>
Co-authored-by: Ankur Goyal <ankur@impira.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
Co-authored-by: Mishig Davaadorj <dmishig@gmail.com>
Co-authored-by: Rasmus Arpe Fogh Jensen <Rasmus.arpe@gmail.com>
Co-authored-by: Ian Castillo <7807897+donelianc@users.noreply.github.com>
Co-authored-by: AguilaCudicio <aguila.cudicio@gmail.com>
Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Thomas Wang <24695242+thomasw21@users.noreply.github.com>
Co-authored-by: Niklas Hansson <niklas.sven.hansson@gmail.com>
Co-authored-by: Thomas Chaigneau <t.chaigneau.tc@gmail.com>
Co-authored-by: YouJiacheng <1503679330@qq.com>
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Dhruv Karan <k4r4n.dhruv@gmail.com>
Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>
Co-authored-by: Maxime G <joihn@users.noreply.github.com>
Co-authored-by: Maxime Gardoni <maxime.gardoni@ecorobotix.com>
Co-authored-by: Wonseok Lee (Jack) <rollerkid02@snu.ac.kr>
Co-authored-by: Dan Jones <dan.j.jones2@gmail.com>
Co-authored-by: Daniel Jones <jonesdaniel@microsoft.com>
Co-authored-by: flozi00 <flozi00.fz@gmail.com>
Co-authored-by: iiLaurens <iiLaurens@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Wang, Yi <yi.a.wang@intel.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
Co-authored-by: Karim Foda <35491698+KMFODA@users.noreply.github.com>
Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
Co-authored-by: Dan Saattrup Nielsen <47701536+saattrupdan@users.noreply.github.com>
Co-authored-by: zhoutang776 <47708118+zhoutang776@users.noreply.github.com>
Co-authored-by: Stefan Schweter <stefan@schweter.it>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Add initial files for depth estimation pipelines
* Add test file for depth estimation pipeline
* Update model mapping names
* Add updates for depth estimation output
* Add generic test
* Hopefully fixing the tests.
* Check if test passes
* Add make fixup and make fix-copies changes after rebase with main
* Rebase with main
* Fixing up depth pipeline.
* This is not used anymore.
* Fixing the test. `Image` is a module; `Image.Image` is the type.
* Update docs/source/en/main_classes/pipelines.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* First draft
* Fix more things
* Improve more things
* Remove some head models
* Fix more things
* Add missing layers
* Remove tokenizer
* Fix more things
* Fix copied from statements
* Make all tests pass
* Remove print statements
* Remove files
* Fix README and docs
* Add integration test and fix organization
* Add tips
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Make tests faster, improve docs
* Fix doc tests
* Add model to toctree
* Add docs
* Add note about creating new checkpoint
* Remove is_decoder
* Make tests smaller, add docs
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* implemented TFCvtModel and TFCvtForImageClassification and modified relevant files, added an exception in convert_tf_weight_name_to_pt_weight_name, added quick testing file to compare with pytorch model
* added docstring + testing file in transformers testing suite
* added test in testing file, modified docs to pass repo-consistency, passed formatting test
* refactoring + passing all test
* small refactor, removing unwanted comments
* improved testing config
* corrected import error
* modified access to pretrained model archive list, to pass tf_test
* corrected import structure in init files
* modified testing for keras_fit with cpu
* correcting PR issues + Refactoring
* Refactoring : improving readability and reducing the number of permutations
* corrected momentum value + cls_token initialization
* removed from_pt as weights were added to the hub
* Update tests/models/cvt/test_modeling_tf_cvt.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Add `OPTForQuestionAnswering`
- added `OPTForQuestionAnswering` class based on `BloomForQuestionAnswering`
- added `OPTForQuestionAnswering` in common tests
- all common tests pass
- make fixup done
* added docstrings for OPTForQuestionAnswering
* Fix docstrings for OPTForQuestionAnswering
* Add ZeroShotObjectDetectionPipeline (#18445)
* Add AutoModelForZeroShotObjectDetection task
This commit also adds the following
- Add explicit _processor method for ZeroShotObjectDetectionPipeline.
This is necessary as pipelines don't auto infer processors yet and
`OwlVitProcessor` wraps tokenizer and feature_extractor together, to
process multiple images at once
- Add auto tests and other tests for ZeroShotObjectDetectionPipeline
* Add batching for ZeroShotObjectDetectionPipeline
* Fix doc-string ZeroShotObjectDetectionPipeline
* Fix output format: ZeroShotObjectDetectionPipeline
- Improves MaskFormer docs, corrects minor typos
- Restructures MaskFormerFeatureExtractor.post_process_panoptic_segmentation for better readability, adds target_sizes argument for optional resizing
- Adds post_process_semantic_segmentation and post_process_instance_segmentation methods.
- Adds a deprecation warning to post_process_segmentation method in favour of post_process_instance_segmentation
* add bloom for question answering
- attempt to add Bloom for question answering
- adapted from `GPTJForQuestionAnswering`
- Fixed `num_labels` to `2` for common tests
- Added a bit of docstring
- All common tests pass
* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* revert changes related to `num_labels`
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Rebase ESM PR and update all file formats
* Fix test relative imports
* Add __init__.py to the test dir
* Disable gradient checkpointing
* Remove references to TFESM... FOR NOW >:|
* Remove completed TODOs from tests
* Convert docstrings to mdx, fix-copies from BERT
* fix-copies for the README and index
* Update ESM's __init__.py to the modern format
* Add to _toctree.yml
* Ensure we correctly copy the pad_token_id from the original ESM model
* Tiny grammar nitpicks
* Make the layer norm after embeddings an optional flag
* Update the conversion script to handle other model classes
* Remove token_type_ids entirely, fix attention_masking and add checks to convert_esm.py
* Break the copied from link from BertModel.forward to remove token_type_ids
* Remove debug array saves
* Begin ESM-2 porting
* Add a hacky workaround for the precision issue in original repo
* Code cleanup
* Remove unused checkpoint conversion code
* Fix copyright notices
* Get rid of all references to the TF weights conversion
* Remove token_type_ids from the tests
* Fix test code
* Update src/transformers/__init__.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/__init__.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update README.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add credit
* Remove _ args and __ kwargs in rotary embedding
* Assertively remove asserts
* Replace einsum with torch.outer()
* Fix docstring formatting
* Remove assertions in tokenization
* Add paper citation to ESMModel docstring
* Move vocab list to single line
* Remove ESMLayer from init
* Add Facebook copyrights
* Clean up RotaryEmbedding docstring
* Fix docstring formatting
* Fix docstring for config object
* Add explanation for new config methods
* make fix-copies
* Rename all the ESM- classes to Esm-
* Update conversion script to allow pushing to hub
* Update tests to point at my repo for now
* Set config properly for tests
* Remove the gross hack that forced loss of precision in inv_freq and instead copy the data from the model being converted
* make fixup
* Update expected values for slow tests
* make fixup
* Remove EsmForCausalLM for now
* Fix padding idx test
* Updated README and docs with ESM-1b and ESM-2 separately (#19221)
* Updated README and docs with ESM-1b and ESM-2 separately
* Update READMEs, longer entry with 3 citations
* make fix-copies
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Tom Sercu <tsercu@fb.com>
Co-authored-by: Your Name <you@example.com>
* chore: initial commit
* chore: adding util methods
yet to work on the nn.functional.interpolate port with align_corners=True
* chore: refactor the utils
* used tf.compat.v1.image.resize to align the F.interpolate function
* added type hints to the method signatures
* added references to the gists where one-to-one alignment of torch and tf has been shown
* chore: adding the layers
* chore: porting all the layers from torch to tf
This is the initial draft, nothing is tested yet.
* chore: aligning the layers with reference to tf clip
* chore: aligning the modules
* added demarcation comments
* added copied and adapted from comments
* chore: aligning with CLIP
* chore: wrangling the layers to keep it tf compatible
* chore: aligning the names of the layers for porting
* chore: style changes
* chore: adding docs and inits
* chore: adding tfp dependencies
the code is taken from TAPAS
* chore: initial commit for testing
* chore: aligning the vision embeddings with the vit implementation
* chore: changing model prefix
* chore: fixing the name of the model and the layer normalization test case
* chore: every test passes but the slow ones
* chore: fix style and integration test
* chore: moving comments below decorators
* chore: make fixup and fix-copies changes
* chore: adding the Vision and Text Model to check_repo
* chore: modifying the prefix name to align it with the torch implementation
* chore: fix typo in configuration
* chore: changing the name of the model variable
* chore: adding segmentation flag
* chore: gante's review
* chore: style refactor
* chore: amy review
* chore: adding shape_list to parts that have been copied from other snippets
* chore: init batchnorm with torch defaults
* chore: adding shape_list to pass the tests
* test fix: adding seed as 0
* set seed
* chore: changing the straight through trick to fix -ve dimensions
* chore: adding a dimension to the loss
* chore: adding reviewers and contributors names to the docs
* chore: added changes after review
* chore: code quality fixup
* chore: fixing the segmentation snippet
* chore: adding to the layer calls
* chore: changing int32 to int64 for inputs of serving
* chore: review changes
* chore: style changes
* chore: remove from_pt=True
* fix: repo consistency
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Add DeformableDetrFeatureExtractor
* Fix post_process
* Fix name
* Add tests for feature extractor
* Fix doc tests
* Fix name
* Address comments
* Apply same fix to DETR and YOLOS as well
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* Add tips
* Add BEiT figure
* Fix URL
* Move tip to start
* Add tip to TF model as well
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* add gpt-neox-japanese model and tokenizer as new model
* Correction to PR's comment for GPT NeoX Japanese
- Fix to be able to use gpu
- Add comment # Copied... at the top of RotaryEmbedding
- Implement nn.Linear instead of original linear class
- Add generation test under @slow
* fix bias treatment for gpt-neox-japanese
* Modify gpt-neox-japanese following PR
- add doc for bias_dropout_add
- style change following a PR comment
* add document for gpt-neox-japanese
* remove unused import from gpt-neox-japanese
* fix README for gpt-neox-japanese
* First draft
* More improvements
* Improve model, add custom CUDA code
* Import torch before
* Add script that imports custom layer
* Add everything in new ops directory
* Import custom layer in modeling file
* Fix ARCHIVE_MAP typo
* Creating the custom kernel on the fly.
* Import custom layer in modeling file
* More improvements
* Fix CUDA loading
* More improvements
* Improve conversion script
* Improve conversion script
* Make it work until encoder_outputs
* Make forward pass work
* More improvements
* Make logits match original implementation
* Make implementation also support single_scale model
* Add support for single_scale and dilation checkpoint
* Add support for with_box_refine model
* Support also two stage model
* Improve tests
* Fix more tests
* Make more tests pass
* Upload all models to the hub
* Clean up some code
* Improve decoder outputs
* Rename intermediate hidden states and reference points
* Improve model outputs
* Move tests to dedicated folder
* Improve model outputs
* Fix retain_grad test
* Improve docs
* Clean up and make test_initialization pass
* Improve variable names
* Add copied from statements
* Improve docs
* Fix style
* Improve docs
* Improve docs, move tests to model folder
* Fix rebase
* Remove DetrForSegmentation from auto mapping
* Apply suggestions from code review
* Improve variable names and docstrings
* Apply some more suggestions from code review
* Apply suggestion from code review
* better docs and variables names
* hint to num_queries and two_stage confusion
* remove asserts and code refactor
* add exception if two_stage is True and with_box_refine is False
* use f-strings
* Improve docs and variable names
* Fix code quality
* Fix rebase
* Add require_torch_gpu decorator
* Add pip install ninja to CI jobs
* Apply suggestion of @sgugger
* Remove DeformableDetrForObjectDetection from auto mapping
* Remove DeformableDetrModel from auto mapping
* Add model to toctree
* Add model back to mappings, skip model in pipeline tests
* Apply @sgugger's suggestion
* Fix imports in the init
* Fix copies
* Add CPU implementation
* Comment out GPU function
* Undo previous change
* Apply more suggestions
* Remove require_torch_gpu annotator
* Fix quality
* Add logger.info
* Fix logger
* Fix variable names
* Fix initialization
* Add missing initialization
* Update checkpoint name
* Add model to doc tests
* Add CPU/GPU equivalence test
* Add Deformable DETR to pipeline tests
* Skip model for object detection pipeline
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
* NeptuneCallback improvements
* After review suggestions and deduplication of initial run
* Added volatile checkpoints support due to missing post-rebase commit
* Update README per review comments
- Remove list formatting
- Correct Neptune docs link
Co-authored-by: Sabine <sabine.nyholm@neptune.ai>
* First draft
* Improve conversion script
* Make vision encoder work
* More improvements
* Improve conversion script
* Fix quality
* Add MultiframeIntegrationTransformer
* More improvements
* Make MiT output work
* Fix quality
* Add prompts generator
* Add tests
* Fix some tests
* Fix some more tests
* Fix more tests
* Improve conversion script
* Fix model outputs
* Fix more tests
* Add XClipProcessor
* Use processor in conversion script
* Fix integration test
* Update README, fix docs
* Fix all tests
* Add MIT output to XClipOutput
* Create better variable names
* Rename XClip to XCLIP
* Extend conversion script
* Add support for large models
* Add support for 16 frame models
* Add another model
* Fix module issue
* Apply suggestions from code review
* Add figure to docs
* Fix CLIPProcessor issue
* Apply suggestions from code review
* Delete file
* Convert more checkpoints
* Convert last checkpoint
* Update nielsr to microsoft
* [WIP] Skeleton of VisualQuestionAnsweringPipeline extended to support LayoutLM-like models
* Fixup
* Use the full encoding
* Basic refactoring to DocumentQuestionAnsweringPipeline
* Cleanup
* Improve args, docs, and implement preprocessing
* Integrate OCR
* Refactor question_answering pipeline
* Use refactored QA code in the document qa pipeline
* Fix tests
* Some small cleanups
* Use a string type annotation for Image.Image
* Update encoding with image features
* Wire through the basic docs
* Handle invalid response
* Handle empty word_boxes properly
* Docstring fix
* Integrate Donut model
* Fixup
* Incorporate comments
* Address comments
* Initial incorporation of tests
* Address Comments
* Change assert to ValueError
* Comments
* Wrap `score` in float to make it JSON serializable
* Incorporate AutoModelForDocumentQuestionAnswering changes
* Fixup
* Rename postprocess function
* Fix auto import
* Applying comments
* Improve docs
* Remove extra assets and add copyright
* Address comments
Co-authored-by: Ankur Goyal <ankur@impira.com>
* Update TF fine-tuning docs
* Fix formatting
* Add some section headers so the right sidebar works better
* Squiggly it
* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/training.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Explain things in the text, not the comments
* Make the two dataset creation methods into a list
* Move the advice about collation out of a <Tip>
* Edits for clarity
* Edits for clarity
* Edits for clarity
* Replace `to_tf_dataset` with `prepare_tf_dataset` in the fine-tuning pages
* Restructure the page a little bit
* Restructure the page a little bit
* Restructure the page a little bit
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* use tokenizer to output tensor
* add preprocessing for decoder_input_ids for bare T5Model
* add preprocessing to tf and flax
* linting
* linting
* Update src/transformers/models/t5/modeling_flax_t5.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/t5/modeling_tf_t5.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/t5/modeling_t5.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Add Image2TextGenerationPipeline to supported pipelines
* Add Flax and Tensorflow support
* Add Flax and Tensorflow small tests
* Add default model for Tensorflow
* Add docstring
* Fix doc style
* Add tiny models for pytorch and flax
* Remove flax from pipeline.
Fix tests
* Use ydshieh/vit-gpt2-coco-en as a default for both PyTorch and Tensorflow
* Fix Tensorflow support
Co-authored-by: Olivier Dehaene <olivier@huggingface.co>
* Implement ONNX support for Longformer
Fix repo consistency check complaints
Fix value mismatches
Add pooler output for default model
Increase validation atol to accommodate multiple-choice error
Fix copies
Fix chunking for longer sequence lengths
Add future comment
* Fix issue in mask_invalid_locations
* Remove torch imports in configuration_longformer
* Change config access to fix LED
* Push opset version to support tril
* Work in review comments (mostly style)
* Add Longformer to ONNX tests
* bnb minor modifications
- refactor documentation
- add troubleshooting README
- add PyPi library on DockerFile
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Apply suggestions from code review
* Apply suggestions from code review
* Apply suggestions from code review
* put in one block
- put bash instructions in one block
* update readme
- refactor a bit hardware requirements
* change text a bit
* Apply suggestions from code review
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* apply suggestions
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* add link to paper
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Update tests/mixed_int8/README.md
* Apply suggestions from code review
* refactor a bit
* add instructions Turing & Ampere
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* add A6000
* clarify a bit
* remove small part
* Update tests/mixed_int8/README.md
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* onnx config for clip
* default opset as 14
* changes from the original repo
* input values order fix
* outputs fix
* remove unused import
* ran make fix-copies
* black format
* review comments: forward ref, import fix, model change revert, .to cleanup
* make style
* formatting fixes
* revert groupvit
* comment for cast to int32
* comment fix
* make .T as .t() for onnx conversion
* ran make fix-copies
* remove unneeded comment
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix copies
* remove comment
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* first commit
* correct replace function
* add final changes
- works like a charm!
- cannot implement tests yet
- tested
* clean up a bit
* add bitsandbytes dependencies
* working version
- added import function
- added bitsandbytes utils file
* small fix
* small fix
- fix import issue
* fix import issues
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* refactor a bit
- move bitsandbytes utils to utils
- change comments on functions
* reformat docstring
- reformat docstring on init_empty_weights_8bit
* Update src/transformers/__init__.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* revert bad formatting
* change to bitsandbytes
* refactor a bit
- remove init8bit since it is useless
* more refactoring
- fixed init empty weights issue
- added threshold param
* small hack to make it work
* Update src/transformers/modeling_utils.py
* Update src/transformers/modeling_utils.py
* remove the small hack
* modify utils file
* make style + refactor a bit
* create correctly device map
* add correct dtype for device map creation
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* apply suggestions
- remove with torch.grad
- do not rely on Python bool magic!
* add docstring
- add docstring for new kwargs
* add docstring
- comment `replace_8bit_linear` function
- fix weird formatting
* - added more documentation
- added new utility function for memory footprint tracking
- colab demo to add
* few modifs
- typo doc
- force cast into float16 when load_in_8bit is enabled
* added colab link
* add test architecture + docstring a bit
* refactor a bit testing class
* make style + refactor a bit
* enhance checks
- add more checks
- start writing saving test
* clean up a bit
* make style
* add more details on doc
* add more tests
- still needs to fix 2 tests
* replace by "or"
- could not fix it from GitHub GUI
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* refactor a bit testing code + add readme
* make style
* fix import issue
* Update src/transformers/modeling_utils.py
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
* add few comments
* add more docstring + make style
* more docstring
* raise error when loaded in 8bit
* make style
* add warning if loaded on CPU
* add small sanity check
* fix small comment
* add bitsandbytes on dockerfile
* Improve documentation
- improve documentation from comments
* add few comments
* slow tests pass on the VM but not on the CI VM
* Fix merge conflict
* make style
* another test should pass on a multi gpu setup
* fix bad import in testing file
* Fix slow tests
- remove dummy batches
- no more CUDA illegal memory errors
* Modify dockerfile
* Update docs/source/en/main_classes/model.mdx
* Update Dockerfile
* Update model.mdx
* Update Dockerfile
* Apply suggestions from code review
* few modifications
- lm head can stay on disk/cpu
- change model name so that test pass
* change test value
- change test value to the correct output
- torch bmm changed to baddbmm in bloom modeling when merging
* modify installation guidelines
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* replace `n` by `name`
* merge `load_in_8bit` and `low_cpu_mem_usage`
* first try - keep the lm head in full precision
* better check
- check the attribute `base_model_prefix` instead of computing the number of parameters
* added more tests
* Update src/transformers/utils/bitsandbytes.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Merge branch 'integration-8bit' of https://github.com/younesbelkada/transformers into integration-8bit
* improve documentation
- fix typos for installation
- change title in the documentation
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
* update features
* MT5OnnxConfig added with updated with tests and docs
* fix imports
* fix onnx_config_cls for mt5
Co-authored-by: Thomas Chaigneau <thomas.deeptools.ai>
* Delete valohai.yaml
* NLP => ML
* typo
* website supports https
* datasets
* 60k + modalities
* unrelated link fixing for accelerate
* Ok those links were actually broken
* Fix link
* Make `AutoTokenizer` auto-link
* wording tweak
* add at least one non-nlp task
* First draft
* Add VideoMAEForVideoClassification
* Improve conversion script
* Add VideoMAEForPreTraining
* Add VideoMAEFeatureExtractor
* Improve VideoMAEFeatureExtractor
* Improve docs
* Add first draft of model tests
* Improve VideoMAEForPreTraining
* Fix base_model_prefix
* Make model take pixel_values of shape (B, T, C, H, W)
* Add loss computation of VideoMAEForPreTraining
* Improve tests
* Improve model tests
* Make all tests pass
* Add VideoMAE to main README
* Add tests for VideoMAEFeatureExtractor
* Add integration test
* Improve conversion script
* Rename patch embedding class
* Remove VideoMAELayer from init
* Update design of patch embeddings
* Improve comments
* Improve conversion script
* Improve conversion script
* Add conversion of pretrained model
* Add loss verification of pretrained model
* Add loss verification of unnormalized targets
* Add integration test for pretraining model
* Apply suggestions from code review
* Fix bug to make feature extractor resize only shorter edge
* Address more comments
* Improve normalization of videos
* Add doc examples
* Move constants to dedicated script
* Remove scripts
* Transfer checkpoints, fix docs
* Update script
* Update image mean and std
* Fix doc tests
* Set return_tensors to NumPy by default
* Revert the previous change
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* Add file in spanish docs to be translated
* Translate first two sections to Spanish
* Translate four additional sections to Spanish
* Finish translation to Spanish
* Improve writing style in Spanish
* Add suggested changes from reviewer
This PR moves GroupViT and LXMert to their correct sections. As pointed out by @NielsRogge and @LysandreJik, GroupViT and LXMert are both multimodal models.
* add LUKE models for downstream tasks
* add new LUKE models to docs
* fix typos
* remove commented lines
* exclude None items from tuple return values
Left the term fine-tuning since there is no correct translation into Italian and the English term is generally used. The same was done with some terms like "learning rate"
* starting from 1.12, torch_ccl is renamed to oneccl_bindings_for_pytorch and should be imported before use
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* add doc for perf_train_cpu_many
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* update doc
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* Add files generated using transformers-cli add-new-model-like command
* Add changes for swinv2 attention and forward method
* Add fixes
* Add modifications for weight conversion and remaining args in swin model
* Add changes for patchmerging
* Add changes for SwinV2selfattention
* Update conversion script
* Add final fixes for the swin_v2 model
* Add changes for conversion script for pretrained window size case
* Add pretrained window size value from config in SwinV2Encoder class
* Make fixup
* Add swinv2 to models_not_in_readme to utils/check_copies.py
* Modify Swinv2v2 to Swin Transformer V2
* Remove copied from, to run make fixup command
* Add updates to swinv2tf from main branch
* Add pretrained_window_size to config, to make tests pass
* Add modified weights from nandwalritik profile for swinv2
* Update model weights from swinv2 from nandwalritik profile
* Add fix for build_pr_documentation CI fix
* Add fixes for weight conversion
* Add change to make input with padding work
* Add fixes for test cases
* Add few changes from swin to swinv2 to pass test cases
* Remove tests for tensorflow as swinv2 for TF is not added yet
* Override test_pt_tf_model_equivalence function as TF implementation for swinv2 is not added yet
* Add modeling_tf_swinv2 to _ignore_modules as test file is removed for this one right now.
* Update docs url for swinv2 in README.md
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Undo changes for check_repo
* Update url in readme.md
* Remove overridden function to test pt_tf_model_equivalence
* Remove TF model imports for Swinv2 as it's not implemented in this PR
* Add changes for index.mdx
* Add swinv2 paper link, abstract and contributor details
* Rename cpb_mlp to continuous_position_bias_mlp
* Add tips for swinv2 model
* Update src/transformers/models/swinv2/configuration_swinv2.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/swinv2/configuration_swinv2.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Fix indentation for docstring example in src/transformers/models/swinv2/configuration_swinv2.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update import order in src/transformers/models/swinv2/configuration_swinv2.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Add copyright statements in weights conversion script.
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Remove Swinv2 from models_not_in_readme
* Reformat code
* Remove TF implementation file for swinv2
* Update start docstring.
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Add changes for docstring
* Update orgname for weights to microsoft
* Remove to_2tuple function
* Add copied from statements wherever applicable
* Add copied from to Swinv2ForMaskedImageModelling class
* Reformat code.
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Add unittest.skip(with reason.) for test_inputs_embeds test case.
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Add updates for test_modeling_swinv2.py
* Add @unittest.skip() annotation for clarity to create_and_test_config_common_properties function
* Add continuous_position_bias_mlp parameter to conversion script
* Add test for testing masked_image_modelling for swinv2
* Update Swinv2 to Swin Transformer v2 in docs/source/en/model_doc/swinv2.mdx
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update Swinv2 to Swin Transformer v2 in docs/source/en/model_doc/swinv2.mdx
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update docs/source/en/model_doc/swinv2.mdx
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update docs/source/en/model_doc/swinv2.mdx
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Add suggested changes
* Add copied from to forward methods of Swinv2Stage and Swinv2Encoder
* Add push_to_hub flag to weight conversion script
* Change order of Swinv2DropPath class
* Add id2label mapping for imagenet 21k
* Add updated url for SwinV2 functions and classes used in implementation
* Update input_feature dimensions format, mentioned in comments.
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
* Add suggested changes for modeling_swin2.py
* Update docs
* Remove create_and_test_config_common_properties function, as test_model_common_attributes is sufficient.
* Fix indentation.
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add changes for making Nit objects in code style
* Add suggested changes
* Add suggested changes for test_modelling_swinv2
* make fix-copies
* Update docs/source/en/model_doc/swinv2.mdx
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Improve docs
* Improve docs of speech one as well
* Apply suggestions from code review
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* Update index
* Translate to Spanish two sections from custom_models
* Translate to Spanish custom models documentation
* Fixing typos and grammatical errors
* Add requested changes from reviewer
* [ fast_tokenizers.mdx ] - Added translation to Portuguese to tutorial
* Delete docs/source/pt-br directory
* [ fast_tokenizers.mdx ] - Continuing work on file
* [ fast_tokenizers.mdx ] - Continuing work on file
* Add fast tokenizers to _toctree.yml
* Eliminated config and toctree.yml
* Nits in fast_tokenizers.mdx
* Finishing create_a_model
* [ create_a_model.mdx ] finishing create a model in pt-br
* [ Changing _toctree.yml ] adding create a model in pt
Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
* First commit
* final changes
* Changed create_model to create_a_model
Translated into crea un'architettura personalizzata in the file it/_toctree.yml
* Added _toctree.yml in the Italian translation (local: serialization, title: Esporta modelli transformers)
* Edit translation for create_model.mdx
* Added file serialization for translation in Italian
* Fix toctree serialization position
I checked the eng toctree and realized I made a mistake.
* Update _toctree.yml
Correct spacing
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add: segformer utils and img. classification.
* add: segmentation layer.
* feat: working implementation of segformer.
* chore: remove unused variable.
* add test, remaining modifications.
* remove: unnecessary files.
* add: rest of the files.
Co-authored-by: matt <rocketknight1@gmail.com>
* chore: remove ModuleList comment.
* chore: apply make style.
* chore: apply make fixup-copies.
* add to check_repo.py
* add decode head to IGNORE_NON_TESTED
* chore: run make style.
* chore: PR comments.
* chore: minor changes to model doc.
* tests: reduction across samples.
* add a note on the space.
* sort imports.
* fix: reduction in loss computation.
* chore: align loss function with that of NER.
* chore: correct utils/documentation_tests.txt
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* chore: simplify the interpolation of logits in loss computation.
* chore: return transposed logits when return_dict=False.
* chore: add link to the tf fine-tuning repo.
* address pr comments.
* address niels's comments.
* remove from_pt=True since tf weights are in.
* remove comment from pt model.
* address niels's comments.
Co-authored-by: matt <rocketknight1@gmail.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Run_scripts Italian translation gh-17459
* Updated run_scripts gh-17642
* Updated run_scripts gh-17642
Made the text more gender-neutral.
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Initial work
* More work
* Add tests for custom pipelines on the Hub
* Protect import
* Make the test work for TF as well
* Last PyTorch specific bit
* Add documentation
* Style
* Title in toc
* Bad names!
* Update docs/source/en/add_new_pipeline.mdx
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
* Auto stash before merge of "custom_pipeline" and "origin/custom_pipeline"
* Address review comments
* Address more review comments
* Update src/transformers/pipelines/__init__.py
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
* Rough TF conversion outline
* Tidy up
* Fix padding differences between layers
* Add back embedder - whoops
* Match test file to main
* Match upstream test file
* Correctly pass and assign image_size parameter
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Add in MainLayer
* Correctly name layer
* Tidy up AdaptivePooler
* Small tidy-up
More accurate type hints and remove whitespaces
* Change AdaptiveAvgPool
Use the AdaptiveAvgPool implementation by @Rocketknight1, which pools correctly when the output shape does not evenly divide the input shape, cf. 9e26607e22 (r900109509)
Co-authored-by: matt <rocketknight1@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Use updated AdaptiveAvgPool
Co-authored-by: matt <rocketknight1@gmail.com>
* Make AdaptiveAvgPool compatible with CPU
* Remove image_size from configuration
* Fixup
* Tensorflow -> TensorFlow
* Fix pt references in tests
* Apply suggestions from code review - grammar and wording
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Add TFResNet to doc tests
* PR comments - GlobalAveragePooling and clearer comments
* Remove unused import
* Add in keepdims argument
* Add num_channels check
* grammar fix: by -> of
Co-authored-by: matt <rocketknight1@gmail.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Remove transposes - keep NHWC throughout forward pass
* Fixup look sharp
* Add missing layer names
* Final tidy up - remove from_pt now weights on hub
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: matt <rocketknight1@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* add onnx support for BLOOM
* use TYPE_CHECKING for type annotations
* fix past_shape for bloom (different from gpt2)
* use logical_or instead of `+` for onnx support
* bigger `atol_for_validation` for larger bloom models
* copied -> taken because it's no longer an exact copy
* remove "copied from" comment
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* first draft adding Flax-t5-encoder and Flax-mt5-encoder
* imports
* after make fixup
* flax t5 encoder test
* black on test
* make fix-copies
* clean
* all_model_classes -> tuple
* clean test
* is_encoder_decoder=False in t5-enc tester
* remove file docstring before FlaxT5Encoder
* black
* isort
* commit suggestions on src/transformers/models/t5/modeling_flax_t5.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* commit suggestions on src/transformers/models/t5/modeling_flax_t5.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Apply suggestions from code review
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* remove _get_encoder_module
* self.decoder_seq_length -> self.encoder_seq_length as t5-enc does not have decoder
* bugfix - self.module_class is class itself, not instance;
* docs for mt5 and t5
* call -> __call__ in t5 doc
* FlaxMT5EncoderModel to TYPE_HINT
* run doc-builder to allow changing the files
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* chore: initial commit
Copied the torch implementation of regnets and porting the code to tf step by step. Also introduced an output layer which was needed for regnets.
* chore: porting the rest of the modules to tensorflow
did not change the documentation yet, yet to try the playground on the model
* Fix initializations (#1)
* fix: code structure in few cases.
* fix: code structure to align tf models.
* fix: layer naming, bn layer still remains.
* chore: change default epsilon and momentum in bn.
* chore: styling nits.
* fix: cross-loading bn params.
* fix: regnet tf model, integration passing.
* add: tests for TF regnet.
* fix: code quality related issues.
* chore: added rest of the files.
* minor additions..
* fix: repo consistency.
* fix: regnet tf tests.
* chore: reorganize dummy_tf_objects for regnet.
* chore: remove checkpoint var.
* chore: remove unnecessary files.
* chore: run make style.
* Update docs/source/en/model_doc/regnet.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* chore: PR feedback I.
* fix: pt test. thanks to @ydshieh.
* New adaptive pooler (#3)
* feat: new adaptive pooler
Co-authored-by: @Rocketknight1
* chore: remove image_size argument.
Co-authored-by: matt <rocketknight1@gmail.com>
* Empty-Commit
* chore: remove image_size comment.
* chore: remove playground_tf.py
* chore: minor changes related to spacing.
* chore: make style.
* Update src/transformers/models/regnet/modeling_tf_regnet.py
Co-authored-by: amyeroberts <aeroberts4444@gmail.com>
* Update src/transformers/models/regnet/modeling_tf_regnet.py
Co-authored-by: amyeroberts <aeroberts4444@gmail.com>
* chore: refactored __init__.
* chore: copied from -> taken from./g
* adaptive pool -> global avg pool, channel check.
* chore: move channel check to stem.
* pr comments - minor refactor and add regnets to doc tests.
* Update src/transformers/models/regnet/modeling_tf_regnet.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* minor fix in the xlayer.
* Empty-Commit
* chore: removed from_pt=True.
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: matt <rocketknight1@gmail.com>
Co-authored-by: amyeroberts <aeroberts4444@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Add a TF in-graph tokenizer for BERT
* Add from_pretrained
* Add proper truncation, option handling to match other tokenizers
* Add proper imports and guards
* Add test, fix all the bugs exposed by said test
* Fix truncation of paired texts in graph mode, more test updates
* Small fixes, add a (very careful) test for savedmodel
* Add tensorflow-text dependency, make fixup
* Update documentation
* Update documentation
* make fixup
* Slight changes to tests
* Add some docstring examples
* Update tests
* Update tests and add proper lowercasing/normalization
* make fixup
* Add docstring for padding!
* Mark slow tests
* make fixup
* Fall back to BertTokenizerFast if BertTokenizer is unavailable
* Fall back to BertTokenizerFast if BertTokenizer is unavailable
* make fixup
* Properly handle tensorflow-text dummies
* Add CodeGen model
* Add missing key and switch order of super()
* Fix torch.ones init with uint8 instead of bool
* Address comments: copy statements and doc
* update tests
* remove old model parallel
* fix batch gen tests
* fix batch gen test
* update test_gpt2_sample_max_time
* fix codegen test and revert gpt2 test change
* Fix incorrect tie_word_embedding value, typo, URL
* Fix model order in README and styling
* Reorder model list alphabetically
* Set tie_word_embedding to False by default
* Apply suggestions from code review
* Better attn mask name & remove attn masked_bias
* add tokenizer for codegen
* quality
* doc tokenizer
* fix-copies
* add CodeGenTokenizer in converter
* make truncation optional
* add test for truncation
* add copyright
* fix-copies
* fix fast tokenizer decode
* Update src/transformers/models/codegen/tokenization_codegen.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* increase vocab_size in tests
Co-authored-by: patil-suraj <surajp815@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add skeleton files
* fix cpu inference link
* add hint to make clear that single gpu section contains general info
* add new files to ToC
* update toctree to have subsection for performance
* add "coming soon" to the still empty sections
* fix missing title
* fix typo
* add reference to empty documents
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Added translation of installation.mdx to Portuguese, as well as default templates of _toctree.yml and _config.py
* [ build_documentation.yml ] - Updated doc_builder to build documentation in Portuguese.
[ pipeline_tutorial.mdx ] - Created translation for the pipeline_tutorial.mdx.
* [ build_pr_documentation.yml ] - Added pt language to pr_documentation builder.
[ pipeline_tutorial.mdx ] - Grammar changes.
* [ accelerate.mdx ] - Translated to Portuguese the acceleration tutorial.
* [ multilingual.mdx ] - Added Portuguese translation for multilingual tutorial.
[ training.mdx ] - Added Portuguese translation for training tutorial.
* [ preprocessing.mdx ] - WIP
* Update _toctree.yml
* Adding Pré-processamento to _toctree.yml
* Update accelerate.mdx
* Nits and eliminate preprocessing file until it is ready
* [ index.mdx ] - Translated to Portuguese the index presentation page.
* [ docs/source/pt ] - Updated _toctree.yml to match newest translations.
* Fix build_pr_documentation.yml
* Fix index nits
* nits in _toctree
Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
* add new bloom classes
* (feat) add bloom classification tests; make style
* style: change import in test
* add some typehints to bloom classes
* merge main into branch
* fix: input checking in bloom seq classification
* fix tests
* change model class tests
* fix few tests
- more tests should pass
- one test left
* make token classifier return hidden states
* style: make BLOOM typehints consistent
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Initial commit
* Make some fixes
* Make PT model full forward pass
* Drop TF & Flax implementation, fix copies etc
* Add Flax model and update some corresponding stuff
* Drop some TF things
* Update config and flax local attn
* Add encoder_attention_type to config
* .
* Update docs
* Do some cleansing
* Fix some issues -> make style; add some docs
* Fix position_bias + mask addition + Update tests
* Fix repo consistency
* Fix model consistency by removing flax operation over attn_mask
* [WIP] Add PT TGlobal LongT5
* .
* [WIP] Add flax tglobal model
* [WIP] Update flax model to use the right attention type in the encoder
* Fix flax tglobal model forward pass
* Make the use of global_relative_attention_bias
* Add test suites for TGlobal model
* Fix minor bugs, clean code
* Fix pt-flax equivalence though not convinced with correctness
* Fix LocalAttn implementation to match the original impl. + update READMEs
* Few updates
* Update: [Flax] improve large model init and loading #16148
* Add ckpt conversion script according to #16853 + handle torch device placement
* Minor updates to conversion script.
* Typo: AutoModelForSeq2SeqLM -> FlaxAutoModelForSeq2SeqLM
* gpu support + dtype fix
* Apply some suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Remove (de)parallelize stuff
* Edit shape comments
* Update README.md
* make fix-copies
* Remove caching logic for local & tglobal attention
* Apply another batch of suggestions from code review
* Add missing checkpoints
* Format converting scripts
* Drop (de)parallelize links from longT5 mdx
* Fix converting script + revert config file change
* Revert "Remove caching logic for local & tglobal attention"
This reverts commit 2a619828f6ddc3e65bd9bb1725a12b77fa883a46.
* Stash caching logic in Flax model
* Make side relative bias used always
* Drop caching logic in PT model
* Return side bias as it was
* Drop all remaining model parallel logic
* Remove clamp statements
* Move test files to the proper place
* Update docs with new version of hf-doc-builder
* Fix test imports
* Make some minor improvements
* Add missing checkpoints to docs
* Make TGlobal model compatible with torch.onnx.export
* Replace some np.ndarray with jnp.ndarray
* Fix TGlobal for ONNX conversion + update docs
* fix _make_global_fixed_block_ids and masked neg value
* update flax model
* style and quality
* fix imports
* remove load_tf_weights_in_longt5 from init and fix copies
* add slow test for TGlobal model
* typo fix
* Drop obsolete is_parallelizable and one warning
* Update __init__ files to fix repo-consistency
* fix pipeline test
* Fix some device placements
* [wip]: Update tests -- need to generate summaries to update expected_summary
* Fix quality
* Update LongT5 model card
* Update (slow) summarization tests
* make style
* rename checkpoints
* finish
* fix flax tests
Co-authored-by: phungvanduy <pvduy23@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: patil-suraj <surajp815@gmail.com>
* adding template
* update model
* model update
* update conf for debug model
* update conversion
* update conversion script
* update conversion script
* fix missing keys check
* add tests to test the tokenizer in the local machine
* Change variable name
* add tests on xnli dataset
* add more description
* add descriptions + clearer code
* clearer code
* adding new tests + skipping few tests because of env problems
* change comment
* add dtype on the configuration
* add test embeddings
* add hardcoded test
* fix dtype issue
* adding torch.float16 to config
* adding more metrics (min, max, mean)
* add sum
* now the test passes with almost equal
* add files for conversion - test passes on cpu gpu
* add final changes
* cleaning code
* add new args in the docstring
* fix one liner function
* remove macros
* remove forward attention
* clean up init function
* add comments on the issue
* rm scale mask softmax
* do make style
* fix dtype in init
* fixing for loop on att probs
* fix style with black
* fix style + doc error
* fix and debug CI errors (docs + style)
* some updates
- change new operations
- finally add scaled softmax
- added new args in the config
* make use cache working
* add changes
- save sharded models
- final changes on the modeling script
* add changes
- comment on alibi
- add TODO on seq length
* test commit
- added a text to test the commit
Co-authored-by: thomasw21 <24695242+thomasw21@users.noreply.github.com>
* final changes
- attention mask change
- generation works on BS176b
Co-authored-by: thomasw21 <24695242+thomasw21@users.noreply.github.com>
* changes - model + conversion
* move to correct dir
* put ,
* few fixes
* fix tokenizer autodoc
* fix minor CI issues
* fix minor CI issues
* fix minor CI issues
* fix style issue
* fix minor import issues
* fix few issues
* remove def main on the test
* add require torch
* replace decorator with 'with'
* fix style
* change to bloom
* add quick fix tokenizer
* fix tokenizer file
* fix tokenizer
- merge tests
- small fixes
* fix import issue
* add bloom to readme
* fix consistency
* Update docs/source/en/model_doc/bloom.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply suggestions from code review
fix comment issues on file headers
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix doc issue
* small fix - modeling test
* some changes
- refactor some code
- taking into account reviews
- more tests should pass
- removed pruning tests
* remove useless division
* more tests should pass
* more tests should pass
* more tests should pass
* let's try this one
- add alibi offset
- remove all permutes to make the grad operations work
- fingers crossed
* refactor
- refactor code
- style changes
- add new threshold for test
* major changes
- change BLOOM to Bloom
- add quick doc on bloom.mdx
- move embeddings test on modeling test
* modify readme
* small fixes
* small fix
- better threshold for a test
* remove old test file from fetcher
* fix small typo
* major change
- change BloomLMHead to BloomForCausalLM
* remove onnx config
* major changes
- refactor the code
- remove asserts
- change tol for test
* make style
* small change
* adding a slow test + commenting old ones for now
* make style
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* make style
* fix duplicates
* cleaning comments on config
* clean a bit conversion file
* refactor a bit modeling file
* refactor tokenizer file
* fix tokenization test issue
* fix tokenization issue #2
* fix tokenization issue second try
* fix test issue
* make style + add suggestions
* change test fetcher
* try this one
- slow tests should pass
- fingers crossed
* possible final changes
* make style
* try fix padding side issue
* fix side
* fix padding issue
* fix ko-readme
* fix config auto
* cleaning modeling file
* keep bloom in caps in ko
* update config docs
* remove pretraining_pp
* remove model parallel
* update config
- add correct config files
* fix duplicates
* fix fetcher
* fix refactor issue
- remove divide function
* try to remove alibi
* small fixes
- fix alibi
- remove seq length
- refactor a bit the code
* put correct values
- fix bos and eos token ids
* fix attention mask loop
Co-authored-by: thomasw21 <24695242+thomasw21@users.noreply.github.com>
* small fixes:
- remove skip bias add
* small fixes
- fix typo in readme
- fix typos in config
* small changes
- remove a test
- add reconstruction test
- change config
* small changes
- change Scaled Softmax to BloomScaledSoftmax
* small fixes
- fix alibi dtype
* major changes
- removing explicit dtype when loading modules
- fixing test args (torch_dtype=auto)
- add docstring
* fix readmes
* major changes
- now bloom supports alibi shifting
- refactor a bit the code
- better test tolerance now
* refactor a bit
* refactor a bit
* put correct name on test
* change docstring
* small changes
- fix docstring modeling
- fix test tolerance
* fix small nit
- take dtype from tensors in the conversion script
* minor fix
- fix mdx issue
* minor fix
- change config docstring
* forward contrib credits from PR14084
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* apply modifications
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* resolve softmax upcast
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
* final changes modeling
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Merge commit 'd156898f3b9b2c990e5963f5030a7143d57921a2'
* merge commit
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* apply suggestions
Apply suggestions from Stas comments
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Fix gradient checkpointing
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* add slow but exact
* add accelerate compatibility
Co-authored-by: Nicolas Patry <Narsil@users.noreply.github.com>
* forward contrib credits
Co-authored-by: thomasw21 <thomasw21@users.noreply.github.com>
Co-authored-by: sgugger <sgugger@users.noreply.github.com>
Co-authored-by: patrickvonplaten <patrickvonplaten@users.noreply.github.com>
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
Co-authored-by: LysandreJik <LysandreJik@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix torch device on tests
* make style
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix nits
Co-authored-by: patrickvonplaten <patrickvonplaten@users.noreply.github.com>
* remove final nits
* fix doc
- add more details on the doc
- add links to checkpoints
* Update src/transformers/__init__.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* apply suggestions
Co-authored-by: sgugger <sgugger@users.noreply.github.com>
* put test torchscript to false
* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: justheuristic <justheuristic@gmail.com>
* fix alibi
- create alibi only once
* add small doc
* make quality
* replace torch.nn
* remove token type emb
* fix fused op + output bias
* add fused op
- now can control fused operation from config
* remove fused op
* make quality
* small changes
- remove unused args on config
- removed bias gelu file
- make the model torchscriptable
- add torchscript slow tests
* Update src/transformers/models/bloom/modeling_bloom.py
* fix slow
* make style
* add accelerate support
* add bloom to deepspeed tests
* minor changes
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* minor change
* slow tests pass
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update docs/source/en/model_doc/bloom.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* minor changes:
- change docstring
- add link to paper
Co-authored-by: Thomwolf <thomwolf@gmail.com>
Co-authored-by: Thomas Wolf <thomas@huggingface.co>
Co-authored-by: thomasw21 <24695242+thomasw21@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: sIncerass <sheng.s@berkeley.edu>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
Co-authored-by: Nicolas Patry <Narsil@users.noreply.github.com>
Co-authored-by: thomasw21 <thomasw21@users.noreply.github.com>
Co-authored-by: sgugger <sgugger@users.noreply.github.com>
Co-authored-by: patrickvonplaten <patrickvonplaten@users.noreply.github.com>
Co-authored-by: LysandreJik <LysandreJik@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: justheuristic <justheuristic@gmail.com>
Co-authored-by: Stas Bekman <stas@stason.org>
* feat: initial implementation of data2vec segmentation model in TF.
* chore: minor corrections to make the segmenter work.
* chore: removed unnecessary files.
* chore: add tests and other modifications.
* fix: loss computation for segmentation.
* chore: remove unused variable.
* chore: formatting.
* added a dummy adaptive pooling layer.
* removed unnecessary file.
* potentially add identifiers to layer names.
* fix: layer naming.
* chore: removed unnecessary print.
* Skipping unneeded test
* chore: add logging to debug tolerance.
* fix: segmentation tests for tfdata2vecvision
* chore: make style.
* fix: layer names, assertion to be resolved.
* Bumping test tolerance a bit
* chore: bump the tol in PT test.
Co-authored-by: matt <rocketknight1@gmail.com>
* added cbs to notebooks, made copy-paste error fix in generation_utils
* initial push for mctc model
* mctc feature extractor done
* added processor, tokenizer and their tests for MCTC. Have added an MCTC modeling test, adjusting model code accordingly.
* added processor, tokenizer and their tests for MCTC. Have added an MCTC modeling test, adjusting model code accordingly.
* passing attention, now struggling to figure out how attention masks make sense here
* works when excluding attention masks. ask later how one would integrate attention masks here
* bizarre configuration error (model prefix comes first in config dict json and messes up the order)
* all passing but bizarre config dict ordering issue when to_dict
* passing all major tests
* feature extraction, processor, tokenizer added & tests passing
* style & consistency & other logistical fixes
* copy paste fix
* model after feature extraction working
* committing final feature extraction results; need to fix normalization
* feature extraction passing tests; probably should add tests on the specific flashlight-copied functions?
* delete print ; format code a bit
* fixing tests
* passing major tests
* fixing styles
* completed tokenization test with real example; not sure if these values are entirely correct.
* last test fixes from local
* reverting accidentally included custom setup configs
* remove load tf weights; fix config error
* testing couldn't import featureextractor
* fix docs
* fix docs
* resolving comments
* style fixes
* style fixes
* Update to MCTCConv1dSubSampler
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* relposemb fixes
* conv1d name issue; expecting config fail with parentheses
* fix config issue
* fix config issue
* fix config issue
* change everything to MCTCT
* fixing naming change errors
* archive list
* copyrights and docs
* copyrights and docs
* copyrights and docs
* merge resolution
* move tests, fix to changed optionaldependency structure
* test directories changed
* fixing tests
* how to avoid tf tests?
* how to avoid tf tests?
* tests passing locally
* allow mctctprocessor imported any env
* allow mctctprocessor imported any env
* fixed second round of feedback, need to fix docs
* doc changes not being applied
* all fixed
* style fix
* feedback fixes
* fix copies and feature extraction style fix
* Update tests/models/visual_bert/test_modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* copy paste huggingface:main visual bert
* added eof newline to visual bert; all tests are passing otherwise
* fix slow tests by adding attention mask
* change model id to speechbrain
* make fix-copies
* fix readme unwanted deletes
* fixing readmes, make fix-copies
* consistent M-CTC-T naming
* Update src/transformers/models/mctct/__init__.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* all fixed but variable naming
* adjust double quotes
* fixed variable names
* copyright and mr quilter
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* correct slow tests
* make fix-copies
* Update src/transformers/models/mctct/configuration_mctct.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/mctct/configuration_mctct.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* m-ctc-t not mctct
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Quicktour Portuguese Translation
Translated quicktour.mdx until line 161
* Finished translating quicktour.mdx
Ready to upload and adjust eventual .mdx or translation mistakes.
* Add _toctree.yml and fix nits
* Fixed pt-br mdx syntax problem
Closed <frameworkcontent> instance
* Changed </frameworkcontent> line
* Copied missing block from english version of quicktour.mdx
* Reviewed the entire file once again. It should be working now.
Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
* Add the Italian translation of the file installation.mdx and edit _toctree
* Add the Italian translation of the file installation.mdx and edit _toctree
This PR updates our Expert Acceleration Program image with a new image featuring our experts.
This is similar to our Transformers/README.md image update that has proven to be successful.
* initial commit
* add init file
* update global init
* update index and dummy objects
* style
* update modelling auto
* fix init typo in src/transformers
* fix typo in modeling tf auto, opt was in wrong mapping name
* fixed a slow test : saved_model
* style
* fix positional embedding if no position id is provided
* update tf test
* update test flax requirements
* fixed serialization
* update
* update tf name to allow smooth conversion
* update flax tests
* style
* fix test typo
* fix tf typo test
* add xla for generate support in causal LM
* fixed bug
* cleaned tf tests
* style
* removed from PT for slow tests
* fix typo
* opt test as slow
* trying to fix GPT2 undefined
* correct documentation and add to test doc
* update tf doc
* fix doc
* fake commit
* Apply suggestions from code review
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* update test based on review
* merged main layer for functioning test
* fixup + quality
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* update long comment
* make fix copies
Co-authored-by: Arthur <arthur@huggingface.co>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Setup for Italian translation and add first document
- Add 'it' folder for files translated into Italian
- Add _config.py and _toctree.yml files
- Add translation of quicktour.mdx
* Fix style issue of italian documentation files
* Add 'it' to the languages section in the .github/workflows
* Remove - installation from _toctree for Italian
* Translation for index file
- Add index to _toctree.yml
- Add translation of index.mdx
* Fix typo in docs/source/it/index.mdx
* Translate code comments in docs/source/it/_config.py
Co-authored-by: Martina Fumanelli <martinafumanelli@Martinas-MBP.homenet.telecomitalia.it>
* Add onnx configuration for xlm
* Add supported features for xlm
* Add xlm to models exportable with onnx
* Add xlm architecture to test file
* Modify docs
* Make code quality fixes
* Make forward pass work
* More improvements
* Remove unused imports
* Remove timm dependency
* Improve loss calculation of token classifier
* Fix most tests
* Add docs
* Add model integration test
* Make all tests pass
* Add LayoutLMv3FeatureExtractor
* Improve integration test + make fixup
* Add example script
* Fix style
* Add LayoutLMv3Processor
* Fix style
* Add option to add visual labels
* Make more tokenizer tests pass
* Fix more tests
* Make more tests pass
* Fix bug and improve docs
* Fix import of processors
* Improve docstrings
* Fix toctree and improve docs
* Fix auto tokenizer
* Move tests to model folder
* Move tests to model folder
* change default behavior add_prefix_space
* add prefix space for fast
* add_prefix_space set to True for Fast
* no space before `unique_no_split` token
* add test to highlight special treatment of added tokens
* fix `test_batch_encode_dynamic_overflowing` by building a long enough example
* fix `test_full_tokenizer` with add_prefix_token
* Fix tokenizer integration test
* Make the code more readable
* Add tests for LayoutLMv3Processor
* Fix style
* Add model to README and update init
* Apply suggestions from code review
* Replace asserts by value errors
* Add suggestion by @ducviet00
* Add model to doc tests
* Simplify script
* Improve README
* a step ahead to fix
* Update pair_input_test
* Make all tokenizer tests pass - phew
* Make style
* Add LayoutLMv3 to CI job
* Fix auto mapping
* Fix CI job name
* Make all processor tests pass
* Make tests of LayoutLMv2 and LayoutXLM consistent
* Add copied from statements to fast tokenizer
* Add copied from statements to slow tokenizer
* Remove add_visual_labels attribute
* Fix tests
* Add link to notebooks
* Improve docs of LayoutLMv3Processor
* Fix reference to section
Co-authored-by: SaulLu <lucilesaul.com@gmail.com>
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
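The "Replace asserts by value errors" step above follows a common pattern: input checks raise an explicit `ValueError` rather than relying on `assert`, which is stripped when Python runs with `-O`. A minimal sketch of that pattern, assuming a hypothetical `check_boxes` helper (the name and the messages are illustrative, not taken from the LayoutLMv3 code):

```python
def check_boxes(words, boxes):
    """Hypothetical validator: raise ValueError instead of using assert."""
    # One bounding box is expected per word.
    if len(words) != len(boxes):
        raise ValueError(
            f"Expected one bounding box per word, got {len(words)} words and {len(boxes)} boxes."
        )
    # Each box is expected to hold four coordinates.
    for box in boxes:
        if len(box) != 4:
            raise ValueError(f"Each bounding box must have 4 coordinates, got {len(box)}.")
    return True
```

Unlike an `assert`, the check stays active in optimized runs and gives callers a specific exception type to catch.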
* Initial work
* More or less finished with first draft
* Update src/transformers/modeling_utils.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Update src/transformers/modeling_utils.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Fix randomly initialized weights
* Update src/transformers/modeling_utils.py
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
* Address review comments
* Rename DeepSpeed folder to temporarily fix the test issue?
* Revert to try if Accelerate fix works
* Use latest Accelerate release
* Quality and fixes
* Style
* Quality
* Add doc
* Test + fix
* More blocks
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
* add inference example to LayoutLMv2ForQuestionAnswering, passing doctest
* add loss example to LayoutLMv2ForQuestionAnswering, passing doctest
* Add correct doctest for LayoutLMv2ForTokenClassification, passing doctest
* add correct doctest for LayoutLMv2ForSequenceClassification, passing test
* add correct doctest for LayoutLMv2Model, passing test
* make fixup
* fix to address review comments
* make style
* fix doctest line break issue, add to documentation_tests.txt, address review comments
* move comment about layoutlmv2 dependencies to the doc page
* format doc page as suggested
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* delete extraneous backtick
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [LED] fixed global_attention_mask not passed for generation + docs clarification for gradient checkpointing
* LED docs clarification
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* [LED] gradient_checkpointing=True should be passed to TrainingArguments
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* [LED] docs: remove wrong word
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* [LED] docs fix typo
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Automatically sort auto mappings
* Better class extraction
* Some auto class magic
* Adapt test and underlying behavior
* Remove re-used config
* Quality
* [doc] performance/scalability revamp
* link the new docs
* no :
* mixed precision
* work on the first doc
* expand the main doc
* Trigger CI
* style
* revamp single GPU training section
* work on training performance
* remove files not used anymore or will be added later
* final touches
* fix rebase
* Add hardware section to toctree
* fix toctree again
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* remove `fast_tokenizers` entry that was copied in rebase
* add warning about DP vs DDP
* remove todo
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix missing closure of codeblock
* Update docs/source/en/perf_train_gpu_many.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* sync with #16860
* update toc
Co-authored-by: leandro <leandro.vonwerra@spoud.io>
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [ fast_tokenizers.mdx ] - Added translation to portuguese to tutorial
* Delete docs/source/pt-br directory
* [ fast_tokenizers.mdx ] - Continuing work on file
* [ fast_tokenizers.mdx ] - Continuing work on file
* Add fast tokenizers to _toctree.yml
* Eliminated config and toctree.yml
* Nits in fast_tokenizers.mdx
Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
* Added translation of installation.mdx to Portuguese, as well
as default templates of _toctree.yml and _config.py
* [ build_documentation.yml ] - Updated doc_builder to build
documentation in Portuguese.
[ pipeline_tutorial.mdx ] - Created translation for the pipeline_tutorial.mdx.
* [ build_pr_documentation.yml ] - Added pt language to pr_documentation builder.
[ pipeline_tutorial.mdx ] - Grammar changes.
* [ accelerate.mdx ] - Translated to Portuguese the acceleration tutorial.
* [ multilingual.mdx ] - Added portuguese translation for multilingual tutorial.
[ training.mdx ] - Added portuguese translation for training tutorial.
* [ preprocessing.mdx ] - WIP
* Update _toctree.yml
* Adding Pré-processamento to _toctree.yml
* Update accelerate.mdx
* Nits and eliminate preprocessing file while it is ready
Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
* First version - OPT model
* Final changes
- putting use cache to False
* few changes
- remove commented block
* few changes
- remove unnecessary files
* fix style issues
* few changes
- remove a test file
- added the logits test
* Update src/transformers/models/auto/tokenization_auto.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add gen tests
* few changes
- rm mask filling example on docstring
* few changes
- remove useless args
* some changes
- more tests should pass now
- needs to clean more
- documentation still needs to be done
* fix code quality
* major changes
- change attention architecture to BART-like
- modify some tests
- style fix
* rm useless classes
- remove opt for:
- QA
- cond generation
- seq classif
* Removed autodoc calls to non-existent classes
Tokenizers are not implemented
* Update src/transformers/__init__.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/__init__.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/auto/modeling_tf_auto.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Replaced OPTTokeniser with GPT2 tokenizer
* added GPT2Tokenizer.from_pretrained("patrickvonplaten/opt_gpt2_tokenizer")
* Removed OPTTokenizer
* make style
* Make style replaces
``` ...).unsqueeze(```
by
``` >>>).unsqueeze(```
* make repo consistency
* Removed PretrainedOPTModel
* fix opt.mdx removed other heads
* fix init, removed 3 heads
* removed heads
* finished cleaning head
* removed sequence classification and question answering
* removed unused imports
* removed useless dummy object for QA, SC and CG
* removed tests for removed useless dummy object for QA, SC and CG
* Removed head_mask using encoder layers which don't exist
* fixed test
* fix line
* added OPT to toctree
* Updated model path with pushed weights
* fix model path
* fixed code quality
* fixed embeddings and generation tests
* update paths
* clean comments
* removed OPTClassificationHead for sentence classification
* renamed hidden layer
* renamed num layers to standard num_hidden_layers
* num_attention_heads fix
* changes for 125m
* add first version for 125m
* add first version - flax
* add new version
* causal LM output
* replace output type with BaseModelOutputWithPastAndCrossAttentions
* revert working config from 150m to 350m
* clean
* removed decoder input ids
* fixed embed dim
* more embed_dim issues
* make style + removed enc_dec test
* update flax model
* removed troublesome copy
* added is_encoder_decoder=False to config
* added set_input emb function to model class
* requires torch on embed test
* use head mask instead of decoder head mask input param solves a test
* 8 test remaining, update
* Updated create_and_check_decoder_model_past_large_inputs
* Make style
* update op tokenizer with condition
* make style
* See if I can push
* some clean up
* remove linear head hack
* save intermediate
* save correct attention
* add copied from from bart
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix part of the reviews
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* same changes in naming / conversion
* correct mask
* more fixes
* delete FlaxOPT and TfOPT
* clean traces of Flax and Tf
* fix mask
* fixed positional embedding length when past key value is provided
* get 125m, 6.7b to work
* Added do_layer_norm
* solved mismatch in load dictionary
* clean up prepare opt input dict
* fixed past key value as bool
* fix previous
* fixed return dict False tuple issue
* All tests are passing
* Make style
* Ignore OPTDecoder non tested
* make fix-copies
* make repo consistency
* small fix
* removed useless @torch.no_grad decorator
* make style
* fix previous opt test
* style
* make style
* added opt documentation
* update OPT_PRETRAINED_MODEL_ARCHIVE_LIST
* up
* more fixes
* model & config work
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* added comment on padding hack (+2)
* cleanup
* review update
* docstring for missing arg
* Update docs/source/en/model_doc/opt.mdx
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update docs/source/en/model_doc/opt.mdx
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update docs/source/en/model_doc/opt.mdx
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/opt/__init__.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* update pretrained map
* update path and tests
* make style
* styling
* make consistency
* add gpt2 tok new
* more tok fixes
* Update src/transformers/models/auto/tokenization_auto.py
* Update docs/source/en/model_doc/opt.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update docs/source/en/model_doc/opt.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update docs/source/en/model_doc/opt.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update tests/models/opt/test_modeling_opt.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update based on reviews
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* make style
* make tokenizer auto tests pass
* apply Lysandre suggestion
* finish tests
* add some good tokenizer tests
* improve docs slightly
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Change nits in Spanish for quicktour.mdx
- Add tasks names in English too.
- Fix small nits in Spanish
* Translate index.mdx to Spanish
* Translate body of index.
* Translated the compatible models list (not the papers' names). Since this should not be updated manually, I can come back to the original text.
* Add models and a dataset for Spanish in the code examples
* Replaced the English models to Spanish versions.
* Add index to _toctree.yml and fix Spanish
* Fix double ““ error
* Change negative example in ASR example
* make style
* Debug style in quicktour.mdx
* [WIP] Add FLAVA model
This PR aims to add the [FLAVA](https://arxiv.org/abs/2112.04482) model to the transformers repo.
The following checklist delineates the list of things to be done for this PR
to be complete:
[x] Flava init
[x] Flava base models
[x] Flava layers
[x] Flava Configs
[x] Flava encoders
[x] Flava pretraining models
[ ] Flava classification/retrieval models (To be added in a separate PR)
[x] Documentation updates
[x] Imports updates
[x] Argstring updates
[x] Flava pretrained checkpoints
[x] Flava tests
[x] Flava processors
[x] Sanity check
[x] Lint
* add seed worker and set_deterministic_seed_for_cuda function to enforce reproducibility
* change function name to enable determinism, add docstrings, reproducibility support for tf
* change function name to enable_determinism_for_distributed_training
* revert changes in set_seed and call set_seed within enable_full_determinism
* add one position argument for seed_worker function
* add full_determinism flag in training args and call enable_full_determinism when it is true
* add enable_full_determinism to documentation
* apply make fixup after the last commit
* Update src/transformers/training_args.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
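The determinism commits above describe a layering: `enable_full_determinism` calls `set_seed`, and `seed_worker` gives each data-loading worker its own reproducible seed. A minimal stdlib-only sketch of that layering follows. The function names come from the commit messages, but the bodies and the `base_seed` parameter are assumptions made for illustration; a real implementation would also seed NumPy, PyTorch and TensorFlow and switch on the framework's deterministic-algorithm settings.

```python
import os
import random


def set_seed(seed: int) -> None:
    """Seed the stdlib RNG only (assumption: framework RNGs are seeded elsewhere)."""
    random.seed(seed)


def enable_full_determinism(seed: int) -> None:
    """Sketch: seed first, then request deterministic CUDA behaviour via env vars."""
    set_seed(seed)
    # Environment variables read by the CUDA runtime and cuBLAS.
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":16:8"


def seed_worker(worker_id: int, base_seed: int = 0) -> int:
    """Derive a distinct, reproducible seed for one data-loading worker."""
    worker_seed = (base_seed + worker_id) % 2**32
    set_seed(worker_seed)
    return worker_seed
```

Keeping `set_seed` unchanged and layering the stricter settings on top matches the "revert changes in set_seed and call set_seed within enable_full_determinism" step: callers who only want seeding are unaffected.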
* PyTorch FSDP integration in Trainer
* reformatting
make style and make quality are now compliant.
* Updating dependency check
* Trigger CI
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
* Added spanish translation of autoclass_tutorial.
Added 'local' and 'title' fields for autoclass_tutorial.
* Fixed autoclass_tutorial title in _toctree.yml and autoclass_tutorial.mdx
* First draft
* Add YolosForObjectDetection
* Make forward pass work
* Add mid position embeddings
* Add interpolation of position encodings
* Add expected values
* Add YOLOS to tests
* Add integration test
* Support tiny model as well
* Support all models in conversion script
* Remove mid_pe_size attribute
* Make more tests pass
* Add model to README and fix config
* Add copied from statements
* Rename base_model_prefix to vit
* Add missing YOLOS_PRETRAINED_CONFIG_ARCHIVE_MAP
* Apply suggestions from code review
* Apply more suggestions from code review
* Convert remaining checkpoints
* Improve docstrings
* Add YolosFeatureExtractor
* Add feature extractor to docs
* Add corresponding tests
* Fix style
* Fix docs
* Apply suggestion from code review
* Fix bad rebase
* Fix some more bad rebase
* Fix missing character
* Improve docs and variable names
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* Adding support for `array` key in raw dictionaries in ASR pipeline.
* ES .
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Making it work by not popping `array` first.
* Black 22.3
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
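The ASR pipeline change above lets a raw dictionary carry the waveform under an `array` key as well as `raw`. A minimal sketch of that kind of lookup, assuming a hypothetical `extract_audio` helper and a `sampling_rate` key next to the waveform (neither the helper name nor the error text is taken from the pipeline code):

```python
def extract_audio(inputs: dict):
    """Hypothetical helper: accept the waveform under either `raw` or `array`."""
    if "sampling_rate" not in inputs or not ("raw" in inputs or "array" in inputs):
        raise ValueError(
            'Expected a dict with a "sampling_rate" key and either a "raw" or an "array" key.'
        )
    # Prefer `raw` when present and fall back to `array`; the dict is not mutated.
    audio = inputs["raw"] if "raw" in inputs else inputs["array"]
    return audio, inputs["sampling_rate"]
```

Reading the keys without popping them leaves the caller's dictionary intact, which is one way to avoid ordering problems between the two keys.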
* Add TapexTokenizer
* Improve docstrings and provide option to provide answer
* Remove option for pretokenized inputs
* Add TAPEX to README
* Fix copies
* Remove option for pretokenized inputs
* Initial commit: add tapex fine-tuning examples on both table-based question answering and table-based fact verification.
* - Draft a README file for running the script and introducing some background.
- Remove unused code lines in tabfact script.
- Disable the default `pad_to_max_length` option which is memory-consuming.
* * Support `as_target_tokenizer` function for TapexTokenizer.
* Fix the do_lower_case behaviour of TapexTokenizer.
* Add unit tests for target scenarios and cased/uncased scenarios for both source and target.
* * Replace the label BartTokenizer with TapexTokenizer's as_target_tokenizer function.
* Fix typos in tapex example README.
* * fix the evaluation script - remove the property `task_name`
* * Make the label space more clear for tabfact tasks
* * Using a new fine-tuning script for tapex-base on tabfact.
* * Remove the lowercase code outside the tokenizer - we use the tokenizer to control whether do_lower_case
* Guarantee the hyper-parameter can be run without out-of-memory on 16GB card and report the new reproduced number on wikisql
* * Remove the default tokenizer_name option.
* Provide evaluation command.
* * Support for WikiTableQuestion dataset.
* Fix a typo in README.
* * Fix the dataset's key name in WikiTableQuestions
* Run make fixup and move test to folder
* Fix quality
* Apply suggestions from code review
* Apply suggestions from code review
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Apply suggestions from code review
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply some more suggestions from code review
* Improve docstrings
* Overwrite failing test
* Improve comment in example scripts
* Fix rebase
* Add TAPEX to Auto mapping
* Add TAPEX to auto config mappings
* Put TAPEX higher than BART in auto mapping
* Add TAPEX to doc tests
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MBP.localdomain>
Co-authored-by: SivilTaram <qianlxc@outlook.com>
Co-authored-by: Niels Rogge <nielsrogge@nielss-mbp.home>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* 📝 add image/vision classification and asr
* 🖍 minor formatting fixes
* Fixed a typo in legacy seq2seq_trainer.py (#16531)
* Add ONNX export for BeiT (#16498)
* Add beit onnx conversion support
* Updated docs
* Added cross reference to ViT ONNX config
* call on_train_end when trial is pruned (#16536)
* Type hints added (#16529)
* Fix Bart type hints (#16297)
* Add type hints to PLBart PyTorch
* Remove pending merge conflicts
* Fix PLBart Type Hints
* Add changes from review
* Add VisualBert type hints (#16544)
* Adding missing type hints for mBART model (PyTorch) (#16429)
* added type hints for mbart tensorflow tf implementation
* Adding missing type hints for mBART model
Tensorflow Implementation model added with missing type hints
* Missing Type hints - correction
For TF model
* Code fixup using make quality tests
* Hint types - typo error
* make fix-copies and make fixup
* type hints
* updated files
* type hints update
* making dependent models coherent
Co-authored-by: matt <rocketknight1@gmail.com>
* Remove MBart subclass of XLMRoberta in tokenizer docs (#16546)
* Remove MBart subclass of XLMRoberta in tokenizer
* Fix style
* Copy docs from MBart50 tokenizer
* Use random_attention_mask for TF tests (#16517)
* use random_attention_mask for TF tests
* Fix for TFCLIP test (for now).
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Improve code example (#16450)
Co-authored-by: Niels Rogge <nielsrogge@nielss-mbp.home>
* Pin tokenizers version <0.13 (#16539)
* Pin tokenizers version <0.13
* Style
* Add code samples for TF speech models (#16494)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* [FlaxSpeechEncoderDecoder] Fix dtype bug (#16581)
* [FlaxSpeechEncoderDecoder] Fix dtype bug
* more fixes
* Making the impossible to connect error actually report the right URL. (#16446)
* Fix flax import in __init__.py: modeling_xglm -> modeling_flax_xglm (#16556)
* Add utility to find model labels (#16526)
* Add utility to find model labels
* Use it in the Trainer
* Update src/transformers/utils/generic.py
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Quality
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Enable doc in Spanish (#16518)
* Reorganize doc for multilingual support
* Fix style
* Style
* Toc trees
* Adapt templates
* Add use_auth to load_datasets for private datasets to PT and TF examples (#16521)
* fix formatting and remove use_auth
* Add use_auth_token to Flax examples
* add a test checking the format of `convert_tokens_to_string`'s output (#16540)
* add new tests
* add comment to overridden tests
* TF: Finalize `unpack_inputs`-related changes (#16499)
* Add unpack_inputs to remaining models
* removed kwargs to `call()` in TF models
* fix TF T5 tests
* [SpeechEncoderDecoderModel] Correct Encoder Last Hidden State Output (#16586)
* initialize the default rank set on TrainerState (#16530)
* initialize the default rank set on TrainerState
* fix style
* Trigger doc build
* Fix CI: test_inference_for_pretraining in ViTMAEModelTest (#16591)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* add a template to add missing tokenization test (#16553)
* add a template to add missing tokenization test
* add cookiecutter setting
* improve doc
* Update templates/adding_a_missing_tokenization_test/README.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* made _load_pretrained_model_low_mem static + bug fix (#16548)
* handle torch_dtype in low cpu mem usage (#16580)
* [Doctests] Correct filenaming (#16599)
* [Doctests] Correct filenaming
* improve quicktour
* make style
* Adding new train_step logic to make things less confusing for users (#15994)
* Adding new train_step logic to make things less confusing for users
* DO NOT ASK WHY WE NEED THAT SUBCLASS
* Metrics now working, at least for single-output models with type annotations!
* Updates and TODOs for the new train_step
* Make fixup
* Temporary test workaround until T5 has types
* Temporary test workaround until T5 has types
* I think this actually works! Needs a lot of tests though
* Make style/quality
* Revert changes to T5 tests
* Deleting the aforementioned unmentionable subclass
* Deleting the aforementioned unmentionable subclass
* Adding a Keras API test
* Style fixes
* Removing unneeded TODO and comments
* Update test_step too
* Stop trying to compute metrics with the dummy_loss, patch up test
* Make style
* make fixup
* Docstring cleanup
* make fixup
* make fixup
* Stop expanding 1D input tensors when using dummy loss
* Adjust T5 test given the new compile()
* make fixup
* Skipping test for convnext
* Removing old T5-specific Keras test now that we have a common one
* make fixup
* make fixup
* Only skip convnext test on CPU
* Update src/transformers/modeling_tf_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/modeling_tf_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Avoiding TF import issues
* make fixup
* Update compile() to support TF 2.3
* Skipping model.fit() on template classes for now
* Skipping model.fit() on template class tests for now
* Replace ad-hoc solution with find_labels
* make fixup
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Adding missing type hints for BigBird model (#16555)
* added type hints for mbart tensorflow tf implementation
* Adding missing type hints for mBART model
Tensorflow Implementation model added with missing type hints
* Missing Type hints - correction
For TF model
* Code fixup using make quality tests
* Hint types - typo error
* make fix-copies and make fixup
* type hints
* updated files
* type hints update
* making dependent models coherent
* Type hints for BigBird
* removing typos
Co-authored-by: matt <rocketknight1@gmail.com>
* [deepspeed] fix typo, adjust config name (#16597)
* 🖍 apply feedback
Co-authored-by: Cathy <815244047@qq.com>
Co-authored-by: Jim Rohrer <jrohrer1@gmail.com>
Co-authored-by: Ferdinand Schlatt <fschlatt@gmail.com>
Co-authored-by: Dahlbomii <101373053+Dahlbomii@users.noreply.github.com>
Co-authored-by: Gunjan Chhablani <chhablani.gunjan@gmail.com>
Co-authored-by: Rishav Chandra Varma <rishavchandra.v16@iiits.in>
Co-authored-by: matt <rocketknight1@gmail.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Niels Rogge <nielsrogge@nielss-mbp.home>
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Daniel Stancl <46073029+stancld@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Karim Foda <35491698+KMFODA@users.noreply.github.com>
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Andres Codas <andrescodas@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
Co-authored-by: Francesco Saverio Zuppichini <francesco.zuppichini@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* first proposal
* replace model outputs in various models
* conflicts
* docstring
* update poolformer
* minor change in docstring
* CI
* removed poolformer specific outputs from doc
* removed convnext specific outputs from doc
* CI
* weird char in segformer
* conversations
* reverted docstring for BaseModelOutputWithPooling
* update outputs
* changed docstring in BaseModelOutput
* updated docstring in modeling outputs
* typos :)
* fixed typo after copy & paste it all around
* CI
* Apply suggestions from code review
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* segformer
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* ported TFViTMAEIntermediate and TFViTMAEOutput.
* added TFViTMAEModel and TFViTMAEDecoder.
* feat: added a noise argument in the implementation for reproducibility.
* feat: vit mae models with an additional noise argument for reproducibility.
Co-authored-by: ariG23498 <aritra.born2fly@gmail.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* fix confusing PIL instructions
As stated in the documentation
[here](https://pillow.readthedocs.io/en/stable/handbook/image-file-formats.html?highlight=pdf#write-only-formats),
PIL can only write PDFs, not read them. Remove references to reading
PDFs via PIL from this page to avoid confusion.
* mention PDF in doc examples using PIL
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Be explicit: PDFs must be converted to images
* fix formatting
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Created the Decision Transformer Model
* updating tests, copy to other machine
* Added last hidden size to Decision Transformer modelling outputs
* Removed copy of original DT file
* made a temporary change to gpt2 to have it conform with the Decision Transformer version
* Updated tests
* Ignoring a file used to test the DT model
* added comments to config file
* added comments and argument descriptions to decision transformer file
* Updated doc
* Ran "make style"
* Remove old model imports
* Removed unused imports, cleaned up init file
* Update docs/source/model_doc/decision_transformer.mdx
added my username
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Reverted changes made to gpt2
* Removed datasets submodule
* Update the modeling outputs to include gpt2 attentions, hidden states and last hidden states
* Added support for return of hidden states, attentions and return dict of gpt2 model.
* Updated tests to include many of the ModelTesterMixin tests.
The following tests are skipped: test_generate_without_input_ids, test_pruning, test_resize_embeddings, test_head_masking, test_attention_outputs, test_hidden_states_output, test_inputs_embeds, test_model_common_attributes
* Added missing line to the end of gpt2 file
* Added an integration test for the Decision Transformer
Test performs an autoregressive evaluation for two time steps
* Set done and info to _ to fix failing test
* Updated integration test to be deterministic and check expected outputs
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Removed unnecessary config options
* Cleaned up commented code and old comments.
* Cleaned up commented code.
* Changed DecisionTransformer to Decision Transformer
* Added Decision Transformer to the main README file
* Added copy of GPT2 called DecisionTransformerGPT2Model
* isorted imports
* isorted imports
* Added model to non-English README files
* Ran make fix-copies and corrected some cases.
* Updated index file to include Decision Transformer
* Added gpt2 model as copy inside the Decision Transformer model file
* Added the unit test file to the list of TEST_FILES_WITH_NO_COMMON_TESTS
* Deleted redundant checkpoint files (I don't know how these got committed)
* Removed testing files. (These should have never been committed)
* Removed accidentally committed files
* Moved the Decision Transformer test to its own directory
* Add type hints for Pegasus (#16324)
* Funnel type hints (#16323)
* add pt funnel type hints
* add tf funnel type hints
* Add type hints for ProphetNet PyTorch (#16272)
* [GLPN] Improve docs (#16331)
* Add link to notebook
* Add link
* Fix bug
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* Added type hints for Pytorch Marian calls (#16200)
* Added type hinting for forward functions in pytorch marian
* typo correction
* Removed type hints on functions from BART per Suraj Patil request
* fix import pb
* fix typo
* corrected tuple call
* ran black
* after fix-copies
Some optional tags on primitives were removed, past_key_values in MarianForCausalLM changed from Tuple of Tuple to List
* Fixing copies to roformer and pegasus
Co-authored-by: Clementine Fourrier <cfourrie@inria.fr>
Co-authored-by: matt <rocketknight1@gmail.com>
* Moved DecisionTransformOutput to modeling_decision_transformer
* Moved the example usage to research project and cleaned comments
* Made tests ignore the copy of gpt2 in Decision Transformer
* Added module output to modelling decision transformer
* removed copied gpt2 model from list of transformers models
* Updated tests and created __init__ file for new test location
* Update README.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/decision_transformer/configuration_decision_transformer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Removed unneeded summary type from config file
* Fixed copies
* Updated pretrained config map to refer to hopper-medium checkpoint
* done (#16340)
* Added Decision transformer to model docs
* Update src/transformers/models/decision_transformer/modeling_decision_transformer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/decision_transformer/modeling_decision_transformer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/decision_transformer/configuration_decision_transformer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add type annotations for Rembert/Splinter and copies (#16338)
* undo black autoformat
* minor fix to rembert forward with default
* make fix-copies, make quality
* Adding types to template model
* Removing List from the template types
* Remove `Optional` from a couple of types that don't accept `None`
Co-authored-by: matt <rocketknight1@gmail.com>
* [Bug template] Shift responsibilities for long-range (#16344)
* Fix code repetition in serialization guide (#16346)
* Adopt framework-specific blocks for content (#16342)
* ✨ refactor code samples with framework-specific blocks
* ✨ update training.mdx
* 🖍 apply feedback
* Updates the default branch from master to main (#16326)
* Updates the default branch from master to main
* Links from `master` to `main`
* Typo
* Update examples/flax/README.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Updated model with custom docstring example
* Created the Decision Transformer Model
* updating tests, copy to other machine
* Added last hidden size to Decision Transformer modelling outputs
* Removed copy of original DT file
* made a temporary change to gpt2 to have it conform with the Decision Transformer version
* Updated tests
* Ignoring a file used to test the DT model
* added comments to config file
* added comments and argument descriptions to decision transformer file
* Updated doc
* Ran "make style"
* Remove old model imports
* Removed unused imports, cleaned up init file
* Update docs/source/model_doc/decision_transformer.mdx
added my username
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Reverted changes made to gpt2
* Removed datasets submodule
* Update the modeling outputs to include gpt2 attentions, hidden states and last hidden states
* Added support for return of hidden states, attentions and return dict of gpt2 model.
* Updated tests to include many of the ModelTesterMixin tests.
The following tests are skipped: test_generate_without_input_ids, test_pruning, test_resize_embeddings, test_head_masking, test_attention_outputs, test_hidden_states_output, test_inputs_embeds, test_model_common_attributes
* Added missing line to the end of gpt2 file
* Added an integration test for the Decision Transformer
Test performs an autoregressive evaluation for two time steps
* Set done and info to _ to fix failing test
* Updated integration test to be deterministic and check expected outputs
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Removed unnecessary config options
* Cleaned up commented code and old comments.
* Cleaned up commented code.
* Changed DecisionTransformer to Decision Transformer
* Added Decision Transformer to the main README file
* Added copy of GPT2 called DecisionTransformerGPT2Model
* isorted imports
* isorted imports
* Added model to non-English README files
* Ran make fix-copies and corrected some cases.
* Updated index file to include Decision Transformer
* Added gpt2 model as copy inside the Decision Transformer model file
* Added the unit test file to the list of TEST_FILES_WITH_NO_COMMON_TESTS
* Deleted redundant checkpoint files (I don't know how these got committed)
* Removed testing files. (These should have never been committed)
* Removed accidentally committed files
* Moved the Decision Transformer test to its own directory
* Moved DecisionTransformOutput to modeling_decision_transformer
* Moved the example usage to research project and cleaned comments
* Made tests ignore the copy of gpt2 in Decision Transformer
* Added module output to modelling decision transformer
* removed copied gpt2 model from list of transformers models
* Updated tests and created __init__ file for new test location
* Update README.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/decision_transformer/configuration_decision_transformer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Removed unneeded summary type from config file
* Fixed copies
* Updated pretrained config map to refer to hopper-medium checkpoint
* Added Decision transformer to model docs
* Update src/transformers/models/decision_transformer/modeling_decision_transformer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/decision_transformer/modeling_decision_transformer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/decision_transformer/configuration_decision_transformer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Updated model with custom docstring example
* Updated copies, config auto, and readme files.
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Dan Tegzes <48134725+Tegzes@users.noreply.github.com>
Co-authored-by: Adam Montgomerie <adam@avanssion.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
Co-authored-by: Clémentine Fourrier <22726840+clefourrier@users.noreply.github.com>
Co-authored-by: Clementine Fourrier <cfourrie@inria.fr>
Co-authored-by: matt <rocketknight1@gmail.com>
Co-authored-by: Francesco Saverio Zuppichini <francesco.zuppichini@gmail.com>
Co-authored-by: Jacob Dineen <54680234+jacobdineen@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
* Updates the default branch from master to main
* Links from `master` to `main`
* Typo
* Update examples/flax/README.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add Flaubert to ONNX to make it available for conversion.
* Fixed features for FlauBERT. fixup command remove flaubert to docs list.
Co-authored-by: ChainYo <t.chaigneau.tc@gmail.com>
* Remove unused attributes
* Add link to blog and add clarification about input size
* Improve readability of the code
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* Update training.mdx
Fixed Error Raised Due to Wrongly Accessing Training Sample
* Ran make style
* Revert to Old Commit
* Apply suggestions from code review
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Draft a guide with our code quirks for new models
* Apply suggestions from code review
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* up
* up
* up
* fix
* yeh
* ups
* Empty test commit
* correct quicktour
* correct
* correct
* up
* up
* uP
* uP
* up
* up
* uP
* up
* up
* up
* up
* up
* up
* up
* up
* up
* up
* Update src/transformers/models/van/modeling_van.py
* finish
* apply suggestions
* remove folder
* revert to daily testing
* [Generate Docs] Correct docs
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* padding done
* correctly return one attention per layer
* almost correct, attentions are not flattened, one tuple per stage
* tests green
* doc
* conversations
* reshaping hidden_states
* view in the test
* reshape_hidden_states in Encoder and Model
* new outputs with reshaped_hidden_states
* conversations
* doc
* Update docs/source/model_doc/swin.mdx
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* conversations
* fix tests
* minor changes
* resolved conversations
* attentions one per stage
* typo
* typos
* typos
* function signature
* CI
* clean up tests
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Fix inconsistent example variable naming
- Example code for sequence classification in TensorFlow had spelling mistakes and incorrect, inconsistent naming
- Changed variable naming to be consistent with the two other TF examples
* Fix incorrect training examples
* first commit
* ResNet model correctly implemented.
basic modeling + weights conversion is done
removed unused doc
mdx file
doc and conversion script
added feature_extractor to auto
test
minor changes + style + quality
doc
test
Delete process.yml
A leftover from my attempt at running CircleCI locally
* minor changes
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* new test format
* minor changes from conversations
* minor changes from conversations
* make style + quality
* readded the tests
* test + README
* minor changes from conversations
* error in README
* make fix-copies
* removed regression for classification head
* make quality
* fixed loss control flow
* fixed loss control flow
* resolved conversations
* Apply suggestions from code review
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* READMEs
* index.mdx
* minor changes
* updated tests and models
* unused import
* outputs
* Update docs/source/model_doc/resnet.mdx
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* added embeddings_size
* Apply suggestions from code review
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* conversation
* added push to hub
* test
* embedding_size
* make fix-copies
* resolved conversations
* CI
* changed organization
* minor changes
* CI
* minor changes
* conversations
* conversation
* doc
* tests
* removed unused docstring
* conversation
* removed unused outputs
* CI
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Add ONNX support for ViT
* Refactor to use generic preprocessor
* Add vision dep to tests
* Extend ONNX slow tests to ViT
* Add dummy image generator
* Use model_type to determine modality
* Add deprecation warnings for tokenizer argument
* Add warning when overwriting the preprocessor
* Add optional args to docstrings
* Add minimum PyTorch version to OnnxConfig
* Refactor OnnxConfig class variables from CONSTANT_NAME to snake_case
* Add reasonable value for default atol
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* test
* up
* up
* Empty test commit
* up
* update tests
* up
* fix some vision models
* correct
* correct docs
* Trigger notification
* finalize
* check
* correct quicktour
* Apply suggestions from code review
* improve doctests
* Trigger Build
* next try
* next try
* and again
* Output current clone information
* Output current clone information
* Correct path
* add tf round again
* revert to daily job
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* added classes to get started with constrained beam search
* in progress, think i can directly force tokens now but not yet with the round robin
* think now i have total control, now need to code the bank selection
* technically works as desired, need to optimize and fix design choices leading to undesirable outputs
* complete PR #1 without disjunctive decoding
* removed incorrect tests
* Delete k.txt
* Delete test.py
* Delete test.sh
* revert changes to test scripts
* genutils
* full implementation with testing, no disjunctive yet
* shifted docs
* passing all tests realistically ran locally
* removing accidentally included print statements
* fixed source of error in initial PR test
* fixing the get_device() vs device trap
* fixed documentation docstrings about constrained_beam_search
* fixed tests failing for Speech2TextModel's floating point inputs
* fix cuda long tensor
* added examples and testing for them and found & fixed a bug in beam_search and constrained_beam_search
* deleted accidentally added test halting code with assert False
* code reformat
* Update tests/test_generation_utils.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update tests/test_generation_utils.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update tests/test_generation_utils.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update tests/test_generation_utils.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update tests/test_generation_utils.py
* fixing based on comments on PR
* took out the testing code that should work but fails without the beam search modification; style changes
* fixing comments issues
* docstrings for ConstraintListState
* typo in PhrasalConstraint docstring
* docstrings improvements
* finished adding what is sort of an opinionated implementation of disjunctive generation, but it revealed errors in inner beam search logic during testing.
* fixed bug found in constrained beam search that used beam_idx that were not global across all the batches
* disjunctive constraint working 100% correctly
* passing all tests
* Accidentally included mlruns
* Update src/transformers/generation_beam_constraints.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/generation_beam_constraints.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* complete overhaul of type complexities and other nits
* strict type checks in generate()
* fixing second round of feedback by narsil
* fixed failing generation test because of type check overhaul
* generation test fail fix
* fixing test fails
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Add TF logits wrappers
* Add sample method
* add tests for TF logit wrappers
* TF generate sample tests now run on CPU
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* maskformer
* conflicts
* conflicts
* minor fixes
* feature extractor test fix
refactor MaskFormerLoss following conversation
MaskFormer related types should not trigger a module time import error
missed one
removed all the types that are not used
update config mapping
minor updates in the doc
resolved conversation that doesn't need a discussion
minor changes
resolved conversations
fixed DetrDecoder
* minor changes
minor changes
fixed mdx file
test feature_extractor return types
functional losses -> classes
removed the return type test for the feature extractor
minor changes + style + quality
* conflicts?
* rebase master
* readme
* added missing files
* deleted poolformer tests that were in the wrong place
* CI
* minor changes
* Apply suggestions from code review
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* resolved conversations
* minor changes
* conversations
[Unispeech] Fix slow tests (#15818)
* remove soundfile old way of loading audio
* Adapt slow test
[Barthez Tokenizer] Fix saving (#15815)
[TFXLNet] Correct tf xlnet generate (#15822)
* [TFXLNet] Correct tf xlnet
* adapt test comment
Fix the push run (#15807)
Fix semantic segmentation pipeline test (#15826)
Fix dummy_inputs() to dummy_inputs in symbolic_trace doc (#15776)
Add model specific output classes to PoolFormer model docs (#15746)
* Added model specific output classes to poolformer docs
* Fixed Segformer typo in Poolformer docs
Adding the option to return_timestamps on pure CTC ASR models. (#15792)
* Adding the option to return_timestamps on pure CTC ASR models.
* Remove `math.prod` which was introduced in Python 3.8
* int are not floats.
* Reworking the PR to support "char" vs "word" output.
* Fixup!
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Quality.
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
HFTracer.trace should use/return self.graph to be compatible with torch.fx.Tracer (#15824)
Fix tf.concatenate + test past_key_values for TF models (#15774)
* fix wrong method name tf.concatenate
* add tests related to causal LM / decoder
* make style and quality
* clean-up
* Fix TFBertModel's extended_attention_mask when past_key_values is provided
* Fix tests
* fix copies
* More tf.int8 -> tf.int32 in TF test template
* clean-up
* Update TF test template
* revert the previous commit + update the TF test template
* Fix TF template extended_attention_mask when past_key_values is provided
* Fix some styles manually
* clean-up
* Fix ValueError: too many values to unpack in the test
* Fix more: too many values to unpack in the test
* Add a comment for extended_attention_mask when there is past_key_values
* Fix TFElectra extended_attention_mask when past_key_values is provided
* Add tests to other TF models
* Fix for TF Electra test: add prepare_config_and_inputs_for_decoder
* Fix not passing training arg to lm_head in TFRobertaForCausalLM
* Fix tests (with past) for TF Roberta
* add testing for past_key_values for TFElectra model
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
[examples/summarization and translation] fix readme (#15833)
Add ONNX Runtime quantization for text classification notebook (#15817)
Re-enable doctests for the quicktour (#15828)
* Re-enable doctests for the quicktour
* Re-enable doctests for task_summary (#15830)
* Remove &
Framework split model report (#15825)
Add TFConvNextModel (#15750)
* feat: initial implementation of convnext in tensorflow.
* fix: sample code for the classification model.
* chore: added checked for from the classification model.
* chore: set bias initializer in the classification head.
* chore: updated license terms.
* chore: removed unused imports
* feat: enabled argument during using drop_path.
* chore: replaced tf.identity with layers.Activation(linear).
* chore: edited default checkpoint.
* fix: minor bugs in the initializations.
* partial-fix: tf model errors for loading pretrained pt weights.
* partial-fix: call method updated
* partial-fix: cross loading of weights (4x3 variables to be matched)
* chore: removed unneeded comment.
* removed playground.py
* rebasing
* rebasing and removing playground.py.
* fix: renaming TFConvNextStage conv and layer norm layers
* chore: added initializers and other minor additions.
* chore: added initializers and other minor additions.
* add: tests for convnext.
* fix: integration tester class.
* fix: issues mentioned in pr feedback (round 1).
* fix: how output_hidden_states arg is propagated inside the network.
* feat: handling of arg for pure cnn models.
* chore: added a note on equal contribution in model docs.
* rebasing
* rebasing and removing playground.py.
* feat: encapsulation for the convnext trunk.
* Fix variable naming; Test-related corrections; Run make fixup
* chore: added Joao as a contributor to convnext.
* rebasing
* rebasing and removing playground.py.
* rebasing
* rebasing and removing playground.py.
* chore: corrected copyright year and added comment on NHWC.
* chore: fixed the black version and ran formatting.
* chore: ran make style.
* chore: removed from_pt argument from test, ran make style.
* rebasing
* rebasing and removing playground.py.
* rebasing
* rebasing and removing playground.py.
* fix: tests in the convnext subclass, ran make style.
* rebasing
* rebasing and removing playground.py.
* rebasing
* rebasing and removing playground.py.
* chore: moved convnext test to the correct location
* fix: locations for the test file of convnext.
* fix: convnext tests.
* chore: applied sgugger's suggestion for dealing w/ output_attentions.
* chore: added comments.
* chore: applied updated quality environment style.
* chore: applied formatting with quality environment.
* chore: revert to the previous tests/test_modeling_common.py.
* chore: revert to the original test_modeling_common.py
* chore: revert to previous states for test_modeling_tf_common.py and modeling_tf_utils.py
* fix: tests for convnext.
* chore: removed output_attentions argument from convnext config.
* chore: revert to the earlier tf utils.
* fix: output shapes of the hidden states
* chore: removed unnecessary comment
* chore: reverting to the right test_modeling_tf_common.py.
* Styling nits
Co-authored-by: ariG23498 <aritra.born2fly@gmail.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
* minor changes
* doc fix in feature extractor
* doc
* typos
* removed detr logic from config
* removed detr logic from config
* removed num_labels
* small fix in the config
* auxilary -> auxiliary
* make style
* some test is failing
* fix a weird char in config preventing doc-builder
* retry to fix the doc-builder issue
* make style
* new try to fix the doc builder
* CI
* change weights to facebook
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: ariG23498 <aritra.born2fly@gmail.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
* Add data2vec model cloned from roberta
* Add checkpoint conversion script
* Fix copies
* Update docs
* Add checkpoint conversion script
* Remove fairseq data2vec_text script and fix format
* Add comment on where to get data2vec_text.py
* Remove mock implementation cheat.py and fix style
* Fix copies
* Remove TF and Flax classes from init
* Add back copy from fairseq data2vec_text.py and fix style
* Update model name in docs/source/index.mdx to be CamelCase
* Revert model name in table to lower-case to get check_table test to pass
* Update src/transformers/models/data2vec/__init__.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/convert_data2vec_original_pytorch_checkpoint_to_pytorch.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update docs/source/model_doc/data2vec.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update docs/source/model_doc/data2vec.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/auto/configuration_auto.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/configuration_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update tests/test_modeling_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/configuration_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update documentation
* Copy-paste Data2VecConfig from BertConfig
* Update config checkpoint to point to edugp/data2vec-nlp-base. Fix style and repo-consistency
* Update config special tokens to match RoBERTa
* Split multiple assertions and add individual error messages
* Rename Data2VecModel to Data2VecForTextModel
* Add Data2Vec to _toctree.yml
* Rename Data2VecEmbeddings to Data2VecForTextEmbeddings
* Add initial Data2VecForAudio model (unfinished). Only matching fairseq's implementation up to the feature encoder (before positional encoding).
* finish audio model
* finish audio file
* Update names and fix style, quality and repo consistency
* Remove Data2VecAudioForPretraining. Add tests for Data2VecAudio, mimicking the Wav2Vec2 test suite. Fix bias initialization in positional conv layers. Move back configurations for audio and text to separate files.
* add inputs to logits to data2vec
* correct audio models
* correct config auto
* correct tok auto
* Update utils/tests_fetcher.py
* delete unnecessary files
* delete unnecessary files
* further renaming
* make all tests pass
* finish
* remove useless test file
* Update tests/test_modeling_common.py
* Update utils/check_repo.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/modeling_data2vec_text.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Fix copies
* Update docs
* Remove fairseq data2vec_text script and fix format
* Add comment on where to get data2vec_text.py
* Remove mock implementation cheat.py and fix style
* Fix copies
* Remove TF and Flax classes from init
* Add back copy from fairseq data2vec_text.py and fix style
* Update model name in docs/source/index.mdx to be CamelCase
* Revert model name in table to lower-case to get check_table test to pass
* Update documentation
* Update src/transformers/models/data2vec/__init__.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/convert_data2vec_original_pytorch_checkpoint_to_pytorch.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/auto/configuration_auto.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/configuration_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update tests/test_modeling_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/configuration_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Copy-paste Data2VecConfig from BertConfig
* Update config checkpoint to point to edugp/data2vec-nlp-base. Fix style and repo-consistency
* Update config special tokens to match RoBERTa
* Split multiple assertions and add individual error messages
* Rename Data2VecModel to Data2VecForTextModel
* Add Data2Vec to _toctree.yml
* Rename Data2VecEmbeddings to Data2VecForTextEmbeddings
* Add initial Data2VecForAudio model (unfinished). Only matching fairseq's implementation up to the feature encoder (before positional encoding).
* finish audio model
* finish audio file
* add inputs to logits to data2vec
* Update names and fix style, quality and repo consistency
* Remove Data2VecAudioForPretraining. Add tests for Data2VecAudio, mimicking the Wav2Vec2 test suite. Fix bias initialization in positional conv layers. Move back configurations for audio and text to separate files.
* correct audio models
* correct config auto
* correct tok auto
* delete unnecessary files
* delete unnecessary files
* Update utils/tests_fetcher.py
* further renaming
* make all tests pass
* finish
* remove useless test file
* Update tests/test_modeling_common.py
* Update utils/check_repo.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/modeling_data2vec_text.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Move data2vec tests to new structure
* Fix test imports for text tests
* Remove fairseq files
* Change paper link to arxiv
* Modify Data2Vec documentation to reflect that the encoder is not shared across the audio and text models in the current implementation.
* Update text model checkpoint to be facebook/data2vec-text-base
* Add 'Copy from' statements and update paper links and docs
* fix copy from statements
* improve copied from
* correct more copied from statements
* finish copied from stuff
* make style
* add model to README
* add to master
Co-authored-by: Eduardo Gonzalez Ponferrada <eduardo@ferrumhealth.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* rebase
* Delete shift tokens func
* downsample decoder input seq len for init
* correct attention mask
* add tests
* pt flax cross test
* make fixup
* init file for import
* change pt-flax cross test threshold
* pt-flax test logits only
* move tests
* make repo-consistency
* consistent indentation
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* feat: initial implementation of convnext in tensorflow.
* fix: sample code for the classification model.
* chore: added checked for from the classification model.
* chore: set bias initializer in the classification head.
* chore: updated license terms.
* chore: removed unused imports
* feat: enabled argument during using drop_path.
* chore: replaced tf.identity with layers.Activation(linear).
* chore: edited default checkpoint.
* fix: minor bugs in the initializations.
* partial-fix: tf model errors for loading pretrained pt weights.
* partial-fix: call method updated
* partial-fix: cross loading of weights (4x3 variables to be matched)
* chore: removed unneeded comment.
* removed playground.py
* rebasing
* rebasing and removing playground.py.
* fix: renaming TFConvNextStage conv and layer norm layers
* chore: added initializers and other minor additions.
* chore: added initializers and other minor additions.
* add: tests for convnext.
* fix: integration tester class.
* fix: issues mentioned in pr feedback (round 1).
* fix: how output_hidden_states arg is propagated inside the network.
* feat: handling of arg for pure cnn models.
* chore: added a note on equal contribution in model docs.
* rebasing
* rebasing and removing playground.py.
* feat: encapsulation for the convnext trunk.
* Fix variable naming; Test-related corrections; Run make fixup
* chore: added Joao as a contributor to convnext.
* rebasing
* rebasing and removing playground.py.
* rebasing
* rebasing and removing playground.py.
* chore: corrected copyright year and added comment on NHWC.
* chore: fixed the black version and ran formatting.
* chore: ran make style.
* chore: removed from_pt argument from test, ran make style.
* rebasing
* rebasing and removing playground.py.
* rebasing
* rebasing and removing playground.py.
* fix: tests in the convnext subclass, ran make style.
* rebasing
* rebasing and removing playground.py.
* rebasing
* rebasing and removing playground.py.
* chore: moved convnext test to the correct location
* fix: locations for the test file of convnext.
* fix: convnext tests.
* chore: applied sgugger's suggestion for dealing w/ output_attentions.
* chore: added comments.
* chore: applied updated quality environment style.
* chore: applied formatting with quality environment.
* chore: revert to the previous tests/test_modeling_common.py.
* chore: revert to the original test_modeling_common.py
* chore: revert to previous states for test_modeling_tf_common.py and modeling_tf_utils.py
* fix: tests for convnext.
* chore: removed output_attentions argument from convnext config.
* chore: revert to the earlier tf utils.
* fix: output shapes of the hidden states
* chore: removed unnecessary comment
* chore: reverting to the right test_modeling_tf_common.py.
* Styling nits
Co-authored-by: ariG23498 <aritra.born2fly@gmail.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
* custom_models: tiny doc addition
* mention security feature earlier in the section
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* [Proposal] Adding ZeroShotImageClassificationPipeline
- Based on CLIP
* WIP, Resurrection in progress.
* Resurrection... achieved.
* Reword handling different `padding_value` for `feature_extractor` and
`tokenizer`.
* Thanks doc-builder !
* Adding docs + global namespace `ZeroShotImageClassificationPipeline`.
* Fixing templates.
* Make the test pass and be robust to floating error.
* Addressing suraj's comments on docs mostly.
* Tf support start.
* TF support.
* Update src/transformers/pipelines/zero_shot_image_classification.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* doc for adding a model to the hub
* run make style
* resolved conversation
* removed a line
* removed )
* Update docs/source/add_new_model.mdx
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update docs/source/add_new_model.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* make style
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Added all files, PoolFormerFeatureExtractor still failing tests
* Fixed PoolFormerFeatureExtractor not being able to import
* Completed Poolformer doc
* Applied Suggested fixes
* Fixed errors in modeling_auto.py
* Fix feature extractor, convert docs to Markdown, styling of code
* Remove PoolFormer from check_repo and fix integration test
* Remove Poolformer from check_repo
* Fixed configuration_poolformer.py docs and removed inference.py from poolformer
* Ran with black v22
* Added PoolFormer to _toctree.yml
* Updated poolformer doc
* Applied suggested fixes and added on README.md
* Did make fixup and make fix-copies, tests should pass now
* Changed PoolFormer weights conversion script name and fixed README
* Applied fixes in test_modeling_poolformer.py and modeling_poolformer.py
* Added PoolFormerFeatureExtractor to AutoFeatureExtractor API
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MBP.localdomain>
* TF generate start refactor
* Add tf tests for sample generate
* re-organize
* boom boom
* Apply suggestions from code review
* re-add
* add all code
* make random greedy pass
* make encoder-decoder random work
* further improvements
* delete bogus file
* make gpt2 and t5 tests work
* finish logits tests
* correct logits processors
* correct past / encoder_outputs drama
* refactor some methods
* another fix
* refactor shape_list
* fix more shape list
* import shape_list
* finish docs
* fix imports
* make style
* correct tf utils
* Fix TFRag as well
* Apply Lysandre's and Sylvain's suggestions
* Update tests/test_generation_tf_logits_process.py
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Update src/transformers/tf_utils.py
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* remove cpu according to gante
* correct logit processor
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Add TensorFlow support for ONNX export
* Change documentation to mention conversion with TensorFlow
* Refactor export into export_pytorch and export_tensorflow
* Check model's type instead of framework installation to choose between TF and PyTorch
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Alberto Bégué <alberto.begue@della.ai>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* added classes to get started with constrained beam search
* in progress, think i can directly force tokens now but not yet with the round robin
* think now i have total control, now need to code the bank selection
* technically works as desired, need to optimize and fix design choices leading to undesirable outputs
* complete PR #1 without disjunctive decoding
* removed incorrect tests
* Delete k.txt
* Delete test.py
* Delete test.sh
* revert changes to test scripts
* genutils
* full implementation with testing, no disjunctive yet
* shifted docs
* passing all tests realistically ran locally
* removing accidentally included print statements
* fixed source of error in initial PR test
* fixing the get_device() vs device trap
* fixed documentation docstrings about constrained_beam_search
* fixed tests failing for Speech2TextModel's floating point inputs
* fix cuda long tensor
* added examples and testing for them and found & fixed a bug in beam_search and constrained_beam_search
* deleted accidentally added test halting code with assert False
* code reformat
* Update tests/test_generation_utils.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update tests/test_generation_utils.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update tests/test_generation_utils.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update tests/test_generation_utils.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update tests/test_generation_utils.py
* fixing based on comments on PR
* took out the testing code that should work but fails without the beam search modification; style changes
* fixing comments issues
* docstrings for ConstraintListState
* typo in PhrasalConstraint docstring
* docstrings improvements
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* PoC for a ProcessorMixin class
* Documentation
* Apply suggestions from code review
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Roll out to other processors
* Add base feature extractor class in init
* Use args and kwargs
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Add wrapper classes
* convert inner layers to tf
* Add TF Encoder and Decoder layers
* TFSpeech2Text models
* Loadable model
* TF model with same outputs as PT model
* test skeleton
* correct tests and run the fixup
* correct attention expansion
* TFSpeech2Text past_key_values with TF format
* electra is added to onnx supported model
* add google/electra-base-generator for test onnx module
Co-authored-by: Lewis Tunstall <lewis.c.tunstall@gmail.com>
* add xlm roberta xl
* add convert xlm xl fairseq checkpoint to pytorch
* fix init and documents for xlm-roberta-xl
* fix indentation
* add test for XLM-R xl,xxl
* fix model hub name
* fix some stuff
* up
* correct init
* fix more
* fix as suggestions
* add torch_device
* fix default values of doc strings
* fix leftovers
* merge to master
* up
* correct hub names
* fix docs
* fix model
* up
* finalize
* last fix
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add copied from
* make style
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* clean commit of changes
* apply review feedback, make edits
* fix backticks, minor formatting
* 🖍 make fixup and minor edits
* 🖍 fix # in header
* 📝 update code sample without from_pt
* 📝 final review
* Added missing code in exemplary notebook - custom datasets fine-tuning
Added missing code in tokenize_and_align_labels function in the exemplary notebook on custom datasets - token classification.
The missing code concerns adding labels for all but first token in a single word.
The added code was taken directly from huggingface official example - this [colab notebook](https://github.com/huggingface/notebooks/blob/master/transformers_doc/custom_datasets.ipynb).
* Changes requested in the review - keep the code as simple as possible
* First commit
* Add conversion script
* Make conversion script work for base model
* More improvements
* Update conversion script, works for vqa
* Add indexing argument to meshgrid
* Make conversion script work for ViltForPreTraining
* Add ViltForPreTraining to docs
* Fix device issue
* Add processor
* Add MinMaxResize to feature extractor
* Implement call method of ViltProcessor
* Fix tests
* Add integration test
* Add loss calculation for VQA
* Improve tests
* Improve some more tests
* Debug tests
* Small improvements
* Add support for attention_mask
* Remove mask_it
* Add pixel_mask
* Add tests for ViltFeatureExtractor
* Improve tests
* Add ViltForNaturalLanguageVisualReasoning
* Add ViltForNaturalLanguageVisualReasoning to conversion script
* Minor fixes
* Add support for image_embeds, update docstrings to markdown
* Update docs to markdown
* Improve conversion script
* Rename ViltForPreTraining to ViltForMaskedLM
* Improve conversion script
* Convert docstrings to markdown
* Fix code example of retrieval model
* Properly convert masked language model
* Add integration test for nlvr
* Fix code quality
* Apply suggestions from code review
* Add copied from statements
* Fix pretrained_config_archive_map
* Fix docs
* Add model to README
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply more suggestions from code review
* Make code more readable
* Add ViltForNaturalLanguageVisualReasoning to the tests
* Rename ViltForVisualQuestionAnswering to ViltForQuestionAnswering
* Replace pixel_values_2 by single tensor
* Add hidden_states and attentions
* Fix one more test
* Fix all tests
* Update year
* Fix rebase issues
* Fix another rebase issue
* Remove ViltForPreTraining from auto mapping
* Rename ViltForImageRetrievalTextRetrieval to ViltForImageAndTextRetrieval
* Make it possible to use BertTokenizerFast in the processor
* Use BertTokenizerFast by default
* Rename ViltForNaturalLanguageVisualReasoning, define custom model output
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* First draft
* More improvements
* More improvements
* More improvements
* Fix embeddings
* Add conversion script
* Finish conversion script
* More improvements
* Fix forward pass
* Remove print statements
* Add weights initialization
* Add initialization of decoder weights
* Add support for other models in the conversion script
* Fix patch_size for huge model
* Fix most of the tests
* Fix integration test
* Fix docs
* Fix archive_list
* Apply suggestions from code review
* Improve documentation
* Apply more suggestions
* Skip some tests due to non-deterministic behaviour
* Fix test_initialization
* Remove unnecessary initialization of nn.Embedding
* Improve docs
* Fix dummies
* Remove ViTMAEFeatureExtractor from docs
* Add model to README and table of contents
* Delete inference file
* update XLMProphetNet link
* update DPR link
* change prophetnet link
* change link MBART
* change link GPT
* update gpt2 link
* ctrl update link
* update Transformer-XL link
* Update Reformer link
* update xlnet link
* bert update link
* update albert link
* roberta update link
* update distilbert link
* update convbert link
* update XLM link
* xlm roberta update link
* update Flaubert link
* update electra link
* update funnel transformer and longformer
* bart update link
* pegasus update link
* update marianmt link
* t5 update link
* mt5 update link
* Add ONNX classes to main package
* Remove permalinks from ONNX guide
* Fix ToC entry
* Revert "Add ONNX classes to main package"
This reverts commit eb794a5b00.
* Add ONNX classes to main doc
* Fix syntax highlighting in doc
* Fix text
* Add FeaturesManager to doc
* Use paths to reference ONNX classes
* Add FeaturesManager to init
* Add missing ONNX paths
* Add IBertOnnxConfig and tests
* add all the supported features for IBERT and remove outputs in IbertOnnxConfig
* use OnnxConfig
* fix codestyle
* remove serialization.rst
* codestyle
* Start the work on TFVisionEncoderDecoderModel
* Expose TFVisionEncoderDecoderModel
* fix import
* Add modeling_tf_vision_encoder_decoder to _ignore_modules in get_model_modules()
* reorder
* Apply the fix for checkpoint loading as in #14016
* remove attention_mask + fix VISION_DUMMY_INPUTS
* A minimal change to make TF generate() work for vision models as encoder in encoder-decoder setting
* fix wrong condition: shape_list(input_ids) == 2
* add tests
* use personal TFViTModel checkpoint (for now)
* Add equivalence tests + projection layer
* style
* make sure projection layer can run
* Add examples
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Clean comments (need to work on TODOs for PyTorch models)
* Remove TF -> PT in check_pt_tf_equivalence for TFVisionEncoderDecoderModel
* fixes
* Revert changes in PT code.
* Update tests/test_modeling_tf_vision_encoder_decoder.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Add test_inference_coco_en for TF test
* fix quality
* fix name
* build doc
* add main_input_name
* Fix ckpt name in test
* fix diff between master and this PR
* fix doc
* fix style and quality
* fix missing doc
* fix labels handling
* Delete auto.rst
* Add the changes done in #14016
* fix prefix
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* make style
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Add FlaxRoFormer
* Clean code + make quality
* Fix output pooling for FlaxRoFormerForMultipleChoiceModule
* Apply suggestions from code review
* add flax model to repos
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Fix bad examples
* Add black formatting to style_doc
* Use first nonempty line
* Put it at the right place
* Don't add spaces to empty lines
* Better templates
* Deal with triple quotes in docstrings
* Result of style_doc
* Enable mdx treatment and fix code examples in MDXs
* Result of doc styler on doc source files
* Last fixes
* Break copy from
* Add ElectraForCausalLM and cover some basic tests & need to fix a few tests
* Fix bugs
* make style
* make fix-copies
* Update doc
* Change docstring to markdown format
* Remove redundant update_keys_to_ignore
* Pipeline chunks.
* Batching for Chunking pipelines ?
* Batching for `question-answering` and `zero-shot-cls`.
* Fixing for FNet.
* Making ASR a chunk pipeline.
* Chunking ASR API.
* doc style.
* Fixing ASR test.
* Fixing QA error (p_mask, padding is 1, not 0).
* Enable both vad and simple chunking.
* Max length for vad.
* remove inference mode, crashing on s2t.
* Revert ChunkPipeline for ASRpipeline.
Too many knobs for simple integration within the pipeline, better stick
to external convenience functions instead, more control to be had,
simpler pipeline and also easier to replace with other things later.
* Drop necessity for PT for these.
* Enabling generators.
* Add mic + cleanup.
* Typo.
* Typo2.
* Remove ASR work, it does not belong in this PR anymore.
* Update src/transformers/pipelines/pt_utils.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update src/transformers/pipelines/zero_shot_classification.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Adding many comments.
* Doc quality.
* `hidden_states` handling.
* Adding doc.
* Bad rebase.
* Autofixing docs.
* Fixing CRITICAL bug in the new Zerocls pipeline.
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* First commit to add MarianMT to ONNX
* Now MarianModel.forward() automatically generates decoder_input_ids, like BartModel.forward()
* Adjusted MarianOnnxConfig.inputs and outputs to work with seq2seq-lm feature
* Style fix
* Added support for other features for already supported models
* Partial support for causal and seq2seq models
* Partial support for causal and seq2seq models
* Add default task for MarianMT ONNX
* Remove automatic creation of decoder_input_ids
* Extend inputs and outputs for MarianMT ONNX config
* Add MarianMT to ONNX unit tests
* Refactor
* OnnxSeq2SeqConfigWithPast to support seq2seq models
* Parameterized the onnx tests
* Restored run_mlm.py
* Restored run_mlm.py
* [WIP] BART update
* BART and MBART
* Add past_key_values and fix dummy decoder inputs
Using a sequence length of 1 in generate_dummy_outputs() produces large discrepancies, presumably due to some hidden optimisations.
* Refactor MarianOnnxConfig to remove custom past_key_values logic
* Fix quality
* Revert "Revert "Added support for other features for already supported models (#14358)" (#14679)"
This reverts commit 0f4e39c559.
* is_torch_available test to avoid failing imports
* sorting parameterize parameters to solve ERROR gw0 gw1
* tests fix
* tests fix
* GPT2 with past fix
* Fixed stateful class attribute change that was breaking things when converting multiple models sequentially
* Removed onnx file
* Refactor Marian export to account for base changes
* Fix copies
* Implemented suggestions
* Extend support for causal LM
* Revert "Revert "Added support for other features for already supported models (#14358)" (#14679)"
This reverts commit 0f4e39c559.
* is_torch_available test to avoid failing imports
* sorting parameterize parameters to solve ERROR gw0 gw1
* tests fix
* tests fix
* GPT2 with past fix
* Fixed stateful class attribute change that was breaking things when converting multiple models sequentially
* Removed onnx file
* Implemented suggestions
* Fixed __init__ to resolve conflict with master
* Revert "Revert "Added support for other features for already supported models (#14358)" (#14679)"
This reverts commit 0f4e39c559.
* is_torch_available test to avoid failing imports
* sorting parameterize parameters to solve ERROR gw0 gw1
* tests fix
* tests fix
* GPT2 with past fix
* Fixed stateful class attribute change that was breaking things when converting multiple models sequentially
* Removed onnx file
* Implemented suggestions
* Fixed __init__ to resolve conflict with master
* Remove commented import
* Remove ONNX model
* Remove redundant class method
* Tidy up imports
* Fix quality
* Refactor dummy input function
* Add copied from statements to Marian config functions
* Remove false copied from comments
* Fix copy from comment
Co-authored-by: Massimiliano Bruni <massimiliano.bruni@hcl.com>
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
* PoC for conserving old links
* Do the same for other links
* remap the redirects section
* add instructions on how to move sections
* improve
Co-authored-by: Stas Bekman <stas@stason.org>
* Test workflow
* Build doc
* Make a clean build
* Add doc config
* Restore other workflows
* Final job
* Print something in else statements
* Pull before making changes
* Convert a few docs
* And another
* Last tutorials
* New syntax for colab links
* Convert a few docs
* And another
* Last tutorials
* New syntax for colab links
* First draft
* Style and remove mlm
* Make forward pass work
* More improvements
* More improvements
* Fix bug
* More improvements
* More improvements
* Add PerceiverTokenizer first draft
* Improve conversion script
* More improvements
* Make conversion script work for the encoder
* Make conversion script work with local pickle files
* Style & quality, fix-copies
* Add dummy input to conversion script
* Add absolute position embeddings to TextPreProcessor
* Make forward pass of encoder work
* More improvements
* Move text preprocessor to separate script
* More improvements
* More improvements
* Add post processor
* Make MLM model work
* Style
* Add PerceiverForMaskedLM
* Add PerceiverImagePreprocessor
* Make style
* Make PerceiverForImageClassification work
* More improvements
* More improvements
* Use tokenizer in conversion script
* Use PerceiverForMaskedLM in conversion script
* Define custom PerceiverModelOutput
* Improve PerceiverAttention to make it work for both MLM and image classification
* More improvements
* More improvements
* More improvements to the conversion script
* Make conversion script work for both MLM and image classification
* Add PerceiverFeatureExtractor
* More improvements
* Style and quality
* Add center cropping
* Fix bug
* Small fix
* Add print statement
* Fix bug in image preprocessor
* Fix bug with conversion script
* Make output position embeddings an nn.Parameter layer instead of nn.Embedding
* Comment out print statements
* Add position encoding classes
* More improvements
* Use position_encoding_kwargs
* Add PerceiverForImageClassificationFourier
* Make style & quality
* Add PerceiverForImageClassificationConvProcessing
* Style & quality
* Add flow model
* Move processors to modeling file
* Make position encodings modular
* Make basic decoder use modular position encodings
* Add PerceiverForOpticalFlow to conversion script
* Add AudioPreprocessor
* Make it possible for the basic decoder to use Fourier position embeddings
* Add PerceiverForMultimodalAutoencoding
* Improve model for optical flow
* Improve _build_network_inputs method
* Add print statement
* Fix device issue
* Fix device of Fourier embeddings
* Add print statements for debugging
* Add another print statement
* Add another print statement
* Add another print statement
* Add another print statement
* Improve PerceiverAudioPreprocessor
* Improve conversion script for multimodal model
* More improvements
* More improvements
* Improve multimodal model
* Make forward pass multimodal model work
* More improvements
* Improve tests
* Fix some more tests
* Add output dataclasses
* Make more tests pass
* Add print statements for debugging
* Add tests for image classification
* Add PerceiverClassifierOutput
* More improvements
* Make more tests pass for the optical flow model
* Make style & quality
* Small improvements
* Don't support training for optical flow model for now
* Fix _prepare_for_class for tests
* Make more tests pass, add some docs
* Add multimodal model to tests
* Minor fixes
* Fix tests
* Improve conversion script
* Make fixup
* Remove pos_dim argument
* Fix device issue
* Potential fix for OOM
* Revert previous commit
* Fix test_initialization
* Add print statements for debugging
* Fix print statement
* Add print statement
* Add print statement
* Add print statement
* Add print statement
* Add print statement
* Add print statement
* Remove need for output_shape
* Comment out output_shape
* Remove unnecessary code
* Improve docs
* Fix make fixup
* Remove PerceiverTextProcessor from init
* Improve docs
* Small improvement
* Apply first batch of suggestions from code review
* Apply more suggestions from code review
* Update docstrings
* Define dicts beforehand for readability
* Rename task to architecture in conversion script, include PerceiverModel in tests
* Add print statements for debugging
* Fix tests on GPU
* Remove preprocessors, postprocessors and decoders from main init
* Add integration test
* Fix docs
* Replace einops by torch
* Update for new docs frontend
* Rename PerceiverForImageClassification
* Improve docs
* Improve docs
* Improve docs of PerceiverModel
* Fix some more tests
* Improve center_crop
* Add PerceiverForSequenceClassification
* Small improvements
* Fix tests
* Add integration test for optical flow model
* Clean up
* Add tests for tokenizer
* Fix tokenizer by adding special tokens properly
* Fix CI
* up
* up
* up
* make it cleaner
* correct
* make styhahalal
* add more tests
* finish
* small fix
* make style
* up
* tryout to solve circle ci
* up
* fix more tests
* fix more tests
* apply sylvains suggestions
* fix import
* correct docs
* add pyctcdecode only to speech tests
* fix more tests
* add tf, flax and pt tests
* add pt
* fix last tests
* fix more tests
* Apply suggestions from code review
* change lines
* Apply suggestions from code review
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
* correct tests
* correct tests
* add doc string
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
* implement MLukeTokenizer and LukeForMaskedLM
* update tests
* update docs
* add LukeForMaskedLM to check_repo.py
* update README
* fix test and specify the entity pad id in tokenization_(m)luke
* fix EntityPredictionHeadTransform
* Make DefaultDataCollator importable from root
* Add documentation for DefaultDataCollator and add return_tensors argument to all class docstrings
* make style
* Add DefaultDataCollator to data_collator.rst
* Add DefaultDataCollator to data_collator.rst
* Init Flax implementation for Blenderbot
* Add a majority of stuff except for tests
* make style quality
* Add tests and fix some bugs
* Add tests
* Clean source code and fix some bugs
* Fix copies and docs
* Fix jax device condition for tests
* Fix layer norm in the encoder
* Fix a few typos in the test file
* make fix-copies
* make fix-copies
* fix layer norm
* Fix Flax params dtype (#13090)
* Fix PR reference (#13098)
* make fix-copies
* Update tests/test_modeling_flax_blenderbot.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* TF Tapas first commit
* updated docs
* updated logger message
* updated pytorch weight conversion script to support scalar array
* added use_cache to tapas model config to work properly with tf input_processing
* 1. rm embeddings_sum
2. added # Copied
3. + TFTapasMLMHead
4. and a lot of other small fixes
* updated docs
* + test for tapas
* updated testing_utils to check is_tensorflow_probability_available
* converted model logits post processing using numpy to work with both PT and TF models
* + TFAutoModelForTableQuestionAnswering
* added TF support
* added test for TFAutoModelForTableQuestionAnswering
* added test for TFAutoModelForTableQuestionAnswering pipeline
* updated auto model docs
* fixed typo in import
* added tensorflow_probability to run tests
* updated MLM head
* updated tapas.rst with TF model docs
* fixed optimizer import in docs
* updated convert to np: data from pt model is not `transformers.tokenization_utils_base.BatchEncoding` after pipeline upgrade
* updated pipeline:
1. with torch.no_grad removed, pipeline forward handles
2. token_type_ids converted to numpy
* updated docs.
* removed `use_cache` from config
* removed floats_tensor
* updated code comment
* updated Copyright Year and logits_aggregation Optional
* updated docs and comments
* updated docstring
* fixed model weight loading
* make fixup
* fix indentation
* added tf slow pipeline test
* pip upgrade
* upgrade python to 3.7
* removed from_pt from tests
* revert commit f18cfa9
* added save_directories for _psave_pretrained_pt and _tf, changed model to tf_model and pt_model, enable the notebook to run cleanly from top to bottom without error
* Update quicktour.rst
* added >>>
* dependencies
* added space
* [deepspeed] zero inference
* only z3 makes sense for inference
* fix and style
* docs
* rework
* fix test
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* responding to suggestions
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Start the work for TFViTModel
* Convert to TF code - need to check in the follow up commits
* Clean up model code
* Expose TFViTModel
* make style
* make quality
* Add test
* make style & quality
* Fix some imports
* fix wrong usage - *kwargs => **kwargs
* Fix Conv2D weight loading (PT->TF) issue
* Add tests for images with different sizes + fix model
* Fix some common tests for TFViTModel
* Use inputs instead of input_ids in test_compile_tf_model
* Add a comment about transpose and Conv2D in convert_tf_weight_name_to_pt_weight_name
* Avoid transpose in TFViT call
* Fix Conv2D issue in load_tf2_weights_in_pytorch_model
* Use tf.keras.layers.Conv2D instead of tf.nn.conv2d
* Using simpler heuristic to detect Conv2D layer
* Change convert_tf_weight_name_to_pt_weight_name to return TransposeType
* Check tf_weight_shape is not None before using it
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix missing comma
* fix input dtype
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Start PR doc
* Cleanup the quality checks and document them
* Add reference in the contributing guide
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Rename file as per review suggestion
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* add Beit model output class
* inheriting from BaseModelOutputWithPooling
* updated docs if use_mean_pooling is False
* added beit specific outputs in model docs
* changed the import path
* Fix docs
Co-authored-by: Niels Rogge <niels.rogge1@gmail.com>
* Add first draft
* Make forward pass work
* Improve conversion script
* Add notebook that checks if it works
* Add BeitForSemanticSegmentation to the tests
* More improvements
* Make BeitForSemanticSegmentation consistent with Segformer
* Small bug fix
* Add BeitForSemanticSegmentation to docs
* Make sure model doesn't output hidden states when the user doesn't want to
* Make it possible to convert the large model
* Fix issue
* Fix conversion script for large model
* Add auxiliary_head option to semantic segmentation model
* Apply suggestions from @sgugger's review
* Apply suggestions from code review
* Fix failing test
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* Add support for the fast (Rust) implementation of BlenderbotTokenizer
* Fix a converter and a typo in a doc
* Apply patil-suraj's suggestion
* (Nitpick) Fast tokenization -> Fast Tokenization in doc
* Apply SaulLu's suggestion
* Apply Narsil's suggestion to fix test pipelines
* Add encoder_no_repeat_ngram_size according to Narsil's suggestion
* Revert the last (unnecessary) commit
* Override pipeline config for Blenderbot to allow for larger pos. emb.
* make fix-copies
* First draft
* Make style & quality
* Improve conversion script
* Add print statement to see actual slice
* Make absolute tolerance smaller
* Fix image classification models
* Add post_process_semantic method
* Disable padding
* Improve conversion script
* Rename to ForSemanticSegmentation, add integration test, remove post_process methods
* Improve docs
* Fix code quality
* Fix feature extractor tests
* Fix tests for image classification model
* Delete file
* Add is_torch_available to feature extractor
* Improve documentation of feature extractor methods
* Apply suggestions from @sgugger's code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply some more suggestions of code review
* Rebase with master
* Fix rebase issues
* Make sure model only outputs hidden states when the user wants to
* Apply suggestions from code review
* Add pad method
* Support padding of 2d images
* Add print statement
* Add print statement
* Move padding method to SegformerFeatureExtractor
* Fix issue
* Add casting of segmentation maps
* Add test for padding
* Add small note about padding
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>