* update models
* why rename
* return attn weights when sdpa
* fixes
* fix attn implementation composite
* fix moshi
* add message
* add typings
* use explicitly all flags for each attn type
* fix some tests
* import what is needed
* kosmos on main has ew attention already, yay
* new models in main, run fixup
* won't fix kosmos yet
* fix-copies
* clean up after rebasing
* fix tests
* style
* dont cast attns to fp32
* did we update ruff? oke, let's just do what it asks
* fix pixtral after rebase
* tmp commit
* move tests to the right class
* remove ALL all_generative_model_classes = ...
* skip tf roberta
* skip InstructBlipForConditionalGenerationDecoderOnlyTest
* videollava
* reduce diff
* reduce diff
* remove on vlms
* fix a few more
* manual rebase bits
* more manual rebase
* remove all manual generative model class test entries
* fix up to ernie
* a few more removals
* handle remaining cases
* recurrent gemma
* it's better here
* make fixup
* tf idefics is broken
* tf bert + generate is broken
* don't touch tf :()
* don't touch tf :(
* make fixup
* better comments for test skips
* revert tf changes
* remove empty line removal
* one more
* missing one
* add sdpa to OPT
* chore: remove redundant whitespace in OPTDecoder class
* fixup
* bug fix
* add sdpa and attention generate test
* fixup
* Refactor OPTAttention forward method for improved readability and maintainability
* undo refactor for _shape and key,val states
* add OPT to doc, fixup didn't find it for some reason
* change order
* change default attn_implemntation in testing to eager
* [run-slow] opt
* change test_eager_matches_sdpa_generate to the one llama
* Update default attention implementation in testing common
* [run-slow] opt
* remove uneeded print
* [run-slow] opt
* refactor model testers to have attn_implementation="eager"
* [run-slow] opt
* convert test_eager_matches_sdpa_generate to opt-350M
* bug fix when creating mask for opt
* [run-slow] opt
* if layer head mask default to eager
* if head mask is not none fall to eager
* [run-slow] opt
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Clean up Unpack imports (#33631)
clean up Unpack imports
* Fix DPT /Dinov2 sdpa regression on main (#33660)
* fallback to eager if output attentions.
* fix copies
* handle dependency errors in check_imports (#33622)
* handle dependency errors in check_imports
* change log level to warning
* add back self.max_position_embeddings = config.max_position_embeddings (#33550)
* add back self.max_position_embeddings = config.max_position_embeddings
* fix-copies
* Fix Llava conversion for LlavaQwen2ForCausalLM with Clip vision tower (#33613)
fix llavaqwen2 model conversion
* Uniformize kwargs for Udop processor and update docs (#33628)
* Add optional kwargs and uniformize udop
* cleanup Unpack
* nit Udop
* Generation: deprecate `PreTrainedModel` inheriting from `GenerationMixin` (#33203)
* Enable BNB multi-backend support (#31098)
* enable cpu bnb path
* fix style
* fix code style
* fix 4 bit path
* Update src/transformers/utils/import_utils.py
Co-authored-by: Aarni Koskela <akx@iki.fi>
* add multi backend refactor tests
* fix style
* tweak 4bit quantizer + fix corresponding tests
* tweak 8bit quantizer + *try* fixing corresponding tests
* fix dequant bnb 8bit
* account for Intel CPU in variability of expected outputs
* enable cpu and xpu device map
* further tweaks to account for Intel CPU
* fix autocast to work with both cpu + cuda
* fix comments
* fix comments
* switch to testing_utils.torch_device
* allow for xpu in multi-gpu tests
* fix tests 4bit for CPU NF4
* fix bug with is_torch_xpu_available needing to be called as func
* avoid issue where test reports attr err due to other failure
* fix formatting
* fix typo from resolving of merge conflict
* polish based on last PR review
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* fix CI
* Update src/transformers/integrations/integration_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/integrations/integration_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix error log
* fix error msg
* add \n in error log
* make quality
* rm bnb cuda restriction in doc
* cpu model don't need dispatch
* fix doc
* fix style
* check cuda avaliable in testing
* fix tests
* Update docs/source/en/model_doc/chameleon.md
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update docs/source/en/model_doc/llava_next.md
Co-authored-by: Aarni Koskela <akx@iki.fi>
* Update tests/quantization/bnb/test_4bit.py
Co-authored-by: Aarni Koskela <akx@iki.fi>
* Update tests/quantization/bnb/test_4bit.py
Co-authored-by: Aarni Koskela <akx@iki.fi>
* fix doc
* fix check multibackends
* fix import sort
* remove check torch in bnb
* docs: update bitsandbytes references with multi-backend info
* docs: fix small mistakes in bnb paragraph
* run formatting
* reveret bnb check
* move bnb multi-backend check to import_utils
* Update src/transformers/utils/import_utils.py
Co-authored-by: Aarni Koskela <akx@iki.fi>
* fix bnb check
* minor fix for bnb
* check lib first
* fix code style
* Revert "run formatting"
This reverts commit ac108c6d6b.
* fix format
* give warning when bnb version is low and no cuda found]
* fix device assignment check to be multi-device capable
* address akx feedback on get_avlbl_dev fn
* revert partially, as we don't want the function that public, as docs would be too much (enforced)
---------
Co-authored-by: Aarni Koskela <akx@iki.fi>
Co-authored-by: Titus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Fix error string after refactoring into get_chat_template (#33652)
* Fix error string after refactoring into get_chat_template
* Take suggestion from CR
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
---------
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* uniformize git processor (#33668)
* uniformize git processor
* update doctring
* Modular `transformers`: modularity and inheritance for new model additions (#33248)
* update exampel
* update
* push the converted diff files for testing and ci
* correct one example
* fix class attributes and docstring
* nits
* oups
* fixed config!
* update
* nitd
* class attributes are not matched against the other, this is missing
* fixed overwriting self.xxx now onto the attributes I think
* partial fix, now order with docstring
* fix docstring order?
* more fixes
* update
* fix missing docstrings!
* examples don't all work yet
* fixup
* nit
* updated
* hick
* update
* delete
* update
* update
* update
* fix
* all default
* no local import
* fix more diff
* some fix related to "safe imports"
* push fixed
* add helper!
* style
* add a check
* all by default
* add the
* update
* FINALLY!
* nit
* fix config dependencies
* man that is it
* fix fix
* update diffs
* fix the last issue
* re-default to all
* alll the fixes
* nice
* fix properties vs setter
* fixup
* updates
* update dependencies
* make sure to install what needs to be installed
* fixup
* quick fix for now
* fix!
* fixup
* update
* update
* updates
* whitespaces
* nit
* fix
* simplify everything, and make it file agnostic (should work for image processors)
* style
* finish fixing all import issues
* fixup
* empty modeling should not be written!
* Add logic to find who depends on what
* update
* cleanup
* update
* update gemma to support positions
* some small nits
* this is the correct docstring for gemma2
* fix merging of docstrings
* update
* fixup
* update
* take doc into account
* styling
* update
* fix hidden activation
* more fixes
* final fixes!
* fixup
* fixup instruct blip video
* update
* fix bugs
* align gemma2 with the rest as well
* updats
* revert
* update
* more reversiom
* grind
* more
* arf
* update
* order will matter
* finish del stuff
* update
* rename to modular
* fixup
* nits
* update makefile
* fixup
* update order of the checks!
* fix
* fix docstring that has a call inside
* fiix conversion check
* style
* add some initial documentation
* update
* update doc
* some fixup
* updates
* yups
* Mostly todo gimme a minut
* update
* fixup
* revert some stuff
* Review docs for the modular transformers (#33472)
Docs
* good update
* fixup
* mmm current updates lead to this code
* okay, this fixes it
* cool
* fixes
* update
* nit
* updates
* nits
* fix doc
* update
* revert bad changes
* update
* updates
* proper update
* update
* update?
* up
* update
* cool
* nits
* nits
* bon bon
* fix
* ?
* minimise changes
* update
* update
* update
* updates?
* fixed gemma2
* kind of a hack
* nits
* update
* remove `diffs` in favor of `modular`
* fix make fix copies
---------
Co-authored-by: Lysandre Debut <hi@lysand.re>
* Fix CIs post merging modular transformers (#33681)
update
* Fixed docstring for cohere model regarding unavailability of prune_he… (#33253)
* Fixed docstring for cohere model regarding unavailability of prune_head() methods
The docstring mentions that cohere model supports prune_heads() methods. I have fixed the docstring by explicitly mentioning that it doesn't support that functionality.
* Update src/transformers/models/cohere/modeling_cohere.py
---------
Co-authored-by: Lysandre Debut <hi@lysand.re>
* Generation tests: update imagegpt input name, remove unused functions (#33663)
* Improve Error Messaging for Flash Attention 2 on CPU (#33655)
Update flash-attn error message on CPU
Rebased to latest branch
* Gemma2: fix config initialization (`cache_implementation`) (#33684)
* Fix ByteLevel alphabet missing when Sequence pretokenizer is used (#33556)
* Fix ByteLevel alphabet missing when Sequence pretokenizer is used
* Fixed formatting with `ruff`.
* Uniformize kwargs for image-text-to-text processors (#32544)
* uniformize FUYU processor kwargs
* Uniformize instructblip processor kwargs
* Fix processor kwargs and tests Fuyu, InstructBlip, Kosmos2
* Uniformize llava_next processor
* Fix save_load test for processor with chat_template only as extra init args
* Fix import Unpack
* Fix Fuyu Processor import
* Fix FuyuProcessor import
* Fix FuyuProcessor
* Add defaults for specific kwargs kosmos2
* Fix Udop to return BatchFeature instead of BatchEncoding and uniformize kwargs
* Add tests processor Udop
* remove Copied from in processing Udop as change of input orders caused by BatchEncoding -> BatchFeature
* Fix overwrite tests kwargs processors
* Add warnings and BC for changes in processor inputs order, change docs, add BC for text_pair as arg for Udop
* Fix processing test fuyu
* remove unnecessary pad_token check in instructblip ProcessorTest
* Fix BC tests and cleanup
* FIx imports fuyu
* Uniformize Pix2Struct
* Fix wrong name for FuyuProcessorKwargs
* Fix slow tests reversed inputs align fuyu llava-next, change udop warning
* Fix wrong logging import udop
* Add check images text input order
* Fix copies
* change text pair handling when positional arg
* rebase on main, fix imports in test_processing_common
* remove optional args and udop uniformization from this PR
* fix failing tests
* remove unnecessary test, fix processing utils and test processing common
* cleanup Unpack
* cleanup
* fix conflict grounding dino
* 🚨🚨 Setting default behavior of assisted decoding (#33657)
* tests: fix pytorch tensor placement errors (#33485)
This commit fixes the following errors:
* Fix "expected all tensors to be on the same device" error
* Fix "can't convert device type tensor to numpy"
According to pytorch documentation torch.Tensor.numpy(force=False)
performs conversion only if tensor is on CPU (plus few other restrictions)
which is not the case. For our case we need force=True since we just
need a data and don't care about tensors coherency.
Fixes: #33517
See: https://pytorch.org/docs/2.4/generated/torch.Tensor.numpy.html
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
* bump tokenizers, fix added tokens fast (#32535)
* update based on tokenizers release
* update
* nits
* update
* revert re addition
* don't break that yet
* fmt
* revert unwanted
* update tokenizers version
* update dep table
* update
* update in conversion script as well
* some fix
* revert
* fully revert
* fix training
* remove set trace
* fixup
* update
* update
* [Pixtral] Improve docs, rename model (#33491)
* Improve docs, rename model
* Fix style
* Update repo id
* fix code quality after merge
* HFQuantizer implementation for compressed-tensors library (#31704)
* Add compressed-tensors HFQuantizer implementation
* flag serializable as False
* run
* revive lines deleted by ruff
* fixes to load+save from sparseml, edit config to quantization_config, and load back
* address satrat comment
* compressed_tensors to compressed-tensors and revert back is_serializable
* rename quant_method from sparseml to compressed-tensors
* tests
* edit tests
* clean up tests
* make style
* cleanup
* cleanup
* add test skip for when compressed tensors is not installed
* remove pydantic import + style
* delay torch import in test
* initial docs
* update main init for compressed tensors config
* make fix-copies
* docstring
* remove fill_docstring
* Apply suggestions from code review
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* review comments
* review comments
* comments - suppress warnings on state dict load, tests, fixes
* bug-fix - remove unnecessary call to apply quant lifecycle
* run_compressed compatability
* revert changes not needed for compression
* no longer need unexpected keys fn
* unexpected keys not needed either
* Apply suggestions from code review
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* add to_diff_dict
* update docs and expand testing
* Update _toctree.yml with compressed-tensors
* Update src/transformers/utils/quantization_config.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* update doc
* add note about saving a loaded model
---------
Co-authored-by: George Ohashi <george@neuralmagic.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Sara Adkins <sara@neuralmagic.com>
Co-authored-by: Sara Adkins <sara.adkins65@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Dipika Sikka <ds3822@columbia.edu>
Co-authored-by: Dipika <dipikasikka1@gmail.com>
* update model card for opt
* add batch size to inference table
* [slow-run] opt
* [run-slow] opt
---------
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Co-authored-by: Avishai Elmakies <avishai.elma@cs.huji.ac.il>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: chengchengpei <5881383+chengchengpei@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Aarni Koskela <akx@iki.fi>
Co-authored-by: Titus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Tibor Reiss <75096465+tibor-reiss@users.noreply.github.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
Co-authored-by: Muhammad Naufil <m.naufil1@gmail.com>
Co-authored-by: sizhky <yyeshr@gmail.com>
Co-authored-by: Umar Butler <umar@umar.au>
Co-authored-by: Jonathan Mamou <jonathan.mamou@intel.com>
Co-authored-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Benjamin Fineran <bfineran@users.noreply.github.com>
Co-authored-by: George Ohashi <george@neuralmagic.com>
Co-authored-by: Sara Adkins <sara@neuralmagic.com>
Co-authored-by: Sara Adkins <sara.adkins65@gmail.com>
Co-authored-by: Dipika Sikka <ds3822@columbia.edu>
Co-authored-by: Dipika <dipikasikka1@gmail.com>
* Add a convenience method for building in your own name scope
* Second attempt at auto layer building
* Revert "Second attempt at auto layer building"
This reverts commit e03a3aaecf9ec41a805582b83cbdfe3290a631be.
* Attempt #3
* Revert "Attempt #3"
This reverts commit b9df7a0857560d29b5abbed6127d9e9eca77cf47.
* Add missing attributes that we're going to need later
* Add some attributes we're going to need later
* A fourth attempt! Feel the power flow through you!
* Revert "A fourth attempt! Feel the power flow through you!"
This reverts commit 6bf4aaf3875d6f28485f50187617a4c616c8aff7.
* Add more values we'll need later
* TF refactor that we'll need later
* Revert "TF refactor that we'll need later"
This reverts commit ca07202fb5b7b7436b893baa8d688b4f348ea7b9.
* Revert "Revert "TF refactor that we'll need later""
This reverts commit 1beb0f39f293ed9c27594575e1c849aadeb15c13.
* make fixup
* Attempt five!
* Revert "Attempt five!"
This reverts commit 3302207958dfd0374b0447a51c06eea51a506044.
* Attempt six - this time don't add empty methods
* Revert "Attempt six - this time don't add empty methods"
This reverts commit 67d60129be75416b6beb8f47c7d38d77b18d79bb.
* Attempt seven - better base model class detection!
* Revert "Attempt seven - better base model class detection!"
This reverts commit 5f14845e92ea0e87c598da933bfbfee10f553bc9.
* Another attribute we'll need later
* Try again with the missing attribute!
* Revert "Try again with the missing attribute!"
This reverts commit 760c6f30c5dffb3e04b0e73c34a77d1882a0fef7.
* This is the attempt that will pierce the heavens!
* Revert "This is the attempt that will pierce the heavens!"
This reverts commit c868bb657de057aca7a5260350a3f831fc4dfee6.
* Attempt seven - snag list is steadily decreasing
* Revert "Attempt seven - snag list is steadily decreasing"
This reverts commit 46fbd975deda64429bfb3e5fac4fc0370c00d316.
* Attempt eight - will an empty snag list do it?
* Revert "Attempt eight - will an empty snag list do it?"
This reverts commit 7c8a3c2b083253649569e9877e02054ae5cec67b.
* Fixes to Hubert issues that cause problems later
* Trying again with Conv1D/SeparableConv fixes
* Revert "Trying again with Conv1D/SeparableConv fixes"
This reverts commit 55092bca952bc0f750aa1ffe246a640bf1e2036e.
* Apply the build shape fixes to Wav2Vec2 as well
* One more attempt!
* Revert "One more attempt!"
This reverts commit 5ac3e4cb01b9458cc93312873725f9444ae7261c.
* Another attempt!
* Revert "Another attempt!"
This reverts commit ea16d890e019d7de8792a3b8e72f3b1c02adae50.
* Let's see how many failures we get without the internal build method
* Fix OpenAI
* Fix MobileBERT
* (Mostly) fix GroupVIT
* Fix BLIP
* One more BLIP fix
* One more BLIP fix!
* Fix Regnet
* Finally fully fix GroupViT
* Fix Data2Vec and add the new AdaptivePool
* Fix Segformer
* Fix Albert
* Fix Deberta/DebertaV2
* Fix XLM
* Actually fix XLM
* Fix Flaubert
* Fix lxmert
* Fix Resnet
* Fix ConvBERT
* Fix ESM
* Fix Convnext / ConvnextV2
* Fix SAM
* Fix Efficientformer
* Fix LayoutLMv3
* Fix speech_to_text
* Fix mpnet and mobilevit
* Fix Swin
* Fix CTRL
* Fix CVT
* Fix DPR
* Fix Wav2Vec2
* Fix T5
* Fix Hubert
* Fix GPT2
* Fix Whisper
* Fix DeiT
* Fix the encoder-decoder / dual-encoder classes
* make fix-copies
* build in name scope
* Fix summarization test
* Fix tied weight names for BART + Blenderbot
* Fix tied weight name building
* Fix to TFESM weight building
* Update TF SAM
* Expand all the shapes out into Big Boy Shapes
* adds agnostic decorators and availability fns
* renaming decorators and fixing imports
* updating some representative example tests
bloom, opt, and reformer for now
* wip device agnostic functions
* lru cache to device checking functions
* adds `TRANSFORMERS_TEST_DEVICE_SPEC`
if present, imports the target file and updates device to function
mappings
* comments `TRANSFORMERS_TEST_DEVICE_SPEC` code
* extra checks on device name
* `make style; make quality`
* updates default functions for agnostic calls
* applies suggestions from review
* adds `is_torch_available` guard
* Add spec file to docs, rename function dispatch names to backend_*
* add backend import to docs example for spec file
* change instances of to
* Move register backend to before device check as per @statelesshz changes
* make style
* make opt test require fp16 to run
---------
Co-authored-by: arsalanu <arsalanu@graphcore.ai>
Co-authored-by: arsalanu <hzji210@gmail.com>
* Replaces calls to `.cuda` with `.to(torch_device)` in tests
`torch.Tensor.cuda()` is a pre-0.4 solution to changing a tensor's device. It is recommended to prefer `.to(...)` for greater flexibility and error handling. Furthermore, this makes it more consistent with other tests (that tend to use `.to(torch_device)`) and ensures the correct device backend is used (if `torch_device` is neither `cpu` or `cuda`).
* addressing review comments
* more formatting changes in Bloom test
* `make style`
* Update tests/models/bloom/test_modeling_bloom.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fixes style failures
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Fix one BLIP arg not being optional, remove misspelled arg
* Remove the lxmert test overrides and just use the base test_saved_model_creation
* saved_model_creation fixes and re-enabling tests across the board
* Remove unnecessary skip
* Stop caching sinusoidal embeddings in speech_to_text
* Fix transfo_xl compilation
* Fix transfo_xl compilation
* Fix the conditionals in xglm
* Set the save spec only when building
* Clarify comment
* Move comment correctly
* Correct embeddings generation for speech2text
* Mark RAG generation tests as @slow
* Remove redundant else:
* Add comment to clarify the save_spec line in build()
* Fix size tests for XGLM at last!
* make fixup
* Remove one band_part operation
* Mark test_keras_fit as @slow
* A fun new PR where I break the entire codebase again
* A fun new PR where I break the entire codebase again
* Handle cross-attention
* Move calls to model(model.dummy_inputs) to the new build() method
* Seeing what fails with the build context thing
* make fix-copies
* Let's see what fails with new build methods
* Fix the pytorch crossload build calls
* Fix the overridden build methods in vision_text_dual_encoder
* Make sure all our build methods set self.built or call super().build(), which also sets it
* make fix-copies
* Remove finished TODO
* Tentatively remove unneeded (?) line
* Transpose b in deberta correctly and remove unused threading local
* Get rid of build_with_dummies and all it stands for
* Rollback some changes to TF-PT crossloading
* Correctly call super().build()
* Rework TF type hints to use | None instead of Optional[] for tf.Tensor
* Rework TF type hints to use | None instead of Optional[] for tf.Tensor
* Don't forget the imports
* Add the imports to tests too
* make fixup
* Refactor tests that depended on get_type_hints
* Better test refactor
* Fix an old hidden bug in the test_keras_fit input creation code
* Fix for the Deit tests
* Result of black 23.1
* Update target to Python 3.7
* Switch flake8 to ruff
* Configure isort
* Configure isort
* Apply isort with line limit
* Put the right black version
* adapt black in check copies
* Fix copies
* move generation_*.py src files into generation/*.py
* populate generation.__init__ with lazy loading
* move imports and references from generation.xxx.object to generation.object
* Add `OPTForQuestionAnswering`
- added `OPTForQuestionAnswering` class based on `BloomForQuestionAnswering`
- added `OPTForQuestionAnswering` in common tests
- all common tests pass
- make fixup done
* added docstrings for OPTForQuestionAnswering
* Fix docstrings for OPTForQuestionAnswering
* Add serving_output and serving methods to some vision models
* Add serving outputs for DeiT
* Don't convert hidden states - differing shapes
* Make saveable
* Fix up
* Make swin saveable
* Add in tests
* Fix funnel tests (can't convert to tensor)
* Fix numpy call
* Tidy up a bit
* Add in hidden states - resnet
* Remove numpy
* Fix failing tests - tensor shape and skipping tests
* Remove duplicated function
* PR comments - formatting and var names
* PR comments
Add suggestions made by Joao Gante:
* Use tf.shape instead of shape_list
* Use @tooslow decorator on tests
* Simplify some of the logic
* PR comments
Address Yih-Dar Sheih comments - making tensor names consistent and make types float
* Types consistent with docs; disable test on swin (slow)
* CI trigger
* Change input_features to float32
* Add serving_output for segformer
* Fixup
Co-authored-by: Amy Roberts <amyeroberts@users.noreply.github.com>
* Add final_layer_norm to OPT model
* Add JAX and TF version
* Fix Keras name
* Woops
* Allow for non breaking change
* Apply suggestions from code review
* add tests
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* initial commit
* add init file
* update globakl init
* update index and dummy objects
* style
* update modelling auto
* fix initi typo in src/transformers
* fix typo in modeling tf auto, opt was in wrong mapping name
* fixed a slow test : saved_model
* style
* fix positionnal embedding if no position id is provided
* update tf test
* update test flax requirements
* fixed serialization
* update
* update tf name to allow smooth convertion
* update flax tests
* style
* fix test typo
* fix tf typo test
* add xla for generate support in causal LM
* fixed bug
* cleaned tf tests
* style
* removed from PT for slow tests
* fix typp
* opt test as slow
* trying to fix GPT2 undefined
* correct documentation and add to test doc
* update tf doc
* fix doc
* fake commit
* Apply suggestions from code review
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* update test based on review
* merged main layer for functionning test
* fixup + quality
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* update long comment
* make fix copies
Co-authored-by: Arthur <arthur@huggingface.co>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Support for Bart and LayoutLM, and partial support for XLNet
* Support for mbart
* A lot of new models supported
* Support for other models
* LayoutLM fix
* Use strings instead of classes
* First version - OPT model
* Final changes
- putting use cache to False
* few changes
- remove commented block
* few changes
- remove unecessary files
* fix style issues
* few changes
- remove a test file
- added the logits test
* Update src/transformers/models/auto/tokenization_auto.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add gen tests
* few changes
- rm mask filling example on docstring
* few changes
- remove useless args
* some changes
- more tests should pass now
- needs to clean more
- documentation still needs to be done
* fix code quality
* major changes
- change attention architecture to BART-like
- modify some tests
- style fix
* rm useless classes
- remove opt for:
- QA
- cond generation
- seq classif
* Removed autodoc calls to non-existant classes
TOkenizers are not implemented
* Update src/transformers/__init__.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/__init__.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/auto/modeling_tf_auto.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Replaced OPTTokeniser with GPT2 tokenizer
* added GPT2Tokenizer.from_pretrained("patrickvonplaten/opt_gpt2_tokenizer")
* Removed OPTTokenizer
* make style
* Make style replaces
``` ...).unsqueeze(```
by
``` >>>).unsqueeze(```
* make repo consistency
* Removed PretrainedOPTModel
* fix opt.mdx removed other heads
* fix init, removed 3 heads
* removed heads
* finished cleaning head
* removed seauence classif and question answering
* removed unused imports
* removed useless dummy object for QA, SC and CG
* removed tests for removed useless dummy object for QA, SC and CG
* Removed head_mask using encoder layers which don't exist
* fixed test
* fix line
* added OPT to toctree
* Updated model path with pushed weigths
* fix model path
* fixed code quality
* fixed embeddings and generation tests
* update paths
* clean comments
* removed OPTClassificationHead for sentence classification
* renamed hidden layer
* renamed num layers to standard num_hidden_layers
* num_attention_heads fix
* changes for 125m
* add first version for 125m
* add first version - flax
* add new version
* causal LM output
* replace output type with BaseModelOutputWithPastAndCrossAttentions
* revert working config from 150m to 350m
* clean
* removed decoder input ids
* fixed embed dim
* more embed_dim issues
* make style + removed enc_dec test
* update falx model
* removed troublesome copy
* added is_encoder_decoder=False to config
* added set_input emb fuinction to model class
* requires torch on embed test
* use head mask instead of decoder head mask input param solves a test
* 8 test remaining, update
* Updated create_and_check_decoder_model_past_large_inputs
* Make style
* update op tokenizer with condition
* make style
* See if I can push
* some clean up
* remove linear head hack
* save intermediate
* save correct attention
* add copied from from bart
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix part of the reviewss
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* same changes in naming / conversion
* correct mask
* more fixes
* delete FlaxOPT and TfOPT
* clean traces of Flax and Tf
* fix mask
* fixed positionnal embedding length when past key value is provoded
* get 125m, 6.7b to work
* Added do_layer_norm
* solved mismatch in load dictionnary
* clean up preapre opt input dict
* fixed past key value as bool
* fix previus
* fixed return dict False tuple issue
* All tests are passing
* Make style
* Ignore OPTDecoder non tested
* make fix-copies
* make repo consistency
* small fix
* removed uselss @torch.no_grad decorator
* make styl;e
* fix previous opt test
* style
* make style
* added opt documentation
* update OPT_PRETRAINED_MODEL_ARCHIVE_LIST
* up
* more fixes
* model & config work
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* added comment on padding hack (+2)
* cleaup
* review update
* docstring for missing arg
* Update docs/source/en/model_doc/opt.mdx
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update docs/source/en/model_doc/opt.mdx
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update docs/source/en/model_doc/opt.mdx
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/opt/__init__.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* update pretrained map
* update path and tests
* make style
* styling
* make consistency
* add gpt2 tok new
* more tok fixes
* Update src/transformers/models/auto/tokenization_auto.py
* Update docs/source/en/model_doc/opt.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update docs/source/en/model_doc/opt.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update docs/source/en/model_doc/opt.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update tests/models/opt/test_modeling_opt.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update based on reviews
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* make style
* make tokenizer auto tests pass
* apply Lysandre suggestion
* finish tests
* add some good tokenizer tests
* improve docs slighly
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>