* Don't accidentally mutate the base_model_tp_plan
* Co-authored by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Trigger tests
* Marking grad accum test as slow
* Add a flaky decorator
* Add a flaky decorator
* Use cyril's codeblock
* Don't copy() when it's None
* Use cyril's new codeblock
* make fixup
* Fix converter
* [Broken] Adds Gemma 3 to Hugging Face Transformers
* Consolidating Config and Processor params across impls
* Sorting out configuration parameters. Adds qk_norm before RoPE. Still not sure if RoPE is right.
* Additional plumbing for CausalLM and ConditionalGeneration variants
* incomplete draft of Orbax conversion script
* More complete checkpoint conversion
* Supporting Gemma 3 1B checkpoints
* Updating RoPE for multiple frequencies
* Adjustments to rotary embedder
* Proof of life for text-only operation
* Updating the conversion script to handle multimodal projection weights
* Fixing text-only conversions
* Cleaner conversion script with multimodal support and a simpler processor
* Additional refactors to the Gemma3Processor
* Simplified Processor to work over text representations
* Updated conversion script to join text and vision embeddings at conversion time
* Logging for debugging
* Update src/transformers/models/gemma2/modeling_gemma2.py
Co-authored-by: Joshua Lochner <admin@xenova.com>
* Removed extraneous Config params
* Switching to fast tokenizer for checkpoint conversions
* isolating siglip for performance testing
* Minor changes for debugging tests against baselines
* Adding average pooling for soft tokens
* Updating processor code to enable simpler embedding interleaving for arbitrary number of images in prompts
* Updating conversion script for ShieldGemma 2 conversion compatibility
* Allow disable_compile to be provided as a kwarg
* Refresh from modular
* Updated conversion script and corrected sliding window
* Fix type mismatch in cache_position (#4)
* Fix dtype (#5)
* Fix type mismatch in cache_position
* Actually fix in the modular file
Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
---------
Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
* fixes for embedding table overflow and missing image_soft_token_mask from Gemma3Processor
* Adding 2D pooling for image embeddings
* Revert "Adding 2D pooling for image embeddings"
This reverts commit 65350cf531.
* Gemma3 average pooling changed from 1D to 2D
* Major refactor to Gemma3MultimodalInputProjection
* Updating Gemma 3 Auto* registrations
* Add option to save Gemma 3 chat template with tokenizer during weights conversion
* Removing unused imports
* Moving out-of-vocab handling from Gemma3Processor to Gemma3ForConditionalGeneration
* Removing duplicate config property
* Removing final logit softcapping and 1-indexing of position ids
* Fixing image processor config and none --> None typo
* Fixing sliding window size for 1B
* Updating image_mean and image_std in Image Processor
* Attention masking changed to lower triangular
* Moving image special tokens to conversion script
* Mirror image processor defaults from conversion script into Gemma3ProcessorKwargs
* Remove special token variables from symbol space
* Moving image soft token mask computation from Gemma3Processor to Gemma3ForConditionalGeneration
* tie lm_head and embedding weights
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
* Correct tied weights in Gemma3CausalLM
* iterative bidirectional attention
* resolving merge conflicts
* Reverting to Gemma 2 HybridCache with sliding window support and a sliding_window_pattern of 6
* Correcting RoPE scaling
* clean up first pass, dummy model generation works
* final clean up before fixing tests
* causal lm test works, so fine
* Fix conversion
* Update src/transformers/models/gemma3/processing_gemma3.py
* model tests are happy
* processor tests are happy
* image processing tests added
* fixup
* Fix pre-processing in conversion
* Inputs merging
* Do not normalize vision embeddings
* Apply Ryan's (and team) changes to attention
* token type ids + mask
* template
* move embed scale, add rope scale, fix tests
* Add chat template to tokenizer
* Use prefix for causal model loading
* use existing code for sliding mask from gemma2
* self.embed_tokens already normalizes
* Correcting Gemma3TextConfig parameters in conversion script
* typo, modular overwrites my fixes
* enable device map for text model
* Conversion updates
* ultra nit: no einsums
* update image token
* copy deepcopy config + some docs
* add some test, still WIP
* Refactoring --include_chat_template logic in converter
* Update src/transformers/models/gemma3/modular_gemma3.py
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* Add eos tokens for instruct models
* dump so i can work on dgx
* Removing add_bos by default
* dump
* add fast im proc
* docs for PaS + fixup
* another fixup
* one more fixup
* fix tests
* Inverting prior BOS change
* ultra nit
* Reverting to Tokenizer saved with add_bos_token=True and chat template starting with BOS
* resize embeds, remove sqrt, add slow test outputs
* FA2 but quality is meh
* nit
* skip FA2, no idea what happened
* last bit for green CI
* please, green CI for docs
* T_T
* Fix for Gemma3 logits
* Support both options for system prompt
* Update src/transformers/models/gemma3/image_processing_gemma3_fast.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update docs/source/en/model_doc/gemma3.md
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update docs/source/en/model_doc/gemma3.md
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update docs/source/en/model_doc/gemma3.md
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update docs/source/en/model_doc/gemma3.md
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update docs/source/en/model_doc/gemma3.md
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Docs updates now that assets are live
* Style fixes
---------
Co-authored-by: Joshua Lochner <admin@xenova.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
Co-authored-by: Mayank Chaturvedi <imayank@google.com>
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Co-authored-by: Lysandre <hi@lysand.re>
* initial commit
* small fix
* move stuff to image processing file
* remove stuff in validate turn and fix return tensor
* remove liquid stuff
* in the process of addressing comments
* changes to get the right tokenization
* new __init__ works
* fixing default std and mean
* works
* small testing script -- to be deleted before merge
* remove redundant code
* addressing comments
* fix inits, add docs templates
* refactor processor, switch to gotocr image processor
* remove image proc from init
* refactor to working llava-style architecture
* Change AyaVisionModel to AyaVisionForConditionalGeneration
* add tests
* fixups
* update doc
* Adding logits_to_keep explicitly in ayavision forward to enable compatibility with cohere model
* better variable names + remove code paths
* Updates to aya_vision.md
* address comments
* adding copied from
* make style and remove unused projector_hidden_act from config
* sort init
* include usage of fast image proc and proc on cuda in doc
* update checkpoint in test processor
* update checkpoint in test processor 2
* remove test_model and update docstring
* skip failing tests
---------
Co-authored-by: Saurabh Dash <saurabh@cohere.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
* tmp commit
* move tests to the right class
* remove ALL all_generative_model_classes = ...
* skip tf roberta
* skip InstructBlipForConditionalGenerationDecoderOnlyTest
* videollava
* reduce diff
* reduce diff
* remove on vlms
* fix a few more
* manual rebase bits
* more manual rebase
* remove all manual generative model class test entries
* fix up to ernie
* a few more removals
* handle remaining cases
* recurrent gemma
* it's better here
* make fixup
* tf idefics is broken
* tf bert + generate is broken
* don't touch tf :()
* don't touch tf :(
* make fixup
* better comments for test skips
* revert tf changes
* remove empty line removal
* one more
* missing one
* Fix StopStringCriteria to handle tokens above len(tokenizer)
This fixes #35244 by clipping token IDs to be within the tokenizer's vocabulary size before performing the embedding lookup. This prevents index errors when model.config.vocab_size > len(tokenizer).
The fix:
1. Adds a clamp operation to ensure token IDs are within bounds
2. Adds a test case to verify the behavior
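A minimal sketch of the clamping idea, assuming a toy embedding table and a free-standing helper (the real fix lives inside StopStringCriteria):
```python
import torch

def safe_embedding_lookup(token_ids: torch.LongTensor, embedding_vecs: torch.Tensor) -> torch.Tensor:
    # embedding_vecs has one row per tokenizer token; clamp out-of-range IDs
    # (possible when model.config.vocab_size > len(tokenizer)) to the last
    # valid row so the lookup cannot raise an IndexError.
    clipped = token_ids.clamp(0, embedding_vecs.size(0) - 1)
    return embedding_vecs[clipped]

# Example: a 5-row table but a model that emits token ID 7.
vecs = torch.randn(5, 8)
ids = torch.tensor([[1, 4, 7]])
out = safe_embedding_lookup(ids, vecs)  # shape (1, 3, 8), no IndexError
```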
* Use self.stop_strings instead of stop_strings
* Handle clipping correctly
* make fixup
* Update test to the new embedding vecs
* Use much bigger values in the mismatch test
* Typo fix
* Slight simplification
---------
Co-authored-by: openhands <openhands@all-hands.dev>
* First commit
* Finish model implementation
* First commit
* Finish model implementation
* Register zamba2
* generated modeling and configuration
* generated modeling and configuration
* added hybrid cache
* fix attention_mask in mamba
* dropped unused loras
* fix flash2
* config docstrings
* fix config and fwd pass
* make fixup fixes
* text_modeling_zamba2
* small fixes
* make fixup fixes
* Fix modular model converter
* added inheritances in modular, renamed zamba cache
* modular rebase
* new modular conversion
* fix generated modeling file
* fixed import for Zamba2RMSNormGated
* modular file cleanup
* make fixup and model tests
* dropped inheritance for Zamba2PreTrainedModel
* make fixup and unit tests
* Add inheritance of rope from GemmaRotaryEmbedding
* moved rope to model init
* drop del self.self_attn and del self.feed_forward
* fix tests
* renamed lora -> adapter
* rewrote adapter implementation
* fixed tests
* Fix torch_forward in mamba2 layer
* Fix torch_forward in mamba2 layer
* Fix torch_forward in mamba2 layer
* Dropped adapter in-place sum
* removed rope from attention init
* updated rope
* created get_layers method
* make fixup fix
* make fixup fixes
* make fixup fixes
* update to new attention standard
* update to new attention standard
* make fixup fixes
* minor fixes
* cache_position
* removed cache_position postion_ids use_cache
* remove config from modular
* removed config from modular (2)
* import apply_rotary_pos_emb from llama
* fixed rope_kwargs
* Instantiate cache in Zamba2Model
* fix cache
* fix @slow decorator
* small fix in modular file
* Update docs/source/en/model_doc/zamba2.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* several minor fixes
* inherit mamba2decoder fwd and drop position_ids in mamba
* removed docstrings from modular
* reinstate zamba2 attention decoder fwd
* use regex for tied keys
* Revert "use regex for tied keys"
This reverts commit 9007a522b1.
* use regex for tied keys
* add cpu to slow forward tests
* dropped config.use_shared_mlp_adapter
* Update docs/source/en/model_doc/zamba2.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* re-convert from modular
---------
Co-authored-by: root <root@node-2.us-southcentral1-a.compute.internal>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* use torch.testing.assert_close instead to get more details about errors in CIs
* fix
* style
* test_all
* revert for I bert
* fixes and updates
* more image processing fixes
* more image processors
* fix mamba and co
* style
* less strict
* ok I won't be strict
* skip and be done
* up
`return unittest.skip()` used in `test_model_parallel_beam_search` in the skip condition for xpu did not actually mark the test as skipped when running under pytest:
* 148 passed, 1 skipped
Other tests use `self.skipTest()`. Reusing this approach and moving the condition outside the loop (since it does not depend on it) allows the test to be skipped correctly for xpu (see the sketch below):
* 148 skipped
Secondly, `device_map="auto"` is now implemented for XPU for IPEX>=2.5 and
torch>=2.6, so we can now enable these tests for XPU for new IPEX/torch
versions.
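A self-contained sketch of the two skip patterns (illustrative test and device names, not the actual transformers test code):
```python
import unittest

class ExampleTest(unittest.TestCase):
    device = "xpu"

    def test_wrong_skip(self):
        if self.device == "xpu":
            # unittest.skip() merely builds a decorator; returning it from a
            # test body ends the test normally, so pytest counts it as passed.
            return unittest.skip("not supported on xpu")

    def test_correct_skip(self):
        if self.device == "xpu":
            # skipTest() raises unittest.SkipTest, so pytest reports SKIPPED.
            self.skipTest("not supported on xpu")
```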
Fixes: 1ea3ad1ae ("[tests] use `torch_device` instead of `auto` for model testing (#29531)")
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
* model can convert to HF and be loaded back
* nit
* works in single batch generation but hallucinates
* use the image tokens
* add image generation
* now it works
* add tests
* update
* add modular but it doesn't work for porting docstring :(
* skip some tests
* add slow tests
* modular removed the import?
* guess this works
* update
* update
* fix copies
* fix test
* fix copies
* update
* docs
* fix tests
* last fix tests?
* pls
* repo consistency
* more style
* style
* remove file
* address comments
* tiny bits
* update after the new modular
* fix tests
* add one more cond in check attributes
* decompose down/up/mid blocks
* allow static cache generation in VLMs
* nit
* fix copies
* Update docs/source/en/model_doc/emu3.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/emu3.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/emu3.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/emu3.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/emu3.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/emu3.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/emu3.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/emu3.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* fix VAE upsampling
* Update src/transformers/models/emu3/modular_emu3.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* address comments
* state overwritten stuff explicitly
* fix copies
* add the flag for flex attn
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* initial commit for PR
Co-authored-by: Gabe Goodhart <gabe.l.hart@gmail.com>
* rename dynamic cache
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* add more unit tests
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* add integration test
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* add integration test
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* Add modular bamba file
* Remove trainer changes from unrelated PR
* Modify modular and config to get model running
* Fix some CI errors and beam search
* Fix a plethora of bugs from CI/docs/etc
* Add bamba to models with special caches
* Update to newer mamba PR for mamba sublayer
* fix test_left_padding_compatibility
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* fix style
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* fix remaining tests
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* missed this test
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* ran make style
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* move slow tag to integration obj
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* make style
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* address comments
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* fix modular
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* left out one part of modular
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* change model
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* Make Rotary modular as well
* Update bamba.md
Added overview, updated Model inference card and added config
* Update bamba.md
* Update bamba.md
* Update bamba.md
Minor fixes
* Add docs for config and model back
Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>
* Add warning when using fast kernels
* replaced generate example
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* Address comments from PR
Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>
* Propagate attention fixes
Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>
* Fix attention interfaces to the new API
Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>
* Fix API for decoder layer
Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>
* Remove extra weights
Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>
---------
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>
Co-authored-by: Gabe Goodhart <gabe.l.hart@gmail.com>
Co-authored-by: Antoni Viros i Martin <aviros@ibm.com>
Co-authored-by: divya-kumari32 <72085811+divya-kumari32@users.noreply.github.com>
Co-authored-by: Antoni Viros <ani300@gmail.com>
* softcapping
* soft cap before the mask
* style
* ...
* super nit
* update
* fixes
* update
* small issue with modular
* fix modular imports
* update
* fixup
* simplify a hell lot
* simplify cleaning imports
* finish fixing
* update our design
* nits
* use a deprecation cycle
* updates
* Fix modular (recursive deps need to always be computed after merges!)
* push
* fix
* update
* fix modular order
* make fix-copies
* updates
* update
* ?
* don't compile for now
* ?
* fix some stuff
* done!
* fix copies
* update
* fixup
* ?
* fix two tests
* fix?
* for now, don't use head info
* eager when outputting attention and sdpa or flash as it's the simplest behaviour (for our tests as well :))
* fix-copies
* revert sdpa check
* Apply suggestions from code review
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
* rebase, fix-copies and push
* add a slow integration test
* update the test
* fix left padding issue
* fix test
* remove duplicate scaling
* quality
* add a small test and make sure it works
* 2b
---------
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
* blip2 tests
* instructblips
* copies
* fix slow tests
* fix
* uncomment this
* clean up after rebase
* should be model main input
* fix overwritten tests
* oops len should be multiple of frame number
* style
* fix some tests
* tmp commit
* tmp commit
* cull overwrites of deleted tests
* typo
* more specific docstring
* make fixup
* parameterize at the top?
* correction
* more deletions :D
* tmp commit
* for VLMs too
* fix _check_outputs
* test nit
* make fixup
* fix another flaky
* test_generate_from_inputs_embeds -- handle missing attention mask
* Add SynthIDTextWatermarkLogitsProcessor
* Resolving comments.
* Resolving comments.
* Resolving comments.
* Improving SynthIDWatermark tests.
* switch to PT version
* detector as pretrained model + style
* update training + style
* rebase
* Update logits_process.py
* Improving SynthIDWatermark tests.
* Shift detector training to wikitext negatives and stabilize with lower learning rate.
* Clean up.
* in for 7B
* cleanup
* Support Python 3.8.
* README and final cleanup.
* HF Hub upload and initialize.
* Update requirements for synthid_text.
* Adding SynthIDTextWatermarkDetector.
* Detector testing.
* Documentation changes.
* Copyrights fix.
* Fix detector api.
* ironing out errors
* ironing out errors
* training checks
* make fixup and make fix-copies
* docstrings and add to docs
* copyright
* BC
* test docstrings
* move import
* protect type hints
* top level imports
* watermarking example
* direct imports
* tpr fpr meaning
* process_kwargs
* SynthIDTextWatermarkingConfig docstring
* assert -> exception
* example updates
* no immutable dict (cant be serialized)
* pack fn
* einsum equivalent
* import order
* fix test on gpu
* add detector example
---------
Co-authored-by: Sumedh Ghaisas <sumedhg@google.com>
Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: sumedhghaisas2 <138781311+sumedhghaisas2@users.noreply.github.com>
Co-authored-by: raushan <raushan@huggingface.co>
* auto-gptq requirement is removed & model is changed & tokenizer pad token is assigned
* values func is changed with extensions & sequence key value bug is fixed
* map key value check is added in ExtensionsTree
* empty trimmed_ids bug is fixed
* tail_id IndexError is fixed
* empty trimmed_ids bug fix is updated for failed test
* overly specific case for a specific tokenizer is removed
* input_ids check is updated
* require auto-gptq import is removed
* key error check is changed with empty list check
* empty input_ids check is added
* empty trimmed_ids fix is checked with numel function
* usage change comments are added
* test changes are commented
* comment style and quality bugs are fixed
* test comment style and quality bug is fixed
* add idefics
* conflicts after merging main
* enable tests but need to fix some
* fix tests
* no print
* fix/skip some slow tests
* continue not skip
* rebasing broke something, this is the fix
* Default synced_gpus to True when using FullyShardedDataParallel
Fixes #30228
Related:
* https://github.com/pytorch/pytorch/issues/100069
* https://github.com/pytorch/pytorch/issues/123962
Similar to DeepSpeed ZeRO Stage 3, when using FSDP with multiple GPUs and differently sized data per rank, the ranks reach different synchronization points at the same time, leading to a deadlock.
To avoid this, we can automatically set synced_gpus to True if we detect that a PreTrainedModel is being managed by FSDP using _is_fsdp_managed_module, which was added in 2.0.0 for torch.compile: https://github.com/pytorch/pytorch/blob/v2.0.0/torch/distributed/fsdp/_dynamo_utils.py
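Roughly, the defaulting logic looks like the sketch below; the helper name and its wiring into generate() are simplified here, and the exact call site differs in the real code:
```python
from typing import Optional

import torch.nn as nn

def resolve_synced_gpus(model: nn.Module, synced_gpus: Optional[bool]) -> bool:
    # Respect an explicit user choice first.
    if synced_gpus is not None:
        return synced_gpus
    try:
        # Available since torch 2.0.0; returns True when FSDP manages the module.
        from torch.distributed.fsdp._dynamo_utils import _is_fsdp_managed_module
        if _is_fsdp_managed_module(model):
            # Ranks may stop generating at different steps; staying in lockstep
            # avoids the collective-communication deadlock described above.
            return True
    except ImportError:
        pass
    return False
```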
* Remove test file
* ruff formatting
* ruff format
* Update copyright year
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add test for FSDP-wrapped model generation
Before #33483, these tests would have hung for 10 minutes before crashing due to a timeout error
* Ruff format
* Move argparse import
* Remove barrier
I think this might cause more problems if one of the workers was killed
* Move import into function to decrease load time
https://github.com/huggingface/transformers/pull/33483#discussion_r1787972735
* Add test for accelerate and Trainer
https://github.com/huggingface/transformers/pull/33483#discussion_r1790309675
* Refactor imports
* Ruff format
* Use nullcontext
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix: handle padding in contrastive search for decoder-only models
* fix: handle padding in contrastive search for encoder-decoder models
* tests: move padding contrastive test to test_util, add t5 test
* fix: handle if model_kwargs["decoder_attention_mask"] is None
* refactor: improve padding input contrastive search generation tests
* chore: _ranking_fast to use LongTensor for cosine_matrix_mask
* change sequence_bias type of SequenceBiasLogitsProcessor to list, add config tests for all processors
* fix format
* small fix for all_token_bias_pairs_are_valid internal func
* small typo fix in description
* improve test impl, some SequenceBiasLogitsProcessor refactoring
* add tests
* fix whisper
* update
* nit
* add qwen2-vl
* more updates!
* better this way
* fix this one
* fix more tests
* fix final tests, hope so
* fix led
* Update tests/generation/test_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* pr comments
* not pass pixels and extra for low-mem tests, very flaky because of vision tower
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* don't run custom when not needed?
* update test fetcher filtering
* fixup and updates
* update
* update
* reduce burden
* nit
* nit
* missing comma
* this?
* this?
* more parallelism
* more
* nit for real parallelism on tf and torch examples
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update to make it more custom
* update to make it more custom
* update to make it more custom
* update to make it more custom
* update
* update
* update
* update
* update
* update
* use correct path
* fix path to test files and examples
* filter-tests
* filter?
* filter?
* filter?
* nits
* fix naming of the artifacts to be pushed
* list vs files
* list vs files
* fixup
* fix list of all tests
* fix the install steps
* fix the install steps
* fix the config
* fix the config
* only split if needed
* only split if needed
* extend should fix it
* extend should fix it
* arg
* arg
* update
* update
* run tests
* run tests
* run tests
* more nits
* update
* update
* update
* update
* update
* update
* update
* simpler way to show the test, reduces the complexity of the generated config
* simpler way to show the test, reduces the complexity of the generated config
* style
* oups
* oups
* fix import errors
* skip some tests for now
* update doctestjob
* more parallelism
* fixup
* test only the test in examples
* test only the test in examples
* nits
* from Arthur
* fix generated config
* update
* update
* show tests
* oups
* oups
* fix torch job for now
* use single upload setp
* oups
* fu**k
* fix
* nit
* update
* nit
* fix
* fixes
* [test-all]
* add generate marker and generate job
* oups
* torch job runs not generate tests
* let repo utils test all utils
* Update
* styling
* fix repo utils test
* more parallel please
* don't test
* update
* bit more verbose sir
* more
* hub were skipped
* split by classname
* revert
* maybe?
* Amazing catch
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* fix
* update
* update
* maybe non capturing
* manual convert?
* pass artifacts as parameters as otherwise the config is too long
* artifact.json
* store output
* might not be safe?
* my token
* mmm?
* use CI job ID
* can't get a proper id?
* ups
* build num
* update
* echo url
* this?
* this!
* fix
* wget
* ish
* dang
* update
* there we go
* update
* update
* pass all
* not .txt
* update
* fetch
* fix naming
* fix
* up
* update
* update
* ??
* update
* more updates
* update
* more
* skip
* oups
* pr documentation tests are currently created differently
* update
* hmmmm
* oups
* curl -L
* update
* ????
* nit
* mmmm
* ish
* ouf
* update
* ish
* update
* update
* update
* nit
* nit
* up
* oups
* documentation_test fix
* test hub tests everything, just marker
* update
* fix
* test_hub is the only annoying one now
* tf threads?
* oups
* not sure what is happening?
* fix?
* just use folder for staging hub
* I am getting fucking annoyed
* fix the test?
* update
* update
* ?
* fixes
* add comment!
* nit
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* fix: multilingual model converted to tflite gets wrong token
* fix: modify test_force_tokens_logits_processor to use scores.dtype.min as the checked value
---------
Co-authored-by: kent.sc.hung <kent.sc.hung@benq.com>
Co-authored-by: Aya <kent831217@gmail.com>
* Add .float() in all generation methods logit outputs
* Switch float-casting of logits to training only for main models
* Add `num_logits_to_keep` in Llama and add it by default in generate
* Apply style
* Add num_logits_to_keep as arg in prepare_input_for_generation
* Add support for Mistral
* Revert models except llama and mistral
* Fix default None value in _supports_num_logits_to_keep()
* Fix dimension of dummy input
* Add exception for prophetnet in _supports_num_logits_to_keep()
* Update _supports_num_logits_to_keep() to use inspect.signature()
* Add deprecation cycle + remove modification with pretraining_tp
* Apply style
* Add most used models
* Apply style
* Make `num_logits_to_keep` an int in all cases to remove if-else clause
* Add compile check for the warning
* Fix torch versions
* style
* Add gemma2
* Update warning version
* Add comment about .float operations in generation utils
* Add tests in GenerationTesterMixin and ModelTesterMixin
* Fix batch size for assisted decoding in tests
* fix small issues in test
* refactor test
* fix slicing removing dim issue
* Add nemotron support (should fix check-copy issue in CIs)
* Trigger new CIs
* Trigger new CIs
* Bump version
* Bump version in TODO
* Trigger CIs
* remove blank space
* Trigger CIs