transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-29 01:02:25 +06:00

Author	SHA1	Message	Date
Younes Belkada	080b700805	FIX / AWQ: Fix failing exllama test (#30288 ) fix filing exllama test	2024-04-17 11:26:35 +02:00
fxmarty	40eb6d6c5f	Fix SDPA sliding window compatibility (#30127 ) * fix sdpa + sliding window * give credit Co-authored-by: ehuaa <ehuamail@163.com> * remove unnecessary warning * fix typog * add test --------- Co-authored-by: ehuaa <ehuamail@163.com>	2024-04-17 17:21:26 +08:00
amyeroberts	c63f158903	BLIP - fix pt-tf equivalence test (#30258 ) * BLIP - fix pt-tf equivalence test * Update tests/models/blip/test_modeling_blip.py * Update more model tests	2024-04-16 17:46:53 +01:00
Zach Mueller	e27d9308be	Raise relevent err when wrong type is passed in as the accelerator_config (#29997 ) * Raise relevent err * Use type instead	2024-04-16 11:21:24 -04:00
Hafedh	0eaef0c709	add `push_to_hub` to pipeline (#29172 ) * add `push_to_hub` to pipeline * fix docs * format with ruff * update save_pretrained * update save_pretrained * remove unnecessary comment * switch to push_to_hub method in DynamicPipelineTester * remove unused imports * update docs for add_new_pipeline * fix docs for add_new_pipeline * add comment * fix italien docs * changes to token retrieval for pipelines * Update src/transformers/pipelines/base.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-04-16 15:34:04 +01:00
Zach Mueller	487505ff45	Allow for str versions of dicts based on typing (#30227 ) * Bookmark, initial impelemtation. Need to test * Clean * Working fully, woop woop * I think working version now, testing * Fin! * rm cast, could keep None * Fix typing issue * rm typehint * Add test * Add tests and make more rigid	2024-04-16 08:15:09 -04:00
amyeroberts	6b78360e6d	Add Idefics2 (#30253 ) * Initial add model additions * Test * All weights loading * Can perform full forward pass * Local and remote the same * Matching local and remote * Fixup * Idefics2Model importable; fixup docstrings * Don't skip by default * Remove deprecated use_resampler arg * Remove self.config * DecoupledLinear takes config * Tidy up * Enable eager attention and tidy up * Most tests passing * Update for batch of processed images * Add image processor * Update doc pages * Update conversion script * Remove erroneous breakpoint * Remove accidendtal spelling change * Update to reflect changes on hub - make generate work * Fix up * Image processor tests * Update tests * Add a processor * Add a processor * Update convert script * Update modeling file - remove fixmes * Bug fix * Add processing test * Use processor * Fix up * Update src/transformers/models/idefics2/modeling_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * Update src/transformers/models/idefics2/modeling_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * Fix test * Update config - PR comments and defaults align with checkpoint * Reviewer comments * Add copied froms for flahs attention * Update src/transformers/models/idefics2/modeling_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Remove qk_layer_norm and freeze_layers functionality * Fix * Remove freeze_layer options from config * Sync with upstream main * Fix attention shapes siglip * Remove Llava-next refs - TO REBASE * Use AutoModel for text model * Add comment to explain vision embeddings * Fix issue with tie_word_embeddings * Address review comments * Fix and fix up * Chat templates for idefics * Fix copies * Fix * Add layer norms to FA2 * Fix tests * Apply suggestions from code review Co-authored-by: Victor SANH <victorsanh@gmail.com> * Fix * Review comments * Update src/transformers/models/idefics2/modeling_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * Update inputs merger * Merge weights in correct order * Update convert script * Update src/transformers/models/idefics2/processing_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * Update template * Model code examples (fix idefics too) * More review comments * Tidy up * Update processing * Fix attention mask preparation * Update inputs_merger inputs * Vectorize inputs_merger * Update src/transformers/models/idefics2/__init__.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/idefics2/modeling_idefics2.py * Review comments * saying bye to the `qk_layer_norms` * Simplify * Update latents * Remove erroneuous readme changes * Return images when applying chat template * Fix bug - prompt images are for a single sample * Update src/transformers/models/idefics2/modeling_idefics2.py * image splitting * fix test * some more comment * some comment * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/idefics2/image_processing_idefics2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update processor * Update model tests * Update src/transformers/models/idefics2/processing_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * Update src/transformers/models/idefics2/processing_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * Don't add BOS in template * Update src/transformers/models/idefics2/processing_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * Remove index in examples * Update tests to reflect #13 * Update src/transformers/models/idefics2/processing_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * PR comment - consistent typing * Update readme and model doc * Update docs * Update checkpoint references * Update examples * Fix and update tests * Small addition * Update tests - remove copied from as no ignore placement copy could be found * Update example * small fixes * Update docs/source/en/model_doc/idefics2.md Co-authored-by: Victor SANH <victorsanh@gmail.com> * Update docs/source/en/model_doc/idefics2.md Co-authored-by: Victor SANH <victorsanh@gmail.com> * Update README.md Co-authored-by: Victor SANH <victorsanh@gmail.com> * Connector model as bridge * Fix up * Fix up * Don't pass model inputs for generation kwargs update * IDEFICS-2 -> Idefics2 * Remove config archive name * IDEFICS-2 -> Idefics2 * Add back llava-next * Update readmes * Add requirements for processor tester * Use custom convert_to_rgb to avoid possible BC * Fix doc example * Fix doc example * Skip model doc tests - as model to large * More doc example - account for image splitting * Update src/transformers/image_transforms.py * Fix config doctest --------- Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> Co-authored-by: ArthurZucker <arthur.zucker@gmail.com> Co-authored-by: Victor SANH <victorsanh@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-04-15 17:03:03 +01:00
Fanli Lin	667939a2d3	[tests] add the missing `require_torch_multi_gpu` flag (#30250 ) add gpu flag	2024-04-15 16:30:52 +01:00
Sai-Suraj-27	06b1192768	fix: Replace deprecated `assertEquals` with `assertEqual` (#30241 ) Replace deprecated assertEquals with assertEqual.	2024-04-15 09:36:06 +01:00
Xu Song	8fd2de933c	Add test for parse_json_file and change typing to os.PathLike (#30183 ) * Add test for parse_json_file * Change Path to PathLike * Fix `Import block is un-sorted or un-formatted` * revert parse_json_file * Fix ruff format * Add parse_json_file test	2024-04-15 09:34:36 +01:00
Yih-Dar	bf9a7ab932	Fix `RecurrentGemmaIntegrationTest.test_2b_sample` (#30222 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-04-12 17:53:25 +02:00
NielsRogge	5569552cf8	Update output of SuperPointForKeypointDetection (#29809 ) * Remove auto class * Update ImagePointDescriptionOutput * Update model outputs * Rename output class * Revert "Remove auto class" This reverts commit `ed4a8f549d`. * Address comments	2024-04-11 14:59:30 +02:00
lewtun	fbdb978eb5	Fix Llava chat template examples (#30130 )	2024-04-11 10:38:24 +02:00
Eduardo Pacheco	b752ad3019	Adding grounding dino (#26087 ) * Fixed typo when converting weigths to GroundingDINO vision backbone * Final modifications on modeling * Removed unnecessary class * Fixed convert structure * Added image processing * make fixup partially completed * Now text_backbone_config has its own class * Modified convert script * Removed unnecessary config attribute * Added new function to generate sub sentence mask * Renamed parameters with gamma in the name as it's currently not allowed * Removed tokenization and image_processing scripts since we'll map from existing models * Fixed some issues with configuration * Just some modifications on conversion script * Other modifications * Copied deformable detr * First commit * Added bert to model * Bert validated * Created Text and Fusion layers for Encoder * Adapted Encoder layer * Fixed typos * Adjusted Encoder * Converted encoder to hf * Modified Decoder Layer * Modified main decoder class * Removed copy comments * Fixed forward from GroundingDINOModel and GroundingDINODecoder * Added all necessary layers, configurations and forward logic up to GroundingDINOModel * Added all layers to convertion * Fixed outputs for GroundingDINOModel and GroundingDINOForObjectDetection * Fixed mask input to encoders and fixed nn.MultiheadAttention batch first and attn output * Fixed forward from GroundingDINOTextEnhancerLayer * Fixed output bug with GroundingDINODeformableLayer * Fixed bugs that prevent GroundingDINOForObjectDetection to run forward method * Fixed attentions to be passed correctly * Passing temperature arg when creating Sine position embedding * Removed copy comments * Added temperature argument for position embedding * Fixed typo when converting weigths to GroundingDINO vision backbone * Final modifications on modeling * Removed unnecessary class * Fixed convert structure * Added image processing * make fixup partially completed * Now text_backbone_config has its own class * Modified convert script * Removed unnecessary config attribute * Added new function to generate sub sentence mask * Renamed parameters with gamma in the name as it's currently not allowed * Removed tokenization and image_processing scripts since we'll map from existing models * Fixed some issues with configuration * Just some modifications on conversion script * Other modifications * Fix style * Improve fixup * Improve conversion script * Improve conversion script * Add GroundingDINOProcessor * More improvements * Return token type ids * something * Fix more tests * More improvements * More cleanup * More improvements * Fixed tests, improved modeling and config * More improvements and fixing tests * Improved tests and modeling * Improved tests and added image processor * Improved tests inference * More improvements * More test improvements * Fixed last test * Improved docstrings and comments * Fix style * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com> * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com> * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com> * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com> * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com> * Better naming * Better naming * Added Copied statement * Added Copied statement * Moved param init from GroundingDINOBiMultiHeadAttention * Better naming * Fixing clamp style * Better naming * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/grounding_dino/configuration_grounding_dino.py Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com> * Update src/transformers/models/grounding_dino/convert_grounding_dino_to_hf.py Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com> * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com> * Improving conversion script * Improved config * Improved naming * Improved naming again * Improved grouding-dino.md * Moved grounding dino to multimodal * Update src/transformers/models/grounding_dino/convert_grounding_dino_to_hf.py Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com> * Fixed docstrings and style * Fix docstrings * Remove timm attributes * Reorder imports * More improvements * Add Grounding DINO to pipeline * Remove model from check_repo * Added grounded post_process to GroundingDINOProcessor * Fixed style * Fixed GroundingDINOTextPrenetConfig docstrings * Aligned inputs.keys() when both image and text are passed with model_input_names * Added tests for GroundingDINOImageProcessor and GroundingDINOProcessor * Testing post_process_grounded_object_detection from GroundingDINOProcessor at test_inference_object_detection_head * Fixed order * Marked test with require_torch * Temporarily changed repo_id * More improvements * Fix style * Final improvements * Improve annotators * Fix style * Add is_torch_available * Remove type hints * vocab_tokens as one liner * Removed print statements * Renamed GroundingDINOTextPrenetConfig to GroundingDINOTextConfig * remove unnecessary comments * Removed unnecessary tests on conversion script * Renamed GroundingDINO to camel case GroundingDino * Fixed GroundingDinoProcessor docstrings * loading MSDA kernels in the modeling file * Fix copies * Replace nn.multiheadattention * Replace nn.multiheadattention * Fixed inputs for GroundingDinoMultiheadAttention & order of modules * Fixed processing to avoid messing with inputs * Added more tips for GroundingDino * Make style * Chaning name to align with SAM * Replace final nn.multiheadattention * Fix model tests * Update year, remove GenerationTesterMixin * Address comments * Address more comments * Rename TextPrenet to TextModel * Rename hidden_states * Address more comments * Address more comments * Address comment * Address more comments * Address merge * Address comment * Address comment * Address comment * Make style * Added layer norm eps to layer norms * Address more comments * More fixes * Fixed equivalence * Make fixup * Remove print statements * Address comments * Address comments * Address comments * Address comments * Address comments * Address comments * Add comment * Address comment * Remove overwriting of test * Fix bbox_embed * Improve decoder_bbox_embed_share * Simplify outputs * Updated post_process_grounded_object_detection * Renamed sources to feature_maps * Improved tests for Grounding Dino ImageProcessor and Processor * Fixed test requirements and imports * Fixed image_processing * Fixed processor tests * Fixed imports for image processing tests * Fix copies * Updated modeling * Fix style * Moved functions to correct position * Fixed copy issues * Update src/transformers/models/deformable_detr/modeling_deformable_detr.py Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com> * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com> * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com> * Keeping consistency custom cuda kernels for MSDA * Make GroundingDinoProcessor logic clearer * Updated Grounding DINO checkpoints * Changed tests to correct structure * Updated gpu-cpu equivalence test * fix copies * Update src/transformers/models/grounding_dino/processing_grounding_dino.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/grounding_dino/processing_grounding_dino.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/grounding_dino/configuration_grounding_dino.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Fixed erros and style * Fix copies * Removed inheritance from PreTrainedModel from GroundingDinoTextModel * Fixed GroundingDinoTextModel * Fixed type of default backbone config * Fixed missing methods for GroundingDinoTextModel and Added timm support for GroundingDinoConvEncoder * Addressed comments * Addressed batched image processing tests * Addressed zero shot test comment * Addressed tip comment * Removed GroundingDinoTextModel from check_repo * Removed inplace masking * Addressed comments * Addressed comments * Addressed comments * Fix copies * Fixing timm test * Fixed batching equivalence test * Update docs/source/en/model_doc/grounding-dino.md Co-authored-by: Tianqi Xu <40522713+dandansamax@users.noreply.github.com> * Update docs/source/en/model_doc/grounding-dino.md Co-authored-by: Tianqi Xu <40522713+dandansamax@users.noreply.github.com> * Update docs/source/en/model_doc/grounding-dino.md Co-authored-by: Tianqi Xu <40522713+dandansamax@users.noreply.github.com> * Addressed more comments * Added a new comment * Reduced image size * Addressed more comments * Nits * Nits * Changed the way text_config is initialized * Update src/transformers/models/grounding_dino/processing_grounding_dino.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: Niels <niels.rogge1@gmail.com> Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com> Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: Eduardo Pacheco <eduardo.pacheco@limehome.com> Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Tianqi Xu <40522713+dandansamax@users.noreply.github.com>	2024-04-11 08:32:16 +01:00
Arthur	0fe44059ae	Add recurrent gemma (#30143 ) * Fork. * RecurrentGemma initial commit. * Updating __init__.py. * Minor modification to how we initialize the cache. Changing how the config specifies the architecture. * Reformat code to 4 spaces. Fixed a few typos. * Fixed the forward pass. Still unclear on the cache? * Fixed the RecurrentGemmaForCausalLM * Minor comment that we might not need attention_mask and output_attention arguments. * Now cache should work as well. * Adding a temporary example to check whether the model generation works. * Adding the tests and updating imports. * Adding the example file missing in the previous commit. * First working example. * Removing .gitignore and reverting parts of __init__. * Re-add .gitignore. * Addressing comments for configuration. * Move mask creation to `_prepare_inputs_for_generation`. * First try at integration tests: 1. AttributeError: 'GriffinCausalLMOutput' object has no attribute 'attentions'. 2. `cache_position` not passed * Transfoering between machines. * Running normal tests. * Minor fix. * More fixes. * Addressing more comments. * Minor fixes. * first stab at cleanup * more refactoring * fix copies and else * renaming and get init to work * fix causal mask creation * update * nit * fix a hell lot of things * updates * update conversion script * make all keys importable * nits * add auto mappings * properly convert ffw_up and down * add scaling * fix generations * for recurrent dtype * update * fix going beyong window * fixup * add missing files * current updates to remove last einops * finish modeling refactor * TADA * fix compile * fix most failing testt ? ? * update tests * refactor and update * update * nits, fixup and update tests * more fixup * nits * fix imports * test format * fixups * nits * tuple typing * fix code quality * add model card * fix doc * skip most generation tests * nits * style * doc fixes * fix pr and check_copies? * last nit * oupsy * Apply suggestions from code review Co-authored-by: Lysandre Debut <hi@lysand.re> * update * Update src/transformers/models/recurrent_gemma/convert_recurrent_gemma_to_hf.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * update based on review * doc nit * fix quality * quality * fix slow test model path * update default dype * ignore attributes that can be safely ignored in check config attributes * 0lallalala come on * save nit * style * remove to dict update * make sure we can also run in float16 * style --------- Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> Co-authored-by: Aleksandar Botev <botev@google.com> Co-authored-by: Leonard Berrada <lberrada@users.noreply.github.com> Co-authored-by: anushanf <anushanf@google.com> Co-authored-by: botev <botevmg@gmail.com> Co-authored-by: Lysandre Debut <hi@lysand.re> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-04-10 16:59:13 +02:00
NielsRogge	50c1c19fc7	[UDOP] Fix tests (#29573 ) * Fix tests * Fix tests * Remove no_split_modules	2024-04-10 15:47:17 +02:00
Fanli Lin	185463784e	[tests] make 2 tests device-agnostic (#30008 ) add torch device	2024-04-10 14:46:39 +02:00
Raushan Turganbay	41579763ee	Fix length related warnings in speculative decoding (#29585 ) * avoid generation length warning * add tests * Update src/transformers/generation/candidate_generator.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * add tests and minor fixes * refine `min_new_tokens` * Update src/transformers/generation/candidate_generator.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * add method to prepare length arguments * add test for min length * Update src/transformers/generation/candidate_generator.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * fix variable naming * empty commit for tests * trigger tests (empty) --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2024-04-10 12:45:07 +05:00
Marc Sun	58a939c6b7	Fix quantization tests (#29914 ) * revert back to torch 2.1.1 * run test * switch to torch 2.2.1 * udapte dockerfile * fix awq tests * fix test * run quanto tests * update tests * split quantization tests * fix * fix again * final fix * fix report artifact * build docker again * Revert "build docker again" This reverts commit `399a5f9d93`. * debug * revert * style * new notification system * testing notfication * rebuild docker * fix_prev_ci_results * typo * remove warning * fix typo * fix artifact name * debug * issue fixed * debug again * fix * fix time * test notif with faling test * typo * issues again * final fix ? * run all quantization tests again * remove name to clear space * revert modfiication done on workflow * fix * build docker * build only quant docker * fix quantization ci * fix * fix report * better quantization_matrix * add print * revert to the basic one	2024-04-09 17:10:29 +02:00
Yih-Dar	08a194fcd6	Fix slow tests for important models to be compatible with A10 runners (#29905 ) * fix mistral and mixtral * add pdb * fix mixtral tesst * fix * fix mistral ? * add fix gemma * fix mistral * fix * test * anoter test * fix * fix * fix mistral tests * fix them again * final fixes for mistral * fix padding right * fix whipser fa2 * fix * fix * fix gemma * test * fix llama * fix * fix * fix llama gemma * add class attribute * fix CI * clarify whisper * compute_capability * rename names in some comments * Add # fmt: skip * make style * Update tests/models/mistral/test_modeling_mistral.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * update * update --------- Co-authored-by: Younes Belkada <younesbelkada@gmail.com> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-04-09 13:28:54 +02:00
Matt	ec59a42192	Revert workaround for TF safetensors loading (#30128 ) * See if we can get tests to pass with the fixed weights * See if we can get tests to pass with the fixed weights * Replace the revisions now that we don't need them anymore	2024-04-09 11:04:18 +01:00
Sourab Mangrulkar	4e3490f79b	Fix failing DeepSpeed model zoo tests (#30112 ) * fix sequence length errors * fix label column name error for vit * fix the lm_head embedding!=linear layer mismatches for Seq2Seq models	2024-04-09 12:01:47 +05:30
Jonathan Tow	2f12e40822	[`StableLm`] Add QK normalization and Parallel Residual Support (#29745 ) * init: add StableLm 2 support * add integration test for parallel residual and qk layernorm * update(modeling): match qk norm naming for consistency with phi/persimmon * fix(tests): run fwd/bwd on random init test model to jitter norm weights off identity * `use_parallel_residual`: add copy pointer to `GPTNeoXLayer.forward` * refactor: rename head states var in `StableLmLayerNormPerHead` * tests: update test model and add generate check	2024-04-08 23:51:58 +02:00
fxmarty	1897874edc	Fix falcon with SDPA, alibi but no passed mask (#30123 ) * fix falcon without attention_mask & alibi * add test * Update tests/models/falcon/test_modeling_falcon.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-04-08 22:25:07 +08:00
amyeroberts	7f9aff910b	Patch fix - don't use safetensors for TF models (#30118 ) * Patch fix - don't use safetensors for TF models * Skip test for TF for now * Update for another test	2024-04-08 13:29:20 +01:00
Fanli Lin	d16f0abc3f	[tests] add `require_bitsandbytes` marker (#30116 ) * add bnb flag * move maker * add accelerator maker	2024-04-08 12:49:31 +01:00
vaibhavagg303	1ed93be48a	[Whisper] Computing features on GPU in batch mode for whisper feature extractor. (#29900 ) * add _torch_extract_fbank_features_batch function in feature_extractor_whisper * reformat feature_extraction_whisper.py file * handle batching in single function * add gpu test & doc * add batch test & device in each __call__ * add device arg in doc string --------- Co-authored-by: vaibhav.aggarwal <vaibhav.aggarwal@sprinklr.com>	2024-04-08 10:36:25 +02:00
Yih-Dar	9b5a6450d4	Fix auto tests (#30067 ) * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-04-05 17:49:46 +02:00
Michael Benayoun	17cd7a9d28	Fix `torch.fx` symbolic tracing for LLama (#30047 ) * [WIP] fix fx * [WIP] fix fx * [WIP] fix fx * [WIP] fix fx * [WIP] fix fx * Apply changes to other models	2024-04-05 15:14:09 +02:00
Marc Sun	4207a4076d	[bnb] Fix offload test (#30039 ) fix bnb test	2024-04-05 13:11:28 +02:00
Wang, Yi	79d62b2da2	if output is tuple like facebook/hf-seamless-m4t-medium, waveform is … (#29722 ) * if output is tuple like facebook/hf-seamless-m4t-medium, waveform is the first element Signed-off-by: Wang, Yi <yi.a.wang@intel.com> * add test and fix batch issue Signed-off-by: Wang, Yi <yi.a.wang@intel.com> * add dict output support for seamless_m4t Signed-off-by: Wang, Yi <yi.a.wang@intel.com> --------- Signed-off-by: Wang, Yi <yi.a.wang@intel.com>	2024-04-05 09:26:44 +02:00
Yih-Dar	8b52fa6b42	skip `test_encode_decode_fast_slow_all_tokens` for now (#30044 ) skip test_encode_decode_fast_slow_all_tokens for now Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-04-05 09:07:41 +02:00
byi8220	75b76a5ea4	[`ProcessingIdefics`] Attention mask bug with padding (#29449 ) * Defaulted IdeficsProcessor padding to 'longest', removed manual padding * make fixup * Defaulted processor call to padding=False * Add padding to processor call in IdeficsModelIntegrationTest as well * Defaulted IdeficsProcessor padding to 'longest', removed manual padding * make fixup * Defaulted processor call to padding=False * Add padding to processor call in IdeficsModelIntegrationTest as well * redefaulted padding=longest again * fixup/doc	2024-04-04 10:11:09 +01:00
Raushan Turganbay	cc75f1ac73	Fix vipllava for generation (#29874 ) * fix vipllava generation * consistent llava code * revert llava tests changes	2024-04-03 17:00:08 +01:00
Ondřej Cífka	240e10626b	Fix probability computation in `WhisperNoSpeechDetection` when recomputing scores (#29248 ) * Fix is_scores_logprobs in WhisperNoSpeechDetection * Add test_whisper_longform_no_speech_detection * Fix typo	2024-04-03 17:53:07 +02:00
Ondřej Cífka	bcd42c4af9	Fix `kwargs` handling in `generate_with_fallback` (#29225 ) * Fix generate_with_fallback *kwargs Change pop to get * Delete keys from kwargs to prevent overriding generation_config * Revert to passing kwargs by reference, but make a (shallow) copy * dict -> copy.copy * Add test_whisper_longform_multi_batch_beam	2024-04-03 17:51:03 +02:00
Ren Xuancheng	851f253f4d	Fix Qwen2Tokenizer (#29929 ) qwen2: fixed tokens starting with # in slow tokenizer; add tests Co-authored-by: jklj077 <17811943+jklj077@users.noreply.github.com>	2024-04-03 17:42:43 +02:00
Yih-Dar	b44df05bc0	Update `tests/utils/tiny_model_summary.json` (#29941 ) update Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-04-03 09:25:01 +02:00
Nicolas Patry	9b0a8ea7d1	Hard error when ignoring tensors. (#27484 ) (#29906 ) * Hard error when ignoring tensors. (#27484) * [WIP] Hard error when ignoring tensors. * Better selection/error when saving a checkpoint. - Find all names we should normally drop (those are in the transformers config) - Find all disjoint tensors (for those we can safely trigger a copy to get rid of the sharing before saving) - Clone those disjoint tensors getting rid of the issue - Find all identical names (those should be declared in the config but we try to find them all anyway.) - For all identical names: - If they are in the config, just ignore them everything is fine - If they are not, warn about them. - For all remainder tensors which are shared yet neither identical NOR disjoint. raise a hard error. * Adding a failing test on `main` that passes here. * We don't need to keep the subfolder logic in this test. * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Add small tests. * Dead variable. * Fixup. * Fixing tied_Weights_keys on generic models. * Fixup + T5 encoder/decoder tying (with different layers) * Code quality. * Dynamic member. * trigger * Fixing encoder name for other types of encoder/decoder combos. * Fix scoping. * Update .github/workflows/self-scheduled.yml Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Fixing the tied_weights after the call. --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-04-02 16:59:05 +02:00
Minsub Lee (Matt)	15cd68713d	Fix `skip_special_tokens` for `Wav2Vec2CTCTokenizer._decode` (#29311 ) * Fix skip_special_tokens process for Wav2Vec2CTCTokenizer._decode * Fix skip_special_tokens for Wav2Vec2CTCTokenizer._decode * Exclude pad_token filtering since it is used as CTC-blank token * Add small test for skip_special_tokens * Update decoding test for added new token	2024-04-02 16:55:11 +02:00
Yoach Lacombe	0d04b1e25a	Add Flash Attention 2 support to Musicgen and Musicgen Melody (#29939 ) * add FA2 to o.g Musicgen * make style * add FA2 support to Musicgen Melody * add generation FA2 tests to o.g Musicgen * make style and fix copies * add Musicgen to FA2 docs + deprecate list * add sdpa supports to Musicgen's * make style and fix copies * refactor attention implementation arguments * add Copied from to sdpa tests * add copied form in sdpa tests melody * add copied for FA2 generation tests * add FA2 inference copied from * make style	2024-04-02 11:23:49 +01:00
théo gigant	fed27ffc7e	Adding FlaxNoRepeatNGramLogitsProcessor (#29677 ) * fix issue with logit processor in beam search in Flax * adding FlaxNoRepeatNGramLogitsProcessor class + unit test * style correction and code verification * add FlaxNoRepeatNGramLogitsProcessor to the test_processor_list and test_processor_list_jitted tests * fix an issue where ngrams are banned only if they appear ==1 time + update description of get_previous_ngrams * replace non-jit compatible masking of ngrams that are not yet generated with jittable version * Revert "fix issue with logit processor in beam search in Flax" This reverts commit `09b70d7e4d`. * add FlaxNoRepeatNGramLogitsProcessor to _get_logits_processor * change the method of casting to boolean of banned tokens indices * fix code style * remove some useless operations + significantly faster computation of update indices using jax.lax.fori_loop * remove useless loop iterations * set some variables that were calculated and used multiple times * fix format	2024-04-02 11:39:33 +02:00
Hovnatan Karapetyan	416711c3ea	Fix 29807 sinusoidal positional encodings in Flaubert, Informer and XLM (#29904 ) * Fix sinusoidal_embeddings in FlaubertModel * Fix for Informer * Fix for XLM * Move sinusoidal emb for XLM * Move sinusoidal emb for Flaubert * Small cleanup * Add comments on tests code copied from * Add with Distilbert->	2024-04-02 10:27:26 +02:00
Arthur	83b26dd79d	[`generate`] fix breaking change for patch (#29976 ) * fix bug and add tests * nit * otherway to get the cur len instead of attention mask * more places where this might have been broken * nit * oups * inputs_embeds vs input_embeds * test generated outptus * style * nit * fix * skip failing biogpt	2024-04-02 09:51:45 +02:00
Joao Gante	c9f6e5e351	Generate: move misplaced test (#29902 )	2024-04-01 12:45:25 +01:00
Fanli Lin	e4f5b57a3b	[tests] fix the wrong output in `ImageToTextPipelineTests.test_conditional_generation_llava` (#29975 ) bug fix	2024-04-01 13:08:39 +02:00
Arthur	fa2c49b00b	Fix copies main ci (#29979 ) * fix copies * nit * style * Update utils/check_copies.py	2024-04-01 12:43:58 +02:00
Yoach Lacombe	569f6c7d43	Fix FA2 tests (#29909 ) * fix FA2 tests * refactor inference test name	2024-04-01 07:51:00 +00:00
Zach Mueller	3b8e2932ce	Rework tests to compare trainer checkpoint args (#29883 ) * Start rework * Fix failing test * Include max * Update src/transformers/trainer.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-03-30 22:19:17 -04:00
Yih-Dar	43d17c1836	Mark `test_eager_matches_sdpa_generate` flaky for some models (#29479 ) * fix * revert for qwen2 * revert for qwen2 * update * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-03-29 11:51:20 +01:00
Arthur	536ea2aca2	[`LlamaSlowConverter`] Slow to Fast better support (#29797 ) * fix * fix test * style * nit * rather rely on concert token to id * fix quality * Update src/transformers/convert_slow_tokenizer.py	2024-03-28 16:19:32 +01:00
Yu Chin Fabian Lim	4df5b9b4b2	Allow GradientAccumulationPlugin to be configured from AcceleratorConfig (#29589 ) * add gradient_accumulation_kwargs to AcceleratorConfig * add suggestions from @muellerzr to docstrings, new behavior and tests * Documentation suggestions from @muellerz Co-authored-by: Zach Mueller <muellerzr@gmail.com> * addressed @muellerzr comments regarding tests and test utils * moved accelerate version to top of file. * @muellerzr's variable fix Co-authored-by: Zach Mueller <muellerzr@gmail.com> * address @amyeroberts. fix tests and docstrings * address @amyeroberts additional suggestions --------- Co-authored-by: Yu Chin Fabian Lim <flim@sg.ibm.com> Co-authored-by: Zach Mueller <muellerzr@gmail.com>	2024-03-28 14:01:40 +00:00
Arthur	a2a7f71604	[ `TokenizationLlama`] fix the way we convert tokens to strings to keep leading spaces 🚨 breaking fix (#29453 ) * nit * update test and fix test * fixup	2024-03-28 13:58:40 +01:00
Joao Gante	441de62f49	RoPE models: add numerical sanity-check test for RoPE scaling (#29808 ) * add hard rope scaling test * make fixup * quick rope scaling tests * add copy statements	2024-03-28 11:25:50 +00:00
Christopher Keibel	aac7099c92	add functions to inspect model and optimizer status to trainer.py (#29838 ) * add functions to get number of params which require grad, get optimizer group for parameters and get learning rates of param groups to trainer.py * add tests and raise ValueError when optimizer is None * add second layer to test and freeze its weigths * check if torch is available before running tests * use decorator to check if torch is available Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fix test indentation Co-authored-by: Zach Mueller <muellerzr@gmail.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Zach Mueller <muellerzr@gmail.com>	2024-03-28 10:37:16 +00:00
Joao Gante	248d5d23a2	Tests: replace `torch.testing.assert_allclose` by `torch.testing.assert_close` (#29915 ) * replace torch.testing.assert_allclose by torch.testing.assert_close * missing atol rtol	2024-03-28 09:53:31 +00:00
Eduardo Pacheco	22d159ddf9	Adding Flash Attention 2 Support for GPT2 (#29226 ) * First commit to add flash attention 2 for GPT-2 * more improvements * Make GPT2 pass tests and fixed Decison Transformers copies * Fixed missing arg * fix copies * Added expected speedup * Update src/transformers/models/gpt2/modeling_gpt2.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/gpt2/modeling_gpt2.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/gpt2/modeling_gpt2.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Added test * Fixed attn attribute * Update docs/source/en/model_doc/gpt2.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update docs/source/en/model_doc/gpt2.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update Decision transformer attentions * More updates * Passing tests * Fix copies * Fix copies part 2 * Decision transformer updates * Update src/transformers/models/gpt2/modeling_gpt2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Fix copies * Decision transformer not supporting flash attn * Addressed comments * Addressed comments * Addressed comments --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-03-28 09:31:24 +00:00
Lorenzo Verardo	a25037beb9	MixtralSparseMoeBlock: add gate jitter (#29865 ) This commit adds gate jitter to MixtralSparseMoeBlock's input data before passing it through the MoE layer, if turned on.	2024-03-27 16:14:26 +01:00
Raushan Turganbay	0efcf32351	Move `eos_token_id` to stopping criteria (#29459 ) * add eos stopping criteria * minor fix * Update tests/generation/test_stopping_criteria.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * check eos is not None and fix tests * make style and fixup * Update src/transformers/generation/stopping_criteria.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update tests/generation/test_utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update tests/generation/test_utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/generation/__init__.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/generation/stopping_criteria.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/generation/stopping_criteria.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/generation/stopping_criteria.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * camel case everywhere * call stopping criteria list for candidate ids * make style and fixup * Empty commit * Empty commit to pass flaky test * set max length in PromptLookupCandidateGenerator * Update src/transformers/generation/utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * lets fix this typo in docs * Update src/transformers/generation/utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/generation/utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * update PR * empty commit --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-03-27 12:18:10 +00:00
Lysandre Debut	4d8427f739	Reimplement "Automatic safetensors conversion when lacking these files" (#29846 ) * Automatic safetensors conversion when lacking these files (#29390) * Automatic safetensors conversion when lacking these files * Remove debug * Thread name * Typo * Ensure that raises do not affect the main thread * Catch all errors	2024-03-27 08:58:08 +01:00
Hovnatan Karapetyan	a81cf9ee90	Fix 29807, sinusoidal positional encodings overwritten by post_init() (#29813 ) * Check for requires_grad when initing weights * Add unit test * Move sinusoidal positional encoding generation after post_init() * Add modules to skip init list * Move create_sinusoidal_embeddings to _init_weights	2024-03-27 06:28:00 +01:00
Anton Vlasjuk	cefb819f7a	Mamba `slow_forward` gradient fix (#29563 ) * FIX: Cached slow forward in mamba - additionally added mamba cached test - added unused test (mamba causal lm forward and backward) - fixed typo: "causl" --> "causal" * formatting * fix: use real `slow_forward` call instead of torch module's * add shape assertion for mixer block test * adjust shape assertion	2024-03-27 04:52:12 +01:00
Bo Zheng	1c39974a4c	Add Qwen2MoE (#29377 ) * add support for qwen2 MoE models * update docs * add support for qwen2 MoE models * update docs * update model name & test * update readme * update class names & readme & model_doc of Qwen2MoE. * update architecture name * fix qwen2_moe tests * use Qwen2Tokenizer instead of Qwen2MoeTokenizer * update modeling_qwen2_moe.py * fix model architecture * fix qwen2_moe tests * use Qwen2Tokenizer instead of Qwen2MoeTokenizer * update modeling_qwen2_moe.py * fix model architecture * fix style * fix test when there are sparse and non sparse layers * fixup * Update README.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fixup * fixup * add archive back * add support for qwen2 MoE models * update docs * update model name & test * update readme * update class names & readme & model_doc of Qwen2MoE. * update architecture name * fix qwen2_moe tests * use Qwen2Tokenizer instead of Qwen2MoeTokenizer * update modeling_qwen2_moe.py * fix model architecture * fixup * fix qwen2_moe tests * use Qwen2Tokenizer instead of Qwen2MoeTokenizer * fix style * fix test when there are sparse and non sparse layers * fixup * add archive back * fix integration test * fixup --------- Co-authored-by: bozheng-hit <dsoul0621@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-03-27 02:11:55 +01:00
Yanyi Liu	ef60995858	Add `cosine_with_min_lr` scheduler in Trainer (#29341 ) * Add cosine_with_min_lr scheduler * Update error message for missing min_lr or min_lr_rate	2024-03-26 13:57:07 +01:00
Zhihao Lin	998b5bb56f	Allow `bos_token_id is None` during the generation with `inputs_embeds` (#29772 ) * update * add ut * update	2024-03-26 12:51:00 +00:00
yunxiangtang	b32bf85b58	Replace 'decord' with 'av' in VideoClassificationPipeline (#29747 ) * replace the 'decord' with 'av' in VideoClassificationPipeline * fix the check of backend in VideoClassificationPipeline * adjust the order of imports * format 'video_classification.py' * format 'video_classification.py' with ruff --------- Co-authored-by: wanqiancheng <13541261013@163.com>	2024-03-26 10:12:24 +00:00
Jonathan Flynn	b5a6d6eeab	Add warnings if training args differ from checkpoint trainer state (#29255 ) * add warnings if training args differ from checkpoint args stored in trainer_state.json * run formatting and styling * add a test * format and styling --------- Co-authored-by: Jonathan Flynn <jonl.flynn@guardian.co.uk>	2024-03-26 07:13:13 +01:00
Yuki Watanabe	8e9a2207b3	Populate torch_dtype from model to pipeline (#28940 ) * Populate torch_dtype from model to pipeline Signed-off-by: B-Step62 <yuki.watanabe@databricks.com> * use property Signed-off-by: B-Step62 <yuki.watanabe@databricks.com> * lint Signed-off-by: B-Step62 <yuki.watanabe@databricks.com> * Remove default handling Signed-off-by: B-Step62 <yuki.watanabe@databricks.com> --------- Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>	2024-03-25 10:46:40 +01:00
Lysandre Debut	39114c0383	Remove static pretrained maps from the library's internals (#29112 ) * [test_all] Remove static pretrained maps from the library's internals * Deprecate archive maps instead of removing them * Revert init changes * [test_all] Deprecate instead of removing * [test_all] PVT v2 support * [test_all] Tests should all pass * [test_all] Style * Address review comments * Update src/transformers/models/deprecated/_archive_maps.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/deprecated/_archive_maps.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * [test_all] trigger tests * [test_all] LLAVA * [test_all] Bad rebase --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-03-25 10:33:38 +01:00
fxmarty	13b23704a8	Correct llava mask & fix missing setter for `vocab_size` (#29389 ) * correct llava mask * fix vipllava as wlel * mask out embedding for padding tokens * add test * fix style * add setter * fix test on suggestion	2024-03-22 19:57:08 +08:00
Raushan Turganbay	fadb053379	Change in-place operations to out-of-place in LogitsProcessors (#29680 ) * change in-place -> out-of-place * add tests * add more tests * naming consistency * fix doctest * forgot min-length processors * empty * Revert "fix doctest" This reverts commit `4772768457`. * revert change in docstring * Update tests/generation/test_logits_process.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/generation/test_logits_process.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-03-21 16:37:33 +00:00
Raushan Turganbay	b469ebc5cf	Prepend `bos token` to Blip generations (#29642 ) * prepend "bos" to blip generation * minor changes * Update src/transformers/models/blip_2/modeling_blip_2.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/models/instructblip/modeling_instructblip.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * add generation tester mixin --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-03-21 16:33:18 +00:00
Matt	de627f5a14	Cast bfloat16 to float32 for Numpy conversions (#29755 ) * Cast bfloat16 to float32 for Numpy conversions * Add test	2024-03-21 14:04:11 +00:00
Arthur	ff841900e4	[`BC 4.37 -> 4.38`] for Llama family, memory and speed (#29753 ) * attempt to fix * the actual fix that works with compilation! * this? * temporary update * nit? * dispatcg to memory efficient? * update both models that have static cache support * fix copies fix compile * make sure fix * fix cohere and gemma * fix beams? * nit * slipped through the cracks * nit * nits * update * fix-copies * skip failing tests * nits	2024-03-20 23:47:01 +01:00
Zach Mueller	c78f57729f	Update test reqs to include sentencepiece (#29756 ) * Update test reqs * Clean	2024-03-20 15:53:42 +00:00
NielsRogge	d91fd7f92c	Add LLaVa-1.6, bis (#29586 ) * First draft * Fix tests, add docs * Improve docstrings * Fix test * Address comments * Address comments * Remove vocab_size attribute * Remove batch_size * Address comment * Add image processor tests * Support fx * Update docstring * Add support for 34b * Convert 34b model * Add integration tests * Update checkpoints * Convert vicuna-13b, remove doc tests * Remove script * Remove file * Address comments * Improve docstrings * Deprecate vocab_size * Remove aspect_ratio_setting * Address comments * Update READMEs * Add tips about chat templates * Fix tests * Deprecate vocab_size safely * Update tests --------- Co-authored-by: Amy Roberts <22614925+amyeroberts@users.noreply.github.com>	2024-03-20 15:51:12 +00:00
Matt	9d999481b2	Add correct batched handling for apply_chat_template (#29222 ) * Add correct batched handling for apply_chat_template * Fix warning method * Add error for incompatible options * expand tests * Add a skip for markuplm * Add skips for other layout models * Skip for LayoutLMv2 * Slightly update the warning message * Update src/transformers/tokenization_utils_base.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/tokenization_utils_base.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/tokenization_utils_base.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/tokenization_utils_base.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/tokenization_utils_base.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/tokenization_utils_base.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * typo fix * Update docstring for conversation kwarg * Update return docstring * Remove the warning, improve error message * Update src/transformers/tokenization_utils_base.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/tokenization_utils_base.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/test_tokenization_common.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/test_tokenization_common.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Remove return_dict=None * Fix up some merge cruft * More merge cruft * Add another skip * Add another skip --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-03-20 15:50:22 +00:00
amyeroberts	3c17c529cc	SuperPointModel -> SuperPointForKeypointDetection (#29757 )	2024-03-20 15:41:03 +00:00
Matt	11ef35e828	Support sharded safetensors in TF (#29350 ) * Initial commit (still lots of unfinished bits) * (Still untested) add safetensors sharding to save_pretrained * Fix savetensors saving, update default shard size to match PT * Add proper loading of TF-format safetensors * Revert default size in case that changes things * Fix incorrect index name * Update loading priority * Update tests * Make the tests a little more stringent * Expand tests * Add sharded cross-test * Fix argument name * One more test fix * Adding mlx to the list of allowed formats * Remove irrelevant block for safetensors * Refactor warning logging into a separate function * Remove unused skip_logger_warnings arg * Update src/transformers/modeling_tf_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Move function def --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-03-20 14:22:35 +00:00
NielsRogge	776c9d3af8	[Tests] Remove unused code (#29737 ) Remove unused code	2024-03-20 13:26:00 +01:00
Joao Gante	1a5c500f12	Tests: Musicgen tests + `make fix-copies` (#29734 ) * make fix-copies * some tests fixed * tests fixed	2024-03-20 08:45:53 +01:00
Joao Gante	4294f0c358	Llama: partial 4d masks (#29731 ) * partial 4d masks * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-03-19 17:32:01 +00:00
Raushan Turganbay	425ba56cdf	Clean-up generation tests after moving methods to private (#29582 ) * clean-up tests * refine comments * fix musicgen tests * make style * remove slow decorator from a test * more clean-up * fix other failing tests	2024-03-19 17:03:31 +00:00
StevenBucaille	56baa03380	Implementation of SuperPoint and AutoModelForKeypointDetection (#28966 ) * Added SuperPoint docs * Added tests * Removed commented part * Commit to create and fix add_superpoint branch with a new branch * Fixed dummy_pt_objects * Committed missing files * Fixed README.md * Apply suggestions from code review Fixed small changes Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Moved ImagePointDescriptionOutput from modeling_outputs.py to modeling_superpoint.py * Removed AutoModelForKeypointDetection and related stuff * Fixed inconsistencies in image_processing_superpoint.py * Moved infer_on_model logic simply in test_inference * Fixed bugs, added labels to forward method with checks whether it is properly a None value, also added tests about this logic in test_modeling_superpoint.py * Added tests to SuperPointImageProcessor to ensure that images are properly converted to grayscale * Removed remaining mentions of MODEL_FOR_KEYPOINT_DETECTION_MAPPING * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Fixed from (w, h) to (h, w) as input for tests * Removed unnecessary condition * Moved last_hidden_state to be the first returned * Moved last_hidden_state to be the first returned (bis) * Moved last_hidden_state to be the first returned (ter) * Switched image_width and image_height in tests to match recent changes * Added config as first SuperPointConvBlock init argument * Reordered README's after merge * Added missing first config argument to SuperPointConvBlock instantiations * Removed formatting error * Added SuperPoint to README's de, pt-br, ru, te and vi * Checked out README_fr.md * Fixed README_fr.md * Test fix README_fr.md * Test fix README_fr.md * Last make fix-copies ! * Updated checkpoint path * Removed unused SuperPoint doc * Added missing image * Update src/transformers/models/superpoint/modeling_superpoint.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Removed unnecessary import * Update src/transformers/models/superpoint/modeling_superpoint.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Added SuperPoint to _toctree.yml --------- Co-authored-by: steven <steven.bucaillle@gmail.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Steven Bucaille <steven.bucaille@buawei.com>	2024-03-19 14:43:02 +00:00
Arthur	2f9a3edbb9	[`GemmaConverter`] use user_defined_symbols (#29473 ) * use user_defined_symbols * fixup * nit * add a very robust test * make sure all models are tested with the `pretrained_tokenizer_to_test` * should we make sure we test all of them? * merge * remove the id * fix test * update * ousies * oups * fixup * fix copies check * remove `pretrained_tokenizer_to_test`	2024-03-19 15:13:56 +01:00
Younes Belkada	f6261d7d81	FEAT / Optim: Add GaLore optimizer (#29588 ) * add galore v1 * add import * add tests and doc * fix doctest * forward contrib credits from discussions * forward contrib credits from discussions * Apply suggestions from code review Co-authored-by: Zach Mueller <muellerzr@gmail.com> * fix failing tests' * switch to `optim_target_modules` and clarify docs * more clarification * enhance lookup logic * update a test to add peak memory * add regex, all-linear and single string support * add layer-wise optimization through DummyOptimizers and LRSchedulers * forward contrib credits from discussions and original idea * add a section about DDP not supported in layerwise * Update src/transformers/trainer.py Co-authored-by: Zach Mueller <muellerzr@gmail.com> * fix self * check only if layer_wise * Update src/transformers/training_args.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * oops * make use of intervals * clarify comment * add matching tests * GaLoRe -> GaLore * move to `get_scheduler` * add note on docs * add a warning * adapt a bit the docs * update docstring * support original API * Update docs/source/en/trainer.md * slightly refactor * Update docs/source/en/trainer.md Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com> * Update src/transformers/training_args.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fix args parsing and add tests * remove warning for regex * fix type hint * add note about extra args * make `is_regex` return optional --------- Co-authored-by: Maxime <maximegmd @users.noreply.github.com> Co-authored-by: Wing Lian <winglian @users.noreply.github.com> Co-authored-by: Zach Mueller <muellerzr@gmail.com> Co-authored-by: hiyouga <hiyouga@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>	2024-03-19 11:40:23 +01:00
Yoach Lacombe	c43b380e70	Add MusicGen Melody (#28819 ) * first modeling code * make repository * still WIP * update model * add tests * add latest change * clean docstrings and copied from * update docstrings md and readme * correct chroma function * correct copied from and remove unreleated test * add doc to toctree * correct imports * add convert script to notdoctested * Add suggestion from Sanchit Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * correct get_uncoditional_inputs docstrings * modify README according to SANCHIT feedback * add chroma to audio utils * clean librosa and torchaudio hard dependencies * fix FE * refactor audio decoder -> audio encoder for consistency with previous musicgen * refactor conditional -> encoder * modify sampling rate logics * modify license at the beginning * refactor all_self_attns->all_attentions * remove ignore copy from causallm generate * add copied from for from_sub_models * fix make copies * add warning if audio is truncated * add copied from where relevant * remove artefact * fix convert script * fix torchaudio and FE * modify chroma method according to feedback-> better naming * refactor input_values->input_features * refactor input_values->input_features and fix import fe * add input_features to docstrigs * correct inputs_embeds logics * remove dtype conversion * refactor _prepare_conditional_hidden_states_kwargs_for_generation ->_prepare_encoder_hidden_states_kwargs_for_generation * change warning for chroma length * Update src/transformers/models/musicgen_melody/convert_musicgen_melody_transformers.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * change way to save wav, using soundfile * correct docs and change to soundfile * fix import * fix init proj layers * remove line breaks from md * fix issue with docstrings * add FE suggestions * improve is in logics and remove useless imports * remove custom from_pretrained * simplify docstring code * add suggestions for modeling tests * make style * update converting script with sanity check * remove encoder attention mask from conditional generation * replace musicgen melody checkpoints with official orga * rename ylacombe->facebook in checkpoints * fix copies * remove unecessary warning * add shape in code docstrings * add files to slow doc tests * fix md bug and add md to not_tested * make fix-copies * fix hidden states test and batching --------- Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>	2024-03-18 13:06:12 +00:00
Yoach Lacombe	4e98d59443	[FIX] Fix speech2test modeling tests (#29672 ) * fix speech_to_test generation tests * Add details to comment * Update tests/models/speech_to_text/test_modeling_speech_to_text.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-03-15 17:58:11 +00:00
Marc Sun	28de2f4de3	[Quantization] Quanto quantizer (#29023 ) * start integration * fix * add and debug tests * update tests * make pytorch serialization works * compatible with device_map and offload * fix tests * make style * add ref * guard against safetensors * add float8 and style * fix is_serializable * Fix shard_checkpoint compatibility with quanto * more tests * docs * adjust memory * better * style * pass tests * Update src/transformers/modeling_utils.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * add is_safe_serialization instead * Update src/transformers/quantizers/quantizer_quanto.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * add QbitsTensor tests * fix tests * simplify activation list * Update docs/source/en/quantization.md Co-authored-by: David Corvoysier <david.corvoysier@gmail.com> * better comment * Update tests/quantization/quanto_integration/test_quanto.py Co-authored-by: David Corvoysier <david.corvoysier@gmail.com> * Update tests/quantization/quanto_integration/test_quanto.py Co-authored-by: David Corvoysier <david.corvoysier@gmail.com> * find and fix edge case * Update docs/source/en/quantization.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * pass weights_only_kwarg instead * fix shard_checkpoint loading * simplify update_missing_keys * Update tests/quantization/quanto_integration/test_quanto.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * recursion to get all tensors * block serialization * skip serialization tests * fix * change by cuda:0 for now * fix regression * update device_map * fix doc * add noteboon * update torch_dtype * update doc * typo * typo * remove comm --------- Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by: David Corvoysier <david.corvoysier@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Younes Belkada <younesbelkada@gmail.com>	2024-03-15 11:51:29 -04:00
Fanli Lin	c1993e68b8	[tests] remove deprecated tests for model loading (#29450 ) * gix * fix style * remove equivalent tests * add back for image_processor * remove again	2024-03-15 14:18:41 +00:00
Saurabh Dash	0e4a1c3401	Cohere Model Release (#29622 ) * Cohere Model Release (#1) Cohere Model Release * Remove unnecessary files and code (#2) Some cleanup * Delete cohere-model directory (#3) * Make Fix (#5) * Pr fixes (#6) * fixes for pr * pr fixes for the format * pr fixes for the format * src/transformers/models/auto/tokenization_auto.py * Tokenizer test (#8) * tokenizer test * format fix * Adding Docs and other minor changes (#7) * Add modeling tests (#9) * Smol Fix (#11) * tokenization tests are fixed * format fixes * fix pr doc tests * fix pr doc tests * fix pr doc tests * fix pr style check * small changes in cohere.md * FIX: Address final comments for transformers integration (#13) * fix modeling final nits and add proper test file * for now leave empty tests * add integration test * push new test * fix modeling cohere (#14) * Update chat templates to use the new API (#15) --------- Co-authored-by: ahmetustun <ahmetustun89@gmail.com> Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>	2024-03-15 14:29:11 +01:00
Joao Gante	c47fcd0830	Trainer: fail early in the presence of an unsavable `generation_config` (#29675 )	2024-03-15 12:59:10 +00:00
Matt	48fbab7330	Allow apply_chat_template to pass kwargs to the template and support a dict of templates (#29658 ) * Allow apply_chat_template to pass kwargs to the template * Fix priority for template_kwargs * Fix docstring * style fix * Add the option for the model to have a dict of templates * Error message cleanup * Add test for chat template dicts * Simplify the chat template dict test and apply it to all tokenizers in self.get_tokenizers() * Save chat template dicts as lists with fixed key names * Add test for serialization/reloading * Add require_jinja just to be safe, even though I don't think we use it	2024-03-14 18:23:14 +00:00
Yih-Dar	7b87ecb047	Fix PVT v2 tests (#29660 ) * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-03-14 17:00:32 +01:00
Yih-Dar	2cc3cc835f	Add `dataset_revision` argument to `RagConfig` (#29610 ) * add arg --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-03-14 16:48:11 +01:00
Nate Cibik	1fc505b816	Add PvT-v2 Model (#26812 ) * Added pytests for pvt-v2, all passed * Added pvt_v2 to docs/source/end/model_doc * Ran fix-copies and fixup. All checks passed * Added additional ReLU for linear attention mode * pvt_v2_b2_linear converted and working * copied models/pvt to adapt to pvt_v2 * First commit of pvt_v2 * PvT-v2 now works in AutoModel * Reverted batch eval changes for PR * Expanded type support for Pvt-v2 config * Fixed config docstring. Added channels property * Fixed model names in tests * Fixed config backbone compat. Added additional type support for image size in config * Fixed config backbone compat * Allowed for batching of eval metrics * copied models/pvt to adapt to pvt_v2 * First commit of pvt_v2 * Set key and value layers to use separate linear modules. Fixed pruning function * Set AvgPool to 7 * Fixed issue in init * PvT-v2 now works in AutoModel * Successful conversion of pretrained weights for PVT-v2 * Successful conversion of pretrained weights for PVT-v2 models * Added pytests for pvt-v2, all passed * Ran fix-copies and fixup. All checks passed * Added additional ReLU for linear attention mode * pvt_v2_b2_linear converted and working * Allowed for batching of eval metrics * copied models/pvt to adapt to pvt_v2 * First commit of pvt_v2 * Set key and value layers to use separate linear modules. Fixed pruning function * Set AvgPool to 7 * Fixed issue in init * PvT-v2 now works in AutoModel * Successful conversion of pretrained weights for PVT-v2 * Successful conversion of pretrained weights for PVT-v2 models * Added pytests for pvt-v2, all passed * Ran fix-copies and fixup. All checks passed * Added additional ReLU for linear attention mode * pvt_v2_b2_linear converted and working * Reverted batch eval changes for PR * Updated index.md * Expanded type support for Pvt-v2 config * Fixed config docstring. Added channels property * Fixed model names in tests * Fixed config backbone compat * Ran fix-copies * Fixed PvtV2Backbone tests * Added TFRegNet to OBJECTS_TO_IGNORE in check_docstrings.py * Fixed backbone stuff and fixed tests: all passing * Ran make fixup * Made modifications for code checks * Remove ONNX config from configuration_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Use explicit image size dict in test_modeling_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Make image_size optional in test_modeling_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Remove _ntuple use in modeling_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Remove reference to fp16_enabled * Model modules now take config as first argument even when not used * Replaced abbreviations for "SR" and "AP" with explicit "spatialreduction" and "averagepooling" * All LayerNorm now instantiates with config.layer_norm_eps * Added docstring for depth-wise conv layer * PvtV2Config now only takes Union[int, Tuple[int, int]] for image size * Refactored PVTv2 in prep for gradient checkpointing * Gradient checkpointing ready to test * Removed override of _set_gradient_checkpointing * Cleaned out old code * Applied code fixup * Applied code fixup * Began debug of pvt_v2 tests * Leave handling of num_labels to base pretrained config class * Deactivated gradient checkpointing tests until it is fixed * Removed PvtV2ImageProcessor which duped PvtImageProcessor * Allowed for batching of eval metrics * copied models/pvt to adapt to pvt_v2 * First commit of pvt_v2 * Set key and value layers to use separate linear modules. Fixed pruning function * Set AvgPool to 7 * Fixed issue in init * PvT-v2 now works in AutoModel * Successful conversion of pretrained weights for PVT-v2 * Successful conversion of pretrained weights for PVT-v2 models * Added pytests for pvt-v2, all passed * Added pvt_v2 to docs/source/end/model_doc * Ran fix-copies and fixup. All checks passed * Added additional ReLU for linear attention mode * pvt_v2_b2_linear converted and working * copied models/pvt to adapt to pvt_v2 * First commit of pvt_v2 * PvT-v2 now works in AutoModel * Reverted batch eval changes for PR * Expanded type support for Pvt-v2 config * Fixed config docstring. Added channels property * Fixed model names in tests * Fixed config backbone compat. Added additional type support for image size in config * Fixed config backbone compat * Allowed for batching of eval metrics * copied models/pvt to adapt to pvt_v2 * First commit of pvt_v2 * Set key and value layers to use separate linear modules. Fixed pruning function * Set AvgPool to 7 * Fixed issue in init * PvT-v2 now works in AutoModel * Successful conversion of pretrained weights for PVT-v2 * Successful conversion of pretrained weights for PVT-v2 models * Added pytests for pvt-v2, all passed * Ran fix-copies and fixup. All checks passed * Added additional ReLU for linear attention mode * pvt_v2_b2_linear converted and working * Allowed for batching of eval metrics * copied models/pvt to adapt to pvt_v2 * First commit of pvt_v2 * Set key and value layers to use separate linear modules. Fixed pruning function * Set AvgPool to 7 * Fixed issue in init * PvT-v2 now works in AutoModel * Successful conversion of pretrained weights for PVT-v2 * Successful conversion of pretrained weights for PVT-v2 models * Added pytests for pvt-v2, all passed * Ran fix-copies and fixup. All checks passed * Added additional ReLU for linear attention mode * pvt_v2_b2_linear converted and working * Reverted batch eval changes for PR * Expanded type support for Pvt-v2 config * Fixed config docstring. Added channels property * Fixed model names in tests * Fixed config backbone compat * Ran fix-copies * Fixed PvtV2Backbone tests * Added TFRegNet to OBJECTS_TO_IGNORE in check_docstrings.py * Fixed backbone stuff and fixed tests: all passing * Ran make fixup * Made modifications for code checks * Remove ONNX config from configuration_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Use explicit image size dict in test_modeling_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Make image_size optional in test_modeling_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Remove _ntuple use in modeling_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Remove reference to fp16_enabled * Model modules now take config as first argument even when not used * Replaced abbreviations for "SR" and "AP" with explicit "spatialreduction" and "averagepooling" * All LayerNorm now instantiates with config.layer_norm_eps * Added docstring for depth-wise conv layer * PvtV2Config now only takes Union[int, Tuple[int, int]] for image size * Refactored PVTv2 in prep for gradient checkpointing * Gradient checkpointing ready to test * Removed override of _set_gradient_checkpointing * Cleaned out old code * Applied code fixup * Applied code fixup * Allowed for batching of eval metrics * copied models/pvt to adapt to pvt_v2 * First commit of pvt_v2 * PvT-v2 now works in AutoModel * Ran fix-copies and fixup. All checks passed * copied models/pvt to adapt to pvt_v2 * First commit of pvt_v2 * PvT-v2 now works in AutoModel * Reverted batch eval changes for PR * Fixed config docstring. Added channels property * Fixed config backbone compat * Allowed for batching of eval metrics * copied models/pvt to adapt to pvt_v2 * First commit of pvt_v2 * PvT-v2 now works in AutoModel * Ran fix-copies and fixup. All checks passed * Allowed for batching of eval metrics * copied models/pvt to adapt to pvt_v2 * First commit of pvt_v2 * PvT-v2 now works in AutoModel * Fixed config backbone compat * Ran fix-copies * Began debug of pvt_v2 tests * Leave handling of num_labels to base pretrained config class * Deactivated gradient checkpointing tests until it is fixed * Removed PvtV2ImageProcessor which duped PvtImageProcessor * Fixed issue from rebase * Fixed issue from rebase * Set tests for gradient checkpointing to skip those using reentrant since it isn't supported * Fixed issue from rebase * Fixed issue from rebase * Changed model name in docs * Removed duplicate PvtV2Backbone * Work around type switching issue in tests * Fix model name in config comments * Update docs/source/en/model_doc/pvt_v2.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Changed name of variable from 'attn_reduce' to 'sr_type' * Changed name of variable from 'attn_reduce' to 'sr_type' * Changed from using 'sr_type' to 'linear_attention' for clarity * Update src/transformers/models/pvt_v2/modeling_pvt_v2.py Removed old code * Changed from using 'sr_type' to 'linear_attention' for clarity * Fixed Class names to be more descriptive * Update src/transformers/models/pvt_v2/modeling_pvt_v2.py Removed outdated code * Moved paper abstract to single line in pvt_v2.md * Added usage tips to pvt_v2.md * Simplified module inits by passing layer_idx * Fixed typing for hidden_act in PvtV2Config * Removed unusued import * Add pvt_v2 to docs/source/en/_toctree.yml * Updated documentation in docs/source/en/model_doc/pvt_v2.md to be more comprehensive. * Updated documentation in docs/source/en/model_doc/pvt_v2.md to be more comprehensive. * Update src/transformers/models/pvt_v2/modeling_pvt_v2.py Move function parameters to single line Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/pvt_v2/modeling_pvt_v2.py Update year of copyright to 2024 Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/pvt_v2/modeling_pvt_v2.py Make code more explicit Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Updated sr_ratio to be more explicit spatial_reduction_ratio * Removed excess type hints in modeling_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Move params to single line in modeling_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Removed needless comment in modeling_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update copyright date in pvt_v2.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Moved params to single line in modeling_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Updated copyright date in configuration_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Cleaned comments in modeling_pvt_v2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Renamed spatial_reduction Conv2D operation * Revert "Update src/transformers/models/pvt_v2/modeling_pvt_v2.py " This reverts commit `c4a04416dd`. * Updated conversion script to reflect module name change * Deprecated reshape_last_stage option in config * Removed unused imports * Code formatting * Fixed outdated decorators on test_inference_fp16 * Added "Copied from" comments in test_modeling_pvt_v2.py * Fixed import listing * Updated model name * Force empty commit for PR refresh * Fixed linting issue * Removed # Copied from comments * Added PVTv2 to README_fr.md * Ran make fix-copies * Replace all FoamoftheSea hub references with OpenGVLab * Fixed out_indices and out_features logic in configuration_pvt_v2.py * Made ImageNet weight conversion verification optional in convert_pvt_v2_to_pytorch.py * Ran code fixup * Fixed order of parent classes in PvtV2Config to fix the to_dict method override --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-03-13 19:05:20 +00:00
Yih-Dar	fe085560d0	Fix `multi_gpu_data_parallel_forward` for `MusicgenTest` (#29632 ) update Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-03-13 19:12:20 +01:00
Raushan Turganbay	5ac264d8a8	Fix batching tests for new models (Mamba and SegGPT) (#29633 ) * fix batchinng tests for new models * Update tests/models/seggpt/test_modeling_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-03-13 17:52:49 +00:00
Fanli Lin	a7e5e15472	[tests] make `test_trainer_log_level_replica` to run on accelerators with more than 2 devices (#29609 ) add new arg	2024-03-13 17:44:35 +00:00
Sourab Mangrulkar	350c5d1566	Add support for FSDP+QLoRA and DeepSpeed ZeRO3+QLoRA (#29587 ) * fsdp+qlora related changes * fixes * Update quantization_config.py * support fsdp+qlora and dsz3+qlora * Update quantization_config.py * Update modeling_utils.py * Update modeling_utils.py * Update modeling_utils.py * Update modeling_utils.py * Update modeling_utils.py * Update modeling_utils.py * handle fsdp+qlora and dsz3+qlora correctly while model loading * fix param count * quality * fsdp related changes * fsdp changes only when using LoRA/QLoRA * add accelerate version check * refactor, update min accelerate version and add tests 1. Update minimum accelerate version to 0.26.0 2. Clean the trainer wrt accelerate version checks 3. FSDP refactor and test for fsdp config 4. use `itemsize` instead of `dtype2bytes` dict * fix test * Address comments Co-Authored-By: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * fix the conditional flag * fix conditional flag * address comments Co-Authored-By: Zach Mueller <7831895+muellerzr@users.noreply.github.com> --------- Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by: Zach Mueller <7831895+muellerzr@users.noreply.github.com>	2024-03-13 22:03:02 +05:30
Joao Gante	1e21c4fbe0	Llama: allow custom 4d masks (#29618 )	2024-03-13 15:07:52 +00:00
Lysandre Debut	11bbb505c7	Adds pretrained IDs directly in the tests (#29534 ) * Adds pretrained IDs directly in the tests * Fix tests * Fix tests * Review!	2024-03-13 14:53:27 +01:00
bytebarde	be3fd8a262	[Flash Attention 2] Add flash attention 2 for GPT-J (#28295 ) * initial implementation of flash attention for gptj * modify flash attention and overwrite test_flash_attn_2_generate_padding_right * update flash attention support list * remove the copy line in the `CodeGenBlock` * address copy mechanism * Update src/transformers/models/gptj/modeling_gptj.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Add GPTJ attention classes * add expected outputs in the gptj test * Ensure repo consistency with 'make fix-copies' --------- Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-03-13 08:43:00 +01:00
Raushan Turganbay	8e64ba2890	Add tests for batching support (#29297 ) * add tests for batching support * Update src/transformers/models/fastspeech2_conformer/modeling_fastspeech2_conformer.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/models/fastspeech2_conformer/modeling_fastspeech2_conformer.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update tests/test_modeling_common.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update tests/test_modeling_common.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update tests/test_modeling_common.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * fixes and comments * use cosine distance for conv models * skip mra model testing * Update tests/models/vilt/test_modeling_vilt.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * finzalize and make style * check model type by input names * Update tests/models/vilt/test_modeling_vilt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fixed batch size for all testers * Revert "fixed batch size for all testers" This reverts commit `525f3a0a05`. * add batch_size for all testers * dict from model output * do not skip layoutlm * bring back some code from git revert * Update tests/test_modeling_common.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/test_modeling_common.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * clean-up * where did minus go in tolerance * make whisper happy * deal with consequences of losing minus * deal with consequences of losing minus * maskformer needs its own test for happiness * fix more models * tag flaky CV models from Amy's approval * make codestyle --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-03-12 17:46:19 +00:00
Yih-Dar	a15bd3af4e	Update flava tests (#29611 ) * update * update * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-03-12 17:04:53 +01:00
Pedro Cuenca	b382a09e28	Experimental loading of MLX files (#29511 ) * Experimental loading of MLX files * Update exception message * Add test * Style * Use model from hf-internal-testing	2024-03-11 18:42:06 +00:00
Arthur	4f27ee936a	[`Mamba doc`] Post merge updates (#29472 ) * post merge update * nit * oups	2024-03-11 09:46:24 +01:00
Fanli Lin	3f6973db06	[tests] use the correct `n_gpu` in `TrainerIntegrationTest::test_train_and_eval_dataloaders` for XPU (#29307 ) * fix n_gpu * fix style	2024-03-08 10:52:25 -05:00
Jonatan Kłosko	608fa5496c	Make sliding window size inclusive in eager attention (#29519 ) * Make sliding window size inclusive in eager attention * Fix tests	2024-03-08 12:53:17 +00:00
Fanli Lin	1ea3ad1aec	[tests] use `torch_device` instead of `auto` for model testing (#29531 ) * use torch_device * skip for XPU * Update tests/generation/test_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-03-08 11:21:43 +00:00
Wang, Yi	8ee1d47203	fix image-to-text batch incorrect output issue (#29342 ) * fix image-to-text batch incorrect output issue Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> * add ci test Signed-off-by: Wang, Yi <yi.a.wang@intel.com> * update ci test Signed-off-by: Wang, Yi <yi.a.wang@intel.com> --------- Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> Signed-off-by: Wang, Yi <yi.a.wang@intel.com>	2024-03-08 11:11:10 +00:00
Fanli Lin	8e589c83b6	[tests] add the missing `require_sacremoses` decorator (#29504 ) * add sacremoses check * fix style * for FlaubertTokenizer * HerbertTokenizer fix * add typeHint * Update src/transformers/testing_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * make less skipped * make quality * remove import --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-03-08 10:13:54 +00:00
Joao Gante	bc764f4263	Generate: left-padding test, revisited (#29515 ) * left-padding test revisited * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-03-08 10:06:46 +00:00
Nick DeGroot	b338a6c3b8	Fix `VisionEncoderDecoder` Positional Arg (#29497 ) * 🐛 Fix vision encoder decoder positional arg * ✅ Add test for VisionEncoderDecoder with LayoutLMv3 encoder --------- Co-authored-by: Nick DeGroot <1966472+nickthegroot@users.noreply.github.com>	2024-03-07 20:45:51 +00:00
amyeroberts	4ed9ae623d	test_generation_config_is_loaded_with_model - fall back to pytorch model for now (#29521 ) * Fall back to pytorch model for now * Fix up	2024-03-07 17:30:28 +00:00
Raushan Turganbay	923733c22b	Flava multimodal add attention mask (#29446 ) * flava multimodal add attn mask * make style * check mask is not None	2024-03-07 12:45:47 +01:00
Lysandre Debut	f6133d767a	Revert "Automatic safetensors conversion when lacking these files (#2… (#29507 ) Revert "Automatic safetensors conversion when lacking these files (#29390)" This reverts commit `a69cbf4e64`.	2024-03-07 12:12:41 +01:00
Joao Gante	ffe60fdcd6	v4.39 deprecations 🧼 (#29492 )	2024-03-07 10:44:43 +00:00
regisss	979fccc90f	Enable BLIP for auto VQA (#29499 ) * Enable BLIP for auto VQA * Make style * Add VQA to BLIP pipeline tests	2024-03-07 10:28:01 +01:00
Joao Gante	700d48fb2d	Generate: get generation mode from the generation config instance 🧼 (#29441 )	2024-03-06 11:18:35 +00:00
Joao Gante	41f7b7ae4b	Generate: add tests for caches with `pad_to_multiple_of` (#29462 )	2024-03-06 10:57:04 +00:00
Fanli Lin	00bf44270f	[FIX] `offload_weight()` takes from 3 to 4 positional arguments but 5 were given (#29457 ) * use require_torch_gpu * enable on XPU * fix	2024-03-06 03:58:42 +01:00
Lysandre Debut	a69cbf4e64	Automatic safetensors conversion when lacking these files (#29390 ) * Automatic safetensors conversion when lacking these files * Remove debug * Thread name * Typo * Ensure that raises do not affect the main thread	2024-03-05 13:37:55 +01:00
Arthur	fb1c62e973	[`Add Mamba`] Adds support for the `Mamba` models (#28094 ) * initial-commit * start cleaning * small nits * small nits * current updates * add kernels * small refactoring little step * add comments * styling * nit * nits * Style * Small changes * Push dummy mambda simple slow * nit * Use original names * Use original names and remove norm * Updates for inference params * Style nd updates * nits * Match logits * Add a test * Add expected generated text * nits doc, imports and styling * style * oups * dont install kernels, invite users to install the required kernels * let use use the original packages * styling * nits * fix some copieds * update doc * fix-copies * styling done * nits * fix import check * run but wrong cuda ress * mamba CUDA works :) * fix the fast path * config naming nits * conversion script is not required at this stage * finish fixing the fast path: generation make sense now! * nit * Let's start working on the CIs * style * better style * more nits * test nit * quick fix for now * nits * nit * nit * nit * nits * update test rest * fixup * update test * nit * some fixes * nits * update test values * fix styling * nit * support peft * integrations tests require torchg * also add slow markers * styling * chose forward wisely * nits * update tests * fix gradient checkpointing * fixup * nit * fix doc * check copies * fix the docstring * fix some more tests * style * fix beam search * add init schene * update * nit * fix * fixup the doc * fix the doc * fixup * tentative update but slow is no longer good * nit * should we always use float32? * nits * revert wrong changes * res in float32 * cleanup * skip fmt for now * update generation values * update test values running original model * fixup * update tests + rename inference_params to cache_params + make sure training does not use cache_params * small nits * more nits * fix final CIs * style * nit doc * I hope final doc nits * nit * 🫠 * final touch! * fix torch import * Apply suggestions from code review Co-authored-by: Lysandre Debut <hi@lysand.re> * Apply suggestions from code review * fix fix and fix * fix base model prefix! * nit * Update src/transformers/models/mamba/__init__.py * Update docs/source/en/model_doc/mamba.md Co-authored-by: Lysandre Debut <hi@lysand.re> * nit --------- Co-authored-by: Lysandre Debut <hi@lysand.re>	2024-03-05 20:01:06 +09:00
Arthur	4d892b7297	[`Udop imports`] Processor tests were not run. (#29456 ) * fix udop imports * sort imports	2024-03-05 11:01:08 +01:00
Arthur	57d007b912	Revert-commit `0d52f9f582` (#29455 ) * style * revert with RP * nit * exact revert	2024-03-05 10:39:42 +01:00
Arthur Zucker	0d52f9f582	more fix	2024-03-05 18:27:25 +09:00
Arthur	132852203a	[`UdopTokenizer`] Fix post merge imports (#29451 ) * update * ... * nits * arf * 🧼 * beat the last guy * style everyone	2024-03-05 09:42:52 +01:00
Fanli Lin	fa7f3cf336	[tests] enable test_pipeline_accelerate_top_p on XPU (#29309 ) * use torch_device * Update tests/pipelines/test_pipelines_text_generation.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix style --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-03-05 09:16:05 +01:00
Ilyas Moutawwakil	4fc708f98c	Exllama kernels support for AWQ models (#28634 ) * added exllama kernels support for awq models * doc * style * Update src/transformers/modeling_utils.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * refactor * moved exllama post init to after device dispatching * bump autoawq version * added exllama test * style * configurable exllama kernels * copy exllama_config from gptq * moved exllama version check to post init * moved to quantization dockerfile --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2024-03-05 03:22:48 +01:00
NielsRogge	836921fdeb	Add UDOP (#22940 ) * First draft * More improvements * More improvements * More fixes * Fix copies * More improvements * More fixes * More improvements * Convert checkpoint * More improvements, set up tests * Fix more tests * Add UdopModel * More improvements * Fix equivalence test * More fixes * Redesign model * Extend conversion script * Use real inputs for conversion script * Add image processor * Improve conversion script * Add UdopTokenizer * Add fast tokenizer * Add converter * Update README's * Add processor * Add fully fledged tokenizer * Add fast tokenizer * Use processor in conversion script * Add tokenizer tests * Fix one more test * Fix more tests * Fix tokenizer tests * Enable fast tokenizer tests * Fix more tests * Fix additional_special_tokens of fast tokenizer * Fix tokenizer tests * Fix more tests * Fix equivalence test * Rename image to pixel_values * Rename seg_data to bbox * More renamings * Remove vis_special_token * More improvements * Add docs * Fix copied from * Update slow tokenizer * Update fast tokenizer design * Make text input optional * Add first draft of processor tests * Fix more processor tests * Fix decoder_start_token_id * Fix test_initialization * Add integration test * More improvements * Improve processor, add test * Add more copied from * Add more copied from * Add more copied from * Add more copied from * Remove print statement * Update README and auto mapping * Delete files * Delete another file * Remove code * Fix test * Fix docs * Remove asserts * Add doc tests * Include UDOP in exotic model tests * Add expected tesseract decodings * Add sentencepiece * Use same design as T5 * Add UdopEncoderModel * Add UdopEncoderModel to tests * More fixes * Fix fast tokenizer * Fix one more test * Remove parallelisable attribute * Fix copies * Remove legacy file * Copy from T5Tokenizer * Fix rebase * More fixes, copy from T5 * More fixes * Fix init * Use ArthurZ/udop for tests * Make all model tests pass * Remove UdopForConditionalGeneration from auto mapping * Fix more tests * fixups * more fixups * fix the tokenizers * remove un-necessary changes * nits * nits * replace truncate_sequences_boxes with truncate_sequences for fix-copies * nit current path * add a test for input ids * ids that we should get taken from `c9f7a32f57` * nits converting * nits * apply ruff * nits * nits * style * fix slow order of addition * fix udop fast range as well * fixup * nits * Add docstrings * Fix gradient checkpointing * Update code examples * Skip tests * Update integration test * Address comment * Make fixup * Remove extra ids from tokenizer * Skip test * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update year * Address comment * Address more comments * Address comments * Add copied from * Update CI * Rename script * Update model id * Add AddedToken, skip tests * Update CI * Fix doc tests * Do not use Tesseract for the doc tests * Remove kwargs * Add original inputs * Update casting * Fix doc test * Update question * Update question * Use LayoutLMv3ImageProcessor * Update organization * Improve docs * Update forward signature * Make images optional * Remove deprecated device argument * Add comment, add add_prefix_space * More improvements * Remove kwargs --------- Co-authored-by: ArthurZucker <arthur.zucker@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-03-04 18:49:02 +01:00
Donggeun Yu	ed74d97871	DeformableDETR support bfloat16 (#29232 ) * Update ms_deform_attn_cuda.cu * Update ms_deform_attn_cuda.cuh * Update modeling_deformable_detr.py * Update src/transformers/models/deformable_detr/modeling_deformable_detr.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update modeling_deformable_detr.py * python utils/check_copies.py --fix_and_overwrite * Fix dtype missmatch error * Update test_modeling_deformable_detr.py * Update test_modeling_deformable_detr.py * Update modeling_deformable_detr.py * Update modeling_deformable_detr.py * Support DeformableDETR with bfloat16 * Add test code * Use AT_DISPATCH_FLOATING_TYPES_AND2 Use AT_DISPATCH_FLOATING_TYPES_AND2 * Update tests/models/deformable_detr/test_modeling_deformable_detr.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/deformable_detr/test_modeling_deformable_detr.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Fix not found require_torch_bf16 function --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-03-04 14:18:09 +00:00
Zach Mueller	1681a6d452	🚨 Fully revert atomic checkpointing 🚨 (#29370 ) Fully revert atomic checkpointing	2024-03-04 06:17:42 -05:00
Nick DeGroot	8ef9862864	Fix OneFormer `post_process_instance_segmentation` for panoptic tasks (#29304 ) * 🐛 Fix oneformer instance post processing when using panoptic task type * ✅ Add unit test for oneformer instance post processing panoptic bug --------- Co-authored-by: Nick DeGroot <1966472+nickthegroot@users.noreply.github.com>	2024-03-04 11:04:49 +00:00
Fanli Lin	aade711d1e	[tests] enable automatic speech recognition pipeline tests on XPU (#29308 ) * use require_torch_gpu * enable on XPU	2024-03-04 08:24:38 +01:00
Zach Mueller	1a7c117df9	Fix deprecated arg issue (#29372 ) * Fix deprecated arg issue * Trainer check too * Check for dict or dataclass * Simplify, make config always AcceleratorConfig * Upstream to Trainer	2024-03-01 12:00:29 -05:00
Marc Sun	cec773345a	Fix llama + gemma accelete tests (#29380 )	2024-03-01 10:32:36 -05:00
amyeroberts	f1b1379f37	[`YOLOS`] Fix - return padded annotations (#29300 ) * Fix yolos processing * Add back slow marker - protects for pycocotools in slow * Slow decorator goes above copied from header	2024-03-01 09:42:13 +00:00
Sanchit Gandhi	0a0a279e99	🚨🚨[Whisper Tok] Update integration test (#29368 ) * [Whisper Tok] Update integration test * make style	2024-03-01 09:22:31 +00:00
Younes Belkada	50db7ca4e8	FIX [`quantization` / `ESM`] Fix ESM 8bit / 4bit with bitsandbytes (#29329 ) * fix ESM 8bit * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fixup --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-03-01 03:01:53 +01:00
Yih-Dar	44fe1a1cc4	Avoid using uncessary `get_values(MODEL_MAPPING)` (#29362 ) * more fixes * more fixes --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-02-29 17:19:17 +08:00
Younes Belkada	b647acdb53	FIX [`CI`] `require_read_token` in the llama FA2 test (#29361 ) Update test_modeling_llama.py	2024-02-29 04:49:01 +01:00
Younes Belkada	8d8ac9c2df	FIX [`CI`]: Fix failing tests for peft integration (#29330 ) fix failing tests for peft integration	2024-02-29 03:56:16 +01:00
Younes Belkada	1aee9afd1c	FIX [`CI` / `starcoder2`] Change starcoder2 path to correct one for slow tests (#29359 ) change starcoder2 path to correct one	2024-02-29 03:52:13 +01:00
Arthur	8a8a0a4ae0	[`Llama ROPE`] Fix torch export but also slow downs in forward (#29198 ) * remove control flow * update gptneox * update .... * nits * Actually let's just break. Otherwise we are silently failing which imo is not optimal * version BC * fix tests * fix eager causal * nit * add a test * style * nits * nits * more nits for the test * update and fix * make sure cuda graphs are not skipped * read token is needed for meta llama * update! * fiixup * compile test should be slow * fix thet fix copies * stle 🫠	2024-02-28 10:45:53 +01:00
Younes Belkada	ad00c482c7	FIX [`Gemma` / `CI`] Make sure our runners have access to the model (#29242 ) * pu hf token in gemma tests * update suggestion * add to flax * revert * fix * fixup * forward contrib credits from discussion --------- Co-authored-by: ArthurZucker <ArthurZucker@users.noreply.github.com>	2024-02-28 06:25:23 +01:00
RaymondLi0	63caa370e6	Starcoder2 model - bis (#29215 ) * Copy model * changes * misc * fixes * add embed and residual dropout (#30) * misc * remove rms norm and gated MLP * remove copied mentions where its not a copy anymore * remove unused _shape * copied from mistral instead * fix copies * fix copies * add not doctested * fix * fix copyright * Update docs/source/en/model_doc/starcoder2.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/starcoder2/configuration_starcoder2.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/starcoder2/configuration_starcoder2.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix doc * revert some changes * add fa2 tests * fix styling nit * fix * push dummy docs --------- Co-authored-by: Joel Lamy-Poirier <joel.lamy-poirier@servicenow.com> Co-authored-by: younesbelkada <younesbelkada@gmail.com> Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-02-28 01:24:34 +01:00
Raushan Turganbay	ddf7ac4237	Token level timestamps for long-form generation in Whisper (#29148 )	2024-02-27 18:15:26 +00:00
Andrei Panferov	e3fc90ae68	Cleaner Cache `dtype` and `device` extraction for CUDA graph generation for quantizers compatibility (#29079 ) * input_layernorm as the beacon of hope * cleaner dtype extraction * AQLM + CUDA graph test * is available check * shorter text test	2024-02-27 09:32:39 +01:00
FredericOdermatt	871ba71dfa	GenerationConfig validate both constraints and force_words_ids (#29163 ) GenerationConfig validate both options for constrained decoding: constraints and force_words_ids	2024-02-27 01:43:52 +01:00

1 2 3 4 5 ...

3693 Commits