transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-18 12:08:22 +06:00

Author	SHA1	Message	Date
Sugawara	6daa9cb515	add GPTNeoXForSequenceClassification (#22671 ) * add GPTNeoXForSequenceClassification * move the labels to logits.device (ref: #22561) * fix	2023-04-10 11:52:23 -04:00
xinhe	f74b40208d	use __func__ to check can_generate (#22643 )	2023-04-10 09:06:52 -04:00
Kirill	14fc1a2467	Fix quantization docs typo (#22666 )	2023-04-10 08:53:53 -04:00
Sylvain Gugger	3876fc6839	Make dynamic code work with offline mode (#22661 ) * Make dynamic code work with offline mode * Clean up * Quality	2023-04-10 08:49:42 -04:00
Shikhar Chauhan	98597725f1	(feat): Moving labels to same device as logits for Deit (#22679 )	2023-04-10 08:04:57 -04:00
Shahad Mahmud	870d91fb89	Model parallelism: Moving labels to the same device as logits for BridgeTower models (#22676 ) BrideTower Model parallelism logits device for loss calculation	2023-04-10 08:04:14 -04:00
Joel Lamy-Poirier	e0921c6b53	Add GPTBigCode model (Optimized GPT2 with MQA from Santacoder & BigCode) (#22575 ) * Add model with cli tool * Remove unwanted stuff * Add new code * Remove inference runner * Style * Fix checks * Test updates * make fixup * fix docs * fix doc * fix test * hopefully fix pipeline tests * refactor * fix CIs * add comment * rename to `GPTBigCodeForCausalLM` * correct readme * make fixup + docs * make fixup * fixes * fixes * Remove pruning * Remove import * Doc updates * More pruning removal * Combine copies * Single MQA implementation, remove kv cache pre-allocation and padding * Update doc * Revert refactor to match gpt2 style * Merge back key and value caches, fix some type hints * Update doc * Fix position ids pith padding (PR 21080) * Add conversion script temporarily * Update conversion script * Remove checkpoint conversion * New model * Fix MQA test * Fix copies * try fix tests * FIX TEST!! * remove `DoubleHeadsModel` * add MQA tests * add slow tests * clean up * add CPU checker * final fixes * fixes - fix GPU issue - fixed slow tests - skip disk offload * fix final issue * Simplify and comment baddbmm fix * Remove unnecessary code * Transpose tweaks * Use beta=1 on cpu, improve tests --------- Co-authored-by: younesbelkada <younesbelkada@gmail.com>	2023-04-10 10:57:21 +02:00
Arun Brahma	656e869a45	moved labels to the same device as logits for BLOOM, GPT Neo, GPT NeoX, RoBERTa and VIT models (#22663 ) moved labels to the same device as logits	2023-04-07 17:04:54 -04:00
Sylvain Gugger	6db23af50c	Revert migration of setup to pyproject.toml (#22658 )	2023-04-07 15:08:44 -04:00
Joao Gante	3f96e0b4e4	Generate: add API warning to streamers (#22659 ) add API warning	2023-04-07 14:15:20 -04:00
Arthur	f33419261a	[OPT] Fix default attention mask size (#22649 ) * Fix default attention mask size * fixup * add a test to make sure that even if attention mask are not provided, works * style	2023-04-07 20:12:57 +02:00
Arthur	b1b3dc3e52	[tokenization] do not push special file (#22657 ) * do not push special file * Update src/transformers/tokenization_utils_base.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> --------- Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2023-04-07 20:12:36 +02:00
Arthur	117a0f6afa	Small nit, (#22653 ) * Small nit, Fixes #21986 * Update src/transformers/pipelines/__init__.py	2023-04-07 17:29:23 +02:00
Wonhyeong Seo	fc1ba6fd11	🌐 [i18n-KO] Translated `pipeline_tutorial.mdx` to Korean (#22508 ) docs: feat: Korean pipeline_tutorial Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com> Co-authored-by: Hyeonseo Yun <0525_hhgus@naver.com> Co-authored-by: gabrielwithappy <102908949+gabrielwithappy@users.noreply.github.com> Co-authored-by: Na Yeon Han <nayeon2.han@gmail.com>	2023-04-07 11:27:59 -04:00
Yih-Dar	14d5b2b645	Fix `MegaModel` CI (#22652 ) * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-04-07 17:13:04 +02:00
Seung-Moo Yang	f2cc8ffdaa	Fix typo (#22650 )	2023-04-07 08:46:23 -04:00
Shikhar Chauhan	1de8ce9ee1	Move labels to the same device as logits for LlamaForSequenceClassification and Blip2 (#22596 ) * (feat): Move labels to the same device as logits * Trigger CI * Trigger CI * Trigger CI * (feat): Making changes for Blip2	2023-04-07 08:23:55 -04:00
gabrielwithappy	d59034ff6f	🌐[i18n-KO] Translate `autoclass_tutorial` to Korean and Fix the typo of `quicktour` (#22533 ) translate the autoclass_tutorial and fix the typo of the quicktour	2023-04-07 08:12:35 -04:00
Sourab Mangrulkar	ee8e80a060	fix FSDP version related issues (#22489 ) fix fsdp	2023-04-07 04:25:19 +05:30
Yih-Dar	c7ec71baf5	Update tiny model summary file for recent models (#22637 ) * Update tiny model summary file for recent models --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-04-06 22:52:59 +02:00
Younes Belkada	ed67286465	[`Blip`] Fix slow tests and doctests with correct values (#22632 ) fix slow tests and doctests	2023-04-06 19:12:51 +02:00
Nicolas Patry	6a02e98074	LlamaTokenizerFast Fix (.., from_slow=True). (#22630 )	2023-04-06 18:52:59 +02:00
Younes Belkada	09a9888fe9	[`bnb`] 8bit models should not be converted to `DDP` (#22628 ) add safety checker	2023-04-06 18:09:24 +02:00
Yih-Dar	d0b83fe2e1	A script to add/update `pipeline_model_mapping` systematically (#22180 ) * Auto. add and update pipeline_model_mapping * Fix style and quality * Finalize (comments) * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-04-06 18:08:14 +02:00
Yih-Dar	fa01127a67	update_pip_test_mapping (#22606 ) * Add TFBlipForConditionalGeneration * update pipeline_model_mapping * Add import * Revert changes in GPTSanJapaneseTest --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-04-06 17:56:06 +02:00
Connor Henderson	321b0908dd	docs: Fix broken link to generation strategies (#22623 ) fix broken link	2023-04-06 11:48:50 -04:00
Yih-Dar	2c22bc79c2	Make tiny model creation + pipeline testing more robust (#22500 ) * Final Tiny things --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-04-06 17:45:55 +02:00
amyeroberts	12d51db243	Backbone add mixin tests (#22542 ) * Add out_indices to backbones, deprecate out_features * Update - can specify both out_features and out_indices but not both * Add backbone mixin tests * Test tidy up * Add test_backbone for convnext * Remove redefinition of method * Update for Dinat and Nat backbones * Update tests * Smarter indexing * Add checks on config creation for backbone * PR comments	2023-04-06 13:50:15 +01:00
Joao Gante	48706c7178	Seq2SeqTrainer: use unwrapped model to retrieve the generation config (#22584 )	2023-04-06 13:29:58 +01:00
Nicolas Patry	0aa1153ffb	Revert error back into warning for byte fallback conversion. (#22607 )	2023-04-06 14:00:29 +02:00
Nicolas Patry	1670be4bde	Adding Llama FastTokenizer support. (#22264 ) * Adding Llama FastTokenizer support. - Requires https://github.com/huggingface/tokenizers/pull/1183 version - Only support byte_fallback for llama, raise otherwise (safety net). - Lots of questions are special tokens How to test:
```python
from transformers.convert_slow_tokenizer import convert_slow_tokenizer
from transformers import AutoTokenizer
from tokenizers import Tokenizer

tokenizer = AutoTokenizer.from_pretrained("huggingface/llama-7b")

if False:
    new_tokenizer = Tokenizer.from_file("tok.json")
else:
    new_tokenizer = convert_slow_tokenizer(tokenizer)
    new_tokenizer.save("tok.json")

strings = [
    "This is a test",
    "生活的真谛是",
    "生活的真谛是[MASK]。",
    # XXX: This one is problematic because of special tokens
    # "<s> Something something",
]

for string in strings:
    encoded = tokenizer(string)["input_ids"]
    encoded2 = new_tokenizer.encode(string).ids

    assert encoded == encoded2, f"{encoded} != {encoded2}"

    decoded = tokenizer.decode(encoded)
    decoded2 = new_tokenizer.decode(encoded2)

    assert decoded.strip() == decoded2, f"{repr(decoded)} != {repr(decoded2)}"
```
The converter + some test script. The test script. Tmp save. Adding Fast tokenizer + tests. Adding the tokenization tests. Correct combination. Small fix. Fixing tests. Fixing with latest update. Rebased. fix copies + normalized added tokens + copies. Adding doc. TMP. Doc + split files. Doc. Versions + try import. Fix Camembert + warnings -> Error. Fix by ArthurZucker. Not a decorator. * Fixing comments. * Adding more to docstring. * Doc rewriting.	2023-04-06 09:53:03 +02:00
Kaustubh	1564189298	feat(model parallelism): moving the labels to the same device as the logits for gpt2 and bart (#22591 )	2023-04-05 14:37:17 -04:00
Matt	e577bd0f13	Use native TF checkpoints for the BLIP TF tests (#22593 ) * Use native TF checkpoints for the TF tests * Remove unneeded exceptions	2023-04-05 18:43:14 +01:00
Younes Belkada	176ceff91f	Add DePlot + MatCha on `transformers` (#22528 ) * add deplot + matcha on `transformers` * more docs * correct path * Update docs/source/en/model_doc/deplot.mdx Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * fix * use auto processor * Update docs/source/en/model_doc/matcha.mdx Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * make fixup * Update docs/source/en/model_doc/deplot.mdx Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * add correct names --------- Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>	2023-04-05 17:43:48 +02:00
Nicolas Patry	126eafe396	Adding support for BPE merge creation from scores instead of ids. (#22582 ) * Adding support for BPE merge creation from scores instead of ids. * Revert warn -> raise. * Update src/transformers/convert_slow_tokenizer.py * Quality.	2023-04-05 16:03:06 +02:00
Matt	12f1a3bb3c	Fix a typo in one of the BLIP pretrained checkpoint names (#22588 ) Fixes a typo in one of the BLIP pretrained checkpoint names	2023-04-05 14:56:20 +01:00
Mikel Penagarikano	d5239bab5b	Sync preprocesses before loading the processor at run_speech_recognition_ctc.py (#21926 ) * Update run_speech_recognition_ctc.py Make sure all processes wait until data is saved before loading the processor from the output_dit * Make sure all processes wait until data is saved before loading the processor from the output_dit * Update run_speech_recognition_ctc.py * Update run_speech_recognition_seq2seq.py	2023-04-05 09:36:04 -04:00
Wonhyeong Seo	f49b0762a1	docs: ko: complete `_toctree.yml` (#22581 ) Co-authored-by: gabrielwithappy <102908949+gabrielwithappy@users.noreply.github.com>	2023-04-05 09:32:17 -04:00
Quentin Meeus	4861c25817	Add thousands separator in training summary (#22583 ) The logger prints a summary at the beginning of training that displays some info such as number of examples, number of parameters, total number of steps, etc. Those numbers can be quite large and difficult to read. I added a thousand separator to improve readability for the following: - num_examples - num_train_epochs - per_device_train_batch_size - total_train_batch_size - max_steps - num_trainable_params	2023-04-05 09:28:38 -04:00
Matt	2a91a9ef66	Fix PT-TF equivalence test for GPT1 (#22586 ) * Re-enable skipped test and fix the hidden state shape issue * Actually fix the bug instead of just doing something wrong	2023-04-05 13:16:00 +01:00
Joao Gante	0684284911	Tests: disable `accelerate_tests` mark warnings (#22585 )	2023-04-05 13:13:26 +01:00
Sylvain Gugger	6c640f098a	Move back doctest instructions to setup.cfg (#22587 )	2023-04-05 07:53:19 -04:00
Joao Gante	861ff890d6	Generate: `TextIteratorStreamer` timeout (#22576 )	2023-04-05 09:57:46 +01:00
Sylvain Gugger	11fd2c773b	Skip failing test	2023-04-04 21:26:17 -04:00
Matt	edb704b26e	Fix inverted conditional in TF common test! (#22540 ) * Fix inverted conditional in TF common test! * Make the same change in the PT tests file * Make sure hidden states for GPT2 have the same output shape in PT/TF * Minor fix to PT implementation of token classification loss * Skip loss equivalence test for TFHubert because it keeps overflowing to inf * Compute LM loss for TF the (weird) way it's computed in PT * Skip loss equivalence test for Wav2Vec2 for the same reason as Hubert * Fix - don't try to access the hidden states property when output is a tuple	2023-04-04 21:59:54 +01:00
Sourab Mangrulkar	48fbd8fa2e	fix `_no_split_modules` for Whisper model (#22486 )	2023-04-04 13:01:32 -04:00
Shubhamai	900677487d	Flax Regnet (#21867 ) * initial commit * review changes * post model PR merge * updating doc	2023-04-04 12:41:12 -04:00
Sun Haozhe	fc5b7419d4	corrected the code comment for the output of find_pruneable_heads_and_indices (#22557 ) * corrected/clarified the code comment of find_pruneable_heads_and_indices * have run make style	2023-04-04 11:29:42 -04:00
Matt	5f3ea66bc0	Add TF port of BLIP (#22090 ) * Initial commit * more stash commit * Yet another stash commit * yet more stash commit * Mostly working except for docs / repo consistency * Stop importing model list from torch file * Add TF BLIP models to docs * Add auto classes * Move get_text_features and get_image_features * Update src/transformers/models/blip/modeling_tf_blip.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/blip/modeling_tf_blip.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/blip/modeling_tf_blip.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/blip/modeling_tf_blip_text.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/blip/modeling_tf_blip.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/blip/modeling_tf_blip.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/blip/modeling_tf_blip.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/blip/modeling_tf_blip.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/blip/modeling_tf_blip.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/blip/test_modeling_tf_blip.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/blip/test_modeling_tf_blip.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/blip/modeling_tf_blip.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/blip/modeling_tf_blip.py 
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update tests/models/blip/test_modeling_tf_blip_text.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/blip/modeling_tf_blip_text.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/models/blip/modeling_tf_blip.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Use channels_last convolutions in TF (better performance + compatibility) * Remove _shape function * Move multi-line statement to one line in PT + TF * Specify tf.keras.layers instead of importing from it * Remove test_gradient_checkpointing and empty test_training methods * move some multi-line statements to one line * Update docstring for generate * Remove pruned heads set * Remove self.seq_len_dim * Fixed issues with loss computation, should resolve some tests. Also ensured that the PT version follows the config for output_attentions and output_hidden_states * ensure original model follows config in more cases * Skip the same cross-attention tests in the PT tests - didn't realize we did it twice! 
* Add training args throughout the models and layers * make fixup * Fix docstring for inputs_embeds * Add docstring for is_decoder * Add docstrings to text models * Remove redundant computation * Add unpack_inputs / keras_serializable * Add modeling_tf_blip to doctests * Add config classes for keras serialization * Changes to allow model porting with pt-to-tf * Quick fix to decoder head and test tweaks * Revert an issue with masking the embeddings outputs * Allow missing keys in some equivalence tests (for unused layers) * Add tf-pt equivalence tests back in * Update src/transformers/models/blip/modeling_tf_blip.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/blip/modeling_tf_blip_text.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/blip/modeling_tf_blip_text.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * make fixup * Refactor invert_attention_mask out into tf_utils * Re-enable cross-tests on the PT side too --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2023-04-04 16:05:22 +01:00
Nicolas Patry	a515d0a77c	Soft error whisper. (#22475 ) * Soft error whisper. * Fix format. --------- Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-94.taildb5d.ts.net>	2023-04-04 16:21:57 +02:00
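
Commit 4861c25817 in the list above describes adding a thousands separator to the large numbers printed in the training summary. A minimal sketch of that kind of formatting, assuming Python's built-in `,` format specifier; the `format_summary` helper and the sample values are hypothetical and are not taken from the actual Trainer code:

```python
def format_summary(stats):
    """Render each integer stat with a thousands separator, one per line."""
    lines = []
    for name, value in stats.items():
        # The "," format specifier groups digits in threes: 1234567 -> "1,234,567".
        lines.append(f"  {name} = {value:,}")
    return "\n".join(lines)


# Hypothetical values, for illustration only.
summary = format_summary({"num_examples": 1234567, "max_steps": 50000})
print(summary)
```

The `,` specifier only changes how the number is displayed; the underlying integers are left untouched, so the same values can still be used in later arithmetic.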

15053 Commits