Commit Graph

15053 Commits

Nicola Procopio
dd3a0580a6
Added big_models.mdx Italian translation #17600 (#22115)
* updated toctree

* Italian translation big_model.mdx

* Italian translation big_models
2023-03-13 10:02:03 -04:00
Karim Foda
0768c5e274
Fix gradient checkpointing bug in xlm_roberta_xl (#22128) 2023-03-13 13:52:34 +00:00
Karim Foda
4c14c1f47b
Fix gradient checkpointing bug in Trajectory Transformer (#22125) 2023-03-13 13:50:02 +00:00
Karim Foda
d0876a095f
Fix gradient checkpointing bug in xglm (#22127) 2023-03-13 13:49:23 +00:00
Alex Calabrese
0c883766bd
Add pr_checks.mdx Italian translation (#17459) (#22116)
* Add pr_checks.mdx Italian translation (#17459)

* Updated pr_checks.mdx Italian translation (#17459)
2023-03-13 09:24:34 -04:00
wangpeng
102b5ff4a8
add new model of MGP-STR (#21418)
* add new model of MGP-STR

* fix the failing checks

* remove torch and numpy from mgp_tokenization

* remove unused import from modeling_mgp_str

* add test_processing_mgp_str

* rm test_processing_mgp_str.py

* add test_processing_mgp_str

* rm test_processing_mgp_str and add softmax outs to model

* rewrite the code of mgp-str according to PR suggestions

* remove representation_size from MGPSTRConfig

* reformat configuration_mgp_str.py

* format test_processor_mgp_str.py

* add test for tokenizer and complete model/processor test and model file

* rm unnecessary tuple in modeling_mgp_str

* reduce hidden_size/layers/label_size in test_model

* add integration tests and change MGPSTR to Mgpstr

* add test for logit values

* reformat test model file

---------

Co-authored-by: yue kun <yuekun.wp@alibaba-inc.com>
2023-03-13 10:11:31 +00:00
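For context, a minimal usage sketch of the new MGP-STR classes. The class names come from this PR; the `alibaba-damo/mgp-str-base` checkpoint name is our assumption, and the "softmax outs" mentioned above are what `batch_decode` consumes.
```python
import torch
from PIL import Image
from transformers import MgpstrProcessor, MgpstrForSceneTextRecognition

# Checkpoint name is an assumption based on the PR discussion.
processor = MgpstrProcessor.from_pretrained("alibaba-damo/mgp-str-base")
model = MgpstrForSceneTextRecognition.from_pretrained("alibaba-damo/mgp-str-base")

# A blank dummy crop stands in for a real scene-text image.
image = Image.new("RGB", (128, 32), color="white")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

with torch.no_grad():
    outputs = model(pixel_values)

# batch_decode turns the per-character logits into strings.
print(processor.batch_decode(outputs.logits)["generated_text"])
```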
Alara Dirik
32e3466d38
Add AutoModelForZeroShotImageClassification (#22087)
Adds AutoModelForZeroShotImageClassification to transformers
2023-03-13 12:46:14 +03:00
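A hedged sketch of what the new auto class enables, using CLIP as the canonical zero-shot image classifier; the checkpoint choice is ours, not the PR's.
```python
import torch
from PIL import Image
from transformers import AutoModelForZeroShotImageClassification, AutoProcessor

ckpt = "openai/clip-vit-base-patch32"  # any CLIP-like checkpoint should map to this auto class
processor = AutoProcessor.from_pretrained(ckpt)
model = AutoModelForZeroShotImageClassification.from_pretrained(ckpt)

image = Image.new("RGB", (224, 224))  # dummy image; use a real photo in practice
candidate_labels = ["a photo of a cat", "a photo of a dog"]
inputs = processor(text=candidate_labels, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    logits = model(**inputs).logits_per_image  # shape (1, num_labels)
print(logits.softmax(dim=-1))
```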
Sanchit Gandhi
b90fbc7e0b
[Whisper] Remove embed_tokens from encoder docstring (#21996)
* [Whisper] Remove embed_tokens from encoder docstring

* new line to retrigger CI

* remove new line
2023-03-11 14:03:36 +01:00
Yih-Dar
2f320661f3
Revert "[GPT2] Propose fix for #21080" (#22093)
Revert "[GPT2] Propose fix for #21080 (#21853)" to avoid CI failure

This reverts commit a3fef89b26.
2023-03-10 22:08:21 +01:00
Sylvain Gugger
499770c088
Fix imports of TF MobileViT (#22065)
* Fix imports of TF MobileViT

* Fix copies
2023-03-10 14:46:34 -05:00
Maria Khalusova
bdec2768bd
GPT-J specific half precision on CPU note (#22086)
* re: #21989

* update re: #21989

* removed cpu option

* make style
2023-03-10 14:03:43 -05:00
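The note this commit adds concerns loading GPT-J in half precision; a minimal sketch of the GPU-side usage the docs point to (the CPU option was dropped because fp16 inference is not supported on CPU):
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Half precision assumes a CUDA device; many fp16 ops are unavailable on CPU.
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B", torch_dtype=torch.float16
).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=10)[0]))
```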
Dean Wyatte
2f4cdd97f5
handle numpy inputs in whole word mask data collator (#22032) 2023-03-10 10:50:29 -05:00
J-shang
a70da86b84
Fix hint in src/transformers/modeling_utils.py (#22074)
fix hint
2023-03-10 08:56:42 -05:00
Karim Foda
419d979f7f
Fix gradient checkpointing bug in Speecht5 (#22080)
* Fix gradient checkpointing bug in Speecht5

* Update modeling_speech_to_text.py

* Update src/transformers/models/speech_to_text/modeling_speech_to_text.py

* Fix change errors

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2023-03-10 13:36:09 +00:00
Joao Gante
7014fc360d
Generate - Fix broken documentation links (#22078)
fix broken links
2023-03-10 13:28:30 +00:00
Kevin Jiang
ade26bf991
Fix small typo in flan-ul2.mdx (#22068)
* Update flan-ul2.mdx

* Update flan-ul2.mdx
2023-03-10 07:44:45 -05:00
Arthur
a3fef89b26
[GPT2] Propose fix for #21080 (#21853)
* Make sure position ids are masked

* test that padded input produce the same results

* fix failing tests

* fixup

* fix batch test
2023-03-10 07:15:25 -05:00
Karim Foda
eee195b3aa
Fix gradient checkpointing bug in switch transformer (#22081) 2023-03-10 11:31:08 +00:00
Karim Foda
b9273353dc
Fix gradient checkpointing bug in Speech2Text (#22079)
* Fix gradient checkpointing bug in Speech2Text

* Update modeling_speech_to_text.py

* Update modeling_speech_to_text_2.py
2023-03-10 11:30:42 +00:00
Sylvain Gugger
a9bd5df16a
Add a progress bar for the total download of shards (#22062)
* Add a progress bar for the total download of shards

* Check for no cache at all

* Fix check
2023-03-09 16:58:03 -05:00
aws-sangeetha
1a5fc300f4
Fix case when using --gradient_accumulation_steps with DDP disabled. (#22007)
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-42-72.us-west-2.compute.internal>
2023-03-09 14:31:58 -05:00
Yih-Dar
6d9031f285
Update tiny model creation script (#22058)
Update the script

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-09 19:53:54 +01:00
Sylvain Gugger
7a2b915e92
Add setters by type of args to TrainingArguments (#21570)
* Add setters by type of args to TrainingArguments

* Define more setters
2023-03-09 13:13:23 -05:00
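The PR title doesn't show the API shape, so here is a hedged sketch of the grouped setters it describes; the method names and parameters below are assumptions drawn from the PR, not verified signatures.
```python
from transformers import TrainingArguments

args = TrainingArguments("working_dir")
# Grouped setters added by this PR (names assumed); each returns the
# mutated TrainingArguments, so calls can be chained or reassigned.
args = args.set_training(learning_rate=1e-4, batch_size=32)
args = args.set_evaluate(strategy="steps", steps=200)
args = args.set_optimizer(name="adamw_torch", weight_decay=0.01)
```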
Yih-Dar
ab81d31d20
Skip 3 tests for WhisperEncoderModelTest (#22060)
* skip 3 tests

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-09 19:09:23 +01:00
Jiali Mei
8434cb878e
Edit the docstring of image_processing_donut to match code (#22033)
* Edit the docstring of `image_processing_donut` to match code

* improve style

* more style improvement after installing quality
2023-03-09 17:35:43 +00:00
Stas Bekman
ec24132b6c
[deepspeed] offload + non-cpuadam optimizer exception (#22043)
* [deepspeed] offload + non-cpuadam optimizer exception

* flip

* revert min version
2023-03-09 08:12:57 -08:00
Kamal Raj Kanakarajan
d0c19b3303
rm $ symbol from code block from contributing.md (#22057)
rm $ symbol from code block 

Removed the $ symbol from the code block to make copy-pasting easier.
2023-03-09 11:09:46 -05:00
Matt
fdf8409656
pt-to-tf model architecture override (#22055)
* Add an argument to pt-to-tf to allow overriding the model class

* make fixup

* Minor fix to error message

* Remove unused extra conversion from the script
2023-03-09 15:36:29 +00:00
anruijian
04bfac83b7
Return analysis for hyperparameter_search with Ray backend (#22040)
* return analysis for hyperparameter_search with ray backend

* Revert "return analysis for hyperparameter_search with ray backend"

This reverts commit cd51790709.

* add run_summary attribute to BestRun and return analysis for ray backend

* fix typo

* add doc for run_summary for ray backend
2023-03-09 09:44:17 -05:00
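What the new `run_summary` attribute gives you, sketched under the assumption of an already-configured `Trainer` built with `model_init` (setup elided):
```python
# Sketch: `trainer` is a Trainer constructed with `model_init=...` so each
# trial can re-instantiate the model.
best_run = trainer.hyperparameter_search(
    direction="maximize",
    backend="ray",
    n_trials=4,
)
print(best_run.run_id, best_run.objective, best_run.hyperparameters)

# New in this PR: for the Ray backend, run_summary holds the Ray Tune
# analysis object for deeper inspection of all trials.
analysis = best_run.run_summary
```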
Yih-Dar
90a7c95496
Show the number of huggingface_hub warnings in CI report (#22054)
* show hfh warnings

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-09 15:39:05 +01:00
Lucain
923110b74f
Remove set_access_token usage + fail tests if FutureWarning (#22051)
* Remove set_access_token usage + fail tests if FutureWarning

* do not fail on FutureWarning in CI

---------

Co-authored-by: testbot <lucainp@hf.co>
2023-03-09 09:23:48 -05:00
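For downstream code still calling the removed helper, the replacement in `huggingface_hub` is `login()`; a one-line sketch (the token value is obviously a placeholder):
```python
from huggingface_hub import login

# set_access_token(token) is deprecated and removed here;
# login() both validates and stores the token.
login(token="hf_xxx")  # placeholder token
```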
Shaun VanWeelden
684774306d
Can't install tf2 on M1 Chip by default (#22046) 2023-03-09 07:44:58 -05:00
Shaun VanWeelden
81cd655cab
Docs Improvement - In ZSH, not using ' ' around pip install fails, fix it (#22045)
In zsh, not quoting the extras in pip install fails

Running
```
pip install transformers[torch]
```
in the default zsh shell fails with the error `zsh: no matches found: transformers[torch]`, because zsh interprets the square brackets as a glob pattern.

The fix is to quote the package specification:
```
pip install 'transformers[torch]'
```

Relevant StackOverflow: https://stackoverflow.com/questions/30539798/zsh-no-matches-found-requestssecurity
2023-03-09 07:43:49 -05:00
Nipun Jindal
1a77a1a86f
[21737][T5]: Fix gradient checkpoint bug (#22036)
* [21737][T5]: Fix gradient checkpoint bug

* Update src/transformers/models/mt5/modeling_mt5.py

* Update src/transformers/models/t5/modeling_t5.py

---------

Co-authored-by: njindal <njindal@adobe.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2023-03-09 12:17:44 +00:00
Alara Dirik
2055d737ad
Update ALIGN docs (#22025)
* Fix typos and add code examples, resources
2023-03-09 14:12:17 +03:00
Ceyda Cinarel
3ec8171bed
Bug fix: token classification pipeline while passing offset_mapping (#22034)
fix slow tokenizers when passing offset_mapping
2023-03-08 16:21:46 -05:00
Yih-Dar
1cbac6867b
Mark all BridgeTower tests slow for now (#22039)
* slow me

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-08 21:48:29 +01:00
Yih-Dar
bcc8d30aff
Avoid text_config_dict and vision_config_dict being saved for CLIP-like models (#22035)
* Avoid text_config_dict and vision_config_dict being saved

* for other CLIP-like models

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-08 20:27:30 +01:00
Somasree Majumder
998395061b
fixes the gradient checkpointing of whisper (#22019)
* fixing

* Update modeling_whisper.py

* Update modeling_whisper.py

* Update src/transformers/models/whisper/modeling_whisper.py

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2023-03-08 14:21:38 -05:00
bofeng huang
6192549c1f
[examples/speech-recognition] Add SpecAugment to run_speech_recognition_seq2seq.py (#21942)
* Add specaugment to run_speech_recognition_seq2seq.py

* Remove useless argument: text_column

* Fix quality

* Update return_attention_mask condition

* Update specaugment arguments only for whisper models

* Remove SpecAugment arguments from ModelArguments, only leave default values for simplicity

* Apply suggestions from code review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update apply_spec_augment only for whisper models

* Apply suggestions from code review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Rename return_attention_mask to forward_attention_mask to avoid confusion with wav2vec2 models

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
2023-03-08 17:59:31 +01:00
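The script toggles SpecAugment through the Whisper config; a minimal sketch of the equivalent direct usage (the hyperparameter values below are illustrative, not the script's defaults):
```python
from transformers import WhisperConfig, WhisperForConditionalGeneration

# apply_spec_augment and the mask_* knobs live on WhisperConfig.
config = WhisperConfig.from_pretrained(
    "openai/whisper-tiny",
    apply_spec_augment=True,
    mask_time_prob=0.05,    # SpecAugment time masking
    mask_feature_prob=0.0,  # feature (frequency) masking disabled
)
model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-tiny", config=config
)
```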
anruijian
b427b263e2
Add tokenize_kwargs parameter definition in the FeatureExtractionPipeline (#22031)
add tokenize_kwargs doc in the FeatureExtractionPipeline
2023-03-08 11:43:31 -05:00
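A short sketch of the parameter this commit documents; `tokenize_kwargs` is forwarded to the underlying tokenizer call:
```python
from transformers import pipeline

extractor = pipeline("feature-extraction", model="distilbert-base-uncased")
# The keys of tokenize_kwargs are standard tokenizer arguments.
features = extractor(
    "This pipeline returns hidden states.",
    tokenize_kwargs={"truncation": True, "max_length": 128},
)
```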
Sylvain Gugger
a5392ee747
Fix test for torchneuroncore in Trainer (#22028) 2023-03-08 09:12:43 -05:00
Anahita Bhiwandiwalla
de81adf978
[WIP] Add BridgeTowerForContrastiveLearning (#21964)
* Add BridgeTower for ITC

* Fix review feedback

* Rename BridgeTowerForITC, cleanup

* Fix style and quality

* implement tests

---------

Co-authored-by: Tiep Le <97980157+tileintel@users.noreply.github.com>
Co-authored-by: Tiep Le <tiep.le@intel.com>
2023-03-08 09:00:54 -05:00
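A hedged sketch of the new contrastive head; the checkpoint name and output fields are our assumptions based on the BridgeTower ITC work, not the PR text.
```python
import torch
from PIL import Image
from transformers import BridgeTowerProcessor, BridgeTowerForContrastiveLearning

# Checkpoint name is an assumption; the ITC head needs *-itc weights.
ckpt = "BridgeTower/bridgetower-large-itm-mlm-itc"
processor = BridgeTowerProcessor.from_pretrained(ckpt)
model = BridgeTowerForContrastiveLearning.from_pretrained(ckpt)

image = Image.new("RGB", (224, 224))  # dummy image
inputs = processor(image, "a caption to score", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
# Contrastive embeddings for the text and image views.
print(outputs.text_embeds.shape, outputs.image_embeds.shape)
```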
Younes Belkada
edea08a6b0
[bnb] Fix bnb error message (#22026)
* fix error message

* make style
2023-03-08 14:51:44 +01:00
Yih-Dar
dfe9a31973
Update AudioClassificationPipelineTests::test_small_model_pt for PT 2.0.0 (#22023)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-08 13:56:47 +01:00
Qiushi
bbd949970d
update: bertology paper (#22012) 2023-03-08 07:54:30 -05:00
amyeroberts
4130e70367
VideoMAE doctest - use valid dummy pixel values (#22022)
Use valid dummy pixel values
2023-03-08 11:54:42 +00:00
jim
c1f85598eb
Generate - add 1 to cur_len to make up the new beam length (#21993)
* add 1 to cur_len to make up the new beam length

cur_len is 1 token shorter than the length of the sequence whose best_sum_logprobs is the numerator.

* cur_len+=1 before check if beam hyp is done

* format code

* reformat with black

---------

Co-authored-by: Chiming <chiming@biomap.com>
2023-03-08 11:47:55 +00:00
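To make the off-by-one concrete: finished beam hypotheses are ranked by a length-penalized score, and the denominator must use the length of the same sequence whose log-probabilities form the numerator. A toy sketch (the function name is ours):
```python
def beam_score(sum_logprobs: float, cur_len: int, length_penalty: float = 1.0) -> float:
    """Length-penalized beam score. cur_len counts tokens *before* the
    newly generated one, so 1 is added to match the numerator's sequence
    length - the off-by-one this commit fixes."""
    return sum_logprobs / ((cur_len + 1) ** length_penalty)
```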
Yih-Dar
b338414e61
Update tiny model creation script and some others files (#22006)
* Update 1

* Update 2

* Update 3

* Update 4

* Update 5

* Update 6

* Update 7

* Update 8

* Update 9

* Update 10

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-07 22:31:14 +01:00
Eli Simhayev
8abe4930d3
[Time-Series] informer model (#21099)
* added informer to gitignore

* WIP informer2020

* added checking that instantiate works

* added config using gluonTS by kashif

* WIP config

* adding InformerConfig. need to remove FeatureEmbedder

* done InformerConfig, but need to change the names

* Done informer model init. working on enc-dec

* added things to address, after reading again enc-dec in the paper

* done modeling - checking initialization work

* moved enc-dec init to InformerEncoder/Decoder init

* added 'init_std' to config, now model init works!

* WIP conversion script, and added code sources

* WIP conversion script: loading original informer pth works

* WIP conversion script: change defaults in the config

* WIP conversion script: supporting Informer input embedding

* WIP conversion script: added parameters for the informer embed

* WIP conversion script: change dim_feedforward=2048

* WIP conversion script: remove unused args for loading checkpoint

* just cleaning up

* DataEmbedding removed, after thinking with Kashif

* working on forward pass

* WIP forward pass: trying to establish working batch for forward pass

* cleaning and finalizing

* adding HF names and docs

* init after cleaning works

* WIP in tests

* added docs for the informer specific args

* fix style

* undo change

* cleaning informer, now need to work only enc-dec

* initial enc-dec classes

* added encoder and decoder

* added todo

* add todos for conv_layers

* added decoder docs from vanilla

* added encoder docs from vanilla

* remove encoder decoder from the original informer

* removed AttentionLayer from the original paper

* removed TriangularCausalMask, same as decoder_attention_mask

* initial sparse attention

* use conv_layers

* fixed test_config test

* fix parentheses when iterating zip(layers, conv_layers)

* error found in prob attention, added sizes as comments

* fix sizes

* added proposal for q_reduce indexing, and remove unused

* WIP ProbMask, and changed factor=2 for testing

* remove unused libs for this PR for creating the env

* fix checking the attn_weights.size() after bmm

* Q_reduce: changed from torch.gather to simple slicing

* WIP calculate final attn_output

* finish adding v_aggregated, attn_output ready

* changed tgt_len to u in attention_mask, need to fix the size error

* comment attention_mask for encoder, and fix if cond for v_agg

* added ProbMask support (wip), removed old original code

* finished ProbMask 😃

* Revert "remove unused libs for this PR for creating the env"

This reverts commit 11a081e09e.

* fixes

* make style

* fix initial tests

* fix more tests

* dry

* make style

* remove unused files

* style

* added integration tests

* fix num_static_real_features

* fix header

* remove unused function

* fix example

* fix docs

* Update src/transformers/models/informer/configuration_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/informer/modeling_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/informer/configuration_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/informer/configuration_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/informer/configuration_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/informer/configuration_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* fixes for reviewer

* use prediction_length from model

* fix style

* fixed informer.mdx

* added to index

* updated readme

* undo

* make fix-copies

* typo

* fix copy

* added Informer to toctree

* in order

* fixed comments

* remove unneeded new lines in docs

* make static real and cat optional

* fix use of distil conv layers

* fixed integration test

* added checkpoint for convlayer

* make fix-copies

* updated from time series model

* make fix-copies

* copy decoder

* fix unit tests

* updated scaling config

* fix integration tests

* IGNORE_NON_TESTED

* IGNORE_NON_AUTO_CONFIGURED

* IGNORE_NON_AUTO_CONFIGURED

* updated check configs

* fix formatting

* undo change from time series

* prediction_length should not be None

* align with the blog: prettify ProbSparse and change attention_factor to sampling_factor

* make style

* make fix-copies

* niels CR: update contributed by

* niels CR: update configuration_informer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* niels CR: update kashif -> huggingface

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* niels CR: `sampling_factor` only relevant when `attention_type`=prob

* make style

* fixed U_part: added multiplication by `L_Q`

* fixed bug: remove `is not None` from `if config.distil`

* fixed test: `decoder_seq_length` to `encoder_seq_length` in cross_attentions check

* fix integration tests

* updated model hub

* do not shift as in training

* undo

* fix make-copies

* make fix-copies

* added `if prediction_length is None`

* changed `ProbSparseAttention` to `InformerProbSparseAttention`

* changed `V_sum` -> `v_mean_dim_time`

* changed `ConvLayer` to `InformerConvLayer` and fixed `super()`

* TimeSeriesTransformer->Informer in decoder's Copied from

* more descriptive in ProbSparse

* make style

* fix copied from

* Revert "added `if prediction_length is None`"

This reverts commit b4cbddfa05.

* fixed indent

* use InformerSinusoidalPositionalEmbedding

* make fix-style

* fix from #21860

* fix name

* make fix-copies

* use time series utils

* fix dec num_heads

* docstring

* added time series util doc

* _import_structure

* formatting

* changes from review

* make style

* fix docs

* fix doc

* removed NegativeLogLikelihood

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2023-03-07 21:36:38 +01:00
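A minimal instantiation sketch for the new model. The config names below (prediction_length, context_length, attention_type, sampling_factor) come straight from the PR bullets above; the concrete values are illustrative.
```python
from transformers import InformerConfig, InformerForPrediction

config = InformerConfig(
    prediction_length=12,   # must be set; see "prediction_length should not be None"
    context_length=24,
    attention_type="prob",  # ProbSparse attention; "full" for vanilla attention
    sampling_factor=5,      # only relevant when attention_type="prob"
)
model = InformerForPrediction(config)
```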