transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-08-02 19:21:31 +06:00

Author	SHA1	Message	Date
Sayak Paul	561b9a8c00	[SegFormer] TensorFlow port (#17910 ) * add: segformer utils and img. classification. * add: segmentation layer. * feat: working implementation of segformer. * chore: remove unused variable. * add test, remaining modifications. * remove: unnecessary files. * add: rest of the files. Co-authored-by: matt <rocketknight1@gmail.com> * chore: remove ModuleList comment. * chore: apply make style. * chore: apply make fixup-copies. * add to check_repo.py * add decode head to IGNORE_NON_TESTED * chore: run make style. * chore: PR comments. * chore: minor changes to model doc. * tests: reduction across samples. * add a note on the space. * sort importats. * fix: reduction in loss computation. * chore: align loss function with that of NER. * chore: correct utils/documentation_tests.txt Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * chore: simplify the interpolation of logits in loss computation. * chore: return transposed logits when return_dict=False. * chore: add link to the tf fine-tuning repo. * address pr comments. * address niels's comments. * remove from_pt=True since tf weights are in. * remove comment from pt model. * address niels's comments. Co-authored-by: matt <rocketknight1@gmail.com> Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2022-07-21 18:22:37 +01:00
Yih-Dar	9edff45362	skip some test_multi_gpu_data_parallel_forward (#18188 ) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-07-20 15:54:44 +02:00
Raghavan	dcec4c4387	Adding OPTForSeqClassification class (#18123 ) * Adding OPTForSeqClassification class * Fix import issues * Add documentation for optforseqclassification * Remove checkout * fix failing tests * fix typo * Fix code formatting * Incorporating the PR feedbacks * Incorporate PR Feedbacks * Fix failing test and add new test for multi label setup * Fix formatting issue * Fix failing tests * Fix formatting issues * Fix failing tests * Fix failing tests * Fix failing tests * Fix failing tests * PR feedback	2022-07-20 10:14:21 +02:00
Younes Belkada	6a1b1bf7a6	BLOOM minor fixes small test (#18175 ) * minor fixes - add correct revision - corrected dosctring for test - removed a test * contrib credits Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Co-authored-by: Nouamane Tazi <nouamane98@gmail.com> Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>	2022-07-18 19:18:19 +02:00
Yih-Dar	6561fbcc6e	Update TF(Vision)EncoderDecoderModel PT/TF equivalence tests (#18073 ) Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-07-18 15:29:14 +02:00
Yih-Dar	cb19c2afdc	Fix expected loss values in some (m)T5 tests (#18177 ) * fix expected loss values Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-07-18 15:26:21 +02:00
Lysandre Debut	c1c79b0655	NLLB tokenizer (#18126 ) * NLLB tokenizer * Apply suggestions from code review - Thanks Stefan! Co-authored-by: Stefan Schweter <stefan@schweter.it> * Final touches * Style :) * Update docs/source/en/model_doc/nllb.mdx Co-authored-by: Stefan Schweter <stefan@schweter.it> * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * PR reviews * Auto models Co-authored-by: Stefan Schweter <stefan@schweter.it> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2022-07-18 08:12:34 -04:00
amyeroberts	8581a798c0	Add TF DeiT implementation (#17806 ) * Initial TF DeiT implementation * Fix copies naming issues * Fix up + docs * Properly same main layer * Name layers properly * Initial TF DeiT implementation * Fix copies naming issues * Fix up + docs * Properly same main layer * Name layers properly * Fixup * Fix import * Fix import * Fix import * Fix weight loading for tests whilst not on hub * Add doc tests and remove to_2tuple * Add back to_2tuple Removing to_2tuple results in many downstream changes needed because of the copies checks * Incorporate updates in Improve vision models #17731 PR * Don't hard code num_channels * Copy PyTorch DeiT embeddings and remove pytorch operations with mask * Fix patch embeddings & tidy up * Update PixelShuffle to move logic into class layer * Update doc strings - remove PT references * Use NHWC format in internal layers * Fix up * Use linear activation layer * Remove unused import * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Move dataclass to top of file * Remove from_pt now weights on hub * Fixup Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Amy Roberts <amyeroberts@users.noreply.github.com>	2022-07-13 18:04:08 +01:00
Sijun He	d4ebd4e112	speed up test (#18106 )	2022-07-12 04:28:28 -04:00
Younes Belkada	a462fc9232	Bloom Optimize operations (#17866 ) * fix tolerance for a bloom slow test * enhance alibi padding - get rid of for loops - deals better with padded batched input - avoid useless cpu/gpu communication when creating alibi Co-authored-by: justheuristic <justheuristic@gmail.com> * optimize attention mask * fix scaled softmax limit values * optimize building alibi tensor Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com> * fix attention_mask shape when it's None * minor fixes - fix docstring + arg names * remove colons in docstring * Apply suggestions from code review Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * apply suggestion * remove unsued arg * refactor a bit - use [:, None] for consistency * refactor attention block Co-authored-by: Nouamane Tazi <nouamane98@gmail.com> * quick fixes * first attempt * refactor attention block and fix all tests except "test_simple_generation" - added comments to better explain attention block * remove debug lines and add TODO comment * change `torch.bmm` to `torch.baddbmm` - fixes `test_simple_generation`but breaks `test_batch_generation_padd` * styling * all tests are passing now - use `bmm` - add explanation for `allow_fp16_reduced_precision_reduction` Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com> * styling Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com> * fix support for accelerate Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * remove attn softmax in fp32 * refactor comments * refactor a bit - remove warning message - remove print on test * refer to pytorch t5 * change the slow tests - do the tests in fp32 - remove some comments - keep large comments * update expected output for `test_simple_generation` - we now test using fp32 * make style + change comments a bit * fix dtype padd test Co-authored-by: justheuristic <justheuristic@gmail.com> Co-authored-by: Nouamane Tazi <nouamane98@gmail.com> Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2022-07-11 13:16:13 -04:00
Sylvain Gugger	5ff6f853d7	Mark slow test as such	2022-07-11 12:48:57 -04:00
Yulv-git	95113d1365	Fix some typos. (#17560 ) * Fix some typos. Signed-off-by: Yulv-git <yulvchi@qq.com> * Fix typo. Signed-off-by: Yulv-git <yulvchi@qq.com> * make fixup.	2022-07-11 05:00:13 -04:00
Patrick von Platen	2544c1434f	[Generate Tests] Make sure no tokens are force-generated (#18053 )	2022-07-07 15:08:34 +02:00
Joao Gante	360719a6a4	TF: GPT-J compatible with XLA generation (#17986 )	2022-07-06 15:02:07 +01:00
Matt	5ae087cf8e	Fix T5/mT5 tests (#18029 )	2022-07-05 16:22:03 +01:00
Yih-Dar	97db5b4223	Update expected values in DecisionTransformerModelIntegrationTest (#18016 ) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-07-05 14:53:43 +02:00
Joao Gante	f0982682bd	TF: T5 can now handle a padded past (i.e. XLA generation) (#17969 ) * get the right slicing index for position_bias	2022-07-04 19:47:43 +01:00
Matt	96d833b211	Return scalar losses instead of per-sample means (#18013 ) * Return scalar losses instead of per-sample means * Make loss shape (1,) instead of scalar * Allow scalar losses in test_loss_computation * Allow scalar losses in test_loss_computation * Allow scalar losses in test_loss_computation * Remove XLA loss function for RAG	2022-07-04 17:26:19 +01:00
amyeroberts	77ea5130a1	Add TF ResNet model (#17427 ) * Rought TF conversion outline * Tidy up * Fix padding differences between layers * Add back embedder - whoops * Match test file to main * Match upstream test file * Correctly pass and assign image_size parameter Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * Add in MainLayer * Correctly name layer * Tidy up AdaptivePooler * Small tidy-up More accurate type hints and remove whitespaces * Change AdaptiveAvgPool Use the AdaptiveAvgPool implementation by @Rocketknight1, which correctly pools if the output shape does not evenly divide by input shape c.f. `9e26607e22 (r900109509)` Co-authored-by: From: matt <rocketknight1@gmail.com> Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * Use updated AdaptiveAvgPool Co-authored-by: matt <rocketknight1@gmail.com> * Make AdaptiveAvgPool compatible with CPU * Remove image_size from configuration * Fixup * Tensorflow -> TensorFlow * Fix pt references in tests * Apply suggestions from code review - grammar and wording Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Add TFResNet to doc tests * PR comments - GlobalAveragePooling and clearer comments * Remove unused import * Add in keepdims argument * Add num_channels check * grammar fix: by -> of Co-authored-by: matt <rocketknight1@gmail.com> Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> * Remove transposes - keep NHWC throughout forward pass * Fixup look sharp * Add missing layer names * Final tidy up - remove from_pt now weights on hub Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> Co-authored-by: matt <rocketknight1@gmail.com> Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>	2022-07-04 10:59:15 +01:00
Matt	d6cec45801	XLA train step fixes (#17973 ) * Copy inputs to train and test step before modifying them, as this breaks things * Add XLA tests, fix our loss functions to be XLA-compatible * make fixup * Update loss computation test to expect vector of per-sample losses * Patch loss for TFLED * Patch loss for TFAlbert * Add a tf_legacy_loss config flag that enables old loss functions * Stop using config.get() because it's not a dict * Skip loss computation test for RAG because its loss is very strange and I'm afraid to rewrite it * make fixup * Add XLA-compatible RAG loss * Fix dtype of loss mask for TFAlbert * Fix test for XLNet too because it overrides the default one * make fixup * Fix config test * No more depending on GPU NaN behaviour * Add test, avoid potential zero division * Fix test item assignment * Fix loss computation masking test * make fixup * Fix dtype bugs	2022-07-01 19:11:14 +01:00
Yih-Dar	569b679adb	Update expected values in CodeGen tests (#17888 ) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-07-01 15:33:36 +02:00
Yih-Dar	14fb8a63b9	skip some gpt_neox tests that require 80G RAM (#17923 ) * skip some gpt_neox tests that require 80G RAM * remove tests * fix quality Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-07-01 09:04:38 -04:00
Jason Phang	205bc4152c	Fix GPT-NeoX-20B past handling, attention computation (#17811 ) * Fix GPT-NeoX-20B past handling, swap attention computation to hopefully avoid NaN, update docs * 20B tests	2022-06-30 08:47:40 -04:00
Crystina	692e61e91a	Flax t5 Encoder (#17784 ) * first draft adding Flax-t5-encoder and Flax-mt5-encoder * imports * after make fixup * flax t5 encoder test * black on test * make fix-copies * clean * all_model_classes -> tuple * clean test * is_encoder_decoder=False in t5-enc tester * remove file docstring before FlaxT5Encoder * black * isort * commit suggestions on src/transformers/models/t5/modeling_flax_t5.py Co-authored-by: Suraj Patil <surajp815@gmail.com> * commit suggestions on src/transformers/models/t5/modeling_flax_t5.py Co-authored-by: Suraj Patil <surajp815@gmail.com> * Apply suggestions from code review Co-authored-by: Suraj Patil <surajp815@gmail.com> * remove _get_encoder_module * self.decoder_seq_length -> self.encoder_seq_length as t5-enc does not have decoder * bugfix - self.module_class is class itself, not instance; * docs for mt5 and t5 * call -> __call__ in t5 doc * FlaxMT5EncoderModel to TYPE_HINT * run doc-builder to allow change the files Co-authored-by: Suraj Patil <surajp815@gmail.com>	2022-06-30 00:49:02 +02:00
Matthijs Hollemans	fbc7598bab	add MobileViT model (#17354 ) * add MobileViT * fixup * Update README.md Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * remove empty line Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * use clearer variable names * rename to MobileViTTransformerLayer * no longer inherit from nn.Sequential * fixup * fixup * not sure why this got added twice * rename organization for checkpoints * fix it up * Update src/transformers/models/mobilevit/__init__.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/mobilevit/configuration_mobilevit.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/mobilevit/configuration_mobilevit.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/mobilevit/configuration_mobilevit.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update tests/models/mobilevit/test_modeling_mobilevit.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/mobilevit/modeling_mobilevit.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/mobilevit/modeling_mobilevit.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/mobilevit/modeling_mobilevit.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/mobilevit/modeling_mobilevit.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * code style improvements * fixup * Update docs/source/en/model_doc/mobilevit.mdx Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update docs/source/en/model_doc/mobilevit.mdx Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/mobilevit/configuration_mobilevit.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/mobilevit/configuration_mobilevit.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * download labels from hub * rename layers * rename more layers * don't compute loss in separate function * remove some nn.Sequential * replace nn.Sequential with new MobileViTTransformer class * replace nn.Sequential with MobileViTMobileNetLayer * fix pruning since model structure changed * fixup * fix doc comment * remove custom resize from feature extractor * fix ONNX import * add to doc tests * use center_crop from image_utils * move RGB->BGR flipping into image_utils * fix broken tests * wrong type hint * small tweaks Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2022-06-29 16:07:51 -04:00
Younes Belkada	d444edb3f6	OPT - Fix Softmax NaN in half precision mode (#17437 )	2022-06-29 19:15:32 +02:00
StevenTang1998	3cff4cc587	Add MVP model (#17787 ) * Add MVP model * Update README * Remove useless module * Update docs * Fix bugs in tokenizer * Remove useless test * Remove useless module * Update vocab * Remove specifying * Remove specifying * Add #Copied ... statement * Update paper link * Remove useless TFMvp * Add #Copied ... statement * Fix style in test mvp model * Fix some typos * Fix properties of unset special tokens in non verbose mode * Update paper link * Update MVP doc * Update MVP doc * Fix README * Fix typos in docs * Update docs	2022-06-29 09:30:55 -04:00
Aritra Roy Gosthipaty	a7eba83161	TF implementation of RegNets (#17554 ) * chore: initial commit Copied the torch implementation of regnets and porting the code to tf step by step. Also introduced an output layer which was needed for regnets. * chore: porting the rest of the modules to tensorflow did not change the documentation yet, yet to try the playground on the model * Fix initilizations (#1) * fix: code structure in few cases. * fix: code structure to align tf models. * fix: layer naming, bn layer still remains. * chore: change default epsilon and momentum in bn. * chore: styling nits. * fix: cross-loading bn params. * fix: regnet tf model, integration passing. * add: tests for TF regnet. * fix: code quality related issues. * chore: added rest of the files. * minor additions.. * fix: repo consistency. * fix: regnet tf tests. * chore: reorganize dummy_tf_objects for regnet. * chore: remove checkpoint var. * chore: remov unnecessary files. * chore: run make style. * Update docs/source/en/model_doc/regnet.mdx Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * chore: PR feedback I. * fix: pt test. thanks to @ydshieh. * New adaptive pooler (#3) * feat: new adaptive pooler Co-authored-by: @Rocketknight1 * chore: remove image_size argument. Co-authored-by: matt <rocketknight1@gmail.com> Co-authored-by: matt <rocketknight1@gmail.com> * Empty-Commit * chore: remove image_size comment. * chore: remove playground_tf.py * chore: minor changes related to spacing. * chore: make style. * Update src/transformers/models/regnet/modeling_tf_regnet.py Co-authored-by: amyeroberts <aeroberts4444@gmail.com> * Update src/transformers/models/regnet/modeling_tf_regnet.py Co-authored-by: amyeroberts <aeroberts4444@gmail.com> * chore: refactored __init__. * chore: copied from -> taken from./g * adaptive pool -> global avg pool, channel check. * chore: move channel check to stem. * pr comments - minor refactor and add regnets to doc tests. * Update src/transformers/models/regnet/modeling_tf_regnet.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * minor fix in the xlayer. * Empty-Commit * chore: removed from_pt=True. Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: matt <rocketknight1@gmail.com> Co-authored-by: amyeroberts <aeroberts4444@gmail.com> Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>	2022-06-29 13:45:14 +01:00
Joao Gante	e6d27ca5c8	TF: XLA beam search + most generation-compatible models are now also XLA-generate-compatible (#17857 ) * working beam search 🎉 * XLA generation compatible with ALL classes * add xla generation slow test	2022-06-29 12:41:01 +01:00
Jerry Jiarui XU	6c8f4c9a93	Adding GroupViT Models (#17313 ) * add group vit and fixed test (except slow) * passing slow test * addressed some comments * fixed test * fixed style * fixed copy * fixed segmentation output * fixed test * fixed relative path * fixed copy * add ignore non auto configured * fixed docstring, add doc * fixed copies * Apply suggestions from code review merge suggestions Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * resolve comment, renaming model * delete unused attr * use fix copies * resolve comments * fixed attn * remove unused vars * refactor tests * resolve final comments * add demo notebook * fixed inconsitent default * Apply suggestions from code review Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * rename stage->stages * Create single GroupViTEncoderLayer class * Update conversion script * Simplify conversion script * Remove cross-attention class in favor of GroupViTAttention * Convert other model as well, add processor to conversion script * addressing final comment * fixed args * Update src/transformers/models/groupvit/modeling_groupvit.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>	2022-06-28 20:51:47 +02:00
Yih-Dar	9a3453846b	fix (#17890 ) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-06-27 14:36:11 +02:00
Matt	ee0d001de7	Add a TF in-graph tokenizer for BERT (#17701 ) * Add a TF in-graph tokenizer for BERT * Add from_pretrained * Add proper truncation, option handling to match other tokenizers * Add proper imports and guards * Add test, fix all the bugs exposed by said test * Fix truncation of paired texts in graph mode, more test updates * Small fixes, add a (very careful) test for savedmodel * Add tensorflow-text dependency, make fixup * Update documentation * Update documentation * make fixup * Slight changes to tests * Add some docstring examples * Update tests * Update tests and add proper lowercasing/normalization * make fixup * Add docstring for padding! * Mark slow tests * make fixup * Fall back to BertTokenizerFast if BertTokenizer is unavailable * Fall back to BertTokenizerFast if BertTokenizer is unavailable * make fixup * Properly handle tensorflow-text dummies	2022-06-27 12:06:21 +01:00
Yih-Dar	401fcca6c5	Fix TF GPT2 test_onnx_runtime_optimize (#17874 ) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-06-27 09:27:30 +02:00
Yih-Dar	b03be78a4b	Fix `test_inference_instance_segmentation_head` (#17872 ) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-06-24 19:36:45 +02:00
Yih-Dar	494aac65a7	Skip `test_multi_gpu_data_parallel_forward` for `MaskFormer` (#17864 ) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-06-24 19:35:00 +02:00
Yih-Dar	0e0f1f4692	Use higher value for hidden_size in Flax BigBird test (#17822 ) * Use higher value for hidden_size in Flax BigBird test * remove 5e-5 Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-06-24 19:31:30 +02:00
rooa	d6b6fb9963	Add CodeGen model (#17443 ) * Add CodeGen model * Add missing key and switch order of super() * Fix torch.ones init with uint8 instead of bool * Address comments: copy statements and doc * update tests * remove old model parallel * fix batch gen tests * fix batch gen test * update test_gpt2_sample_max_time * fix codgen test and revert gpt2 test change * Fix incorrect tie_word_embedding value, typo, URL * Fix model order in README and styling * Reorder model list alphabetically * Set tie_word_embedding to False by default * Apply suggestions from code review * Better attn mask name & remove attn masked_bias * add tokenizer for codegen * quality * doc tokenizer * fix-copies * add CodeGenTokenizer in converter * make truncation optional * add test for truncation * add copyright * fix-copies * fix fast tokenizer decode * Update src/transformers/models/codegen/tokenization_codegen.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * increase vocab_size in tests Co-authored-by: patil-suraj <surajp815@gmail.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2022-06-24 17:10:38 +02:00
Yih-Dar	447490015a	Fix Splinter test (#17854 ) * fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-06-24 16:26:14 +02:00
Suraj Patil	73a0496c2f	[tests/VisionEncoderDecoder] import to_2tuple from test utils (#17865 )	2022-06-24 15:23:30 +02:00
NielsRogge	0917870510	Improve vision models (#17731 ) * Improve vision models * Add a lot of improvements * Remove to_2tuple from swin tests * Fix TF Swin * Fix more tests * Fix copies * Improve more models * Fix ViTMAE test * Add channel check for TF models * Add proper channel check for TF models * Apply suggestion from code review * Apply suggestions from code review * Add channel check for Flax models, apply suggestion * Fix bug * Add tests for greyscale images * Add test for interpolation of pos encodigns Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>	2022-06-24 11:34:51 +02:00
Sijun He	7cf52a49de	Nezha Pytorch implementation (#17776 ) * wip * rebase * all tests pass * rebase * ready for PR * address comments * fix styles * add require_torch to pipeline test * remove remote image to improve CI consistency * address comments; fix tf/flax tests * address comments; fix tf/flax tests * fix tests; add alias * repo consistency tests * Update src/transformers/pipelines/visual_question_answering.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * address comments * Update src/transformers/pipelines/visual_question_answering.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * merge * wip * wip * wip * most basic tests passes * all tests pass now * relative embedding * wip * running make fixup * remove bert changes * fix doc * fix doc * fix issues * fix doc * address comments * fix CI * remove redundant copied from * address comments * fix broken test Co-authored-by: Sijun He <sijunhe@Sijuns-MacBook-Pro.local> Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>	2022-06-23 12:36:22 -04:00
Younes Belkada	18c263c4b6	BLOOM minor changes on tokenizer (#17823 ) * few fixes: - hardcode tokenizer padding side - remove unused args * few fixes: - added new attribute on TokenizerTesterMixin - added new slow test - remove unused arg on tokenizer class * make style * Update src/transformers/models/bloom/tokenization_bloom_fast.py Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com> * make quality * apply changes - remove new attribute - redefine test on the class * add comments Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>	2022-06-23 15:57:12 +02:00
Thomas Wang	abc400b06a	Add final_layer_norm to OPT model (#17785 ) * Add final_layer_norm to OPT model * Add JAX and TF version * Fix Keras name * Woops * Allow for non breaking change * Apply suggestions from code review * add tests Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2022-06-21 20:26:36 +02:00
Yih-Dar	f47afefb21	Use 5e-5 For BigBird PT/Flax equivalence tests (#17780 ) * rename to check_pt_flax_outputs * update check_pt_flax_outputs * use 5e-5 for BigBird PT/Flax test Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-06-21 17:55:26 +02:00
Lysandre Debut	6a5272b205	Prepare transformers for v0.8.0 huggingface-hub release (#17716 ) * Prepare CI for v0.8.0 * pin hfh (revert before merge) * Revert "pin hfh (revert before merge)" This reverts commit `a0103140e1`. * Test rc3 * Test latest rc * Unpin to the RC Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>	2022-06-21 11:51:18 -04:00
NielsRogge	b681e12d59	[ViTMAE] Fix docstrings and variable names (#17710 ) * Fix docstrings and variable names * Rename x to something better * Improve messages * Fix docstrings and add test for greyscale images Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>	2022-06-21 15:56:00 +02:00
Yih-Dar	d3cb28886a	Not use -1e4 as attn mask (#17306 ) * Use torch.finfo(self.dtype).min * for GPTNeoX * for Albert * For Splinter * Update src/transformers/models/data2vec/modeling_data2vec_audio.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * fix -inf used in Bart-like models * Fix a few remaining -inf * more fix * clean up * For CLIP * For FSMT * clean up * fix test * Add dtype argument and use it for LayoutLMv3 * update FlaxLongT5Attention Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2022-06-20 16:16:16 +02:00
Sylvain Gugger	fdb120805c	Fix cache for GPT-Neo-X (#17764 ) * Fix cache for GPT-Neo-X * Add more tests	2022-06-20 08:43:36 -04:00
Joao Gante	132402d752	TF: BART compatible with XLA generation (#17479 ) * Also propagate changes to blenderbot, blenderbot_small, marian, mbart, and pegasus	2022-06-20 11:07:46 +01:00
Younes Belkada	d453ea6120	fix tolerance for a bloom slow test (#17634 )	2022-06-14 18:14:12 +02:00


118 Commits