transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-31 10:12:23 +06:00

Author	SHA1	Message	Date
Allan Lin	91ff480e26	Update namespaces inside torch.utils.data to the latest. (#13167) * Update torch.utils.data namespaces to the latest. * Format * Update Dataloader. * Style	2021-08-19 14:29:51 +02:00
Patrick von Platen	ecfa7eb260	[AutoFeatureExtractor] Fix loading of local folders if config.json exists (#13166) * up * up	2021-08-18 16:18:13 +02:00
Ori Ram	439a43b6b4	Add splinter (#12955) * splinter template * initialize splinter classes * Splinter Tokenizer * splinter.rst * tokenization fixes * Documentation & some minor variable name changes * bug fix (added back question_token_id to config) + variable names * Minor bug fixes + variable name changes * Fix Splinter references after merge with new transformers * changes after running make style & quality * Fix documentation unindent * Fix doc indentation in tokenization_splinter * Fix also SplinterTokenizerFast * Add Splinter to index.rst and README * Fixdouble whitespace from index.rst * Fixed index.rst with 'make fix-copies' * Update docs/source/model_doc/splinter.rst Co-authored-by: Suraj Patil <surajp815@gmail.com> * Update docs/source/model_doc/splinter.rst Co-authored-by: Suraj Patil <surajp815@gmail.com> * Update docs/source/model_doc/splinter.rst Co-authored-by: Suraj Patil <surajp815@gmail.com> * Update docs/source/model_doc/splinter.rst Co-authored-by: Suraj Patil <surajp815@gmail.com> * Update src/transformers/models/splinter/__init__.py Co-authored-by: Suraj Patil <surajp815@gmail.com> * Added "copied from BERT" comments * Removing unnexessary code from modeling_splinter * Update README.md Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/splinter/configuration_splinter.py Co-authored-by: Suraj Patil <surajp815@gmail.com> * Remove references to TF modeling from splinter * Update src/transformers/models/splinter/modeling_splinter.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Remove unnecessary check * Update src/transformers/models/splinter/modeling_splinter.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Add differences between Splinter and Bert tokenizers * Update src/transformers/models/splinter/modeling_splinter.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update
src/transformers/models/splinter/tokenization_splinter_fast.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Remove unnecessary check * Doc formatting * Update src/transformers/models/splinter/tokenization_splinter.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/splinter/tokenization_splinter.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * bug fix: remove load_tf_weights attribute * Some minor quality changes * Update docs/source/model_doc/splinter.rst Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/transformers/models/splinter/configuration_splinter.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Change FullyConnectedLayer to SplinterFullyConnectedLayer * Variable naming * Reove gather_positions function * Remove ClassificationHead as it's outdated * Update src/transformers/models/splinter/modeling_splinter.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Remove hardcoded 102 token id * Minor style change * Added "tau" organization to all model identifiers & URLS * Added tau to the tests as well * Copy-from comments * Removed all unnecessary classes (e.g. 
SplinterForMaskedLM) * Running make fix-copies * Bug fix: Further removed unnecessary classes * Add Splinter to AutoTokenization * Add an integration test for Splinter * Removed initialize_new_qass from config - It will be done through different checkpoints * Removed `initialize_new_qass` from documentation as well * Added new checkpoint names (`tau/splinter-base-qass` and same for large) in the code * Minor change to test * SplinterTokenizer now doesn't abstract from BertTokenizer * SplinterTokenizerFast also dosn't abstract from Bert * style and quality * bug fix: import ing torch in tests only if it's available * Auto mappings * Changed copyrights in Splinter's files * Update src/transformers/models/splinter/configuration_splinter.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: yuvalkirstain <kirstain.yuval@gmail.com> Co-authored-by: Suraj Patil <surajp815@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr> Co-authored-by: Lysandre Debut <lysandre@huggingface.co>	2021-08-17 08:29:01 -04:00
Nicolas Patry	d58926ab1d	Moving fill-mask pipeline to new testing scheme (#12943) * Fill mask pipelines test updates. * Model eval !! * Adding slow test with actual values. * Making all tests pass (skipping quite a bit.) * Doc styling. * Better doc cleanup. * Making an explicit test with no pad token tokenizer. * Typo.	2021-08-13 12:04:18 +02:00
Sylvain Gugger	9a498c37a2	Rely on huggingface_hub for common tools (#13100) * Remove hf_api module and use hugginface_hub * Style * Fix to test_fetcher * Quality	2021-08-12 14:59:02 +02:00
Patrick von Platen	6900dded49	[Flax/JAX] Run jitted tests at every commit (#13090) * up * up * up	2021-08-12 14:49:46 +02:00
Sylvain Gugger	ea8ffe36d3	Proper import for unittest.mock.patch (#13085)	2021-08-12 11:23:00 +02:00
Kamal Raj	d329b63369	Deberta tf (#12972) * TFDeberta moved weights to build and fixed name scope added missing , bug fixes to enable graph mode execution updated setup.py fixing typo fix imports embedding mask fix added layer names avoid autmatic incremental names +XSoftmax cleanup added names to layer disable keras_serializable Distangled attention output shape hidden_size==None using symbolic inputs test for Deberta tf make style Update src/transformers/models/deberta/modeling_tf_deberta.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Update src/transformers/models/deberta/modeling_tf_deberta.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Update src/transformers/models/deberta/modeling_tf_deberta.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Update src/transformers/models/deberta/modeling_tf_deberta.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Update src/transformers/models/deberta/modeling_tf_deberta.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Update src/transformers/models/deberta/modeling_tf_deberta.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Update src/transformers/models/deberta/modeling_tf_deberta.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> removed tensorflow-probability removed blank line * removed tf experimental api +torch_gather tf implementation from @Rocketknight1 * layername DeBERTa --> deberta * copyright fix * added docs for TFDeberta & make style * layer_name change to fix load from pt model * layer_name change as pt model * SequenceClassification layername change, to same as pt model * switched to keras built-in LayerNormalization * added `TFDeberta` prefix most layer classes * updated to tf.Tensor in the docstring	2021-08-12 05:01:26 -04:00
Sylvain Gugger	0454e4bd8b	Fix ModelOutput instantiation form dictionaries (#13067) * Fix ModelOutput instantiation form dictionaries * Style	2021-08-10 12:20:04 +02:00
Lysandre Debut	6f5ab9daf1	Add MBART to models exportable with ONNX (#13049) * Add MBART to models exportable with ONNX * unittest mock * Add tests * Misc fixes	2021-08-09 08:56:04 -04:00
Lysandre Debut	1bf38611a4	Put smaller ALBERT model (#13028)	2021-08-06 12:41:33 -04:00
Michael Benayoun	dc420b0eb1	T5 with past ONNX export (#13014) T5 with past ONNX export, and more explicit past_key_values inputs and outputs names for ONNX model Authored-by: Michael Benayoun <michael@huggingface.co>	2021-08-06 15:46:26 +02:00
Sylvain Gugger	9870093f7b	[WIP] Disentangle auto modules from other modeling files (#13023) * Initial work * All auto models * All tf auto models * All flax auto models * Tokenizers * Add feature extractors * Fix typos * Fix other typo * Use the right config * Remove old mapping names and update logic in AutoTokenizer * Update check_table * Fix copies and check_repo script * Fix last test * Add back name * clean up * Update template * Update template * Forgot a ) * Use alternative to fixup * Fix TF model template * Address review comments * Address review comments * Style	2021-08-06 13:12:30 +02:00
Patrick von Platen	60e448c87e	[Flax] Correct pt to flax conversion if from base to head (#13006) * finish PR * add tests * correct tests * finish * correct other flax tests * better naming * correct naming * finish * apply sylvains suggestions	2021-08-05 18:38:50 +02:00
Michael Benayoun	a6d62aaba0	GPT-Neo ONNX export (#12911) GPT-Neo ONNX export and task / feature refactoring Authored-by: Michael Benayoun <michael@huggingface.co>	2021-08-05 10:12:13 +02:00
NielsRogge	83e5a10603	Add BEiT (#12994) * First pass * Make conversion script work * Improve conversion script * Fix bug, conversion script working * Improve conversion script, implement BEiTFeatureExtractor * Make conversion script work based on URL * Improve conversion script * Add tests, add documentation * Fix bug in conversion script * Fix another bug * Add support for converting masked image modeling model * Add support for converting masked image modeling * Fix bug * Add print statement for debugging * Fix another bug * Make conversion script finally work for masked image modeling models * Move id2label for datasets to JSON files on the hub * Make sure id's are read in as integers * Add integration tests * Make style & quality * Fix test, add BEiT to README * Apply suggestions from @sgugger's review * Apply suggestions from code review * Make quality * Replace nielsr by microsoft in tests, add docs * Rename BEiT to Beit * Minor fix * Fix docs of BeitForMaskedImageModeling Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2021-08-04 18:29:23 +02:00
Lysandre Debut	0dd1152c18	Skip ProphetNet test (#12462)	2021-08-04 18:24:54 +02:00
Patrick von Platen	a317e6c3be	[Flax] Correctly Add MT5 (#12988) * finish PR * finish mt5 * push * up * Update tests/test_modeling_flax_mt5.py Co-authored-by: Suraj Patil <surajp815@gmail.com> Co-authored-by: Suraj Patil <surajp815@gmail.com>	2021-08-04 16:03:13 +02:00
Patrick von Platen	da9754a3a0	[Flax] Align jax flax device name (#12987) * [Flax] Align device name in docs * make style * fix import error	2021-08-04 16:00:09 +02:00
Sylvain Gugger	d4c834d2e0	Fix from_pretrained with corrupted state_dict (#12939) * Fix from_pretrained with corrupted state_dict * Adapt test * Use better checkpoint * Style * Clean up	2021-08-04 11:48:39 +02:00
NielsRogge	a28da4c490	Replace nielsr by google namespace in tests (#12453)	2021-08-04 03:29:34 -04:00
Philip May	b7439675b8	fix `Trainer.train(resume_from_checkpoint=False)` is causing an exception (#12981) * fix #12970 * Update tests/test_trainer.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update tests/test_trainer.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update tests/test_trainer.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * remove unnecessary issue link * fix test formatting Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2021-08-03 10:10:33 +02:00
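The `Trainer.train` argument `resume_from_checkpoint` is documented as either a checkpoint path or a bool, where True means resuming from the last checkpoint in the output directory. The following is a minimal sketch of normalizing that argument so that False behaves like None (no resume) instead of raising. `resolve_resume_arg` and `find_last_checkpoint` are hypothetical helpers, not Trainer's actual code, and the `checkpoint-<step>` folder naming is an assumption of the sketch.

```python
import os
import re
from typing import Optional, Union

# Assumed naming scheme for saved checkpoints: "<output_dir>/checkpoint-<step>".
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def find_last_checkpoint(output_dir: str) -> Optional[str]:
    """Return the checkpoint-<step> subfolder with the highest step, or None."""
    best_step, best_path = -1, None
    for name in os.listdir(output_dir):
        match = _CHECKPOINT_RE.match(name)
        path = os.path.join(output_dir, name)
        # Compare steps numerically so checkpoint-10 beats checkpoint-5.
        if match and os.path.isdir(path) and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path


def resolve_resume_arg(
    resume_from_checkpoint: Union[str, bool, None], output_dir: str
) -> Optional[str]:
    """Hypothetical normalization of a resume_from_checkpoint-style argument."""
    # False means "do not resume", exactly like None.
    if resume_from_checkpoint is False or resume_from_checkpoint is None:
        return None
    # True means "resume from the latest checkpoint found in output_dir".
    if resume_from_checkpoint is True:
        last = find_last_checkpoint(output_dir)
        if last is None:
            raise ValueError(f"No valid checkpoint found in {output_dir}")
        return last
    # Any other value is treated as an explicit checkpoint path.
    return resume_from_checkpoint
```

Checking `is False` and `is True` explicitly (rather than truthiness) keeps a bool from ever being mistaken for a path string.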
Nicolas Patry	e2d22eef14	Moving feature-extraction pipeline to new testing scheme (#12843) * Update feature extraction pipelilne. * Leaving 1 small model for actual values check. * Fixes tests - Better support for tokenizer with no pad token - Increasing PegasusModelTesterConfig for pipelines - Test of feature extraction are more permissive + don't test Multimodel models + encoder-decoder. * Fixing model loading with incorrect shape (+ model with HEAD). * Update tests/test_pipelines_common.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Revert modeling_utils modification. * Some corrections. * Update tests/test_pipelines_common.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update tests/test_pipelines_feature_extraction.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Syntax. * Fixing text-classification tests. * Don't modify this file. Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2021-07-29 19:35:55 +02:00
Funtowicz Morgan	640421c0ec	ONNX v2 raises an Exception when using PyTorch < 1.8.0 (#12933) * Raise an issue if the pytorch version is < 1.8.0 * Attempt to add a test to ensure it correctly raises. * Missing docstring. * Second attempt, patch with string absolute import. * Let's do the call before checking it was called ... * use the correct function ... 🤦 * Raise ImportError and AssertionError respectively when unable to find torch and torch version is not sufficient. * Correct path mock patching * relax constraint for torch_onnx_dict_inputs to ge instead of eq. * Style. * Split each version requirements for torch. * Let's compare version directly. * Import torch_version after checking pytorch is installed. * @require_torch	2021-07-29 18:02:29 +02:00
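The version gate described in this commit (raise ImportError when PyTorch cannot be found, AssertionError when its version is below 1.8.0) can be sketched in a few lines of Python. `check_onnx_torch_requirements` and `parse_version` are hypothetical names, and the regex-based parser is a simplification that ignores pre-release tags; the real code may compare versions differently. Accepting an explicit version string is an assumption added so the check can be exercised without torch installed.

```python
import importlib.util
import re
from typing import Optional, Tuple

# Minimum PyTorch version named in the commit title.
MIN_TORCH_VERSION = (1, 8, 0)


def parse_version(v: str) -> Tuple[int, int, int]:
    """Parse 'X.Y[.Z][+local]' into (X, Y, Z); simplified, ignores pre-release tags."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", v)
    if m is None:
        raise ValueError(f"Unrecognized version string: {v!r}")
    major, minor, patch = m.groups()
    return int(major), int(minor), int(patch) if patch is not None else 0


def check_onnx_torch_requirements(torch_version: Optional[str] = None) -> bool:
    """Hypothetical check: ImportError if torch is missing, AssertionError if too old."""
    if torch_version is None:
        # Only import torch after confirming it is installed.
        if importlib.util.find_spec("torch") is None:
            raise ImportError("ONNX export requires PyTorch, which was not found.")
        import torch

        torch_version = torch.__version__
    if parse_version(torch_version) < MIN_TORCH_VERSION:
        raise AssertionError(
            f"ONNX export requires torch>=1.8.0, but found {torch_version}."
        )
    return True
```

Comparing integer tuples avoids the string-comparison trap where "1.10.0" would sort before "1.8.0".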
Nicolas Patry	a3bd763732	Better heuristic for token-classification pipeline. (#12611) * Better heuristic for token-classification pipeline. Relooking at the problem makes thing actually much simpler, when we look at ids from a tokenizer, we have no way in general to recover if some substring is part of a word or not. However, within the pipeline, with offsets we still have access to the original string, so we can simply look if previous character (if it exists) of a token, is actually a space. This will obviously be wrong for tokenizers that contain spaces within tokens, tokenizers where offsets include spaces too (Don't think there are a lot). This heuristic hopefully is fully bc and still can handle non-word based tokenizers. * Updating test with real values. * We still need the older "correct" heuristic to prevent fusing punctuation. * Adding a real warning when important.	2021-07-26 16:21:26 +02:00
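The whitespace heuristic described in this commit message can be illustrated with a minimal Python sketch. It assumes each token comes with a (start, end) character offset into the original sentence; `is_subword` and `group_into_words` are hypothetical names, not the pipeline's actual code. The sketch deliberately omits the older punctuation heuristic the commit says is still needed, so punctuation attached to a word gets fused into it, which the second usage line demonstrates.

```python
from typing import List, Tuple


def is_subword(sentence: str, start: int) -> bool:
    # A token continues the previous word when the character right before
    # its start offset exists and is not whitespace.
    return start > 0 and not sentence[start - 1].isspace()


def group_into_words(sentence: str, offsets: List[Tuple[int, int]]) -> List[str]:
    """Merge token spans into words using only the whitespace heuristic."""
    words: List[Tuple[int, int]] = []
    for start, end in offsets:
        if words and is_subword(sentence, start):
            # Extend the current word's span to cover this sub-token.
            words[-1] = (words[-1][0], end)
        else:
            # Preceded by whitespace (or first token): start a new word.
            words.append((start, end))
    return [sentence[s:e] for s, e in words]


# Offsets below are a made-up sub-word split, for illustration only.
sentence = "Hugging Face is in Brooklyn"
offsets = [(0, 3), (3, 7), (8, 12), (13, 15), (16, 18), (19, 24), (24, 27)]
print(group_into_words(sentence, offsets))
# → ['Hugging', 'Face', 'is', 'in', 'Brooklyn']
print(group_into_words("Hi, there", [(0, 2), (2, 3), (4, 9)]))
# → ['Hi,', 'there']  (punctuation fused, as the commit warns)
```

As the commit notes, this check misfires for tokenizers whose tokens or offsets include spaces; it relies only on the original string, not on tokenizer-specific sub-word markers.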
Thibault FEVRY	434022adac	Add RemBERT model code to huggingface (#10692) * Faster list concat for trainer_pt_utils.get_length_grouped_indices() (#11825) get_length_grouped_indices() in LengthGroupedSampler and DistributedLengthGroupedSampler is prohibitively slow for large number of megabatches (in test case takes hours for ~270k megabatches with 100 items each) due to slow list concatenation with sum(megabatches, []). Resolves: #11795 Co-authored-by: ctheodoris <cvtheodo@ds.dfci.harvard.edu> * Replace double occurrences as the last step (#11367) * [Flax] Fix PyTorch import error (#11839) * fix_torch_device_generate_test * remove @ * change pytorch import to flax import * Fix reference to XLNet (#11846) * Switch mem metrics flag (#11851) * Switch mem metrics flag * Update src/transformers/training_args.py Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> * Fix flos single node (#11844) * fixing flos bug/typo in non-distributed setting * storing flos every logging_interval * Fix two typos in docs (#11852) * typo2 * fix typo * [Trainer] Report both steps and num samples per second (#11818) * [Trainer] Report both steps and num samples per second * Fix batch number * Update src/transformers/trainer_utils.py Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> * Address review comments Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> * Add some tests to the slow suite #11860 * Enable memory metrics in tests that need it (#11859) * fixed a small typo in the doc (#11856) * typo (#11858) * Add option to log only once in multinode training (#11819) * Add option to long only once in multinode training * Use an alternate property * [Wav2Vec2] SpecAugment Fast (#11764) * first try * finish * [lm examples] fix overflow in perplexity calc (#11855) * fix overflow in perplexity calc * use inf * fix * [Examples] create model with custom config on the fly (#11798) * create custom model
on the flight * better wording * add update_from_string * cleanup * cleanup * Update src/transformers/configuration_utils.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * more bool options * style * fix logger * add test * add the doc * assert on conflict of options Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * [Wav2Vec2ForCTC] example typo fixed (#11878) * Ensure input tensor are on device. (#11874) The feature extractor does not create tensors on the appropriate device, so we call `ensure_tensor_on_device` before feeding the processed inputs to the model. * Fix usage of head masks by TF encoder-decoder models' `generate()` function (#11775) * Fix Bart * Fix Blenderbot{,_small} * Fix LED * Fix Marian * Fix MBart * Fix Pegasus * Fix T5 * Add test for generation with head_mask * Add a common TF test * Override a test for the LED model as head masking is not yet properly implemented * Remove all head_masks from input preparation for LED * Drop masking for T5 as it needs a bit of refactor * Correcting comments in T5Stack to reflect correct tuple order (#11330) * Correcting comments to reflect correct tuple order In order to match the actual order (line 513 and 516, and as accessed in 968), I've changed the order mentioned in comments L962 and L966-967. 
* Update modeling_t5.py Updating another comment as well * Removing extra space * Fixing style and quality * style & quality * Update src/transformers/models/t5/modeling_t5.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * [Flax] Allow dataclasses to be jitted (#11886) * fix_torch_device_generate_test * remove @ * change dataclasses to flax ones * fix typo * fix jitted tests * fix bert & electra * changing find_batch_size to work with tokenizer outputs (#11890) * changing find_batch_size to work with tokenizer outputs trainer_pt_utils.find_batch_size does not recognize the batch size of BatchEncoding objects. This can cause an error when a trainer relies on find_batch_size to report the number of observed examples in the evaluation loop. * Trigger CI Co-authored-by: jrenner <joseph.renner@inria.fr> * Link official Cloud TPU JAX docs (#11892) * Flax Generate (#11777) * fix_torch_device_generate_test * remove @ * add * indexing * correct a couple of tests * fix tests * add logits processor * finish top_k, top_p, temp * add docs * correct flax prng key default * improve generate * add generation docs * add docs * make style * revert model outputs change * make style * correct typo * fix tests * fix slow test * add raise * finish generation Co-authored-by: Patrick von Platen <patrick@huggingface.co> * Add Emotion Speech Noteboook (#11900) * Update deepspeed config to reflect hyperparameter search parameters (#11896) * rebuild deepspeed config for hyperparameter search * reformat code to fix style issues * Adding new argument `max_new_tokens` for generate. (#11476) * Adding new argument `max_new_tokens` for generate. This is a proposal to add a new argument `max_new_tokens` to `generate`. This include a `MaxNewTokensCriteria` that enables callers that don't know about the token length ahead (like pipelines callers) to manage more easily the length of their generated output. 
* Adding a test for the user warning when both`max_length` and `max_new_tokens` are used together. * Removed redundant `no_grad`. * Added Sequence Classification class in GPTNeo (#11906) * seq classification changes * fix tests * [Flax] Return Attention from BERT, ELECTRA, RoBERTa and GPT2 (#11918) * Added logic to return attention from flax-bert model and added test cases to check that * Added new line at the end of file to test_modeling_flax_common.py * fixing code style * Fixing Roberta and Elextra models too from cpoying bert * Added temporary hack to not run test_attention_outputs for FlaxGPT2 * Returning attention weights from GPT2 and changed the tests accordingly. * last fixes * bump flax dependency Co-authored-by: jayendra <jayendra@infocusp.in> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Test optuna and ray (#11924) * Remove `datasets` submodule * fix assert (#11935) * Remove redundant `nn.log_softmax` in `run_flax_glue.py` (#11920) * Remove redundant `nn.log_softmax` in `run_flax_glue.py` `optax.softmax_cross_entropy` expects unnormalized logits, and so it already calls `nn.log_softmax`, so I believe it is not needed here. `nn.log_softmax` is idempotent so mathematically it shouldn't have made a difference. * Remove unused 'flax.linen' import * Add MT5ForConditionalGeneration as supported arch. to summarization README (#11961) * Add MT5ForConditionalGeneration as supported arch. 
* Update README.md * Add FlaxCLIP (#11883) * add flax CLIP * default input_shape * add tests * fix test * fix name * fix docs * fix shapes * attend at least 1 token * flax conv to torch conv * return floats * fix equivalence tests * fix import * return attention_weights and update tests * fix dosctrings * address patricks comments * input_shape arg * add tests for get_image_features and get_text_features methods * fix tests * RAG-2nd2end-revamp (#11893) * initial * code quality test * code quality * added test functions in test_modeling_rag.py and test_retrieval_rag.py to test end2end retreiver * minor change in test_modeling_rag * fixed tests * Update examples/research_projects/rag-end2end-retriever/README.md typo corrected as suggested by lhoestq Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com> * Update examples/research_projects/rag-end2end-retriever/finetune_rag.py type change suggested by lhoestq Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com> * Update src/transformers/models/rag/retrieval_rag.py Adding this change as mentioned by lhoestq. Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com> * completed the minor changes suggested by the reviewers Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com> * modify qa-trainer (#11872) * modify qa-trainer * fix flax model * bugfixes training_args.py (#11922) modified according to: https://pytorch.org/xla/release/1.8.1/_modules/torch_xla/core/xla_model.html * reinitialize wandb config for each hyperparameter search run (#11945) * Add regression tests for slow sentencepiece tokenizers. (#11737) * add test_vocab_size for sentencepiece tok. * add test_get_vocab for sentencepiece tok. * add test_convert_token_and_id for sentencepiece tok. * add test_tokenize_and_convert_tokens_to_string for all tok. * improve test_tokenize_and_convert_tokens_to_string for sp. tok. 
* add common tokenizer integration tests - for albert - for barthez * add tokenizer integration tests to bert gen. * add most tokenizer integration tests * fix camembert tokenizer integration test * add tokenizer integration test to marian * add tokenizer integration test to reformer * add typing and doc to tokenizer_integration_test_util * fix tokenizer integration test of reformer * improve test_sentencepiece_tokenize_and_convert_tokens_to_string * empty commit to trigger CI * fix tokenizer integration test of reformer * remove code not needed anymore * empty commit to trigger CI * empty commit to trigger CI * Authorize args when instantiating an AutoModel (#11956) * Neptune.ai integration (#11937) An option that turns on neptune.ai logging --report_to 'neptune' Additional ENV variables: NEPTUNE_PROJECT NEPTUNE_API_TOKEN NEPTUNE_RUN_NAME (optional) NEPTUNE_STOP_TIMEOUT (optional) * Run the integration tests on schedule tests instead of master tests * [deepspeed] docs (#11940) * deepspeed docs * cleanup * cleanup * typo correction (#11973) * typo correction * type corrections * ByT5 model (#11971) * allow tf to use uneven num of layers * add tokenizer * finish docs * finish docs * Apply suggestions from code review * include in index * finish * Update docs/source/model_doc/byt5.rst Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * apply sylvais suggestions * make style Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Typo in usage example, changed to device instead of torch_device (#11979) * [DeepSpeed] decouple `DeepSpeedConfigHF` from `Trainer` (#11966) * decouple DeepSpeedConfigHF from Trainer * add LoggingLevel ctx manager; add new test * cleanup * add docs * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * implemented suggested renames * formatter workaround Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * [Trainer] 
add train loss and flops metrics reports (#11980) * add train loss and flops metrics reports * consistency * add train_loss to skip keys * restore on_train_end call timing * Bump urllib3 from 1.25.8 to 1.26.5 in /examples/research_projects/lxmert (#11983) Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.25.8 to 1.26.5. - [Release notes](https://github.com/urllib3/urllib3/releases) - [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst) - [Commits](https://github.com/urllib3/urllib3/compare/1.25.8...1.26.5) --- updated-dependencies: - dependency-name: urllib3 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * [RAG] Fix rag from pretrained question encoder generator behavior (#11962) * fix_torch_device_generate_test * remove @ * fix rag from pretrained loading * add test * uplaod * finish * VisualBERT (#10534) * Init VisualBERT * Add cookie-cutter, Config, and Embeddings * Add preliminary Model * Add Bert analogous classes * Add basic code for NLVR, VQA, Flickr * Update Init * Fix VisualBert Downstream Models * Rename classifier to cls * Comment position_ids buffer * Remove sentence image predictor output * Update output dicts * Remove unnecessary files * Fix Auto Modeling * Fix transformers init * Add conversion script * Add conversion script * Fix docs * Update visualbert modelling * Update configuration * Style fixes * Add model and integration tests * Add all tests * Update model mapping * Add simple detector from original repository * Update docs and configs * Fix style * Fix style * Update docs * Fix style * Fix import issues in style * Fix style * Add changes from review * Fix style * Fix style * Update docs * Fix style * Fix style * Update docs/source/model_doc/visual_bert.rst Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update 
src/transformers/models/visual_bert/modeling_visual_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update tests/test_modeling_visual_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/visual_bert/modeling_visual_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/visual_bert/modeling_visual_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/visual_bert/modeling_visual_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Add changes from review * Remove convert run script * Add changes from review * Update src/transformers/models/visual_bert/modeling_visual_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/visual_bert/modeling_visual_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/visual_bert/modeling_visual_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/visual_bert/modeling_visual_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/visual_bert/modeling_visual_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Add changes from review * Add changes from review * Add visual embedding example in docs * Fix "copied from" comments * Add changes from review * Fix error, style, checkpoints * Update docs * Fix integration tests * Fix style Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Fix examples (#11990) * [docs] fix xref to `PreTrainedModel.generate` (#11049) * fix xref to generate * do the same for search methods * style * style * Update return introduction (#11976) Make it clear that 
the `forward` method now returns a dict instead of tuple. Fix style * [deepspeed] Move code and doc into standalone files (#11984) * move code and docs * style * moved * restore * [deepspeed] add nvme test skip rule (#11997) * add nvme skip rule * fix * Fix weight decay masking in `run_flax_glue.py` (#11964) * Fix weight decay masking in `run_flax_glue.py` Issues with the previous implementation: - The `dict` from `traverse_util.flatten_dict` has keys which are tuples of strings, not one long string with the path separated by periods. - `optax.masked` applies the transformation wherever the mask is True, so the masks are flipped. - Flax's LayerNorm calls the scale parameter `scale` not `weight` * Fix formatting with black * adapt results Co-authored-by: Patrick von Platen <patrick@huggingface.co> * [Flax] Refactor MLM (#12013) * fix_torch_device_generate_test * remove @ * finish refactor Co-authored-by: Patrick von Platen <patrick@huggingface.co> * [Deepspeed] Assert on mismatches between ds and hf args (#12021) * wip * add mismatch validation + test * renames * Update docs/source/main_classes/deepspeed.rst Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * renames Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * [TrainerArguments] format and sort __repr__, add __str__ (#12018) * format and sort __repr__, add __str__ * typo * use __str__ directly * alias __repr__ = __str__ * Fixed Typo in modeling_bart.py (#12035) * Fixed Typo in modeling_bart.py - Issue #11895 * Fixed Typo in modeling_bart.py * fix deberta 2 tokenizer integration test (#12017) * fix docs of past_key_values (#12049) * [JAX] Bump jax lib (#12053) * fix_torch_device_generate_test * remove @ * bump up jax lib * Fixes bug that appears when using QA bert and distilation. (#12026) * Fixing bug that appears when using distilation (and potentially other uses). 
During backward pass Pytorch complains with: RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation This happens because the QA model code modifies the start_positions and end_positions input tensors, using clamp_ function: as a consequence the teacher and the student both modifies the inputs, and backward pass fails. * Fixing all models QA clamp_ bug. * Extend pipelines for automodel tupels (#12025) * fix_torch_device_generate_test * remove @ * finish * refactor * add test * fix test * Attempt at simplification. * Small fix. * Fixing non existing AutoModel for TF. * Naming. * Remove extra condition. Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com> * Add optional grouped parsers description to HfArgumentParser (#12042) * Adding optional argument group to HfArgumentParser * Minor * remove whitespace * Minor styling * adds metric prefix. (#12057) * adds metric prefix. * update tests to include prefix * skip failing test (#12059) * Fix integration tests (#12066) * Fix tapas issue (#12063) * Fix scatter function to be compatible with torch-scatter 2.7.0 * Allow test again * updated the original RAG implementation to be compatible with latest Pytorch-Lightning (#11806) * updated the original RAG implementation to be compatible with the latest PL version * updated the requirements.txt file * execute make style * code quality test * code quality * conflix resolved in requirement.txt * code quality * changed the MyDDP class name to CustomDDP * Replace legacy tensor.Tensor with torch.tensor/torch.empty (#12027) * Replace legacy torch.Tensor constructor with torch.{tensor, empty} * Remove torch.Tensor in examples * Add torch to requirements.txt in language-modeling (#12040) * Add torch to requirements.txt in language-modeling * Update examples/pytorch/language-modeling/requirements.txt Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Sylvain Gugger 
<35901082+sgugger@users.noreply.github.com> * Properly indent block_size (#12070) * [Deepspeed] various fixes (#12058) * replace deprecated config * sub_group_size was too big * complete deprecation removal * [Deepspeed Wav2vec2] integration (#11638) * wip * wip - but working with https://github.com/microsoft/DeepSpeed/pull/1044 * cleanup * workaround * working 5/8 modes * solve fp32 distributed zero3 * style * sync * sync * rework * deprecation * cleanup * https://github.com/microsoft/DeepSpeed/pull/1044 PR was merged * clean up * add a guide * more prose * more prose * fix * more prose * sub_group_size was too big * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * refactor * bug fix * make the true check explicit * new deepspeed release Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * typo * Update run_ner.py with id2label config (#12001) * sync LayerDrop for Wav2Vec2Encoder + tests (#12076) * Add DETR (#11653) * Squash all commits of modeling_detr_v7 branch into one * Improve docs * Fix tests * Style * Improve docs some more and fix most tests * Fix slow tests of ViT, DeiT and DETR * Improve replacement of batch norm * Restructure timm backbone forward * Make DetrForSegmentation support any timm backbone * Fix name of output * Address most comments by @LysandreJik * Give better names for variables * Conditional imports + timm in setup.py * Address additional comments by @sgugger * Make style, add require_timm and require_vision to tests * Remove train_backbone attribute of DetrConfig, add methods to freeze/unfreeze backbone * Add png files to fixtures * Fix type hint * Add timm to workflows * Add `BatchNorm2d` to the weight initialization * Fix retain_grad test * Replace model checkpoints by Facebook namespace * Fix name of checkpoint in test * Add user-friendly message when scipy is not available * Address most comments by @patrickvonplaten * Remove
return_intermediate_layers attribute of DetrConfig and simplify Joiner * Better initialization * Scipy is necessary to get sklearn metrics * Rename TimmBackbone to DetrTimmConvEncoder and rename DetrJoiner to DetrConvModel * Make style * Improve docs and add 2 community notebooks Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr> * [test] support more than 2 GPUs (#12074) * support more than 2 GPUs * style * Wav2Vec2 Pretraining (#11306) * Working quantizer forward * Working quantizer forward * Clean up unused model parts, test reproducibility * Working quantizer forward * Clean up unused model parts, test reproducibility * Remove custom outputs from the shared ones * correct conversion * correct bug * add first pretrain script * save intermediate * static shapes * save intermediate * finish first pretrain script version * more refactor * remove wandb * refactor more * improve test * correct perplexity compute bug * finish model implementation * add to docs * finish docs * finish pretraining script * finish pretraining script * remove wandb * finish PR for merge * finish config * finish * make deepspeed work * Apply suggestions from code review Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * apply suggestions * fix flaky test Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com> Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * pass decay_mask fn to optimizer (#12087) * rm require_version_examples (#12088) * [Wav2Vec2ForPretraining] Correct checkpoints wav2vec2 & fix tests (#12089) * fix_torch_device_generate_test * remove @ * fix tests * Add text_column_name and label_column_name to run_ner and run_ner_no_trainer args (#12083) * Add text_column_name and label_column_name to run_ner args * Minor fix: grouping for text and label column name * CLIPFeatureExtractor should resize images
with kept aspect ratio (#11994) * Resize with kept aspect ratio * Fixed failed test * Overload center_crop and resize methods instead * resize should handle non-PIL images * update slow test * Tensor => tensor Co-authored-by: patil-suraj <surajp815@gmail.com> * New TF GLUE example (#12028) * Pushing partially-complete new GLUE example * First draft of the new TF GLUE example! Needs a little more testing to be sure but it's almost ready. * Fix to the fit() call * Bugfixes, making sure TPU and multi-GPU support is ready * Remove logger line that depends on Pytorch * Style pass * Deleting old TF GLUE example * Include label2id and id2label in the saved model config * Don't clobber the existing model.config.label2id * Style fixes * Update examples/tensorflow/text-classification/run_glue.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Fix quality * Update README.md to cover the TF GLUE example. 
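The CLIPFeatureExtractor change (#11994) above resizes images while keeping their aspect ratio and then center-crops them. A minimal stdlib sketch of that size arithmetic follows. It assumes the shorter edge is scaled to `size` and the longer edge is truncated with `int()` (the real extractor may round differently); `resized_dims` and `center_crop_box` are hypothetical helpers, not the feature extractor's API.

```python
from typing import Tuple


def resized_dims(width: int, height: int, size: int) -> Tuple[int, int]:
    """Scale (width, height) so the shorter edge equals `size`, keeping the aspect ratio.

    Assumption: the longer edge is truncated with int().
    """
    if width <= height:
        return size, int(height * size / width)
    return int(width * size / height), size


def center_crop_box(width: int, height: int, crop: int) -> Tuple[int, int, int, int]:
    """Return a (left, top, right, bottom) box for a centered crop x crop region."""
    left = (width - crop) // 2
    top = (height - crop) // 2
    return left, top, left + crop, top + crop


if __name__ == "__main__":
    # Landscape image: height is the shorter edge, so it becomes 224.
    w, h = resized_dims(640, 480, 224)
    print((w, h), center_crop_box(w, h, 224))
```

Resizing before cropping means the crop never sees a distorted image; only the excess along the longer edge is discarded.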
* Minor style edits * Appending label2id and id2label to models to ensure inference works properly (#12102) * Fix a condition in test_generate_with_head_masking (#11911) * Fix a condition in test_generate_with_head_masking * Fix usage of head_mask in bigbird_pegasus * Fix head masking for speech2text * Resolve copy mismatch + drop unwanted print statement * Fix the condition * Flax VisionTransformer (#11951) * adding vit for flax * added test for Flax-vit and some bug-fixes * overrode methods where variable changes were necessary for flax_vit test * added FlaxViTForImageClassification for test * Update src/transformers/models/vit/modeling_flax_vit.py Co-authored-by: Suraj Patil <surajp815@gmail.com> * made changes suggested in PR * Adding jax-vit models for autoimport * swapping num_channels and height, width dimension * fixing the docstring for torch-like inputs for VIT * add model to main init * add docs * doc, fix-copies * docstrings * small test fixes * fix docs * fix docstr * Apply suggestions from code review Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * style Co-authored-by: jayendra <jayendra@infocusp.in> Co-authored-by: Suraj Patil <surajp815@gmail.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * add relevant description to tqdm in examples (#11927) * add relevant `desc` in examples * require_version datasets>=1.8.0 * Fix head masking generate tests (#12110) * fix_torch_device_generate_test * remove @ * fix tests * Flax CLM script (#12023) * first draft * max_seq_length => block_size * fix arg names * fix typos * fix loss calculation * add max examples, fix train eval steps, metrics * optimizer mask * fix perplexity, metric logging * fix logging * data_collator => data_loader * refactor loss_fn * support single GPU * pass distributed to write_metric * fix jitting * fix single device training * fix single device metrics * close inner progress bars once finished * add overwrite_cache arg * fix dataset caching issue
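The Flax CLM script entry above renames `max_seq_length` to `block_size`. In causal-LM example scripts of this kind, the tokenized corpus is typically concatenated and cut into fixed-length blocks. Below is a minimal pure-Python sketch of that grouping. It assumes two things: the leftover tail shorter than one block is dropped, and the labels are a copy of the inputs, with the shift happening in the loss. `group_texts` here is illustrative, not the script's exact code.

```python
from itertools import chain
from typing import Dict, List


def group_texts(examples: Dict[str, List[List[int]]], block_size: int) -> Dict[str, List[List[int]]]:
    """Concatenate every tokenized sequence, then split into chunks of exactly block_size tokens."""
    concatenated = {k: list(chain.from_iterable(v)) for k, v in examples.items()}
    total_length = len(concatenated["input_ids"])
    # Drop the tail that does not fill a whole block (assumption: no padding).
    total_length = (total_length // block_size) * block_size
    result = {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated.items()
    }
    # Causal LM: labels are a copy of the inputs; the shift happens in the loss.
    result["labels"] = [list(block) for block in result["input_ids"]]
    return result


batch = {"input_ids": [[1, 2, 3], [4, 5], [6, 7, 8, 9]]}
print(group_texts(batch, block_size=4))
```

Packing like this avoids padding entirely. The trade-off is that a block can span a document boundary.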
* add more logs * few small fixes * address Nicholas's suggestions * fix docstr * address Patrick's suggestions * make flake happy * pass new new_dropout_rng to apply_gradients * reset train metrics after every epoch * remove distributed logis, small fixes * Add from_pretrained to dummy timm objects (#12097) * Add from_pretrained to dummy timm * Fix at the source * Update utils/check_dummies.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Missing pretrained dummies * Style Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Fix t5 error message (#12136) * Fix t5 error message * Fix again * Fix megatron_gpt2 attention block's causal mask (#12007) * Fix megatron_gpt2 attention block's causal mask. * compatibility with checkpoints created with recent versions of Megatron-LM * added integration test for the released Megatron-GPT2 model * code style changes * added option to megatron conversion script to read from config file Co-authored-by: Guido Novati <gnovati@nvidia.com> * Add mlm pretraining xla torch readme (#12011) * fix_torch_device_generate_test * remove @ * upload * Apply suggestions from code review * Apply suggestions from code review * Apply suggestions from code review * Update examples/flax/language-modeling/README.md * add more info * finish * fix Co-authored-by: Patrick von Platen <patrick@huggingface.co> * add readme for flax clm (#12111) * add readme for flax clm * use section link for tokenizer * Apply suggestions from code review Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * update metrics Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * FlaxBart (#11537) * Start working on FlaxBart * Create modeling_flax_bart.py * Write FlaxBartAttention * Add FlaxBartEncoderLayer * Add FlaxBartDecoderLayer and some typing * Add helper function for FlaxBart * shift_tokens_right * _make_causal_mask * _expand_mask * Add
PositionalEmbedding and fix init_std naming * Add FlaxBartPretrainedModel * Add FlaxBartEncoder * Add FlaxBartEncoder * Add FlaxBartEncoder among modules to be imported * YET WE CANNOT INITIALIZE THAT!! :( * Make BartEncoder work Change BartEncoder to an instance of nn.Module for now * Add FlaxBartDecoder * Add FlaxBartModel * TODO to make model run -> Prepare model inputs * Resolve padding * Add FlaxBartModel * Add FlaxBartModel into importable modules * Remove FlaxBartEncoder and FlaxBartDecoder from importable modules * make style; not properly working * make style; make quality does not pass due to some import I left * Remove TODO for padding_idx in nn.Embed for now * Add FlaxBartForConditionalGeneration * Incorporate Flax model output classes, i.e. return_dict * Add other models and incorporate use_cache arg * Add FlaxBartForSequenceClassification and FlaxBartForQuestionAnswering * Incorporate use_cache arg from PyTorch implementation * Add all necessary Flax output utils * Add FlaxBartForCausalLM; not working yet * Add minor improvements; still lacks some functionality * Update docs, src and tests * Add support of FlaxBart to docs/source * Fix some bugs in FlaxBart source code * Add some necessary tests for FlaxBart models - jit_compilation not passing * Fix tests and add test_head_masking * Fix tests for @jax.jit computation * Add test_head_masking * Migrate FlaxBart tests from jax.numpy to numpy * Remove FlaxBartForCausalLM * Clean repo * fix bart model weight structure * Fix FlaxBartForSequenceClassification Slicing cannot be used under jit, so the way the sentence representation is selected from hidden_states must be changed.
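The FlaxBart work above adds a `shift_tokens_right` helper, which builds decoder inputs from labels. Below is a minimal pure-Python sketch of the usual semantics, assuming three steps: prepend `decoder_start_token_id`, drop the last token, and replace the `-100` label-padding value with `pad_token_id`. The JIT-friendly Flax version operates on arrays rather than Python lists; this list version is only meant to make the transformation concrete.

```python
from typing import List


def shift_tokens_right(
    input_ids: List[List[int]], pad_token_id: int, decoder_start_token_id: int
) -> List[List[int]]:
    """Shift each sequence one position right, the way seq2seq decoder inputs are built."""
    shifted = []
    for row in input_ids:
        new_row = [decoder_start_token_id] + row[:-1]
        # -100 marks ignored label positions; it is not a valid token id for the decoder.
        shifted.append([pad_token_id if tok == -100 else tok for tok in new_row])
    return shifted


labels = [[10, 11, 12, 2], [20, 2, -100, -100]]
print(shift_tokens_right(labels, pad_token_id=1, decoder_start_token_id=2))
```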
* Allow FlaxBartForSequenceClassification for testing pt_flax equivalence * Allow testing for FlaxBartForQA for pt_flax equivalence * Add a comment to FlaxBartForSequenceClassification + change noise from 1e-3 to 1e-6 * remove past_key_values * remove inputs_embeds and make input_ids required * add position ids * re-write attention layer * fix dataclass * fix pos embeds and attention output * fix pos embeds * expose encode method * expose decode method * move docstring to top * add cache for causal attn layer * remove head masking for now * s2s greedy search first pass * boom boom * fix typos * fix greedy generate for bart * use encoder, decoder layers instead of num_hidden_layers * handle encoder_outputs * cleanup * simplify decoding * more clean-up * typos * Change header + add {decoder_,}position_ids into 2 models * add BartConfig * fix existing tests * add encode, decode methods * Fix shift_tokens_right for JIT compilation + clarify one condition * fix decode * encoder => encode * simplify generate * add tests for encode and decode * style * add tests for cache * fix equivalence tests * sample generate now works with seq2seq * generation tests * initialize dense layers * docstring and cleanup * quality * remove get/set input_embeddings * address Patrick's suggestions * decode for every model, remove encoder_outputs from call * update tests accordingly * decode returns only decoder outputs and logits * fix arguments * doc encode, decode methods * correct base_model_prefix * fix test for seq classif model * fix docs Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Suraj Patil <surajp815@gmail.com> * Feature to use the PreTrainedTokenizerFast class as a stand-alone tokenizer (#11810) * feature for tokenizer without slow/legacy version * format * modify common test * add tests * add PreTrainedTokenizerFast to AutoTokenizer * format * change tokenizer common test in order to be able to run test without a slow version * update tokenizer
fast test in order to use the `rust_tokenizer_class` attribute instead of `tokenizer_class` * add AutoTokenizer test * replace `if self.tokenizer_class is not None` with `if self.tokenizer_class is None` * remove obsolete change in comment * Update src/transformers/tokenization_utils_base.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Update src/transformers/tokenization_utils_fast.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * change `get_main_tokenizer` into `get_tokenizers` * clarify `get_tokenizers` method * homogenize with `test_slow_tokenizer` and `test_rust_tokenizer` * add `test_rust_tokenizer = False` to tokenizers that don't define a fast version * `test_rust_tokenizer = False` for BertJapaneseTokenizer * `test_rust_tokenizer = False` for BertJapaneseCharacterTokenizationTest Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * [Flax] Add links to Google Colabs (#12146) * fix_torch_device_generate_test * remove @ * add colab links * Don't log anything before logging is set up in examples (#12121) * Don't log anything before logging is set up in examples * Last example * Use text_column_name variable instead of "text" (#12132) * Use text_column_name variable instead of "text" `text_column_name` was already defined above where I made the changes, and it was also used below where I made changes. This is a very minor change. If a dataset does not use "text" as the column name, then the `tokenize_function` will now use whatever column is assigned to `text_column_name`. `text_column_name` is just the first column name if "text" is not a column name. It makes the function a little more robust, though I would assume that 90%+ of datasets use "text" anyway.
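The #12132 description above relies on `text_column_name` falling back to the first column when there is no `text` column. Below is a minimal sketch of that selection, plus a tokenize function that reads from the resolved column. `fake_tokenizer` is a hypothetical whitespace-splitting stand-in for a real tokenizer, and the dict-of-lists batch mimics what a batched `datasets` `map` call passes in.

```python
from typing import Callable, Dict, List


def pick_text_column(column_names: List[str]) -> str:
    """Prefer a column literally named "text"; otherwise fall back to the first column."""
    return "text" if "text" in column_names else column_names[0]


def make_tokenize_function(tokenizer: Callable[[List[str]], Dict], text_column_name: str):
    def tokenize_function(examples: Dict[str, List[str]]) -> Dict:
        # Read from the resolved column instead of a hard-coded "text" key.
        return tokenizer(examples[text_column_name])

    return tokenize_function


def fake_tokenizer(texts: List[str]) -> Dict[str, List[List[str]]]:
    """Hypothetical stand-in for a real tokenizer: whitespace split."""
    return {"input_ids": [t.split() for t in texts]}


batch = {"content": ["hello world", "a b c"], "id": ["0", "1"]}
column = pick_text_column(list(batch))
print(column, make_tokenize_function(fake_tokenizer, column)(batch))
```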
* black formatting * make style Co-authored-by: Nicholas Broad <nicholas@nmbroad.com> * [lm examples] Replicate --config_overrides addition to other LM examples (#12135) * [lm examples] Replicate --config_overrides addition to other LM examples * Removing no trainer files changes * Update README Co-authored-by: Kumar Abhishek <kabhishek@expedia.com> * fix error message (#12148) * [optim] implement AdafactorSchedule (#12123) * implement AdafactorSchedule * typo * fix * Update src/transformers/optimization.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * [style] consistent nn. and nn.functional (#12124) * consistent nn. and nn.functional * fix glitch * fix glitch #2 * Adding TFWav2Vec2Model (#11617) * [WIP] Add TFWav2Vec2Model Work in progress for adding a tensorflow version of Wav2Vec2 * feedback changes * small fix * Test Feedback Round 1 * Add SpecAugment and CTC Loss * correct spec augment mask creation * docstring and correct copyright * correct bugs * remove bogus file * finish tests correction * del unnecessary layers * Update src/transformers/models/wav2vec2/modeling_tf_wav2vec2.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * make style * correct final bug * Feedback Changes Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * [Flax] Fix flax pt equivalence tests (#12154) * fix_torch_device_generate_test * remove @ * upload * consistent nn. 
and nn.functional: p2 templates (#12153) * Flax Big Bird (#11967) * add flax bert * bert -> bigbird * original_full ported * add debugger * init block sparse * fix copies ; gelu_fast -> gelu_new * block sparse port * fix block sparse * block sparse working * all ckpts working * fix-copies * make quality * init tests * temporary fix for FlaxBigBirdForMultipleChoice * skip test_attention_outputs * fix * gelu_fast -> gelu_new ; fix multiple choice model * remove nsp * fix sequence classifier * fix * make quality * make fix-copies * finish * Delete debugger.ipynb * Update src/transformers/models/big_bird/modeling_flax_big_bird.py * make style * finish * bye bye jit flax tests Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * [style] consistent nn. and nn.functional: part 3 `tests` (#12155) * consistent nn. and nn.functional: p3 templates * restore * [style] consistent nn. and nn.functional: part 4 `examples` (#12156) * consistent nn. and nn.functional: p4 examples * restore * consistent nn. and nn.functional: part 5 docs (#12161) * Add video links to the documentation (#12162) * [Flax generate] Add params to generate (#12171) * fix_torch_device_generate_test * remove @ * add params as input * finish * Use a released version of optax rather than installing from Git. (#12173) Use a released version of optax rather than installing from Git * Have dummy processors have a `from_pretrained` method (#12145) * Add course banner (#12157) * Add course banner * Update course banner * Adjust banner width * Enable add_prefix_space if model_type is roberta or gpt2 (#12116) * Update AutoModel classes in summarization example (#12178) - Convert use of deprecated AutoModelWithLMHead to AutoModelForSeq2SeqLM - Add newly required `truncation=True` to `tokenizer.encode` with `max_length` This silences all warnings. 
* Ray Tune Integration Updates (#12134) * fix * fixes * add back to scheduled tests * formatting * Update integrations.py * [testing] ensure concurrent pytest workers use a unique port for torch.dist (#12166) * ensure concurrent pytest workers use a unique port for torch.distributed.launch * reword * Model card defaults (#12122) * [WIP] Model card defaults * finetuned_from default value * Add all mappings to the mapping file * Be more defensive on finetuned_from arg * Add default task tag * Separate tags from tasks * Edge case for dataset * Apply suggestions from code review Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Temporarily deactivate torch-scatter while we wait for new release (#12181) * Temporarily deactivate torch-scatter while we wait for new release * torch-1.8.1 binary for scatter * Revert to 1.8.0 * Pin torch dependency * torchaudio and torchvision * Temporarily deactivate torchhub test (#12184) * [Flax] Add Beam Search (#12131) * fix_torch_device_generate_test * remove @ * push new logit processors * add processors * save first working version * save intermediate * finish * make style * make fix-copies * finish * Update tests/test_modeling_flax_bart.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Suraj Patil <surajp815@gmail.com> Co-authored-by: Patrick von Platen <patrick@huggingface.co> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Suraj Patil <surajp815@gmail.com> * Hubert (#11889) * fix_torch_device_generate_test * remove @ * add hubert * add first test file * more docs * fix bugs * fix bug * finish * finish * finish docstring * fix * fix * finalize * add to ignored * finish * Apply suggestions from code review * correct naming * finish * fix auto config * finish * correct convert script * Apply suggestions from code review Co-authored-by: Lysandre 
Debut <lysandre@huggingface.co> Co-authored-by: Suraj Patil <surajp815@gmail.com> * apply suggestions Lysandre & Suraj Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Suraj Patil <surajp815@gmail.com> * updated DLC images and sample notebooks (#12191) * Enabling AutoTokenizer for HubertConfig. (#12198) * Use yaml to create metadata (#12185) * Use yaml to create metadata * Fix typo * Remove pin * [Docs] fixed broken link (#12205) * fixed broken link * Update docs/source/benchmarks.rst Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update docs/source/benchmarks.rst Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Pipeline update & tests (#12207) * Improve detr (#12147) * Remove unused variables * Improve docs * Fix docs of segmentation masks Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Add link to the course (#12229) * Support for torch 1.9.0 (#12224) * Support for torch 1.9.0 * Torch scatter for 1.9.0 * GitHub Actions run on 1.9.0 * fix pt-1.9.0 `add_` deprecation (#12217) * fix pt-1.9.0 add_ deprecation * add () for clarity * Trigger CI * require_version(torch * Release: v4.7.0 * Docs for v4.8.0 * AutoTokenizer: infer the class from the tokenizer config if possible (#12208) * AutoTokenizer: infer the class from the tokenizer config if possible * Add tests * Update src/transformers/models/auto/tokenization_auto.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * update desc for map in all examples (#12226) * update desc for map in all examples * added plm * suggestions * [Flax] FlaxAutoModelForSeq2SeqLM (#12228) * add FlaxAutoModelForSeq2SeqLM * [FlaxBart] few small fixes (#12247) * boom boom * remove flax clip example * few small fixes * Deprecate pythonic Mish and support PyTorch 1.9 version of Mish (#12240) *
Moved Mish to Torch 1.9 version * Run black formatting * [t5 doc] make the example work out of the box (#12239) * [run_clm.py] restore caching * style * [t5 doc] make the example work out of the box This PR expands the training example to include the correct model type for the example to work, e.g. with `T5Model` this example will break. * Update docs/source/model_doc/t5.rst Co-authored-by: Suraj Patil <surajp815@gmail.com> * expand the other example Co-authored-by: Suraj Patil <surajp815@gmail.com> * Fix the scheduled CI * Better CI feedback (#12279) * Better run ID * Only part of CI * Revert "Only part of CI" This reverts commit `29f7f248d2`. * Fix for making student ProphetNet for Seq2Seq Distillation (#12130) * make_student.py: fix to make student ProphetNet * reformat * [FlaxClip] fix test from/save pretrained test (#12284) * boom boom * remove flax clip example * fix from_save_pretrained * [Flax] [WIP] allow loading head model with base model weights (#12255) * boom boom * remove flax clip example * allow loading head model with base model weights * add test * fix imports * disable save, load test for clip * add test_save_load_to_base * [DeepSpeed] don't ignore --adafactor (#12257) * [Flax] Fix flax test save pretrained (#12256) * fix_torch_device_generate_test * remove @ * fix flax save pretrained test * Tensorflow QA example (#12252) * New Tensorflow QA example! 
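The Mish entry (#12240) above replaces a pure-Python Mish with the version built into PyTorch 1.9 (`torch.nn.Mish` / `torch.nn.functional.mish`). Both compute mish(x) = x · tanh(softplus(x)), where softplus(x) = ln(1 + e^x). The scalar stdlib sketch below can serve as a reference value when checking either implementation. Softplus is written in its overflow-safe form, so large inputs do not raise.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable ln(1 + e^x): avoids overflow in exp for large positive x."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


for value in (-2.0, 0.0, 1.0, 50.0):
    print(value, round(mish(value), 6))
```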
* Style pass * Updating README.md for the new example * flake8 fixes * Update examples/tensorflow/question-answering/README.md Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * [Flax] Add jax flax to env command (#12251) * fix_torch_device_generate_test * remove @ * add commands for flax/jax * reset report_to to none, avoid deprecation warning (#12293) * [trainer + examples] set log level from CLI (#12276) * set log level from CLI * add log_level_replica + test + extended docs * cleanup * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * rename datasets objects to allow datasets module * improve the doc * style * doc improve Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * [tests] multiple improvements (#12294) * [tests] multiple improvements * cleanup * style * todo to investigate * fix * Fix for the issue of device-id getting hardcoded for token_type_ids during Tracing [WIP] (#11252) * registering a buffer for token_type_ids, to avoid the error of the device id getting hardcoded when tracing * style format * adding the persistent flag to the registered buffers, which prevents adding them to the state_dict and addresses the backward-compatibility issue * adding the try/except to the fix, as the persistent flag is only available from PT >1.6 * adding version check * added the condition to only use the token_type_ids buffer when it's autogenerated, not passed by the user * adding comments and making the condition where token_type_ids are None use the registered buffer * taking out position-embedding from the if block * adding comments * handling the case if buffer for position_ids was not registered * reverted the changes on position_ids, fix the issue with size of token_type_ids buffer, moved the modification for generated token_type_ids to BertModel, instead of Embeddings * reverting the
token_type_ids in case of None to the previous version * reverting changes on position_ids, adding back the if block * changes added by running make fix-copies * changes added by running make fix-copies and added the import version as it was being used * changes added by running make fix-copies * changes added by running make fix-copies * fixing the import format * fixing the import format * modified to use temp tensor for trimmed and expanded token_type_ids buffer * changes made by fix-copies after temp tensor modifications * changes made by fix-copies after temp tensor modifications * changes made by fix-copies after temp tensor modifications * clean up * clean up * clean up * clean up * Nit * Nit * Nit * modified to support device conversion on traced models * modified to support device conversion on traced models * modified to support device conversion on traced models * modified to support device conversion on traced models * changes based on latest in master * Adapt templates * Add version import Co-authored-by: Ubuntu <ubuntu@ip-172-31-32-81.us-west-2.compute.internal> Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr> * trainer_tf: adjust wandb installation command (#12291) * add FlaxAutoModelForImageClassification in main init (#12298) * Fix and improve documentation for LEDForConditionalGeneration (#12303) * Replace conditional generation example (fixes #12268) * Replace model in summarization example with finetuned checkpoint, adapt example text * Fix typo in new summarization example * Fix docstring formatting, add missing import statement to example * [Flax] Main doc for event orga (#12305) * fix_torch_device_generate_test * remove @ * push * finish * some typos * add more info on communication * add suggestions * [trainer] 2 bug fixes and a rename (#12309) * bug fixes and a rename * add extended DDP test * FlaxBartPretrainedModel -> FlaxBartPreTrainedModel (#12313) * [docs] performance (#12258) * initial
performance document * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * rewrites based on suggestions * 8x multiple is for AMP only * add contribute section Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Add CodeCarbon Integration (#12304) * Add optional dependency * Add CodeCarbon integration * Add CodeCarbon integration * Add CodeCarbon integration * typo * Optimizing away the `fill-mask` pipeline. (#12113) * Optimizing away the `fill-mask` pipeline. - Don't send anything to the tokenizer unless needed. The vocab check is much faster. - Keep BC by sending data to the tokenizer when needed. Users handling warning messages will see performance benefits again. - Make `targets` and `top_k` work together better: `top_k` cannot be higher than `len(targets)` but can still be smaller. - Actually simplify the `target_ids` in case of duplicates (they can happen because we're parsing raw strings). - Removed useless code that failed on empty strings; it only worked if the empty string was in the first position, so empty strings are now ignored instead. - Changed the related tests, as only those tests would fail correctly (having an incorrect value in the first position). * Make tests compatible with 2 different vocabs... (at the price of a warning). Co-authored-by: @EtaoinWu * ValueError working globally * Update src/transformers/pipelines/fill_mask.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * `tokenizer.vocab` -> `tokenizer.get_vocab()` for more compatibility + fallback.
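The `fill-mask` entry (#12113) above lists several rules for user-supplied `targets`: look them up in the vocabulary first, ignore empty strings, de-duplicate the resulting ids, and never let `top_k` exceed the number of targets. Below is a stdlib sketch of those rules under stated assumptions. `vocab` is a plain token-to-id dict standing in for `tokenizer.get_vocab()`. Targets missing from the vocabulary are simply skipped here, whereas according to the entry the pipeline falls back to the tokenizer. `resolve_targets` and its error message are hypothetical.

```python
from typing import Dict, List, Tuple


def resolve_targets(targets: List[str], vocab: Dict[str, int], top_k: int) -> Tuple[List[int], int]:
    """Map target strings to unique vocab ids and clamp top_k to the number of targets."""
    target_ids: List[int] = []
    seen = set()
    for target in targets:
        if not target:  # empty strings are ignored rather than raising
            continue
        token_id = vocab.get(target)
        if token_id is None:  # simplification: the real pipeline falls back to the tokenizer
            continue
        if token_id not in seen:  # duplicates can appear when parsing raw strings
            seen.add(token_id)
            target_ids.append(token_id)
    if not target_ids:
        raise ValueError("At least one valid target must be provided when passing `targets`.")
    return target_ids, min(top_k, len(target_ids))


vocab = {"paris": 5, "london": 7, "berlin": 9}
print(resolve_targets(["paris", "", "london", "paris"], vocab, top_k=5))
```

Checking a dict is a constant-time lookup per target, which is why avoiding the tokenizer call speeds up the common case.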
Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Add output in a dictionary for TF `generate` method (#12139) * Add output args to greedy search * Fix critical typo + make style quality * Handle generate_beam_search * Add dict_specific tests and fix the placement of encoder outputs * Add specific outputs * Update doc * Fix typo * Adjust handling encoder_outputs + Fix generating for T5 * Fix generate for RAG * Fix handling output_attentions when target_mapping is not None Take care of situations when target_mapping is provided, as there is then a 2-tuple of attentions. Change from:

    if inputs["output_attentions"]:
        attentions = tuple(tf.transpose(t, perm=(2, 3, 0, 1)) for t in attentions)

to:

    if inputs["output_attentions"]:
        if inputs["target_mapping"] is not None:
            # when target_mapping is provided, there is a 2-tuple of attentions
            attentions = tuple(
                tuple(tf.transpose(attn_stream, perm=(2, 3, 0, 1)) for attn_stream in t) for t in attentions
            )
        else:
            attentions = tuple(tf.transpose(t, perm=(2, 3, 0, 1)) for t in attentions)

* Rename kwargs to model_kwargs * make style quality * Move imports in test_modeling_tf_common.py Move ModelOutput-related imports in test_modeling_tf_common.py into the `is_tf_available():` statement.
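The `output_attentions` fix (#12139) above transposes each attention tensor with `perm=(2, 3, 0, 1)`. When `target_mapping` is set, it does so for both streams of each 2-tuple. Below is a NumPy sketch of the same logic using `np.transpose(..., axes=...)`, which has the same semantics as `tf.transpose(..., perm=...)`. The (query_len, key_len, batch, heads) input layout is an assumption inferred from the permutation. `transpose_attentions` and its `two_stream` flag (standing in for `target_mapping is not None`) are hypothetical.

```python
import numpy as np

# Assumed layout: (q_len, k_len, batch, heads) -> (batch, heads, q_len, k_len)
PERM = (2, 3, 0, 1)


def transpose_attentions(attentions, two_stream: bool):
    """Apply the permutation per layer; with two streams, each layer holds a 2-tuple."""
    if two_stream:
        return tuple(tuple(np.transpose(stream, axes=PERM) for stream in layer) for layer in attentions)
    return tuple(np.transpose(layer, axes=PERM) for layer in attentions)


q_len, k_len, batch, heads = 3, 5, 2, 4
shape = (q_len, k_len, batch, heads)
single = tuple(np.zeros(shape) for _ in range(2))  # 2 layers, one stream each
paired = tuple((np.zeros(shape), np.zeros(shape)) for _ in range(2))  # 2 layers, two streams each

print(transpose_attentions(single, two_stream=False)[0].shape)
print(transpose_attentions(paired, two_stream=True)[0][1].shape)
```

Without the nested branch, calling the transpose on a 2-tuple would treat the pair as one stacked array with an extra leading axis, and the 4-element permutation would no longer match its rank.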
* Rewrite nested if-statements * Fix added tests * Flax summarization script (#12230) * add summarization script * fix arguments, preprocessing, metrics * add generation and metrics * auto model, prediction loop * prettify * label smoothing * address Sylvain's and Patrick's suggestions * dynamically import shift_tokens_right * fix shift_tokens_right_fn call * Rewrite ProphetNet to make it ONNX-conversion friendly (#11981) * Rewrite * [ONNX] rewrite * Flax T5 (#12150) * copy pytorch-t5 * init * boom boom * forward pass same * make generation work * add more tests * make test work * finish normal tests * make fix-copies * finish quality * correct slow example * correct slow test * version table * upload models * Update tests/test_modeling_flax_t5.py * correct incorrectly deleted line Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Patrick von Platen <patrick@huggingface.co> * Add mention of the huggingface_hub methods for offline mode (#12320) * [Flax/JAX] Add how to propose projects markdown (#12311) * fix_torch_device_generate_test * remove @ * finish * make style * [TFWav2Vec2] Fix docs (#12283) * fix error * make style check happy Co-authored-by: chenhaitao <chenhaitao@qiyi.com> * Clean push to hub API (#12187) * Clean push to hub API * Create working dir if it does not exist * Different tweak * New API + all models + test Flax * Adds the Trainer clean up * Update src/transformers/file_utils.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Address review comments * (nit) output types * No need to set clone_from when folder exists * Update src/transformers/trainer.py Co-authored-by: Julien Chaumond <julien@huggingface.co> * Add generated_from_trainer tag * Update to new version * Fixes Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Julien Chaumond <julien@huggingface.co> Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr> * Add all XxxPreTrainedModel to the main init (#12314) * Add all
XxxPreTrainedModel to the main init * Add to template * Add to template bis * Add FlaxT5 * Conda build (#12323) * Temporarily revert the `fill-mask` improvements. * changed modeling_fx_utils.py to utils/fx.py for clarity (#12326) Co-authored-by: Michael Benayoun <michael@huggingface.co> * Pin good version of huggingface_hub * [Flax T5] Fix weight initialization and fix docs (#12327) * finish t5 flax fixes * improve naming * Release: v4.8.0 * v4.9.0.dev0 * Update training_args.py (#12328) mention in `save_strategy` param description that `load_best_model_at_end` can override * [Deepspeed] new docs (#12077) * document sub_group_size * style * install + issues reporting * style * style * Update docs/source/main_classes/deepspeed.rst Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * indent 4 * restore * style Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Fix default to logging_dir lost in merge conflict * try-this (#12338) Signed-off-by: Richard Liaw <rliaw@berkeley.edu> * [examples/Flax] move the examples table up (#12341) * Fix torchscript tests (#12336) * Fix torchscript tests * Better test * Remove bogus print * Document patch release v4.8.1 * Add flax/jax quickstart (#12342) * Update README.md * fixed typo (#12356) * Fix exception in prediction loop occurring for certain batch sizes (#12350) * fix distributed_concat for scalar outputs * Update README.md * fixed typo (#12356) * simplify fix with terser syntax Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Trigger CI Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: michal pitr <21157924+MichalPitr@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Add FlaxBigBird QuestionAnswering script (#12233) * port bigbird script * adapt script a bit * change location * adapt more * save progress * init commit * style * dataset script tested * readme add * 
Replace NotebookProgressReporter by ProgressReporter in Ray Tune run (#12357) * Replace NotebookProgressReporter by ProgressReporter in Ray Tune run * Move to local import * Style * remove extra white space from log format (#12360) * fixed multiplechoice tokenization (#12362) * fixed multiplechoice tokenization The model would have seen two sequences: 1. [CLS]prompt[SEP]prompt[SEP] 2. [CLS]choice0[SEP]choice1[SEP] that is not correct as we want a contextualized embedding of prompt and choice * removed outer brackets for proper sequence generation * [trainer] add main_process_first context manager (#12351) * main_process_first context manager * handle multi-node, add context description * sync desc * [Examples] Replicates the new --log_level feature to all trainer-based pytorch (#12359) * added log_level * fix comment * fixed log_level * Trigger CI * Unfied logging * simplified args for log_level * updated example template (#12365) * replace print with logger (#12368) * [Documentation] Warn that DataCollatorForWholeWordMask is limited to BertTokenizer-like tokenizers (#12371) * Notify users that DataCollatorForWholeWordMask is limited to BertTokenier-like tokenizers * Fix code formatting * Update run_mlm.py (#12344) Before the code could not be used for validation only because of this line: extension = data_args.train_file.split(".")[-1] was assuming that extension must be extracted from the training dataset. This line would run regardless of the training or validation options of the user. This would lead to an error if the user only wants to run an evaluation only and does not want to do train (because the training file does not exist). I modified it to extract extension from the training file if the user wants to do train and extract it from the validation file if the user wants to run eval. This way the code can be used for both training and validation separately. 
* Add possibility to maintain full copies of files (#12312) * [CI] add dependency table sync verification (#12364) * add dependency table sync verification * improve the message * improve the message * revert * ready to merge * [Examples] Added context manager to datasets map (#12367) * added cotext manager to datasets map * fixed style and spaces * fixed warning of deprecation * changed desc * [Flax community event] Add more description to readme (#12398) * fix_torch_device_generate_test * remove @ * boom boom * correct typos * Apply suggestions from code review Co-authored-by: Suraj Patil <surajp815@gmail.com> * Apply suggestions from code review Co-authored-by: Suzana Ilić <io.suzanai@gmail.com> * Apply suggestions from code review Co-authored-by: Suraj Patil <surajp815@gmail.com> Co-authored-by: Suzana Ilić <io.suzanai@gmail.com> * Update README.md * Fix copies * Remove the need for `einsum` in Albert's attention computation (#12394) * debug albert einsum * Fix matmul computation * Let's use torch linear layer. * Style. 
* [Flax] Adapt flax examples to include `push_to_hub` (#12391) * fix_torch_device_generate_test * remove @ * finish * correct summary writer * correct push to hub * fix indent * finish * finish * finish * finish * finish Co-authored-by: Patrick von Platen <patrick@huggingface.co> * Tensorflow LM examples (#12358) * Tensorflow MLM example * Add CLM example * Style fixes, adding missing checkpoint code from the CLM example * Fix TPU training, avoid massive dataset warnings * Fix incorrect training length calculation for multi-GPU training * Fix incorrect training length calculation for multi-GPU training * Refactors and nitpicks from the review * Style pass * Adding README * pass the matching trainer log level to deepspeed (#12401) * [Flax] Add T5 pretraining script (#12355) * fix_torch_device_generate_test * remove @ * add length computatan * finish masking * finish * upload * fix some bugs * finish * fix dependency table * correct tensorboard * Apply suggestions from code review * correct processing * slight change init * correct some more mistakes * apply suggestions * improve readme * fix indent * Apply suggestions from code review Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com> * correct tokenizer * finish * finish * finish * finish Co-authored-by: Patrick von Platen <patrick@huggingface.co> Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com> * [models] respect dtype of the model when instantiating it (#12316) * [models] respect dtype of the model when instantiating it * cleanup * cleanup * rework to handle non-float dtype * fix * switch to fp32 tiny model * improve * use dtype.is_floating_point * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * fix the doc * recode to use explicit torch_dtype_auto_detect, torch_dtype args * docs and tweaks * docs and tweaks * docs and tweaks * merge 2 args, add docs * fix * fix * better doc * better doc Co-authored-by: Sylvain Gugger 
<35901082+sgugger@users.noreply.github.com> * Rename detr targets to labels (#12280) * Rename target to labels in DetrFeatureExtractor * Update DetrFeatureExtractor tests accordingly * Improve docs of DetrFeatureExtractor * Improve docs * Make style * Add out of vocabulary error to ASR models (#12288) * Add OOV error to ASR models * Feedback changes * Fix TFWav2Vec2 SpecAugment (#12289) * Fix TFWav2Vec2 SpecAugment * Invert masks * Feedback changes * [example/flax] add summarization readme (#12393) * add readme * update readme and add requirements * Update examples/flax/summarization/README.md Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * [Flax] Example scripts - correct weight decay (#12409) * fix_torch_device_generate_test * remove @ * finish * finish * correct style * fix ids_to_tokens naming error in tokenizer of deberta v2 (#12412) Co-authored-by: Jipeng Huang <jihuan@microsoft.com> * minor fixes in original RAG training (#12395) * Added talks (#12415) * Easily train a new fast tokenizer from a given one (#12361) * [WIP] Easily train a new fast tokenizer from a given one * Fix test * Roll out to other tokenizers and add tests * Fix bug with unk id and add emoji to test * Really use something different in test * Implement special tokens map * Map special tokens in the Transformers tokenizers * Fix test * Make test more robust * Fix test for BPE * More robust map and test Co-authored-by SaulLu * Test file * Stronger tests Co-authored-by: SaulLu <lucilesaul.com@gmail.com> * Map unk token for Wordpiece and address review comment * Fix lowercase test and address review comment * Fix all tests * Simplify test * Fix tests for realsies * Easily train a new fast tokenizer from a given one - tackle the special tokens format (str or AddedToken) (#12420) * Propose change in tests regarding lower case * add new test for special tokens types * put back the test part about decoding * add feature: the AddedToken is re-build with the different mapped 
content * Address review comment: simplify AddedToken building Co-authored-by: sgugger <sylvain.gugger@gmail.com> * Update src/transformers/tokenization_utils_fast.py Co-authored-by: sgugger <sylvain.gugger@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: SaulLu <lucilesaul.com@gmail.com> Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com> * [modelcard] fix (#12422) this PR is fixing an incorrect attribute - probably some tests are needed? * Add option to save on each training node (#12421) * Add option to save on each training node * Apply suggestions from code review Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> * Address review comments Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> * Added to talks section (#12433) Added one more confirmed speaker, zoom links and gcal event links * Fix default bool in argparser (#12424) * Fix default bool in argparser * Add more to test * Add default bos_token and eos_token for tokenizer of deberta_v2 (#12429) * fix ids_to_tokens naming error in tokenizer of deberta v2 * Update tokenization_deberta_v2.py Add bos_token and eos_token. 
* format code Co-authored-by: Jipeng Huang <jihuan@microsoft.com> * Add CANINE (#12024) * First pass * More progress * Add support for local attention * More improvements * More improvements * Conversion script working * Add CanineTokenizer * Make style & quality * First draft of integration test * Remove decoder test * Improve tests * Add documentation * Mostly docs improvements * Add CanineTokenizer tests * Fix most tests on GPU, improve upsampling projection * Address most comments by @dhgarrette * Remove decoder logic * Improve Canine tests, improve docs of CanineConfig * All tokenizer tests passing * Make fix-copies and fix tokenizer tests * Fix test_model_outputs_equivalence test * Apply suggestions from @sgugger's review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Address some more comments * Add support for hidden_states and attentions of shallow encoders * Define custom CanineModelOutputWithPooling, tests pass * First pass * More progress * Add support for local attention * More improvements * More improvements * Conversion script working * Add CanineTokenizer * Make style & quality * First draft of integration test * Remove decoder test * Improve tests * Add documentation * Mostly docs improvements * Add CanineTokenizer tests * Fix most tests on GPU, improve upsampling projection * Address most comments by @dhgarrette * Remove decoder logic * Improve Canine tests, improve docs of CanineConfig * All tokenizer tests passing * Make fix-copies and fix tokenizer tests * Fix test_model_outputs_equivalence test * Apply suggestions from @sgugger's review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Address some more comments * Make conversion script work for Canine-c too * Fix tokenizer tests * Remove file Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Document patch release v4.8.2 * fix typo in mt5 configuration docstring (#12432) * Add to talks section (#12442) * 
[JAX/Flax readme] add philosophy doc (#12419) * add philosophy doc * fix typos * update doc * Apply suggestions from code review Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * address Patricks suggestions * add a training example and fix typos * jit the training step * jit train step * fix example code * typo * Apply suggestions from code review Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * [Flax] Add wav2vec2 (#12271) * fix_torch_device_generate_test * remove @ * start flax wav2vec2 * save intermediate * forward pass has correct shape * add weight norm * add files * finish ctc * make style * finish gumbel quantizer * correct docstrings * correct some more files * fix vit * finish quality * correct tests * correct docstring * correct tests * start wav2vec2 pretraining script * save intermediate * start pretraining script * finalize pretraining script * finish * finish * small typo * finish * correct * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Suraj Patil <surajp815@gmail.com> * make style * push Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Suraj Patil <surajp815@gmail.com> * Add missing Copied from statements * Reference model uploaded under Google org * Fix various duplicates from merging * Rembert-large -> rembert, fix overeager Copied from, return type * Incorporate PR comments from Patrick and Sylvain Co-authored-by: ctheodoris <seanymphoceana@yahoo.com> Co-authored-by: ctheodoris <cvtheodo@ds.dfci.harvard.edu> Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> Co-authored-by: Teven <teven.lescao@gmail.com> Co-authored-by: Nick 
Lane-Smith <nlanesmith@gmail.com> Co-authored-by: Shiro T <stsuchi@users.noreply.github.com> Co-authored-by: Wang Ran (汪然) <wrran@outlook.com> Co-authored-by: Ahmet Akkoç <themadprogramer@gmail.com> Co-authored-by: francescorubbo <francescorubbo@users.noreply.github.com> Co-authored-by: Daniel Stancl <46073029+stancld@users.noreply.github.com> Co-authored-by: talkhaldi <tareq.alkhaldi@gmail.com> Co-authored-by: joerenner <joepeterrenner@gmail.com> Co-authored-by: jrenner <joseph.renner@inria.fr> Co-authored-by: Avital Oliver <avitalo@google.com> Co-authored-by: Patrick von Platen <patrick@huggingface.co> Co-authored-by: Josh Tanner <mindful.jt@gmail.com> Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com> Co-authored-by: Bhadresh Savani <bhadreshpsavani@gmail.com> Co-authored-by: Jayendra <jayendra0parmar@gmail.com> Co-authored-by: jayendra <jayendra@infocusp.in> Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr> Co-authored-by: Philip May <philip@may.la> Co-authored-by: Nicholas Vadivelu <nicholas.vadivelu@gmail.com> Co-authored-by: Suraj Patil <surajp815@gmail.com> Co-authored-by: Shamane Siri <shamane@ahlab.org> Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com> Co-authored-by: Fan Zhang <zhangfan.tju@gmail.com> Co-authored-by: Riccardo Bassani <48254418+BassaniRiccardo@users.noreply.github.com> Co-authored-by: Volodymyr Byno <volodymyr.byno@gmail.com> Co-authored-by: Jeoung-Minju <51041861+JminJ@users.noreply.github.com> Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: Alberto Villa <a.villa.diez@gmail.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Gunjan Chhablani <chhablani.gunjan@gmail.com> Co-authored-by: Kou Yong Kang <kou.yongkang@dhs.sg> Co-authored-by: Shiva Pundir <36535845+ceevaaa@users.noreply.github.com> Co-authored-by: François Lagunas <francois.lagunas@gmail.com> Co-authored-by: Peter Izsak 
<232524+peteriz@users.noreply.github.com> Co-authored-by: Russell Klopfer <russell@klopfer.us> Co-authored-by: Mario Šaško <mariosasko777@gmail.com> Co-authored-by: cdleong <4109253+cdleong@users.noreply.github.com> Co-authored-by: Koichi Yasuoka <yasuoka@kanji.zinbun.kyoto-u.ac.jp> Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com> Co-authored-by: kumapo <kumapo@users.noreply.github.com> Co-authored-by: Tobias Norlund <tobias@norlund.se> Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com> Co-authored-by: Bhavitvya Malik <bhavitvya.malik@gmail.com> Co-authored-by: Jonathan Chang <31893406+cccntu@users.noreply.github.com> Co-authored-by: Guido Novati <16716298+novatig@users.noreply.github.com> Co-authored-by: Guido Novati <gnovati@nvidia.com> Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com> Co-authored-by: Nicholas Broad <nbroad94@gmail.com> Co-authored-by: Nicholas Broad <nicholas@nmbroad.com> Co-authored-by: Kumar Abhishek <kr.abhish@gmail.com> Co-authored-by: Kumar Abhishek <kabhishek@expedia.com> Co-authored-by: Will Rice <will@spokestack.io> Co-authored-by: Vasudev Gupta <7vasudevgupta@gmail.com> Co-authored-by: Kilian Kluge <32523967+ionicsolutions@users.noreply.github.com> Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com> Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com> Co-authored-by: Xa9aX ツ <mishradiganta91@gmail.com> Co-authored-by: Vishal Burman <vishal.a.burman23@gmail.com> Co-authored-by: Hamid Shojanazeri <hamid.nazeri2010@gmail.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-32-81.us-west-2.compute.internal> Co-authored-by: Stefan Schweter <stefan@schweter.it> Co-authored-by: Kevin Canwen Xu <canwenxu@126.com> Co-authored-by: David Fan <30608893+jiafatom@users.noreply.github.com> Co-authored-by: chenht2010 <chenht2010@yahoo.com> Co-authored-by: chenhaitao <chenhaitao@qiyi.com> Co-authored-by: Julien Chaumond 
<julien@huggingface.co> Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com> Co-authored-by: Michael Benayoun <michael@huggingface.co> Co-authored-by: Sam Havens <47401552+sam-qordoba@users.noreply.github.com> Co-authored-by: Richard Liaw <rliaw@berkeley.edu> Co-authored-by: Marc van Zee <marcvanzee@gmail.com> Co-authored-by: michal pitr <21157924+MichalPitr@users.noreply.github.com> Co-authored-by: jglaser <glaserj@ornl.gov> Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com> Co-authored-by: cronoik <johannes.schaffrath@mail.de> Co-authored-by: Taha ValizadehAslani <47432410+TahaAslani@users.noreply.github.com> Co-authored-by: Suzana Ilić <io.suzanai@gmail.com> Co-authored-by: Funtowicz Morgan <mfuntowicz@users.noreply.github.com> Co-authored-by: Will Rice <wrice20@gmail.com> Co-authored-by: Jabin Huang <huangjipengnju@gmail.com> Co-authored-by: Jipeng Huang <jihuan@microsoft.com> Co-authored-by: SaulLu <lucilesaul.com@gmail.com> Co-authored-by: fcakyon <34196005+fcakyon@users.noreply.github.com>	2021-07-24 11:31:42 -04:00
Patrick von Platen	f6e254474c	[Sequence Feature Extraction] Add truncation (#12804 ) * fix_torch_device_generate_test * remove @ * add truncate * finish * correct test * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * clean tests * correct normalization for truncation * remove casting * up * save intermed * finish * finish * correct Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2021-07-23 17:53:30 +02:00
Stas Bekman	98364ea74f	[tests] fix logging_steps requirements (#12860 )	2021-07-23 08:05:48 -07:00
Nicolas Patry	795c1444e9	Improving pipeline tests (#12784 ) * Proposal * Testing pipelines slightly better. - Overall same design - Metaclass to get proper different tests instead of subTest (not well supported by Pytest) - Added ANY meta object to make output checking more readable. - Skipping architectures either without tiny_config or without architecture. * Small fix. * Fixing the tests in case of None value. * Oups. * Rebased with more architectures. * Fixing reformer tests (no override anymore). * Adding more options for model tester config. Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>	2021-07-22 15:19:35 +02:00
Sylvain Gugger	786ced3639	Add versioning system to fast tokenizer files (#12713 ) * Add versioning system to fast tokenizer files * Deal with offline mode * Use staging env in tests * Style * Apply suggestions from code review Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Style Co-authored-by: Lysandre Debut <lysandre@huggingface.co>	2021-07-21 08:24:36 -04:00
Lysandre Debut	c3d9ac7607	Expose get_config() on ModelTesters (#12812 ) * Expose get_config() on ModelTesters * Typo	2021-07-21 04:13:11 -04:00
Stas Bekman	cabcc75171	[trainer] sanity checks for `save_steps=0|None` and `logging_steps=0` (#12796 ) * [trainer] fix % 0 * sanity checks * fix logging_strategy * correction * Update src/transformers/training_args.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2021-07-20 09:05:26 -07:00
Sylvain Gugger	0118ef89ee	Enforce eval and save strategies are compatible when --load_best_model_at_end (#12786 ) * Enforce eval and save strategies are compatible when --load_best_model_at_end * Update doc * Fix typos * Fix tests	2021-07-19 19:50:47 +02:00
Tomohiro Endo	08d609bfb8	Add tokenizers class mismatch detection between `cls` and checkpoint (#12619 ) * Detect mismatch by analyzing config * Fix comment * Fix import * Update src/transformers/tokenization_utils_base.py Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com> * Revise based on reviews * remove kwargs * Fix exception * Fix handling exception again * Disable mismatch test in PreTrainedTokenizerFast Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>	2021-07-17 15:52:21 +02:00
Patrick von Platen	b4b562d834	[Wav2Vec2] Padded vectors should not allowed to be sampled (#12764 ) * fix_torch_device_generate_test * remove @ * finish * correct script * correct script	2021-07-16 19:07:08 +02:00
SaulLu	6e87010060	Preserve `list` type of `additional_special_tokens` in `special_token_map` (#12759 ) * preserve type of `additional_special_tokens` in `special_token_map` * format * Update src/transformers/tokenization_utils_base.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2021-07-16 18:26:54 +02:00
Funtowicz Morgan	fbf1397bf8	Turn on eval mode when exporting to ONNX (#12758 ) * Set model in eval mode when exporting to ONNX. * Disable t5 for now. * Disable T5 with past too. * Style.	2021-07-16 15:09:15 +02:00
Patrick von Platen	2e9fb13fb1	[Wav2Vec2] Correctly pad mask indices for PreTraining (#12748 ) * fix_torch_device_generate_test * remove @ * start adding tests * correct wav2vec2 pretraining * up * up Co-authored-by: Patrick von Platen <patrick@huggingface.co>	2021-07-15 21:40:25 +01:00
Lysandre Debut	959d448b3f	Fix led torchscript (#12735 ) * Don't test LED on torchscript * Typo	2021-07-15 11:48:50 -04:00
Lysandre Debut	f03580fb02	Fix DETR integration test (#12734 )	2021-07-15 11:48:37 -04:00
Lysandre Debut	f42d9dcc0e	Patch T5 device test (#12742 )	2021-07-15 16:40:17 +01:00
Lysandre Debut	370be9cc38	Fix MBart failing test (#12737 )	2021-07-15 16:39:35 +01:00
Lysandre Debut	eb2e006b35	Skip test while the model is not available (#12740 )	2021-07-15 09:14:12 -04:00
Lysandre Debut	8c7bd1b97b	Skip test while the model is not available (#12739 )	2021-07-15 09:06:47 -04:00
Lysandre Debut	3290315a2a	Fix AutoModel tests (#12733 )	2021-07-15 09:06:12 -04:00
Lysandre Debut	01cb2f25e3	LXMERT integration test typo (#12736 )	2021-07-15 08:29:49 -04:00
Stas Bekman	a18a17d2b6	[test] split test into 4 sub-tests to avoid timeout (#12710 ) * split the test into 4 sub-tests to avoid timeout * fix decorator order	2021-07-14 13:04:58 -07:00
Sylvain Gugger	084873b025	Only test the files impacted by changes in the diff (#12644 ) * Base test * More test * Fix mistake * Add a docstring change * Add doc ignore * Add changes * Add recursive dep search * Add recursive dep search * save * Finalize test mapping * Fix bug * Print prettier * Ignore comments and empty lines * Make script runnable from anywhere * Need dev install * Like that * Adapt * Add as artifact * Try on torch tests * Fix yaml error * Install GitPython * Apply everywhere * Be more defensive * Revert to all tests if something is wrong * Install GitPython * Test if there are tests before launching. * Fixes * Fixes * Fixes * Fixes * Bash syntax is horrible * Be less stupid * Try differently * Typo * Typo * Typo * Style * Better name * Escape quotes * Ignore black unhelpful re-formatting * Not a docstring * Deal with inits in dependency map * Run all tests once PR is merged. * Add last job * Apply suggestions from code review Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> * Stronger dependencies gather * Ignore empty lines too! * Clean up * Fix quality Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>	2021-07-14 10:56:55 -04:00
Stas Bekman	5dd0c956a8	non-native optimizers are mostly ok with zero-offload (#12690 )	2021-07-13 20:18:51 -07:00
Stas Bekman	78f5fe1416	[Deepspeed] adapt multiple models, add zero_to_fp32 tests (#12477 ) * zero_to_fp32 tests * args change * remove unnecessary work * use transformers.trainer_utils.get_last_checkpoint * document the new features * cleanup * wip * fix fsmt * add bert * cleanup * add xlm-roberta * electra works * cleanup * sync * split off the model zoo tests * cleanup * cleanup * cleanup * cleanup * reformat * cleanup * casing * deepspeed>=0.4.3 * adjust distilbert * Update docs/source/main_classes/deepspeed.rst Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * style Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2021-07-13 12:07:32 -07:00
Patrick von Platen	cee2d2135f	[Flax Generation] Correct inconsistencies PyTorch/Flax (#12662 ) * fix_torch_device_generate_test * remove @ * correct greedy search * save intertmed * add final logits bias * correct * up * add more tests * fix another bug * finish tests * finish marian tests * up Co-authored-by: Patrick von Platen <patrick@huggingface.co>	2021-07-13 18:53:30 +01:00
Sylvain Gugger	90178b0cef	Add option to load a pretrained model with mismatched shapes (#12664 ) * Add option to load a pretrained model with mismatched shapes * Fail at loading when mismatched shapes in Flax * Fix tests * Update src/transformers/modeling_flax_utils.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Address review comments Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2021-07-13 10:15:15 -04:00
Lysandre Debut	9da1acaea2	*encode_plus() shouldn't run for W2V2CTC (#12655 ) *encode_plus() shouldn't run for W2V2CTC Typo	2021-07-13 06:31:56 -04:00
Lysandre Debut	a6938c4721	Patch BigBird tokenization test (#12653 )	2021-07-13 02:53:06 -04:00
Lysandre Debut	b189226e8c	Fix transfo xl integration test (#12652 ) * Cleanup test * Skip TF TransfoXL test	2021-07-12 11:51:35 -04:00
Lysandre Debut	fd41e2daf4	Pipeline should be agnostic (#12656 )	2021-07-12 11:42:59 -04:00
Lysandre Debut	fb5665b5ad	The extended trainer tests should require torch (#12650 )	2021-07-12 09:47:05 -04:00
Lysandre Debut	0af8579bbe	Skip TestMarian_MT_EN (#12649 ) * Skip TestMarian_MT_EN * Skip EN_ZH and EN_ROMANCE * Skip EN_ROMANCE pipeline	2021-07-12 09:11:32 -04:00
Will Rice	fb65f65ea6	Add TFHubertModel (#12206 ) * TFHubert * Update with TFWav2Vec Bug Fixes * Add OOV Error * Feedback changes * Fix kwargs call	2021-07-09 18:55:25 +01:00
Alex Hedges	e7f33e8cb3	Pass `model_kwargs` when loading a model in `pipeline()` (#12449 ) * Pass model_kwargs when loading a model in pipeline * Add test for model_kwargs parameter of pipeline() * Rewrite test to not download model * Fix failing style checks	2021-07-09 09:24:55 -04:00
Patrick von Platen	65e27215ba	[Flax] Add flax marian (#12595 ) * fix_torch_device_generate_test * remove @ * add marian * finish make style * add model * add docs * add test * add integration tests * up * solve bug * correct tests * correct some tests * Apply suggestions from code review Co-authored-by: Suraj Patil <surajp815@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * correct adapt marian * finish Co-authored-by: Patrick von Platen <patrick@huggingface.co> Co-authored-by: Suraj Patil <surajp815@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2021-07-09 11:42:13 +01:00
Nicolas Patry	cc12e1dbf6	This will reduce "Already borrowed error": (#12550 ) * This will reduce "Already borrowed error": Original issue https://github.com/huggingface/tokenizers/issues/537 The original issue is caused by transformers calling many times mutable functions on the rust tokenizers. Rust needs to guarantee that only 1 agent has a mutable reference to memory at a given time (for many reasons which don't need explaining here). Usually, the rust compiler can guarantee that this property is true at compile time. Unfortunately, this is impossible for Python to do that, so PyO3, the bridge between rust and python used by `tokenizers`, will change the compile guarantee for a dynamic guarantee, so if multiple agents try to have multiple mutable borrows at the same time, then the runtime will yell with "Already borrowed". The proposed fix here in transformers, is simply to reduce the actual number of calls that really need mutable borrows. By reducing them, we reduce the risk of running into "Already borrowed" error. The caveat is now we add a call to read the current configuration of the `_tokenizer`, so worst case we have 2 calls instead of 1, and best case we simply have 1 + a Python comparison of a dict (should be negligible). * Adding a test. * trivial error :(. * Update tests/test_tokenization_fast.py Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com> * Adding reference to original issues in the tests. * Update the tests with fast tokenizer. Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>	2021-07-09 09:36:05 +02:00
Nicolas Patry	4da568c152	Fixing the pipeline optimization by reindexing targets (V2) (#12330 ) * Fixing the pipeline optimization by rescaling the logits first. * Add test for target equivalence Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>	2021-07-08 16:58:15 +02:00
Funtowicz Morgan	2aa3cd935d	[RFC] Laying down building stone for more flexible ONNX export capabilities (#11786 ) * Laying down building stone for more flexible ONNX export capabilities * Ability to provide a map of config key to override before exporting. * Makes it possible to export BART with/without past keys. * Supports simple mathematical syntax for OnnxVariable.repeated * Effectively apply value override from onnx config for model * Supports export with additional features such as with-past for seq2seq * Store the output path directly in the args for uniform usage across. * Make BART_ONNX_CONFIG_* constants and fix imports. * Support BERT model. * Use tokenizer for more flexibility in defining the inputs of a model. * Add TODO as remainder to provide the batch/sequence_length as CLI args * Enable optimizations to be done on the model. * Enable GPT2 + past * Improve model validation with outputs containing nested structures * Enable Roberta * Enable Albert * Albert requires opset >= 12 * BERT-like models requires opset >= 12 * Remove double printing. * Enable XLM-Roberta * Enable DistilBERT * Disable optimization by default * Fix missing setattr when applying optimizer_features * Add value field to OnnxVariable to define constant input (not from tokenizers) * Add T5 support. * Simplify model type retrieval * Example exporting token_classification pipeline for DistilBERT. * Refactoring to package `transformers.onnx` * Solve circular dependency & __main__ * Remove unnecessary imports in `__init__` * Licences * Use @Narsil's suggestion to forward the model's configuration to the ONNXConfig to avoid interpolation. * Onnx export v2 fixes (#12388) * Tiny fixes Remove `convert_pytorch` from onnxruntime-less runtimes Correct reference to model * Style * Fix Copied from * LongFormer ONNX config. * Removed optimizations * Remvoe bad merge relicas. * Remove unused constants. * Remove some deleted constants from imports. * Fix unittest to remove usage of PyTorch model for onnx.utils. * Fix distilbert export * Enable ONNX export test for supported model. * Style. * Fix lint. * Enable all supported default models. * GPT2 only has one output * Fix bad property name when overriding config. * Added unittests and docstrings. * Disable with_past tests for now. * Enable outputs validation for default export. * Remove graph opt lvls. * Last commit with on-going past commented. * Style. * Disabled `with_past` for now * Remove unused imports. * Remove framework argument * Remove TFPreTrainedModel reference * Add documentation * Add onnxruntime tests to CircleCI * Add test * Rename `convert_pytorch` to `export` * Use OrderedDict for dummy inputs * WIP Wav2Vec2 * Revert "WIP Wav2Vec2" This reverts commit f665efb04c92525c3530e589029f0ae7afdf603e. * Style * Use OrderedDict for I/O * Style. * Specify OrderedDict documentation. * Style :) Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr> Co-authored-by: Lysandre Debut <lysandre@huggingface.co>	2021-07-08 10:54:42 -04:00
Nicolas Patry	ebc69afc30	Adding support for `pipeline("automatic-speech-recognition")`. (#11525 ) * Adding support for `pipeline("automatic-speech-recognition")`. - Ugly `"config"` choice for AutoModel. It would be great to have the possibility to have something like `AutoModelFor` that would implement the same logic (Load the config, check Architectures and load the first one) * Remove `model_id` was not needed in the end. * Rebased ! * Remove old code. * Rename `nlp`.	2021-07-07 16:06:48 +02:00
Daniel Stancl	61400e1ec7	[Flax] Add FlaxMBart (#12236 ) * Copy BART to MBart and rename some stuff * Add copy statements pointing to FlaxBart * Update/add some common files * Update shift_tokens_rigth + fix imports * Fix shift_tokens_right method according to MBart implementation * Update shift_tokens_right in tests accordingly * Fix the import issue and update docs file * make style quality * Do some minor changes according to patil-suraj suggestions * Change the order of normalization layer and attention * Add some copu statementes * Update generate method and add integration test for mBart * Make a few updates after a review Besides, add `lang_code_to_id` to MBartTokenizeFast * fix-copies; make style quality * Apply suggestions from code review * Apply suggestions from code review * Apply suggestions from code review * fix output type, style * add copied from * resolve conflicts Co-authored-by: Suraj Patil <surajp815@gmail.com>	2021-07-07 12:20:38 +05:30
sadakmed	3fd85777ea	implementing tflxmertmodel integration test (#12497 ) * implementing tflxmertmodel integration test * move import * revert and fix	2021-07-06 11:44:47 -04:00
Suraj Patil	7a259c190c	FlaxGPTNeo (#12493 ) * flax gpt neo * fix query scaling * update generation test * use flax model for test	2021-07-06 18:55:18 +05:30
yujun	626a0a0147	[RoFormer] Fix some issues (#12397 ) * add RoFormerTokenizerFast into AutoTokenizer * fix typo in roformer docs * make onnx export happy * update RoFormerConfig embedding_size * use jieba not rjieba * fix 12244 and make test_alignement passed * update ARCHIVE_MAP * make style & quality & fixup * update * make style & quality & fixup * make style quality fixup * update * suggestion from LysandreJik Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * make style * use rjieba Co-authored-by: Lysandre Debut <lysandre@huggingface.co>	2021-07-06 03:31:57 -04:00
sadakmed	0e1718afb6	create LxmertModelIntegrationTest Pytorch (#9989 ) * create LxmertModelIntegrationTest * implementation using numpy seeding to fix inputs params. * fix code quality * isort check	2021-07-05 05:21:25 -04:00
Lysandre Debut	b889d3f6c4	Fix TAPAS test uncovered by #12446 (#12480 )	2021-07-02 04:35:10 -04:00
Stas Bekman	2d1d92181a	[roberta] fix lm_head.decoder.weight ignore_key handling (#12446 ) * fix lm_head.decoder.weight ignore_key handling * fix the mutable class variable * Update src/transformers/models/roberta/modeling_roberta.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * replicate the comment * make deterministic Co-authored-by: Lysandre Debut <lysandre@huggingface.co>	2021-07-01 10:31:19 -07:00
Patrick von Platen	27d348f2fe	[Wav2Vec2, Hubert] Fix ctc loss test (#12458 ) * fix_torch_device_generate_test * remove @ * fix test	2021-07-01 08:59:32 -04:00
SaulLu	3aa37b945e	Add test for a WordLevel tokenizer model (#12437 ) * add a test for a WordLevel tokenizer * adapt common test to new tokenizer	2021-07-01 12:37:07 +02:00
Patrick von Platen	0d1f67e651	[Flax] Add wav2vec2 (#12271 ) * fix_torch_device_generate_test * remove @ * start flax wav2vec2 * save intermediate * forward pass has correct shape * add weight norm * add files * finish ctc * make style * finish gumbel quantizer * correct docstrings * correct some more files * fix vit * finish quality * correct tests * correct docstring * correct tests * start wav2vec2 pretraining script * save intermediate * start pretraining script * finalize pretraining script * finish * finish * small typo * finish * correct * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Suraj Patil <surajp815@gmail.com> * make style * push Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Suraj Patil <surajp815@gmail.com>	2021-06-30 18:44:23 +01:00
NielsRogge	6e68597877	Add CANINE (#12024 ) * First pass * More progress * Add support for local attention * More improvements * More improvements * Conversion script working * Add CanineTokenizer * Make style & quality * First draft of integration test * Remove decoder test * Improve tests * Add documentation * Mostly docs improvements * Add CanineTokenizer tests * Fix most tests on GPU, improve upsampling projection * Address most comments by @dhgarrette * Remove decoder logic * Improve Canine tests, improve docs of CanineConfig * All tokenizer tests passing * Make fix-copies and fix tokenizer tests * Fix test_model_outputs_equivalence test * Apply suggestions from @sgugger's review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Address some more comments * Add support for hidden_states and attentions of shallow encoders * Define custom CanineModelOutputWithPooling, tests pass * First pass * More progress * Add support for local attention * More improvements * More improvements * Conversion script working * Add CanineTokenizer * Make style & quality * First draft of integration test * Remove decoder test * Improve tests * Add documentation * Mostly docs improvements * Add CanineTokenizer tests * Fix most tests on GPU, improve upsampling projection * Address most comments by @dhgarrette * Remove decoder logic * Improve Canine tests, improve docs of CanineConfig * All tokenizer tests passing * Make fix-copies and fix tokenizer tests * Fix test_model_outputs_equivalence test * Apply suggestions from @sgugger's review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Address some more comments * Make conversion script work for Canine-c too * Fix tokenizer tests * Remove file Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2021-06-30 08:05:44 -04:00
Sylvain Gugger	c9486fd0f5	Fix default bool in argparser (#12424 ) * Fix default bool in argparser * Add more to test	2021-06-30 07:57:05 -04:00
Sylvain Gugger	dc42e770b8	Easily train a new fast tokenizer from a given one (#12361 ) * [WIP] Easily train a new fast tokenizer from a given one * Fix test * Roll out to other tokenizers and add tests * Fix bug with unk id and add emoji to test * Really use something different in test * Implement special tokens map * Map special tokens in the Transformers tokenizers * Fix test * Make test more robust * Fix test for BPE * More robust map and test Co-authored-by SaulLu * Test file * Stronger tests Co-authored-by: SaulLu <lucilesaul.com@gmail.com> * Map unk token for Wordpiece and address review comment * Fix lowercase test and address review comment * Fix all tests * Simplify test * Fix tests for realsies * Easily train a new fast tokenizer from a given one - tackle the special tokens format (str or AddedToken) (#12420) * Propose change in tests regarding lower case * add new test for special tokens types * put back the test part about decoding * add feature: the AddedToken is re-build with the different mapped content * Address review comment: simplify AddedToken building Co-authored-by: sgugger <sylvain.gugger@gmail.com> * Update src/transformers/tokenization_utils_fast.py Co-authored-by: sgugger <sylvain.gugger@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: SaulLu <lucilesaul.com@gmail.com> Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>	2021-06-29 15:00:08 -04:00
Will Rice	bc084938f2	Add out of vocabulary error to ASR models (#12288 ) * Add OOV error to ASR models * Feedback changes	2021-06-29 08:57:46 +01:00
NielsRogge	1fc6817a30	Rename detr targets to labels (#12280 ) * Rename target to labels in DetrFeatureExtractor * Update DetrFeatureExtractor tests accordingly * Improve docs of DetrFeatureExtractor * Improve docs * Make style	2021-06-29 03:07:46 -04:00
Stas Bekman	7682e97702	[models] respect dtype of the model when instantiating it (#12316 ) * [models] respect dtype of the model when instantiating it * cleanup * cleanup * rework to handle non-float dtype * fix * switch to fp32 tiny model * improve * use dtype.is_floating_point * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * fix the doc * recode to use explicit torch_dtype_auto_detect, torch_dtype args * docs and tweaks * docs and tweaks * docs and tweaks * merge 2 args, add docs * fix * fix * better doc * better doc Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2021-06-28 20:11:21 -07:00
Bhadresh Savani	04dbea31a9	[Examples] Added context manager to datasets map (#12367 ) * added cotext manager to datasets map * fixed style and spaces * fixed warning of deprecation * changed desc	2021-06-28 09:14:00 -07:00
Stas Bekman	4a872caef4	remove extra white space from log format (#12360 )	2021-06-25 13:20:14 -07:00
Lysandre Debut	8ef62ec9e1	Fix torchscript tests (#12336 ) * Fix torchscript tests * Better test * Remove bogus print	2021-06-24 09:52:28 -04:00
Michael Benayoun	986ac03e37	changed modeling_fx_utils.py to utils/fx.py for clarity (#12326 ) Co-authored-by: Michael Benayoun <michael@huggingface.co>	2021-06-23 18:16:24 +02:00
Lysandre	941b4442ba	Temporarily revert the `fill-mask` improvements.	2021-06-23 17:46:24 +02:00
Sylvain Gugger	53c60babe4	Clean push to hub API (#12187 ) * Clean push to hub API * Create working dir if it does not exist * Different tweak * New API + all models + test Flax * Adds the Trainer clean up * Update src/transformers/file_utils.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Address review comments * (nit) output types * No need to set clone_from when folder exists * Update src/transformers/trainer.py Co-authored-by: Julien Chaumond <julien@huggingface.co> * Add generated_from_trainer tag * Update to new version * Fixes Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Julien Chaumond <julien@huggingface.co> Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>	2021-06-23 10:11:19 -04:00
Vasudev Gupta	e98233dde1	Flax T5 (#12150 ) * copy pytorch-t5 * init * boom boom * forward pass same * make generation work * add more tests * make test work * finish normal tests * make fix-copies * finish quality * correct slow example * correct slow test * version table * upload models * Update tests/test_modeling_flax_t5.py * correct incorrectly deleted line Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Patrick von Platen <patrick@huggingface.co>	2021-06-23 13:13:32 +01:00
Daniel Stancl	26a2e36595	Add output in a dictionary for TF `generate` method (#12139 ) * Add output args to greedy search * Fix critical typo + make style quality * Handle generate_beam_search * Add dict_specific tests and fix the placement of encoder outputs * Add specific outputs * Update doc * Fix typo * Adjust handling encoder_outputs + Fix generating for T5 * Fix generate for RAG * Fix handling ouptut_attentions when target_mapping is not None Take care of situations when target_mapping is provided as there are 2-tuple of attentions Change from: if inputs["output_attentions"]: attentions = tuple(tf.transpose(t, perm(2, 3, 0, 1)) for t in attentions) to: if inputs["output_attentions"]: if inputs["target_mapping"] is not None: # when target_mapping is provided, there are 2-tuple of attentions attentions = tuple( tuple(tf.transpose(attn_stream, perm=(2, 3, 0, 1)) for attn_stream in t) for t in attentions ) else: attentions = tuple(tf.transpose(t, perm=(2, 3, 0, 1)) for t in attentions) * Rename kwargs to model_kwargs * make style quality * Move imports in test_modeling_tf_common.py Move ModelOutput-related imports in test_modeling_tf_common.py into the `is_tf_available():` statement. * Rewrite nested if-statements * Fix added tests	2021-06-23 10:52:11 +01:00
Nicolas Patry	d4be498441	Optimizing away the `fill-mask` pipeline. (#12113 ) * Optimizing away the `fill-mask` pipeline. - Don't send anything to the tokenizer unless needed. Vocab check is much faster - Keep BC by sending data to the tokenizer when needed. User handling warning messages will see performance benefits again - Make `targets` and `top_k` work together better `top_k` cannot be higher than `len(targets)` but can be smaller still. - Actually simplify the `target_ids` in case of duplicate (it can happen because we're parsing raw strings) - Removed useless code to fail on empty strings. It works only if empty string is in first position, moved to ignoring them instead. - Changed the related tests as only the tests would fail correctly (having incorrect value in first position) * Make tests compatible for 2 different vocabs... (at the price of a warning). Co-authored-by: @EtaoinWu * ValueError working globally * Update src/transformers/pipelines/fill_mask.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * `tokenizer.vocab` -> `tokenizer.get_vocab()` for more compatiblity + fallback. Co-authored-by: Lysandre Debut <lysandre@huggingface.co>	2021-06-23 10:38:04 +02:00
Stas Bekman	ebe5413589	[trainer] 2 bug fixes and a rename (#12309 ) * bug fixes and a rename * add extended DDP test	2021-06-22 11:13:23 -07:00
Stas Bekman	0d97ba8a98	[tests] multiple improvements (#12294 ) * [tests] multiple improvements * cleanup * style * todo to investigate * fix	2021-06-21 19:51:36 -07:00
Stas Bekman	dad414d5f9	[trainer + examples] set log level from CLI (#12276 ) * set log level from CLI * add log_level_replica + test + extended docs * cleanup * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * rename datasets objects to allow datasets module * improve the doc * style * doc improve Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2021-06-21 19:30:50 -07:00
Stas Bekman	a4ed074d4b	reset report_to to none, avoid deprecation warning (#12293 )	2021-06-21 16:50:12 -07:00
Patrick von Platen	4e9a6796c7	[Flax] Fix flax test save pretrained (#12256 ) * fix_torch_device_generate_test * remove @ * fix flax save pretrained test	2021-06-21 16:37:13 +01:00
Suraj Patil	eb881674f2	[Flax] [WIP] allow loading head model with base model weights (#12255 ) * boom boom * remove flax clip example * allow loading head model with base model weights * add test * fix imports * disable save, load test for clip * add test_save_load_to_base	2021-06-21 15:56:42 +01:00
Suraj Patil	8d5b7f36e5	[FlaxClip] fix test from/save pretrained test (#12284 ) * boom boom * remove flax clip example * fix from_save_pretrained	2021-06-21 15:54:34 +01:00
Sylvain Gugger	adb70eda4d	AutoTokenizer: infer the class from the tokenizer config if possible (#12208 ) * AutoTokenizer: infer the class from the tokenizer config if possible * Add tests * Update src/transformers/models/auto/tokenization_auto.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2021-06-17 12:39:22 -04:00
Lysandre Debut	b56848c8c8	Pipeline update & tests (#12207 )	2021-06-17 09:41:16 +02:00
Patrick von Platen	ccca510276	Hubert (#11889 ) * fix_torch_device_generate_test * remove @ * add hubert * add first test file * more docs * fix bugs * fix bug * finish * finish * finish docstring * fix * fix * finalize * add to ignored * finish * Apply suggestions from code review * correct naming * finish * fix auto config * finish * correct convert script * Apply suggestions from code review Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Suraj Patil <surajp815@gmail.com> * apply suggestions lysandre & suraj Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Suraj Patil <surajp815@gmail.com>	2021-06-16 12:14:12 +01:00

1226 Commits