mirror of
https://github.com/huggingface/transformers.git
synced 2025-07-06 06:10:04 +06:00
d12bbe4942
842 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
![]() |
d12bbe4942 | Release: v4.10.0 | ||
![]() |
854260ca44
|
TF/Numpy variants for all DataCollator classes (#13105)
* Adding a TF variant of the DataCollatorForTokenClassification to get feedback * Added a Numpy variant and a post_init check to fail early if a missing import is found * Fixed call to Numpy variant * Added a couple more of the collators * Update src/transformers/data/data_collator.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Fixes, style pass, finished DataCollatorForSeqToSeq * Added all the LanguageModeling DataCollators, except SOP and PermutationLanguageModeling * Adding DataCollatorForPermutationLanguageModeling * Style pass * Add missing `__call__` for PLM * Remove `post_init` checks for frameworks because the imports inside them were making us fail code quality checks * Remove unused imports * First attempt at some TF tests * A second attempt to make any of those tests actually work * TF tests, round three * TF tests, round four * TF tests, round five * TF tests, all enabled! * Style pass * Merging tests into `test_data_collator.py` * Merging tests into `test_data_collator.py` * Fixing up test imports * Fixing up test imports * Trying shuffling the conditionals around * Commenting out non-functional old tests * Completed all tests for all three frameworks * Style pass * Fixed test typo * Style pass * Move standard `__call__` method to mixin * Rearranged imports for `test_data_collator` * Fix data collator typo "torch" -> "pt" * Fixed the most embarrassingly obvious bug * Update src/transformers/data/data_collator.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Renaming mixin * Updating docs Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Dalton Walker <dalton_walker@icloud.com> Co-authored-by: Andrew Romans <andrew.romans@hotmail.com> |
||
![]() |
180c6de6a6
|
docs: fix minor typo (#13289)
`at` should be `a1` |
||
![]() |
066fd047cc
|
correct TP implementation resources (#13248)
fix a few implementation links |
||
![]() |
3efcfeab67
|
Deberta_v2 tf (#13120)
* Deberta_v2 tf * added new line at the end of file, make style * +V2, typo * remove never executed branch of code * rm cmnt and fixed typo in url filter * cleanup according to review comments * added #Copied from |
||
![]() |
286ccefb48
|
doc mismatch fixed (#13345) | ||
![]() |
41c559415a
|
Add GPT2ForTokenClassification (#13290)
* Add GPT2ForTokenClassification * Fix dropout exception for GPT2 NER * Remove sequence label in test * Change TokenClassifierOutput to TokenClassifierOutputWithPast * Fix for black formatter * Remove dummy * Update docs for GPT2ForTokenClassification * Fix check_inits ci fail * Update dummy_pt_objects after make fix-copies * Remove TokenClassifierOutputWithPast * Fix tuple input issue Co-authored-by: danielsejong55@gmail.com <danielsejong55@gmail.com> |
||
![]() |
11fbc32e3e
|
Fixing a typo in the data_collator documentation (#13309) | ||
![]() |
4ebe798ff2
|
Fix release utils (#13337)
* Fix release utils * Update docs/source/conf.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Lysandre Debut <lysandre@huggingface.co> |
||
![]() |
98e409abb3
|
albert flax (#13294)
* albert flax * year -> 2021 * docstring updated for flax * removed head_mask * removed from_pt * removed passing attention_mask to embedding layer |
||
![]() |
774760e6f3
|
distilbert-flax (#13324)
* distilbert-flax * added missing self * docs fix * removed tied kernal extra init * updated docs * x -> hidden states * removed head_mask * removed from_pt, +FLAX * updated year |
||
![]() |
01977466f4
|
fix: typo spelling grammar (#13212)
* fix: typo spelling grammar * fix: make fixup |
||
![]() |
b6ddb08a66
|
Add LayoutLMv2 + LayoutXLM (#12604)
* First commit * Make style * Fix dummy objects * Add Detectron2 config * Add LayoutLMv2 pooler * More improvements, add documentation * More improvements * Add model tests * Add clarification regarding image input * Improve integration test * Fix bug * Fix another bug * Fix another bug * Fix another bug * More improvements * Make more tests pass * Make more tests pass * Improve integration test * Remove gradient checkpointing and add head masking * Add integration test * Add LayoutLMv2ForSequenceClassification to the tests * Add LayoutLMv2ForQuestionAnswering * More improvements * More improvements * Small improvements * Fix _LazyModule * Fix fast tokenizer * Move sync_batch_norm to a separate method * Replace dummies by requires_backends * Move calculation of visual bounding boxes to separate method + update README * Add models to main init * First draft * More improvements * More improvements * More improvements * More improvements * More improvements * Remove is_split_into_words * More improvements * Simply tesseract - no use of pandas anymore * Add LayoutLMv2Processor * Update is_pytesseract_available * Fix bugs * Improve feature extractor * Fix bug * Add print statement * Add truncation of bounding boxes * Add tests for LayoutLMv2FeatureExtractor and LayoutLMv2Tokenizer * Improve tokenizer tests * Make more tokenizer tests pass * Make more tests pass, add integration tests * Finish integration tests * More improvements * More improvements - update API of the tokenizer * More improvements * Remove support for VQA training * Remove some files * Improve feature extractor * Improve documentation and one more tokenizer test * Make quality and small docs improvements * Add batched tests for LayoutLMv2Processor, remove fast tokenizer * Add truncation of labels * Apply suggestions from code review * Improve processor tests * Fix failing tests and add suggestion from code review * Fix tokenizer test * Add detectron2 CI job * Simplify CI job * Comment out non-detectron2 jobs and specify number of processes * Add pip install torchvision * Add durations to see which tests are slow * Fix tokenizer test and make model tests smaller * Frist draft * Use setattr * Possible fix * Proposal with configuration * First draft of fast tokenizer * More improvements * Enable fast tokenizer tests * Make more tests pass * Make more tests pass * More improvements * Addd padding to fast tokenizer * Mkae more tests pass * Make more tests pass * Make all tests pass for fast tokenizer * Make fast tokenizer support overflowing boxes and labels * Add support for overflowing_labels to slow tokenizer * Add support for fast tokenizer to the processor * Update processor tests for both slow and fast tokenizers * Add head models to model mappings * Make style & quality * Remove Detectron2 config file * Add configurable option to label all subwords * Fix test * Skip visual segment embeddings in test * Use ResNet-18 backbone in tests instead of ResNet-101 * Proposal * Re-enable all jobs on CI * Fix installation of tesseract * Fix failing test * Fix index table * Add LayoutXLM doc page, first draft of code examples * Improve documentation a lot * Update expected boxes for Tesseract 4.0.0 beta * Use offsets to create labels instead of checking if they start with ## * Update expected boxes for Tesseract 4.1.1 * Fix conflict * Make variable names cleaner, add docstring, add link to notebooks * Revert "Fix conflict" This reverts commit a9b46ce9afe47ebfcfe7b45e6a121d49e74ef2c5. * Revert to make integration test pass * Apply suggestions from @LysandreJik's review * Address @patrickvonplaten's comments * Remove fixtures DocVQA in favor of dataset on the hub Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr> |
||
![]() |
b6f332ecaf
|
Add Wav2Vec2 & Hubert ForSequenceClassification (#13153)
* Add hubert classifier + tests * Add hubert classifier + tests * Dummies for all classification tests * Wav2Vec2 classifier + ER test * Fix hubert integration tests * Add hubert IC * Pass tests for all classification tasks on Hubert * Pass all tests + copies * Move models to the SUPERB org |
||
![]() |
0759f2510c
|
Add DINO conversion script (#13265)
* First commit * Add interpolation of patch embeddings * Comment out code * Fix bug * Fix another bug * Fix bug * Fix another bug * Remove print statements * Update conversion script * Use the official vit implementation * Add support for converting dino_vits8 * Add DINO to docs of ViT * Remove assertion * Add interpolation of position encodings * Fix bug * Add align_corners * Add interpolate_pos_encoding option to forward pass of ViTModel * Improve interpolate_pos_encoding method * Add docstring |
||
![]() |
cf57447648
|
Fix broken links in Splinter documentation (#13237) | ||
![]() |
2e20c0f34a
|
Make Flax GPT2 working with cross attention (#13008)
* make flax gpt2 working with cross attention * Remove encoder->decoder projection layer * A draft (incomplete) for FlaxEncoderDecoderModel * Add the method from_encoder_decoder_pretrained + the docstrings * Fix the mistakes of using EncoderDecoderModel * Fix style * Add FlaxEncoderDecoderModel to the library * Fix cyclic imports * Add FlaxEncoderDecoderModel to modeling_flax_auto.py * Remove question comments * add tests for FlaxEncoderDecoderModel * add flax_encoder_decoder to the lists of ignored entries in check_repo.py * fix missing required positional arguments * Remove **kwargs when creating FlaxEncoderDecoderModel in from_encoder_decoder_pretrained() Also fix generation eos/pad tokens issue * Fix: Use sequences from the generated_output * Change a check from assert to raise ValueError * Fix examples and token ids issues * Fix missing all_cross_attentions when outputting tuple in modeling_gpt2 * Remove the changes in configuration docstrings. * allow for bert 2 gpt2 * make fix-copies * Apply suggestions from code review Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Change remaining examples to bert2gpt2 * Change the test to Bert2GPT2 * Fix examples * Fix import * Fix unpack bug * Rename to FlaxEncoderDecoderModelTest and change the test to bert2gpt2 * Apply suggestions from code review Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Fix: NotImplentedError -> NotImplementedError * Apply suggestions from code review Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * up * finalize Co-authored-by: ydshieh <ydshieh@user.noreply> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> |
||
![]() |
439a43b6b4
|
Add splinter (#12955)
* splinter template * initialize splinter classes * Splinter Tokenizer * splinter.rst * tokenization fixes * Documentation & some minor variable name changes * bug fix (added back question_token_id to config) + variable names * Minor bug fixes + variable name changes * Fix Splinter references after merge with new transformers * changes after running make style & quality * Fix documentation unindent * Fix doc indentation in tokenization_splinter * Fix also SplinterTokenizerFast * Add Splinter to index.rst and README * Fixdouble whitespace from index.rst * Fixed index.rst with 'make fix-copies' * Update docs/source/model_doc/splinter.rst Co-authored-by: Suraj Patil <surajp815@gmail.com> * Update docs/source/model_doc/splinter.rst Co-authored-by: Suraj Patil <surajp815@gmail.com> * Update docs/source/model_doc/splinter.rst Co-authored-by: Suraj Patil <surajp815@gmail.com> * Update docs/source/model_doc/splinter.rst Co-authored-by: Suraj Patil <surajp815@gmail.com> * Update src/transformers/models/splinter/__init__.py Co-authored-by: Suraj Patil <surajp815@gmail.com> * Added "copied from BERT" comments * Removing unnexessary code from modeling_splinter * Update README.md Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/splinter/configuration_splinter.py Co-authored-by: Suraj Patil <surajp815@gmail.com> * Remove references to TF modeling from splinter * Update src/transformers/models/splinter/modeling_splinter.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Remove unnecessary check * Update src/transformers/models/splinter/modeling_splinter.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Add differences between Splinter and Bert tokenizers * Update src/transformers/models/splinter/modeling_splinter.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/splinter/tokenization_splinter_fast.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Remove unnecessary check * Doc formatting * Update src/transformers/models/splinter/tokenization_splinter.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/splinter/tokenization_splinter.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * bug fix: remove load_tf_weights attribute * Some minor quality changes * Update docs/source/model_doc/splinter.rst Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/transformers/models/splinter/configuration_splinter.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Change FullyConnectedLayer to SplinterFullyConnectedLayer * Variable naming * Reove gather_positions function * Remove ClassificationHead as it's outdated * Update src/transformers/models/splinter/modeling_splinter.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Remove hardcoded 102 token id * Minor style change * Added "tau" organization to all model identifiers & URLS * Added tau to the tests as well * Copy-from comments * Removed all unnecessary classes (e.g. SplinterForMaskedLM) * Running make fix-copies * Bug fix: Further removed unnecessary classes * Add Splinter to AutoTokenization * Add an integration test for Splinter * Removed initialize_new_qass from config - It will be done through different checkpoints * Removed `initialize_new_qass` from documentation as well * Added new checkpoint names (`tau/splinter-base-qass` and same for large) in the code * Minor change to test * SplinterTokenizer now doesn't abstract from BertTokenizer * SplinterTokenizerFast also dosn't abstract from Bert * style and quality * bug fix: import ing torch in tests only if it's available * Auto mappings * Changed copyrights in Splinter's files * Update src/transformers/models/splinter/configuration_splinter.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: yuvalkirstain <kirstain.yuval@gmail.com> Co-authored-by: Suraj Patil <surajp815@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr> Co-authored-by: Lysandre Debut <lysandre@huggingface.co> |
||
![]() |
c066598c23
|
Fix frameworks table so it's alphabetical (#13118)
* Fix frameworks table so it's alphabetical * Update index.rst * Don't differentiate when sorting between upper and lower case |
||
![]() |
bda1cb0236
|
Fix VisualBERT docs (#13106)
* Fix VisualBERT docs * Show example notebooks as lists * Fix style |
||
![]() |
9a498c37a2
|
Rely on huggingface_hub for common tools (#13100)
* Remove hf_api module and use hugginface_hub * Style * Fix to test_fetcher * Quality |
||
![]() |
f176fbf588 | Fix doc building error | ||
![]() |
d329b63369
|
Deberta tf (#12972)
* TFDeberta moved weights to build and fixed name scope added missing , bug fixes to enable graph mode execution updated setup.py fixing typo fix imports embedding mask fix added layer names avoid autmatic incremental names +XSoftmax cleanup added names to layer disable keras_serializable Distangled attention output shape hidden_size==None using symbolic inputs test for Deberta tf make style Update src/transformers/models/deberta/modeling_tf_deberta.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Update src/transformers/models/deberta/modeling_tf_deberta.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Update src/transformers/models/deberta/modeling_tf_deberta.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Update src/transformers/models/deberta/modeling_tf_deberta.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Update src/transformers/models/deberta/modeling_tf_deberta.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Update src/transformers/models/deberta/modeling_tf_deberta.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Update src/transformers/models/deberta/modeling_tf_deberta.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> removed tensorflow-probability removed blank line * removed tf experimental api +torch_gather tf implementation from @Rocketknight1 * layername DeBERTa --> deberta * copyright fix * added docs for TFDeberta & make style * layer_name change to fix load from pt model * layer_name change as pt model * SequenceClassification layername change, to same as pt model * switched to keras built-in LayerNormalization * added `TFDeberta` prefix most layer classes * updated to tf.Tensor in the docstring |
||
![]() |
53b38d6269
|
Doctests job (#13088)
* Doctests * Limit to 4 decimals * Try with separate PT/TF tests * Remove test for TF * Ellips the predictions * Doctest continue on failure Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com> |
||
![]() |
83424ade1a
|
[Doctest] Setup, quicktour and task_summary (#13078)
* Fix doctests for quicktour * Adapt causal LM exemple * Remove space * Fix until summarization * End of task summary * Style * With last changes in quicktour |
||
![]() |
3157fa3c53
|
docs: add HuggingArtists to community notebooks (#13050)
* Adding HuggingArtists to Community Notebooks * Adding HuggingArtists to Community Notebooks * Adding HuggingArtists to Community Notebooks * docs: add HuggingArtists to community notebooks Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> |
||
![]() |
76cadb7943
|
replace tgt_lang by tgt_text (#13061) | ||
![]() |
a8bf2fa76e | Documentation for patch v4.9.2 | ||
![]() |
5008e08885
|
Add to ONNX docs (#13048)
* Add to ONNX docs * Add MBART example * Update docs/source/serialization.rst Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> |
||
![]() |
8aa01d2a6d
|
Create perplexity.rst (#13004)
Updating the import for load_dataset |
||
![]() |
83e5a10603
|
Add BEiT (#12994)
* First pass * Make conversion script work * Improve conversion script * Fix bug, conversion script working * Improve conversion script, implement BEiTFeatureExtractor * Make conversion script work based on URL * Improve conversion script * Add tests, add documentation * Fix bug in conversion script * Fix another bug * Add support for converting masked image modeling model * Add support for converting masked image modeling * Fix bug * Add print statement for debugging * Fix another bug * Make conversion script finally work for masked image modeling models * Move id2label for datasets to JSON files on the hub * Make sure id's are read in as integers * Add integration tests * Make style & quality * Fix test, add BEiT to README * Apply suggestions from @sgugger's review * Apply suggestions from code review * Make quality * Replace nielsr by microsoft in tests, add docs * Rename BEiT to Beit * Minor fix * Fix docs of BeitForMaskedImageModeling Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> |
||
![]() |
fbf468b057
|
[Flax] Correct flax docs (#12782)
* fix_torch_device_generate_test * remove @ * fix flax docs * correct more docs in flax * another correction * fix flax docs * Apply suggestions from code review |
||
![]() |
a317e6c3be
|
[Flax] Correctly Add MT5 (#12988)
* finish PR * finish mt5 * push * up * Update tests/test_modeling_flax_mt5.py Co-authored-by: Suraj Patil <surajp815@gmail.com> Co-authored-by: Suraj Patil <surajp815@gmail.com> |
||
![]() |
8ff619d95e
|
Add multilingual documentation support (#12952)
* Add multilingual documentation support * Add multilingual documentation support * make style * make style * revert |
||
![]() |
a492aec82d | Update doc | ||
![]() |
434022adac
|
Add RemBERT model code to huggingface (#10692)
* Faster list concat for trainer_pt_utils.get_length_grouped_indices() (#11825)
get_length_grouped_indices() in LengthGroupedSampler and DistributedLengthGroupedSampler
is prohibitively slow for large number of megabatches (in test case takes hours for ~270k
megabatches with 100 items each) due to slow list concatenation with sum(megabatches, []).
Resolves: #11795
Co-authored-by: ctheodoris <cvtheodo@ds.dfci.harvard.edu>
* Replace double occurrences as the last step (#11367)
* [Flax] Fix PyTorch import error (#11839)
* fix_torch_device_generate_test
* remove @
* change pytorch import to flax import
* Fix reference to XLNet (#11846)
* Switch mem metrics flag (#11851)
* Switch mem metrics flag
* Update src/transformers/training_args.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Fix flos single node (#11844)
* fixing flos bug/typo in non-distributed setting
* storing flos every logging_interval
* Fix two typos in docs (#11852)
* typo2
* fix typo
* [Trainer] Report both steps and num samples per second (#11818)
* [Trainer] Report both steps and num samples per second
* Fix batch number
* Update src/transformers/trainer_utils.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Address review comments
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Add some tests to the slow suite #11860
* Enable memory metrics in tests that need it (#11859)
* fixed a small typo in the doc (#11856)
* typo (#11858)
* Add option to log only once in multinode training (#11819)
* Add option to long only once in multinode training
* Use an alternate property
* [Wav2Vec2] SpecAugment Fast (#11764)
* first try
* finish
* [lm examples] fix overflow in perplexity calc (#11855)
* fix overflow in perplexity calc
* use inf
* fix
* [Examples] create model with custom config on the fly (#11798)
* create custom model on the flight
* better wording
* add update_from_string
* cleanup
* cleanup
* Update src/transformers/configuration_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* more bool options
* style
* fix logger
* add test
* add the doc
* assert on conflict of options
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [Wav2Vec2ForCTC] example typo fixed (#11878)
* Ensure input tensor are on device. (#11874)
The feature extractor does not create tensors on the appropriate device,
so we call `ensure_tensor_on_device` before feeding the processed inputs
to the model.
* Fix usage of head masks by TF encoder-decoder models' `generate()` function (#11775)
* Fix Bart
* Fix Blenderbot{,_small}
* Fix LED
* Fix Marian
* Fix MBart
* Fix Pegasus
* Fix T5
* Add test for generation with head_mask
* Add a common TF test
* Override a test for the LED model as head masking is not yet properly implemented
* Remove all head_masks from input preparation for LED
* Drop masking for T5 as it needs a bit of refactor
* Correcting comments in T5Stack to reflect correct tuple order (#11330)
* Correcting comments to reflect correct tuple order
In order to match the actual order (line 513 and 516, and as accessed in 968), I've changed the order mentioned in comments L962 and L966-967.
* Update modeling_t5.py
Updating another comment as well
* Removing extra space
* Fixing style and quality
* style & quality
* Update src/transformers/models/t5/modeling_t5.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* [Flax] Allow dataclasses to be jitted (#11886)
* fix_torch_device_generate_test
* remove @
* change dataclasses to flax ones
* fix typo
* fix jitted tests
* fix bert & electra
* changing find_batch_size to work with tokenizer outputs (#11890)
* changing find_batch_size to work with tokenizer outputs
trainer_pt_utils.find_batch_size does not recognize the batch size of BatchEncoding objects. This can cause an error when a trainer relies on find_batch_size to report the number of observed examples in the evaluation loop.
* Trigger CI
Co-authored-by: jrenner <joseph.renner@inria.fr>
* Link official Cloud TPU JAX docs (#11892)
* Flax Generate (#11777)
* fix_torch_device_generate_test
* remove @
* add
* indexing
* correct a couple of tests
* fix tests
* add logits processor
* finish top_k, top_p, temp
* add docs
* correct flax prng key default
* improve generate
* add generation docs
* add docs
* make style
* revert model outputs change
* make style
* correct typo
* fix tests
* fix slow test
* add raise
* finish generation
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
* Add Emotion Speech Noteboook (#11900)
* Update deepspeed config to reflect hyperparameter search parameters (#11896)
* rebuild deepspeed config for hyperparameter search
* reformat code to fix style issues
* Adding new argument `max_new_tokens` for generate. (#11476)
* Adding new argument `max_new_tokens` for generate.
This is a proposal to add a new argument `max_new_tokens` to `generate`.
This include a `MaxNewTokensCriteria` that enables callers that don't
know about the token length ahead (like pipelines callers) to manage
more easily the length of their generated output.
* Adding a test for the user warning when both`max_length` and
`max_new_tokens` are used together.
* Removed redundant `no_grad`.
* Added Sequence Classification class in GPTNeo (#11906)
* seq classification changes
* fix tests
* [Flax] Return Attention from BERT, ELECTRA, RoBERTa and GPT2 (#11918)
* Added logic to return attention from flax-bert model and added test cases to check that
* Added new line at the end of file to test_modeling_flax_common.py
* fixing code style
* Fixing Roberta and Elextra models too from cpoying bert
* Added temporary hack to not run test_attention_outputs for FlaxGPT2
* Returning attention weights from GPT2 and changed the tests accordingly.
* last fixes
* bump flax dependency
Co-authored-by: jayendra <jayendra@infocusp.in>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Test optuna and ray (#11924)
* Remove `datasets` submodule
* fix assert (#11935)
* Remove redundant `nn.log_softmax` in `run_flax_glue.py` (#11920)
* Remove redundant `nn.log_softmax` in `run_flax_glue.py`
`optax.softmax_cross_entropy` expects unnormalized logits, and so it already calls `nn.log_softmax`, so I believe it is not needed here. `nn.log_softmax` is idempotent so mathematically it shouldn't have made a difference.
* Remove unused 'flax.linen' import
* Add MT5ForConditionalGeneration as supported arch. to summarization README (#11961)
* Add MT5ForConditionalGeneration as supported arch.
* Update README.md
* Add FlaxCLIP (#11883)
* add flax CLIP
* default input_shape
* add tests
* fix test
* fix name
* fix docs
* fix shapes
* attend at least 1 token
* flax conv to torch conv
* return floats
* fix equivalence tests
* fix import
* return attention_weights and update tests
* fix dosctrings
* address patricks comments
* input_shape arg
* add tests for get_image_features and get_text_features methods
* fix tests
* RAG-2nd2end-revamp (#11893)
* initial
* code quality test
* code quality
* added test functions in test_modeling_rag.py and test_retrieval_rag.py to test end2end retreiver
* minor change in test_modeling_rag
* fixed tests
* Update examples/research_projects/rag-end2end-retriever/README.md
typo corrected as suggested by lhoestq
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
* Update examples/research_projects/rag-end2end-retriever/finetune_rag.py
type change suggested by lhoestq
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
* Update src/transformers/models/rag/retrieval_rag.py
Adding this change as mentioned by lhoestq.
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
* completed the minor changes suggested by the reviewers
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
* modify qa-trainer (#11872)
* modify qa-trainer
* fix flax model
* bugfixes training_args.py (#11922)
modified according to:
https://pytorch.org/xla/release/1.8.1/_modules/torch_xla/core/xla_model.html
* reinitialize wandb config for each hyperparameter search run (#11945)
* Add regression tests for slow sentencepiece tokenizers. (#11737)
* add test_vocab_size for sentencepiece tok.
* add test_get_vocab for sentencepiece tok.
* add test_convert_token_and_id for sentencepiece tok.
* add test_tokenize_and_convert_tokens_to_string for all tok.
* improve test_tokenize_and_convert_tokens_to_string for sp. tok.
* add common tokenizer integration tests
- for albert
- for barthez
* add tokenizer integration tests to bert gen.
* add most tokenizer integration tests
* fix camembert tokenizer integration test
* add tokenizer integration test to marian
* add tokenizer integration test to reformer
* add typing and doc to tokenizer_integration_test_util
* fix tokenizer integration test of reformer
* improve test_sentencepiece_tokenize_and_convert_tokens_to_string
* empty commit to trigger CI
* fix tokenizer integration test of reformer
* remove code not needed anymore
* empty commit to trigger CI
* empty commit to trigger CI
* Authorize args when instantiating an AutoModel (#11956)
* Neptune.ai integration (#11937)
An option that turns on neptune.ai logging
--report_to 'neptune'
Additional ENV variables:
NEPTUNE_PROJECT
NEPTUNE_API_TOKEN
NEPTUNE_RUN_NAME (optional)
NEPTUNE_STOP_TIMEOUT (optional)
* Run the integration tests on schedule tests instead of master tests
* [deepspeed] docs (#11940)
* deepspeed docs
* cleanup
* cleanup
* typo correction (#11973)
* typo correction
* type corrections
* ByT5 model (#11971)
* allow tf to use uneven num of layers
* add tokenizer
* finish docs
* finish docs
* Apply suggestions from code review
* include in index
* finish
* Update docs/source/model_doc/byt5.rst
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* apply sylvais suggestions
* make style
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Typo in usage example, changed to device instead of torch_device (#11979)
* [DeepSpeed] decouple `DeepSpeedConfigHF` from `Trainer` (#11966)
* decouple DeepSpeedConfigHF from Trainer
* add LoggingLevel ctx manager; add new test
* cleanup
* add docs
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* implemented suggested renames
* formatter workaround
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [Trainer] add train loss and flops metrics reports (#11980)
* add train loss and flops metrics reports
* consistency
* add train_loss to skip keys
* restore on_train_end call timing
* Bump urllib3 from 1.25.8 to 1.26.5 in /examples/research_projects/lxmert (#11983)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.25.8 to 1.26.5.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/1.25.8...1.26.5)
---
updated-dependencies:
- dependency-name: urllib3
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* [RAG] Fix rag from pretrained question encoder generator behavior (#11962)
* fix_torch_device_generate_test
* remove @
* fix rag from pretrained loading
* add test
* uplaod
* finish
* VisualBERT (#10534)
* Init VisualBERT
* Add cookie-cutter, Config, and Embeddings
* Add preliminary Model
* Add Bert analogous classes
* Add basic code for NLVR, VQA, Flickr
* Update Init
* Fix VisualBert Downstream Models
* Rename classifier to cls
* Comment position_ids buffer
* Remove sentence image predictor output
* Update output dicts
* Remove unnecessary files
* Fix Auto Modeling
* Fix transformers init
* Add conversion script
* Add conversion script
* Fix docs
* Update visualbert modelling
* Update configuration
* Style fixes
* Add model and integration tests
* Add all tests
* Update model mapping
* Add simple detector from original repository
* Update docs and configs
* Fix style
* Fix style
* Update docs
* Fix style
* Fix import issues in style
* Fix style
* Add changes from review
* Fix style
* Fix style
* Update docs
* Fix style
* Fix style
* Update docs/source/model_doc/visual_bert.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/visual_bert/modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update tests/test_modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/visual_bert/modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/visual_bert/modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/visual_bert/modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add changes from review
* Remove convert run script
* Add changes from review
* Update src/transformers/models/visual_bert/modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/visual_bert/modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/visual_bert/modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/visual_bert/modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/visual_bert/modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add changes from review
* Add changes from review
* Add visual embedding example in docs
* Fix "copied from" comments
* Add changes from review
* Fix error, style, checkpoints
* Update docs
* Fix integration tests
* Fix style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix examples (#11990)
* [docs] fix xref to `PreTrainedModel.generate` (#11049)
* fix xref to generate
* do the same for search methods
* style
* style
* Update return introduction (#11976)
Make it clear that the `forward` method now returns a dict instead of tuple.
Fix style
* [deepspeed] Move code and doc into standalone files (#11984)
* move code and docs
* style
* moved
* restore
* [deepspeed] add nvme test skip rule (#11997)
* add nvme skip rule
* fix
* Fix weight decay masking in `run_flax_glue.py` (#11964)
* Fix weight decay masking in `run_flax_glue.py`
Issues with the previous implementation:
- The `dict` from `traverse_util.flatten_dict` has keys which are tuples of strings, not one long string with the path separated by periods.
- `optax.masked` applies the transformation wherever the mask is True, so the masks are flipped.
- Flax's LayerNorm calls the scale parameter `scale` not `weight`
* Fix formatting with black
* adapt results
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
* [Flax] Refactor MLM (#12013)
* fix_torch_device_generate_test
* remove @
* finish refactor
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
* [Deepspeed] Assert on mismatches between ds and hf args (#12021)
* wip
* add mismatch validation + test
* renames
* Update docs/source/main_classes/deepspeed.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* renames
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [TrainerArguments] format and sort __repr__, add __str__ (#12018)
* format and sort __repr__, add __str__
* typo
* use __str__ directly
* alias __repr__ = __str__
* Fixed Typo in modeling_bart.py (#12035)
* Fixed Typo in modeling_bart.py - Issue #11895
* Fixed Typo in modeling_bart.py
* fix deberta 2 tokenizer integration test (#12017)
* fix docs of past_key_values (#12049)
* [JAX] Bump jax lib (#12053)
* fix_torch_device_generate_test
* remove @
* bump up jax lib
* Fixes bug that appears when using QA bert and distilation. (#12026)
* Fixing bug that appears when using distilation (and potentially other uses).
During backward pass Pytorch complains with:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
This happens because the QA model code modifies the start_positions and end_positions input tensors, using clamp_ function: as a consequence the teacher and the student both modifies the inputs, and backward pass fails.
* Fixing all models QA clamp_ bug.
* Extend pipelines for automodel tupels (#12025)
* fix_torch_device_generate_test
* remove @
* finish
* refactor
* add test
* fix test
* Attempt at simplification.
* Small fix.
* Fixing non existing AutoModel for TF.
* Naming.
* Remove extra condition.
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
* Add optional grouped parsers description to HfArgumentParser (#12042)
* Adding optional argument group to HfArgumentParser
* Minor
* remove whitespace
* Minor styling
* adds metric prefix. (#12057)
* adds metric prefix.
* update tests to include prefix
* skip failing test (#12059)
* Fix integration tests (#12066)
* Fix tapas issue (#12063)
* Fix scatter function to be compatible with torch-scatter 2.7.0
* Allow test again
* updated the original RAG implementation to be compatible with latest Pytorch-Lightning (#11806)
* updated the original RAG implementation to be compatible with the latest PL version
* updated the requirements.txt file
* execute make style
* code quality test
* code quality
* conflix resolved in requirement.txt
* code quality
* changed the MyDDP class name to CustomDDP
* Replace legacy tensor.Tensor with torch.tensor/torch.empty (#12027)
* Replace legacy torch.Tensor constructor with torch.{tensor, empty}
* Remove torch.Tensor in examples
* Add torch to requirements.txt in language-modeling (#12040)
* Add torch to requirements.txt in language-modeling
* Update examples/pytorch/language-modeling/requirements.txt
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Properly indent block_size (#12070)
* [Deepspeed] various fixes (#12058)
* replace deprecated config
* sub_group_size was too big
* complete deprecation removal
* [Deepspeed Wav2vec2] integration (#11638)
* wip
* wip - but working with https://github.com/microsoft/DeepSpeed/pull/1044
* cleanup
* workaround
* working 5/8 modes
* solve fp32 distributed zero3
* style
* sync
* sync
* rework
* deprecation
* cleanup
* https://github.com/microsoft/DeepSpeed/pull/1044 pr was merged
* clean up
* add a guide
* more prose
* more prose
* fix
* more prose
* sub_group_size was too big
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* refactor
* bug fix
* make the true check explicit
* new deepspeed release
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* typo
* Update run_ner.py with id2label config (#12001)
* sync LayerDrop for Wav2Vec2Encoder + tests (#12076)
* Add DETR (#11653)
* Squash all commits of modeling_detr_v7 branch into one
* Improve docs
* Fix tests
* Style
* Improve docs some more and fix most tests
* Fix slow tests of ViT, DeiT and DETR
* Improve replacement of batch norm
* Restructure timm backbone forward
* Make DetrForSegmentation support any timm backbone
* Fix name of output
* Address most comments by @LysandreJik
* Give better names for variables
* Conditional imports + timm in setup.py
* Address additional comments by @sgugger
* Make style, add require_timm and require_vision to testsé
* Remove train_backbone attribute of DetrConfig, add methods to freeze/unfreeze backbone
* Add png files to fixtures
* Fix type hint
* Add timm to workflows
* Add `BatchNorm2d` to the weight initialization
* Fix retain_grad test
* Replace model checkpoints by Facebook namespace
* Fix name of checkpoint in test
* Add user-friendly message when scipy is not available
* Address most comments by @patrickvonplaten
* Remove return_intermediate_layers attribute of DetrConfig and simplify Joiner
* Better initialization
* Scipy is necessary to get sklearn metrics
* Rename TimmBackbone to DetrTimmConvEncoder and rename DetrJoiner to DetrConvModel
* Make style
* Improve docs and add 2 community notebooks
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* [test] support more than 2 gpus (#12074)
* support more than 2 gpus
* style
* Wav2Vec2 Pretraining (#11306)
* Working quantizer forward
* Working quantizer forward
* Clean up unused model parts, test reproducibility
* Working quantizer forward
* Clean up unused model parts, test reproducibility
* Remove custom outputs from the shared ones
* correct conversion
* correct bug
* add first pretrain script
* save intermediate
* static shapes
* save intermediate
* finish first pretrain script version
* more refactor
* remove wanddb
* refactor more
* improve test
* correct perplexity compute bug
* finish model implementation
* add to docs
* finish docs
* finish pretraining script
* finish pretraining script
* remove wandb
* finish PR for merge
* finish config
* finish
* make deepspeed work
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* apply suggestions
* fix flaky test
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* pass decay_mask fn to optimizer (#12087)
* rm require_version_examples (#12088)
* [Wav2Vec2ForPretraining] Correct checkpoints wav2vec2 & fix tests (#12089)
* fix_torch_device_generate_test
* remove @
* fix tests
* Add text_column_name and label_column_name to run_ner and run_ner_no_trainer args (#12083)
* Add text_column_name and label_column_name to run_ner args
* Minor fix: grouping for text and label column name
* CLIPFeatureExtractor should resize images with kept aspect ratio (#11994)
* Resize with kept aspect ratio
* Fixed failed test
* Overload center_crop and resize methods instead
* resize should handle non-PIL images
* update slow test
* Tensor => tensor
Co-authored-by: patil-suraj <surajp815@gmail.com>
* New TF GLUE example (#12028)
* Pushing partially-complete new GLUE example
* First draft of the new TF GLUE example! Needs a little more testing to be sure but it's almost ready.
* Fix to the fit() call
* Bugfixes, making sure TPU and multi-GPU support is ready
* Remove logger line that depends on Pytorch
* Style pass
* Deleting old TF GLUE example
* Include label2id and id2label in the saved model config
* Don't clobber the existing model.config.label2id
* Style fixes
* Update examples/tensorflow/text-classification/run_glue.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix quality
* Update README.md to cover the TF GLUE example.
* Minor style edits
* Appending label2id and id2label to models to ensure inference works properly (#12102)
* Fix a condition in test_generate_with_head_masking (#11911)
* Fix a condition in test_generate_with_head_masking
* Fix usage of head_mask in bigbirg_pegasus
* Fix head masking for speech2text
* Resolve copy mismatch + drop unwanted print statement
* Fix the condition
* Flax VisionTransformer (#11951)
* adding vit for flax
* added test for Flax-vit and some bug-fixes
* overrided methods where variable changes were necessary for flax_vit test
* added FlaxViTForImageClassification for test
* Update src/transformers/models/vit/modeling_flax_vit.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* made changes suggested in PR
* Adding jax-vit models for autoimport
* swapping num_channels and height,width dimension
* fixing the docstring for torch-like inputs for VIT
* add model to main init
* add docs
* doc, fix-copies
* docstrings
* small test fixes
* fix docs
* fix docstr
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* style
Co-authored-by: jayendra <jayendra@infocusp.in>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add relevant description to tqdm in examples (#11927)
* add relevant `desc` in examples
* require_version datasets>=1.8.0
* Fix head masking generate tests (#12110)
* fix_torch_device_generate_test
* remove @
* fix tests
* Flax CLM script (#12023)
* first draft
* max_seq_length => block_size
* fix arg names
* fix typos
* fix loss calculation
* add max examples, fix train eval steps, metrics
* optimizer mask
* fix perpelexity, metric logging
* fix logging
* data_collator = > data_loader
* refactor loss_fn
* support single GPU
* pass distributed to write_metric
* fix jitting
* fix single device training
* fix single device metrics
* close inner progress bars once finished
* add overwrite_cache arg
* ifx dataset caching issue
* add more logs
* few small fixes,
* address nicholas suggestions
* fix docstr
* address patricks suggestions
* make flake happy
* pass new new_dropout_rng to apply_gradients
* reset train metrics after every epoc
* remove distributed logis, small fixes
* Add from_pretrained to dummy timm objects (#12097)
* Add from_pretrained to dummy timm
* Fix at the source
* Update utils/check_dummies.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Missing pretrained dummies
* Style
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix t5 error message (#12136)
* Fix t5 error message
* Fix again
* Fix megatron_gpt2 attention block's causal mask (#12007)
* Fix megatron_gpt2 attention block's causal mask.
* compatibility with checkpoints created with recent versions of Megatron-LM
* added integration test for the released Megatron-GPT2 model
* code style changes
* added option to megatron conversion script to read from config file
Co-authored-by: Guido Novati <gnovati@nvidia.com>
* Add mlm pretraining xla torch readme (#12011)
* fix_torch_device_generate_test
* remove @
* upload
* Apply suggestions from code review
* Apply suggestions from code review
* Apply suggestions from code review
* Update examples/flax/language-modeling/README.md
* add more info
* finish
* fix
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
* add readme for flax clm (#12111)
* add readme for flax clm
* use section link for tokenizer
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* update metrics
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* FlaxBart (#11537)
* Start working on FlaxBart
* Create modeling_flax_bart.py
* Write FlaxBartAttention
* Add FlaxBartEncoderLayer
* Add FlaxBartDecoderLayer and some typing
* Add helepr function for FlaxBart
* shift_tokens_right
* _make_causal_mask
* _expand_mask
* Add PositionalEmbedding and fix init_std naming
* Add FlaxBartPretrainedModel
* Add FlaxBartEncoder
* Add FlaxBartEncoder
* Add FlaxBartEncoder among modules to be imported
* YET WE CANNOT INITIALIZE THAT!! :(
* Make BartEncoder working
Change BartEncoder to instance of nn.Module so far
* Add FlaxBartDecoder
* Add FlaxBartModel
* TODO to make model run -> Prepapre model inputs
* Resolve padding
* Add FlaxBartModel
* Add FlaxBartModel into importable modules
* Remove FlaxBartEncoder and FlaxBartDecoder from importable modules
* make style; not properly working
* make style; make quality not pass due to some import I left
* Remove TODO for padding_idx in nn.Embed so far
* Add FlaxBartForConditionalGeneration
* Incorporate Flax model output classes, i.e. return_dict
* Add another models and incorporate use_cache arg
* Add FlaxBartForSequenceClassification and FlaxBartForQuestionAnswering
* Incorporate use_cache arg from PyTorch implementation
* Add all necessary Flax output utils
* Add FlaxBartForCausalLM; not working yet'
* Add minor improvements; still lacks some functionality
* Update docs, src and tests
* Add support of FlaxBart to docs/source
* Fix some bugs in FlaxBart souce code
* Add some neccessary tests for FlaxBart models - jit_compilation not passing
* Fix tests and add test_head_masking
* Fix tests for @jax.jit computation
* Add test_head_masking
* Migrate FlaxBart tests from jax.numpy to numpy
* Remove FlaxBartForCausalLM
* Clean repo
* fix bart model weight structure
* Fix FlaxBartForSequenceClassification
Slicing is not possible to use below jit, therefore, selecting sentence
representation from hidden_states must be changed.
* Allow FlaxBartForSequenceClassification for testing pt_flax equivalence
* Allow testing for FlaxBartForQA for pt_flax equivalence
* Add a comment to FlaxBartForSequenceClassification + change noise from 1e-3 to 1e-6
* remove past_key_values
* remove inputs_mebeds and make input_ids required
* add position ids
* re-write attention layer
* fix dataclass
* fix pos embeds and attention output
* fix pos embeds
* expose encode method
* expose decode method
* move docstring to top
* add cache for causal attn layer
* remove head masking for now
* s2s greedy search first pass
* boom boom
* fix typos
* fix greedy generate for bart
* use encoder, decoder layers instead of num_hidden_layers
* handle encoder_outputs
* cleanup
* simplify decoding
* more clean-up
* typos
* Change header + add {decoder_,}position_ids into 2 models
* add BartConfig
* fix existing tests
* add encode, decode methods
* Fix shift_tokens_right for JIT compilation + clarify one condition
* fix decode
* encoder => encode
* simplify generate
* add tests for encode and decode
* style
* add tests for cache
* fix equivalence tests
* sample generate now works with seq2seq
* generation tests
* initialize dense layers
* docstring and cleanup
* quality
* remove get/set input_embeddings
* address Patricks suggestions
* decode for every model, remove encoder_outputs from call
* update tests accordingly
* decode returns only decoder outputs and logits
* fix arguments
* doc encode, decode methods
* correct base_model_prefix
* fix test for seq classif model
* fix docs
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Feature to use the PreTrainedTokenizerFast class as a stand-alone tokenizer (#11810)
* feature for tokenizer without slow/legacy version
* format
* modify common test
* add tests
* add PreTrainedTokenizerFast to AutoTokenizer
* format
* change tokenizer common test in order to be able to run test without a slow version
* update tokenizer fast test in order to use `rust_tokenizer_class` attribute instead of `tokenizer_class`
* add autokenizer test
* replace `if self.tokenizer_class is not None` with ` if self.tokenizer_class is None`
* remove obsolete change in comment
* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update src/transformers/tokenization_utils_fast.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* change `get_main_tokenizer` into `get_tokenizers`
* clarify `get_tokenizers` method
* homogenize with `test_slow_tokenizer` and `test_rust_tokenizer`
* add `test_rust_tokenizer = False` to tokenizer which don't define a fast version
* `test_rust_tokenizer = False` for BertJapaneseTokenizer
* `test_rust_tokenizer = False` for BertJapaneseCharacterTokenizationTest
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [Flax] Add links to google colabs (#12146)
* fix_torch_device_generate_test
* remove @
* add colab links
* Don't log anything before logging is setup in examples (#12121)
* Don't log anything before logging is setup in examples
* Last example
* Use text_column_name variable instead of "text" (#12132)
* Use text_column_name variable instead of "text"
`text_column_name` was already defined above where I made the changes and it was also used below where I made changes.
This is a very minor change. If a dataset does not use "text" as the column name, then the `tokenize_function` will now use whatever column is assigned to `text_column_name`. `text_column_name` is just the first column name if "text" is not a column name. It makes the function a little more robust, though I would assume that 90% + of datasets use "text" anyway.
* black formatting
* make style
Co-authored-by: Nicholas Broad <nicholas@nmbroad.com>
* [lm examples] Replicate --config_overrides addition to other LM examples (#12135)
* [lm examples] Replicate --config_overrides addition to other LM examples
* Removing no trainer files changes
* Update README
Co-authored-by: Kumar Abhishek <kabhishek@expedia.com>
* fix error message (#12148)
* [optim] implement AdafactorSchedule (#12123)
* implement AdafactorSchedule
* typo
* fix
* Update src/transformers/optimization.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [style] consistent nn. and nn.functional (#12124)
* consistent nn. and nn.functional
* fix glitch
* fix glitch #2
* Adding TFWav2Vec2Model (#11617)
* [WIP] Add TFWav2Vec2Model
Work in progress for adding a tensorflow version of Wav2Vec2
* feedback changes
* small fix
* Test Feedback Round 1
* Add SpecAugment and CTC Loss
* correct spec augment mask creation
* docstring and correct copyright
* correct bugs
* remove bogus file
* finish tests correction
* del unnecessary layers
* Update src/transformers/models/wav2vec2/modeling_tf_wav2vec2.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* make style
* correct final bug
* Feedback Changes
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* [Flax] Fix flax pt equivalence tests (#12154)
* fix_torch_device_generate_test
* remove @
* upload
* consistent nn. and nn.functional: p2 templates (#12153)
* Flax Big Bird (#11967)
* add flax bert
* bert -> bigbird
* original_full ported
* add debugger
* init block sparse
* fix copies ; gelu_fast -> gelu_new
* block sparse port
* fix block sparse
* block sparse working
* all ckpts working
* fix-copies
* make quality
* init tests
* temporary fix for FlaxBigBirdForMultipleChoice
* skip test_attention_outputs
* fix
* gelu_fast -> gelu_new ; fix multiple choice model
* remove nsp
* fix sequence classifier
* fix
* make quality
* make fix-copies
* finish
* Delete debugger.ipynb
* Update src/transformers/models/big_bird/modeling_flax_big_bird.py
* make style
* finish
* bye bye jit flax tests
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* [style] consistent nn. and nn.functional: part 3 `tests` (#12155)
* consistent nn. and nn.functional: p3 templates
* restore
* [style] consistent nn. and nn.functional: part 4 `examples` (#12156)
* consistent nn. and nn.functional: p4 examples
* restore
* consistent nn. and nn.functional: part 5 docs (#12161)
* Add video links to the documentation (#12162)
* [Flax generate] Add params to generate (#12171)
* fix_torch_device_generate_test
* remove @
* add params as input
* finish
* Use a released version of optax rather than installing from Git. (#12173)
Use a released version of optax rather than installing from Git
* Have dummy processors have a `from_pretrained` method (#12145)
* Add course banner (#12157)
* Add course banner
* Update course banner
* Adjust banner width
* Enable add_prefix_space if model_type is roberta or gpt2 (#12116)
* Update AutoModel classes in summarization example (#12178)
- Convert use of deprecated AutoModelWithLMHead to AutoModelForSeq2SeqLM
- Add newly required `truncation=True` to `tokenizer.encode` with `max_length`
This silences all warnings.
* Ray Tune Integration Updates (#12134)
* fix
* fixes
* add back to scheduled tests
* formatting
* Update integrations.py
* [testing] ensure concurrent pytest workers use a unique port for torch.dist (#12166)
* ensure concurrent pytest workers use a unique port for torch.distributed.launch
* reword
* Model card defaults (#12122)
* [WIP] Model card defaults
* finetuned_from default value
* Add all mappings to the mapping file
* Be more defensive on finetuned_from arg
* Add default task tag
* Separate tags from tasks
* Edge case for dataset
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Temporarily deactivate torch-scatter while we wait for new release (#12181)
* Temporarily deactivate torch-scatter while we wait for new release
* torch-1.8.1 binary for scatter
* Revert to 1.8.0
* Pin torch dependency
* torchaudio and torchvision
* Temporarily deactivate torchhub test (#12184)
* [Flax] Add Beam Search (#12131)
* fix_torch_device_generate_test
* remove @
* push new logit processors
* add processors
* save first working version
* save intermediate
* finish
* make style
* make fix-copies
* finish
* Update tests/test_modeling_flax_bart.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Hubert (#11889)
* fix_torch_device_generate_test
* remove @
* add hubert
* add first test file
* more docs
* fix bugs
* fix bug
* finish
* finish
* finish docstring
* fix
* fix
* finalize
* add to ignored
* finish
* Apply suggestions from code review
* correct naming
* finish
* fix auto config
* finish
* correct convert script
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* apply suggestions lysandre & suraj
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* updated DLC images and sample notebooks (#12191)
* Enabling AutoTokenizer for HubertConfig. (#12198)
* Use yaml to create metadata (#12185)
* Use yaml to create metadata
* Fix typo
* Remove pin
* [Docs] fixed broken link (#12205)
* fixed broken link
* Update docs/source/benchmarks.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update docs/source/benchmarks.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Pipeline update & tests (#12207)
* Improve detr (#12147)
* Remove unused variables
* Improve docs
* Fix docs of segmentation masks
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Add link to the course (#12229)
* Support for torch 1.9.0 (#12224)
* Support for torch 1.9.0
* Torch scatter for 1.9.0
* Github Actions run on 1.9.0
* fix pt-1.9.0 `add_` deprecation (#12217)
* fix pt-1.9.0 add_ deprecation
* add () for clarity
* Trigger CI
* require_version(torch
* Release: v4.7.0
* Docs for v4.8.0
* AutoTokenizer: infer the class from the tokenizer config if possible (#12208)
* AutoTokenizer: infer the class from the tokenizer config if possible
* Add tests
* Update src/transformers/models/auto/tokenization_auto.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* update desc for map in all examples (#12226)
* update desc for map in all examples
* added plm
* suggestions
* [Flax] FlaxAutoModelForSeq2SeqLM (#12228)
* add FlaxAutoModelForSeq2SeqLM
* [FlaxBart] few small fixes (#12247)
* boom boom
* remove flax clip example
* few small fixes
* Depreciate pythonic Mish and support PyTorch 1.9 version of Mish (#12240)
* Moved Mish to Torch 1.9 version
* Run black formatting
* [t5 doc] make the example work out of the box (#12239)
* [run_clm.py] restore caching
* style
* [t5 doc] make the example work out of the box
This PR expands the training example to include the correct model type for the example to work, e.g. with `T5Model` this example will break.
* Update docs/source/model_doc/t5.rst
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* expand the other example
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Fix the scheduled CI
* Better CI feedback (#12279)
* Better run ID
* Only part of CI
* Revert "Only part of CI"
This reverts commit
|
||
![]() |
40de2d5a4f | Docs for v4.10.0dev0 | ||
![]() |
27a8c9e4f1
|
[parallelism doc] document Deepspeed-Inference and parallelformers (#12836)
* document Deepspeed-Inference and parallelformers * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> |
||
![]() |
807b6bd160
|
[Deepspeed] warmup_ratio docs (#12830)
* [Deepspeed] warmup_ratio docs * Update docs/source/main_classes/deepspeed.rst Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * style * Update docs/source/main_classes/deepspeed.rst Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * style Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> |
||
![]() |
cf0755aa6e
|
[debug] DebugUnderflowOverflow doesn't work with DP (#12816) | ||
![]() |
b5b4e54920
|
add and fix examples (#12810) | ||
![]() |
da72ac6e26
|
Fix push_to_hub docstring and make it appear in doc (#12770) | ||
![]() |
6989264963
|
[doc] testing: how to trigger a self-push workflow (#12724)
* [testing] details of how to start self-push workflow * style * fix * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> |
||
![]() |
31cfcbd3e2
|
[doc] performance: batch sizes (#12725) | ||
![]() |
68605e9db1
|
[doc] parallelism: Which Strategy To Use When (#12712) | ||
![]() |
eb4d7ef97b
|
Remove framework mention (#12731) | ||
![]() |
5dd0c956a8
|
non-native optimizers are mostly ok with zero-offload (#12690) | ||
![]() |
78f5fe1416
|
[Deepspeed] adapt multiple models, add zero_to_fp32 tests (#12477)
* zero_to_fp32 tests * args change * remove unnecessary work * use transformers.trainer_utils.get_last_checkpoint * document the new features * cleanup * wip * fix fsmt * add bert * cleanup * add xlm-roberta * electra works * cleanup * sync * split off the model zoo tests * cleanup * cleanup * cleanup * cleanup * reformat * cleanup * casing * deepspeed>=0.4.3 * adjust distilbert * Update docs/source/main_classes/deepspeed.rst Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * style Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> |
||
![]() |
cee2d2135f
|
[Flax Generation] Correct inconsistencies PyTorch/Flax (#12662)
* fix_torch_device_generate_test * remove @ * correct greedy search * save intertmed * add final logits bias * correct * up * add more tests * fix another bug * finish tests * finish marian tests * up Co-authored-by: Patrick von Platen <patrick@huggingface.co> |
||
![]() |
9519f0cd63
|
Wrong model is used in example, should be character instead of subword model (#12676)
* Wrong model is used, should be character instead of subword In the original Google repo for CANINE there was mixup in the model names in the README.md, which was fixed 2 weeks ago. Since this transformer model was created before, it probably resulted in wrong use in this example. s = subword, c = character * canine.rst style fix * Update docs/source/model_doc/canine.rst Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Styling canine.rst * Added links to model cards. * Fixed links to model cards. Co-authored-by: Jeroen Steggink <978411+jsteggink@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> |