mirror of
https://github.com/huggingface/transformers.git
synced 2025-07-06 06:10:04 +06:00
5061a9fd55
1 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
![]() |
434022adac
|
Add RemBERT model code to huggingface (#10692)
* Faster list concat for trainer_pt_utils.get_length_grouped_indices() (#11825)
get_length_grouped_indices() in LengthGroupedSampler and DistributedLengthGroupedSampler
is prohibitively slow for large number of megabatches (in test case takes hours for ~270k
megabatches with 100 items each) due to slow list concatenation with sum(megabatches, []).
Resolves: #11795
Co-authored-by: ctheodoris <cvtheodo@ds.dfci.harvard.edu>
* Replace double occurrences as the last step (#11367)
* [Flax] Fix PyTorch import error (#11839)
* fix_torch_device_generate_test
* remove @
* change pytorch import to flax import
* Fix reference to XLNet (#11846)
* Switch mem metrics flag (#11851)
* Switch mem metrics flag
* Update src/transformers/training_args.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Fix flos single node (#11844)
* fixing flos bug/typo in non-distributed setting
* storing flos every logging_interval
* Fix two typos in docs (#11852)
* typo2
* fix typo
* [Trainer] Report both steps and num samples per second (#11818)
* [Trainer] Report both steps and num samples per second
* Fix batch number
* Update src/transformers/trainer_utils.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Address review comments
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Add some tests to the slow suite #11860
* Enable memory metrics in tests that need it (#11859)
* fixed a small typo in the doc (#11856)
* typo (#11858)
* Add option to log only once in multinode training (#11819)
* Add option to long only once in multinode training
* Use an alternate property
* [Wav2Vec2] SpecAugment Fast (#11764)
* first try
* finish
* [lm examples] fix overflow in perplexity calc (#11855)
* fix overflow in perplexity calc
* use inf
* fix
* [Examples] create model with custom config on the fly (#11798)
* create custom model on the flight
* better wording
* add update_from_string
* cleanup
* cleanup
* Update src/transformers/configuration_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* more bool options
* style
* fix logger
* add test
* add the doc
* assert on conflict of options
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [Wav2Vec2ForCTC] example typo fixed (#11878)
* Ensure input tensor are on device. (#11874)
The feature extractor does not create tensors on the appropriate device,
so we call `ensure_tensor_on_device` before feeding the processed inputs
to the model.
* Fix usage of head masks by TF encoder-decoder models' `generate()` function (#11775)
* Fix Bart
* Fix Blenderbot{,_small}
* Fix LED
* Fix Marian
* Fix MBart
* Fix Pegasus
* Fix T5
* Add test for generation with head_mask
* Add a common TF test
* Override a test for the LED model as head masking is not yet properly implemented
* Remove all head_masks from input preparation for LED
* Drop masking for T5 as it needs a bit of refactor
* Correcting comments in T5Stack to reflect correct tuple order (#11330)
* Correcting comments to reflect correct tuple order
In order to match the actual order (line 513 and 516, and as accessed in 968), I've changed the order mentioned in comments L962 and L966-967.
* Update modeling_t5.py
Updating another comment as well
* Removing extra space
* Fixing style and quality
* style & quality
* Update src/transformers/models/t5/modeling_t5.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* [Flax] Allow dataclasses to be jitted (#11886)
* fix_torch_device_generate_test
* remove @
* change dataclasses to flax ones
* fix typo
* fix jitted tests
* fix bert & electra
* changing find_batch_size to work with tokenizer outputs (#11890)
* changing find_batch_size to work with tokenizer outputs
trainer_pt_utils.find_batch_size does not recognize the batch size of BatchEncoding objects. This can cause an error when a trainer relies on find_batch_size to report the number of observed examples in the evaluation loop.
* Trigger CI
Co-authored-by: jrenner <joseph.renner@inria.fr>
* Link official Cloud TPU JAX docs (#11892)
* Flax Generate (#11777)
* fix_torch_device_generate_test
* remove @
* add
* indexing
* correct a couple of tests
* fix tests
* add logits processor
* finish top_k, top_p, temp
* add docs
* correct flax prng key default
* improve generate
* add generation docs
* add docs
* make style
* revert model outputs change
* make style
* correct typo
* fix tests
* fix slow test
* add raise
* finish generation
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
* Add Emotion Speech Noteboook (#11900)
* Update deepspeed config to reflect hyperparameter search parameters (#11896)
* rebuild deepspeed config for hyperparameter search
* reformat code to fix style issues
* Adding new argument `max_new_tokens` for generate. (#11476)
* Adding new argument `max_new_tokens` for generate.
This is a proposal to add a new argument `max_new_tokens` to `generate`.
This include a `MaxNewTokensCriteria` that enables callers that don't
know about the token length ahead (like pipelines callers) to manage
more easily the length of their generated output.
* Adding a test for the user warning when both`max_length` and
`max_new_tokens` are used together.
* Removed redundant `no_grad`.
* Added Sequence Classification class in GPTNeo (#11906)
* seq classification changes
* fix tests
* [Flax] Return Attention from BERT, ELECTRA, RoBERTa and GPT2 (#11918)
* Added logic to return attention from flax-bert model and added test cases to check that
* Added new line at the end of file to test_modeling_flax_common.py
* fixing code style
* Fixing Roberta and Elextra models too from cpoying bert
* Added temporary hack to not run test_attention_outputs for FlaxGPT2
* Returning attention weights from GPT2 and changed the tests accordingly.
* last fixes
* bump flax dependency
Co-authored-by: jayendra <jayendra@infocusp.in>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Test optuna and ray (#11924)
* Remove `datasets` submodule
* fix assert (#11935)
* Remove redundant `nn.log_softmax` in `run_flax_glue.py` (#11920)
* Remove redundant `nn.log_softmax` in `run_flax_glue.py`
`optax.softmax_cross_entropy` expects unnormalized logits, and so it already calls `nn.log_softmax`, so I believe it is not needed here. `nn.log_softmax` is idempotent so mathematically it shouldn't have made a difference.
* Remove unused 'flax.linen' import
* Add MT5ForConditionalGeneration as supported arch. to summarization README (#11961)
* Add MT5ForConditionalGeneration as supported arch.
* Update README.md
* Add FlaxCLIP (#11883)
* add flax CLIP
* default input_shape
* add tests
* fix test
* fix name
* fix docs
* fix shapes
* attend at least 1 token
* flax conv to torch conv
* return floats
* fix equivalence tests
* fix import
* return attention_weights and update tests
* fix dosctrings
* address patricks comments
* input_shape arg
* add tests for get_image_features and get_text_features methods
* fix tests
* RAG-2nd2end-revamp (#11893)
* initial
* code quality test
* code quality
* added test functions in test_modeling_rag.py and test_retrieval_rag.py to test end2end retreiver
* minor change in test_modeling_rag
* fixed tests
* Update examples/research_projects/rag-end2end-retriever/README.md
typo corrected as suggested by lhoestq
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
* Update examples/research_projects/rag-end2end-retriever/finetune_rag.py
type change suggested by lhoestq
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
* Update src/transformers/models/rag/retrieval_rag.py
Adding this change as mentioned by lhoestq.
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
* completed the minor changes suggested by the reviewers
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
* modify qa-trainer (#11872)
* modify qa-trainer
* fix flax model
* bugfixes training_args.py (#11922)
modified according to:
https://pytorch.org/xla/release/1.8.1/_modules/torch_xla/core/xla_model.html
* reinitialize wandb config for each hyperparameter search run (#11945)
* Add regression tests for slow sentencepiece tokenizers. (#11737)
* add test_vocab_size for sentencepiece tok.
* add test_get_vocab for sentencepiece tok.
* add test_convert_token_and_id for sentencepiece tok.
* add test_tokenize_and_convert_tokens_to_string for all tok.
* improve test_tokenize_and_convert_tokens_to_string for sp. tok.
* add common tokenizer integration tests
- for albert
- for barthez
* add tokenizer integration tests to bert gen.
* add most tokenizer integration tests
* fix camembert tokenizer integration test
* add tokenizer integration test to marian
* add tokenizer integration test to reformer
* add typing and doc to tokenizer_integration_test_util
* fix tokenizer integration test of reformer
* improve test_sentencepiece_tokenize_and_convert_tokens_to_string
* empty commit to trigger CI
* fix tokenizer integration test of reformer
* remove code not needed anymore
* empty commit to trigger CI
* empty commit to trigger CI
* Authorize args when instantiating an AutoModel (#11956)
* Neptune.ai integration (#11937)
An option that turns on neptune.ai logging
--report_to 'neptune'
Additional ENV variables:
NEPTUNE_PROJECT
NEPTUNE_API_TOKEN
NEPTUNE_RUN_NAME (optional)
NEPTUNE_STOP_TIMEOUT (optional)
* Run the integration tests on schedule tests instead of master tests
* [deepspeed] docs (#11940)
* deepspeed docs
* cleanup
* cleanup
* typo correction (#11973)
* typo correction
* type corrections
* ByT5 model (#11971)
* allow tf to use uneven num of layers
* add tokenizer
* finish docs
* finish docs
* Apply suggestions from code review
* include in index
* finish
* Update docs/source/model_doc/byt5.rst
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* apply sylvais suggestions
* make style
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Typo in usage example, changed to device instead of torch_device (#11979)
* [DeepSpeed] decouple `DeepSpeedConfigHF` from `Trainer` (#11966)
* decouple DeepSpeedConfigHF from Trainer
* add LoggingLevel ctx manager; add new test
* cleanup
* add docs
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* implemented suggested renames
* formatter workaround
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [Trainer] add train loss and flops metrics reports (#11980)
* add train loss and flops metrics reports
* consistency
* add train_loss to skip keys
* restore on_train_end call timing
* Bump urllib3 from 1.25.8 to 1.26.5 in /examples/research_projects/lxmert (#11983)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.25.8 to 1.26.5.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/1.25.8...1.26.5)
---
updated-dependencies:
- dependency-name: urllib3
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* [RAG] Fix rag from pretrained question encoder generator behavior (#11962)
* fix_torch_device_generate_test
* remove @
* fix rag from pretrained loading
* add test
* uplaod
* finish
* VisualBERT (#10534)
* Init VisualBERT
* Add cookie-cutter, Config, and Embeddings
* Add preliminary Model
* Add Bert analogous classes
* Add basic code for NLVR, VQA, Flickr
* Update Init
* Fix VisualBert Downstream Models
* Rename classifier to cls
* Comment position_ids buffer
* Remove sentence image predictor output
* Update output dicts
* Remove unnecessary files
* Fix Auto Modeling
* Fix transformers init
* Add conversion script
* Add conversion script
* Fix docs
* Update visualbert modelling
* Update configuration
* Style fixes
* Add model and integration tests
* Add all tests
* Update model mapping
* Add simple detector from original repository
* Update docs and configs
* Fix style
* Fix style
* Update docs
* Fix style
* Fix import issues in style
* Fix style
* Add changes from review
* Fix style
* Fix style
* Update docs
* Fix style
* Fix style
* Update docs/source/model_doc/visual_bert.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/visual_bert/modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update tests/test_modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/visual_bert/modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/visual_bert/modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/visual_bert/modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add changes from review
* Remove convert run script
* Add changes from review
* Update src/transformers/models/visual_bert/modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/visual_bert/modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/visual_bert/modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/visual_bert/modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/visual_bert/modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add changes from review
* Add changes from review
* Add visual embedding example in docs
* Fix "copied from" comments
* Add changes from review
* Fix error, style, checkpoints
* Update docs
* Fix integration tests
* Fix style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix examples (#11990)
* [docs] fix xref to `PreTrainedModel.generate` (#11049)
* fix xref to generate
* do the same for search methods
* style
* style
* Update return introduction (#11976)
Make it clear that the `forward` method now returns a dict instead of tuple.
Fix style
* [deepspeed] Move code and doc into standalone files (#11984)
* move code and docs
* style
* moved
* restore
* [deepspeed] add nvme test skip rule (#11997)
* add nvme skip rule
* fix
* Fix weight decay masking in `run_flax_glue.py` (#11964)
* Fix weight decay masking in `run_flax_glue.py`
Issues with the previous implementation:
- The `dict` from `traverse_util.flatten_dict` has keys which are tuples of strings, not one long string with the path separated by periods.
- `optax.masked` applies the transformation wherever the mask is True, so the masks are flipped.
- Flax's LayerNorm calls the scale parameter `scale` not `weight`
* Fix formatting with black
* adapt results
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
* [Flax] Refactor MLM (#12013)
* fix_torch_device_generate_test
* remove @
* finish refactor
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
* [Deepspeed] Assert on mismatches between ds and hf args (#12021)
* wip
* add mismatch validation + test
* renames
* Update docs/source/main_classes/deepspeed.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* renames
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [TrainerArguments] format and sort __repr__, add __str__ (#12018)
* format and sort __repr__, add __str__
* typo
* use __str__ directly
* alias __repr__ = __str__
* Fixed Typo in modeling_bart.py (#12035)
* Fixed Typo in modeling_bart.py - Issue #11895
* Fixed Typo in modeling_bart.py
* fix deberta 2 tokenizer integration test (#12017)
* fix docs of past_key_values (#12049)
* [JAX] Bump jax lib (#12053)
* fix_torch_device_generate_test
* remove @
* bump up jax lib
* Fixes bug that appears when using QA bert and distilation. (#12026)
* Fixing bug that appears when using distilation (and potentially other uses).
During backward pass Pytorch complains with:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
This happens because the QA model code modifies the start_positions and end_positions input tensors, using clamp_ function: as a consequence the teacher and the student both modifies the inputs, and backward pass fails.
* Fixing all models QA clamp_ bug.
* Extend pipelines for automodel tupels (#12025)
* fix_torch_device_generate_test
* remove @
* finish
* refactor
* add test
* fix test
* Attempt at simplification.
* Small fix.
* Fixing non existing AutoModel for TF.
* Naming.
* Remove extra condition.
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
* Add optional grouped parsers description to HfArgumentParser (#12042)
* Adding optional argument group to HfArgumentParser
* Minor
* remove whitespace
* Minor styling
* adds metric prefix. (#12057)
* adds metric prefix.
* update tests to include prefix
* skip failing test (#12059)
* Fix integration tests (#12066)
* Fix tapas issue (#12063)
* Fix scatter function to be compatible with torch-scatter 2.7.0
* Allow test again
* updated the original RAG implementation to be compatible with latest Pytorch-Lightning (#11806)
* updated the original RAG implementation to be compatible with the latest PL version
* updated the requirements.txt file
* execute make style
* code quality test
* code quality
* conflix resolved in requirement.txt
* code quality
* changed the MyDDP class name to CustomDDP
* Replace legacy tensor.Tensor with torch.tensor/torch.empty (#12027)
* Replace legacy torch.Tensor constructor with torch.{tensor, empty}
* Remove torch.Tensor in examples
* Add torch to requirements.txt in language-modeling (#12040)
* Add torch to requirements.txt in language-modeling
* Update examples/pytorch/language-modeling/requirements.txt
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Properly indent block_size (#12070)
* [Deepspeed] various fixes (#12058)
* replace deprecated config
* sub_group_size was too big
* complete deprecation removal
* [Deepspeed Wav2vec2] integration (#11638)
* wip
* wip - but working with https://github.com/microsoft/DeepSpeed/pull/1044
* cleanup
* workaround
* working 5/8 modes
* solve fp32 distributed zero3
* style
* sync
* sync
* rework
* deprecation
* cleanup
* https://github.com/microsoft/DeepSpeed/pull/1044 pr was merged
* clean up
* add a guide
* more prose
* more prose
* fix
* more prose
* sub_group_size was too big
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* refactor
* bug fix
* make the true check explicit
* new deepspeed release
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* typo
* Update run_ner.py with id2label config (#12001)
* sync LayerDrop for Wav2Vec2Encoder + tests (#12076)
* Add DETR (#11653)
* Squash all commits of modeling_detr_v7 branch into one
* Improve docs
* Fix tests
* Style
* Improve docs some more and fix most tests
* Fix slow tests of ViT, DeiT and DETR
* Improve replacement of batch norm
* Restructure timm backbone forward
* Make DetrForSegmentation support any timm backbone
* Fix name of output
* Address most comments by @LysandreJik
* Give better names for variables
* Conditional imports + timm in setup.py
* Address additional comments by @sgugger
* Make style, add require_timm and require_vision to testsé
* Remove train_backbone attribute of DetrConfig, add methods to freeze/unfreeze backbone
* Add png files to fixtures
* Fix type hint
* Add timm to workflows
* Add `BatchNorm2d` to the weight initialization
* Fix retain_grad test
* Replace model checkpoints by Facebook namespace
* Fix name of checkpoint in test
* Add user-friendly message when scipy is not available
* Address most comments by @patrickvonplaten
* Remove return_intermediate_layers attribute of DetrConfig and simplify Joiner
* Better initialization
* Scipy is necessary to get sklearn metrics
* Rename TimmBackbone to DetrTimmConvEncoder and rename DetrJoiner to DetrConvModel
* Make style
* Improve docs and add 2 community notebooks
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* [test] support more than 2 gpus (#12074)
* support more than 2 gpus
* style
* Wav2Vec2 Pretraining (#11306)
* Working quantizer forward
* Working quantizer forward
* Clean up unused model parts, test reproducibility
* Working quantizer forward
* Clean up unused model parts, test reproducibility
* Remove custom outputs from the shared ones
* correct conversion
* correct bug
* add first pretrain script
* save intermediate
* static shapes
* save intermediate
* finish first pretrain script version
* more refactor
* remove wanddb
* refactor more
* improve test
* correct perplexity compute bug
* finish model implementation
* add to docs
* finish docs
* finish pretraining script
* finish pretraining script
* remove wandb
* finish PR for merge
* finish config
* finish
* make deepspeed work
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* apply suggestions
* fix flaky test
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* pass decay_mask fn to optimizer (#12087)
* rm require_version_examples (#12088)
* [Wav2Vec2ForPretraining] Correct checkpoints wav2vec2 & fix tests (#12089)
* fix_torch_device_generate_test
* remove @
* fix tests
* Add text_column_name and label_column_name to run_ner and run_ner_no_trainer args (#12083)
* Add text_column_name and label_column_name to run_ner args
* Minor fix: grouping for text and label column name
* CLIPFeatureExtractor should resize images with kept aspect ratio (#11994)
* Resize with kept aspect ratio
* Fixed failed test
* Overload center_crop and resize methods instead
* resize should handle non-PIL images
* update slow test
* Tensor => tensor
Co-authored-by: patil-suraj <surajp815@gmail.com>
* New TF GLUE example (#12028)
* Pushing partially-complete new GLUE example
* First draft of the new TF GLUE example! Needs a little more testing to be sure but it's almost ready.
* Fix to the fit() call
* Bugfixes, making sure TPU and multi-GPU support is ready
* Remove logger line that depends on Pytorch
* Style pass
* Deleting old TF GLUE example
* Include label2id and id2label in the saved model config
* Don't clobber the existing model.config.label2id
* Style fixes
* Update examples/tensorflow/text-classification/run_glue.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix quality
* Update README.md to cover the TF GLUE example.
* Minor style edits
* Appending label2id and id2label to models to ensure inference works properly (#12102)
* Fix a condition in test_generate_with_head_masking (#11911)
* Fix a condition in test_generate_with_head_masking
* Fix usage of head_mask in bigbirg_pegasus
* Fix head masking for speech2text
* Resolve copy mismatch + drop unwanted print statement
* Fix the condition
* Flax VisionTransformer (#11951)
* adding vit for flax
* added test for Flax-vit and some bug-fixes
* overrided methods where variable changes were necessary for flax_vit test
* added FlaxViTForImageClassification for test
* Update src/transformers/models/vit/modeling_flax_vit.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* made changes suggested in PR
* Adding jax-vit models for autoimport
* swapping num_channels and height,width dimension
* fixing the docstring for torch-like inputs for VIT
* add model to main init
* add docs
* doc, fix-copies
* docstrings
* small test fixes
* fix docs
* fix docstr
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* style
Co-authored-by: jayendra <jayendra@infocusp.in>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add relevant description to tqdm in examples (#11927)
* add relevant `desc` in examples
* require_version datasets>=1.8.0
* Fix head masking generate tests (#12110)
* fix_torch_device_generate_test
* remove @
* fix tests
* Flax CLM script (#12023)
* first draft
* max_seq_length => block_size
* fix arg names
* fix typos
* fix loss calculation
* add max examples, fix train eval steps, metrics
* optimizer mask
* fix perpelexity, metric logging
* fix logging
* data_collator = > data_loader
* refactor loss_fn
* support single GPU
* pass distributed to write_metric
* fix jitting
* fix single device training
* fix single device metrics
* close inner progress bars once finished
* add overwrite_cache arg
* ifx dataset caching issue
* add more logs
* few small fixes,
* address nicholas suggestions
* fix docstr
* address patricks suggestions
* make flake happy
* pass new new_dropout_rng to apply_gradients
* reset train metrics after every epoc
* remove distributed logis, small fixes
* Add from_pretrained to dummy timm objects (#12097)
* Add from_pretrained to dummy timm
* Fix at the source
* Update utils/check_dummies.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Missing pretrained dummies
* Style
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix t5 error message (#12136)
* Fix t5 error message
* Fix again
* Fix megatron_gpt2 attention block's causal mask (#12007)
* Fix megatron_gpt2 attention block's causal mask.
* compatibility with checkpoints created with recent versions of Megatron-LM
* added integration test for the released Megatron-GPT2 model
* code style changes
* added option to megatron conversion script to read from config file
Co-authored-by: Guido Novati <gnovati@nvidia.com>
* Add mlm pretraining xla torch readme (#12011)
* fix_torch_device_generate_test
* remove @
* upload
* Apply suggestions from code review
* Apply suggestions from code review
* Apply suggestions from code review
* Update examples/flax/language-modeling/README.md
* add more info
* finish
* fix
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
* add readme for flax clm (#12111)
* add readme for flax clm
* use section link for tokenizer
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* update metrics
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* FlaxBart (#11537)
* Start working on FlaxBart
* Create modeling_flax_bart.py
* Write FlaxBartAttention
* Add FlaxBartEncoderLayer
* Add FlaxBartDecoderLayer and some typing
* Add helepr function for FlaxBart
* shift_tokens_right
* _make_causal_mask
* _expand_mask
* Add PositionalEmbedding and fix init_std naming
* Add FlaxBartPretrainedModel
* Add FlaxBartEncoder
* Add FlaxBartEncoder
* Add FlaxBartEncoder among modules to be imported
* YET WE CANNOT INITIALIZE THAT!! :(
* Make BartEncoder working
Change BartEncoder to instance of nn.Module so far
* Add FlaxBartDecoder
* Add FlaxBartModel
* TODO to make model run -> Prepapre model inputs
* Resolve padding
* Add FlaxBartModel
* Add FlaxBartModel into importable modules
* Remove FlaxBartEncoder and FlaxBartDecoder from importable modules
* make style; not properly working
* make style; make quality not pass due to some import I left
* Remove TODO for padding_idx in nn.Embed so far
* Add FlaxBartForConditionalGeneration
* Incorporate Flax model output classes, i.e. return_dict
* Add another models and incorporate use_cache arg
* Add FlaxBartForSequenceClassification and FlaxBartForQuestionAnswering
* Incorporate use_cache arg from PyTorch implementation
* Add all necessary Flax output utils
* Add FlaxBartForCausalLM; not working yet'
* Add minor improvements; still lacks some functionality
* Update docs, src and tests
* Add support of FlaxBart to docs/source
* Fix some bugs in FlaxBart souce code
* Add some neccessary tests for FlaxBart models - jit_compilation not passing
* Fix tests and add test_head_masking
* Fix tests for @jax.jit computation
* Add test_head_masking
* Migrate FlaxBart tests from jax.numpy to numpy
* Remove FlaxBartForCausalLM
* Clean repo
* fix bart model weight structure
* Fix FlaxBartForSequenceClassification
Slicing is not possible to use below jit, therefore, selecting sentence
representation from hidden_states must be changed.
* Allow FlaxBartForSequenceClassification for testing pt_flax equivalence
* Allow testing for FlaxBartForQA for pt_flax equivalence
* Add a comment to FlaxBartForSequenceClassification + change noise from 1e-3 to 1e-6
* remove past_key_values
* remove inputs_mebeds and make input_ids required
* add position ids
* re-write attention layer
* fix dataclass
* fix pos embeds and attention output
* fix pos embeds
* expose encode method
* expose decode method
* move docstring to top
* add cache for causal attn layer
* remove head masking for now
* s2s greedy search first pass
* boom boom
* fix typos
* fix greedy generate for bart
* use encoder, decoder layers instead of num_hidden_layers
* handle encoder_outputs
* cleanup
* simplify decoding
* more clean-up
* typos
* Change header + add {decoder_,}position_ids into 2 models
* add BartConfig
* fix existing tests
* add encode, decode methods
* Fix shift_tokens_right for JIT compilation + clarify one condition
* fix decode
* encoder => encode
* simplify generate
* add tests for encode and decode
* style
* add tests for cache
* fix equivalence tests
* sample generate now works with seq2seq
* generation tests
* initialize dense layers
* docstring and cleanup
* quality
* remove get/set input_embeddings
* address Patricks suggestions
* decode for every model, remove encoder_outputs from call
* update tests accordingly
* decode returns only decoder outputs and logits
* fix arguments
* doc encode, decode methods
* correct base_model_prefix
* fix test for seq classif model
* fix docs
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Feature to use the PreTrainedTokenizerFast class as a stand-alone tokenizer (#11810)
* feature for tokenizer without slow/legacy version
* format
* modify common test
* add tests
* add PreTrainedTokenizerFast to AutoTokenizer
* format
* change tokenizer common test in order to be able to run test without a slow version
* update tokenizer fast test in order to use `rust_tokenizer_class` attribute instead of `tokenizer_class`
* add autokenizer test
* replace `if self.tokenizer_class is not None` with ` if self.tokenizer_class is None`
* remove obsolete change in comment
* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update src/transformers/tokenization_utils_fast.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* change `get_main_tokenizer` into `get_tokenizers`
* clarify `get_tokenizers` method
* homogenize with `test_slow_tokenizer` and `test_rust_tokenizer`
* add `test_rust_tokenizer = False` to tokenizer which don't define a fast version
* `test_rust_tokenizer = False` for BertJapaneseTokenizer
* `test_rust_tokenizer = False` for BertJapaneseCharacterTokenizationTest
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [Flax] Add links to google colabs (#12146)
* fix_torch_device_generate_test
* remove @
* add colab links
* Don't log anything before logging is setup in examples (#12121)
* Don't log anything before logging is setup in examples
* Last example
* Use text_column_name variable instead of "text" (#12132)
* Use text_column_name variable instead of "text"
`text_column_name` was already defined above where I made the changes and it was also used below where I made changes.
This is a very minor change. If a dataset does not use "text" as the column name, then the `tokenize_function` will now use whatever column is assigned to `text_column_name`. `text_column_name` is just the first column name if "text" is not a column name. It makes the function a little more robust, though I would assume that 90% + of datasets use "text" anyway.
* black formatting
* make style
Co-authored-by: Nicholas Broad <nicholas@nmbroad.com>
* [lm examples] Replicate --config_overrides addition to other LM examples (#12135)
* [lm examples] Replicate --config_overrides addition to other LM examples
* Removing no trainer files changes
* Update README
Co-authored-by: Kumar Abhishek <kabhishek@expedia.com>
* fix error message (#12148)
* [optim] implement AdafactorSchedule (#12123)
* implement AdafactorSchedule
* typo
* fix
* Update src/transformers/optimization.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [style] consistent nn. and nn.functional (#12124)
* consistent nn. and nn.functional
* fix glitch
* fix glitch #2
* Adding TFWav2Vec2Model (#11617)
* [WIP] Add TFWav2Vec2Model
Work in progress for adding a tensorflow version of Wav2Vec2
* feedback changes
* small fix
* Test Feedback Round 1
* Add SpecAugment and CTC Loss
* correct spec augment mask creation
* docstring and correct copyright
* correct bugs
* remove bogus file
* finish tests correction
* del unnecessary layers
* Update src/transformers/models/wav2vec2/modeling_tf_wav2vec2.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* make style
* correct final bug
* Feedback Changes
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* [Flax] Fix flax pt equivalence tests (#12154)
* fix_torch_device_generate_test
* remove @
* upload
* consistent nn. and nn.functional: p2 templates (#12153)
* Flax Big Bird (#11967)
* add flax bert
* bert -> bigbird
* original_full ported
* add debugger
* init block sparse
* fix copies ; gelu_fast -> gelu_new
* block sparse port
* fix block sparse
* block sparse working
* all ckpts working
* fix-copies
* make quality
* init tests
* temporary fix for FlaxBigBirdForMultipleChoice
* skip test_attention_outputs
* fix
* gelu_fast -> gelu_new ; fix multiple choice model
* remove nsp
* fix sequence classifier
* fix
* make quality
* make fix-copies
* finish
* Delete debugger.ipynb
* Update src/transformers/models/big_bird/modeling_flax_big_bird.py
* make style
* finish
* bye bye jit flax tests
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* [style] consistent nn. and nn.functional: part 3 `tests` (#12155)
* consistent nn. and nn.functional: p3 templates
* restore
* [style] consistent nn. and nn.functional: part 4 `examples` (#12156)
* consistent nn. and nn.functional: p4 examples
* restore
* consistent nn. and nn.functional: part 5 docs (#12161)
* Add video links to the documentation (#12162)
* [Flax generate] Add params to generate (#12171)
* fix_torch_device_generate_test
* remove @
* add params as input
* finish
* Use a released version of optax rather than installing from Git. (#12173)
Use a released version of optax rather than installing from Git
* Have dummy processors have a `from_pretrained` method (#12145)
* Add course banner (#12157)
* Add course banner
* Update course banner
* Adjust banner width
* Enable add_prefix_space if model_type is roberta or gpt2 (#12116)
* Update AutoModel classes in summarization example (#12178)
- Convert use of deprecated AutoModelWithLMHead to AutoModelForSeq2SeqLM
- Add newly required `truncation=True` to `tokenizer.encode` with `max_length`
This silences all warnings.
* Ray Tune Integration Updates (#12134)
* fix
* fixes
* add back to scheduled tests
* formatting
* Update integrations.py
* [testing] ensure concurrent pytest workers use a unique port for torch.dist (#12166)
* ensure concurrent pytest workers use a unique port for torch.distributed.launch
* reword
* Model card defaults (#12122)
* [WIP] Model card defaults
* finetuned_from default value
* Add all mappings to the mapping file
* Be more defensive on finetuned_from arg
* Add default task tag
* Separate tags from tasks
* Edge case for dataset
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Temporarily deactivate torch-scatter while we wait for new release (#12181)
* Temporarily deactivate torch-scatter while we wait for new release
* torch-1.8.1 binary for scatter
* Revert to 1.8.0
* Pin torch dependency
* torchaudio and torchvision
* Temporarily deactivate torchhub test (#12184)
* [Flax] Add Beam Search (#12131)
* fix_torch_device_generate_test
* remove @
* push new logit processors
* add processors
* save first working version
* save intermediate
* finish
* make style
* make fix-copies
* finish
* Update tests/test_modeling_flax_bart.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Hubert (#11889)
* fix_torch_device_generate_test
* remove @
* add hubert
* add first test file
* more docs
* fix bugs
* fix bug
* finish
* finish
* finish docstring
* fix
* fix
* finalize
* add to ignored
* finish
* Apply suggestions from code review
* correct naming
* finish
* fix auto config
* finish
* correct convert script
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* apply suggestions lysandre & suraj
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* updated DLC images and sample notebooks (#12191)
* Enabling AutoTokenizer for HubertConfig. (#12198)
* Use yaml to create metadata (#12185)
* Use yaml to create metadata
* Fix typo
* Remove pin
* [Docs] fixed broken link (#12205)
* fixed broken link
* Update docs/source/benchmarks.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update docs/source/benchmarks.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Pipeline update & tests (#12207)
* Improve detr (#12147)
* Remove unused variables
* Improve docs
* Fix docs of segmentation masks
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Add link to the course (#12229)
* Support for torch 1.9.0 (#12224)
* Support for torch 1.9.0
* Torch scatter for 1.9.0
* Github Actions run on 1.9.0
* fix pt-1.9.0 `add_` deprecation (#12217)
* fix pt-1.9.0 add_ deprecation
* add () for clarity
* Trigger CI
* require_version(torch
* Release: v4.7.0
* Docs for v4.8.0
* AutoTokenizer: infer the class from the tokenizer config if possible (#12208)
* AutoTokenizer: infer the class from the tokenizer config if possible
* Add tests
* Update src/transformers/models/auto/tokenization_auto.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* update desc for map in all examples (#12226)
* update desc for map in all examples
* added plm
* suggestions
* [Flax] FlaxAutoModelForSeq2SeqLM (#12228)
* add FlaxAutoModelForSeq2SeqLM
* [FlaxBart] few small fixes (#12247)
* boom boom
* remove flax clip example
* few small fixes
* Depreciate pythonic Mish and support PyTorch 1.9 version of Mish (#12240)
* Moved Mish to Torch 1.9 version
* Run black formatting
* [t5 doc] make the example work out of the box (#12239)
* [run_clm.py] restore caching
* style
* [t5 doc] make the example work out of the box
This PR expands the training example to include the correct model type for the example to work, e.g. with `T5Model` this example will break.
* Update docs/source/model_doc/t5.rst
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* expand the other example
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Fix the scheduled CI
* Better CI feedback (#12279)
* Better run ID
* Only part of CI
* Revert "Only part of CI"
This reverts commit
|