mirror of
https://github.com/huggingface/transformers.git
synced 2025-07-23 22:38:58 +06:00
8ff619d95e
179 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
![]() |
434022adac
|
Add RemBERT model code to huggingface (#10692)
* Faster list concat for trainer_pt_utils.get_length_grouped_indices() (#11825)
get_length_grouped_indices() in LengthGroupedSampler and DistributedLengthGroupedSampler
is prohibitively slow for large number of megabatches (in test case takes hours for ~270k
megabatches with 100 items each) due to slow list concatenation with sum(megabatches, []).
Resolves: #11795
Co-authored-by: ctheodoris <cvtheodo@ds.dfci.harvard.edu>
* Replace double occurrences as the last step (#11367)
* [Flax] Fix PyTorch import error (#11839)
* fix_torch_device_generate_test
* remove @
* change pytorch import to flax import
* Fix reference to XLNet (#11846)
* Switch mem metrics flag (#11851)
* Switch mem metrics flag
* Update src/transformers/training_args.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Fix flos single node (#11844)
* fixing flos bug/typo in non-distributed setting
* storing flos every logging_interval
* Fix two typos in docs (#11852)
* typo2
* fix typo
* [Trainer] Report both steps and num samples per second (#11818)
* [Trainer] Report both steps and num samples per second
* Fix batch number
* Update src/transformers/trainer_utils.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Address review comments
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Add some tests to the slow suite #11860
* Enable memory metrics in tests that need it (#11859)
* fixed a small typo in the doc (#11856)
* typo (#11858)
* Add option to log only once in multinode training (#11819)
* Add option to long only once in multinode training
* Use an alternate property
* [Wav2Vec2] SpecAugment Fast (#11764)
* first try
* finish
* [lm examples] fix overflow in perplexity calc (#11855)
* fix overflow in perplexity calc
* use inf
* fix
* [Examples] create model with custom config on the fly (#11798)
* create custom model on the flight
* better wording
* add update_from_string
* cleanup
* cleanup
* Update src/transformers/configuration_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* more bool options
* style
* fix logger
* add test
* add the doc
* assert on conflict of options
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [Wav2Vec2ForCTC] example typo fixed (#11878)
* Ensure input tensor are on device. (#11874)
The feature extractor does not create tensors on the appropriate device,
so we call `ensure_tensor_on_device` before feeding the processed inputs
to the model.
* Fix usage of head masks by TF encoder-decoder models' `generate()` function (#11775)
* Fix Bart
* Fix Blenderbot{,_small}
* Fix LED
* Fix Marian
* Fix MBart
* Fix Pegasus
* Fix T5
* Add test for generation with head_mask
* Add a common TF test
* Override a test for the LED model as head masking is not yet properly implemented
* Remove all head_masks from input preparation for LED
* Drop masking for T5 as it needs a bit of refactor
* Correcting comments in T5Stack to reflect correct tuple order (#11330)
* Correcting comments to reflect correct tuple order
In order to match the actual order (line 513 and 516, and as accessed in 968), I've changed the order mentioned in comments L962 and L966-967.
* Update modeling_t5.py
Updating another comment as well
* Removing extra space
* Fixing style and quality
* style & quality
* Update src/transformers/models/t5/modeling_t5.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* [Flax] Allow dataclasses to be jitted (#11886)
* fix_torch_device_generate_test
* remove @
* change dataclasses to flax ones
* fix typo
* fix jitted tests
* fix bert & electra
* changing find_batch_size to work with tokenizer outputs (#11890)
* changing find_batch_size to work with tokenizer outputs
trainer_pt_utils.find_batch_size does not recognize the batch size of BatchEncoding objects. This can cause an error when a trainer relies on find_batch_size to report the number of observed examples in the evaluation loop.
* Trigger CI
Co-authored-by: jrenner <joseph.renner@inria.fr>
* Link official Cloud TPU JAX docs (#11892)
* Flax Generate (#11777)
* fix_torch_device_generate_test
* remove @
* add
* indexing
* correct a couple of tests
* fix tests
* add logits processor
* finish top_k, top_p, temp
* add docs
* correct flax prng key default
* improve generate
* add generation docs
* add docs
* make style
* revert model outputs change
* make style
* correct typo
* fix tests
* fix slow test
* add raise
* finish generation
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
* Add Emotion Speech Noteboook (#11900)
* Update deepspeed config to reflect hyperparameter search parameters (#11896)
* rebuild deepspeed config for hyperparameter search
* reformat code to fix style issues
* Adding new argument `max_new_tokens` for generate. (#11476)
* Adding new argument `max_new_tokens` for generate.
This is a proposal to add a new argument `max_new_tokens` to `generate`.
This include a `MaxNewTokensCriteria` that enables callers that don't
know about the token length ahead (like pipelines callers) to manage
more easily the length of their generated output.
* Adding a test for the user warning when both`max_length` and
`max_new_tokens` are used together.
* Removed redundant `no_grad`.
* Added Sequence Classification class in GPTNeo (#11906)
* seq classification changes
* fix tests
* [Flax] Return Attention from BERT, ELECTRA, RoBERTa and GPT2 (#11918)
* Added logic to return attention from flax-bert model and added test cases to check that
* Added new line at the end of file to test_modeling_flax_common.py
* fixing code style
* Fixing Roberta and Elextra models too from cpoying bert
* Added temporary hack to not run test_attention_outputs for FlaxGPT2
* Returning attention weights from GPT2 and changed the tests accordingly.
* last fixes
* bump flax dependency
Co-authored-by: jayendra <jayendra@infocusp.in>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Test optuna and ray (#11924)
* Remove `datasets` submodule
* fix assert (#11935)
* Remove redundant `nn.log_softmax` in `run_flax_glue.py` (#11920)
* Remove redundant `nn.log_softmax` in `run_flax_glue.py`
`optax.softmax_cross_entropy` expects unnormalized logits, and so it already calls `nn.log_softmax`, so I believe it is not needed here. `nn.log_softmax` is idempotent so mathematically it shouldn't have made a difference.
* Remove unused 'flax.linen' import
* Add MT5ForConditionalGeneration as supported arch. to summarization README (#11961)
* Add MT5ForConditionalGeneration as supported arch.
* Update README.md
* Add FlaxCLIP (#11883)
* add flax CLIP
* default input_shape
* add tests
* fix test
* fix name
* fix docs
* fix shapes
* attend at least 1 token
* flax conv to torch conv
* return floats
* fix equivalence tests
* fix import
* return attention_weights and update tests
* fix dosctrings
* address patricks comments
* input_shape arg
* add tests for get_image_features and get_text_features methods
* fix tests
* RAG-2nd2end-revamp (#11893)
* initial
* code quality test
* code quality
* added test functions in test_modeling_rag.py and test_retrieval_rag.py to test end2end retreiver
* minor change in test_modeling_rag
* fixed tests
* Update examples/research_projects/rag-end2end-retriever/README.md
typo corrected as suggested by lhoestq
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
* Update examples/research_projects/rag-end2end-retriever/finetune_rag.py
type change suggested by lhoestq
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
* Update src/transformers/models/rag/retrieval_rag.py
Adding this change as mentioned by lhoestq.
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
* completed the minor changes suggested by the reviewers
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
* modify qa-trainer (#11872)
* modify qa-trainer
* fix flax model
* bugfixes training_args.py (#11922)
modified according to:
https://pytorch.org/xla/release/1.8.1/_modules/torch_xla/core/xla_model.html
* reinitialize wandb config for each hyperparameter search run (#11945)
* Add regression tests for slow sentencepiece tokenizers. (#11737)
* add test_vocab_size for sentencepiece tok.
* add test_get_vocab for sentencepiece tok.
* add test_convert_token_and_id for sentencepiece tok.
* add test_tokenize_and_convert_tokens_to_string for all tok.
* improve test_tokenize_and_convert_tokens_to_string for sp. tok.
* add common tokenizer integration tests
- for albert
- for barthez
* add tokenizer integration tests to bert gen.
* add most tokenizer integration tests
* fix camembert tokenizer integration test
* add tokenizer integration test to marian
* add tokenizer integration test to reformer
* add typing and doc to tokenizer_integration_test_util
* fix tokenizer integration test of reformer
* improve test_sentencepiece_tokenize_and_convert_tokens_to_string
* empty commit to trigger CI
* fix tokenizer integration test of reformer
* remove code not needed anymore
* empty commit to trigger CI
* empty commit to trigger CI
* Authorize args when instantiating an AutoModel (#11956)
* Neptune.ai integration (#11937)
An option that turns on neptune.ai logging
--report_to 'neptune'
Additional ENV variables:
NEPTUNE_PROJECT
NEPTUNE_API_TOKEN
NEPTUNE_RUN_NAME (optional)
NEPTUNE_STOP_TIMEOUT (optional)
* Run the integration tests on schedule tests instead of master tests
* [deepspeed] docs (#11940)
* deepspeed docs
* cleanup
* cleanup
* typo correction (#11973)
* typo correction
* type corrections
* ByT5 model (#11971)
* allow tf to use uneven num of layers
* add tokenizer
* finish docs
* finish docs
* Apply suggestions from code review
* include in index
* finish
* Update docs/source/model_doc/byt5.rst
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* apply sylvais suggestions
* make style
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Typo in usage example, changed to device instead of torch_device (#11979)
* [DeepSpeed] decouple `DeepSpeedConfigHF` from `Trainer` (#11966)
* decouple DeepSpeedConfigHF from Trainer
* add LoggingLevel ctx manager; add new test
* cleanup
* add docs
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* implemented suggested renames
* formatter workaround
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [Trainer] add train loss and flops metrics reports (#11980)
* add train loss and flops metrics reports
* consistency
* add train_loss to skip keys
* restore on_train_end call timing
* Bump urllib3 from 1.25.8 to 1.26.5 in /examples/research_projects/lxmert (#11983)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.25.8 to 1.26.5.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/1.25.8...1.26.5)
---
updated-dependencies:
- dependency-name: urllib3
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* [RAG] Fix rag from pretrained question encoder generator behavior (#11962)
* fix_torch_device_generate_test
* remove @
* fix rag from pretrained loading
* add test
* uplaod
* finish
* VisualBERT (#10534)
* Init VisualBERT
* Add cookie-cutter, Config, and Embeddings
* Add preliminary Model
* Add Bert analogous classes
* Add basic code for NLVR, VQA, Flickr
* Update Init
* Fix VisualBert Downstream Models
* Rename classifier to cls
* Comment position_ids buffer
* Remove sentence image predictor output
* Update output dicts
* Remove unnecessary files
* Fix Auto Modeling
* Fix transformers init
* Add conversion script
* Add conversion script
* Fix docs
* Update visualbert modelling
* Update configuration
* Style fixes
* Add model and integration tests
* Add all tests
* Update model mapping
* Add simple detector from original repository
* Update docs and configs
* Fix style
* Fix style
* Update docs
* Fix style
* Fix import issues in style
* Fix style
* Add changes from review
* Fix style
* Fix style
* Update docs
* Fix style
* Fix style
* Update docs/source/model_doc/visual_bert.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/visual_bert/modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update tests/test_modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/visual_bert/modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/visual_bert/modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/visual_bert/modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add changes from review
* Remove convert run script
* Add changes from review
* Update src/transformers/models/visual_bert/modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/visual_bert/modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/visual_bert/modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/visual_bert/modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/visual_bert/modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add changes from review
* Add changes from review
* Add visual embedding example in docs
* Fix "copied from" comments
* Add changes from review
* Fix error, style, checkpoints
* Update docs
* Fix integration tests
* Fix style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix examples (#11990)
* [docs] fix xref to `PreTrainedModel.generate` (#11049)
* fix xref to generate
* do the same for search methods
* style
* style
* Update return introduction (#11976)
Make it clear that the `forward` method now returns a dict instead of tuple.
Fix style
* [deepspeed] Move code and doc into standalone files (#11984)
* move code and docs
* style
* moved
* restore
* [deepspeed] add nvme test skip rule (#11997)
* add nvme skip rule
* fix
* Fix weight decay masking in `run_flax_glue.py` (#11964)
* Fix weight decay masking in `run_flax_glue.py`
Issues with the previous implementation:
- The `dict` from `traverse_util.flatten_dict` has keys which are tuples of strings, not one long string with the path separated by periods.
- `optax.masked` applies the transformation wherever the mask is True, so the masks are flipped.
- Flax's LayerNorm calls the scale parameter `scale` not `weight`
* Fix formatting with black
* adapt results
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
* [Flax] Refactor MLM (#12013)
* fix_torch_device_generate_test
* remove @
* finish refactor
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
* [Deepspeed] Assert on mismatches between ds and hf args (#12021)
* wip
* add mismatch validation + test
* renames
* Update docs/source/main_classes/deepspeed.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* renames
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [TrainerArguments] format and sort __repr__, add __str__ (#12018)
* format and sort __repr__, add __str__
* typo
* use __str__ directly
* alias __repr__ = __str__
* Fixed Typo in modeling_bart.py (#12035)
* Fixed Typo in modeling_bart.py - Issue #11895
* Fixed Typo in modeling_bart.py
* fix deberta 2 tokenizer integration test (#12017)
* fix docs of past_key_values (#12049)
* [JAX] Bump jax lib (#12053)
* fix_torch_device_generate_test
* remove @
* bump up jax lib
* Fixes bug that appears when using QA bert and distilation. (#12026)
* Fixing bug that appears when using distilation (and potentially other uses).
During backward pass Pytorch complains with:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
This happens because the QA model code modifies the start_positions and end_positions input tensors, using clamp_ function: as a consequence the teacher and the student both modifies the inputs, and backward pass fails.
* Fixing all models QA clamp_ bug.
* Extend pipelines for automodel tupels (#12025)
* fix_torch_device_generate_test
* remove @
* finish
* refactor
* add test
* fix test
* Attempt at simplification.
* Small fix.
* Fixing non existing AutoModel for TF.
* Naming.
* Remove extra condition.
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
* Add optional grouped parsers description to HfArgumentParser (#12042)
* Adding optional argument group to HfArgumentParser
* Minor
* remove whitespace
* Minor styling
* adds metric prefix. (#12057)
* adds metric prefix.
* update tests to include prefix
* skip failing test (#12059)
* Fix integration tests (#12066)
* Fix tapas issue (#12063)
* Fix scatter function to be compatible with torch-scatter 2.7.0
* Allow test again
* updated the original RAG implementation to be compatible with latest Pytorch-Lightning (#11806)
* updated the original RAG implementation to be compatible with the latest PL version
* updated the requirements.txt file
* execute make style
* code quality test
* code quality
* conflix resolved in requirement.txt
* code quality
* changed the MyDDP class name to CustomDDP
* Replace legacy tensor.Tensor with torch.tensor/torch.empty (#12027)
* Replace legacy torch.Tensor constructor with torch.{tensor, empty}
* Remove torch.Tensor in examples
* Add torch to requirements.txt in language-modeling (#12040)
* Add torch to requirements.txt in language-modeling
* Update examples/pytorch/language-modeling/requirements.txt
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Properly indent block_size (#12070)
* [Deepspeed] various fixes (#12058)
* replace deprecated config
* sub_group_size was too big
* complete deprecation removal
* [Deepspeed Wav2vec2] integration (#11638)
* wip
* wip - but working with https://github.com/microsoft/DeepSpeed/pull/1044
* cleanup
* workaround
* working 5/8 modes
* solve fp32 distributed zero3
* style
* sync
* sync
* rework
* deprecation
* cleanup
* https://github.com/microsoft/DeepSpeed/pull/1044 pr was merged
* clean up
* add a guide
* more prose
* more prose
* fix
* more prose
* sub_group_size was too big
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* refactor
* bug fix
* make the true check explicit
* new deepspeed release
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* typo
* Update run_ner.py with id2label config (#12001)
* sync LayerDrop for Wav2Vec2Encoder + tests (#12076)
* Add DETR (#11653)
* Squash all commits of modeling_detr_v7 branch into one
* Improve docs
* Fix tests
* Style
* Improve docs some more and fix most tests
* Fix slow tests of ViT, DeiT and DETR
* Improve replacement of batch norm
* Restructure timm backbone forward
* Make DetrForSegmentation support any timm backbone
* Fix name of output
* Address most comments by @LysandreJik
* Give better names for variables
* Conditional imports + timm in setup.py
* Address additional comments by @sgugger
* Make style, add require_timm and require_vision to testsé
* Remove train_backbone attribute of DetrConfig, add methods to freeze/unfreeze backbone
* Add png files to fixtures
* Fix type hint
* Add timm to workflows
* Add `BatchNorm2d` to the weight initialization
* Fix retain_grad test
* Replace model checkpoints by Facebook namespace
* Fix name of checkpoint in test
* Add user-friendly message when scipy is not available
* Address most comments by @patrickvonplaten
* Remove return_intermediate_layers attribute of DetrConfig and simplify Joiner
* Better initialization
* Scipy is necessary to get sklearn metrics
* Rename TimmBackbone to DetrTimmConvEncoder and rename DetrJoiner to DetrConvModel
* Make style
* Improve docs and add 2 community notebooks
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* [test] support more than 2 gpus (#12074)
* support more than 2 gpus
* style
* Wav2Vec2 Pretraining (#11306)
* Working quantizer forward
* Working quantizer forward
* Clean up unused model parts, test reproducibility
* Working quantizer forward
* Clean up unused model parts, test reproducibility
* Remove custom outputs from the shared ones
* correct conversion
* correct bug
* add first pretrain script
* save intermediate
* static shapes
* save intermediate
* finish first pretrain script version
* more refactor
* remove wanddb
* refactor more
* improve test
* correct perplexity compute bug
* finish model implementation
* add to docs
* finish docs
* finish pretraining script
* finish pretraining script
* remove wandb
* finish PR for merge
* finish config
* finish
* make deepspeed work
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* apply suggestions
* fix flaky test
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* pass decay_mask fn to optimizer (#12087)
* rm require_version_examples (#12088)
* [Wav2Vec2ForPretraining] Correct checkpoints wav2vec2 & fix tests (#12089)
* fix_torch_device_generate_test
* remove @
* fix tests
* Add text_column_name and label_column_name to run_ner and run_ner_no_trainer args (#12083)
* Add text_column_name and label_column_name to run_ner args
* Minor fix: grouping for text and label column name
* CLIPFeatureExtractor should resize images with kept aspect ratio (#11994)
* Resize with kept aspect ratio
* Fixed failed test
* Overload center_crop and resize methods instead
* resize should handle non-PIL images
* update slow test
* Tensor => tensor
Co-authored-by: patil-suraj <surajp815@gmail.com>
* New TF GLUE example (#12028)
* Pushing partially-complete new GLUE example
* First draft of the new TF GLUE example! Needs a little more testing to be sure but it's almost ready.
* Fix to the fit() call
* Bugfixes, making sure TPU and multi-GPU support is ready
* Remove logger line that depends on Pytorch
* Style pass
* Deleting old TF GLUE example
* Include label2id and id2label in the saved model config
* Don't clobber the existing model.config.label2id
* Style fixes
* Update examples/tensorflow/text-classification/run_glue.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix quality
* Update README.md to cover the TF GLUE example.
* Minor style edits
* Appending label2id and id2label to models to ensure inference works properly (#12102)
* Fix a condition in test_generate_with_head_masking (#11911)
* Fix a condition in test_generate_with_head_masking
* Fix usage of head_mask in bigbirg_pegasus
* Fix head masking for speech2text
* Resolve copy mismatch + drop unwanted print statement
* Fix the condition
* Flax VisionTransformer (#11951)
* adding vit for flax
* added test for Flax-vit and some bug-fixes
* overrided methods where variable changes were necessary for flax_vit test
* added FlaxViTForImageClassification for test
* Update src/transformers/models/vit/modeling_flax_vit.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* made changes suggested in PR
* Adding jax-vit models for autoimport
* swapping num_channels and height,width dimension
* fixing the docstring for torch-like inputs for VIT
* add model to main init
* add docs
* doc, fix-copies
* docstrings
* small test fixes
* fix docs
* fix docstr
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* style
Co-authored-by: jayendra <jayendra@infocusp.in>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add relevant description to tqdm in examples (#11927)
* add relevant `desc` in examples
* require_version datasets>=1.8.0
* Fix head masking generate tests (#12110)
* fix_torch_device_generate_test
* remove @
* fix tests
* Flax CLM script (#12023)
* first draft
* max_seq_length => block_size
* fix arg names
* fix typos
* fix loss calculation
* add max examples, fix train eval steps, metrics
* optimizer mask
* fix perpelexity, metric logging
* fix logging
* data_collator = > data_loader
* refactor loss_fn
* support single GPU
* pass distributed to write_metric
* fix jitting
* fix single device training
* fix single device metrics
* close inner progress bars once finished
* add overwrite_cache arg
* ifx dataset caching issue
* add more logs
* few small fixes,
* address nicholas suggestions
* fix docstr
* address patricks suggestions
* make flake happy
* pass new new_dropout_rng to apply_gradients
* reset train metrics after every epoc
* remove distributed logis, small fixes
* Add from_pretrained to dummy timm objects (#12097)
* Add from_pretrained to dummy timm
* Fix at the source
* Update utils/check_dummies.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Missing pretrained dummies
* Style
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix t5 error message (#12136)
* Fix t5 error message
* Fix again
* Fix megatron_gpt2 attention block's causal mask (#12007)
* Fix megatron_gpt2 attention block's causal mask.
* compatibility with checkpoints created with recent versions of Megatron-LM
* added integration test for the released Megatron-GPT2 model
* code style changes
* added option to megatron conversion script to read from config file
Co-authored-by: Guido Novati <gnovati@nvidia.com>
* Add mlm pretraining xla torch readme (#12011)
* fix_torch_device_generate_test
* remove @
* upload
* Apply suggestions from code review
* Apply suggestions from code review
* Apply suggestions from code review
* Update examples/flax/language-modeling/README.md
* add more info
* finish
* fix
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
* add readme for flax clm (#12111)
* add readme for flax clm
* use section link for tokenizer
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* update metrics
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* FlaxBart (#11537)
* Start working on FlaxBart
* Create modeling_flax_bart.py
* Write FlaxBartAttention
* Add FlaxBartEncoderLayer
* Add FlaxBartDecoderLayer and some typing
* Add helepr function for FlaxBart
* shift_tokens_right
* _make_causal_mask
* _expand_mask
* Add PositionalEmbedding and fix init_std naming
* Add FlaxBartPretrainedModel
* Add FlaxBartEncoder
* Add FlaxBartEncoder
* Add FlaxBartEncoder among modules to be imported
* YET WE CANNOT INITIALIZE THAT!! :(
* Make BartEncoder working
Change BartEncoder to instance of nn.Module so far
* Add FlaxBartDecoder
* Add FlaxBartModel
* TODO to make model run -> Prepapre model inputs
* Resolve padding
* Add FlaxBartModel
* Add FlaxBartModel into importable modules
* Remove FlaxBartEncoder and FlaxBartDecoder from importable modules
* make style; not properly working
* make style; make quality not pass due to some import I left
* Remove TODO for padding_idx in nn.Embed so far
* Add FlaxBartForConditionalGeneration
* Incorporate Flax model output classes, i.e. return_dict
* Add another models and incorporate use_cache arg
* Add FlaxBartForSequenceClassification and FlaxBartForQuestionAnswering
* Incorporate use_cache arg from PyTorch implementation
* Add all necessary Flax output utils
* Add FlaxBartForCausalLM; not working yet'
* Add minor improvements; still lacks some functionality
* Update docs, src and tests
* Add support of FlaxBart to docs/source
* Fix some bugs in FlaxBart souce code
* Add some neccessary tests for FlaxBart models - jit_compilation not passing
* Fix tests and add test_head_masking
* Fix tests for @jax.jit computation
* Add test_head_masking
* Migrate FlaxBart tests from jax.numpy to numpy
* Remove FlaxBartForCausalLM
* Clean repo
* fix bart model weight structure
* Fix FlaxBartForSequenceClassification
Slicing is not possible to use below jit, therefore, selecting sentence
representation from hidden_states must be changed.
* Allow FlaxBartForSequenceClassification for testing pt_flax equivalence
* Allow testing for FlaxBartForQA for pt_flax equivalence
* Add a comment to FlaxBartForSequenceClassification + change noise from 1e-3 to 1e-6
* remove past_key_values
* remove inputs_mebeds and make input_ids required
* add position ids
* re-write attention layer
* fix dataclass
* fix pos embeds and attention output
* fix pos embeds
* expose encode method
* expose decode method
* move docstring to top
* add cache for causal attn layer
* remove head masking for now
* s2s greedy search first pass
* boom boom
* fix typos
* fix greedy generate for bart
* use encoder, decoder layers instead of num_hidden_layers
* handle encoder_outputs
* cleanup
* simplify decoding
* more clean-up
* typos
* Change header + add {decoder_,}position_ids into 2 models
* add BartConfig
* fix existing tests
* add encode, decode methods
* Fix shift_tokens_right for JIT compilation + clarify one condition
* fix decode
* encoder => encode
* simplify generate
* add tests for encode and decode
* style
* add tests for cache
* fix equivalence tests
* sample generate now works with seq2seq
* generation tests
* initialize dense layers
* docstring and cleanup
* quality
* remove get/set input_embeddings
* address Patricks suggestions
* decode for every model, remove encoder_outputs from call
* update tests accordingly
* decode returns only decoder outputs and logits
* fix arguments
* doc encode, decode methods
* correct base_model_prefix
* fix test for seq classif model
* fix docs
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Feature to use the PreTrainedTokenizerFast class as a stand-alone tokenizer (#11810)
* feature for tokenizer without slow/legacy version
* format
* modify common test
* add tests
* add PreTrainedTokenizerFast to AutoTokenizer
* format
* change tokenizer common test in order to be able to run test without a slow version
* update tokenizer fast test in order to use `rust_tokenizer_class` attribute instead of `tokenizer_class`
* add autokenizer test
* replace `if self.tokenizer_class is not None` with ` if self.tokenizer_class is None`
* remove obsolete change in comment
* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update src/transformers/tokenization_utils_fast.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* change `get_main_tokenizer` into `get_tokenizers`
* clarify `get_tokenizers` method
* homogenize with `test_slow_tokenizer` and `test_rust_tokenizer`
* add `test_rust_tokenizer = False` to tokenizer which don't define a fast version
* `test_rust_tokenizer = False` for BertJapaneseTokenizer
* `test_rust_tokenizer = False` for BertJapaneseCharacterTokenizationTest
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [Flax] Add links to google colabs (#12146)
* fix_torch_device_generate_test
* remove @
* add colab links
* Don't log anything before logging is setup in examples (#12121)
* Don't log anything before logging is setup in examples
* Last example
* Use text_column_name variable instead of "text" (#12132)
* Use text_column_name variable instead of "text"
`text_column_name` was already defined above where I made the changes and it was also used below where I made changes.
This is a very minor change. If a dataset does not use "text" as the column name, then the `tokenize_function` will now use whatever column is assigned to `text_column_name`. `text_column_name` is just the first column name if "text" is not a column name. It makes the function a little more robust, though I would assume that 90% + of datasets use "text" anyway.
* black formatting
* make style
Co-authored-by: Nicholas Broad <nicholas@nmbroad.com>
* [lm examples] Replicate --config_overrides addition to other LM examples (#12135)
* [lm examples] Replicate --config_overrides addition to other LM examples
* Removing no trainer files changes
* Update README
Co-authored-by: Kumar Abhishek <kabhishek@expedia.com>
* fix error message (#12148)
* [optim] implement AdafactorSchedule (#12123)
* implement AdafactorSchedule
* typo
* fix
* Update src/transformers/optimization.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [style] consistent nn. and nn.functional (#12124)
* consistent nn. and nn.functional
* fix glitch
* fix glitch #2
* Adding TFWav2Vec2Model (#11617)
* [WIP] Add TFWav2Vec2Model
Work in progress for adding a tensorflow version of Wav2Vec2
* feedback changes
* small fix
* Test Feedback Round 1
* Add SpecAugment and CTC Loss
* correct spec augment mask creation
* docstring and correct copyright
* correct bugs
* remove bogus file
* finish tests correction
* del unnecessary layers
* Update src/transformers/models/wav2vec2/modeling_tf_wav2vec2.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* make style
* correct final bug
* Feedback Changes
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* [Flax] Fix flax pt equivalence tests (#12154)
* fix_torch_device_generate_test
* remove @
* upload
* consistent nn. and nn.functional: p2 templates (#12153)
* Flax Big Bird (#11967)
* add flax bert
* bert -> bigbird
* original_full ported
* add debugger
* init block sparse
* fix copies ; gelu_fast -> gelu_new
* block sparse port
* fix block sparse
* block sparse working
* all ckpts working
* fix-copies
* make quality
* init tests
* temporary fix for FlaxBigBirdForMultipleChoice
* skip test_attention_outputs
* fix
* gelu_fast -> gelu_new ; fix multiple choice model
* remove nsp
* fix sequence classifier
* fix
* make quality
* make fix-copies
* finish
* Delete debugger.ipynb
* Update src/transformers/models/big_bird/modeling_flax_big_bird.py
* make style
* finish
* bye bye jit flax tests
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* [style] consistent nn. and nn.functional: part 3 `tests` (#12155)
* consistent nn. and nn.functional: p3 templates
* restore
* [style] consistent nn. and nn.functional: part 4 `examples` (#12156)
* consistent nn. and nn.functional: p4 examples
* restore
* consistent nn. and nn.functional: part 5 docs (#12161)
* Add video links to the documentation (#12162)
* [Flax generate] Add params to generate (#12171)
* fix_torch_device_generate_test
* remove @
* add params as input
* finish
* Use a released version of optax rather than installing from Git. (#12173)
Use a released version of optax rather than installing from Git
* Have dummy processors have a `from_pretrained` method (#12145)
* Add course banner (#12157)
* Add course banner
* Update course banner
* Adjust banner width
* Enable add_prefix_space if model_type is roberta or gpt2 (#12116)
* Update AutoModel classes in summarization example (#12178)
- Convert use of deprecated AutoModelWithLMHead to AutoModelForSeq2SeqLM
- Add newly required `truncation=True` to `tokenizer.encode` with `max_length`
This silences all warnings.
* Ray Tune Integration Updates (#12134)
* fix
* fixes
* add back to scheduled tests
* formatting
* Update integrations.py
* [testing] ensure concurrent pytest workers use a unique port for torch.dist (#12166)
* ensure concurrent pytest workers use a unique port for torch.distributed.launch
* reword
* Model card defaults (#12122)
* [WIP] Model card defaults
* finetuned_from default value
* Add all mappings to the mapping file
* Be more defensive on finetuned_from arg
* Add default task tag
* Separate tags from tasks
* Edge case for dataset
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Temporarily deactivate torch-scatter while we wait for new release (#12181)
* Temporarily deactivate torch-scatter while we wait for new release
* torch-1.8.1 binary for scatter
* Revert to 1.8.0
* Pin torch dependency
* torchaudio and torchvision
* Temporarily deactivate torchhub test (#12184)
* [Flax] Add Beam Search (#12131)
* fix_torch_device_generate_test
* remove @
* push new logit processors
* add processors
* save first working version
* save intermediate
* finish
* make style
* make fix-copies
* finish
* Update tests/test_modeling_flax_bart.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Hubert (#11889)
* fix_torch_device_generate_test
* remove @
* add hubert
* add first test file
* more docs
* fix bugs
* fix bug
* finish
* finish
* finish docstring
* fix
* fix
* finalize
* add to ignored
* finish
* Apply suggestions from code review
* correct naming
* finish
* fix auto config
* finish
* correct convert script
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* apply suggestions lysandre & suraj
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* updated DLC images and sample notebooks (#12191)
* Enabling AutoTokenizer for HubertConfig. (#12198)
* Use yaml to create metadata (#12185)
* Use yaml to create metadata
* Fix typo
* Remove pin
* [Docs] fixed broken link (#12205)
* fixed broken link
* Update docs/source/benchmarks.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update docs/source/benchmarks.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Pipeline update & tests (#12207)
* Improve detr (#12147)
* Remove unused variables
* Improve docs
* Fix docs of segmentation masks
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Add link to the course (#12229)
* Support for torch 1.9.0 (#12224)
* Support for torch 1.9.0
* Torch scatter for 1.9.0
* Github Actions run on 1.9.0
* fix pt-1.9.0 `add_` deprecation (#12217)
* fix pt-1.9.0 add_ deprecation
* add () for clarity
* Trigger CI
* require_version(torch
* Release: v4.7.0
* Docs for v4.8.0
* AutoTokenizer: infer the class from the tokenizer config if possible (#12208)
* AutoTokenizer: infer the class from the tokenizer config if possible
* Add tests
* Update src/transformers/models/auto/tokenization_auto.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* update desc for map in all examples (#12226)
* update desc for map in all examples
* added plm
* suggestions
* [Flax] FlaxAutoModelForSeq2SeqLM (#12228)
* add FlaxAutoModelForSeq2SeqLM
* [FlaxBart] few small fixes (#12247)
* boom boom
* remove flax clip example
* few small fixes
* Depreciate pythonic Mish and support PyTorch 1.9 version of Mish (#12240)
* Moved Mish to Torch 1.9 version
* Run black formatting
* [t5 doc] make the example work out of the box (#12239)
* [run_clm.py] restore caching
* style
* [t5 doc] make the example work out of the box
This PR expands the training example to include the correct model type for the example to work, e.g. with `T5Model` this example will break.
* Update docs/source/model_doc/t5.rst
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* expand the other example
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Fix the scheduled CI
* Better CI feedback (#12279)
* Better run ID
* Only part of CI
* Revert "Only part of CI"
This reverts commit
|
||
![]() |
0dcc3c86e4
|
[doc] DP/PP/TP/etc parallelism (#12524)
* wip * complete the doc * missing img * improve * correction * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> |
||
![]() |
fb65f65ea6
|
Add TFHubertModel (#12206)
* TFHubert * Update with TFWav2Vec Bug Fixes * Add OOV Error * Feedback changes * Fix kwargs call |
||
![]() |
65e27215ba
|
[Flax] Add flax marian (#12595)
* fix_torch_device_generate_test * remove @ * add marian * finish make style * add model * add docs * add test * add integration tests * up * solve bug * correct tests * correct some tests * Apply suggestions from code review Co-authored-by: Suraj Patil <surajp815@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * correct adapt marian * finish Co-authored-by: Patrick von Platen <patrick@huggingface.co> Co-authored-by: Suraj Patil <surajp815@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> |
||
![]() |
61400e1ec7
|
[Flax] Add FlaxMBart (#12236)
* Copy BART to MBart and rename some stuff * Add copy statements pointing to FlaxBart * Update/add some common files * Update shift_tokens_rigth + fix imports * Fix shift_tokens_right method according to MBart implementation * Update shift_tokens_right in tests accordingly * Fix the import issue and update docs file * make style quality * Do some minor changes according to patil-suraj suggestions * Change the order of normalization layer and attention * Add some copu statementes * Update generate method and add integration test for mBart * Make a few updates after a review Besides, add `lang_code_to_id` to MBartTokenizeFast * fix-copies; make style quality * Apply suggestions from code review * Apply suggestions from code review * Apply suggestions from code review * fix output type, style * add copied from * resolve conflicts Co-authored-by: Suraj Patil <surajp815@gmail.com> |
||
![]() |
7a259c190c
|
FlaxGPTNeo (#12493)
* flax gpt neo * fix query scaling * update generation test * use flax model for test |
||
![]() |
0d1f67e651
|
[Flax] Add wav2vec2 (#12271)
* fix_torch_device_generate_test * remove @ * start flax wav2vec2 * save intermediate * forward pass has correct shape * add weight norm * add files * finish ctc * make style * finish gumbel quantizer * correct docstrings * correct some more files * fix vit * finish quality * correct tests * correct docstring * correct tests * start wav2vec2 pretraining script * save intermediate * start pretraining script * finalize pretraining script * finish * finish * small typo * finish * correct * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Suraj Patil <surajp815@gmail.com> * make style * push Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Suraj Patil <surajp815@gmail.com> |
||
![]() |
6e68597877
|
Add CANINE (#12024)
* First pass * More progress * Add support for local attention * More improvements * More improvements * Conversion script working * Add CanineTokenizer * Make style & quality * First draft of integration test * Remove decoder test * Improve tests * Add documentation * Mostly docs improvements * Add CanineTokenizer tests * Fix most tests on GPU, improve upsampling projection * Address most comments by @dhgarrette * Remove decoder logic * Improve Canine tests, improve docs of CanineConfig * All tokenizer tests passing * Make fix-copies and fix tokenizer tests * Fix test_model_outputs_equivalence test * Apply suggestions from @sgugger's review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Address some more comments * Add support for hidden_states and attentions of shallow encoders * Define custom CanineModelOutputWithPooling, tests pass * First pass * More progress * Add support for local attention * More improvements * More improvements * Conversion script working * Add CanineTokenizer * Make style & quality * First draft of integration test * Remove decoder test * Improve tests * Add documentation * Mostly docs improvements * Add CanineTokenizer tests * Fix most tests on GPU, improve upsampling projection * Address most comments by @dhgarrette * Remove decoder logic * Improve Canine tests, improve docs of CanineConfig * All tokenizer tests passing * Make fix-copies and fix tokenizer tests * Fix test_model_outputs_equivalence test * Apply suggestions from @sgugger's review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Address some more comments * Make conversion script work for Canine-c too * Fix tokenizer tests * Remove file Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> |
||
![]() |
e98233dde1
|
Flax T5 (#12150)
* copy pytorch-t5 * init * boom boom * forward pass same * make generation work * add more tests * make test work * finish normal tests * make fix-copies * finish quality * correct slow example * correct slow test * version table * upload models * Update tests/test_modeling_flax_t5.py * correct incorrectly deleted line Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Patrick von Platen <patrick@huggingface.co> |
||
![]() |
bfd5da8e28
|
[docs] performance (#12258)
* initial performance document * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * rewrites based on suggestions * 8x multiple is for AMP only * add contribute section Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Lysandre Debut <lysandre@huggingface.co> |
||
![]() |
afdd9e3663
|
Add link to the course (#12229) | ||
![]() |
ccca510276
|
Hubert (#11889)
* fix_torch_device_generate_test * remove @ * add hubert * add first test file * more docs * fix bugs * fix bug * finish * finish * finish docstring * fix * fix * finalize * add to ignored * finish * Apply suggestions from code review * correct naming * finish * fix auto config * finish * correct convert script * Apply suggestions from code review Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Suraj Patil <surajp815@gmail.com> * apply suggestions lysandre & suraj Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Suraj Patil <surajp815@gmail.com> |
||
![]() |
d9c0d08f9a
|
Flax Big Bird (#11967)
* add flax bert * bert -> bigbird * original_full ported * add debugger * init block sparse * fix copies ; gelu_fast -> gelu_new * block sparse port * fix block sparse * block sparse working * all ckpts working * fix-copies * make quality * init tests * temporary fix for FlaxBigBirdForMultipleChoice * skip test_attention_outputs * fix * gelu_fast -> gelu_new ; fix multiple choice model * remove nsp * fix sequence classifier * fix * make quality * make fix-copies * finish * Delete debugger.ipynb * Update src/transformers/models/big_bird/modeling_flax_big_bird.py * make style * finish * bye bye jit flax tests Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> |
||
![]() |
d438eee030
|
Adding TFWav2Vec2Model (#11617)
* [WIP] Add TFWav2Vec2Model Work in progress for adding a tensorflow version of Wav2Vec2 * feedback changes * small fix * Test Feedback Round 1 * Add SpecAugment and CTC Loss * correct spec augment mask creation * docstring and correct copyright * correct bugs * remove bogus file * finish tests correction * del unnecessary layers * Update src/transformers/models/wav2vec2/modeling_tf_wav2vec2.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * make style * correct final bug * Feedback Changes Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> |
||
![]() |
4a51b1dd9b
|
FlaxBart (#11537)
* Start working on FlaxBart * Create modeling_flax_bart.py * Write FlaxBartAttention * Add FlaxBartEncoderLayer * Add FlaxBartDecoderLayer and some typing * Add helepr function for FlaxBart * shift_tokens_right * _make_causal_mask * _expand_mask * Add PositionalEmbedding and fix init_std naming * Add FlaxBartPretrainedModel * Add FlaxBartEncoder * Add FlaxBartEncoder * Add FlaxBartEncoder among modules to be imported * YET WE CANNOT INITIALIZE THAT!! :( * Make BartEncoder working Change BartEncoder to instance of nn.Module so far * Add FlaxBartDecoder * Add FlaxBartModel * TODO to make model run -> Prepapre model inputs * Resolve padding * Add FlaxBartModel * Add FlaxBartModel into importable modules * Remove FlaxBartEncoder and FlaxBartDecoder from importable modules * make style; not properly working * make style; make quality not pass due to some import I left * Remove TODO for padding_idx in nn.Embed so far * Add FlaxBartForConditionalGeneration * Incorporate Flax model output classes, i.e. return_dict * Add another models and incorporate use_cache arg * Add FlaxBartForSequenceClassification and FlaxBartForQuestionAnswering * Incorporate use_cache arg from PyTorch implementation * Add all necessary Flax output utils * Add FlaxBartForCausalLM; not working yet' * Add minor improvements; still lacks some functionality * Update docs, src and tests * Add support of FlaxBart to docs/source * Fix some bugs in FlaxBart souce code * Add some neccessary tests for FlaxBart models - jit_compilation not passing * Fix tests and add test_head_masking * Fix tests for @jax.jit computation * Add test_head_masking * Migrate FlaxBart tests from jax.numpy to numpy * Remove FlaxBartForCausalLM * Clean repo * fix bart model weight structure * Fix FlaxBartForSequenceClassification Slicing is not possible to use below jit, therefore, selecting sentence representation from hidden_states must be changed. * Allow FlaxBartForSequenceClassification for testing pt_flax equivalence * Allow testing for FlaxBartForQA for pt_flax equivalence * Add a comment to FlaxBartForSequenceClassification + change noise from 1e-3 to 1e-6 * remove past_key_values * remove inputs_mebeds and make input_ids required * add position ids * re-write attention layer * fix dataclass * fix pos embeds and attention output * fix pos embeds * expose encode method * expose decode method * move docstring to top * add cache for causal attn layer * remove head masking for now * s2s greedy search first pass * boom boom * fix typos * fix greedy generate for bart * use encoder, decoder layers instead of num_hidden_layers * handle encoder_outputs * cleanup * simplify decoding * more clean-up * typos * Change header + add {decoder_,}position_ids into 2 models * add BartConfig * fix existing tests * add encode, decode methods * Fix shift_tokens_right for JIT compilation + clarify one condition * fix decode * encoder => encode * simplify generate * add tests for encode and decode * style * add tests for cache * fix equivalence tests * sample generate now works with seq2seq * generation tests * initialize dense layers * docstring and cleanup * quality * remove get/set input_embeddings * address Patricks suggestions * decode for every model, remove encoder_outputs from call * update tests accordingly * decode returns only decoder outputs and logits * fix arguments * doc encode, decode methods * correct base_model_prefix * fix test for seq classif model * fix docs Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Suraj Patil <surajp815@gmail.com> |
||
![]() |
9a9314f6d9
|
Flax VisionTransformer (#11951)
* adding vit for flax * added test for Flax-vit and some bug-fixes * overrided methods where variable changes were necessary for flax_vit test * added FlaxViTForImageClassification for test * Update src/transformers/models/vit/modeling_flax_vit.py Co-authored-by: Suraj Patil <surajp815@gmail.com> * made changes suggested in PR * Adding jax-vit models for autoimport * swapping num_channels and height,width dimension * fixing the docstring for torch-like inputs for VIT * add model to main init * add docs * doc, fix-copies * docstrings * small test fixes * fix docs * fix docstr * Apply suggestions from code review Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * style Co-authored-by: jayendra <jayendra@infocusp.in> Co-authored-by: Suraj Patil <surajp815@gmail.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> |
||
![]() |
d3eacbb829
|
Add DETR (#11653)
* Squash all commits of modeling_detr_v7 branch into one * Improve docs * Fix tests * Style * Improve docs some more and fix most tests * Fix slow tests of ViT, DeiT and DETR * Improve replacement of batch norm * Restructure timm backbone forward * Make DetrForSegmentation support any timm backbone * Fix name of output * Address most comments by @LysandreJik * Give better names for variables * Conditional imports + timm in setup.py * Address additional comments by @sgugger * Make style, add require_timm and require_vision to testsé * Remove train_backbone attribute of DetrConfig, add methods to freeze/unfreeze backbone * Add png files to fixtures * Fix type hint * Add timm to workflows * Add `BatchNorm2d` to the weight initialization * Fix retain_grad test * Replace model checkpoints by Facebook namespace * Fix name of checkpoint in test * Add user-friendly message when scipy is not available * Address most comments by @patrickvonplaten * Remove return_intermediate_layers attribute of DetrConfig and simplify Joiner * Better initialization * Scipy is necessary to get sklearn metrics * Rename TimmBackbone to DetrTimmConvEncoder and rename DetrJoiner to DetrConvModel * Make style * Improve docs and add 2 community notebooks Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr> |
||
![]() |
88ca6a231d
|
VisualBERT (#10534)
* Init VisualBERT * Add cookie-cutter, Config, and Embeddings * Add preliminary Model * Add Bert analogous classes * Add basic code for NLVR, VQA, Flickr * Update Init * Fix VisualBert Downstream Models * Rename classifier to cls * Comment position_ids buffer * Remove sentence image predictor output * Update output dicts * Remove unnecessary files * Fix Auto Modeling * Fix transformers init * Add conversion script * Add conversion script * Fix docs * Update visualbert modelling * Update configuration * Style fixes * Add model and integration tests * Add all tests * Update model mapping * Add simple detector from original repository * Update docs and configs * Fix style * Fix style * Update docs * Fix style * Fix import issues in style * Fix style * Add changes from review * Fix style * Fix style * Update docs * Fix style * Fix style * Update docs/source/model_doc/visual_bert.rst Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/visual_bert/modeling_visual_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update tests/test_modeling_visual_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/visual_bert/modeling_visual_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/visual_bert/modeling_visual_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/visual_bert/modeling_visual_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Add changes from review * Remove convert run script * Add changes from review * Update src/transformers/models/visual_bert/modeling_visual_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/visual_bert/modeling_visual_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/visual_bert/modeling_visual_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/visual_bert/modeling_visual_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/visual_bert/modeling_visual_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Add changes from review * Add changes from review * Add visual embedding example in docs * Fix "copied from" comments * Add changes from review * Fix error, style, checkpoints * Update docs * Fix integration tests * Fix style Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> |
||
![]() |
7ec596ecda
|
[DeepSpeed] decouple DeepSpeedConfigHF from Trainer (#11966)
* decouple DeepSpeedConfigHF from Trainer * add LoggingLevel ctx manager; add new test * cleanup * add docs * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * implemented suggested renames * formatter workaround Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> |
||
![]() |
47a98fc4cb
|
ByT5 model (#11971)
* allow tf to use uneven num of layers * add tokenizer * finish docs * finish docs * Apply suggestions from code review * include in index * finish * Update docs/source/model_doc/byt5.rst Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * apply sylvais suggestions * make style Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> |
||
![]() |
ad25fd62bd
|
Add FlaxCLIP (#11883)
* add flax CLIP * default input_shape * add tests * fix test * fix name * fix docs * fix shapes * attend at least 1 token * flax conv to torch conv * return floats * fix equivalence tests * fix import * return attention_weights and update tests * fix dosctrings * address patricks comments * input_shape arg * add tests for get_image_features and get_text_features methods * fix tests |
||
![]() |
206f06f2dd
|
Add new model RoFormer (use rotary position embedding ) (#11684)
* add roformer * Update docs/source/model_doc/roformer.rst Co-authored-by: Suraj Patil <surajp815@gmail.com> * Update docs/source/model_doc/roformer.rst Co-authored-by: Suraj Patil <surajp815@gmail.com> * update * add TFRoFormerSinusoidalPositionalEmbedding and fix TFMarianSinusoidalPositionalEmbedding * update docs * make style and make quality * roback * unchanged * rm copies from , this is a error in TFMarianSinusoidalPositionalEmbedding * update Copyright year * move # Add modeling imports here to the correct position * max_position_embeddings can be set to 1536 * # Copied from transformers.models.bert.modeling_bert.BertOutput with Bert->RoFormer * # Copied from transformers.models.bert.modeling_bert.BertLayer.__init__ with Bert->RoFormer * update tokenization_roformer * make style * add staticmethod apply_rotary_position_embeddings * add TF staticmethod apply_rotary_position_embeddings * update torch apply_rotary_position_embeddings * fix tf apply_rotary_position_embeddings error * make style * add pytorch RoFormerSelfAttentionRotaryPositionEmbeddingTest * add TF rotary_position_embeddings test * update test_modeling_rofomer * Update docs/source/model_doc/roformer.rst Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/__init__.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/__init__.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/__init__.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/__init__.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/roformer/convert_roformer_original_tf_checkpoint_to_pytorch.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/roformer/modeling_roformer.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/roformer/modeling_roformer.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/roformer/modeling_tf_roformer.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * refact roformer tokenizer * add RoFormerTokenizerFast * add RoFormerTokenizationTest * add require_jieba * update Copyright * update tokenizer & add copy from * add option rotary_value * use rust jieba * use rjieba * use rust jieba * fix test_alignement_methods * slice normalized_string is too slow * add config.embedding_size when embedding_size!=hidden_size * fix pickle tokenizer * Update docs/source/model_doc/roformer.rst Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * make style and make quality Co-authored-by: Suraj Patil <surajp815@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> |
||
![]() |
ca33278fdb
|
FlaxGPT2 (#11556)
* flax gpt2 * combine masks * handle shared embeds * add causal LM sample * style * add tests * style * fix imports, docs, quality * don't use cache * add cache * add cache 1st version * make use cache work * start adding test for generation * finish generation loop compilation * rewrite test * finish * update * update * apply sylvains suggestions * update * refactor * fix typo Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> |
||
![]() |
cebb96f53a
|
Add more subsections to main doc (#11758)
* add headers to main doc * Apply suggestions from code review * update * upload |
||
![]() |
0fc56df5fb
|
Add visual + link to Premium Support webpage (#11740)
* Update README.md * Update index.rst |
||
![]() |
f063c56d94
|
Fix clip docs (#11694)
* fix doc url * fix example |
||
![]() |
8719afa1ad
|
CLIP (#11445)
* begin second draft * fix import, style * add loss * fix embeds, logits_scale, and projection * fix imports * add conversion script * add feature_extractor and processor * style * add tests for tokenizer, extractor and processor * add vision model tests * add weight init * add more tests * fix save_load test * model output, dosstrings, causal mask * config doc * add clip model tests * return dict * bigin integration test * add integration tests * fix-copies * fix init * Clip => CLIP * fix module name * docs * fix doc * output_dim => projection_dim * fix checkpoint names * remoe fast tokenizer file * fix conversion script * fix tests, quality * put causal mask on device * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * fix attribute test * style * address sylvains comments * style * fix docstrings * add qucik_gelu in activations, docstrings * clean-up attention test * fix act fun * fix config * fix torchscript tests * even batch_size * remove comment * fix ouput tu_tuple * fix save load tests * fix add tokens test * add fast tokenizer * update copyright * new processor API * fix docs * docstrings * docs * fix doc * fix doc * fix tokenizer * fix import in doc example * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * check types of config * valhalla => openai * load image using url * fix test * typo Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> |
||
![]() |
f7f872955d
|
Big Bird Fast Tokenizer implementation (#11075)
* Added Big Bird Fast Tokenizer initial file * style fixes * flake fixes * Added big bird fast tokenizer to init files * Added big bird fast to Auto tokenization * fix styles * minor quality fixes * Added initial test code * Fix SpmConverter when precompiled_charsmap doesn't exist * fixed post processor * minor style fix * minor fix input names * Actually fix identity normalization * style * Added token type ids to fast tokenizer * style * flake fix * fix copies Co-authored-by: Anthony MOI <m.anthony.moi@gmail.com> |
||
![]() |
dc3f6758cf
|
Add BigBirdPegasus (#10991)
* init bigbird pegasus * add debugging nb ; update config * init conversion * update conversion script * complete conversion script * init forward() * complete forward() * add tokenizer * add some slow tests * commit current * fix copies * add docs * add conversion script for bigbird-roberta-summarization * remove TODO * small fixups * correct tokenizer * add bigbird core for now * fix config * fix more * revert pegasus-tokenizer back * make style * everything working for pubmed; yayygit status * complete tests finally * remove bigbird pegasus tok * correct tokenizer * correct tests * add tokenizer files * finish make style * fix test * update * make style * fix tok utils base file * make fix-copies * clean a bit * small update * fix some suggestions * add to readme * fix a bit, clean tests * fix more tests * Update src/transformers/__init__.py * Update src/transformers/__init__.py * make fix-copies * complete attn switching, auto-padding left * make style * fix auto-padding test * make style * fix batched attention tests * put tolerance at 1e-1 for stand-alone decoder test * fix docs * fix tests * correct slow tokenizer conversion * Apply suggestions from code review Co-authored-by: Suraj Patil <surajp815@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * complete remaining suggestions * fix test Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Suraj Patil <surajp815@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> |
||
![]() |
0afe4a90f9
|
[Flax] Add Electra models (#11426)
* add electra model to flax * Remove Electra Next Sentence Prediction model added by mistake * fix parameter sharing and loosen equality threshold * fix styling issues * add mistaken removen imports * fix electra table * Add FlaxElectra to automodels and fixe docs * fix issues pointed out the PR * fix flax electra to comply with latest changes * remove stale class * add copied from Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> |
||
![]() |
f3cf8ae7b3
|
Add LUKE (#11223)
* Rebase with master * Minor bug fix in docs * Copy files from adding_luke_v2 and improve docs * change the default value of use_entity_aware_attention to True * remove word_hidden_states * fix head models * fix tests * fix the conversion script * add integration tests for the pretrained large model * improve docstring * Improve docs, make style * fix _init_weights for pytorch 1.8 * improve docs * fix tokenizer to construct entity sequence with [MASK] entity when entities=None * Make fix-copies * Make style & quality * Bug fixes * Add LukeTokenizer to init * Address most comments by @patil-suraj and @LysandreJik * rename _compute_extended_attention_mask to get_extended_attention_mask * add comments to LukeSelfAttention * fix the documentation of the tokenizer * address comments by @patil-suraj, @LysandreJik, and @sgugger * improve docs * Make style, quality and fix-copies * Improve docs * fix docs * add "entity_span_classification" task * update example code for LukeForEntitySpanClassification * improve docs * improve docs * improve the code example in luke.rst * rename the classification layer in LukeForEntityClassification from typing to classifier * add bias to the classifier in LukeForEntitySpanClassification * update docs to use fine-tuned hub models in code examples of the head models * update the example sentences * Make style & quality * Add require_torch to tokenizer tests * Add require_torch to tokenizer tests * Address comments by @sgugger and add community notebooks * Make fix-copies Co-authored-by: Ikuya Yamada <ikuya@ikuya.net> |
||
![]() |
282f3ac3ef
|
[debug utils] activation/weights underflow/overflow detector (#11274)
* sync * add activation overflow debug utility * cleanup * document detect_overflow * import torch * add deprecation warning * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * convert to rst, add note * add class * fix docs * improve the doc * rework to dump a lot more info about each frame * complete expansion * cleanup * format * cleanup * doesn't have to be transformers * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * wrap long line * style Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> |
||
![]() |
30ede8994e
|
Implement Fast Tokenization for Deberta (#11387) | ||
![]() |
2d27900b5d
|
Update min versions in README and add Flax (#11472)
* Update min versions in README and add Flax * Adapt index |
||
![]() |
63ca402380
|
[troubleshooting] add 2 points of reference to the offline mode (#11236)
* add 2 points of reference to the offline mode * link the new doc * add error message * Update src/transformers/modeling_utils.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * style * rename * Trigger CI Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> |
||
![]() |
22fa0a6004
|
Add documentation for BertJapanese (#11219)
* Start writing BERT-Japanese doc * Fix typo, Update toctree * Modify model file to use comment for document, Add examples * Clean bert_japanese by make style * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Split a big code block into two * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Add prefix >>> to all lines in code blocks * Clean bert_japanese by make fixup Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> |
||
![]() |
9f1260971f
|
Add DeiT (PyTorch) (#11056)
* First draft of deit * More improvements * Remove DeiTTokenizerFast from init * Conversion script works * Add DeiT to ViT conversion script * Add tests, add head model, add support for deit in vit conversion script * Update model checkpoint names * Update image_mean and image_std, set resample to bicubic * Improve docs * Docs improvements * Add DeiTForImageClassificationWithTeacher to init * Address comments by @sgugger * Improve feature extractors * Make fix-copies * Minor fixes * Address comments by @patil-suraj * All models uploaded * Fix tests * Remove labels argument from DeiTForImageClassificationWithTeacher * Fix-copies, style and quality * Fix tests * Fix typo * Multiple docs improvements * More docs fixes |
||
![]() |
0c6fcd3034
|
Added documentation for data collator. (#10941)
* Added documentation for data collator. * Update docs/source/data_collator.rst Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Added documentation for data collator. * Added documentation for the data collator. * Merge branch 'doc_DataCollator' of C:\Users\mahii\PycharmProjects\transformers with conflicts. * Update documentation for the data collator. * Update documentation for the data collator. Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Amna <A.A.Ahmad@student.tudelft.nl> |
||
![]() |
fb41f9f50c
|
Add a special tokenizer for CPM model (#11068)
* Add a special tokenizer for CPM model * make style * fix * Add docs * styles * cpm doc * fix ci * fix the overview * add test * make style * typo * Custom tokenizer flag * Add REAMDE.md Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr> |
||
![]() |
02ec02d6d3
|
Add nvidia megatron models (#10911)
* Add support for NVIDIA Megatron models * Add support for NVIDIA Megatron GPT2 and BERT Add the megatron_gpt2 model. That model reuses the existing GPT2 model. This commit includes a script to convert a Megatron-GPT2 checkpoint downloaded from NVIDIA GPU Cloud. See examples/megatron-models/README.md for details. Add the megatron_bert model. That model is implemented as a modification of the existing BERT model in Transformers. This commit includes a script to convert a Megatron-BERT checkpoint downloaded from NVIDIA GPU Cloud. See examples/megatron-models/README.md for details. * Update src/transformers/models/megatron_bert/configuration_megatron_bert.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Update src/transformers/models/megatron_bert/configuration_megatron_bert.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Update src/transformers/models/megatron_bert/configuration_megatron_bert.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Remove model.half in tests + add "# Copied ..." Remove the model.half() instruction which makes tests fail on the CPU. Add a comment "# Copied ..." before many classes in the model to enable automatic tracking in CI between the new Megatron classes and the original Bert ones. * Fix issues * Fix Flax/TF tests * Fix copyright * Update src/transformers/models/megatron_bert/configuration_megatron_bert.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Update src/transformers/models/megatron_bert/configuration_megatron_bert.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Update docs/source/model_doc/megatron_bert.rst Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update docs/source/model_doc/megatron_gpt2.rst Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/__init__.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Resolve most of 'sgugger' comments * Fix conversion issue + Run make fix-copies/quality/docs * Apply suggestions from code review * Causal LM & merge * Fix init * Add CausalLM to last auto class Co-authored-by: Julien Demouth <jdemouth@nvidia.com> Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr> |
||
![]() |
9f4e0c23d6
|
Documentation about loading a fast tokenizer within Transformers (#11029)
* Documentation about loading a fast tokenizer within Transformers * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * style Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> |
||
![]() |
30677dc743
|
Add Vision Transformer and ViTFeatureExtractor (#10950)
* Squash all commits into one
* Update ViTFeatureExtractor to use image_utils instead of torchvision
* Remove torchvision and add Pillow
* Small docs improvement
* Address most comments by @sgugger
* Fix tests
* Clean up conversion script
* Pooler first draft
* Fix quality
* Improve conversion script
* Make style and quality
* Make fix-copies
* Minor docs improvements
* Should use fix-copies instead of manual handling
* Revert "Should use fix-copies instead of manual handling"
This reverts commit
|
||
![]() |
860264379f
|
GPT Neo (#10848)
* lets begin * boom boom * fix out proj in attn * fix attention * fix local attention * add tokenizer * fix imports * autotokenizer * fix checkpoint name * cleanup * more clean-up * more cleanup * output attentions * fix attn mask creation * fix imports * config doc * add tests * add slow tests * quality * add conversion script * copyright * typo * another bites the dust * fix attention tests * doc * add embed init in convert function * fix copies * remove tokenizer * enable caching * address review comments * improve config and create attn layer list internally * more consistent naming * init hf config from mesh-tf config json file * remove neo tokenizer from doc * handle attention_mask in local attn layer * attn_layers => attention_layers * add tokenizer_class in config * fix docstring * raise if len of attention_layers is not same as num_layers * remove tokenizer_class from config * more consistent naming * fix doc * fix checkpoint names * fp16 compat * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> |
||
![]() |
6dfd027279
|
BigBird (#10183)
* init bigbird * model.__init__ working, conversion script ready, config updated * add conversion script * BigBirdEmbeddings working :) * slightly update conversion script * BigBirdAttention working :) ; some bug in layer.output.dense * add debugger-notebook * forward() working for BigBirdModel :) ; replaced gelu with gelu_fast * tf code adapted to torch till rand_attn in bigbird_block_sparse_attention ; till now everything working :) * BigBirdModel working in block-sparse attention mode :) * add BigBirdForPreTraining * small fix * add tokenizer for BigBirdModel * fix config & hence modeling * fix base prefix * init testing * init tokenizer test * pos_embed must be absolute, attn_type=original_full when add_cross_attn=True , nsp loss is optional in BigBirdForPreTraining, add assert statements * remove position_embedding_type arg * complete normal tests * add comments to block sparse attention * add attn_probs for sliding & global tokens * create fn for block sparse attn mask creation * add special tests * restore pos embed arg * minor fix * attn probs update * make big bird fully gpu friendly * fix tests * remove pruning * correct tokenzier & minor fixes * update conversion script , remove norm_type * tokenizer-inference test add * remove extra comments * add docs * save intermediate * finish trivia_qa conversion * small update to forward * correct qa and layer * better error message * BigBird QA ready * fix rebased * add triva-qa debugger notebook * qa setup * fixed till embeddings * some issue in q/k/v_layer * fix bug in conversion-script * fixed till self-attn * qa fixed except layer norm * add qa end2end test * fix gradient ckpting ; other qa test * speed-up big bird a bit * hub_id=google * clean up * make quality * speed up einsum with bmm * finish perf improvements for big bird * remove wav2vec2 tok * fix tokenizer * include docs * correct docs * add helper to auto pad block size * make style * remove fast tokenizer for now * fix some * add pad test * finish * fix some bugs * fix another bug * fix buffer tokens * fix comment and merge from master * add comments * make style * commit some suggestions Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Fix typos * fix some more suggestions * add another patch Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * fix copies * another path Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * update * update nit suggestions * make style Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Lysandre Debut <lysandre@huggingface.co> |
||
![]() |
4684bfc757
|
Layout lm tf 2 (#10636)
* Added embeddings layer * Added layoutlm layers, main model, maskedlm and token classification classes * Added model classes to tf auto models * Added model to PT to TF conversion script * Added model to doc README * Added tests * Removed unused imports * Added layoutlm model, test, and doc for sequence classification, and fix imports in __init__.py * Made tests pass! * Fixed typos in imports and docs * Fixed a typo in embeddings layer * Removed imports * Fixed formatting issues, imports, tests * Added layoutlm layers, main model, maskedlm and token classification classes * Added model classes to tf auto models * Added model to PT to TF conversion script * Removed unused imports * Added layoutlm model, test, and doc for sequence classification, and fix imports in __init__.py * Made tests pass! * Fixed typos in imports and docs * Removed imports * Fixed small formatting issues * Removed duplicates import from main __init__.py * Chnaged deafult arg to true for adding pooling layer to tf layoutlm * Fixed formatting issues * Style * Added copied from to classes copied from bert * Fixed doc strings examples to work with layoutlm inputs * Removed PyTorch reference in doc strings example * Added integration tests * Cleaned up initialization file * Updated model checkpoint identifiers * Fixed imports Co-authored-by: Amir Tahmasbi <amir@ehsai.ca> Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr> |
||
![]() |
77ffd5edd5
|
Amazon SageMaker Documentation (#10867)
* added finished documentation * changed version from 1.6 to 1.6.0 for distributed * updated versions * updated urls |
||
![]() |
c988db5af2 | Release v4.4.0 | ||
![]() |
602d63f05c
|
[XLSR-Wav2Vec2] Add multi-lingual Wav2Vec2 models (#10648)
* add conversion script * add wav2vec2 xslr models * finish * Update docs/source/model_doc/xlsr_wav2vec2.rst Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> |
||
![]() |
d26b37e744
|
Speech2TextTransformer (#10175)
* s2t * fix config * conversion script * fix import * add tokenizer * fix tok init * fix tokenizer * first version working * fix embeds * fix lm head * remove extra heads * fix convert script * handle encoder attn mask * style * better enc attn mask * override _prepare_attention_mask_for_generation * handle attn_maks in encoder and decoder * input_ids => input_features * enable use_cache * remove old code * expand embeddings if needed * remove logits bias * masked_lm_loss => loss * hack tokenizer to support feature processing * fix model_input_names * style * fix error message * doc * remove inputs_embeds * remove input_embeds * remove unnecessary docstring * quality * SpeechToText => Speech2Text * style * remove shared_embeds * subsample => conv * remove Speech2TextTransformerDecoderWrapper * update output_lengths formula * fix table * remove max_position_embeddings * update conversion scripts * add possibility to do upper case for now * add FeatureExtractor and Processor * add tests for extractor * require_torch_audio => require_torchaudio * add processor test * update import * remove classification head * attention mask is now 1D * update docstrings * attention mask should be of type long * handle attention mask from generate * alwyas return attention_mask * fix test * style * doc * Speech2TextTransformer => Speech2Text * Speech2TextTransformerConfig => Speech2TextConfig * remove dummy_inputs * nit * style * multilinguial tok * fix tokenizer * add tgt_lang setter * save lang_codes * fix tokenizer * add forced_bos_token_id to tokenizer * apply review suggestions * add torchaudio to extra deps * add speech deps to CI * fix dep * add libsndfile to ci * libsndfile1 * add speech to extras all * libsndfile1 -> libsndfile1 * libsndfile * libsndfile1-dev * apt update * add sudo to install * update deps table * install libsndfile1-dev on CI * tuple to list * init conv layer * add model tests * quality * add integration tests * skip_special_tokens * add speech_to_text_transformer in toctree * fix tokenizer * fix fp16 tests * add tokenizer tests * fix copyright * input_values => input_features * doc * add model in readme * doc * change checkpoint names * fix copyright * fix code example * add max_model_input_sizes in tokenizer * fix integration tests * add do_lower_case to tokenizer * remove clamp trick * fix "Add modeling imports here" * fix copyrights * fix tests * SpeechToTextTransformer => SpeechToText * fix naming * fix table formatting * fix typo * style * fix typos * remove speech dep from extras[testing] * fix copies * rename doc file, * put imports under is_torch_available * run feat extract tests when torch is available * dummy objects for processor and extractor * fix imports in tests * fix import in modeling test * fxi imports * fix torch import * fix imports again * fix positional embeddings * fix typo in import * adapt new extractor refactor * style * fix torchscript test * doc * doc * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * fix docs, copied from, style * fix docstring * handle imports * remove speech from all extra deps * remove s2t from seq2seq lm mapping * better names * skip training tests * add install instructions * List => Tuple * doc * fix conversion script * fix urls * add instruction for libsndfile * fix fp16 test Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> |
||
![]() |
696e8a4365
|
Add TFRag (#9002)
* Create modeling_tf_dpr.py * Add TFDPR * Add back TFPegasus, TFMarian, TFMBart, TFBlenderBot last commit accidentally deleted these 4 lines, so I recover them back * Add TFDPR * Add TFDPR * clean up some comments, add TF input-style doc string * Add TFDPR * Make return_dict=False as default * Fix return_dict bug (in .from_pretrained) * Add get_input_embeddings() * Create test_modeling_tf_dpr.py The current version is already passed all 27 tests! Please see the test run at : https://colab.research.google.com/drive/1czS_m9zy5k-iSJbzA_DP1k1xAAC_sdkf?usp=sharing * fix quality * delete init weights * run fix copies * fix repo consis * del config_class, load_tf_weights They shoud be 'pytorch only' * add config_class back after removing it, test failed ... so totally only removing "use_tf_weights = None" on Lysandre suggestion * newline after .. note:: * import tf, np (Necessary for ModelIntegrationTest) * slow_test from_pretrained with from_pt=True At the moment we don't have TF weights (since we don't have official official TF model) Previously, I did not run slow test, so I missed this bug * Add simple TFDPRModelIntegrationTest Note that this is just a test that TF and Pytorch gives approx. the same output. However, I could not test with the official DPR repo's output yet * upload correct tf model * remove position_ids as missing keys * create modeling_tf_rag * add tests for tf * add tf tests * revert wrong pt commit * further refactor * further refactor * refactor * Update modeling_tf_rag.py - input_processing - fix prepare_input_for_generation (mostly fix generate bug) - bring back from_pretrained hack in order to test generate * delete colab pieces of code * Show case of greedy "generate" Temporarily change from beam_search test to greedy_search test to show case that TF and PT do get equivalent output. * cosmetic update * correct typos * update * push some progress * make easy check * fix rag save from pretrained * Update src/transformers/modeling_tf_utils.py * remove commented out lines * delete unnecessary lines * add simple test case for nq_checkpoint Add nq_checkpoint test to show that current version without hack still fails * temporarily put ugly hack back again * Add TFRagSequenceForGeneration!! * __init__.py , import TFRagSequenceForGeneration * Add TFRagSequence tests! * rag init.py - add TFRagSequenceForGeneration * fix from_pretrained * fix prepare_inputs_for_generation * Beam search for RagToken! * minor clean up * add tf.cast in TFRagModel * More tf.cast * Add all remaining tests (still have issues) * delete all T5 related * make style * fix load weight prefix * fix bart * fix return_dict for tf_rag make all tests pass .. Hooray * fix some tests * fix code quality * fix qualtiy check * finish tests tf rag * add tf rag to docs * remove TFT5 from docstring Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * remove TFT5 from docstring Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Delete outdated comments Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * improve doc strings * add generative model classes * fix adjust token logic * refactor generate for TFRag * using shape_list, not _get_shape Co-authored-by: Julien Plu <plu.julien@gmail.com> * axis=[1]->axis=1 * delete NEED_HELP comment * improve readability Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * improve readability Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * improve readability Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Indicating model is in a developing state in docstrings As suggested by Julien * small last changes * apply sylvains suggestions * finish tf rag Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: patrickvonplaten <patrick@huggingface.co> Co-authored-by: Julien Plu <plu.julien@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> |