Commit Graph

1702 Commits

Author SHA1 Message Date
Yih-Dar
2199382dfd
Use random_attention_mask for TF tests (#16517)
* use random_attention_mask for TF tests

* Fix for TFCLIP test (for now).

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-04-01 16:53:07 +02:00
Jim Rohrer
9de70f213e
Add ONNX export for BeiT (#16498)
* Add beit onnx conversion support

* Updated docs

* Added cross reference to ViT ONNX config
2022-04-01 10:52:42 +02:00
Francesco Saverio Zuppichini
c4deb7b3ae
Feature Extractor accepts segmentation_maps (#15964)
* feature extractor accepts

* resolved conversations

* added examples in test for ADE20K

* num_classes -> num_labels

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* resolving conversations

* resolving conversations

* removed ADE

* CI

* minor changes in conversion script

* reduce_labels in feature extractor

* minor changes

* correct preprocess for instace segmentation maps

* minor changes

* minor changes

* CI

* debugging

* better padding

* going to update labels inside the model

* going to update labels inside the model

* minor changes

* tests

* removed changes in feature_extractor_utils

* conversation

* conversation

* example in feature extractor

* more docstring in modeling

* test

* make style

* doc

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-03-30 18:46:51 +02:00
Yih-Dar
2b483230a1
Raise diff tolerance value for TFViTMAEModelTest (#16483)
* Raise diff tolerance value

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-03-29 22:12:27 +02:00
Sander Land
d7c8ce57d4
Avoid accessing .dataset of a DataLoader in Trainer (#16451)
* Avoid accessing .dataset of a dataloader

* style

* fix

* cleaning up, reverting some misunderstandings

* black

* add train_dataset argument to get_train_dataloader, and fix other instances of length checks

* flake8

* address comments

* fix bug

* cleanup

* add test

* Update tests/trainer/test_trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* under torch

* merge

* stylistic suggestion

Co-authored-by: Sander Land <sander@chatdesk.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-03-29 15:00:18 -04:00
Sayak Paul
5b40a37bc4
Add TF ViT MAE (#16255)
* ported TFViTMAEIntermediate and TFViTMAEOutput.

* added TFViTMAEModel and TFViTMAEDecoder.

* feat: added a noise argument in the implementation for reproducibility.

* feat: vit mae models with an additional noise argument for reproducibility.

Co-authored-by: ariG23498 <aritra.born2fly@gmail.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-03-29 18:24:15 +01:00
Joao Gante
7a9ef8181c
TF: properly handle kwargs in encoder_decoder architectures (#16465)
* properly handle kwargs in encoder_decoder architectures

* make fixup
2022-03-29 18:17:47 +01:00
Yih-Dar
86cff21cf6
Fix some TF GPT-J CI testings (#16454)
* Fix for test_mixed_precision

* Fix test_saved_model_creation by using shape_list instead of shape

* skit test_model_from_pretrained on GPU for now to avoid GPU OOM

* skip test_gptj_sample_max_time for now

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-03-29 18:04:20 +02:00
Yih-Dar
aebca696af
Fix missing output_attentions in PT/Flax equivalence test (#16271)
* fix - set output_attentions to True

* Update tests/test_modeling_flax_common.py

* update for has_attentions

* overwrite check_outputs in FlaxBigBirdModelTest

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
2022-03-29 17:51:48 +02:00
NielsRogge
979b039c89
Add DPT (#15991)
* First draft

* More improvements

* Add fusion blocks

* Make conversion script work for dpt_large

* Make conversion script work

* Improve implementation

* Improve conversion script

* Add DPTForSemanticSegmentation

* Make conversion work for semantic segmentation

* Add tests

* Remove print statements

* First draft

* Redesign neck

* Improve tests

* Improve implementation some more

* Make neck output list of tensors

* Improve neck and feature extractor

* Fix integration tests

* Make more tests pass

* Make all tests pass

* Add missing config archive map

* Add in_index attribute to make heads accept list of tensors

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply some more suggestions

* Add copied from statements

* Remove assert

* Apply suggestions from code review

* Apply suggestions from code review

* Remove DPTInterpolate in favor of nn.Upsample

* Add comments

* Apply suggestions from code review

* Apply suggestions from code review

* Add proposed design

* Update design

* Add DPTReassembleLayer

* Add DPTFeatureFusionStage

* Apply more suggestions from code review

* Apply suggestions from code review

* Apply suggestions from code review

* Fix rebase

* Update in_index and out_indices

* Fix conversion script

* Fix code quality

* Add model to toctree and use DepthEstimatorOutput

* Fix rebase

* Fix code examples

* Improve code

* Fix copied from statements

* Apply suggestions from code review

* Remove compute_loss method

* Apply suggestions from code review

* Fix documentation tests file

* Remove test.py file

* Improve doc example

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Niels Rogge <nielsrogge@nielss-mbp.home>
2022-03-28 16:28:10 +02:00
Sanchit Gandhi
7ca4633555
[FlaxSpeechEncoderDecoderModel] Ensure Input and Output Word Embeddings Are **Not** Tied (#16444)
* [FlaxSpeechEncoderDecoderModel] Ensure Input and Output Word Embeddings Are **Not** Tied

* rebase
2022-03-28 14:14:10 +02:00
Jaesun Park
e0ac72b7bd
Fix PerceiverMLP and test (#16405)
Co-authored-by: Jaesun Park <jaesun.park1@navercorp.com>
2022-03-28 14:06:48 +02:00
Sanchit Gandhi
925fc57b70
[Flax] Improve Robustness of Back-Prop Tests (#16418)
* [Flax] Improve Robustness of Back-Prop Tests

* check equality of logits/outputs

* make fixup
2022-03-28 11:56:54 +02:00
Daniel Stancl
ed2ee373d0
Add TF implementation of GPT-J (#15623)
* Initial commit

* Add TFGPTJModel

* Fix a forward pass

* Add TFGPTJCausalLM

* Add TFGPTJForSequenceClassification

* Add TFGPTJForQuestionAnswering

* Fix docs

* Deal with TF dynamic shapes

* Add Loss parents to models

* Adjust split and merge heads to handle 4 and 5-dim tensors

* Update outputs for @tooslow tests
2022-03-25 19:27:19 +00:00
Sanchit Gandhi
e231c72906
[FlaxSpeechEncoderDecoder] Fix feature extractor gradient test (#16407) 2022-03-25 17:46:53 +01:00
lewtun
a97f3150c4
Add ONNX support for Blenderbot and BlenderbotSmall (#15875)
* Add ONNX support for Blenderbot

* Add BlenderbotSmall ONNX configuration

* Update serialization table
2022-03-25 17:04:43 +01:00
Sylvain Gugger
b473617d63
Checkpoint sharding (#16343)
* Sharded checkpoint support

* Handle distant sharded checkpoints

* Add tests

* TODO is done

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Fix docstring

* Add example and format

* Address review comments

* More review comments

* End of merge

* Revert unintentional change

* VsCode what did you do?

* Style

* Changes

* Address final comments

* Quality

* Moar tests

* Move import beneath is_pt_available

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2022-03-25 11:59:25 -04:00
Sylvain Gugger
cae394c8fa Adapt import to new structure 2022-03-24 14:40:05 -04:00
Yih-Dar
f571dc20ac
Update PT Flax equivalence tests in PT test file (#16280)
* update PT/Flax equivalence tests on PT side

* overwrite check_outputs in BigBirdModelTest

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-03-24 14:45:30 +01:00
Yih-Dar
2a27c80063
Fix BigBirdModelTester (#16310)
* fix

* update the expected value in test_fast_integration

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-03-24 13:43:52 +01:00
Edward Beeching
aff9bc405a
Decision transformer gym (#15845)
* Created the Decision Transformer Modle

* updating tests, copy to other machine

* Added last hidden size to Decision Transformer modelling outputs

* Removed copy of original DT file

* made a temporary change to gpt2 to have it conform with the Decision Transformer version

* Updated tests

* Ignoring a file used to test the DT model

* added comments to config file

* added comments and argument descriptions to decision transformer file

* Updated doc

* Ran "make style"

* Remove old model imports

* Removed unused imports, cleaned up init file

* Update docs/source/model_doc/decision_transformer.mdx

added my username

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Reverted changes made to gpt2

* Removed datasets submodule

* Update the modeling outputs to include gpt2 attentions, hidden states and last hidden states

* Added support for return of hidden states, attentions and return dict of gpt2 model.

* Updated tests to include many of the ModelTesterMixin tests. 

The following tests are skipped: test_generate_without_input_ids, test_pruning, test_resize_embeddings, test_head_masking, test_attention_outputs, test_hidden_states_output, test_inputs_embeds, test_model_common_attributes

* Added missing line to the end of gpt2 file

* Added an integration test for the Decision Transformer

Test performs and autoregressive evaluation for two time steps

* Set done and info to _ to fix failing test

* Updated integration test to be deterministic and check expected outputs

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Removed unnecessary config options

* Cleaned up commented code and old comments.

* Cleaned up commented code.

* Changed DecisionTransformer to Decision Transformer

* Added Decision Transformer to the main README file

* Added copy of GTP2 called DecisionTranformerGPT2Model

* isorted imports

* isorted imports

* Added model to non-English README files

* Ran make fix-copies and corrected some cases.

* Updated index file to include Decision Transformer

* Added gpt2 model as copy inside the Decision Transformer model file

* Added the unit test file to the list of TEST_FILES_WITH_NO_COMMON_TESTS

* Deleted redundant checkpoint files (I don't know how these got committed)

* Removed testing files. (These should have never been committed)

* Removed accidentally committed files

* Moved the Decision Transformer test to its own directory

* Add type hints for Pegasus (#16324)

* Funnel type hints (#16323)

* add pt funnel type hints

* add tf funnel type hints

* Add type hints for ProphetNet PyTorch (#16272)

* [GLPN] Improve docs (#16331)

* Add link to notebook

* Add link

* Fix bug

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>

* Added type hints for Pytorch Marian calls (#16200)

* Added type hinting for forward functions in pytorch marian

* typo correction

* Removed type hints on functions from BART per Suraj Patil request

* fix import pb

* fix typo

* corrected tuple call

* ran black

* after fix-copies
Some optional tags on primitives were removed, past_key_values in MarianForCausalLM changed from Tuple of Tuple to List

* Fixing copies to roformer and pegasus

Co-authored-by: Clementine Fourrier <cfourrie@inria.fr>
Co-authored-by: matt <rocketknight1@gmail.com>

* Moved DecisionTransformOutput to modeling_decision_transformer

* Moved the example usage to research project and cleaned comments

* Made tests ignore the copy of gpt2 in Decision Transformer

* Added module output to modelling decision transformer

* removed copied gpt2 model from list of transformers models

* Updated tests and created __init__ file for new test location

* Update README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/decision_transformer/configuration_decision_transformer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Removed unneeded summary type from config file

* Fixed copies

* Updated pretrained config map to refer to hopper-medium checkpoint

* done (#16340)

* Added Decision transformer to model docs

* Update src/transformers/models/decision_transformer/modeling_decision_transformer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/decision_transformer/modeling_decision_transformer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/decision_transformer/configuration_decision_transformer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Add type annotations for Rembert/Splinter and copies (#16338)

* undo black autoformat

* minor fix to rembert forward with default

* make fix-copies, make quality

* Adding types to template model

* Removing List from the template types

* Remove `Optional` from a couple of types that don't accept `None`

Co-authored-by: matt <rocketknight1@gmail.com>

* [Bug template] Shift responsibilities for long-range (#16344)

* Fix code repetition in serialization guide (#16346)

* Adopt framework-specific blocks for content (#16342)

*  refactor code samples with framework-specific blocks

*  update training.mdx

* 🖍 apply feedback

* Updates the default branch from master to main (#16326)

* Updates the default branch from master to main

* Links from `master` to `main`

* Typo

* Update examples/flax/README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Updated model with custom docstring example

* Created the Decision Transformer Modle

* updating tests, copy to other machine

* Added last hidden size to Decision Transformer modelling outputs

* Removed copy of original DT file

* made a temporary change to gpt2 to have it conform with the Decision Transformer version

* Updated tests

* Ignoring a file used to test the DT model

* added comments to config file

* added comments and argument descriptions to decision transformer file

* Updated doc

* Ran "make style"

* Remove old model imports

* Removed unused imports, cleaned up init file

* Update docs/source/model_doc/decision_transformer.mdx

added my username

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Reverted changes made to gpt2

* Removed datasets submodule

* Update the modeling outputs to include gpt2 attentions, hidden states and last hidden states

* Added support for return of hidden states, attentions and return dict of gpt2 model.

* Updated tests to include many of the ModelTesterMixin tests. 

The following tests are skipped: test_generate_without_input_ids, test_pruning, test_resize_embeddings, test_head_masking, test_attention_outputs, test_hidden_states_output, test_inputs_embeds, test_model_common_attributes

* Added missing line to the end of gpt2 file

* Added an integration test for the Decision Transformer

Test performs and autoregressive evaluation for two time steps

* Set done and info to _ to fix failing test

* Updated integration test to be deterministic and check expected outputs

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Removed unnecessary config options

* Cleaned up commented code and old comments.

* Cleaned up commented code.

* Changed DecisionTransformer to Decision Transformer

* Added Decision Transformer to the main README file

* Added copy of GTP2 called DecisionTranformerGPT2Model

* isorted imports

* isorted imports

* Added model to non-English README files

* Ran make fix-copies and corrected some cases.

* Updated index file to include Decision Transformer

* Added gpt2 model as copy inside the Decision Transformer model file

* Added the unit test file to the list of TEST_FILES_WITH_NO_COMMON_TESTS

* Deleted redundant checkpoint files (I don't know how these got committed)

* Removed testing files. (These should have never been committed)

* Removed accidentally committed files

* Moved the Decision Transformer test to its own directory

* Moved DecisionTransformOutput to modeling_decision_transformer

* Moved the example usage to research project and cleaned comments

* Made tests ignore the copy of gpt2 in Decision Transformer

* Added module output to modelling decision transformer

* removed copied gpt2 model from list of transformers models

* Updated tests and created __init__ file for new test location

* Update README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/decision_transformer/configuration_decision_transformer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Removed unneeded summary type from config file

* Fixed copies

* Updated pretrained config map to refer to hopper-medium checkpoint

* Added Decision transformer to model docs

* Update src/transformers/models/decision_transformer/modeling_decision_transformer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/decision_transformer/modeling_decision_transformer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/decision_transformer/configuration_decision_transformer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Updated model with custom docstring example

* Updated copies, config auto, and readme files.

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Dan Tegzes <48134725+Tegzes@users.noreply.github.com>
Co-authored-by: Adam Montgomerie <adam@avanssion.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
Co-authored-by: Clémentine Fourrier <22726840+clefourrier@users.noreply.github.com>
Co-authored-by: Clementine Fourrier <cfourrie@inria.fr>
Co-authored-by: matt <rocketknight1@gmail.com>
Co-authored-by: Francesco Saverio Zuppichini <francesco.zuppichini@gmail.com>
Co-authored-by: Jacob Dineen <54680234+jacobdineen@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2022-03-23 16:18:43 -04:00
Sylvain Gugger
c595b6e6a9
Make Transformers use cache files when hf.co is down (#16362)
* Make Transformers use cache files when hf.co is down

* Fix tests

* Was there a random circleCI failure?

* Isolate patches

* Style

* Comment out the failure since it doesn't fail anymore

* Better comment
2022-03-23 15:56:49 -04:00
Joao Gante
9e8c37dc82
TF - Fix interchangeable past/past_key_values and revert output variable name in GPT2 (#16332)
* revert tf gpt2

* add test for unpack_inputs and fix test case

* add changes to vision encoder decoder
2022-03-23 18:41:18 +00:00
Sylvain Gugger
4975002df5
Reorganize file utils (#16264)
* Split file_utils in several submodules

* Fixes

* Add back more objects

* More fixes

* Who exactly decided to import that from there?

* Second suggestion to code with code review

* Revert wront move

* Fix imports

* Adapt all imports

* Adapt all imports everywhere

* Revert this import, will fix in a separate commit
2022-03-23 10:26:33 -04:00
Lysandre Debut
eca77f4719
Updates the default branch from master to main (#16326)
* Updates the default branch from master to main

* Links from `master` to `main`

* Typo

* Update examples/flax/README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-03-23 03:46:59 -04:00
NielsRogge
0c55d47cde
Add GLPN (#16199)
* First draft

* Fix logits calculation

* Improve tests

* Add copied from statements

* Fix base_model_prefix

* Improve implementation, upload new models

* Update design

* Fix integration test

* Add model to README and toctree

* Add document image

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Add decoder_hidden_size attribute

* Update design of decoder

* Add DepthEstimatorOutput class

* Rename in_index to head_in_index and add feature extractor tests

* Apply suggestions from code review

* Apply suggestions from code review

* Update pretrained model name and add to doc tests

* Remove test.py script

* Update copied from statements and clean up

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-03-22 08:51:13 +01:00
Yih-Dar
f466936476
Add has_attentions to TFModelTesterMixin as done on PyTorch side (#16259)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-03-19 11:44:17 +01:00
Yih-Dar
75c666b4a8
Aggressive PT/TF equivalence test on PT side (#16250)
* Aggressive PT/TF equivalence test on PT side

* Ugly fix for `TFTapasForQuestionAnswering`

* apply review suggestions

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-03-18 18:51:24 +01:00
Yih-Dar
d481b6414d
Make Flax pt-flax equivalence test more aggressive (#15841)
* Make test_equivalence_pt_to_flax more aggressive

* Make test_equivalence_flax_to_pt more aggressive

* don't use to_tuple

* clean-up

* fix missing test cases + testing on GPU

* fix conversion

* fix `ValueError: assignment destination is read-only`

* Add type checking

* commit to revert later

* Fix

* fix

* fix device

* better naming

* clean-up

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-03-18 18:15:36 +01:00
Suraj Patil
b25b92ac4f
update jax version and re-enable some tests (#16254) 2022-03-18 16:45:39 +01:00
Nicolas Patry
ecb4662d17
Attention mask is important in the case of batching... (#16222)
* Attention mask is important in the case of batching...

* Improve the fix.

* Making the sentence different enough that they exhibit different
predictions.
2022-03-18 10:02:12 +01:00
NielsRogge
ec4e421b7d
Update expected slices for pillow > 9 (#16117)
* Update expected slices for pillow > 9

* Add expected slices depending on pillow version

* Add different slices depending on pillow version for other models

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-03-18 09:46:45 +01:00
Suraj Patil
632ff3c39e
[FlaxSpeechEncoderDecoderModel] Skip from_encoder_decoder_pretrained (#16236)
* skip the test

* fix

* fix skip
2022-03-17 20:05:14 +01:00
罗崚骁(LUO Lingxiao)
81643edda5
Support PEP 563 for HfArgumentParser (#15795)
* Support PEP 563 for HfArgumentParser

* Fix issues for Python 3.6

* Add test for string literal annotation for HfArgumentParser

* Remove wrong comment

* Fix typo

* Improve code readability

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Use `isinstance` to compare types to pass quality check

* Fix style

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-03-17 13:51:37 -04:00
Lysandre Debut
5a6b3ccd28
Skip equivalence test for TransfoXL (#16224)
* Skip test for TransfoXL

* Single list
2022-03-17 09:03:07 -04:00
Francesco Saverio Zuppichini
d9b8d1a9f5
update test (#16219) 2022-03-17 08:11:55 -04:00
NielsRogge
03c14a515f
[Tests] Fix DiT test (#16218)
* Fix device

* Clean up

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-03-17 10:53:57 +01:00
Lysandre Debut
73f0a5d1f6
Fixes Loss for TransfoXL when using Trainer API v2 (#16140)
* fix(transfo_xl): Fixes TransfoXL support when using Trainer.

* fix(tests): Uses losses_1 and losses_2 pattern with TransfoXL test.

* fix(transfo_xl): Adds requested changes to allow for backward compatibility.

fix(transfo_xl): Adds requested changes to allow for backward compatibility.

fix(transfo_xl): Fixes code styling.

* Backward compatibility

* Update src/transformers/models/transfo_xl/modeling_transfo_xl.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Gustavo de Rosa <gth.rosa@uol.com.br>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-03-17 05:49:24 -04:00
Patrick von Platen
2410d0f8ed
Fix generation min length (#16206)
* up

* fix min lengths
2022-03-16 18:49:23 +01:00
Francesco Saverio Zuppichini
667b823b89
Swin support for any input size (#15986)
* padding done

* correctly return one attention per layer

* almost correct, attentions are not flatten one tuple per stage

* tests green

* doc

* conversations

* reshaping hidden_states

* view in the test

* reshape_hidden_states in Encoder and Model

* new outputs with reshaped_hidden_states

* conversations

* doc

* Update docs/source/model_doc/swin.mdx

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* conversations

* fix tests

* minor changes

* resolved conversations

* attentions one per stage

* typo

* typos

* typos

* function signature

* CI

* clean up tests

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2022-03-16 18:38:25 +01:00
Joao Gante
204c54d411
TF: add beam search tests (#16202) 2022-03-16 15:44:33 +00:00
Suraj Patil
190994573a
Fix loading CLIPVisionConfig and CLIPTextConfig (#16198)
* override from_pretrained

* add tests

* remove docstrings

* fix typo

* Trigger CI
2022-03-16 16:24:01 +01:00
Sanchit Gandhi
ee27b3d7df
Replace all deprecated jax.ops operations with jnp's at (#16078)
* Replace all deprecated `jax.ops` operations with jnp's `at`

* np to jnp scores

* suggested changes
2022-03-16 09:08:55 +00:00
Matt
cd4c5c9060
TF XLA greedy generation (#15786)
* First attempt at TF XLA generation

* Fix comments

* Update XLA greedy generate with direct XLA calls

* Support attention mask, prepare_inputs_for_generation no longer hardcoded for greedy

* Handle position_ids correctly

* make xla generate work for non xla case

* force using xla generate

* refactor

* more fixes

* finish cleaning

* finish

* finish

* clean gpt2 tests

* add gpt2 tests

* correct more cases

* up

* finish

* finish

* more fixes

* flake 8 stuff

* final rag fix

* Update src/transformers/models/rag/modeling_tf_rag.py

* finish t5 as well

* finish

* Update src/transformers/generation_utils.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-03-15 14:19:20 +01:00
NielsRogge
a7aca42fc4
Improve Swin for VisionEncoderDecoder (#16070)
* Add Swin2Bart test

* Fix swin tests

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-03-15 09:59:48 +01:00
Francesco Saverio Zuppichini
0a057201a9
Visual Attention Network (VAN) (#16027)
* encoder works

* addded files

* norm in stage

* convertion script

* tests

* fix copies

* make fix-copies

* fixed __init__

* make fix-copies

* fix

* shapiro test needed

* make fix-copie

* minor changes

* make style + quality

* minor refactor conversion script

* rebase + tests

* removed unused variables

* updated doc

* toctree

* CI

* doc

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* resolved conversations

* make fixup

* config passed to modules

* config passed to modules

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* conversations

* conversations

* copyrights

* normal test

* tests

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2022-03-15 08:47:12 +01:00
Francesco Saverio Zuppichini
e3008c679f
[WIP] Resnet (#15770)
* first commit

* ResNet model correctly implemented.

basic modeling + weights conversion is done

removed unused doc

mdx file

doc and conversion script

added feature_extractor to auto

test

minor changes + style + quality

doc

test

Delete process.yml

A left over from my attempt of running circleci locally

* minor changes

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* new test format

* minor changes from conversations

* minor changes from conversations

* make style + quality

* readded the tests

* test + README

* minor changes from conversations

* error in README

* make fix-copies

* removed regression for classification head

* make quality

* fixed loss control flow

* fixed loss control flow

* resolved conversations

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* READMEs

* index.mdx

* minor changes

* updated tests and models

* unused import

* outputs

* Update docs/source/model_doc/resnet.mdx

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* added embeddings_size

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* conversation

* added push to hub

* test

* embedding_size

* make fix-copies

* resolved conversations

* CI

* changed organization

* minor changes

* CI

* minor changes

* conversations

* conversation

* doc

* tests

* removed unused docstring

* conversation

* removed unused outputs

* CI

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2022-03-14 19:57:55 +01:00
Yih-Dar
923c35b5c5
Make TF pt-tf equivalence test more aggressive (#15839)
* Make TF pt-tf equivalence test more aggressive

* Fix for TFConvNextModelTest and TFTransfoXLModelTest

* fix kwargs for outputs

* clean-up

* Add docstring for check_outputs()

* remove: need to rename encoder-decoder

* clean-up

* send PyTorch things to the correct device

* Add back the accidentally removed test case in test_pt_tf_model_equivalence()

* Fix: change to tuple before calling check_outputs()

* Fix: tfo could be a list

* use to_tuple()

* allow tfo only to be tuple or tensor

* allow tfo to be list or tuple for now + style change

* minor fix

* remove np.copy and update comments

* tfo -> tf_output, same for pt

* Add more detailed comment

* remove the incorrect comment

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-03-14 13:31:32 +01:00
Sanchit Gandhi
2de99e6c43
Fix Loading of Flax(Speech)EncoderDecoderModel kwargs from PreTrained Encoder-Decoder Checkpoints (#16056)
* Fix Loading of Flax(Speech)EncoderDecoderModel kwargs from PreTrained Encoder-Decoder Checkpoints

* change wording
2022-03-14 10:12:29 +01:00
lewtun
6e1e88fd38
Add TFCamembertForCausalLM and ONNX integration test (#16073)
* Make Camembert great again!

* Add Camembert to TensorFlow ONNX tests
2022-03-14 08:40:42 +01:00