Commit Graph

8957 Commits

Author SHA1 Message Date
Stas Bekman
f15c99fabf
[deepspeed docs] misc additions (#15585)
* [deepspeed docs] round_robin_gradients

* training and/or eval/predict loss is

* Update docs/source/main_classes/deepspeed.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-02-11 10:54:04 -08:00
Sylvain Gugger
2dce350b33
Fix _configuration_file argument getting passed to model (#15629) 2022-02-11 13:46:08 -05:00
Steven Liu
85aee09e9a
🖍 remove broken link (#15615) 2022-02-11 12:33:55 -06:00
Joao Gante
2f40c728c9
TF MT5 embeddings resize (#15567)
* Fix TF MT5 vocab resize

* more assertive testing
2022-02-11 17:35:10 +00:00
Mishig Davaadorj
8c03df1010
Rebase (#15606) 2022-02-11 12:02:02 -05:00
Joao Gante
3fae83d23a
TF: Add informative warning for inexistent CPU backprop ops (#15612)
* Add informative warning
2022-02-11 16:16:26 +00:00
lewtun
7e4844fc2a
Enable ONNX export when PyTorch and TensorFlow installed in the same environment (#15625) 2022-02-11 16:25:06 +01:00
Sylvain Gugger
6cf06d198c
Mark "code in the Hub" API as experimental (#15624) 2022-02-11 09:55:31 -05:00
Patrick von Platen
45c7b5b1c7
[Generate] Small refactor (#15611) 2022-02-10 18:29:27 +01:00
Ngo Quang Huy
c0864d98ba
Correct JSON format (#15600) 2022-02-10 09:02:03 -08:00
lewtun
2e8b85f72e
Add local and TensorFlow ONNX export examples to docs (#15604)
* Add local and TensorFlow ONNX export examples to docs

* Use PyTorch - TensorFlow split
2022-02-10 16:31:00 +01:00
NielsRogge
3a2ed96714
Fix Seq2SeqTrainer (#15603)
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MBP.localdomain>
2022-02-10 16:26:14 +01:00
Yih-Dar
724e51c6e6
Compute loss independent from decoder for TF EncDec models (as #14139) (#15175)
* Compute loss independent from decoder (as 14139)

* fix expected seq_len + style

* Apply the same change to TFVisionEncoderDecoderModel

* fix style

* Add case with labels in equivalence test

* uncomment

* Add case with labels in equivalence test

* add decoder_token_labels

* use hf_compute_loss

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Add copied from

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2022-02-10 15:47:02 +01:00
Patrick von Platen
3d5dea9bf0
Add example batch size to all commands (#15596) 2022-02-10 08:52:07 -05:00
Alberto Bégué
cb7ed6e083
Add Tensorflow handling of ONNX conversion (#13831)
* Add TensorFlow support for ONNX export

* Change documentation to mention conversion with Tensorflow

* Refactor export into export_pytorch and export_tensorflow

* Check model's type instead of framework installation to choose between TF and Pytorch

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Alberto Bégué <alberto.begue@della.ai>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2022-02-10 11:18:41 +01:00
Lysandre
e923917cd9 Reformat tokenization_fnet 2022-02-09 22:23:32 -05:00
Sylvain Gugger
644ec05233 Make slow tests slow 2022-02-09 19:10:22 -05:00
Sylvain Gugger
c722753afd
Expand tutorial for custom models (#15587)
* Expand tutorial for custom models

* Style

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2022-02-09 17:44:28 -05:00
NielsRogge
a86ee2261e
Add link (#15588)
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MBP.localdomain>
2022-02-09 23:33:39 +01:00
Stas Bekman
dee17d5676
[trainer docs] document how to select specific gpus (#15551)
* [trainer docs] document how to select specific gpus

* expand

* add urls

* add accelerate launcher
2022-02-09 10:12:29 -08:00
Yih-Dar
258480864d
update serving_output for some TF models (#15568)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-02-09 18:32:51 +01:00
Sylvain Gugger
315e67404d
Fix tests hub failure (#15580)
* Expose hub test problem

* Fix tests
2022-02-09 12:27:59 -05:00
Sylvain Gugger
b1ba03e082 Fix quality 2022-02-09 12:06:59 -05:00
Sylvain Gugger
eed3186b79 Trigger doc build 2022-02-09 11:57:59 -05:00
Chan Woo Kim
2b5603f6ac
Constrained Beam Search [without disjunctive decoding] (#15416)
* added classes to get started with constrained beam search

* in progress, think i can directly force tokens now but not yet with the round robin

* think now i have total control, now need to code the bank selection

* technically works as desired, need to optimize and fix design choices leading to undersirable outputs

* complete PR #1 without disjunctive decoding

* removed incorrect tests

* Delete k.txt

* Delete test.py

* Delete test.sh

* revert changes to test scripts

* genutils

* full implementation with testing, no disjunctive yet

* shifted docs

* passing all tests realistically ran locally

* removing accidentally included print statements

* fixed source of error in initial PR test

* fixing the get_device() vs device trap

* fixed documentation docstrings about constrained_beam_search

* fixed tests having failing for Speech2TextModel's floating point inputs

* fix cuda long tensor

* added examples and testing for them and founx & fixed a bug in beam_search and constrained_beam_search

* deleted accidentally added test halting code with assert False

* code reformat

* Update tests/test_generation_utils.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update tests/test_generation_utils.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update tests/test_generation_utils.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update tests/test_generation_utils.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update tests/test_generation_utils.py

* fixing based on comments on PR

* took out the testing code that should but work fails without the beam search moditification ; style changes

* fixing comments issues

* docstrings for ConstraintListState

* typo in PhrsalConstraint docstring

* docstrings improvements

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-02-09 16:59:26 +01:00
Clara Meister
0113aae5b7
Add implementation of typical sampling (#15504)
* typical decoding

* changing arg name

* add test config params

* forgotten arg rename

* fix edge case where scores are same

* test for typical logits warper

* code quality fixes
2022-02-09 16:48:41 +01:00
Suraj Patil
f588cf4050
[Flax tests/FlaxBert] make from_pretrained test faster (#15561) 2022-02-09 16:48:08 +01:00
Lysandre Debut
7029240927
Upgrade click version (#15579) 2022-02-09 10:28:43 -05:00
Sanchit Gandhi
9e00566b9b
Add Wav2Vec2 Adapter Weights to Flax (#15566)
* Add Wav2Vec2 Adapter Weights to Flax

* Suggested changes
2022-02-09 10:24:40 -05:00
Sylvain Gugger
1f60bc46f3
Make sure custom configs work with Transformers (#15569)
* Make sure custom configs work with Transformers

* Apply code review suggestions
2022-02-09 10:04:44 -05:00
Lysandre Debut
7732d0fe7a
Upgrade black to version ~=22.0 (#15565)
* Upgrade black to version ~=22.0

* Check copies

* Fix code
2022-02-09 09:28:57 -05:00
Leandro von Werra
d923f76203
add model scaling section (#15119)
* add model scaling section

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* integrate reviewer feedback

* initialize GPU properly

* add note about BnB optimizer

* move doc from `scaling.mdx` to `performance.mdx`

* integrate reviewer feedback

* revert section levels

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-02-09 15:27:30 +01:00
Sylvain Gugger
b5c6fdecf0
PoC for a ProcessorMixin class (#15549)
* PoC for a ProcessorMixin class

* Documentation

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Roll out to other processors

* Add base feature extractor class in init

* Use args and kwargs

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-02-09 09:24:49 -05:00
Yih-Dar
ba3f9a71a1
logger.warn --> logger.warning (#15572)
* change logger.warn to logger.warning

* make style

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-02-09 08:20:05 -05:00
Suraj Patil
a6885db912
[Flax tests] fix test_model_outputs_equivalence (#15571)
* fix test_model_outputs_equivalence

* fix tuple outputs for blenderbot
2022-02-09 12:26:48 +01:00
Nathan Raw
fcb4f11c92
📝 Add codecarbon callback to docs (#15563) 2022-02-08 14:10:53 -05:00
Boris Dayma
077c00c0b2
feat(flax): allow encoder_outputs in generate (#15554)
* feat(flax): allow encoder_outputs in generate

* doc(flax): encoder_outputs in generate

* fix: style

* fix: style
2022-02-08 17:53:22 +01:00
Joao Gante
8406fa6dd5
Add TFSpeech2Text (#15113)
* Add wrapper classes

* convert inner layers to tf

* Add TF Encoder and Decoder layers

* TFSpeech2Text models

* Loadable model

* TF model with same outputs as PT model

* test skeleton

* correct tests and run the fixup

* correct attention expansion

* TFSpeech2Text pask_key_values with TF format
2022-02-08 16:27:23 +00:00
Yih-Dar
6a5472a8e1
Force use_cache to be False in PyTorch (#15385)
* use_cache = False for PT models if labels is passed

* Fix for BigBirdPegasusForConditionalGeneration

* add warning if users specify use_cache=True

* Use logger.warning instead of warnings.warn

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-02-08 16:20:53 +01:00
Suraj Patil
0acd84f7cb
[GPTJ] fix docs (#15558) 2022-02-08 15:54:19 +01:00
aaron
87d08afb16
electra is added to onnx supported model (#15084)
* electra is added to onnx supported model

* add google/electra-base-generator for test onnx module

Co-authored-by: Lewis Tunstall <lewis.c.tunstall@gmail.com>
2022-02-08 15:47:49 +01:00
Michael Benayoun
0fe17f375a
FX tracing improvement (#14321)
* Change the way tracing happens, enabling dynamic axes out of the box

* Update the tests and modeling xlnet

* Add the non recoding of leaf modules to avoid recording more values for the methods to record than what will be seen at tracing time (which would otherwise desynchronize the recorded values and the values that need to be given to the proxies during tracing, causing errors).

* Comments and making tracing work for gpt-j and xlnet

* Refactore things related to num_choices (and batch_size, sequence_length)

* Update fx to work on PyTorch 1.10

* Postpone autowrap_function feature usage for later

* Add copyrights

* Remove unnecessary file

* Fix issue with add_new_model_like

* Apply suggestions
2022-02-07 22:25:33 +01:00
Steven Liu
552f8d3091
Create a custom model guide (#15489)
* 📝 add config section

* 📝 finish first draft

* 📝 add feature extractor and processor

* 🖍 apply feedback from review

* 📝 minor edits

* last review
2022-02-07 12:34:56 -06:00
Yih-Dar
ad1d3c4d4b
Make TF Wav2Vec2 outputs the same as PT's version (#15530)
* fix outputs

* fix for CTC

* fix doc

* make style

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-02-07 18:09:57 +01:00
Yih-Dar
131e258411
Fix TF T5/LED missing cross attn in retrun values (#15511)
* add cross attn to outputs

* add cross attn to outputs for TFLED

* add undo padding

* remove unused import

* fix style

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-02-07 17:41:48 +01:00
lewtun
6775b211b6
Remove Longformers from ONNX-supported models (#15273) 2022-02-07 17:32:13 +01:00
François REMY
7a1412e12b
Wav2Vec2 models must either throw or deal with add_apater (#15409)
* Wav2Vec2 models must either throw or deal with add_apater

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Add pre-add_adapter backwards compatibility

* Add pre-add_adapter backwards compatibility

* Fix issue in tests/test_modeling_wav2vec2.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-02-07 17:03:12 +01:00
Anton Lozhkov
a459f7f97d
Add ASR CTC streaming example (#15309)
* Single-epoch run

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Infinite dataset

* Trainer fix + distributed benchmark

* Benchmark fix

* unused import

* interleaved splits

* interleaved splits

* has_length util

* Move to research projects

* Leftover Sized checks

* Bump min version

* Unused import

* Revert trainer changes

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-02-07 18:35:37 +03:00
Anton Lozhkov
75b13f82e9
[Trainer] Deeper length checks for IterableDatasetShard (#15539)
* Unused import

* Make `has_length()` torch-independent to use in callbacks

* Update src/transformers/trainer_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-02-07 18:34:56 +03:00
NielsRogge
84eec9e6ba
Add ConvNeXT (#15277)
* First draft

* Add conversion script

* Improve conversion script

* Improve docs and implement tests

* Define model output class

* Fix tests

* Fix more tests

* Add model to README

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply more suggestions from code review

* Apply suggestions from code review

* Rename dims to hidden_sizes

* Fix equivalence test

* Rename gamma to gamma_parameter

* Clean up conversion script

* Add ConvNextFeatureExtractor

* Add corresponding tests

* Implement feature extractor correctly

* Make implementation cleaner

* Add ConvNextStem class

* Improve design

* Update design to also include encoder

* Fix gamma parameter

* Use sample docstrings

* Finish conversion, add center cropping

* Replace nielsr by facebook, make feature extractor tests smaller

* Fix integration test

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-02-07 16:11:37 +01:00