Patrick von Platen
3bb6356d4d
[From pretrained] Allow download from subfolder inside model repo ( #18184 )
* add first generation tutorial
* [from_pretrained] Allow loading models from subfolders
* remove gen file
* add doc strings
* allow download from subfolder
* add tests
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* apply comments
* correct doc string
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-07-19 11:53:53 +02:00
Snehan Kekre
ce0152819d
Update docs README with instructions on locally previewing docs ( #18196 )
* Update docs README with instructions on locally previewing docs
* Add instructions to install `watchdog` before previewing the docs
2022-07-19 11:47:26 +02:00
orgoro
798384467b
bugfix: div-->dim ( #18135 )
2022-07-19 10:24:56 +02:00
Sylvain Gugger
e630dad555
Add vision example to README ( #18194 )
2022-07-19 09:46:18 +02:00
Duong A. Nguyen
4bea6584e3
Remove use_auth_token from the from_config method ( #18192 )
* remove use_auth_token from from_config
* restore use_auth_token from_pretrained run_t5_mlm_flax
2022-07-19 08:13:20 +02:00
Sylvain Gugger
29fd471556
Use smaller variant of BLOOM for doc to fix tests
2022-07-18 15:17:29 -04:00
Sourab Mangrulkar
bc8e30bab9
FSDP integration enhancements and fixes ( #18134 )
* FSDP integration enhancements and fixes
* resolving comments
* fsdp fp16 mixed precision requires `ShardedGradScaler`
2022-07-19 00:02:10 +05:30
Nicola Procopio
8e445ca51d
Translation/training: italian translation training.mdx ( #17662 )
* added training.mdx
* updated training.mdx
* updated training.mdx
* updated training.mdx
* updated _toctree.yml
* fixed typos after review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-07-18 19:21:07 +02:00
Younes Belkada
6a1b1bf7a6
BLOOM minor fixes small test ( #18175 )
* minor fixes
- add correct revision
- corrected docstring for test
- removed a test
* contrib credits
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>
2022-07-18 19:18:19 +02:00
Nicola Procopio
c4cc894086
Translation italian: multilingual.mdx ( #17768 )
* added multilingual.mdx
* updated multilingual.mdx
* italian translation multilingual.mdx
* updated _toctree.yml
* fixed typos _toctree.yml
* fixed typos after review
* fixed error after review
2022-07-18 19:09:08 +02:00
Nicola Procopio
0a5b61d004
Added preprocessing.mdx italian translation ( #17600 )
* updated _toctree.yml
* added preprocessing
* updated preprocessing.mdx
* updated preprocessing.mdx
updated after review
2022-07-18 19:06:10 +02:00
SaulLu
ced1f1f5db
fix typo inside bloom documentation ( #18187 )
2022-07-18 17:43:52 +02:00
Sylvain Gugger
edadfc58af
Better default for offload_state_dict in from_pretrained ( #18183 )
2022-07-18 16:02:41 +02:00
Sylvain Gugger
aeeab1ffd0
Fix template for new models in README ( #18182 )
2022-07-18 16:01:51 +02:00
Ayan Sengupta
45255814a2
FIX: Typo ( #18156 )
2022-07-18 15:46:08 +02:00
Yih-Dar
6561fbcc6e
Update TF(Vision)EncoderDecoderModel PT/TF equivalence tests ( #18073 )
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-07-18 15:29:14 +02:00
Yih-Dar
cb19c2afdc
Fix expected loss values in some (m)T5 tests ( #18177 )
* fix expected loss values
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-07-18 15:26:21 +02:00
Wang, Yi
7417f3acb7
[HPO] update to sigopt new experiment api ( #18147 )
* [HPO] update to sigopt new experiment api
* follow https://docs.sigopt.com/experiments
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* [HPO] use new API if sigopt version >= 8.0.0
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2022-07-18 15:19:40 +02:00
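The sigopt bullets above gate behavior on the installed library version. A minimal pure-Python sketch of that version-gating pattern (the function name and parsing approach are illustrative, not taken from the PR — real code might use `packaging.version` instead of a tuple comparison):

```python
# Hypothetical sketch of version-gating an API, as in the sigopt update above.
# A plain tuple comparison on the numeric components is enough for this check.

def use_new_experiment_api(version_str):
    """Return True when the installed version is >= 8.0.0."""
    parts = tuple(int(p) for p in version_str.split(".")[:3])
    return parts >= (8, 0, 0)

print(use_new_experiment_api("8.0.0"))  # new experiment API
print(use_new_experiment_api("7.5.1"))  # legacy API
```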
gcheron
8c14b342aa
add ONNX support for LeVit ( #18154 )
Co-authored-by: Guilhem Chéron <guilhemc@authentifier.com>
2022-07-18 15:17:07 +02:00
Lysandre Debut
c1c79b0655
NLLB tokenizer ( #18126 )
* NLLB tokenizer
* Apply suggestions from code review - Thanks Stefan!
Co-authored-by: Stefan Schweter <stefan@schweter.it>
* Final touches
* Style :)
* Update docs/source/en/model_doc/nllb.mdx
Co-authored-by: Stefan Schweter <stefan@schweter.it>
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* PR reviews
* Auto models
Co-authored-by: Stefan Schweter <stefan@schweter.it>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-07-18 08:12:34 -04:00
John Giorgi
a4f97e6ce0
Fix incorrect type hint for lang ( #18161 )
2022-07-18 09:53:18 +02:00
John Giorgi
c46d39f390
Fix check for falsey inputs in run_summarization ( #18155 )
2022-07-18 09:50:32 +02:00
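The falsey-inputs fix above concerns a common Python pitfall: a truthiness test (`if x`) treats an empty string the same as `None`. A hedged sketch of the general pattern — the names and the default value here are illustrative, not taken from #18155:

```python
# Hypothetical illustration of the falsey-vs-None distinction.

def prefix_with_truthiness_check(source_prefix):
    # Buggy pattern: "" and None both fall through to the default.
    return source_prefix if source_prefix else "summarize: "

def prefix_with_none_check(source_prefix):
    # Fixed pattern: only an explicit None triggers the default,
    # so a deliberately empty prefix is preserved.
    return source_prefix if source_prefix is not None else "summarize: "

print(prefix_with_truthiness_check(""))  # default applied, empty string lost
print(prefix_with_none_check(""))        # empty string preserved
```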
Nicolas Patry
ccc0897804
Adding support for device_map directly in pipeline(..) function. ( #17902 )
* Adding support for `device_map` directly in `pipeline(..)` function.
* Updating the docstring.
* Adding a better docstring
* Put back type hints.
* Blacked. (`make fixup` didn't work ??!!)
2022-07-15 15:54:26 +02:00
Nicolas Patry
fca66ec4ef
Fixing a hard to trigger bug for text-generation pipeline. ( #18131 )
* Fixing a bug where attention mask was not passed to generate.
* Fixing zero-size prompts.
* Comment on top.
2022-07-15 15:54:07 +02:00
amyeroberts
8581a798c0
Add TF DeiT implementation ( #17806 )
* Initial TF DeiT implementation
* Fix copies naming issues
* Fix up + docs
* Properly same main layer
* Name layers properly
* Fixup
* Fix import
* Fix import
* Fix import
* Fix weight loading for tests whilst not on hub
* Add doc tests and remove to_2tuple
* Add back to_2tuple
Removing to_2tuple results in many downstream changes needed because of the copies checks
* Incorporate updates in Improve vision models #17731 PR
* Don't hard code num_channels
* Copy PyTorch DeiT embeddings and remove pytorch operations with mask
* Fix patch embeddings & tidy up
* Update PixelShuffle to move logic into class layer
* Update doc strings - remove PT references
* Use NHWC format in internal layers
* Fix up
* Use linear activation layer
* Remove unused import
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Move dataclass to top of file
* Remove from_pt now weights on hub
* Fixup
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Amy Roberts <amyeroberts@users.noreply.github.com>
2022-07-13 18:04:08 +01:00
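The PixelShuffle bullets above describe an NHWC depth-to-space rearrangement. A numpy sketch of the shape transformation only — channel-ordering conventions differ between frameworks, so this illustrates the idea rather than reproducing the transformers layer:

```python
import numpy as np

def pixel_shuffle_nhwc(x, upscale_factor):
    """Rearrange (B, H, W, C*r*r) -> (B, H*r, W*r, C) in NHWC layout."""
    b, h, w, c = x.shape
    r = upscale_factor
    out_c = c // (r * r)
    # Split the channel axis into the two spatial upscaling factors...
    x = x.reshape(b, h, w, r, r, out_c)
    # ...then interleave them with the existing spatial axes.
    x = x.transpose(0, 1, 3, 2, 4, 5)
    return x.reshape(b, h * r, w * r, out_c)

x = np.arange(1 * 2 * 2 * 8).reshape(1, 2, 2, 8).astype(np.float32)
y = pixel_shuffle_nhwc(x, 2)
print(y.shape)  # (1, 4, 4, 2)
```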
Wei
7ea6ccc2b3
Enable torchdynamo with torch_tensorrt(fx path) ( #17765 )
* enable fx2trt
* Update perf_train_gpu_one.mdx
* Update perf_train_gpu_one.mdx
* add lib check
* update
* format
* update
* fix import check
* fix isort
* improve doc
* refactor ctx manager
* fix isort
* black format
* isort fix
* fix format
* update args
* update black
* cleanups
* Update perf_train_gpu_one.mdx
* code refactor
* code refactor to init
* remove redundancy
* isort
* replace self.args with args
Co-authored-by: Stas Bekman <stas@stason.org>
2022-07-13 12:43:28 -04:00
Sylvain Gugger
37aeb5787a
Make sharded checkpoints work in offline mode ( #18125 )
* Make sharded checkpoints work in offline mode
* Add test
2022-07-13 12:43:08 -04:00
Sylvain Gugger
0a21a48564
Revert "Make sharded checkpoints work in offline mode"
This reverts commit 3564c65786.
2022-07-13 10:53:25 -04:00
Sylvain Gugger
3564c65786
Make sharded checkpoints work in offline mode
2022-07-13 10:51:56 -04:00
lmagne
56e6487c40
add dataset split and config to model-index in TrainingSummary.from_trainer ( #18064 )
* added metadata to training summary
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-07-13 16:07:20 +02:00
John Giorgi
fde22c75a1
Add summarization name mapping for MultiNews ( #18117 )
* Add summarization name mapping for MultiNews
* Add summarization name mapping for MultiNews
2022-07-13 08:19:20 -04:00
Sebastian Sosa
195133363e
supported python versions reference ( #18116 )
* supported python versions reference
* Update CONTRIBUTING.md
removing commit hash from link
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-07-13 08:18:44 -04:00
Joao Gante
20509ab0e0
TF: unpack_inputs decorator independent from main_input_name ( #18110 )
2022-07-13 10:43:41 +01:00
Joao Gante
fcefa200b2
TF: remove graph mode distinction when processing boolean options ( #18102 )
2022-07-12 19:05:31 +01:00
Niklas Muennighoff
bc34c21191
Fix BLOOM dtype ( #17995 )
* Add fp16 option
* Fix BLOOM dtype
* Formatting
* Remove torch_dtype arg
* Revert formatting
* Apply formatting
* Add n_embed backward compat
2022-07-12 10:36:08 -04:00
Joao Gante
981714efe1
CLI: reenable pt_to_tf test ( #18108 )
2022-07-12 13:38:05 +01:00
wei zhao
f5221c06e4
Report value for a step instead of epoch. ( #18095 )
* Report value for a step instead of epoch.
Report an objective function value for a step instead of epoch to optuna.
I made this modification for the following reason:
If "eval_steps" is less than the steps per epoch, there may be warnings like: "optuna/trial/_trial.py:592: UserWarning: The reported value is ignored because this `step` 0 is already reported.", so "step" is more appropriate than "epoch" here.
* MOD: make style.
Co-authored-by: zhaowei01 <zhaowei01@yuanfudao.com>
2022-07-12 08:18:35 -04:00
Sijun He
d4ebd4e112
speed up test ( #18106 )
2022-07-12 04:28:28 -04:00
jianan-gu
b7d8bd378c
Enhance IPEX integration in Trainer ( #18072 )
* enhance ipex import
* refine codes
* refine style
* add link
* style
Co-authored-by: Stas Bekman <stas@stason.org>
2022-07-11 21:34:09 -07:00
Younes Belkada
a462fc9232
Bloom Optimize operations ( #17866 )
* fix tolerance for a bloom slow test
* enhance alibi padding
- get rid of for loops
- deals better with padded batched input
- avoid useless cpu/gpu communication when creating alibi
Co-authored-by: justheuristic <justheuristic@gmail.com>
* optimize attention mask
* fix scaled softmax limit values
* optimize building alibi tensor
Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
* fix attention_mask shape when it's None
* minor fixes
- fix docstring + arg names
* remove colons in docstring
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* apply suggestion
* remove unused arg
* refactor a bit
- use [:, None] for consistency
* refactor attention block
Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>
* quick fixes
* first attempt
* refactor attention block and fix all tests except "test_simple_generation"
- added comments to better explain attention block
* remove debug lines and add TODO comment
* change `torch.bmm` to `torch.baddbmm`
- fixes `test_simple_generation` but breaks `test_batch_generation_padd`
* styling
* all tests are passing now
- use `bmm`
- add explanation for `allow_fp16_reduced_precision_reduction`
Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
* styling
Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
* fix support for accelerate
Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* remove attn softmax in fp32
* refactor comments
* refactor a bit
- remove warning message
- remove print on test
* refer to pytorch t5
* change the slow tests
- do the tests in fp32
- remove some comments
- keep large comments
* update expected output for `test_simple_generation`
- we now test using fp32
* make style + change comments a bit
* fix dtype padd test
Co-authored-by: justheuristic <justheuristic@gmail.com>
Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>
Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-07-11 13:16:13 -04:00
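The "optimize building alibi tensor" and "get rid of for loops" bullets above refer to vectorizing ALiBi's per-head attention bias. A minimal numpy sketch of that idea for a power-of-two head count — an illustration of the vectorization, not the transformers implementation:

```python
import numpy as np

def alibi_slopes_loop(n_heads):
    # Per-head geometric slopes 2^(-8*i/n_heads), built with a Python loop.
    base = 2.0 ** (-8.0 / n_heads)
    return np.array([base ** i for i in range(1, n_heads + 1)])

def alibi_bias_vectorized(n_heads, seq_len):
    # Same slopes computed in one vectorized expression, then broadcast
    # against token positions to get the (n_heads, seq_len) bias tensor
    # without any per-head Python loop.
    slopes = 2.0 ** (-8.0 * np.arange(1, n_heads + 1) / n_heads)
    return slopes[:, None] * np.arange(seq_len)[None, :]

bias = alibi_bias_vectorized(8, 4)
print(bias.shape)  # (8, 4)
```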
Sylvain Gugger
5ff6f853d7
Mark slow test as such
2022-07-11 12:48:57 -04:00
Sylvain Gugger
b1b8222d80
Add filename to info displayed when downloading things in from_pretrained ( #18099 )
2022-07-11 12:45:06 -04:00
Sylvain Gugger
6c8017a5c8
Fix image segmentation and object detection pipeline tests ( #18100 )
2022-07-11 12:41:56 -04:00
Sylvain Gugger
b0520f594c
Skip failing tests
2022-07-11 10:16:54 -04:00
Duong A. Nguyen
1e8140caad
Fix RESOURCE_EXHAUSTED error when dealing with large datasets in Flax example scripts ( #18069 )
* Fix RESOURCE_EXHAUSTED error for large datasets on Flax example scripts
* using np.permutation for creating batch_idx
* train_samples_idx -> training_samples_idx
* fix type hints
2022-07-11 15:59:08 +02:00
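The "using np.permutation for creating batch_idx" bullet above points at a memory-friendly batching pattern: draw shuffled indices as one numpy array instead of materializing and shuffling a huge Python list. A hedged sketch of that pattern (function and variable names are illustrative, not from the PR):

```python
import numpy as np

def generate_batch_splits(num_samples, batch_size, rng):
    # One contiguous array of shuffled indices, no Python-level list.
    samples_idx = rng.permutation(num_samples)
    # Drop the ragged tail so every batch is full.
    num_full = num_samples // batch_size
    samples_idx = samples_idx[: num_full * batch_size]
    return samples_idx.reshape(num_full, batch_size)

rng = np.random.default_rng(0)
batches = generate_batch_splits(10, 4, rng)
print(batches.shape)  # (2, 4)
```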
Yih-Dar
ac98a88fbc
Fix torchscript tests for GPT-NeoX ( #18012 )
* fix dtype issue in _attn
* fix RotaryEmbedding
* fix RotaryEmbedding 2
* clean up
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-07-11 05:02:54 -04:00
Yulv-git
95113d1365
Fix some typos. ( #17560 )
* Fix some typos.
Signed-off-by: Yulv-git <yulvchi@qq.com>
* Fix typo.
Signed-off-by: Yulv-git <yulvchi@qq.com>
* make fixup.
2022-07-11 05:00:13 -04:00
Stas Bekman
ad28ca291b
[bloom] fix alibi device placement ( #18087 )
2022-07-10 09:11:46 -07:00
neverix
8b332a6a16
Make predict() close progress bars after finishing ( #17952 ) ( #18078 )
* Make Trainer.predict call on_evaluate (#17952 )
* Add on_predict
* Small fix
* Small and different fix
* Add tests
2022-07-08 16:44:24 -04:00
Sylvain Gugger
7c046c5c22
Update localized READMES when template is filled. ( #18062 )
2022-07-08 11:08:52 -04:00