Patrick von Platen
da69de17e8
[Assistant Generation] Improve Encoder Decoder ( #26701 )
...
* [Assistant Generation] Improve enc dec
* save more
* Fix logit processor checks
* Clean
* make style
* fix deprecation
* fix generation test
* Apply suggestions from code review
* fix biogpt
* make style
2023-10-11 15:52:20 +02:00
Yih-Dar
5334796d20
`Copied from` for test files ( #26713 )
...
* copied statement for test files
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-10-11 14:12:09 +02:00
Ben Gubler
9f40639292
Update docs to explain disabling callbacks using report_to ( #26155 )
...
* feat: update callback doc to explain disabling callbacks using report_to
* docs: update report_to docstring
2023-10-11 07:50:23 -04:00
Billy Bradley
dcc49d8a7e
In assisted decoding, pass model_kwargs to model's forward call (fix prepare_inputs_for_generation in all models) ( #25242 )
...
* In assisted decoding, pass model_kwargs to model's forward call
Previously, assisted decoding ignored any additional kwargs that it didn't
explicitly handle. This was inconsistent with other generation methods, which
pass the model_kwargs through prepare_inputs_for_generation and forward the
returned dict to the model's forward call.
The prepare_inputs_for_generation method needs to be amended in all models, as
previously it only kept the last input ID when past_key_values was passed (see
the sketch after this list).
* Improve variable names in _extend_attention_mask
* Refactor extending token_type_ids into a function
* Replace deepcopy with copy to optimize performance
* Update new persimmon model with llama changes for assisted generation
* Update new mistral model for assisted generation with prepare_inputs_for_generation
* Update position_ids creation in falcon prepare_inputs_for_generation to support assisted generation
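As context, a minimal sketch of the pattern described in the first bullet, illustrative only (the gpt2 checkpoint and the literal prompt are stand-ins, not part of the PR): model_kwargs are routed through prepare_inputs_for_generation and the returned dict is unpacked into the model's forward call.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("Hello", return_tensors="pt").input_ids
# Extra kwargs that generation carries along; assisted decoding now forwards these
# instead of silently dropping them.
model_kwargs = {"attention_mask": torch.ones_like(input_ids), "use_cache": True}

# prepare_inputs_for_generation merges the kwargs and, once past_key_values is
# present, keeps only the tokens the model has not processed yet.
model_inputs = model.prepare_inputs_for_generation(input_ids, **model_kwargs)
outputs = model(**model_inputs)
```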
2023-10-11 13:18:42 +02:00
Thien Tran
1e3c9ddacc
Make Whisper Encoder's sinusoidal PE non-trainable by default ( #26032 )
...
* set encoder's PE as non-trainable
* freeze flax
* init sinusoids
* add test for non-trainable embed positions
* simplify TF encoder embed_pos
* revert tf
* clean up
* add sinusoidal init for jax
* make consistent sinusoidal function
* fix dtype
* add default dtype
* use numpy for sinusoids. fix jax
* add sinusoid init for TF
* fix
* use custom embedding
* use specialized init for each impl
* fix sinusoids init. add test for pytorch
* fix TF dtype
* simplify sinusoid init for flax and tf
* add tests for TF
* change default dtype to float32
* add sinusoid test for flax
* Update src/transformers/models/whisper/modeling_flax_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update src/transformers/models/whisper/modeling_tf_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* move sinusoidal init to _init_weights
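For readers unfamiliar with the term, the following is a generic sketch of a frozen sinusoidal position-embedding table; it is not the exact transformers implementation, and the sizes are arbitrary.

```python
import math

import torch
from torch import nn


def sinusoidal_table(num_positions: int, dim: int, max_timescale: float = 10000.0) -> torch.Tensor:
    # Half the channels carry sin, the other half cos, over log-spaced timescales.
    log_timescale_increment = math.log(max_timescale) / (dim // 2 - 1)
    inv_timescales = torch.exp(-log_timescale_increment * torch.arange(dim // 2))
    scaled_time = torch.arange(num_positions)[:, None].float() * inv_timescales[None, :]
    return torch.cat([scaled_time.sin(), scaled_time.cos()], dim=1)


# Illustrative sizes: 1500 positions, 384-dim embeddings.
embed_positions = nn.Embedding(1500, 384)
with torch.no_grad():
    embed_positions.weight.copy_(sinusoidal_table(1500, 384))
embed_positions.requires_grad_(False)  # non-trainable by default
```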
---------
Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
2023-10-11 09:08:54 +01:00
Roy Hvaara
fc63914399
[JAX] Replace uses of `jnp.array` in types with `jnp.ndarray` ( #26703 )
...
`jnp.array` is a function, not a type:
https://jax.readthedocs.io/en/latest/_autosummary/jax.numpy.array.html
so it never makes sense to use `jnp.array` in a type annotation. Presumably the intent was to write `jnp.ndarray` aka `jax.Array`.
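A tiny illustrative example of the corrected annotation style (the function name and arguments are made up for the example):

```python
import jax.numpy as jnp


def scale(hidden_states: jnp.ndarray, factor: float) -> jnp.ndarray:
    # `jnp.ndarray` (a.k.a. `jax.Array`) is the type; `jnp.array` is only a constructor.
    return hidden_states * factor
```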
Co-authored-by: Peter Hawkins <phawkins@google.com>
2023-10-10 21:35:16 +02:00
jheitmann
3eceaa3637
Fix source_prefix default value ( #26654 )
2023-10-10 20:49:10 +02:00
théo gigant
975003eacb
fix a typo in flax T5 attention - attention_mask variable is misnamed ( #26663 )
...
* fix a typo in flax t5 attention
* fix the typo in flax longt5 attention
2023-10-10 20:36:32 +02:00
Pavarissy
e8fdd7875d
[docstring] Fix docstring for `LlamaConfig` ( #26685 )
...
* Your commit message here
* fix LlamaConfig docstring
* run make fixup
* fix formatting after review
reformat of the file to prevent script issues
* rerun make fixup after reformat
2023-10-10 17:05:48 +02:00
Tuowei Wang
a9862a0f49
Fix Typo: table in deepspeed.md ( #26705 )
2023-10-10 11:50:10 +02:00
jiqing-feng
592f2eabd1
Control first downsample stride in ResNet ( #26374 )
...
* control first downsample stride
* reduce first only works for ResNetBottleNeckLayer
* fix param name
* fix style
2023-10-10 06:45:24 +02:00
Isaac Chung
a5e6df82c0
[docstring] Fix docstrings for `CLIP` ( #26691 )
...
fix docstrings for vanilla clip
2023-10-09 17:39:05 +02:00
Lysandre Debut
87b4ade9e5
Fix stale bot ( #26692 )
...
* Fix stale bot
* Comments
2023-10-09 16:39:57 +02:00
Alex Bzdel
3257946fb7
[docstring] Fix docstring for DonutImageProcessor ( #26641 )
...
* removed donutimageprocessor from objects_to_ignore
* added docstring for donutimageprocessor
* readding donut file
* moved docstring to correct location
2023-10-09 16:32:13 +02:00
Isaac Chung
d2f06dfffc
[docstring] Fix docstring for `CLIPImageProcessor` ( #26676 )
...
fix docstring for CLIPImageProcessor
2023-10-09 14:22:44 +02:00
Isaac Chung
3763101f85
[docstring] Fix docstrings for CLIP configs ( #26677 )
...
* fix docstrings for CLIP configs
* black formatted
2023-10-09 12:34:01 +02:00
tom white
c7f01beece
fix typos in idefics.md ( #26648 )
...
* fix typos in idefics.md
Two typos found in reviewing this documentation.
1) max_new_tokens=4 is not sufficient to generate "Vegetables" as indicated; you will get only "Veget". (Incidentally, some mention of how to select this value might be useful, as it seems to change in each example.)
2) inputs = processor(prompts, return_tensors="pt").to(device), as inputs need to be on the same device as the model (as they are in all other examples on the page); see the sketch below.
* Update idefics.md
Change device to cuda explicitly to match other examples
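A hedged sketch of the corrected usage, assuming the IDEFICS checkpoint and a text-only prompt are acceptable stand-ins for the doc page's example; the point is that the processor outputs are moved to the model's device and max_new_tokens is large enough for the expected answer.

```python
import torch
from transformers import AutoProcessor, IdeficsForVisionText2Text

device = "cuda" if torch.cuda.is_available() else "cpu"
checkpoint = "HuggingFaceM4/idefics-9b"

processor = AutoProcessor.from_pretrained(checkpoint)
model = IdeficsForVisionText2Text.from_pretrained(checkpoint, torch_dtype=torch.bfloat16).to(device)

prompts = ["Question: What vegetables are on the picture? Answer:"]
inputs = processor(prompts, return_tensors="pt").to(device)  # same device as the model
generated_ids = model.generate(**inputs, max_new_tokens=8)   # 4 would truncate "Vegetables"
print(processor.batch_decode(generated_ids, skip_special_tokens=True))
```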
2023-10-09 12:18:02 +02:00
Yih-Dar
740fc6a1da
Avoid CI OOM ( #26639 )
...
fix avoid oom
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-10-09 11:42:08 +02:00
D. Carpintero
8835bff6a0
fix links in README.md for the GPT, GPT-2, and Llama2 Models ( #26640 )
...
* fix OpenAI GPT, GPT-2 links
* fix Llama2 link
2023-10-09 11:34:44 +02:00
Shreyas S
86a4e5a96b
Fixed malapropism error ( #26660 )
...
Update test_integration.py
Fixed malapropism clone>copy
2023-10-09 11:04:57 +02:00
NielsRogge
2629c8f36a
[DINOv2] Convert more checkpoints ( #26177 )
...
* Convert checkpoints
* Update doc test
* Address comment
2023-10-09 09:58:04 +02:00
Jabasukuriputo Wang
897a826d83
docs(zh): review and punctuation & space fix ( #26627 )
2023-10-06 09:24:28 -07:00
Yih-Dar
360ea8fc72
[docstring] Fix docstring for `AlbertConfig` ( #26636 )
...
example fix docstring
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-10-06 17:36:22 +02:00
Arthur
9ad815e412
[`LlamaTokenizerFast`] Adds edge cases for the template processor ( #26606 )
...
* make sure eos and bos are properly handled for fast tokenizer
* fix code llama as well
* nits
* fix the conversion script as well
* fix failing test
2023-10-06 16:40:54 +02:00
statelesshz
27597fea07
remove SharedDDP as it is deprecated ( #25702 )
...
* remove SharedDDP as it was deprecated
* apply review suggestion
* make style
* Oops, forgot to remove the compute_loss context manager in Seq2SeqTrainer.
* remove the unnecessary conditional statement
* keep the logic of IPEX
* clean code
* mixed precision setup & make fixup
---------
Co-authored-by: statelesshz <jihuazhong1@huawei.com>
2023-10-06 16:03:11 +02:00
Yih-Dar
e840aa67e8
Fix failing `MusicgenTest.test_pipeline_text_to_audio` ( #26586 )
...
* fix
* fix
* Fix
* Fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-10-06 15:53:59 +02:00
rui-ren
87499420bf
fix RoPE t range issue for fp16 ( #26602 )
2023-10-06 12:04:54 +01:00
Matt
ea52ed9dc8
Update chat template docs with more tips on writing a template ( #26625 )
2023-10-06 12:04:40 +01:00
fxmarty
64845307b3
Remove unnecessary unsqueeze - squeeze in rotary positional embedding ( #26162 )
...
* remove unnecessary unsqueeze-squeeze in llama
* correct other models
* fix
* revert gpt_neox_japanese
* fix copies
* fix test
2023-10-06 18:25:15 +09:00
Tianqi Liu
65aabafe2f
Update tokenization_code_llama_fast.py ( #26576 )
...
* Update tokenization_code_llama_fast.py
* Update test_tokenization_code_llama.py
* Update test_tokenization_code_llama.py
2023-10-06 10:49:02 +02:00
Towdo
af38c837ee
Fixed inconsistency in several fast tokenizers ( #26561 )
2023-10-06 10:40:47 +02:00
Ramiro Leal-Cavazos
8878eb1bd9
Remove unnecessary `view`s of `position_ids` ( #26059 )
...
* Remove unnecessary `view` of `position_ids` in `modeling_llama`
When `position_ids` is `None`, its value is generated using
`torch.arange`, which creates a tensor of size `(seq_length +
past_key_values_length) - past_key_values_length = seq_length`. The
tensor is then unsqueezed, resulting in a tensor of shape `(1,
seq_length)`. This means that the last `view` to a tensor of shape
`(-1, seq_length)` is a no-op.
This commit removes the unnecessary view (see the shape sketch after this list).
* Remove no-op `view` of `position_ids` in rest of transformer models
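A small shape check illustrating why the final view was a no-op (the lengths are arbitrary):

```python
import torch

past_key_values_length, seq_length = 3, 5

# This mirrors how position_ids is built when the caller passes None.
position_ids = torch.arange(
    past_key_values_length, seq_length + past_key_values_length, dtype=torch.long
)
position_ids = position_ids.unsqueeze(0)  # shape: (1, seq_length)

# The removed `.view(-1, seq_length)` produced the very same shape, hence a no-op.
assert position_ids.shape == (1, seq_length)
assert position_ids.view(-1, seq_length).shape == position_ids.shape
```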
2023-10-06 10:28:00 +02:00
Yih-Dar
75a33d60f2
Don't install `pytorch-quantization` in Doc Builder docker file ( #26622 )
...
Fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-10-05 16:57:50 +02:00
Maria Khalusova
18fbeec824
[docs] Update to scripts building index.md ( #26546 )
...
* build the table in index.md with links to the model_doc
* removed list generation on index.md
* fixed missing models
* make style
2023-10-05 10:20:41 -04:00
Yih-Dar
9d20601259
Fix `transformers-pytorch-gpu` docker build ( #26615 )
...
Fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-10-05 15:33:35 +02:00
eajechiloae
9e78c9acfb
Don't close ClearML task if it was created externally ( #26614 )
...
don't close clearml task if it was created externally
2023-10-05 15:33:05 +02:00
Marvin Gabler
0a3b9d02fe
Swin2SR: allow arbitrary in/out channels (#26566) ( #26568 )
...
* feat: close #26566, changed model & config files to accept arbitrary in and out channels (see the config sketch after this list)
* updated docstrings
* fix: linter error
* fix: update Copy docstrings
* fix: linter update
* fix: rename num_channels_in to num_channels to prevent breaking changes
* fix: make num_channels_out None per default
* Update src/transformers/models/swin2sr/configuration_swin2sr.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix: update tests to include num_channels_out
* fix:linter
* fix: remove normalization with precomputed rgb values when #input_channels!=#output_channels
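A hedged usage sketch based on the parameter names given above: `num_channels` sets the input channels and `num_channels_out` defaults to None, falling back to `num_channels`.

```python
from transformers import Swin2SRConfig, Swin2SRForImageSuperResolution

# Illustrative: a grayscale-to-grayscale super-resolution configuration.
config = Swin2SRConfig(num_channels=1, num_channels_out=1)
model = Swin2SRForImageSuperResolution(config)
```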
---------
Co-authored-by: marvingabler <marvingabler@outlook.de>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-10-05 15:20:38 +02:00
Younes Belkada
e6d250e4cd
[`core`] Fix silent bug with `keep_in_fp32` modules ( #26589 )
...
* fix silent bug `keep_in_fp32` modules
* final fix
* added a common test.
* Trigger CI
* revert
2023-10-05 14:44:31 +02:00
Charles Bensimon
19f0b7dd02
Make `ModelOutput` serializable ( #26493 )
...
* Make `ModelOutput` serializable
Original PR from diffusers: https://github.com/huggingface/diffusers/pull/5234 (a serialization sketch follows this list)
* Black
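The commit does not spell out which serialization path is targeted; as an assumption, one simple way to exercise `ModelOutput` serialization is a pickle round-trip:

```python
import pickle

import torch
from transformers.modeling_outputs import BaseModelOutput

# Round-trip a ModelOutput subclass through pickle and compare the payload.
out = BaseModelOutput(last_hidden_state=torch.zeros(1, 4, 8))
restored = pickle.loads(pickle.dumps(out))
assert torch.equal(restored.last_hidden_state, out.last_hidden_state)
```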
2023-10-05 11:08:44 +02:00
Yih-Dar
54e17a15dc
Fix failing tests on `main` due to torch 2.1 ( #26607 )
...
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-10-05 10:27:05 +02:00
Yun Dai
2ab76c2c4f
[Falcon] Set `use_cache=False` before creating `presents` which relies on `use_cache` ( #26328 )
...
* Set `presents=None` when `use_cache` is set to False for activation ckpt
* Update modeling_falcon.py
* fix black
2023-10-05 10:18:27 +02:00
Arthur
253f9a3f97
[`GPTNeoX`] Faster rotary embedding for GPTNeoX (based on llama changes) ( #25830 )
...
* Faster rotary embedding for GPTNeoX
* there might be unnecessary moves from device
* fixup
* fix dtype issue
* add copied from statements
* fix copies
* oupsy
* add copied from Llama for scaled ones as well
* fixup
* fix
* fix copies
2023-10-05 10:05:39 +02:00
Arthur
b4e66d7a67
[`NougatProcessor`] Fix the default channel ( #26608 )
...
fix
2023-10-05 09:38:08 +02:00
Yeyang
43bfd093e1
add zh translation for installation ( #26084 )
...
* translate installation to zh
* fix translation typo
2023-10-04 09:39:02 -07:00
Sanchit Gandhi
2d8ee9817c
[Wav2Vec2] Fix tokenizer set lang ( #26349 )
...
* fix wav2vec2 doctest
* suggestion
* fix
* final fix
* revert since we need AddedTokens
2023-10-04 17:12:09 +01:00
Galland
f9ab07f920
Update mistral.md to fix a 404 link ( #26590 )
2023-10-04 17:48:11 +02:00
Arthur
c037b2e340
skip flaky hub tests ( #26594 )
...
skip flaky
2023-10-04 17:47:55 +02:00
Soyoung Yoon
ca7912d191
Fix encoder->decoder typo bug in convert_t5x_checkpoint_to_pytorch.py ( #26587 )
...
Fix bug in convert_t5x_checkpoint_to_pytorch.py
2023-10-04 17:34:32 +02:00
Matt
8b03615b7b
Fix embarrassing typo in the doc chat template! ( #26596 )
2023-10-04 16:28:53 +01:00
dg845
9deb18ca1a
Add # Copied from statements to audio feature extractors that use the floats_list function ( #26581 )
...
Add # Copied from statements to audio feature extractors that use the floats_list function.
2023-10-04 17:09:48 +02:00