doomdagadiggiedahdah
1c944ac1e1
Fix issue #32518: Update llm_tutorial.md ( #32523 )
...
Update llm_tutorial.md
remove comma re: issue 32518
https://github.com/huggingface/transformers/issues/32518
2024-08-08 10:54:02 +01:00
Tom Aarsen
aefd3e2ae1
Fix typo: depracted -> deprecated ( #32489 )
...
Hello!
## Pull Request overview
* Fix typo
## Details
This should speak for itself.
cc @itazap @ArthurZucker
- Tom Aarsen
2024-08-08 09:37:14 +02:00
Francisco Kurucz
f5cdbf6e54
Fix link to autoclass_tutorial.md in i18n.md ( #32501 )
2024-08-07 16:09:52 -07:00
Jiyoon
78566dbdf0
🌐 [i18n-KO] Translated chat_templating.md to Korean ( #32362 )
...
* docs: ko: chat_templating.md
* feat: nmt draft
* fix: manual edits
* Update docs/source/ko/chat_templating.md
Co-authored-by: Sungmin Oh <fabxoe.kor@gmail.com>
* Update docs/source/ko/chat_templating.md
Co-authored-by: Sungmin Oh <fabxoe.kor@gmail.com>
* fix: apply suggestions from code review - anchor
Co-authored-by: Sungmin Oh <fabxoe.kor@gmail.com>
* fix: manual edits
Co-authored-by: SeungYoun Lee <84276596+win2dvp21@users.noreply.github.com>
Co-authored-by: Minki Kim <100768622+1kmmk1@users.noreply.github.com>
* fix: manual edits
* fix: delete 'default template' section
---------
Co-authored-by: Sungmin Oh <fabxoe.kor@gmail.com>
Co-authored-by: SeungYoun Lee <84276596+win2dvp21@users.noreply.github.com>
Co-authored-by: Minki Kim <100768622+1kmmk1@users.noreply.github.com>
2024-08-07 11:25:19 -07:00
Sai-Suraj-27
543df48914
Docs: Fixed WhisperModel.forward’s docstring link ( #32498 )
...
Fixed WhisperModel.forward’s docstring link.
2024-08-07 11:01:33 -07:00
Francisco Kurucz
73a59a2fcb
Fix references to model google mt5 small ( #32497 )
2024-08-07 17:57:20 +01:00
Jiwook Han
cba7bcf87b
🌐 [i18n-KO] Translated image_feature_extraction.md to Korean ( #32239 )
...
* docs: ko: tasks/images_feature_extraction.md
* feat: nmt draft
* fix: manual edits
* fix: manual edits
* fix: manual edits
* fix: manual edits
* feat: manual edits
* Update docs/source/ko/tasks/image_feature_extraction.md
Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>
* Update docs/source/ko/tasks/image_feature_extraction.md
Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>
* fix: manual edits
---------
Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>
2024-08-07 09:56:23 -07:00
Sungmin Oh
fa59fd87dd
🌐 [i18n-KO] Translated quantization/quanto.md to Korean ( #32281 )
...
* docs: ko: quantization/quanto.md
* feat: nmt draft
* fix: resolve suggestions
Co-authored-by: SeungYoun Lee <84276596+win2dvp21@users.noreply.github.com>
Co-authored-by: Minki Kim <100768622+1kmmk1@users.noreply.github.com>
Co-authored-by: 김준재 <55151385+junejae@users.noreply.github.com>
* fix: resolve suggestions
Co-authored-by: SeungYoun Lee <84276596+win2dvp21@users.noreply.github.com>
---------
Co-authored-by: SeungYoun Lee <84276596+win2dvp21@users.noreply.github.com>
Co-authored-by: Minki Kim <100768622+1kmmk1@users.noreply.github.com>
Co-authored-by: 김준재 <55151385+junejae@users.noreply.github.com>
2024-08-07 09:52:57 -07:00
Chaewon Song
fcc4f2ae8f
🌐 [i18n-KO] Translated prompting.md to Korean ( #32294 )
...
* docs: ko: tasks/prompting.md
* feat: nmt-draft
* fix: update translation in prompting.md
* fix: update toctree.yml
* fix: manual edits
* fix: toctree edits
* fix: resolve suggestions
Co-authored-by: boyunJang <gobook1234@naver.com>
Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>
Co-authored-by: timdalxx <48753785+jeongiin@users.noreply.github.com>
---------
Co-authored-by: boyunJang <gobook1234@naver.com>
Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>
Co-authored-by: timdalxx <48753785+jeongiin@users.noreply.github.com>
2024-08-07 09:44:31 -07:00
Minki Kim
1124d95dbb
🌐 [i18n-KO] Translated gptq.md to Korean ( #32293 )
...
* fix: manual edits
* fix: manual edits2
* fix: delete files
* fix: resolve suggestions
Co-authored-by: Sungmin Oh <fabxoe.kor@gmail.com>
Co-authored-by: SeungYoun Lee <84276596+win2dvp21@users.noreply.github.com>
Co-authored-by: 김준재 <55151385+junejae@users.noreply.github.com>
* fix: resolve suggestions
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Sungmin Oh <fabxoe.kor@gmail.com>
Co-authored-by: SeungYoun Lee <84276596+win2dvp21@users.noreply.github.com>
Co-authored-by: 김준재 <55151385+junejae@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-08-07 09:19:35 -07:00
Joao Gante
b7fb393f68
Docs: alert for the possibility of manipulating logits ( #32467 )
...
* logits
* words
2024-08-07 16:34:46 +01:00
Jonathan Rahn
b6401030de
fix broken link in docs ( #32491 )
...
`https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.TextGenerationPipeline.__call__`
`generate_kwargs (dict, optional) — Additional keyword arguments to pass along to the generate method of the model (see the generate method corresponding to your framework here).`
link in "here" doesnt work
2024-08-07 15:14:03 +01:00
Aymeric Roucher
e0d82534cc
Agents use grammar ( #31735 )
...
* Allow optional use of grammars to constrain generation
2024-08-07 11:42:52 +02:00
Bill Zhou
c54a6f994a
Fix typo in tokenization_utils_base.py ( #32484 )
2024-08-07 10:29:44 +01:00
append-only
46d09af4fc
enable xla fsdp ( #32048 )
...
* enable xla fsdp
* add acceleration version check for xla fsdp
2024-08-07 10:28:17 +01:00
Raushan Turganbay
7ad784ae9d
Gemma2: add cache warning ( #32279 )
...
* gemma2 fallback to dynamic cache
* Update src/transformers/models/gemma2/modeling_gemma2.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/models/gemma2/modeling_gemma2.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* raise error and dont fallback to dynamic cache
* prev will break most forward calls/tests
* Update src/transformers/models/gemma2/modeling_gemma2.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* update
* fix copies
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-08-07 10:03:05 +05:00
Raushan Turganbay
a30c865f99
Cache: new Cache format in decoder-only models ( #31421 )
...
* draft bart with new cache
* add cache for decoder-only models
* revert utils
* modify docstring
* revert bart
* minor fixes
* fix copies (not related)
* revert tests
* remove enc-dec related code
* remove bloom
* remove opt (enc-dec)
* update docstring
* git, codegen, gpt_neo, gpt_neox, gpj
* clean up
* copied from statements
* revert
* tmp
* update warning msg
* forgot git
* add more flags
* run-slow git,codegen,gpt_neo,gpt_neox,gpj
* add cache flag to VLMs
* remove files
* style
* video LLMs also need a flag
* style
* llava will go in another PR
* style
* [run-slow] codegen, falcon, git, gpt_neo, gpt_neox, gptj, idefics
* Update src/transformers/models/gpt_neo/modeling_gpt_neo.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* copy from
* deprecate until v4.45 and warn if not training
* nit
* fix test
* test static cache
* add more tests and fix models
* fix copies
* return sliding window mask
* run slow tests & fix + codestyle
* one more falcon fix for alibi
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-08-07 10:02:16 +05:00
HyunJi Shin
6af0854efa
🌐 [i18n-KO] Translated image_to_image.md to Korean ( #32327 )
...
* docs: ko: tasks/image_to_image.md
* feat: nmt draft
* fix: manual edits
* fix: resolve suggestions
Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>
Co-authored-by: Jiwook Han <33192762+mreraser@users.noreply.github.com>
* fix: handle remaining suggestions
Co-authored-by: Jiwook Han <33192762+mreraser@users.noreply.github.com>
---------
Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>
Co-authored-by: Jiwook Han <33192762+mreraser@users.noreply.github.com>
2024-08-06 11:59:44 -07:00
boyunJang
3b193c7bae
🌐 [i18n-KO] Translated idefics.md to Korean ( #32258 )
...
* docs: ko: tasks/idefics.md
* feat: nmt draft
* fix: manual edits
* fix: resolve suggestions
Co-authored-by: Chaewon Song <chaewon1019@ewhain.net>
Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>
Co-authored-by: timdalxx <48753785+jeongiin@users.noreply.github.com>
---------
Co-authored-by: Chaewon Song <chaewon1019@ewhain.net>
Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>
Co-authored-by: timdalxx <48753785+jeongiin@users.noreply.github.com>
2024-08-06 11:58:21 -07:00
timdalxx
5301b981d7
🌐 [i18n-KO] Translated mask_generation.md to Korean ( #32257 )
...
* docs: ko: tasks/mask_generation.md
* feat: nmt draft
* fix : toc local
* fix : manual edits
* fix : ko-toctree
* fix: resolve suggestions
Co-authored-by: boyunJang <gobook1234@naver.com>
Co-authored-by: Chaewon Song <chaewon1019@ewhain.net>
* fix: resolve suggestions
Co-authored-by: boyunJang <gobook1234@naver.com>
Co-authored-by: Chaewon Song <chaewon1019@ewhain.net>
* fix: resolve suggestions
* fix: resolve suggestions
* fix: resolve suggestions
---------
Co-authored-by: boyunJang <gobook1234@naver.com>
Co-authored-by: Chaewon Song <chaewon1019@ewhain.net>
2024-08-06 11:36:14 -07:00
Matthew Douglas
ac2707e8ee
Revert "fixes to properly shard FSDP across cpu and meta for cpu_effcient_loading for prequantized 4bit ( #32276 )" ( #32477 )
...
* Revert "fixes to properly shard FSDP across cpu and meta for cpu_efficient_loading for prequantized 4bit (#32276 )"
This reverts commit 62c60a3018.
We uncovered an issue with this change that caused our training runs to hang.
* `is_torchdynamo_compiling` -- cast a wide exception net (#32476 )
* cast a wide net
* make fix-copies with a few manual changes
* add copied from
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2024-08-06 20:28:59 +02:00
Joao Gante
4fdc7020b2
is_torchdynamo_compiling -- cast a wide exception net ( #32476 )
...
* cast a wide net
* make fix-copies with a few manual changes
* add copied from
2024-08-06 20:12:58 +02:00
Arthur Zucker
26a9443dae
dev version 4.45.0
2024-08-06 18:33:18 +02:00
Chris Toukmaji
50c3ba889a
Documentation: BOS token_id deprecation change for NLLB ( #32443 )
...
Update nllb.md
2024-08-06 09:22:08 -07:00
Zach Mueller
194cf1f392
Migrate import checks to not need accelerate, and be more clear on min versions ( #32292 )
...
* Migrate import checks to secondary accelerate calls
* better errs too
* Revert, just keep the import checks + remove accelerate-specific things
* Rm extra'
* Empty commit for ci
* Small nits
* Final
2024-08-06 12:03:09 -04:00
Pablo Montalvo
80b90e7b2f
Add codestral mamba2 ( #32080 )
...
* add new model like
* draft cuda forward - mismatched keys (sharding on conv1)
* match keys successfully
* fix split
* get generation/forward running (wrong gens, norm?)
* update
* some refactoring
* fixes
* works up until copy to cache
* fix
* update
* NON WORKING VERSION
* version that work?
* nit
* fix config
* fix conversion script
* working cuda forward
* nit
* update
* simplification
* make mamba slow simple work
* no einops
* todo
* fix style
* no einops
* update fix no einsum
* nit
* remove einops
* bug: scan_output differs strongly
* add rms norm option
* fix fast + slow generation with and w/o cache ✔️
* draft integration tests
* remove a big chunk of the einsum
* fix slow, fast generations, without any einsum
* fix copies
* fix structure
* fix up modeling and tests
* fix tests
* clamping is indeed worse
* recover mamba2 cache test
* fix copies
* no cache position (yet)
* fix tf tests
* fix matmul for generate
* fixup
* skip cache tests for now
* [run-slow]mamba2
* tune out hidden states for padding
* test batched generation
* propagate attention mask changes
* fix past length
* fix integration test
* style
* address comments
* update readme
* add mamba2 version check
* fix tests
* [run-slow]mamba2
* skip edge tests
* [run-slow]mamba2
* last fixup
* [run-slow]mamba2
* update README
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
2024-08-06 16:39:52 +02:00
Joao Gante
3d8bd11942
Generate: fix end to end compilation ( #32465 )
2024-08-06 15:06:47 +01:00
Ao Tang
6a03942db7
Add Nemotron HF Support ( #31699 )
...
* Add nemotron support
* fix inference
* add unit test
* add layernorm1p as a class to avoid meta device mismatch
* test fixed
* Add copied_from statements
* remove pretraining_tp args
* remove nemotronlayernorm
* force LN computation done in FP32
* remove nemotrontokenizer and use llamatokenizer
* license update
* add option for kv_channels for minitron8b
* remove assert
* o_proj fixed
* o_proj reshape
* add gated_proj option
* typo
* remove todos
* fix broken test after merging latest main
* remove nezha/nat after merging main
* change default config to 15b model
* add nemo conversion script
* rename conversion script
* remove gate_proj option
* pr comment resolved
* fix unit test
* rename kv_channels to head_dim
* resolve PR issue
* add nemotron md
* fix broken tests
* refactor rope for nemotron
* test fix
* remove linearscaling
* whitespace and import
* fix some copied-from
* code style fix
* reformatted
* add position_embedding to nemotronattention
* rope refactor to only use config, copied-from fix
* format
* Run make fix-copies
* nemotron md with autodoc
* doc fix
* fix order
* pass check_config_docstrings.py
* fix config_attributes
* remove all llama BC related code
* Use PreTrainedTokenizerFast
* ruff check examples
* conversion script update
* add nemotron to toctree
2024-08-06 15:42:05 +02:00
Joao Gante
36fd35e1cf
Dependencies: fix typo ( #32389 )
...
deps_2
2024-08-06 12:36:33 +01:00
Francisco Kurucz
438d06c95a
Fix get large model config for Switch Transformer encoder only tester ( #32438 )
2024-08-06 11:48:32 +01:00
Pavel Iakubovskii
fb66ef8147
Update kwargs validation for preprocess with decorator ( #32024 )
...
* BLIP preprocess
* BIT preprocess
* BRIDGETOWER preprocess
* CHAMELEON preprocess
* CHINESE_CLIP preprocess
* CONVNEXT preprocess
* DEIT preprocess
* DONUT preprocess
* DPT preprocess
* FLAVA preprocess
* EFFICIENTNET preprocess
* FUYU preprocess
* GLPN preprocess
* IMAGEGPT preprocess
* INSTRUCTBLIPVIDEO preprocess
* VIVIT preprocess
* ZOEDEPTH preprocess
* VITMATTE preprocess
* VIT preprocess
* VILT preprocess
* VIDEOMAE preprocess
* VIDEOLLAVA
* TVP processing
* TVP fixup
* SWIN2SR preprocess
* SIGLIP preprocess
* SAM preprocess
* RT-DETR preprocess
* PVT preprocess
* POOLFORMER preprocess
* PERCEIVER preprocess
* OWLVIT preprocess
* OWLV2 preprocess
* NOUGAT preprocess
* MOBILEVIT preprocess
* MOBILENETV2 preprocess
* MOBILENETV1 preprocess
* LEVIT preprocess
* LAYOUTLMV2 preprocess
* LAYOUTLMV3 preprocess
* Add test
* Update tests
2024-08-06 11:33:05 +01:00
Fanli Lin
e85d86398a
add the missing flash attention test marker ( #32419 )
...
* add flash attention check
* fix
* fix
* add the missing marker
* bug fix
* add one more
* remove order
* add one more
2024-08-06 11:18:58 +01:00
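The flash-attention marker commit above gates tests so they are skipped on machines without the flash-attn package. Below is a minimal sketch of how such gating typically looks, assuming the require_flash_attn and require_torch_gpu decorators from transformers.testing_utils; the test itself is a hypothetical placeholder, not one of the PR's tests.

```python
# Sketch only: gate a test so it runs only when flash-attn and a CUDA GPU exist.
from transformers.testing_utils import require_flash_attn, require_torch_gpu


@require_flash_attn
@require_torch_gpu
def test_generate_with_flash_attention_2():
    # Hypothetical placeholder body; the decorators above make pytest skip this
    # test automatically when the prerequisites are missing.
    ...
```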
Prakarsh Kaushik
0aa8328293
Llava: fix checkpoint_doc ( #32458 )
...
fix: add new llava like model bug
2024-08-06 10:11:59 +01:00
Raushan Turganbay
37c5ca5eb9
Cache: create docs ( #32150 )
...
* draft
* updates
* works?
* try adding python example in hidden section
* another try
* how do i render python
* format as html code?
* Update docs/source/en/kv_cache.md
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update docs/source/en/kv_cache.md
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update docs/source/en/kv_cache.md
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update docs/source/en/kv_cache.md
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update docs/source/en/kv_cache.md
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* one more small update
* should render hidden section now
* add outputs
* fix links
* check links
* update all links
* update with offloaded cache
* all cache is importable, so they appear in docs
* fix copies
* docstring...
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2024-08-06 10:24:19 +05:00
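The cache docs commit above notes that all cache classes are importable from the top-level package so they can be documented. A hedged usage sketch under that assumption follows; the model checkpoint and prompt are arbitrary choices, and it relies on generate accepting a Cache instance via past_key_values, as recent releases do.

```python
# Sketch: pass an explicit cache object to generate(); gpt2 is an arbitrary choice.
from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The KV cache stores", return_tensors="pt")
# The cache instance collects the keys/values produced during generation.
output = model.generate(**inputs, max_new_tokens=8, past_key_values=DynamicCache())
print(tokenizer.decode(output[0]))
```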
Francisco Kurucz
13dc6b0853
Fix documentation links and code reference to model llava-next ( #32434 )
2024-08-05 15:14:50 -07:00
amyeroberts
7e5d46ded4
Respect the config's attn_implementation if set ( #32383 )
...
* Respect the config's attn if set
* Update test - can override in from_config
* Fix
2024-08-05 16:33:19 +01:00
Sai-Suraj-27
458b0cd2c5
fix: Updated test_embeded_special_tokens for luke and mluke models ( #32413 )
...
Fixed tokenizer tests for luke, mluke models.
2024-08-05 15:19:42 +01:00
Abdi
baf7e5c927
Persist embedding type of BART and mBART models after resize ( #32242 )
...
* fix: persist embedding type of MBartConditionalGeneration after resize
* fix: persist embedding type of BartConditionalGeneration after resize
2024-08-05 14:15:36 +01:00
Francisco Kurucz
f5f1e52f6c
Fix documentation references to google/bit-50 model ( #32407 )
2024-08-05 10:18:28 +02:00
Nicholas Broad
ea5da52ebc
add values for neftune ( #32399 )
...
I always forget what typical values are, and I have to look at the paper every time. This will be a helpful reminder.
2024-08-05 09:51:58 +02:00
Ita Zaporozhets
3d7c2f9dea
#32184 save total_vocab_size ( #32240 )
...
* save total_vocab_size = vocab_size + user added tokens to speed up operation
* updating length when added_tokens_decoder is set
* add test len(tokenizer)
2024-08-05 09:22:48 +02:00
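The total_vocab_size commit above caches vocab_size plus the number of user-added tokens so that len(tokenizer) no longer recomputes it on every call. Below is a toy sketch of that caching pattern; it is illustrative only, and ToyTokenizer and its attributes are hypothetical, not the transformers implementation.

```python
# Toy illustration: recompute the total only when tokens are added, not per len().
class ToyTokenizer:
    def __init__(self, vocab_size):
        self.vocab_size = vocab_size       # base vocabulary size
        self.added_tokens_decoder = {}     # token id -> added token string
        self._update_total_vocab_size()

    def _update_total_vocab_size(self):
        self.total_vocab_size = self.vocab_size + len(self.added_tokens_decoder)

    def add_tokens(self, tokens):
        for token in tokens:
            self.added_tokens_decoder[self.total_vocab_size] = token
            self._update_total_vocab_size()

    def __len__(self):
        # O(1): returns the cached value instead of re-deriving it.
        return self.total_vocab_size


tok = ToyTokenizer(vocab_size=32000)
tok.add_tokens(["<custom_1>", "<custom_2>"])
assert len(tok) == 32002
```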
Raushan Turganbay
3bb646a54f
Phi3 tests: fix typing for Python 3.8 ( #32388 )
...
fix phi
2024-08-05 11:58:42 +05:00
TechInterMezzo
05ae3a300d
fix: SeamlessM4TFeatureExtractor stride remainder ( #32088 )
...
* fix: SeamlessM4TFeatureExtractor stride remainder
* Added attention mask size test
* Reran ruff for style correction
2024-08-05 08:40:58 +02:00
dependabot[bot]
847bb856d5
Bump keras from 2.8.0 to 2.13.1 in /examples/research_projects/decision_transformer ( #32393 )
...
Bump keras in /examples/research_projects/decision_transformer
Bumps [keras](https://github.com/keras-team/keras) from 2.8.0 to 2.13.1.
- [Release notes](https://github.com/keras-team/keras/releases)
- [Commits](https://github.com/keras-team/keras/compare/v2.8.0...v2.13.1)
---
updated-dependencies:
- dependency-name: keras
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-08-05 08:38:34 +02:00
Xueshen Liu
621fb3c0ed
MixtralFlashAttention2: put "plus 1" inside parentheses when calculating rotary_seq_len, allowing None position_ids input. ( #31500 )
...
* Mixtral: remove unnecessary plus 1 when calculating rotary_seq_len, allowing position_ids=None (no auto position_ids generation could be unsafe)
* fix typo [:-1] to [:, -1]
* to meet formatting requirement
* to meet formatting requirement
* remove white space
* MixtralFlashAttention2: put "+ 1" inside parentheses when calculating rotary_seq_len, allowing None position_ids input. Fix format/style issue.
* propagate to startcoder2, phi3, mixtral and qwen2
* update qwen2_moe
2024-08-03 20:07:55 +02:00
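The Mixtral commit above moves the "+ 1" inside max(...) so rotary_seq_len only depends on position_ids when they are provided. A hedged before/after sketch of that parenthesization follows; variable names mirror the commit message, and this is not a verbatim excerpt of the modeling code.

```python
import torch

kv_seq_len = 12
position_ids = torch.arange(12).unsqueeze(0)  # shape (batch, seq_len); may be None

# Before (as the commit describes it): "+ 1" outside max(), and position_ids is
# always indexed, so passing position_ids=None raised an exception.
# rotary_seq_len = max(kv_seq_len, position_ids[:, -1].max().item()) + 1

# After: "+ 1" inside the parentheses, with a fallback when position_ids is None.
rotary_seq_len = (
    max(kv_seq_len, position_ids[:, -1].max().item() + 1)
    if position_ids is not None
    else kv_seq_len
)
print(rotary_seq_len)  # 12
```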
Shaopeng Fu
7c31d05b59
fix: (issue #32124 ) Exception raised when running transformers/examples/flax/language-modeling/t5_tokenizer_model.py. ( #32157 )
...
fix: Exception raised when running t5_tokenizer_model.py.
2024-08-03 18:24:11 +02:00
Sanchit Gandhi
c1aa0edb48
[generate] only require an attention mask for mps with torch<2.4 ( #32367 )
...
* up
* style
* stopping
2024-08-02 17:32:50 +08:00
Joao Gante
083e13b7c4
RoPE: Add numerical tests ✨ ( #32380 )
...
tests! :D
2024-08-02 09:39:45 +01:00
Raushan Turganbay
2af199c42b
Update docs ( #32368 )
...
nits
2024-08-02 09:54:16 +05:00
Zach Mueller
82efc53513
Yell at the user if zero-3 init wasn't performed, but expected to have been done ( #32299 )
...
* Test this zach
* Test for improper init w/o zero3
* Move back
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Get rid of stars in warning
* Make private
* Make clear
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-08-01 15:18:43 -04:00