Lysandre Debut
95020f208e
Fix CPU offload + disk offload tests ( #27204 )
...
Fix disk offload tests + weight sharing issues
2023-11-01 19:25:23 +01:00
Marc Sun
c9e72f55b2
Add exllamav2 better ( #27111 )
...
* add exllamav2 arg
* add test
* style
* add check
* add doc
* replace by use_exllama_v2
* fix tests
* fix doc
* style
* better condition
* fix logic
* add deprecate msg
* deprecate exllama
* remove disable_exllama from the linter
* remove
* fix warning
* Revert the commits deprecating exllama
* deprecate disable_exllama for use_exllama
* fix
* fix loading attribute
* better handling of args
* remove disable_exllama from init and linter
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* better arg
* fix warning
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* switch to dict
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* style
* nits
* style
* better tests
* style
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-11-01 13:09:21 -04:00
jiaqiw09
239cd0eaa2
Translate task summary to Chinese ( #27180 )
...
* translate task_summary.md to Chinese
* update translation
* update translation
* fix _toctree.yml
2023-11-01 09:28:34 -07:00
Rafael Padilla
1e32b05e06
improving TimmBackbone to support FrozenBatchNorm2d ( #27160 )
...
* supporting freeze_batch_norm_2d
* supporting freeze_batch_norm_2d
* including unfreeze + separate into methods
* fix typo
* calling unfreeze
* lint
* Update src/transformers/models/timm_backbone/modeling_timm_backbone.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: Rafael Padilla <rafael.padilla@huggingface.co>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-11-01 12:58:35 -03:00
Wesley L Passos
21a2fbaf48
Fix docstring in get_oneformer_resize_output_image_size func ( #27207 )
2023-11-01 15:31:13 +00:00
Andi Powers Holmes
f8afb2b2ec
Add TensorFlow implementation of ConvNeXTv2 ( #25558 )
...
* Add type annotations to TFConvNextDropPath
* Use tf.debugging.assert_equal for TFConvNextEmbeddings shape check
* Add TensorFlow implementation of ConvNeXTV2
* check_docstrings: add TFConvNextV2Model to exclusions
TFConvNextV2Model and TFConvNextV2ForImageClassification have docstrings
which are equivalent to their PyTorch cousins, but a parsing issue prevents them
from passing the test.
Adding exclusions for these two classes as discussed in #25558 .
2023-11-01 15:09:55 +00:00
Patrick von Platen
391d14e810
[WhisperForCausalLM] Add WhisperForCausalLM for speculative decoding ( #27195 )
...
* finish
* add tests
* fix all tests
* [Assistant Decoding] Add test
* fix more
* better
* finish
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* finish
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-11-01 16:01:53 +01:00
Alexander Kozlov
f9b4bea0a6
Added cache_block_outputs option to enable GPTQ for non-regular models ( #27032 )
...
* Added cache_block_outputs option to enable GPTQ for non-regular models
* Update src/transformers/utils/quantization_config.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/utils/quantization_config.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Fixed style
* Update src/transformers/utils/quantization_config.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-11-01 14:37:19 +00:00
Shashank Rajput
037fb7d0e1
added unsqueeze_dim to apply_rotary_pos_emb ( #27117 )
...
* added unsqueeze_dim to apply_rotary_pos_emb
* Added docstring
* Modified docstring
* Modified docstring
* Modified docstring
* Modified docstring
* Modified docstring
* ran make fix-copies and make fixup
* Update src/transformers/models/llama/modeling_llama.py
Accepting the proposed changes in formatting.
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* incorporating PR suggestions
* incorporating PR suggestions
* incorporating PR suggestions
* incorporating PR suggestions
* ..
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-11-01 14:16:57 +00:00
Wesley L Passos
f3c1a172bb
Fixing docstring in get_resize_output_image_size function ( #27191 )
2023-11-01 12:42:41 +00:00
MD FAIZAN KHAN
636f704d0b
Fix the typos and grammar mistakes in CONTRIBUTING.md. ( #27193 )
...
Fix the typos and grammar mistakes in CONTRIBUTING.md
2023-11-01 12:42:22 +00:00
Wesley L Passos
71025520bc
Fix docstring get maskformer resize output image size ( #27196 )
...
* fix docstring in get_maskformer_resize_output_image_size
* fix functions docstring
* fix 'copied from' functions docstring
* fix docstring
* fix return type
* fix docstring resize
2023-11-01 12:26:14 +00:00
Younes Belkada
ae093eef01
[core / Quantization] AWQ integration ( #27045 )
...
* working v1
* oops
* Update src/transformers/modeling_utils.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* fixup
* oops
* push
* more changes
* add docs
* some fixes
* fix copies
* add v1 doc
* added installation guide
* relax constraints
* revert
* attempt llm-awq
* oops
* oops
* fixup
* raise error when incorrect cuda compute capability
* nit
* add instructions for llm-awq
* fixup
* fix copies
* fixup and docs
* change
* few changes + add demo
* add v1 tests
* add autoawq in dockerfile
* finalize
* Update tests/quantization/autoawq/test_awq.py
* fix test
* fix
* fix issue
* Update src/transformers/integrations/awq.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/main_classes/quantization.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/main_classes/quantization.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/integrations/awq.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/integrations/awq.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add link to example script
* Update docs/source/en/main_classes/quantization.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add more content
* add more details
* add link to quantization docs
* camel case + change backend class name
* change to string
* fixup
* raise errors if libs not installed
* change to `bits` and `group_size`
* nit
* nit
* Apply suggestions from code review
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* disable training
* address some comments and fix nits
* fix
* final nits and fix tests
* adapt to our new runners
* make fix-copies
* Update src/transformers/utils/quantization_config.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/utils/quantization_config.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/integrations/awq.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/integrations/awq.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* move to top
* add conversion test
* final nit
* add more elaborated test
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-11-01 09:06:31 +01:00
Hz, Ji
82c7e87987
device agnostic fsdp testing ( #27120 )
...
* make fsdp test cases device agnostic
* make style
2023-11-01 07:17:06 +01:00
Yeyang
7d8ff3629b
🌐 [i18n-ZH] Translate tflite.md into Chinese ( #27134 )
...
* docs(zh): translate tflite.md
* docs(zh): add space around links
* Update docs/source/zh/tflite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2023-10-31 12:50:48 -07:00
Lysandre Debut
113ebf80ac
Safetensors serialization by default ( #27064 )
...
* Safetensors serialization by default
* First pass on the tests
* Second pass on the tests
* Third pass on the tests
* Fix TF weight loading from TF-format safetensors
* Specific encoder-decoder fixes for weight crossloading
* Add VisionEncoderDecoder fixes for TF too
* Change filename test for pt-to-tf
* One missing fix for TFVisionEncoderDecoder
* Fix the other crossload test
* Support for flax + updated tests
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Sanchit's comments
* Sanchit's comments 2
* Nico's comments
* Fix tests
* cleanup
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-10-31 19:16:49 +01:00
Dong-geon Lee
25e6e9418c
Unify warning styles for better readability ( #27184 )
2023-10-31 18:12:14 +00:00
Hz, Ji
50378cbf6c
device agnostic models testing ( #27146 )
...
* device agnostic models testing
* add decorator `require_torch_fp16`
* make style
* apply review suggestion
* Oops, the fp16 decorator was misused
2023-10-31 18:12:14 +01:00
Steven Liu
77930f8a01
[docs] Update CPU/GPU inference docs ( #26881 )
...
* first draft
* remove non-existent paths
* edits
* feedback
* feedback and optimum
* Apply suggestions from code review
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>
* redirect to correct doc
* _redirects.yml
---------
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>
2023-10-31 09:44:51 -07:00
jiaqiw09
6b7f8ff1f3
Translate training.md to Chinese ( #27122 )
...
* translate training.md
* update _toctree.yml
* update _toctree.yml
* update _toctree.yml
2023-10-31 08:57:37 -07:00
Susnato Dhar
e22b7ced9a
Fix dropout in StarCoder ( #27182 )
...
fix dropout in modeling_gpt_bigcode.py
2023-10-31 16:44:57 +01:00
Younes Belkada
4bb50aa212
[Quantization / tests] Fix bnb MPT test ( #27178 )
...
fix bnb mpt test
2023-10-31 16:25:53 +01:00
Matt
05f2290114
Backward compatibility fix for the Conversation class ( #27176 )
...
* Backward compatibility fix for the Conversation class
* Explain what's going on in the conditional
2023-10-31 15:12:06 +00:00
Younes Belkada
309a90664f
[FEAT] Add Neftune into transformers Trainer ( #27141 )
...
* add v1 neftune
* use `unwrap_model` instead
* add test + docs
* Apply suggestions from code review
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* more details
* fixup
* Update docs/source/en/main_classes/trainer.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* refactor a bit
* more elaborated test
* fix unwrap issue
---------
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-10-31 16:03:59 +01:00
Hz, Ji
f53041a753
device agnostic pipelines testing ( #27129 )
...
* device agnostic pipelines testing
* pass torch_device
2023-10-31 15:46:31 +01:00
Matt
08fadc8085
Shorten the conversation tests for speed + fixing position overflows ( #26960 )
...
* Shorten the conversation tests for speed + fixing position overflows
* Put max_new_tokens back to 5
* Remove test skips
* Increase max_position_embeddings in blenderbot tests
* Add skips for blenderbot_small
* Correct TF test skip
* make fixup
* Reformat skips to use is_pipeline_test_to_skip
* Update tests/models/blenderbot_small/test_modeling_blenderbot_small.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/blenderbot_small/test_modeling_flax_blenderbot_small.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/blenderbot_small/test_modeling_tf_blenderbot_small.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-10-31 14:20:04 +00:00
Yih-Dar
a8e74ebdc5
Trigger CI if tiny_model_summary.json is modified ( #27175 )
...
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-10-31 14:49:02 +01:00
Vivek Khandelwal
2963e196ee
Add support for loading GPTQ models on CPU ( #26719 )
...
* Add support for loading GPTQ models on CPU
Right now, a GPTQ-quantized model can only be loaded on a CUDA
device. The attribute `gptq_supports_cpu` checks whether the installed
auto_gptq version is one that supports loading the model on CPU.
The larger variants of the model are hard to load/run/trace on
the GPU, and that is the rationale behind adding this attribute.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
* Update quantization.md
* Update quantization.md
* Update quantization.md
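A minimal sketch of the kind of version gate described above, in plain Python. The helper names (`parse_version`, `gptq_supports_cpu`, `pick_device`) and the threshold value are assumptions made for illustration only; they are not taken from the commit, and the real check lives in the library code.

```python
import re

# Hypothetical threshold, chosen only for this sketch: versions strictly
# newer than this are treated as having CPU support.
ASSUMED_LAST_VERSION_WITHOUT_CPU = "0.4.2"


def parse_version(text):
    """Turn '0.4.2' or '0.5.0+cu118' into a tuple of up to three ints."""
    core = text.split("+")[0]  # drop a local build tag such as '+cu118'
    return tuple(int(n) for n in re.findall(r"\d+", core)[:3])


def gptq_supports_cpu(installed_version, threshold=ASSUMED_LAST_VERSION_WITHOUT_CPU):
    """Return True if the installed auto_gptq version is newer than the threshold."""
    return parse_version(installed_version) > parse_version(threshold)


def pick_device(installed_version, cuda_available):
    """Prefer CUDA; fall back to CPU only when the version gate allows it."""
    if cuda_available:
        return "cuda"
    if gptq_supports_cpu(installed_version):
        return "cpu"
    raise RuntimeError("No CUDA device found and this auto_gptq version has no CPU support")
```

With a gate like this, loading on a machine without a GPU either proceeds on CPU or fails early with a clear message, instead of failing deep inside the quantization backend.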
2023-10-31 13:45:23 +00:00
Nick Hill
3cd3eaf960
fix: Fix typical_p behaviour broken in recent change ( #27165 )
...
A recent PR https://github.com/huggingface/transformers/pull/26579 fixed an edge case out-of-bounds tensor indexing error in TypicalLogitsWarper, and a related behaviour change was made that we thought fixed a long-standing bug w.r.t. the token inclusion cutoff.
However, after looking more closely, I am fairly certain that the original logic was correct and that the OOB fix should have been made differently.
Specifically the docs state that it should include the "smallest set of tokens that add up to P or higher" and so `last_ind` should actually be one more than the index of the last token satisfying (cumulative_probs < self.mass).
We still need a max clamp in case that last token is the very last one in the tensor.
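The cutoff rule described above can be sketched in plain Python, without tensors. This is an illustration under stated assumptions, not the library code: `sorted_probs` is assumed to be already ordered the way the warper orders candidates, and the function name is hypothetical.

```python
from itertools import accumulate


def typical_cutoff_index(sorted_probs, mass):
    """Index of the last token to keep, for one row of sorted probabilities.

    Counting the entries whose cumulative probability is still below `mass`
    gives one more than the index of the last such entry, so the kept set
    0..last_ind is the smallest one that adds up to `mass` or higher.
    """
    cumulative = list(accumulate(sorted_probs))
    last_ind = sum(1 for c in cumulative if c < mass)
    # Max clamp: if every cumulative value is below `mass`, the count equals
    # the length of the row, which would index one past the end.
    return min(last_ind, len(sorted_probs) - 1)


probs = [0.5, 0.25, 0.25]
kept = probs[: typical_cutoff_index(probs, 0.6) + 1]
```

Here `kept` holds the first two probabilities: the first alone sums to 0.5, which is below 0.6, so one more token is included to reach the requested mass.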
2023-10-31 13:09:56 +00:00
Susnato Dhar
b5db8ca66f
Add flash attention for gpt_bigcode ( #26479 )
...
* added flash attention of gpt_bigcode
* changed docs
* Update src/transformers/models/gpt_bigcode/modeling_gpt_bigcode.py
* add FA-2 docs
* oops
* Update docs/source/en/perf_infer_gpu_one.md Last Nit
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix
* oops
* remove padding_mask
* change getattr->hasattr logic
* changed .md file
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-10-31 11:21:02 +00:00
Yih-Dar
9dc4ce9ea7
Disable CI runner check ( #27170 )
...
Disable runner check
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-10-31 11:59:21 +01:00
Seungwoo, Jeong
14bb196cc8
[docstring] Fix docstring for BlipTextConfig, BlipVisionConfig ( #27173 )
...
Update configuration_blip.py
edit docstrings
2023-10-31 10:41:56 +00:00
Akshar Goyal
9234caefb0
[docstring] Fix docstring for AltCLIPTextConfig, AltCLIPVisionConfig and AltCLIPConfig ( #27128 )
...
* [docstring] Fix docstring for AltCLIPVisionConfig, AltCLIPTextConfig + cleaned some docstring
* Removed entries from check_docstring.py
* Removed entries from check_docstring.py
* Removed entry from check_docstring.py
* [docstring] Fix docstring for AltCLIPTextConfig, AltCLIPVisionConfig and AltCLIPConfig
2023-10-31 10:20:14 +00:00
Clifford Ressel
b5c8e23f0f
Remove broken links to s-JoL/Open-Llama ( #27164 )
2023-10-31 10:17:54 +00:00
Hz, Ji
df6f36a171
deprecate function get_default_device in tools/base.py ( #26774 )
...
* get default device through `PartialState().default_device` as it has been officially released
* apply code review suggestion
* apply code review suggestion
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
---------
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
2023-10-31 09:15:39 +00:00
NielsRogge
8211c59b9a
[KOSMOS-2] Update docs ( #27157 )
...
Update docs
2023-10-30 21:42:19 +01:00
NielsRogge
d39352d12c
Fix import of torch.utils.checkpoint ( #27155 )
...
* Fix import
* Apply suggestions from code review
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-10-30 20:08:29 +00:00
MD FAIZAN KHAN
e971486d89
Fix: typos in README.md ( #27154 )
2023-10-30 19:12:09 +00:00
Younes Belkada
f7ea959b96
[core / GC / tests] Stronger GC tests ( #27124 )
...
* stronger GC tests
* better tests and skip failing tests
* break down into 3 sub-tests
* break down into 3 sub-tests
* refactor a bit
* more refactor
* fix
* last nit
* credits contrib and suggestions
* credits contrib and suggestions
---------
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-10-30 19:53:46 +01:00
Hz, Ji
5bbf671276
Device agnostic trainer testing ( #27131 )
2023-10-30 18:16:40 +00:00
Rockerz
84724efd10
Translating en/main_classes folder docs to Japanese 🇯🇵 ( #26894 )
...
* add
* add
* add
* Add deepspeed.md
* Add
* add
* Update docs/source/ja/main_classes/callback.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/main_classes/output.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/main_classes/pipelines.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/main_classes/processors.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/main_classes/processors.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/main_classes/text_generation.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/main_classes/processors.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update logging.md
* Update toctree.yml
* Update docs/source/ja/main_classes/deepspeed.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Add suggestions
* m
* Update docs/source/ja/main_classes/trainer.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update toctree.yml
* Update Quantization.md
* Update docs/source/ja/_toctree.yml
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update toctree.yml
* Update docs/source/en/main_classes/deepspeed.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/main_classes/deepspeed.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2023-10-30 09:39:14 -07:00
Yeyang
9093b19b13
🌐 [i18n-ZH] Translate serialization.md into Chinese ( #27076 )
...
* docs(zh): translate serialization.md
* docs(zh): add space around links
2023-10-30 08:50:29 -07:00
Yih-Dar
3224c0c13f
Remove some Kosmos-2 copied from ( #27149 )
...
* fix
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-10-30 16:07:27 +01:00
Hz, Ji
cd19b19378
make tests of pytorch_example device agnostic ( #27081 )
2023-10-30 14:56:41 +00:00
Younes Belkada
6b466771b0
[tests / Quantization] Fix bnb test ( #27145 )
...
* fix bnb test
* link to GH issue
2023-10-30 15:43:08 +01:00
Yih-Dar
576994963f
Fix some tests using "common_voice" ( #27147 )
...
* Use mozilla-foundation/common_voice_11_0
* Update expected values
* Update expected values
* For test_word_time_stamp_integration
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-10-30 15:27:15 +01:00
Yih-Dar
691fd8fdde
Add Kosmos-2 model ( #24709 )
...
* Add KOSMOS-2 model
* update
* update
* update
* address review comment - 001
* address review comment - 002
* address review comment - 003
* style
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix
* address review comment - 004
* address review comment - 005
* address review comment - 006
* address review comment - 007
* address review comment - 008
* address review comment - 009
* address review comment - 010
* address review comment - 011
* update readme
* fix
* fix
* fix
* [skip ci] fix
* revert the change in _decode
* fix docstring
* fix docstring
* Update docs/source/en/model_doc/kosmos-2.md
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* no more Kosmos2Tokenizer
* style
* remove "returned when being computed by the model"
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* UTM5 Atten
* fix attn mask
* use present_key_value_states instead of next_decoder_cache
* style
* conversion scripts
* conversion scripts
* conversion scripts
* Add _reorder_cache
* fix doctest and copies
* rename 1
* rename 2
* rename 3
* make fixup
* fix table
* fix docstring
* rename 4
* change repo_id
* remove tip
* update md file
* make style
* update md file
* put docs/source/en/model_doc/kosmos-2.md to slow
* update conversion script
* Use CLIPImageProcessor in Kosmos2Processor
* Remove Kosmos2ImageProcessor
* Remove to_dict in Kosmos2Config
* Remove files
* fix import
* Update conversion
* normalized=False
* Not using hardcoded values like <image>
* elt --> element
* Apply suggestion
* Not using hardcoded values like </image>
* No assert
* No nested functions
* Fix md file
* copy
* update doc
* fix docstring
* fix name
* Remove _add_remove_spaces_around_tag_tokens
* Remove dummy docstring of _preprocess_single_example
* Use `BatchEncoding`
* temp
* temp
* temp
* Update
* Update
* Make Kosmos2ProcessorTest a bit pretty
* Update gradient checkpointing
* Fix gradient checkpointing test
* Remove one liner remove_special_fields
* Simplify conversion script
* fix add_eos_token
* update readme
* update tests
* Change to microsoft/kosmos-2-patch14-224
* style
* Fix doc
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-10-30 13:32:17 +01:00
Hz, Ji
d751dbecb2
remove the obsolete code related to fairscale FSDP ( #26651 )
...
* remove the obsolete code related to fairscale FSDP
* apply review suggestion
2023-10-30 11:55:03 +00:00
Younes Belkada
5fbed2d7ca
[Trainer / GC] Add gradient_checkpointing_kwargs in trainer and training arguments ( #27068 )
...
* add `gradient_checkpointing_kwargs` in trainer and training arguments
* add comment
* add test - currently failing
* now tests pass
2023-10-30 12:41:48 +01:00
Thien Tran
e830495c1c
Fix data2vec-audio note about attention mask ( #27116 )
...
fix data2vec audio note about attention mask
2023-10-30 10:52:24 +00:00