transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-05 05:40:05 +06:00

Author	SHA1	Message	Date
NielsRogge	63ffd56d02	Add SiglipForImageClassification and CLIPForImageClassification (#28952 ) * First draft * Add CLIPForImageClassification * Remove scripts * Fix doctests	2024-02-14 08:41:31 +01:00
Jonathan Tow	de6029a059	Add `StableLM` (#28810 ) * Add `StableLM` * fix(model): re-create from `huggingface-cli add-new-model-like persimmon` * fix: re-add changes to address comments * fix(readme): add links to paper * fix(tokenization_auto): remove `GPTNeoXTokenizerFastFast` ref * fix(tests): re-add `@slow` decorator to integration tests * fix(tests): import slow... * fix(readme_hd): remove whitespace edit * fix(tokenizer): auto tokenizer tuple * skip doctests for `modeling_stablelm`	2024-02-14 07:15:18 +01:00
Klaus Hipp	d90acc1643	[i18n-de] Translate CONTRIBUTING.md to German (#28954 ) * Translate contributing.md to German * Fix formatting issues in contributing.md * Address review comments * Fix capitalization	2024-02-12 13:39:20 -08:00
NielsRogge	78ba9f4617	[Docs] Add video section (#28958 ) Add video section	2024-02-12 19:50:31 +01:00
Klaus Hipp	fe3df9d5b3	[Docs] Add language identifiers to fenced code blocks (#28955 ) Add language identifiers to code blocks	2024-02-12 10:48:31 -08:00
NielsRogge	ef5ab72f4b	[Docs] Update README and default pipelines (#28864 ) * Update README and docs * Update README * Update README	2024-02-12 10:21:36 +01:00
Klaus Hipp	2749e479f3	[Docs] Fix broken links and syntax issues (#28918 ) * Fix model documentation links in attention.md * Fix external link syntax * Fix target anchor names of section links * Fix copyright statement comments * Fix documentation headings	2024-02-08 14:13:35 -08:00
Arthur	115ac94d06	[`Core generation`] Adds support for static KV cache (#27931 ) Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com> Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2024-02-08 11:50:34 +01:00
Klaus Hipp	33df036917	[Docs] Revert translation of '@slow' decorator (#28912 )	2024-02-08 03:31:47 +01:00
Klaus Hipp	1c31b7aa3b	[Docs] Add missing language options and fix broken links (#28852 ) * Add missing entries to the language selector * Add links to the Colab and AWS Studio notebooks for ONNX * Use anchor links in CONTRIBUTING.md * Fix broken hyperlinks due to spaces * Fix links to OpenAI research articles * Remove confusing footnote symbols from author names, as they are also considered invalid markup	2024-02-06 12:01:01 -08:00
Klaus Hipp	4830f26965	[Docs] Fix backticks in inline code and documentation links (#28875 ) Fix backticks in code blocks and documentation links	2024-02-06 11:15:44 -08:00
nakranivaibhav	2e7c942c81	Adds LlamaForQuestionAnswering class in modeling_llama.py along with AutoModel Support (#28777 ) * This is a test commit * testing commit * final commit with some changes * Removed copy statement * Fixed formatting issues * Fixed error added past_key_values in the forward method * Fixed a trailing whitespace. Damn the formatting rules are strict * Added the copy statement	2024-02-06 03:41:42 +01:00
amyeroberts	ba3264b4e8	Image Feature Extraction pipeline (#28216 ) * Draft pipeline * Fixup * Fix docstrings * Update doctest * Update pipeline_model_mapping * Update docstring * Update tests * Update src/transformers/pipelines/image_feature_extraction.py Co-authored-by: Omar Sanseviero <osanseviero@gmail.com> * Fix docstrings - review comments * Remove pipeline mapping for composite vision models * Add to pipeline tests * Remove for flava (multimodal) * safe pil import * Add requirements for pipeline run * Account for super slow efficientnet * Review comments * Fix tests * Swap order of kwargs * Use build_pipeline_init_args * Add back FE pipeline for Vilt * Include image_processor_kwargs in docstring * Mark test as flaky * Update TODO * Update tests/pipelines/test_pipelines_image_feature_extraction.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Add license header --------- Co-authored-by: Omar Sanseviero <osanseviero@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-02-05 14:50:07 +00:00
Klaus Hipp	721ee783ca	[Docs] Fix spelling and grammar mistakes (#28825 ) * Fix typos and grammar mistakes in docs and examples * Fix typos in docstrings and comments * Fix spelling of `tokenizer` in model tests * Remove erroneous spaces in decorators * Remove extra spaces in Markdown link texts	2024-02-02 08:45:00 +01:00
Steven Liu	2418c64a1c	[docs] HfQuantizer (#28820 ) * tidy * fix path	2024-02-02 08:22:18 +01:00
Steven Liu	abbffc4525	[docs] Backbone (#28739 ) * backbones * fix path * fix paths * fix code snippet * fix links	2024-02-01 09:16:16 -08:00
Rockerz	23ea6743f2	Add models from deit (#28302 ) * Add modelss * Add 2 more models * add models to tocrree * Add modles * Update docs/source/ja/model_doc/detr.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/ja/model_doc/deit.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/ja/model_doc/deplot.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * fix bugs --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2024-02-01 09:15:55 -08:00
Matt	7bc6d76396	Add tip on setting tokenizer attributes (#28764 ) * Add tip on setting tokenizer attributes * Grammar * Remove the bit that was causing doc builds to fail	2024-02-01 14:44:58 +00:00
JB (Don)	0d26abdd3a	Adding [T5/MT5/UMT5]ForTokenClassification (#28443 ) * Adding [T5/MT5/UMT5]ForTokenClassification * Add auto mappings for T5ForTokenClassification and variants * Adding ForTokenClassification to the list of models * Adding attention_mask param to the T5ForTokenClassification test * Remove outdated comment in test * Adding EncoderOnly and Token Classification tests for MT5 and UMT5 * Fix typo in umt5 string * Add tests for all the existing MT5 models * Fix wrong comment in dependency_versions_table * Reverting change to common test for _keys_to_ignore_on_load_missing The test is correctly picking up redundant keys in _keys_to_ignore_on_load_missing. * Removing _keys_to_ignore_on_missing from MT5 since the key is not used in the model * Add fix-copies to MT5ModelTest	2024-02-01 03:53:49 +01:00
Kian Sierra McGettigan	f7076cd346	Flax mistral (#26943 ) * direct copy from llama work * mistral modules forward pass working * flax mistral forward pass with sliding window * added tests * added layer collection approach * Revert "added layer collection approach" This reverts commit `0e2905bf22`. * Revert "Revert "added layer collection approach"" This reverts commit `fb17b6187a`. * fixed attention outputs * added mistral to init and auto * fixed import name * fixed layernorm weight dtype * freeze initialized weights * make sure conversion consideres bfloat16 * added backend * added docstrings * added cache * fixed sliding window causal mask * passes cache tests * passed all tests * applied make style * removed commented out code * applied fix-copies ignored other model changes * applied make fix-copies * removed unused functions * passed generation integration test * slow tests pass * fixed slow tests * changed default dtype from jax.numpy.float32 to float32 for docstring check * skip cache test for FlaxMistralForSequenceClassification since if pad_token_id in input_ids it doesn't score previous input_ids * updated checkpoint since from_pt not included * applied black style * removed unused args * Applied styling and fixup * changed checkpoint for doc back * fixed rf after adding it to hf hub * Add dummy ckpt * applied styling * added tokenizer to new ckpt * fixed slice format * fix init and slice * changed ref for placeholder TODO * added copies from Llama * applied styling * applied fix-copies * fixed docs * update weight dtype reconversion for sharded weights * removed Nullable input ids * Removed unnecessary output attentions in Module * added embedding weight initialziation * removed unused past_key_values * fixed deterministic * Fixed RMS Norm and added copied from * removed input_embeds * applied make style * removed nullable input ids from sequence classification model * added copied from GPTJ * added copied from Llama on FlaxMistralDecoderLayer * added 
copied from to FlaxMistralPreTrainedModel methods * fix test deprecation warning * freeze gpt neox random_params and fix copies * applied make style * fixed doc issue * skipped docstring test to allign # copied from * applied make style * removed FlaxMistralForSequenceClassification * removed unused padding_idx * removed more sequence classification * removed sequence classification * applied styling and consistency * added copied from in tests * removed sequence classification test logic * applied styling * applied make style * removed freeze and fixed copies * undo test change * changed repeat_kv to tile * fixed to key value groups * updated copyright year * split casual_mask * empty to rerun failed pt_flax_equivalence test FlaxWav2Vec2ModelTest * went back to 2023 for tests_pr_documentation_tests * went back to 2024 * changed tile to repeat * applied make style * empty for retry on Wav2Vec2	2024-01-31 14:19:02 +01:00
Matt	415e9a0980	Add tf_keras imports to prepare for Keras 3 (#28588 ) * Port core files + ESM (because ESM code is odd) * Search-replace in modelling code * Fix up transfo_xl as well * Fix other core files + tests (still need to add correct import to tests) * Fix cookiecutter * make fixup, fix imports in some more core files * Auto-add imports to tests * Cleanup, add imports to sagemaker tests * Use correct exception for importing tf_keras * Fixes in modeling_tf_utils * make fixup * Correct version parsing code * Ensure the pipeline tests correctly revert to float32 after each test * Ensure the pipeline tests correctly revert to float32 after each test * More tf.keras -> keras * Add dtype cast * Better imports of tf_keras * Add a cast for tf.assign, just in case * Fix callback imports	2024-01-30 17:26:36 +00:00
Younes Belkada	866253f85e	[`HfQuantizer`] Move it to "Developper guides" (#28768 ) Update _toctree.yml	2024-01-30 07:20:20 +01:00
Poedator	d78e78a0e4	`HfQuantizer` class for quantization-related stuff in `modeling_utils.py` (#26610 ) * squashed earlier commits for easier rebase * rm rebase leftovers * 4bit save enabled @quantizers * TMP gptq test use exllama * fix AwqConfigTest::test_wrong_backend for A100 * quantizers AWQ fixes * _load_pretrained_model low_cpu_mem_usage branch * quantizers style * remove require_low_cpu_mem_usage attr * rm dtype arg from process_model_before_weight_loading * rm config_origin from Q-config * rm inspect from q_config * fixed docstrings in QuantizationConfigParser * logger.warning fix * mv is_loaded_in_4(8)bit to BnbHFQuantizer * is_accelerate_available error msg fix in quantizer * split is_model_trainable in bnb quantizer class * rm llm_int8_skip_modules as separate var in Q * Q rm todo * fwd ref to HFQuantizer in type hint * rm note re optimum.gptq.GPTQQuantizer * quantization_config in __init__ simplified * replaced NonImplemented with create_quantized_param * rm load_in_4/8_bit deprecation warning * QuantizationConfigParser refactoring * awq-related minor changes * awq-related changes * awq config.modules_to_not_convert * raise error if no q-method in q-config in args * minor cleanup * awq quantizer docstring * combine common parts in bnb process_model_before_weight_loading * revert test_gptq * .process_model_ cleanup * restore dict config warning * removed typevars in quantizers.py * cleanup post-rebase 16 jan * QuantizationConfigParser classmethod refactor * rework of handling of unexpected aux elements of bnb weights * moved q-related stuff from save_pretrained to quantizers * refactor v1 * more changes * fix some tests * remove it from main init * ooops * Apply suggestions from code review Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * fix awq issues * fix * fix * fix * fix * fix * fix * add docs * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: Arthur 
<48595927+ArthurZucker@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update docs/source/en/hf_quantizer.md * address comments * fix * fixup * Update src/transformers/modeling_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/modeling_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * address final comment * update * Update src/transformers/quantizers/base.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/quantizers/auto.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix * add kwargs update * fixup * add `optimum_quantizer` attribute * oops * rm unneeded file * fix doctests --------- Co-authored-by: younesbelkada <younesbelkada@gmail.com> Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-01-30 02:48:25 +01:00
Sanchit Gandhi	da3c79b245	[Whisper] Make tokenizer normalization public (#28136 ) * [Whisper] Make tokenizer normalization public * add to docs	2024-01-29 16:07:35 +00:00
Julien Chaumond	26aa03a252	small doc update for CamemBERT (#28644 )	2024-01-29 15:46:32 +01:00
Vinyzu	3a08cc485f	[Docs] Fix Typo in English & Japanese CLIP Model Documentation (TMBD -> TMDB) (#28751 ) * [Docs] Fix Typo in English CLIP model_doc * [Docs] Fix Typo in Japanese CLIP model_doc	2024-01-29 10:06:51 +00:00
Steven Liu	abe0289e6d	[docs] Fix datasets in guides (#28715 ) * change datasets * fix	2024-01-26 09:29:07 -08:00
D	3a46e30dd1	[`docs`] Update preprocessing.md (#28719 ) * Update preprocessing.md adjust ImageProcessor link to working target (same as in lower section of file) * Update preprocessing.md	2024-01-26 11:58:57 +00:00
Peter Götz	2875195887	[`docs`] Improve visualization for vertical parallelism (#28583 ) The documentation says "We refer to this Model parallelism as “Vertical” because of how models are typically visualized.", but then visualizes the model horizontally. This change visualizes the model indeed vertically.	2024-01-25 17:55:11 +00:00
Yusuf	24f1a00e4c	Update question_answering.md (#28694 ) fix typo: from: "model = TFAutoModelForQuestionAnswering("distilbert-base-uncased")" to: model = TFAutoModelForQuestionAnswering.from_pretrained("distilbert-base-uncased")	2024-01-25 14:06:38 +00:00
Merve Noyan	2000095666	Improve Backbone API docs (#28666 ) Update backbones.md	2024-01-25 11:51:58 +00:00
NielsRogge	963db81a5a	Add Depth Anything (#28654 ) * First draft * More improvements * More improvements * More improvements * More improvements * Add docs * Remove file * Add copied from * Address comments * Address comments * Address comments * Fix style * Update docs * Convert all checkpoints, add integration test * Rename checkpoints * Add pretrained backbone attributes * Fix default config * Address comment * Add figure to docs * Fix bug thanks to @xenova * Update conversion script * Fix integration test	2024-01-25 09:34:50 +01:00
Steven Liu	f40b87de0c	[docs] Fix doc format (#28684 ) * fix hfoptions * revert changes to other files * fix	2024-01-24 11:18:59 -08:00
Fanli Lin	8278b1538e	improve efficient training on CPU documentation (#28646 ) * update doc * revert * typo fix * refine * add dtypes * Update docs/source/en/perf_train_cpu.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/perf_train_cpu.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/perf_train_cpu.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * no comma * use avx512-vnni --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2024-01-24 09:07:13 -08:00
Steven Liu	738ec75c90	[docs] DeepSpeed (#28542 ) * config * optim * pre deploy * deploy * save weights, memory, troubleshoot, non-Trainer * done	2024-01-24 08:31:28 -08:00
amyeroberts	e547458c43	Fix phi model doc checkpoint (#28581 ) Co-authored-by: Pashmina Cameron <11311835+pashminacameron@users.noreply.github.com>	2024-01-22 17:15:07 +00:00
Matt	692c3c6b73	Add config tip to custom model docs (#28601 ) Add tip to custom model docs	2024-01-22 13:46:04 +00:00
NielsRogge	faf03541e2	[SigLIP] Don't pad by default (#28578 ) First draft	2024-01-19 13:30:00 +01:00
Yoach Lacombe	d2cdefb9ec	Add new meta w2v2-conformer BERT-like model (#28165 ) * first commit * correct default value non causal * update config and modeling code * update converting checkpoint * clean modeling and fix tests * make style * add new config parameters to docstring * fix copied from statements * Apply suggestions from code review Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * make position_embeddings_type docstrings clearer * clean converting script * remove function not used * clean modeling file * apply suggestion for test file + add convert script to not_doctested * modify tests according to review - cleaner logic and more tests * Apply nit suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * add checker of valid position embeddings type * instantiate new layer norm layer with the right eps * fix freeze_feature_encoder since it can be None in some cases * add test same output in convert script * restore wav2vec2conformer and add new model * create processor and FE + clean * add new model code * fix convert script and set default config parameters * correct model id paths * make style * make fix-copies and cleaning files * fix copied from statements * complete .md and fixe copies * clean convert script argument defaults * fix config parameters docstrings * fix config docstring * add copied from and enrich FE tests * fix copied from and repo-consistency * add autotokenizer * make test input length shorter and change docstring code * fix docstrings and copied from * add add_adapter to ASR training example * make testing of adapters more robust * adapt to multi adapter layers * refactor input_values->input_features and remove w2v2-bert feature extractor * remove pretraining model * remove depreciated features and useless lines * add copied from and ignore statements to modeling tests * remove pretraining model #2 * change import in convert script * change default 
in convert script * update readme and remove useless line * Update tests/models/wav2vec2_bert/test_processor_wav2vec2_bert.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * refactor BERT to Bert for consistency * remove useless ignore copy statement * add persistent to buffer in rotary * add eps in LayerNorm init and remove copied from * add adapter activation parameters and add copied from statements * Fix copied statements and add unitest.skip reasons * add copied statement in test_processor * refactor processor * make style * replace numpy random by torch rand * remove expected output CTC * improve converting script with processor class * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * remove gumbel class * remove tests related to previously deleted class * Update src/transformers/models/wav2vec2_bert/configuration_wav2vec2_bert.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * correct typos * remove uused parameters * update processor to takes both text and audio * update checkpoints * update expected output and add ctc expected output * add label_attention_mask * replace pt with np in processor tests * fix typo * revert to behaviour with labels_attention_mask --------- Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-01-18 13:37:34 +00:00
Junyang Lin	d6ffe74dfa	Add qwen2 (#28436 ) * add config, modeling, and tokenization * add auto and init * update readme * update readme * update team name * fixup * fixup * update config * update code style * update for fixup * update for fixup * update for fixup * update for testing * update for testing * fix bug for config and tokenization * fix bug for bos token * not doctest * debug tokenizer * not doctest * debug tokenization * debug init for tokenizer * fix style * update init * delete if in token auto * add tokenizer doc * add tokenizer in init * Update dummy_tokenizers_objects.py * update * update * debug * Update tokenization_qwen2.py * debug * Update convert_slow_tokenizer.py * add copies * add copied from and make style * update files map * update test * fix style * fix merge reading and update tests * fix tests * fix tests * fix style * debug a variable in readme * Update src/transformers/models/qwen2/configuration_qwen2.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * update test and copied from * fix style * update qwen2 tokenization and tests * Update tokenization_qwen2.py * delete the copied from after property * fix style * update tests * update tests * add copied from * fix bugs * update doc * add warning for sliding window attention * update qwen2 tokenization * fix style * Update src/transformers/models/qwen2/modeling_qwen2.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix tokenizer fast --------- Co-authored-by: Ren Xuancheng <jklj077@users.noreply.github.com> Co-authored-by: renxuancheng.rxc <renxuancheng.rxc@alibaba-inc.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-01-17 16:02:22 +01:00
Gustavo de Rosa	d93ef7d751	Fixes default value of `softmax_scale` in `PhiFlashAttention2`. (#28537 ) * fix(phi): Phi does not use softmax_scale in Flash-Attention. * chore(docs): Update Phi docs.	2024-01-17 14:22:44 +01:00
Hamza FILALI	002566f398	Improving Training Performance and Scalability Documentation (#28497 ) * Improving Training Performance and Scaling documentation by adding PEFT techniques to suggestions to reduce memory requirements for training * Update docs/source/en/perf_train_gpu_one.md Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> --------- Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>	2024-01-16 11:30:26 +01:00
Joao Gante	7e0ddf89f4	Generate: consolidate output classes (#28494 )	2024-01-15 17:04:08 +00:00
thedamnedrhino	366c03271e	Tokenizer kwargs in textgeneration pipe (#28362 ) * added args to the pipeline * added test * more sensical tests * fixup * docs * typo ; * docs * made changes to support named args * fixed test * docs update * styles * docs * docs	2024-01-15 16:52:18 +01:00
Francisco Kurucz	121641cab1	Fix paths to AI Sweden Models reference and model loading (#28423 ) Fix URL to Ai Sweden Models reference and model loading	2024-01-15 09:09:22 +01:00
Joao Gante	4fb3d3a0f6	TF: purge `TFTrainer` (#28483 )	2024-01-12 16:56:34 +00:00
Hankyeol Kyung	995a7ce9a8	Fix broken link on page (#28451 ) * [docs] Fix broken link Signed-off-by: Hankyeol Kyung <kghnkl0103@gmail.com> * [docs] Use shorter domain Signed-off-by: Hankyeol Kyung <kghnkl0103@gmail.com> --------- Signed-off-by: Hankyeol Kyung <kghnkl0103@gmail.com>	2024-01-11 09:26:13 -08:00
jiqing-feng	19e83d174c	Doc (#28431 ) * update version for cpu training * update docs for cpu training * fix readme * fix readme	2024-01-11 08:55:48 -08:00
Francisco Kurucz	3724156b4d	Fix load correct tokenizer in Mixtral model documentation (#28437 )	2024-01-10 18:09:06 +01:00
Susnato Dhar	fff8ca8e59	update docs to add the `phi-2` example (#28392 ) * update docs * added Tip	2024-01-10 16:07:47 +01:00

2369 Commits