transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-31 02:02:21 +06:00

Author	SHA1	Message	Date
Leo Tronchon	869733ab62	IDEFICS: allow interpolation of vision's pos embeddings (#26029 ) * add pos embed interpolation for vision encoder * style * update config with interpolate_pos_encoding arg * fix imports formatting * take off copied from on vision embeddings * add test for image embeddings interpolation * add credit for interpolation code * Update src/transformers/models/idefics/configuration_idefics.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/idefics/vision.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fix condition to check nbr image patches match shape of pos embeddings * use kwargs in the forward methods for interpolation * fix tests * have interpolate_pos_encoding default to False instead of None * Update tests/models/idefics/test_modeling_idefics.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/idefics/test_modeling_idefics.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/idefics/test_modeling_idefics.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/idefics/configuration_idefics.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * take off for loop meant to print k,v * add interpolate_pos_encoding arg in prepare_inputs_for_generation * add test for interpolated generation * fix edge case num_patches == num_positions and height == width * add test for edge case * fix pos_embed in interpolate * allow interpolation in bf16 with upcasting * Update src/transformers/models/idefics/vision.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/idefics/vision.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * add multiple images tests for interpolation and generation --------- 
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2023-09-14 19:27:40 -04:00
NielsRogge	5469c18762	[BLIP-2] Improve conversion script (#24854 ) * Improve conversion script * Add int8 code example * Update tip * Fix code * Fix code snippet * Add nucleus sampling * More improvements * Address comments * Address comments	2023-09-14 19:42:20 +01:00
Jinho Park	17fdd35481	Add BROS (#23190 ) * add Bros boilerplate * copy and pasted modeling_bros.py from official Bros repo * update copyright of bros files * copy tokenization_bros.py from official repo and update import path * copy tokenization_bros_fast.py from official repo and update import path * copy configuration_bros.py from official repo and update import path * remove trailing period in copyright line * copy and paste bros/__init__.py from official repo * save formatting * remove unused unnecessary pe_type argument - using only crel type * resolve import issue * remove unused model classes * remove unnecessary tests * remove unused classes * fix original code's bug - layer_module's argument order * clean up modeling auto * add bbox to prepare_config_and_inputs * set temporary value to hidden_size (32 is too low because of the of the Bros' positional embedding) * remove decoder test, update create_and_check* input arguemnts * add missing variable to model tests * do make fixup * update bros.mdx * add boilerate plate for no_head inference test * update BROS_PRETRAINED_MODEL_ARCHIVE_LIST (add naver-clova-ocr prefix) * add prepare_bros_batch_inputs function * update modeling_common to add bbox inputs in Bros Model Test * remove unnecessary model inference * add test case * add model_doc * add test case for token_classification * apply fixup * update modeling code * update BrosForTokenClassification loss calculation logic * revert logits preprocessing logic to make sure logits have original shape * - update class name * - add BrosSpadeOutput - update BrosConfig arguments * add boilerate plate for no_head inference test * add prepare_bros_batch_inputs function * add test case * add test case for token_classification * update modeling code * update BrosForTokenClassification loss calculation logic * revert logits preprocessing logic to make sure logits have original shape * apply masking on the fly * add BrosSpadeForTokenLinking * update class name put docstring 
to the beginning of the file * separate the logits calculation logic and loss calculation logic * update logic for loss calculation so that logits shape doesn't change when return * update typo * update prepare_config_and_inputs * update dummy node initialization * update last_hidden_states getting logic to consider when return_dict is False * update box first token mask param * bugfix: remove random attention mask generation * update keys to ignore on load missing * run make style and quality * apply make style and quality of other codes * update box_first_token_mask to bool type * update index.md * apply make style and quality * apply make fix-copies * pass check_repo * update bros model doc * docstring bugfix fix * add checkpoint for doc, tokenizer for doc * Update README.md * Update docs/source/en/model_doc/bros.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update bros.md * Update src/transformers/__init__.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/bros.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * apply suggestions from code review * apply suggestions from code review * revert test_processor_markuplm.py * Update test_processor_markuplm.py * apply suggestions from code review * apply suggestions from code review * apply suggestions from code review * update BrosSpadeELForTokenClassification head name to entity linker * add doc string for config params * update class, var names to more explicit and apply suggestions from code review * remove unnecessary keys to ignore * update relation extractor to be initialized with config * add bros processor * apply make style and quality * update bros.md * remove bros tokenizer, add bros processor that wraps bert tokenizer * revert change * apply make fix-copies * update 
processor code, update itc -> initial token, stc -> subsequent token * add type hint * remove unnecessary condition branches in embedding forward * fix auto tokenizer fail * update docstring for each classes * update bbox input dimension as standard 2 points and convert them to 4 points in forward pass * update bros docs * apply suggestions from code review : update Bros -> BROS in bros.md * 1. box prefix var -> bbox 2. update variable names to be more explicit * replace einsum with torch matmul * apply style and quality * remove unused argument * remove unused arguments * update docstrings * apply suggestions from code review: add BrosBboxEmbeddings, replace einsum with classical matrix operations * revert einsum update * update bros processor * apply suggestions from code review * add conversion script for bros * Apply suggestions from code review * fix readme * apply fix-copies --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2023-09-14 18:02:37 +01:00
Joshua Lochner	95fe0f5d80	[Whisper] Fix word-level timestamps for audio < 30 seconds (#25607 ) * Fix word-level timestamps for audio < 30 seconds * Fix code quality * fix unit tests * Fix unit tests * Fix unit test * temp: print out result * temp: set max diff to None * fix unit tests * fix typo * Fix typo Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Use generation config for `num_frames` * fix docs * Move `num_frames` to kwargs * compute stride/attn_mask once * mark test as slow --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>	2023-09-14 17:42:35 +01:00
Sanchit Gandhi	44a0490d3c	[MusicGen] Add sampling rate to config (#26136 ) * [MusicGen] Add sampling rate to config * remove tiny * make property * Update tests/pipelines/test_pipelines_text_to_audio.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * style --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2023-09-14 16:57:06 +01:00
Dong-Yong Lee	8881f38a4f	Fix beam search when using model parallel (#24969 ) * Fix GPTNeoX beam search when using parallelize * Fix beam search idx device when using model parallel * remove onnx related stuff Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix: move test_beam_search_on_multi_gpu to GenerationTesterMixin * fix: add right item to _no_split_modules of MegaPreTrainedModel * fix: add num_beams within parallelized beam_search test Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2023-09-14 11:00:52 -04:00
Sanchit Gandhi	0dd06c3f78	[MusicGen] Add streamer to generate (#25320 ) * [MusicGen] Add streamer to generate * add to for cond generation * add test * finish * torch only * fix type hint * yield audio chunks * fix typehint * remove test	2023-09-14 15:59:09 +01:00
Matt	866df66fe4	Overhaul Conversation class and prompt templating (#25323 ) * First commit while I figure this out * make fixup * Remove unused method * Store prompt attrib * Fix prompt argument for tests * Make same changes in fast tokenizer * Remove global prompts from fast tokenizer too * stash commit * stash commit * Migrate PromptConfig to its True Final Location * Replace Conversation entirely with the new class * Import/dependency fixes * Import/dependency fixes * Change format for lots of default prompts * More default prompt fixups * Revert llama old methods so we can compare * Fix some default configs * Fix some default configs * Fix misspelled kwarg * Fixes for Blenderbot * make fixup * little rebase cleanup * Add basic documentation * Quick doc fix * Truncate docstring for now * Add handling for the case when messages is a single string * Quick llama merges * Update conversational pipeline and tests * Add a couple of legacy properties for backward compatibility * More legacy handling * Add docstring for build_conversation_input_ids * Restructure PromptConfig * Let's start T E M P L A T I N G * Refactor all default configs to use templates instead * Revert changes to the special token properties since we don't need them anymore * More class templates * Make the sandbox even sandier * Everything replaced with pure templating * Remove docs for PromptConfig * Add testing and optional requirement boilerplate * Fix imports and make fixup * Fix LLaMA tests and add Conversation docstring * Finally get LLaMA working with the template system * Finally get LLaMA working with the template system * make fixup * make fixup * fmt-off for the long lists of test tokens * Rename method to apply_chat_template for now * Start on documentation * Make chat_template a property that reads through to the default if it's not set * Expand docs * Expand chat templating doc some more * trim/lstrip blocks by default and update doc * Few doc tweaks * rebase cleanup * Clarify 
docstring * rebase cleanup * rebase cleanup * make fixup * Quick doc edit * Reformat the standard template to match ChatML * Re-add PEFT check * Update docs/source/en/chat_templating.md Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Add apply_chat_template to the tokenizer doc * make fixup * Add doc links * Fix chat links * Fix chat links * Explain system messages in the doc * Add chat template test * Proper save-loading for chat template attribute * Add test skips for layout models * Remove _build_conversation_input_ids, add default_chat_template to code_llama * Make sure all LLaMA models are using the latest template * Remove default_system_prompt block in code_llama because it has no default prompt * Update ConversationPipeline preprocess * Add correct #Copied from links to the default_chat_templates * Remove unneeded type checking line * Add a dummy mark_processsed method * Reorganize Conversation to have *deprecated_kwargs Update chat_templating.md * Quick fix to LLAMA tests * Small doc tweaks * Add proper docstrings and "copied from" statements to all default chat templates * Merge use_default_system_prompt support for code_llama too * Improve clarity around self.chat_template * Docstring fix * Fix blenderbot default template * More doctest fix * Break out some tokenizer kwargs * Update doc to explain default templates * Quick tweaks to tokenizer args * Cleanups for tokenizer args * Add note about cacheing * Quick tweak to the chat-templating doc * Update the LLaMA template with error checking and correct system message embedding * make fixup * make fixup * add requires_jinja * Cleanup to expected output formatting * Add cacheing * Fix typo in llama default template * Update LLaMA tests * Update documentation * Improved legacy handling in the Conversation class * Update Jinja template with proper error handling * Quick bugfix * Proper exception raising * Change cacheing behaviour so it doesn't try to pickle an entire Jinja env * make fixup 
* rebase cleanup --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2023-09-14 15:10:34 +01:00
Younes Belkada	7c63e6fc8c	[`PEFT`] Fix PEFT + gradient checkpointing (#25846 ) * fix PEFT + gradient checkpointing * add disable RG * polish tests * fix comment * Revert "fix comment" This reverts commit `b85386f50d`. * final explanations and tests	2023-09-14 13:01:58 +02:00
Sanchit Gandhi	ac957f69cc	[Whisper Tokenizer] Encode timestamps (#26054 ) * [Whisper Tokenizer] Fix tests after adding timestamps * fix s2t tokenizer tests * fix vocab test * backwards comp * fix tests * comment * style * fix last test * fix fast * make faster * move logic to decode * remove skip test * fix decode with offsets * fix special tokens * empty commit to re-trigger ci * use lru cache	2023-09-14 12:00:43 +01:00
Sam Denton	6d49b9dcbf	Fix eval accumulation when `accelerate` > 0.20.3 (#26060 ) As mentioned in: https://github.com/huggingface/transformers/issues/25641 Eval accumulation will never happen with `accelerate > 0.20.3`, so this change ensures that `sync_gradients` is ignored if accelerate is > 0.20.3	2023-09-14 10:57:47 +01:00
Craig Chan	d7bd325b5a	Add missing Maskformer dataclass decorator, add dataclass check in ModelOutput for subclasses (#25638 ) * Add @dataclass to MaskFormerPixelDecoderOutput * Add dataclass check if subclass of ModelOutput * Use unittest assertRaises rather than pytest per contribution doc * Update src/transformers/utils/generic.py per suggested change Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2023-09-14 10:30:49 +01:00
Abhilash Majumder	05de038f3d	Flex xpu bug fix (#26135 ) flex gpu bug fix	2023-09-13 21:03:52 +01:00
Maria Khalusova	9709ab116c	[docs] last hidden state vs hidden_states[-1] (#26142 ) * last hidden state clarification * feedback addressed	2023-09-13 14:35:42 -04:00
Serizao	e52f1cb669	Update training_args.py - addition of self.distributed_state when using XPU (#25999 ) * Update training_args.py Missing distributed state so lines 1813-1814 failed because value is undefined * Update training_args.py Co-authored-by: Zach Mueller <muellerzr@gmail.com> --------- Co-authored-by: Zach Mueller <muellerzr@gmail.com>	2023-09-13 19:21:46 +01:00
BakerBunker	0fced06788	Fix `beam_scores` shape when token scores shape changes after `logits_processor` (#25980 )	2023-09-13 19:12:47 +01:00
Joao Gante	a796f7eea6	Falcon: batched generation (#26137 )	2023-09-13 17:00:52 +01:00
Yih-Dar	95a904104e	Fix `test_finetune_bert2bert` (#25984 ) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-09-13 16:53:43 +01:00
Joao Gante	86ffef87b6	Generate: ignore warning when `generation_config.max_length` is set to `None` (#26147 )	2023-09-13 16:50:58 +01:00
김준재_T3056	a6ae2bd059	docs: feat: add llama2 notebook resources from OSSCA community (#26076 )	2023-09-13 08:27:41 -07:00
Younes Belkada	7ccac73f74	[`RWKV`] Final fix RWMV 4bit (#26134 ) * Final fix RWMV 4bit * fixup * add a test * add more clarifications	2023-09-13 16:30:20 +02:00
Vaibhav Srivastav	32ec7345f2	Update spectrogram and waveform model mapping for TTS/A pipeline (#26114 ) update names mapping for spectrogram and waveform models	2023-09-13 09:05:11 -04:00
Juarez Bochi	a9b63ca989	Add missing space in generation/utils.py (#26121 ) Add missing space in utils.py Warning now reads as "... to control thegeneration length. We ..."	2023-09-13 13:45:55 +01:00
Younes Belkada	c8b26096d4	[`core`] fix 4bit `num_parameters` (#26132 ) * fix 4bit `num_parameters` * stronger check	2023-09-13 14:12:35 +02:00
amyeroberts	7db1ad63d9	Fix AutoTokenizer docstring typo (#26117 ) Fix docstring typo	2023-09-13 11:12:27 +01:00
Sourab Mangrulkar	b477327394	fix the deepspeed tests (#26021 ) * fix the deepspeed tests * resolve comment	2023-09-13 10:26:53 +05:30
Sourab Mangrulkar	73b13ac099	safeguard torch distributed check (#26056 )	2023-09-13 10:26:37 +05:30
Tanay Mehta	12f043eaea	Fix `MarianTokenizer` to remove metaspace character in `decode` (#26091 ) * add: check to remove metaspace from marian tokenizer * fix: metaspace character being removed from everywhere * fix: remove redundant check at top * add: test for marian tokenizer decode fix * fix: simplified the test	2023-09-12 21:53:31 +02:00
Joao Gante	03e309d58e	Text2text pipeline: don't parameterize from the config (#26118 )	2023-09-12 18:40:45 +01:00
Phuc Van Phan	4fb64e285a	chore: correct update_step and correct gradient_accumulation_steps (#26068 )	2023-09-12 18:31:23 +01:00
Wang, Yi	8f609ab9e0	enable optuna multi-objectives feature (#25969 ) * enable optuna multi-objectives feature Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * update hpo doc * update docstring Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> * extend direction to List[str] type Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> * Update src/transformers/integrations/integration_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2023-09-12 18:01:22 +01:00
MinJae Kang	92f2fbad50	🌐 [i18n-KO] Translated `contributing.md` to Korean (#25877 ) * docs: ko-contributing.md * feat: chatGPT draft * feat: manual edits * feat: change linked document * fix: resolve suggestion Co-authored-by: Haewon Kim <ehdvkf02@naver.com> * fix: resolve suggestion Co-authored-by: Haewon Kim <ehdvkf02@naver.com> * fix: resolve suggestion Co-authored-by: Haewon Kim <ehdvkf02@naver.com> * fix: resolve suggestion Co-authored-by: Haewon Kim <ehdvkf02@naver.com> * fix: resolve suggestion Co-authored-by: Haewon Kim <ehdvkf02@naver.com> * fix: resolve suggestion Co-authored-by: Haewon Kim <ehdvkf02@naver.com> * fix: resolve suggestion Co-authored-by: SeongWooChoi <46990061+nuatmochoi@users.noreply.github.com> * fix: resolve suggestion Co-authored-by: SeongWooChoi <46990061+nuatmochoi@users.noreply.github.com> * fix: resolve suggestion Co-authored-by: SeongWooChoi <46990061+nuatmochoi@users.noreply.github.com> * fix: resolve suggestion * fix: resolve suggestion * feat: delete file to resolve error --------- Co-authored-by: Haewon Kim <ehdvkf02@naver.com> Co-authored-by: SeongWooChoi <46990061+nuatmochoi@users.noreply.github.com>	2023-09-12 08:35:29 -07:00
Maria Khalusova	1fe7ce48f1	[docs] Updates to TTS task guide with regards to the new TTS pipeline (#26095 ) * tts guide updates with a pipeline * Apply suggestions from code review Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * Update docs/source/en/tasks/text-to-speech.md Co-authored-by: Vaibhav Srivastav <vaibhavs10@gmail.com> --------- Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> Co-authored-by: Vaibhav Srivastav <vaibhavs10@gmail.com>	2023-09-12 11:29:06 -04:00
MinJae Kang	be9438ed43	🌐 [i18n-KO] Translated `llama2.md` to Korean (#26047 ) * docs: ko-llama2.md * feat: chatGPT draft and manual edits * feat: added inline TOC * fix: inline TOC * fix: resolve suggestions Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com> * fix: resolve suggestion Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com> * fix: resolve suggestion Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com> --------- Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com>	2023-09-12 08:04:26 -07:00
pokjay	6acc27eea8	Fix ExponentialDecayLengthPenalty negative logits issue (#25594 ) * Fix issues in test_exponential_decay_length_penalty Fix tests which were broken and add validation of negative scores. Current test didn't take into account that ExponentialDecayLengthPenalty updates the score inplace, resulting in updates to base tested Tensor. In addition, the gt assert had empty Tensors due to indexing along the batch dimension. Test is currently expected to fail to show ExponentialDecayLengthPenalty issues with negative scores * Fix ExponentialDecayLengthPenalty negative logits issue In cases where the scores are negative, ExponentialDecayLengthPenalty decreases the score of eos_token_id instead of increasing it. To fix this issue we compute the penalty of the absolute value and add it to the original score. * Add examples for ExponentialDecayLengthPenalty * Fix styling issue in ExponentialDecayLengthPenalty doc * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Style and quality fix * Fix example outputs --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2023-09-12 12:50:41 +01:00
larekrow	d65c4a4fed	Update logits_process.py docstrings (#25971 )	2023-09-12 12:36:31 +01:00
Joao Gante	3319eb5490	Generate: legacy mode is only triggered when `generation_config` is untouched (#25962 )	2023-09-12 12:08:17 +01:00
Younes Belkada	18abc756c5	[`core`] Import tensorflow inside relevant methods in `trainer_utils` (#26106 ) import tensorflow inside relevant methods in trainer_utils	2023-09-12 11:49:06 +02:00
Arthur	9cccb3a838	[`Persimmon`] Add support for persimmon (#26042 ) * initial commit * updates * nits * update conversion script * update conversion script * use path to load * add tips etc * some modeling logic * modeling update * more nits * nits * normal layer norm * update config and doc * nits * update doc remove unused * update * fix inits and stuff * fixup * revert wrong changes * updates * more nits * add default config values to the configuration file * fixup happy * update * 2 tests left * update readmes * more nits * slow test and more documentation * update readme * fix licences * styling * use fast if possible when saving tokenizer * remove todo * remove tokenization tests * small last nits * Apply suggestions from code review Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> * nits to skip the timeout doctest * fix integration test * fix test * update eos token * update to allow fast tokenization * styling * fix codeLlama as well for the update post processor * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * add more copied from statements * update * doc passes doctest * remove `# final layer norm?` * change docstring prompt * update * Update README.md Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * don't doctest the conversion script as it requires more packages * don't init a model in the config * oups * fix doctest --------- Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2023-09-12 11:33:27 +02:00
Phuc Van Phan	5af2c62696	docs: add space to docs (#26067 ) * docs: add space to docs * docs: remove redundant space	2023-09-11 22:03:26 +01:00
Patrick von Platen	ce2e7ef3d9	[Core] Add lazy import structure to imports (#26090 ) * improve import time * Update src/transformers/integrations/__init__.py * sort import	2023-09-11 17:20:29 +02:00
Phuc Van Phan	9cebae64ad	docs: update link huggingface map (#26077 )	2023-09-11 12:57:04 +01:00
Hang	7fd2d68613	only main process should call _save on deepspeed zero3 (#25959 ) only main process should call _save when deepspeed zero3	2023-09-11 12:56:36 +01:00
Arthur	95b374952d	[`CITests`] skip failing tests until #26054 is merged (#26063 ) * skip failing tests until #26054 is merged * fixup	2023-09-09 05:43:26 +02:00
Arthur	09b2de6eb7	[`CodeLlamaTokenizerFast`] Fix `set_infilling_processor` to properly reset (#26041 ) * fix `set_infilling_processor` to properly reset * Add docstring! * fixups * more details in the documentation about the tokenization * style	2023-09-08 22:03:09 +02:00
Harheem Kim	d53606031f	🌐 [i18n-KO] Translated `llama.md` to Korean (#26044 ) * docs: ko-llama.md * fix: chatgpt draft * feat: manual edits * fix: resolve suggestions	2023-09-08 12:38:41 -07:00
Angela Yi	6c26faa159	Skip warning if tracing with dynamo (#25581 ) * Ignore warning if tracing with dynamo * fix import error * separate to function * add test	2023-09-08 21:13:33 +02:00
Thien Tran	18ee1fe762	Update missing docs on `activation_dropout` and fix DropOut docs for SEW-D (#26031 ) * add missing doc for activation dropout * fix doc for SEW-D dropout * deprecate hidden_dropout for SEW-D	2023-09-08 14:51:54 +01:00
Alexander Krauck	0c67a72c9a	Fix Dropout Implementation in Graphormer (#24817 ) This commit corrects the dropout implementation in Graphormer, aligning it with the original implementation and improving performance. Specifically: 1. The `attention_dropout` variable, intended for use in GraphormerMultiheadAttention, was defined but not used. This has been corrected to use `attention_dropout` instead of the regular `dropout`. 2. The `activation_dropout` for the activations in the feed-forward layers was missing. Instead, the regular `dropout` was used. This commit adds `activation_dropout` to the feed-forward layers. These changes ensure the dropout implementation matches the original Graphormer and delivers empirically better performance.	2023-09-08 12:49:39 +01:00
dumpmemory	fb7d246951	Try to fix training Loss inconsistent after resume from old checkpoint (#25872 ) * fix loss inconsistent after resume #25340 * fix typo * clean code * reformatted code * adjust code according to comments * adjust check_dataloader_randomsampler location * return sampler only * handle sampler is None * Update src/transformers/trainer_pt_utils.py thanks @amyeroberts Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2023-09-07 20:00:22 +01:00
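
Note on 6acc27eea8 (#25594): the commit message says that for negative scores the old ExponentialDecayLengthPenalty lowered the `eos_token_id` score instead of raising it, and that the fix computes the penalty from the absolute value and adds it to the original score. The sketch below illustrates that idea on a single scalar score in plain Python. It is not the library's code: the function names `multiplicative_penalty` and `additive_abs_penalty`, and the parameters `factor` and `steps`, are hypothetical names chosen for illustration.

```python
def multiplicative_penalty(score: float, factor: float, steps: int) -> float:
    """Scale the EOS score directly by factor**steps.

    Assumed form of the pre-fix behaviour: with factor > 1 a negative
    score is pushed further down, which is the problem the commit describes.
    """
    return score * factor ** steps


def additive_abs_penalty(score: float, factor: float, steps: int) -> float:
    """Add a penalty computed from |score| to the original score.

    Sketch of the fix as worded in the commit message: the added term is
    non-negative for factor >= 1, so the EOS score never decreases.
    """
    return score + abs(score) * (factor ** steps - 1)


if __name__ == "__main__":
    # Compare both variants on a positive and a negative EOS score.
    for eos_score in (2.0, -2.0):
        old = multiplicative_penalty(eos_score, factor=1.5, steps=2)
        new = additive_abs_penalty(eos_score, factor=1.5, steps=2)
        print(f"score={eos_score:+.1f} multiplicative={old:+.2f} additive={new:+.2f}")
```

For a positive score the two variants agree (2.0 becomes 4.5 with `factor=1.5`, `steps=2`). For a score of -2.0 the multiplicative form gives -4.5, while the additive form gives 0.5, so only the latter raises the end-of-sequence score as intended.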


13982 Commits