* First commit while I figure this out
* make fixup
* Remove unused method
* Store prompt attrib
* Fix prompt argument for tests
* Make same changes in fast tokenizer
* Remove global prompts from fast tokenizer too
* stash commit
* stash commit
* Migrate PromptConfig to its True Final Location
* Replace Conversation entirely with the new class
* Import/dependency fixes
* Import/dependency fixes
* Change format for lots of default prompts
* More default prompt fixups
* Revert llama old methods so we can compare
* Fix some default configs
* Fix some default configs
* Fix misspelled kwarg
* Fixes for Blenderbot
* make fixup
* little rebase cleanup
* Add basic documentation
* Quick doc fix
* Truncate docstring for now
* Add handling for the case when messages is a single string
* Quick llama merges
* Update conversational pipeline and tests
* Add a couple of legacy properties for backward compatibility
* More legacy handling
* Add docstring for build_conversation_input_ids
* Restructure PromptConfig
* Let's start T E M P L A T I N G
* Refactor all default configs to use templates instead
* Revert changes to the special token properties since we don't need them anymore
* More class templates
* Make the sandbox even sandier
* Everything replaced with pure templating
* Remove docs for PromptConfig
* Add testing and optional requirement boilerplate
* Fix imports and make fixup
* Fix LLaMA tests and add Conversation docstring
* Finally get LLaMA working with the template system
* Finally get LLaMA working with the template system
* make fixup
* make fixup
* fmt-off for the long lists of test tokens
* Rename method to apply_chat_template for now
* Start on documentation
* Make chat_template a property that reads through to the default if it's not set
* Expand docs
* Expand chat templating doc some more
* trim/lstrip blocks by default and update doc
* Few doc tweaks
* rebase cleanup
* Clarify docstring
* rebase cleanup
* rebase cleanup
* make fixup
* Quick doc edit
* Reformat the standard template to match ChatML
* Re-add PEFT check
* Update docs/source/en/chat_templating.md
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Add apply_chat_template to the tokenizer doc
* make fixup
* Add doc links
* Fix chat links
* Fix chat links
* Explain system messages in the doc
* Add chat template test
* Proper save-loading for chat template attribute
* Add test skips for layout models
* Remove _build_conversation_input_ids, add default_chat_template to code_llama
* Make sure all LLaMA models are using the latest template
* Remove default_system_prompt block in code_llama because it has no default prompt
* Update ConversationPipeline preprocess
* Add correct #Copied from links to the default_chat_templates
* Remove unneeded type checking line
* Add a dummy mark_processsed method
* Reorganize Conversation to have **deprecated_kwargs
* Update chat_templating.md
* Quick fix to LLAMA tests
* Small doc tweaks
* Add proper docstrings and "copied from" statements to all default chat templates
* Merge use_default_system_prompt support for code_llama too
* Improve clarity around self.chat_template
* Docstring fix
* Fix blenderbot default template
* More doctest fix
* Break out some tokenizer kwargs
* Update doc to explain default templates
* Quick tweaks to tokenizer args
* Cleanups for tokenizer args
* Add note about cacheing
* Quick tweak to the chat-templating doc
* Update the LLaMA template with error checking and correct system message embedding
* make fixup
* make fixup
* add requires_jinja
* Cleanup to expected output formatting
* Add cacheing
* Fix typo in llama default template
* Update LLaMA tests
* Update documentation
* Improved legacy handling in the Conversation class
* Update Jinja template with proper error handling
* Quick bugfix
* Proper exception raising
* Change cacheing behaviour so it doesn't try to pickle an entire Jinja env
* make fixup
* rebase cleanup
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* [Whisper Tokenizer] Fix tests after adding timestamps
* fix s2t tokenizer tests
* fix vocab test
* backwards comp
* fix tests
* comment
* style
* fix last test
* fix fast
* make faster
* move logic to decode
* remove skip test
* fix decode with offsets
* fix special tokens
* empty commit to re-trigger ci
* use lru cache
* Add @dataclass to MaskFormerPixelDecoderOutput
* Add dataclass check if subclass of ModelOutout
* Use unittest assertRaises rather than pytest per contribution doc
* Update src/transformers/utils/generic.py per suggested change
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add: check to remove metaspace from marian tokenizer
* fix: metaspace character being removed from everywhere
* fix: remove redundant check at top
* add: test for marian tokenizer decode fix
* fix: simplified the test
* enable optuna multi-objectives feature
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* update hpo doc
* update docstring
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* extend direction to List[str] type
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* Update src/transformers/integrations/integration_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Fix issues in test_exponential_decay_length_penalty
Fix tests which were broken and add validation of negative scores.
Current test didn't take into account that ExponentialDecayLengthPenalty updates the score inplace, resulting in updates to base tested Tensor.
In addition, the gt assert had empty Tensors due to indexing along the batch dimension.
Test is currently expected to fail to show ExponentialDecayLengthPenalty issues with negative scores
* Fix ExponentialDecayLengthPenalty negative logits issue
In cases where the scores are negative, ExponentialDecayLengthPenalty decreases the score of eos_token_id instead of increasing it.
To fix this issue we compute the penalty of the absolute value and add it to the original score.
* Add examples for ExponentialDecayLengthPenalty
* Fix styling issue in ExponentialDecayLengthPenalty doc
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Style and quality fix
* Fix example outputs
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* intiial commit
* updates
* nits
* update conversion script
* update conversion script
* use path to load
* add tips etc
* some modeling logic
* modeling update
* more nits
* nits
* normal layer norm
* update config and doc
* nits
* update doc remove unused
* update
* fix inits and stuff
* fixup
* revert wrong changes
* updates
* more nits
* add default config values to the configuration file
* fixup happy
* update
* 2 tests left
* update readmes
* more nits
* slow test and more documentation
* update readme
* fix licences
* styling
* use fast if possible when saving tokenizer
* remove todo
* remove tokenization tests
* small last nits
* Apply suggestions from code review
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* nits to skip the timout doctest
* fix integration test
* fix test
* update eos token
* update to allow fast tokenization
* styling
* fix codeLlama as well for the update post processor
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add more copied from statements
* update
* doc passes doctest
* remove `# final layer norm?`
* change docstring prompot
* update
* Update README.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* don't doctest the conversion script as it requires more packages
* don't init a model in the config
* oups
* fix doctest
---------
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
This commit corrects the dropout implementation in Graphormer, aligning it with the original implementation and improving performance. Specifically:
1. The `attention_dropout` variable, intended for use in GraphormerMultiheadAttention, was defined but not used. This has been corrected to use `attention_dropout` instead of the regular `dropout`.
2. The `activation_dropout` for the activations in the feed-forward layers was missing. Instead, the regular `dropout` was used. This commit adds `activation_dropout` to the feed-forward layers.
These changes ensure the dropout implementation matches the original Graphormer and delivers empirically better performance.
* Added HerBERT to README.md
* Update README.md to contain HerBERT (#26016)
* Resolved#26016: Updated READMEs and index.md to contain Herbert
Updated READMEs and ran make fix-copies