* Add FlaxClipTextModelWithProjection
This is necessary to support the Flax port of Stable Diffusion XL: fb6d705fb5/text_encoder_2/config.json (L3)
Co-authored-by: Martin Müller <martin.muller.me@gmail.com>
Co-authored-by: Juan Acevedo <juancevedo@gmail.com>
* Use FlaxCLIPTextModelOutput
* make fix-copies again
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Use `return_dict` for consistency with other uses.
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Fix docstring example.
* Add new model to FlaxCLIPTextModelTest
* Add to IGNORE_NON_AUTO_CONFIGURED list
* Fix naming convention.
---------
Co-authored-by: Martin Müller <martin.muller.me@gmail.com>
Co-authored-by: Juan Acevedo <juancevedo@gmail.com>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* properly support Sequence of pretokenizers
* actual fix
* make sure the fix works. Tests are not working for sure!
* hacky way
* add TODO
* update
* add a todo
* nits
* rename test
* nits
* rename test
* add: NumberNormalizer works for integers, floats, common currencies, negative numbers and percentages
* fix: renamed number normalizer class and added normalization to SpeechT5Processor
* fix: restyled with black and ruff, should pass code quality tests
* fix: moved normalization to tokenizer and other small changes to normalizer
* add: test for normalization and changed the existing full tokenizer test
* fix: tokenization tests now pass, made changes to existing tokenization where normalization is covered; added normalize arg to func signature
* fix: changed default normalize setting to False, modified the tests a bit
* fix: added support for comma separated numbers, tokenization on the fly with kwargs and normalizer getter setter funcs
* init commit
* config updated also some modeling
* Processor and Model config combined
* extraction pipeline(upto before spectogram & mel_conditioner) added but not properly tested
* model loading successful!
* feature extractor done!
* FE can now be called from HF
* postprocessing added in fe file
* same as prev commit
* Pop2PianoConfig doc done
* cfg docs slightly changed
* fe docs done
* batched
* batched working!
* temp
* v1
* checking
* trying to go with generate
* with generate and model tests passed
* before rebasing
* .
* tests done docs done remaining others & nits
* nits
* LogMelSpectogram shifted to FeatureExtractor
* is_tf rmeoved from pop2piano/init
* import solved
* tokenization tests added
* minor fixed regarding modeling_pop2piano
* tokenizer changed to only return midi_object and other changes
* Updated paper abstract(Camera-ready version) (#2)
* more comments and nits
* ruff changes
* code quality fix
* sg comments
* t5 change added and rebased
* comments except batching
* batching done
* comments
* small doc fix
* example removed from modeling
* ckpt
* forward it compatible with fe and generation done
* comments
* comments
* code-quality fix(maybe)
* ckpts changed
* doc file changed from mdx to md
* test fixes
* tokenizer test fix
* changes
* nits done main changes remaining
* code modified
* Pop2PianoProcessor added with tests
* other comments
* added Pop2PianoProcessor to dummy_objects
* added require_onnx to modeling file
* changes
* update .md file
* remove extra line in index.md
* back to the main index
* added pop2piano to index
* Added tokenizer.__call__ with valid args and batch_decode and aligned the processor part too
* changes
* added return types to 2 tokenizer methods
* the PR build test might work now
* added backends
* PR build fix
* vocab added
* comments
* refactored vocab into 1 file
* added conversion script
* comments
* essentia version changed in .md
* comments
* more tokenizer tests added
* minor fix
* tests extended for outputs acc check
* small fix
---------
Co-authored-by: Jongho Choi <sweetcocoa@snu.ac.kr>
* a draft version
* v2 integration
* fix
* make it more generic and works for IA3
* add set adapter and multiple adapters support
* fixup
* adapt a bit
* oops
* oops
* oops
* adapt more
* fix
* add more refactor
* now works with model class
* change it to instance method as it causes issues with `jit`.
* add CR
* change method name
* add `add_adapter` method
* clean up
* Update src/transformers/adapters/peft_mixin.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add moe utils
* fixup
* Update src/transformers/adapters/peft_mixin.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* adapt
* oops
* fixup
* add is_peft_available
* remove `requires_backend`
* trainer compatibility
* fixup + docstring
* more details
* trigger CI
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/modeling_utils.py
* fixup + is_main_process
* added `save_peft_format` in save_pretrained
* up
* fix nits here and there
* nits here and there.
* docs
* revert `encoding="utf-8"`
* comment
* added slow tests before the PEFT release.
* fixup and nits
* let's be on the safe zone
* added more comments
* v1 docs
* add remaining docs
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* move to `lib_integrations`
* fixup
* this time fixup
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* address final comments
* refactor to use `token`
* add PEFT to DockerFile for slow tests.
* added pipeline support.
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* draft changes
* update and add tests
* styling for no
* move test
* path to usable model
* update test
* small update
* update bertbased tokenizers
* don'tuse kwargs for _tokenize
* don'tuse kwargs for _tokenize
* fix copies
* update
* update test for special tokenizers
* fixup
* skip two tests
* remove pdb breakpiont()
* wowo
* rewrite custom tests
* nits
* revert chang in target keys
* fix markup lm
* update documentation of the argument
* Replaces calls to `.cuda` with `.to(torch_device)` in tests
`torch.Tensor.cuda()` is a pre-0.4 solution to changing a tensor's device. It is recommended to prefer `.to(...)` for greater flexibility and error handling. Furthermore, this makes it more consistent with other tests (that tend to use `.to(torch_device)`) and ensures the correct device backend is used (if `torch_device` is neither `cpu` or `cuda`).
* addressing review comments
* more formatting changes in Bloom test
* `make style`
* Update tests/models/bloom/test_modeling_bloom.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fixes style failures
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add AutoModelForTextToSpeech class
* add TTS pipeline and tessting
* add docstrings to text_to_speech pipeline
* fix torch dependency
* corrector 'processor is None' case in Pipeline
* correct repo id
* modify text-to-speech -> text-to-audio
* remove processor
* rename text_to_speech pipelines files to text_audio
* add textToWaveform and textToSpectrogram instead of textToAudio classes
* update TTS pipeline to the bare minimum
* update tests TTS pipeline
* make style and erase useless import torch in TTS pipeline tests
* modify how to check if generate or forward in TTS pipeline
* remove unnecessary extra new lines
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* refactor input_texts -> text_inputs
* correct docstrings of TTS.__call__
* correct the shape of generated waveform
* take care of Bark tokenizer special case
* correct run_pipeline_test TTS
* make style
* update TTS docstrings
* address Sylvain nit refactors
* make style
* refactor into one liners
* correct squeeze
* correct way to test if forward or generate
* Update output audio waveform shape
* make style
* correct import
* modify how the TTS pipeline test if a model can generate
* align shape output of TTS pipeline with consistent shape
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* fix EVERYTHING
* more fixes
* ⚗️⚗️ Tokenizer magic ⚗️⚗️
* wrong value but test passes for the TODO
* update
* updat
* safe protobuf import?
* style
* non gated repo
* update
* fixup
* Update src/transformers/models/llama/tokenization_llama.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/llama/tokenization_llama.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/t5/test_tokenization_t5.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* nits
* fix t5 too
* use assert equal
* fix llama decoding
* nits on t5
* fixup
* only remove the prefix space, not other spaces
* more deconding tests and more todos
* fix CI as well
* fixup
* skip failing test on CI (its tf its ok)
* skip test_subword_regularization_tokenizer that is also crashing on the CI for TF
* update llama
* revert good fixes
* fixup
* empty
* explain why we need to encode with an additional token
* better warning?
* nits
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix
* revert cahnges and update resizing of embedding layer
* use wraning
* fixup
* more styling nits
* fix all tests that overload the embedding tests
* 👀👀 remove breakpoint
* remove useless overload + overload correctly where needed
* resize lm head with new vocab size
* reverse not necessary changes
* style
* fix CIs!
* fix last CI tests, adapt bark and Marian
* fixup
* [ASR Pipeline] Fix init
* refactor test
* change default kwarg setting
* only perform checks if we have to
* override init
* move pre/forward/post checks to sanitize
* Add copied from statements for image processors
* Move out rescale and normalize to base image processor
* Remove rescale and normalize from vit (post rebase)
* Update docstrings and tidy up
* PR comments
* Add input_data_format as preprocess argument
* Resolve tests and tidy up
* Remove num_channels argument
* Update doc strings -> default ints not in code formatting
* Make training args fully immutable
* Working tests, PyTorch
* In test_trainer
* during testing
* Use proper dataclass way
* Fix test
* Another one
* Fix tf
* Lingering slow
* Exception
* Clean
* Refactor image processor test mixin
- Move test_call_numpy, test_call_pytorch, test_call_pil to mixin
- Rename mixin to reflect handling of logic more than saving
- Add prepare_image_inputs, expected_image_outputs for tests
* Fix for oneformer
* Register ModelOutput subclasses as supported torch.utils._pytree nodes
Fixes#25357 where DDP with static_graph=True does not sync gradients when calling backward() over tensors contained in ModelOutput subclasses
* Add test for torch pytree ModelOutput serialization and deserialization
* Deal better with nested configs
* Fixes
* More fixes
* Fix last test
* Clean up existing configs
* Remove hack in MPT Config
* Update src/transformers/configuration_utils.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Fix setting a nested config via dict in the kwargs
* Adapt common test
* Add test for nested config load with dict
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update InstructBLIP values
Note: the tests are not independent. Running the test independentely produces different logits compared to running all the integration tests
* Update test values after rescale update
* Remove left over commented out code
* Revert to previous rescaling logic
* Update rescale tests
* Fix rescaling bug
* Add tests
* Update integration tests
* Fix up
* Update src/transformers/image_transforms.py
* Update test - new possible order in list
* Initial addition of t5forsequenceclassification
* Adding imports and adding tests
* Formatting
* Running make fix-copies
* Adding mt5forseq
* Formatting
* run make fix-copies
* Adding to docs
* Add model_parallel
* Fix bug
* Fix
* Remove TODO
* Fixing tests for T5ForSequenceClassification
* Undo changes to dependency_versions_table.py
* Change classification head to work with T5Config directly
* Change seq length to let tests pass
* PR comments for formatting
* Formatting
* Initial addition of UMT5ForSequenceClassification
* Adding to inits and formatting
* run make fix-copies
* Add doc for UMT5ForSeqClass
* Update UMT5 config
* Fix docs
* Skip torch fx test for SequenceClassification
* Formatting
* Add skip to UMT5 tests as well
* Fix umt5 tests
* Running make fix-copies
* PR comments
* Fix for change to sentence_representation
* Rename seq_len to hidden_size since that's what it is
* Use base_model to follow format of the rest of the library
* Update docs
* Extract the decoder_input_ids changes and make one liner
* Make one-liner
* pull and push updates
* add docs
* fix modeling
* Add and run test
* make copies
* add task
* fix tests and fix small issues
* Checks on a Pull Request
* fix docs
* add desc pvt.md
* Resolve typo in check_repo.py
* Specify encoding when opening modeling files
* Deprecate the OpenLlama architecture
* Add disclaimer pointing to Llama
I'm open to different wordings here
* Match the capitalisation of LLaMA
* add llama
* add other readmes
* update padding id in readme
* add link to paper
* fix paths and tokenizer
* more nits
* styling
* fit operation in 2 lines when possible
* nits
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add form
* update reademe
* update readme, we don't have a default pad token
* update test and tokenization
* LLaMA instead of Llama
* nits
* add expected text
* add greeedy output
* styling
* Update src/transformers/models/llama/modeling_llama.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* sequential device map
* skip relevant changes
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* first raw version of the bark integration
* working code on small models with single run
* add converting script from suno weights 2 hf
* many changes
* correct past_kv output
* working implementation for inference
* update the converting script according to the architecture changes
* add a working end-to-end inference code
* remove some comments and make small changes
* remove unecessary comment
* add docstrings and ensure no unecessary intermediary output during audio generation
* remove done TODOs
* make style + add config docstrings
* modification for batch inference support on the whole model
* add details to .generation_audio method
* add copyright
* convert EncodecModel from original library to transformers implementation
* add two class in order to facilitate model and sub-models loading from the hub
* add support of loading the whole model
* add BarkProcessor
* correct modeling according to processor output
* Add proper __init__ and auto support
* Add up-to-date copyright/license message
* add relative import instead of absolute
* cleaner head_dim computation
* small comment removal or changes
* more verbose LayerNorm init method
* specify eps for clearer comprehension
* more verbose variable naming in the MLP module
* remove unecessary BarkBlock parameter
* clearer code in the forward pass of the BarkBlock
* remove _initialize_modules method for cleaner code
* Remove unnecessary methods from sub-models
* move code to remove unnecessary function
* rename a variable for clarity and change an assert
* move code and change variable name for clarity
* remove unnecessary asserts
* correct small bug
* correct a comment
* change variable names for clarity
* remove asserts
* change import from absolute to relative
* correct small error due to comma missing + correct import
* Add attribute Bark config
* add first version of tests
* update attention_map
* add tie_weights and resize_token_embeddings for fineModel
* correct getting attention_mask in generate_text_semantic
* remove Bark inference trick
* leave more choices in barkProcessor
* remove _no_split_modules
* fixe error in forward of block and introduce clearer notations
* correct converting script with last changes
* make style + add draft bark.mdx
* correct BarkModelTest::test_generate_text_semantic
* add Bark in main README
* add dummy_pt_objects for Bark
* add missing models in the main init
* correct test_decoder_model_past_with_large_inputs
* disable torchscript test
* change docstring of BarkProcessor
* Add test_processor_bark
* make style
* correct copyrights
* add bark.mdx + make style, quality and consistency
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Remove unnecessary test method
* simply logic of a test
* Only check first ids for slow audio generation
* split full end-to-end generation tests
* remove unneccessary comment
* change submodel names for clearer naming
* remove ModuleDict from modeling_bark
* combine two if statements
* ensure that an edge misued won't happen
* modify variable name
* move code snippet to the right place (coarse instead of semantic)
* change BarkSemanticModule -> BarkSemanticModel
* align BarkProcessor with transformers paradigm
* correct BarkProcessor tests with last commit changes
* change _validate_voice_preset to an instance method instead of a class method
* tie_weights already called with post_init
* add codec_model config to configuration
* update bark modeling tests with recent BarkProcessor changes
* remove SubModelPretrainedModel + change speakers embeddings prompt type in BarkModel
* change absolute imports to relative
* remove TODO
* change docstrings
* add examples to docs and docstrings
* make style
* uses BatchFeature in BarkProcessor insteads of dict
* continue improving docstrings and docs + make style
* correct docstrings examples
* more comprehensible speaker_embeddings load/Save
* rename speaker_embeddings_dict -> speaker_embeddings
* correct bark.mdx + add bark to documentation_tests
* correct docstrings configuration_bark
* integrate last nit suggestions
* integrate BarkGeneration configs
* make style
* remove bark tests from documentation_tests.txt because timeout - tested manually
* add proper generation config initialization
* small bark.mdx documentation changes
* rename bark.mdx -> bark.md
* add torch.no_grad behind BarkModel.generate_audio()
* replace assert by ValueError in convert_suno_to_hf.py
* integrate a series of short comments from reviewer
* move SemanticLogitsProcessors and remove .detach() from Bark docs and docstrings
* actually remove SemanticLogitsProcessor from modeling_bark.oy
* BarkProcessor returns a single output instead of tuple + correct docstrings
* make style + correct bug
* add initializer_range to BarkConfig + correct slow modeling tests
* add .clone() to history_prompt.coarse_prompt to avoid modifying input array
* Making sure no extra "`" are present
* remove extra characters in modeling_bark.py
* Correct output if history_prompt is None
* remove TODOs
* remove ravel comment
* completing generation_configuration_bark.py docstrings
* change docstrings - number of audio codebooks instead of Encodec codebooks
* change 'bias' docstrings in configuration_bark.py
* format code
* rename BarkModel.generate_audio -> BarkModel.generate_speech
* modify AutoConfig instead of EncodecConfig in BarkConfig
* correct AutoConfig wrong init
* refactor BarkModel and sub-models generate_coarse, generate_fine, generate_text_semantic
* remove SemanticLogitsProcessor and replace it with SuppressTokensLogitsProcessor
* move nb_codebook related config arguments to BarkFineConfig
* rename bark.mdx -> bark.md
* correcting BarkModelConfig from_pretrained + remove keys_to_ignore
* correct bark.md with correct hub path
* correct code bug in bark.md
* correct list tokens_to_suppress
* modify Processor to load nested speaker embeddings in a safer way
* correct batch sampling in BarkFineModel.generate_fine
* Apply suggestions from code review
Small docstrings correction and code improvements
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* give more details about num_layers in docstrings
* correct indentation mistake
* correct submodelconfig order of docstring variables
* put audio models in alphabetical order in utils/check_repo.my
* remove useless line from test_modeling_bark.py
* makes BarkCoarseModelTest inherits from (ModelTesterMixin, GenerationTesterMixin, unittest.TestCase) instead of BarkSemanticModelTest
* make a Tester class for each sub-model instead of inheriting
* add test_resize_embeddings=True for Bark sub-models
* add Copied from transformers.models.gpt_neo.modeling_gpt_neo.GPTNeoSelfAttention._split_heads
* remove 'Copied fom Bark' comment
* remove unneccessary comment
* change np.min -> min in modeling_bark.py
* refactored all custom layers to have Bark prefix
* add attention_mask as an argument of generate_text_semantic
* refactor sub-models start docstrings to have more precise config class definition
* move _tied_weights_keys overriding
* add docstrings to generate_xxx in modeling_bark.py
* add loading whole BarkModel to convert_suno_to_hf
* refactor attribute and variable names
* make style convert_suno
* update bark checkpoints
* remove never entered if statement
* move bark_modeling docstrings after BarkPretrainedModel class definition
* refactor modeling_bark.py: kv -> key_values
* small nits - code refactoring and removing unecessary lines from _init_weights
* nits - replace inplace method by variable assigning
* remove *optional* when necessary
* remove some lines in generate_speech
* add default value for optional parameter
* Refactor preprocess_histories_before_coarse -> preprocess_histories
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* correct usage after refactoring
* refactor Bark's generate_xxx -> generate and modify docstrings and tests accordingly
* update docstrings python in configuration_bark.py
* add bark files in utils/documentation_test.txt
* correct docstrings python snippet
* add the ability to use parameters in the form of e.g coarse_temperature
* add semantic_max_new_tokens in python snippet in docstrings for quicker generation
* Reformate sub-models kwargs in BakModel.generate
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* correct kwargs in BarkModel.generate
* correct attention_mask kwarg in BarkModel.generate
* add tests for sub-models args in BarkModel.generate and correct BarkFineModel.test_generate_fp16
* enrich BarkModel.generate docstrings with a description of how to use the kwargs
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* dim, and rm copy
* Don't rm copy for now
* Oops
* pad index
* Should be a working test
* Tickle down ddp timeout
* Put fix back in now that testing locally is done
* Better comment specifying timeout
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix: Apostraphe splitting in the BasicTokenizer for CLIPTokenizer
* account for apostrophe at start of new word
* remove _run_split_on_punc, use re.findall instead
* remove debugging, make style and quality
* use pattern and punc splitting, repo-consistency will fail
* remove commented out debugging
* adds bool args to BasicTokenizer, remove pattern
* do_split_on_punc default True
* clean stray comments and line breaks
* rebase, repo-consistency
* update to just do punctuation split
* add unicode normalizing back
* remove redundant line
* Initial commit
* Update src/transformers/models/falcon/configuration_falcon.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/falcon/configuration_falcon.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Cleanup config docstring
* Update src/transformers/models/falcon/configuration_falcon.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Convert to relative imports
* Remove torch < 1.8 warning
* Restructure cos_sin header
* qkv -> query, key, value
* Refactor attention calculation
* Add a couple of config variables to account for the different checkpoints
* Successful merging of the code paths!
* Fix misplaced line in the non-parallel attention path
* Update config and tests
* Add a pad_token_id when testing
* Support output_attentions when alibi is None
* make fixup
* Skip KV cache shape test
* No more _keys_to_ignore_on_load_missing
* Simplify self attention a bit
* Simplify self attention a bit
* make fixup
* stash commit
* Some more attention mask updates
* Should pass all tests except assisted generation!
* Add big model generation test
* make fixup
* Add temporary workaround for test
* Test overrides for assisted generation
* Update src/transformers/models/falcon/modeling_falcon.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/falcon/modeling_falcon.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/falcon/modeling_falcon.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update tests/models/falcon/test_modeling_falcon.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Test overrides for assisted generation
* Add generation demo
* Update copyright
* Make the docstring model actually small
* Add module-level docstring
* Remove all assertions
* Add copied from bloom
* Reformat the QKV layer
* Add copied from bloom
* Update src/transformers/models/falcon/modeling_falcon.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Remove unused line and reformat
* No single letter variables
* Cleanup return names
* Add copied from line
* Remove the deprecated arguments blocks
* Change the embeddings test to an alibi on/off test
* Remove position_ids from FalconForQA
* Remove old check for token type IDs
* Fix the alibi path when multi_query is False
* Update src/transformers/models/falcon/modeling_falcon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/falcon/modeling_falcon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/falcon/test_modeling_falcon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update config naming
* Fix typo for new_decoder_architecture
* Add some comments
* Fix docstring
* Fix docstring
* Create range in the right dtype from the start
* Review comment cleanup
* n_head_kv -> num_kv_heads
* self.alibi -> self.use_alibi
* self.num_kv -> self.num_kv_heads
* Reorder config args
* Made alibi arguments Optional
* Add all model docstrings
* Add extra checkpoints
* Add author info for Falcon
* Stop removing token_type_ids because our checkpoints shouldn't return it anymore
* Add one hopeful comment for the future
* Fix typo
* Update tests, fix cache issue for generation
* Use -1e9 instead of -inf to avoid float overflow
* Recompute the rotary embeddings much less often
* Re-enable disabled tests
* One final fix to attention mask calculation, and update tests
* Cleanup targeting falcon-40b equivalency
* Post-rebase docs update
* Update docstrings, especially in the config
* More descriptive variable names, and comments where we can't rename them
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* hidden layers, huh, what are they good for (absolutely nothing)
* Some tests break with 1 hidden layer, use 2
* Use 1 hidden layer in a few slow models
* Use num_hidden_layers=2 everywhere
* Slightly higher tol for groupvit
* Slightly higher tol for groupvit
* Adding warning messages to BERT for missing attention masks
These warning messages when there are pad tokens within the input ids and
no attention masks are given. The warning message should only show up once.
* Adding warning messages to BERT for missing attention masks
These warning messages are shown when the pad_token_id is not None
and no attention masks are given. The warning message should only
show up once.
* Ran fix copies to copy over the changes to some of the other models
* Add logger.warning_once.cache_clear() to the test
* Shows warning when there are no attention masks and input_ids start/end with pad tokens
* Using warning_once() instead and fix indexing in input_ids check
---------
Co-authored-by: JB Lau <hckyn@voyager2.local>
* don't add space before single letter chars that don't have a merge
* fix the fix
* fixup
* add a test
* more testing
* fixup
* hack to make sure fast is also fixed
* update switch transformers test
* revert convert slow
* Update src/transformers/models/t5/tokenization_t5.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add typechecking
* quality
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Preliminary work on some models
* Fix test load missing and make sure nonpersistent buffers are tested
* Always ignore nonpersistent buffers if in state_dict
* Treat models
* More models
* Treat remaining models
* Fix quality
* Fix tests
* Remove draft
* This test is not needed anymore
* Fix copies
* Fix last test
* Newly added models
* Fix last tests
* Address review comments
* Fix TypeError: Object of type int64 is not JSON serializable
* Convert numpy.float64 and numpy.int64 to float and int for json serialization
* Black reformatted examples/pytorch/token-classification/run_ner_no_trainer.py
* * make style
* Squash 88 commits
* Use markdown
* Remove mdx files due to bad rebase
* Fix modeling files due to bad rebase
* Fix style
* Update comment
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Allow dict input for audio classification pipeline
* make style
* Empty commit to trigger CI
* Empty commit to trigger CI
* check for torchaudio
* add pip instructions
Co-authored-by: Sylvain <sylvain.gugger@gmail.com>
* Update src/transformers/pipelines/audio_classification.py
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
* asr -> audio class
* asr -> audio class
---------
Co-authored-by: Sylvain <sylvain.gugger@gmail.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
* Replace python random with torch.rand to enable dynamo.export
* revert changes to flax model code
* Remove unused random import
* Fix torch template
* Move torch.manual_seed(0) to right location
* Refactor hyperparameter search backends
* Simpler refactoring without abstract base class
* black
* review comments:
specify name in class
use methods instead of callable class attributes
name constant better
* review comments: safer bool checking, log multiple available backends
* test ALL_HYPERPARAMETER_SEARCH_BACKENDS vs HPSearchBackend in unit test, not module. format with black.
* copyright
* let's go!
* initial implementation of token-level timestamps
* only return a single timestamp per token
* remove token probabilities
* fix return type
* fix doc comment
* strip special tokens
* rename
* revert to not stripping special tokens
* only support models that have alignment_heads
* add integration test
* consistently name it token-level timestamps
* small DTW tweak
* initial support for ASR pipeline
* fix pipeline doc comments
* resolve token timestamps in pipeline with chunking
* change warning when no final timestamp is found
* return word-level timestamps
* fixup
* fix bug that skipped final word in each chunk
* fix failing unit tests
* merge punctuations into the words
* also return word tokens
* also return token indices
* add (failing) unit test for combine_tokens_into_words
* make combine_tokens_into_words private
* restore OpenAI's punctuation rules
* add pipeline tests
* make requested changes
* PR review changes
* fix failing pipeline test
* small stuff from PR
* only return words and their timestamps, not segments
* move alignment_heads into generation config
* forgot to set alignment_heads in pipeline tests
* tiny comment fix
* grr
* Fix saved_model_creation_extended
* Skip the BLIP model creation test for now
* Fix TF SAM test
* Fix longformer tests
* Fix Wav2Vec2
* Add a skip for XLNet
* make fixup
* make fix-copies
* Add comments
* Add test for proper input signatures
* No more signature pruning
* Test the dummy inputs are valid too
* fine-tine -> fine-tune
* Fix indent in test_dataset_conversion
* Use tied weight keys
* More
* Fix tied weight missing warning
* Only give info on unexpected keys with different classes
* Deal with empty archs
* Fix tests
* Refine test
* Fix one BLIP arg not being optional, remove misspelled arg
* Remove the lxmert test overrides and just use the base test_saved_model_creation
* saved_model_creation fixes and re-enabling tests across the board
* Remove unnecessary skip
* Stop caching sinusoidal embeddings in speech_to_text
* Fix transfo_xl compilation
* Fix transfo_xl compilation
* Fix the conditionals in xglm
* Set the save spec only when building
* Clarify comment
* Move comment correctly
* Correct embeddings generation for speech2text
* Mark RAG generation tests as @slow
* Remove redundant else:
* Add comment to clarify the save_spec line in build()
* Fix size tests for XGLM at last!
* make fixup
* Remove one band_part operation
* Mark test_keras_fit as @slow
* Revert whisper change and modify the test_compile_tf_model test
* make fixup
* Tweak test slightly
* Add functional model saving to test
* Ensure TF can infer shapes for data2vec
* Add override for efficientformer
* Mark test as slow
* Stop storing references to bound methods in tf.functions
* Remove the gc.collect calls now that we resolved the underlying problem
* Remove the default signature from model.serving entirely, big cleanup
* Remove _prune_signature as self.input_signature can prune itself
* Restore serving docstring
* Update int support test to check the input signature
* Make sure other tests also use model.input_signature and not serving.input_signature
* Restore _prune_signature
* Remove the doctest GC now it's no longer needed
* Correct core tests to use the pruned sig
* order lines correctly in core tests
* Add eager_serving back with a deprecation warning
* First test
* Add info for all models
* style
* Repo consistency
* Fix last model and cleanup prints
* Repo consistency
* Use consistent function for detecting tied weights
* Fix model load when it has both code on the Hub and locally
* Add input check with timeout
* Add tests
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
* Some non-saved stuff
* Add feature extractors
* Add image processor
* Add model
* Add processor and tokenizer
* Reduce timeout
---------
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
* A fun new PR where I break the entire codebase again
* A fun new PR where I break the entire codebase again
* Handle cross-attention
* Move calls to model(model.dummy_inputs) to the new build() method
* Seeing what fails with the build context thing
* make fix-copies
* Let's see what fails with new build methods
* Fix the pytorch crossload build calls
* Fix the overridden build methods in vision_text_dual_encoder
* Make sure all our build methods set self.built or call super().build(), which also sets it
* make fix-copies
* Remove finished TODO
* Tentatively remove unneeded (?) line
* Transpose b in deberta correctly and remove unused threading local
* Get rid of build_with_dummies and all it stands for
* Rollback some changes to TF-PT crossloading
* Correctly call super().build()
* Add test_backbone for convnext
* Add TimmBackbone model
* Add check for backbone type
* Tidying up - config checks
* Update convnextv2
* Tidy up
* Fix indices & clearer comment
* Exceptions for config checks
* Correclty update config for tests
* Safer imports
* Safer safer imports
* Fix where decorators go
* Update import logic and backbone tests
* More import fixes
* Fixup
* Only import all_models if torch available
* Fix kwarg updates in from_pretrained & main rebase
* Tidy up
* Add tests for AutoBackbone
* Tidy up
* Fix import error
* Fix up
* Install nattan in doc_test_job
* Revert back to setting self._out_xxx directly
* Bug fix - out_indices mapping from out_features
* Fix tests
* Dont accept output_loading_info for Timm models
* Set out_xxx and don't remap
* Use smaller checkpoint for test
* Don't remap timm indices - check out_indices based on stage names
* Skip test as it's n/a
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Cleaner imports / spelling is hard
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix for ragged list
* unpin numba
* make style
* np.object -> object
* propagate changes to tokenizer as well
* np.long -> "long"
* revert tokenization changes
* check with tokenization changes
* list/tuple logic
* catch numpy
* catch else case
* clean up
* up
* better check
* trigger ci
* Empty commit to trigger CI
* mixed precision support via accelerate
* fix issues
* fix for the sharded ddp case
* fix flax and tf failing tests
* `refactor the place to create `Accelerator` object
* move ddp prep to accelerate
* fix 😅
* resolving comments
* move fsdp handling to accelerate
* fixex
* fix saving
* shift torch dynamo handling to accelerate
* shift deepspeed integration and save & load utils to accelerate
* fix accelerate launcher support
* oops
* fix 🐛
* save ckpt fix
* Trigger CI
* nasty 🐛😅
* as deepspeed needs grad_acc fixes, transfer grad_acc to accelerate
* make tests happy
* quality ✨
* loss tracked needs to account for grad_acc
* fixing the deepspeed tests
* quality ✨
* 😅😅😅
* tests 😡
* quality ✨
* Trigger CI
* resolve comments and fix the issue with the previous merge from branch
* Trigger CI
* accelerate took over deepspeed integration
---------
Co-authored-by: Stas Bekman <stas@stason.org>
* Add tf code for efficientformer
* Fix return dict bug - return last hidden state after last stage
* Fix corresponding return dict bug
* Override test tol
* Change default values of training to False
* Set training to default False X3
* Rm axis from ln
* Set init in dense projection
* Rm debug stuff
* Make style; all tests pass.
* Modify year to 2023
* Fix attention biases codes
* Update the shape list logic
* Add a batch norm eps config
* Remove extract comments in test files
* Add conditional attn and hidden states return for serving output
* Change channel dim checking logic
* Add exception for withteacher model in training mode
* Revert layer count for now
* Add layer count for conditional layer naming
* Transpose for conv happens only in main layer
* Make tests smaller
* Make style
* Update doc
* Rm from_pt
* Change to actual expect image class label
* Remove stray print in tests
* Update image processor test
* Remove the old serving output logic
* Make style
* Make style
* Complete test
* Let's try autodetecting serving sigs
* Don't clobber existing sigs
* Change shapes for multiplechoice models
* Make default dummy inputs smarter too
* Fix missing f-string
* Let's YOLO a serving output too
* Read __class__.__name__ properly
* Don't just pass naked lists in there and expect it to be okay
* Code cleanup
* Update default serving sig
* Clearer error messages
* Further updates to the default serving output
* make fixup
* Update the serving output a bit more
* Cleanups and renames, raise errors appropriately when we can't infer inputs
* More renames
* we're building in a functional context again, yolo
* import DUMMY_INPUTS from the right place
* import DUMMY_INPUTS from the right place
* Support cross-attention in the dummies
* Support cross-attention in the dummies
* Complete removal of dummy/serving overrides in BERT
* Complete removal of dummy/serving overrides in RoBERTa
* Obliterate lots and lots of serving sig and dummy overrides
* merge type hint changes
* Fix for token_type_ids with vocab_size 1
* Add missing property decorator
* Fix T5 and hopefully some models that take conv inputs
* More signature pruning
* Fix T5's signature
* Fix Wav2Vec2 signature
* Fix LongformerForMultipleChoice input signature
* Fix BLIP and LED
* Better default serving output error handling
* Fix BART dummies
* Fix dummies for cross-attention, esp encoder-decoder models
* Fix visionencoderdecoder signature
* Fix BLIP serving output
* Small tweak to BART dummies
* Cleanup the ugly parameter inspection line that I used in a few places
* committed a breakpoint again
* Move the text_dims check
* Remove blip_text serving_output
* Add decoder_input_ids to the default input sig
* Remove all the manual overrides for encoder-decoder model signatures
* Tweak longformer/led input sigs
* Tweak default serving output
* output.keys() -> output
* make fixup
* Rework TF type hints to use | None instead of Optional[] for tf.Tensor
* Rework TF type hints to use | None instead of Optional[] for tf.Tensor
* Don't forget the imports
* Add the imports to tests too
* make fixup
* Refactor tests that depended on get_type_hints
* Better test refactor
* Fix an old hidden bug in the test_keras_fit input creation code
* Fix for the Deit tests
* Added lion and paged optimizers and made original tests pass.
* Added tests for paged and lion optimizers.
* Added and fixed optimizer tests.
* Style and quality checks.
---------
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
* Added lion and paged optimizers and made original tests pass.
* Added tests for paged and lion optimizers.
* Added and fixed optimizer tests.
* Style and quality checks.
* Initial draft. Some tests fail.
* Fixed dtype bug.
* Fixed bug caused by torch_dtype='auto'.
* All test green for 8-bit and 4-bit layers.
* Added fix for fp32 layer norms and bf16 compute in LLaMA.
* Initial draft. Some tests fail.
* Fixed dtype bug.
* Fixed bug caused by torch_dtype='auto'.
* All test green for 8-bit and 4-bit layers.
* Added lion and paged optimizers and made original tests pass.
* Added tests for paged and lion optimizers.
* Added and fixed optimizer tests.
* Style and quality checks.
* Fixing issues for PR #23479.
* Added fix for fp32 layer norms and bf16 compute in LLaMA.
* Reverted variable name change.
* Initial draft. Some tests fail.
* Fixed dtype bug.
* Fixed bug caused by torch_dtype='auto'.
* All test green for 8-bit and 4-bit layers.
* Added lion and paged optimizers and made original tests pass.
* Added tests for paged and lion optimizers.
* Added and fixed optimizer tests.
* Style and quality checks.
* Added missing tests.
* Fixup changes.
* Added fixup changes.
* Missed some variables to rename.
* revert trainer tests
* revert test trainer
* another revert
* fix tests and safety checkers
* protect import
* simplify a bit
* Update src/transformers/trainer.py
* few fixes
* add warning
* replace with `load_in_kbit = load_in_4bit or load_in_8bit`
* fix test
* fix tests
* this time fix tests
* safety checker
* add docs
* revert torch_dtype
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* multiple fixes
* update docs
* version checks and multiple fixes
* replace `is_loaded_in_kbit`
* replace `load_in_kbit`
* change methods names
* better checks
* oops
* oops
* address final comments
---------
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* First commit
* Add auto-translation with GPT-4
* make fixup
* Add a functional layernorm for TF
* Add all the auxiliary imports etc.
* Add the extra processor and tests
* rebase to main
* Add all the needed fixes to the GPT code
* make fixup
* Make convolutions channels-last so they run on CPU
* make fixup
* Fix final issues
* Fix other models affected by test change
* Clarify comment on the sparse_prompt_embeddings check
* Refactor functional_layernorm, use shape_list in place of .shape in some places
* Remove deprecated torch-alike code
* Update tests/models/sam/test_modeling_tf_sam.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/sam/test_modeling_tf_sam.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Refactor processor with common methods and separated private methods
* make fixup
* Quietly delete the file that didn't do anything (sorry Sylvain)
* Refactor the processor tests into one file
* make fixup
* Clean up some unnecessary indirection
* Fix TF mask postprocessing
* Add more processor equivalence tests
* Refactor generate_crop_boxes to use framework-neutral np code
* Make the serving output correctly conditional
* Fix error message line length
* Use dict keys rather than indices internally in both TF and PT SAM call/forward
* Return dicts internally in the call/forward methods
* Revert changes to common tests and just override check_pt_tf_outputs
* Revert changes to other model tests
* Clarify comments for functional layernorm
* Add missing transpose from PT code
* Removed unused copied from in PT code
* Remove overrides for tests that don't exist in TF
* Fix transpose and update tests for PT and TF to check pred_masks
* Add training flag
* Update tests to use TF checkpoints
* Update index.mdx
* Add missing cross-test decorator
* Remove optional extra asterisks
* Revert return_dict changes in PT code
* Update src/transformers/models/sam/modeling_tf_sam.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Remove None return annotations on init methods
* Update tests/models/sam/test_processor_sam.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Fix input_boxes shapes
* make fixup
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* initial working additions
* clean and rename, add cond stripping initial prompt to decode
* cleanup, edit create_initial_prompt_ids, add tests
* repo consistency, flip order of conditional
* fix error, move the processor fn to the tokenizer
* repo consistency, update test ids to corresponding tokenizer
* use convert_tokens_to_ids not get_vocab...
* use actual conditional in generate
* make sytle
* initial address comments
* initial working add new params to pipeline
* first draft of sequential generation for condition_on_previous_text
* add/update tests, make compatible with timestamps
* make compatible with diff. input kwargs and max length
* add None check
* add temperature check
* flip temp check operand
* refocusing to prev pr scope
* remove the params too
* make style
* edits, move max length incorporating prompt to whisper
* address comments
* remove asr pipeline prompt decoding, fix indexing
* address comments (more tests, validate prompt)
* un-comment out tests (from debug)
* remove old comment
* address comments
* fix typo
* remove timestamp token from test
* make style
* cleanup
* copy method to fast tokenizer, set max_new_tokens for test
* prompt_ids type just pt
* address Amy's comments
* make style
* Remove nestedness in tool config
* Really do it
* Use remote tools descriptions
* Work
* Clean up eval
* Changes
* Tools
* Tools
* tool
* Fix everything
* Use last result/assign for evaluation
* Prompt
* Remove hardcoded selection
* Evaluation for chat agents
* correct some spelling
* Small fixes
* Change summarization model (#23172)
* Fix link displayed
* Update description of the tool
* Fixes in chat prompt
* Custom tools, custom prompt
* Tool clean up
* save_pretrained and push_to_hub for tool
* Fix init
* Tests
* Fix tests
* Tool save/from_hub/push_to_hub and tool->load_tool
* Clean push_to_hub and add app file
* Custom inference API for endpoints too
* Clean up
* old remote tool and new remote tool
* Make a requirements
* return_code adds tool creation
* Avoid redundancy between global variables
* Remote tools can be loaded
* Tests
* Text summarization tests
* Quality
* Properly mark tests
* Test the python interpreter
* And the CI shall be green.
* fix loading of additional tools
* Work on RemoteTool and fix tests
* General clean up
* Guard imports
* Fix tools
* docs: Fix broken link in 'How to add a model...' (#23216)
fix link
* Get default endpoint from the Hub
* Add guide
* Simplify tool config
* Docs
* Some fixes
* Docs
* Docs
* Docs
* Fix code returned by agent
* Try this
* Match args with signature in remote tool
* Should fix python interpreter for Python 3.8
* Fix push_to_hub for tools
* Other fixes to push_to_hub
* Add API doc page
* Docs
* Docs
* Custom tools
* Pin tensorflow-probability (#23220)
* Pin tensorflow-probability
* [all-test]
* [all-test] Fix syntax for bash
* PoC for some chaining API
* Text to speech
* J'ai pris des libertés
* Rename
* Basic python interpreter
* Add agents
* Quality
* Add translation tool
* temp
* GenQA + LID + S2T
* Quality + word missing in translation
* Add open assistance, support f-strings in evaluate
* captioning + s2t fixes
* Style
* Refactor descriptions and remove chain
* Support errors and rename OpenAssistantAgent
* Add setup
* Deal with typos + example of inference API
* Some rename + README
* Fixes
* Update prompt
* Unwanted change
* Make sure everyone has a default
* One prompt to rule them all.
* SD
* Description
* Clean up remote tools
* More remote tools
* Add option to return code and update doc
* Image segmentation
* ControlNet
* Gradio demo
* Diffusers protection
* Lib protection
* ControlNet description
* Cleanup
* Style
* Remove accelerate and try to be reproducible
* No randomness
* Male Basic optional in token
* Clean description
* Better prompts
* Fix args eval in interpreter
* Add tool wrapper
* Tool on the Hub
* Style post-rebase
* Big refactor of descriptions, batch generation and evaluation for agents
* Make problems easier - interface to debug
* More problems, add python primitives
* Back to one prompt
* Remove dict for translation
* Be consistent
* Add prompts
* New version of the agent
* Evaluate new agents
* New endpoints agents
* Make all tools a dict variable
* Typo
* Add problems
* Add to big prompt
* Harmonize
* Add tools
* New evaluation
* Add more tools
* Build prompt with tools descriptions
* Tools on the Hub
* Let's chat!
* Cleanup
* Temporary bs4 safeguard
* Cache agents and clean up
* Blank init
* Fix evaluation for agents
* New format for tools on the Hub
* Add method to reset state
* Remove nestedness in tool config
* Really do it
* Use remote tools descriptions
* Work
* Clean up eval
* Changes
* Tools
* Tools
* tool
* Fix everything
* Use last result/assign for evaluation
* Prompt
* Remove hardcoded selection
* Evaluation for chat agents
* correct some spelling
* Small fixes
* Change summarization model (#23172)
* Fix link displayed
* Update description of the tool
* Fixes in chat prompt
* Custom tools, custom prompt
* Tool clean up
* save_pretrained and push_to_hub for tool
* Fix init
* Tests
* Fix tests
* Tool save/from_hub/push_to_hub and tool->load_tool
* Clean push_to_hub and add app file
* Custom inference API for endpoints too
* Clean up
* old remote tool and new remote tool
* Make a requirements
* return_code adds tool creation
* Avoid redundancy between global variables
* Remote tools can be loaded
* Tests
* Text summarization tests
* Quality
* Properly mark tests
* Test the python interpreter
* And the CI shall be green.
* Work on RemoteTool and fix tests
* fix loading of additional tools
* General clean up
* Guard imports
* Fix tools
* Get default endpoint from the Hub
* Simplify tool config
* Add guide
* Docs
* Some fixes
* Docs
* Docs
* Fix code returned by agent
* Try this
* Docs
* Match args with signature in remote tool
* Should fix python interpreter for Python 3.8
* Fix push_to_hub for tools
* Other fixes to push_to_hub
* Add API doc page
* Fixes
* Doc fixes
* Docs
* Fix audio
* Custom tools
* Audio fix
* Improve custom tools docstring
* Docstrings
* Trigger CI
* Mode docstrings
* More docstrings
* Improve custom tools
* Fix for remote tools
* Style
* Fix repo consistency
* Quality
* Tip
* Cleanup on doc
* Cleanup toc
* Add disclaimer for starcoder vs openai
* Remove disclaimer
* Small fixed in the prompts
* 4.29
* Update src/transformers/tools/agents.py
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
* Complete documentation
* Small fixes
* Agent evaluation
* Note about gradio-tools & LC
* Clean up agents and prompt
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Note about gradio-tools & LC
* Add copyrights and address review comments
* Quality
* Add all language codes
* Add remote tool tests
* Move custom prompts to other docs
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* TTS tests
* Quality
---------
Co-authored-by: Lysandre <hi@lyand.re>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com>
Co-authored-by: Connor Henderson <connor.henderson@talkiatry.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre <lysandre@huggingface.co>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* First draft of RWKV-4
* Add support for generate
* Style post-rebase
* Properly use state
* Write doc
* Fix doc
* More math
* Add model to README, dummies and clean config
* Fix init
* multiple fixes:
- fix common tests
- fix configuraion default values
- add CI test for checking state computation
- fix some CI tests
* correct tokenizer
* some tweaks
- fix config docstring
- fix failing tests
* fix CI tests
- add output_attention / output_hidden_states
- override test_initialization
- fix failing CIs
* fix conversion script
- fix sharded case
- add new arguments
* add slow tests + more fixes on conversion script
* add another test
* final fixes
* change single name variable
* add mock attention mask for pipeline to work
* correct eos token id
* fix nits
* add checkpoints
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add `tie_word_embeddings` in docstring
* change tensor name
* fix final nits
* Trigger CI
---------
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* first draft - gives index error in question_answering.py
* maturing
* no labels
* pipeline should know about QA
* fixing checks
* formatting
* fixed docstring
* initial commit
* formatting
* adding the class to many places
* towards less unhappy checks
* nearly there
* and gpt neox for qa
* use right model
* forgot this one
* base_model_prefix is "gpt_neox" for GPTNeoX* models
* unnecessary stuff
* Update src/transformers/models/gpt_neox/modeling_gpt_neox.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* format
* Update src/transformers/models/gpt_neox/modeling_gpt_neox.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* removed gpt2 stuff
---------
Co-authored-by: Prof. Peter Schneider-Kamp <jps@ordbogen.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* first draft - gives index error in question_answering.py
* maturing
* no labels
* pipeline should know about QA
* fixing checks
* formatting
* fixed docstring
* initial commit
* formatting
* adding the class to many places
* towards less unhappy checks
* nearly there
* Update src/transformers/models/gpt_neo/modeling_gpt_neo.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* avoid error
* moving to device of star/end_logits
---------
Co-authored-by: Prof. Peter Schneider-Kamp <jps@ordbogen.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* first draft - gives index error in question_answering.py
* maturing
* no labels
* pipeline should know about QA
* fixing checks
* formatting
* fixed docstring
* make sure legacy code executes
* comment
* like this
---------
Co-authored-by: Prof. Peter Schneider-Kamp <jps@ordbogen.com>
* Add Trainer support for ReduceLROnPlateau
Fixes#16503
* Remove training argument and add default instance
---------
Co-authored-by: mmeloux <maxime.meloux@loria.fr>
Adds FocalNet by Microsoft to transformers
---------
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
Co-authored-by: alaradirik <alaradirik@gmail.com>
* initial work
* Add other classes
* Refactor code
* Move warning and fix dynamic pipeline
* Issue warning when necessary
* Add test
* Do not skip auto tests
* Fix failing tests
* Refactor and address review comments
* Address review comments
* wrong argument name
* append eos_token_id
* all tokenizers need mask and ctc_blank tokens
* remove reduction factor from feature extractor
* add proper TTS loss
* did shifting the wrong way around
* mask out padded portions
* remove logits again (don't really need it)
* fix unit tests
* fixup
* pad also returns the decoder attention mask, since that's useful to have
* clean up feature extractor logic
* pad can handle TTS task too
* remove stop_labels from loss calculation
* simplify logic
* fixup
* do -100 masking properly
* small STFT optimization (calculate mel filterbanks only once)
* replace torchaudio fbanks with audio_utils
* remove torchaudio dependency
* simplify & speed up the STFT
* don't serialize window and mel filters
* output cross attentions when generating speech
* add guided attention loss
* fix failing test
* Update src/transformers/models/speecht5/feature_extraction_speecht5.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update src/transformers/models/speecht5/modeling_speecht5.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* change type annotation of attention_mask to LongTensor
* extract loss into class
* remove unused frame_signal_scale argument
* use config object in loss class
* fix type annotations in doc comments
* change optional to just bool
* implement missing tokenizer method
* add deprecation warning
* Update src/transformers/models/speecht5/feature_extraction_speecht5.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/speecht5/feature_extraction_speecht5.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add deprecation warning for stop_labels
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add model to doc tests
* Remove generate and replace by prepare_inputs_for_generation
* More fixes
* Remove print statements
* Update integration tests
* Fix generate
* Remove model from auto mapping
* Use auto processor
* Fix integration tests
* Fix test
* Add inference code snippet
* Remove is_encoder_decoder
* Update docs
* Remove notebook link
* Fix docstrings for TFBLIP
* Fix missing line in TF port!
* Use values from torch tests now other bugs fixed
* Use values from torch tests now other bugs fixed
* Fix doctest string
* resolve conflicts
* rebase and make style
* test
* test
* test
* rebase and make style
* rebase and make style
* tests
* tests
* rewrite some functions
* rebase and make style
* fix load_tf_weights_in_cpmant
* reformat some unrelated files
* upgrade quality
* fix some bugs & docstring
* add models and tests
* solve conflicts
* resolve conflicts
* resolve conflicts
* resolve conflicts
* resolve conflicts
* tests
* resolve conflicts
* resolve conflicts
* fix load_tf_weights_in_cpmant
* reformat some unrelated files
* upgrade quality
* fix some bugs & docstring
* save resolution
* make style
* delete redefinition code
* reformat function
* reformat
* resolve conflicts
* resolve conflicts
* resolve conflicts
* resolve conflicts
* resolve conflicts
* tests
* resolve conflicts
* resolve conflicts
* fix load_tf_weights_in_cpmant
* reformat some unrelated files
* upgrade quality
* resolve conflicts
* resolve conflicts
* resolve conflicts
* resolve conflicts
* resolve conflicts
* fix load_tf_weights_in_cpmant
* reformat some unrelated files
* upgrade quality
* resolve conflicts
* make style
* fix bugs and refactor
* modify docstrings and make style
* unify import format in __init__.py
* fix import-altclp bug
* fix copies to update index.md
* fix unused config parameters
* fix unused config parameters
* fix unused config parameters
* update README_ja.md
* dummy commit for unit test
* fix attention mask
* add CPMAntTokenizer&-Fast to auto-mapping
* drop redundant changes in README_ko
* fix defaults in docstring
* fix use_cache and some docstring
* add missing args in tokenizer
* modify tester inheritance
* add is_jieba_available
* fix some bugs
* make style and fix-copies
* add doctests
* skip integration tests
* add is_jieba_available
* fix bugs in common tests
* adjust docstrings and make style
* add argument docstring
* adjust code to some specifications
* make style and fix-copies
* add fast tokenization test
* dummy commit for unit test
* dummy commit for unit test
* dummy commit for unit test
* normalize some comments and names
* Bert->CPMAnt
* camel names and drop redundant codes
* make style and fix-coies
* add CpmTokenizerFast _import_structure
* drop cpmanttokenizerfast in model_doc
* fix some problems
* fix CPMAnt tokenization for common test
* make style and fixup
* fix copies and fixup
* fix bugs in tokenization test
* dummy commit for connection failure in unittest
* fix copies
* drop trailing comma
* fix decorator in tests
* dummy commit for connection failure in unittest
---------
Co-authored-by: Gong Baitao <gongbaitao11@gmail.com>
* Add out_indices to backbones, deprecate out_features
* Update - can specify both out_features and out_indices but not both
* Add backbone mixin tests
* Test tidy up
* Add test_backbone for convnext
* Remove redefinition of method
* Update for Dinat and Nat backbones
* Update tests
* Smarter indexing
* Add checks on config creation for backbone
* PR comments
* Adding Llama FastTokenizer support.
- Requires https://github.com/huggingface/tokenizers/pull/1183 version
- Only support byte_fallback for llama, raise otherwise (safety net).
- Lots of questions are special tokens
How to test:
```python
from transformers.convert_slow_tokenizer import convert_slow_tokenizer
from transformers import AutoTokenizer
from tokenizers import Tokenizer
tokenizer = AutoTokenizer.from_pretrained("huggingface/llama-7b")
if False:
new_tokenizer = Tokenizer.from_file("tok.json")
else:
new_tokenizer = convert_slow_tokenizer(tokenizer)
new_tokenizer.save("tok.json")
strings = [
"This is a test",
"生活的真谛是",
"生活的真谛是[MASK]。",
# XXX: This one is problematic because of special tokens
# "<s> Something something",
]
for string in strings:
encoded = tokenizer(string)["input_ids"]
encoded2 = new_tokenizer.encode(string).ids
assert encoded == encoded2, f"{encoded} != {encoded2}"
decoded = tokenizer.decode(encoded)
decoded2 = new_tokenizer.decode(encoded2)
assert decoded.strip() == decoded2, f"{repr(decoded)} != {repr(decoded2)}"
```
The converter + some test script.
The test script.
Tmp save.
Adding Fast tokenizer + tests.
Adding the tokenization tests.
Correct combination.
Small fix.
Fixing tests.
Fixing with latest update.
Rebased.
fix copies + normalized added tokens + copies.
Adding doc.
TMP.
Doc + split files.
Doc.
Versions + try import.
Fix Camembert + warnings -> Error.
Fix by ArthurZucker.
Not a decorator.
* Fixing comments.
* Adding more to docstring.
* Doc rewriting.
* Fix inverted conditional in TF common test!
* Make the same change in the PT tests file
* Make sure hidden states for GPT2 have the same output shape in PT/TF
* Minor fix to PT implementation of token classification loss
* Skip loss equivalence test for TFHubert because it keeps overflowing to inf
* Compute LM loss for TF the (weird) way it's computed in PT
* Skip loss equivalence test for Wav2Vec2 for the same reason as Hubert
* Fix - don't try to access the hidden states property when output is a tuple
* Initial commit
* more stash commit
* Yet another stash commit
* yet more stash commit
* Mostly working except for docs / repo consistency
* Stop importing model list from torch file
* Add TF BLIP models to docs
* Add auto classes
* Move get_text_features and get_image_features
* Update src/transformers/models/blip/modeling_tf_blip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/blip/modeling_tf_blip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/blip/modeling_tf_blip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/blip/modeling_tf_blip_text.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/blip/modeling_tf_blip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/blip/modeling_tf_blip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/blip/modeling_tf_blip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/blip/modeling_tf_blip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/blip/modeling_tf_blip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/blip/test_modeling_tf_blip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/blip/test_modeling_tf_blip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/blip/modeling_tf_blip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/blip/modeling_tf_blip.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update tests/models/blip/test_modeling_tf_blip_text.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/blip/modeling_tf_blip_text.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/models/blip/modeling_tf_blip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Use channels_last convolutions in TF (better performance + compatibility)
* Remove _shape function
* Move multi-line statement to one line in PT + TF
* Specify tf.keras.layers instead of importing from it
* Remove test_gradient_checkpointing and empty test_training methods
* move some multi-line statements to one line
* Update docstring for generate
* Remove pruned heads set
* Remove self.seq_len_dim
* Fixed issues with loss computation, should resolve some tests. Also ensured that the PT version follows the config for output_attentions and output_hidden_states
* ensure original model follows config in more cases
* Skip the same cross-attention tests in the PT tests - didn't realize we did it twice!
* Add training args throughout the models and layers
* make fixup
* Fix docstring for inputs_embeds
* Add docstring for is_decoder
* Add docstrings to text models
* Remove redundant computation
* Add unpack_inputs / keras_serializable
* Add modeling_tf_blip to doctests
* Add config classes for keras serialization
* Changes to allow model porting with pt-to-tf
* Quick fix to decoder head and test tweaks
* Revert an issue with masking the embeddings outputs
* Allow missing keys in some equivalence tests (for unused layers)
* Add tf-pt equivalence tests back in
* Update src/transformers/models/blip/modeling_tf_blip.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/blip/modeling_tf_blip_text.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/blip/modeling_tf_blip_text.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* make fixup
* Refactor invert_attention_mask out into tf_utils
* Re-enable cross-tests on the PT side too
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix RoFormerEncoder postion embedding when generate as decoder
* make fixup
* add test case for check generate with past key values
* remove duplicating code
LayoutLMv3TokenizerFast produces empty 'Ġ' token with `offset_mapping = (0, 0)`.
Next token is wrongly assumed to also be beginning of word and isn't
correctly assigned `pad_token_label`.
Modify test with text that produce 'Ġ' token.
Remove copy check from LayoutLMv2TokenizerFast for `_batch_encode_plus`.
solves issue: #19978
* Making sure we can use safetensors to serialize all the time.
* Expanding the tests for increased coverage.
* Update the test.
* Getting current state of affairs.
* Tentative fix.
* Fixing black version.
* Fixing the worst offenders.
* Try to modify less files.
* Fixing blip_2 (Weird solution right now).
* Fixing deta.
* Fix blip ?
* Missing extra newline.
* No deta modification.
* Adding some comments.
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Addressing comments.
* Addressing comments.
* creating warn_once.
* Warning_once !
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add draft changes
* fix failing wav2vec
* style
* make sure that the argument is saved + add tests
* style
* fixup
* update test
* default clean_up_tokenization_spaces to False for Bloom and Llama
* Update code based on review
Co-authored-by: Nicolas Patry <patry.nicolas@gmail.com>
* style
* quality
---------
Co-authored-by: Nicolas Patry <patry.nicolas@gmail.com>
* Initial commit
* update modeling code
* update doc
* add functions necessary
* fix impotrs
* revert changes
* fixup
* more styling to get going
* remove standalone encoder
* update code
* styling
* fix config and model
* update code and some refactoring
* make more tests pass
* Adding NLLB-200 - MoE - 54.5B for no language left behind
Fixes#21300
* fix mor common tests
* styke
* update testing file
* update
* update
* Router2 doc
* update check config with sparse layer
* add dummy router
* update current conversion script
* create on the fly conversion script
* Fixup
* style
* style 2
* fix empty return
* fix return
* Update default config sparse layers
* easier to create sparse layers
* update
* update conversion script
* update modeling
* add to toctree
* styling
* make ruff happy
* update docstring
* update conversion script
* update, will break tests but impelemting top2
* update
* ❗local groups are supported here
* ⚠️ Support for local groups is now removed ⚠️
This is because it has to work with model parallelism that we do not support
* finish simplificaiton
* Fix forward
* style
* fixup
* Update modelling and test, refactoring
* update tests
* remove final layer)norm as it is done in the FF
* routing works! Logits test added
* nit in test
* remove top1router
* style
* make sure sparse are tested. Had to change route_tokens a liottle bit
* add support for unslip models when converting
* fixup
* style
* update test s
* update test
* REFACTOR
* encoder outputs match!
* style
* update testing
* 🎉encoder and decoder logits match 🎉
* styleing
* update tests
* cleanup tests
* fix router test and CIs
* cleanup
* cleanup test styling
* fix tests
* Finally the generation tests match!
* cleanup
* update test
* style testing file
* remove script
* cleanup
* more cleanup
* nits
* update
* NLLB tokenizer is wrong and will be fixed soon
* use LongTensors
* update tests
* revert some small changes
* fix second expert sampling and batch prioritized routing
* update tests
* finish last tests
* make ruff happy
* update
* ruff again
* style
* Update docs/source/en/model_doc/nllb-moe.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Updates based on review
* style and fix import issue
* nit
* more nits
* cleanup
* styling
* update test_seconde_expert_policy
* fix name
* last nit on the markdown examples
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* First draft
* Fix integration test
* Remove script
* Fix test and typos
* Fix one more test
* Skip tied embeddings test
* Remove line
* Address comments
* add mega file structure and plain pytorch version of mega source code
* added config class with old naming conventions
* filled in mega documentation
* added config class and embeddings with optional token types
* updated notes
* starting the conversion process, deleted intermediate and added use_cache back to config
* renamed config attributes in modeling_mega.py
* checkpointing before refactoring incremental decoding functions
* removed stateful incremental key/values for EMA and self-attention
* refactored MovingAverageGatedAttention to remove stateful k/v history and use unified attention mask
* MovingAverageGatedAttention works with incremental decoding + past values, added sequence length enforcement
* more comments in MovingAverageGatedAttention + checkpointing before GatedCrossAttention
* bug fix in attention mask handling in MovingAverageGatedAttention
* removed incremental state from GatedCrossAttention and removed IncrementalState class
* finished gated cross attention and got MegaLayer working
* fixed causal masking in mega decoder
* fixed how padding and causal masks are passed through MegaLayer with and without k/v caching
* finished MegaModel; tested with encoder, decoder-only, and cross-attention type inputs; started work on downstream classes; removed mentions of position_ids
* added optional dense hidden layer for masked and causal LM classes
* docstring updates in MultiHeadEMA and GatedCrossAttention, removed unnecessary inputs in cross-attention
* removed before_attn_fn in Mega class and updated docstrings and comments up to there
* bug fix in MovingAverageGatedAttention masking
* working conversion of MLM checkpoint in scratchpad script -- perfect matches
* moved arg for hidden dense layer in LM head to config; discovered issue where from_pretrained is renaming gamma and beta parameters
* renamed gamma and beta parameters to avoid HF renaming when loading from checkpoint
* finished checkpoint conversion script
* cleanup old class in mega config script
* removed 'copied from' statements and passing integration tests
* added num_attention_heads=1 to config for integration compatibility, decoder tests working, generation tests failing
* fixed tuple output of megamodel
* all common tests passing after fixing issues in decoder, gradient retention, and initialization
* added mega-specific tests, ready for more documentation and style checks
* updated docstrings; checkpoint before style fixes
* style and quality checks, fixed initialization problem in float_tensor, ready for PR
* added mega to toctree
* removed unnecessary arg in megaconfig
* removed unused arg and fixed code samples with leftover roberta models
* Apply suggestions from code review
Applied all suggestions except the one renaming a class, as I'll need to update that througout
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fixed issue where .view breaks batch dimension, conversion script fixed with absolute imports, updated readme with Mega->MEGA
* removed asserts in Mega code, renamed sequencenorm, gatedcrossattention, and NFFN, replaced get_activation_fn with ACTFN, and added sequencenorm to layer norms
* reformatted .forward() docstrings to match style and removed unused mask input in cross-attention
* removed all reset_parameters() methods and rolled into MegaPreTrainedModel._init_weights()
* renamed all single-letter variables and improved readability in tensor size comments, Mega->MEGA in 2 documentation files
* variable names in NFFN
* manual Mega->MEGA changes in docs
* Mega->MEGA in config auto
* style and quality fixes
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* renamed parameters and variables with confusing names, added copied from statements, moved fft conv to its own method, other cleanup from PR comments
* commit before dealing with merge conflicts
* made new attention activation functions available in ACT2FN and added generation test from OPT
* style and quality in activations and tests
* documentation fixes, renaming variables in dropout and rotary positions, used built-in causal masking, encoders->layers in MegaModel, moved comments into docstrings
* style and quality fixes after latest updates, before rotary position ids
* causal mask in MegaBlock docstring + added missing device passing
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update README.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* added Mega prefixes where missing, reverted MegaSequenceNorm to if-else, other module renaming requested in PR
* style and quality fixes + readme updates pointing to main
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Chunkable classification pipeline
The TokenClassificationPipeline is now able to process sequences longer than 512. No matter the framework, the model, the tokenizer. We just have to pass process_all=True and a stride number (optional). The behavior remains the same if you don't pass these optional parameters. For overlapping parts when using stride above 0, we consider only the max scores for each overlapped token in all chunks where the token is.
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* update with latest black format
* update black format
* Update token_classification.py
* Update token_classification.py
* format correction
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update comments
* Update src/transformers/pipelines/token_classification.py
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
* Update token_classification.py
Correct spaces, remove process_all and keep only stride. If stride is provided, the pipeline is applied to the whole text.
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update chunk aggregation
Update the chunk aggregation strategy based on entities aggregation.
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
Remove unnecessary pop from outputs dict
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update src/transformers/pipelines/token_classification.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add chunking tests
* correct formating
* correct formatting
* correct model id for test chunking
* update scores with nested simplify
* Update test_pipelines_token_classification.py
* Update test_pipelines_token_classification.py
* update model to a tiny one
* Update test_pipelines_token_classification.py
* Adding smaller test for chunking.
* Fixup
* Update token_classification.py
* Update src/transformers/pipelines/token_classification.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/pipelines/token_classification.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Fixed bug to calculate correct xpath_sub_list in MarkupLMTokenizer. Earlier xpath_sub_list was same as xpath_tags_list
Co-authored-by: dusejat <dusejat@amazon.com>
* time to say goodbye, torch 1.7 and 1.8
* clean up torch_int_div
* clean up is_torch_less_than_1_8-9
* update
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Make sure CVT can be trained using mixed precision
* Add test for keras-fit with mixed-precision
* Update tests/models/cvt/test_modeling_tf_cvt.py
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
---------
Co-authored-by: gcuder <Gerald.Cuder@iacapps.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Use return_loss for BridgeTowerForContrastiveLearning, add example
* fix tests
* Update example in BridgeTowerForContrastiveLearning
* Update test_modeling_bridgetower.py
* update model output format
* minor update
* Update src/transformers/models/bridgetower/modeling_bridgetower.py
* make style
---------
Co-authored-by: Tiep Le <97980157+tileintel@users.noreply.github.com>
Co-authored-by: Tiep Le <tiep.le@intel.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Don't rescale if in and in range 0-255
* Raise value error if int values too large
* Update tests/test_image_transforms.py
* Update tests/test_image_transforms.py
* add `get_input_embeddings` to `WhisperForAudioClassification`
* add common tests
* fix another common test
* Update tests/models/whisper/test_modeling_whisper.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix style
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add new model of MGP-STR
* fix the check failings
* remove torch and numpy from mgp_tokenization
* remove unused import from modeling_mgp_str
* add test_processing_mgp_str
* rm test_processing_mgp_str.py
* add test_processing_mgp_str
* add test_processing_mgp_str
* add test_processing_mgp_str
* rm test_processing_mgp_str and add softmax outs to model
* rm test_processing_mgp_str and add softmax outs to model
* rewrite the code of mgp-str according to PR suggestions
* rewrite the code of mgp-str according to PR suggestions
* add new model of MGP-STR
* fix the check failings
* remove torch and numpy from mgp_tokenization
* remove unused import from modeling_mgp_str
* add test_processing_mgp_str
* rm test_processing_mgp_str.py
* add test_processing_mgp_str
* add test_processing_mgp_str
* add test_processing_mgp_str
* rm test_processing_mgp_str and add softmax outs to model
* rewrite the code of mgp-str according to PR suggestions
* rewrite the code of mgp-str according to PR suggestions
* remove representation_size from MGPSTRConfig
* reformat configuration_mgp_str.py
* format test_processor_mgp_str.py
* add test for tokenizer and complete model/processer test and model file
* rm Unnecessary tupple in modeling_mgp_str
* reduce hidden_size/layers/label_size in test_model
* add integration tests and change MGPSTR to Mgpstr
* add test for logit values
* reformat test model file
---------
Co-authored-by: yue kun <yuekun.wp@alibaba-inc.com>
* added informer to gitignore
* added informer to gitignore
* WIP informer2020
* added checking that instantiate works
* added config using gluonTS by kashif
* WIP config
* adding informeConfig. need to remove FeatureEmbedder
* done InformerConfig, but need to change the names
* Done informer model init. working on enc-dec
* added things to address, after reading again enc-dec in the paper
* done modeling - checking initialization work
* added informer to gitignore
* WIP informer2020
* added checking that instantiate works
* added config using gluonTS by kashif
* WIP config
* adding informeConfig. need to remove FeatureEmbedder
* done InformerConfig, but need to change the names
* Done informer model init. working on enc-dec
* added things to address, after reading again enc-dec in the paper
* done modeling - checking initialization work
* moved enc-dec init to InformerEncoder/Decoder init
* added 'init_std' to config, now model init works!
* WIP conversion script, and added code sources
* WIP conversion script: loading original informer pth works
* WIP conversion script: change defaults in the config
* WIP conversion script: supporting Informer input embedding
* WIP conversion script: added parameters for the informer embed
* WIP conversion script: change dim_feedforward=2048
* WIP conversion script: remove unused args for loading checkpoint
* just cleaning up
* DataEmbedding removed, after thinking with Kashif
* working on forward pass
* WIP forward pass: trying to establish working batch for forward pass
* cleaning and finalizing
* adding HF names and docs
* init after cleaning works
* WIP in tests
* added docs for the informer specific args
* fix style
* undo change
* cleaning informer, now need to work only enc-dec
* initial enc-dec classes
* added encoder and decoder
* added todo
* add todos for conv_layers
* added decoder docs from vanilla
* added encoder docs from vanilla
* remove encoder decoder from the original informer
* removed AttentionLayer from the original paper
* removed TriangularCausalMask, same as decoder_attention_mask
* initial sparse attention
* use conv_layers
* fixed test_config test
* fix parenthesis when itearting zip(layers, conv_layers)
* error found in prob attention, added sizes as comments
* fix sizes
* added proposal for q_reduce indexing, and remove unused
* WIP ProbMask, and changed factor=2 for testing
* remove unused libs for this PR for creating the env
* fix checking the attn_weights.size() after bmm
* Q_reduce: changed from torch.gather to simple slicing
* WIP calculate final attn_output
* finish adding v_aggregated, attn_output ready
* changed tgt_len to u in attention_mask, need to fix the size error
* comment attention_mask for encoder, and fix if cond for v_agg
* added ProbMask support (wip), removed old original code
* finished ProbMask 😃
* Revert "remove unused libs for this PR for creating the env"
This reverts commit 11a081e09e.
* fixes
* make style
* fix initial tests
* fix more tests
* dry
* make style
* remove unused files
* style
* added integration tests
* fix num_static_real_features
* fix header
* remove unused function
* fix example
* fix docs
* Update src/transformers/models/informer/configuration_informer.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/informer/modeling_informer.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/informer/configuration_informer.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/informer/configuration_informer.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/informer/configuration_informer.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/informer/configuration_informer.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* fixes for reviewer
* use prediction_length from model
* fix style
* fixed informer.mdx
* added to index
* updated readme
* undo
* make fix-copies
* typo
* fix copy
* added Informer to toctree
* in order
* fixed comments
* remove unneeded new lines in docs
* make static real and cat optional
* fix use of distil conv layers
* fixed integration test
* added checkpoint for convlayer
* make fix-copies
* updated from time series model
* make fix-copies
* copy decoder
* fix unit tests
* updated scaling config
* fix integration tests
* IGNORE_NON_TESTED
* IGNORE_NON_AUTO_CONFIGURED
* IGNORE_NON_AUTO_CONFIGURED
* updated check configs
* fix formatting
* undo change from time series
* prediction_length should not be None
* aliign with the blog: prettify ProbSparse and change attention_factor to sampling_factor
* make style
* make fix-copies
* niels CR: update contributed by
* niels CR: update configuration_informer.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* niels CR: update kashif -> huggingface
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* niels CR: `sampling_factor` only relevant when `attention_type`=prob
* make style
* fixed U_part: added multiplication by `L_Q`
* fixed bug: remove `is not None` from `if config.distil`
* fixed test: `decoder_seq_length` to `encoder_seq_length` in cross_attentions check
* fix integration tests
* updated model hub
* do not shift as in training
* undo
* fix make-copies
* make fix-copies
* added `if prediction_length is None`
* changed `ProbSparseAttention` to `InformerProbSparseAttention`
* changed `V_sum` -> `v_mean_dim_time`
* changed `ConvLayer` to `InformerConvLayer` and fixed `super()`
* TimeSeriesTansformer->Informer in decoder's Copied from
* more descriptive in ProbSparse
* make style
* fix coped from
* Revert "added `if prediction_length is None`"
This reverts commit b4cbddfa05.
* fixed indent
* use InformerSinusoidalPositionalEmbedding
* make fix-style
* fix from #21860
* fix name
* make fix-copies
* use time series utils
* fix dec num_heads
* docstring
* added time series util doc
* _import_structure
* formatting
* changes from review
* make style
* fix docs
* fix doc
* removed NegativeLogLikelihood
---------
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* [Whisper] Add model for audio classification
* make fix-copies
* add to docs
* add docstring
* empty returns
* add code example
* switch to fleurs
* stick everything on one line
* [WIP] whisper refacto to support language output.
* Handling merges.
* A bit more cleanup and comments.
* Many improvements.
Lots of details everywhere.
* Cleanup old code and tests.
* Handle lone timestamp tokens (just recover when something bad happens).
* Adding return_language example.
* No ffmpeg.
* Hmm.
* Some corrections.
* Both fast and slow.
* New black.
* Update src/transformers/models/whisper/tokenization_whisper.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/whisper/tokenization_whisper.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Remove print.
* Undoing tests modifications.
* Smaller test modifications.
* Rename.
* Remove maxDiff.
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Make schedulers picklable by making lr_lambda fns global
* add unused _get_constant_schedule_lr_lambda arg
* remove unneeded _get_constant_schedule_lr_lamda
* add test
* make style
* rebase, remove torch dep, put lambda back
* repo-consistency and style
* Mark pipeline tests to skip them easily
* Mark the mixin as pipeline test
* Update src/transformers/testing_utils.py
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
---------
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Adds the ALIGN model to transformers. ALIGN is introduced in "Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision" by Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig.
* rounding_mode = "floor" instead of // to prevent behavioral change
* add other TODO
* use `torch_int_div` from pytrch_utils
* same for tests
* fix copies
* style
* use relative imports when needed
* Co-authored-by: sgugger <sylvain.gugger@gmail.com>
* First commit for the improved PT-TF weight loading
* Remove workarounds from TFEncoderDecoder tests
* Allow a custom weight renaming function in from_pretrained and use that to clean up EncoderDecoder
* make fixup
* First attempt at visionencoderdecoder
* Disable tensorfloat32 in tests to get consistent outputs
* Quick fix to tf_vision_encoder_decoder tests
* make fixup
* Update Blenderbot tests
* Remove unused arg in modeling_tf_opt
* load_tf_sharded_weights had strict=True! This meant transfer learning was impossible, so I'm setting it to False.
* Support prefixes when loading sharded TF checkpoints
* make fixup
* Add test to load sharded models with a weight prefix
* Fix sharded weight loading test
* Add a test for transfer from a sharded checkpoint
* make fixup
* Add test to check that crossloading from PT with a prefix works
* Refactor from_pretrained in the encoderdecoder classes
* Refactor from_pretrained in the encoderdecoder classes
* missmatched -> mismatched
* Explicitly check for None
* No comments showing my very impressive and attractive knowledge of Py3.9+
* Disable TF32 across all TF tests