* More limited setup -> setupclass conversion
* make fixup
* Trigger tests
* Fixup UDOP
* Missed a spot
* tearDown -> tearDownClass where appropriate
* Couple more class fixes
* Fixups for UDOP and VisionTextDualEncoder
* Ignore errors when removing the tmpdir, in case it already got cleaned up somewhere
* CLIP fixes
* More correct classmethods
* Wav2Vec2Bert fixes
* More methods become static
* More class methods
* More class methods
* Revert changes for integration tests / modeling files
* Use a different tempdir for tests that actually write to it
* Remove addClassCleanup and just use teardownclass
* Remove changes in modeling files
* Cleanup get_processor_dict() for got_ocr2
* Fix regression on Wav2Vec2BERT test that was masked by this before
* Rework tests that modify the tmpdir
* make fix-copies
* revert clvp modeling test changes
* Fix CLIP processor test
* make fix-copies
* Add dithering to the `Speech2TextFeatureExtractor` API.
- in kaldi : 4a8b7f6732/src/feat/feature-window.cc (L145)
- with dithering without a seed, the features become non-deterministic due
to small Gaussian noise added to the audio (i.e. 2 runs lead to little
different outputs)
* update the PR
- add dithering also for WhisperFeatureExtractor
- not adding to Wav2Vec2FeatureExtractor (no FBANK computation)
* add unit-tests for dithering, fix docstrings
* ruff
* utils/check_copies.py --fix_and_overwrite
* update code, add seed to unit-test
* adding explanation of dithering
* tmp commit
* move tests to the right class
* remove ALL all_generative_model_classes = ...
* skip tf roberta
* skip InstructBlipForConditionalGenerationDecoderOnlyTest
* videollava
* reduce diff
* reduce diff
* remove on vlms
* fix a few more
* manual rebase bits
* more manual rebase
* remove all manual generative model class test entries
* fix up to ernie
* a few more removals
* handle remaining cases
* recurrent gemma
* it's better here
* make fixup
* tf idefics is broken
* tf bert + generate is broken
* don't touch tf :()
* don't touch tf :(
* make fixup
* better comments for test skips
* revert tf changes
* remove empty line removal
* one more
* missing one
* add more cases
* fix method not found in unittest
Signed-off-by: Lin, Fanli <fanli.lin@intel.com>
* fix more cases
* add more models
* add all
* no unittest.case
* remove for oneformer
* fix style
---------
Signed-off-by: Lin, Fanli <fanli.lin@intel.com>
* add tests
* fix whisper
* update
* nit
* add qwen2-vl
* more updates!
* better this way
* fix this one
* fix more tests
* fix final tests, hope so
* fix led
* Update tests/generation/test_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* pr comments
* not pass pixels and extra for low-mem tests, very flaky because of visio tower
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Pass datasets trust_remote_code
* Pass trust_remote_code in more tests
* Add trust_remote_dataset_code arg to some tests
* Revert "Temporarily pin datasets upper version to fix CI"
This reverts commit b7672826ca.
* Pass trust_remote_code in librispeech_asr_dummy docstrings
* Revert "Pin datasets<2.20.0 for examples"
This reverts commit 833fc17a3e.
* Pass trust_remote_code to all examples
* Revert "Add trust_remote_dataset_code arg to some tests" to research_projects
* Pass trust_remote_code to tests
* Pass trust_remote_code to docstrings
* Fix flax examples tests requirements
* Pass trust_remote_dataset_code arg to tests
* Replace trust_remote_dataset_code with trust_remote_code in one example
* Fix duplicate trust_remote_code
* Replace args.trust_remote_dataset_code with args.trust_remote_code
* Replace trust_remote_dataset_code with trust_remote_code in parser
* Replace trust_remote_dataset_code with trust_remote_code in dataclasses
* Replace trust_remote_dataset_code with trust_remote_code arg
* try to stylify using ruff
* might need to remove these changes?
* use ruf format andruff check
* use isinstance instead of type comparision
* use # fmt: skip
* use # fmt: skip
* nits
* soem styling changes
* update ci job
* nits isinstance
* more files update
* nits
* more nits
* small nits
* check and format
* revert wrong changes
* actually use formatter instead of checker
* nits
* well docbuilder is overwriting this commit
* revert notebook changes
* try to nuke docbuilder
* style
* fix feature exrtaction test
* remve `indent-width = 4`
* fixup
* more nits
* update the ruff version that we use
* style
* nuke docbuilder styling
* leve the print for detected changes
* nits
* Remove file I/O
Co-authored-by: charliermarsh
<charlie.r.marsh@gmail.com>
* style
* nits
* revert notebook changes
* Add # fmt skip when possible
* Add # fmt skip when possible
* Fix
* More ` # fmt: skip` usage
* More ` # fmt: skip` usage
* More ` # fmt: skip` usage
* NIts
* more fixes
* fix tapas
* Another way to skip
* Recommended way
* Fix two more fiels
* Remove asynch
Remove asynch
---------
Co-authored-by: charliermarsh <charlie.r.marsh@gmail.com>
* add audio_utils usage in the FE of SpeechToText
* clean unecessary parameters of AudioSpectrogramTransformer FE
* add audio_utils usage in AST
* add serialization tests and function to FEs
* make style
* remove use_torchaudio and move to_dict to FE
* test audio_utils usage
* make style and fix import (remove torchaudio dependency import)
* fix torch dependency for jax and tensor tests
* fix typo
* clean tests with suggestions
* add lines to test if is_speech_availble is False
* stronger GC tests
* better tests and skip failing tests
* break down into 3 sub-tests
* break down into 3 sub-tests
* refactor a bit
* more refactor
* fix
* last nit
* credits contrib and suggestions
* credits contrib and suggestions
---------
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix test for bart. Order is correct now let's skip BPEs
* ouf
* styling
* fix bert....
* slow refactoring
* current updates
* massive refactoring
* update
* NICE!
* update to see where I am at
* updates
* update
* update
* revert
* updates
* updates
* start supporting legacy_save
* styling
* big update
* revert some changes
* nits
* nniiiiiice
* small fixes
* kinda fix t5 with new behaviour
* major update
* fixup
* fix copies
* today's updates
* fix byt5
* upfate
* update
* update
* updates
* update vocab size test
* Barthez does not use not need the fairseq offset ids
* super calll must be after
* calll super
* move all super init
* move other super init
* fixup
* nits
* more fixes
* nits
* more fixes
* nits
* more fix
* remove useless files
* ouch all of them are affected
* and more!
* small imporvements
* no more sanitize token
* more changes around unique no split tokens
* partially fix more things
* keep legacy save but add warning
* so... more fixes
* updates
* guess deberta tokenizer could be nuked
* fixup
* fixup did some bad things
* nuke it if it breaks
* remove prints and pretrain fast from slow with new format.
* fixups
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fiou
* nit
* by default specials should not be normalized?
* update
* remove brakpoint
* updates
* a lot of updates
* fixup
* fixes revert some changes to match fast
* small nits
* that makes it cleaner
* fix camembert accordingly
* update
* some lest breaking changes
* update
* fixup
* fix byt5 and whisper mostly
* some more fixes, canine's byte vocab
* fix gpt2
* fix most of the perceiver tests (4 left)
* fix layout lmv3
* fixup
* fix copies for gpt2 style
* make sure to only warn once
* fix perciever and gpt2 tests
* some more backward compatibility: also read special tokens map because some ppl use it........////.....
* fixup
* add else when reading
* nits
* fresh updates
* fix copies
* will this make everything faster?
* fixes
* more fixes
* update
* more fixes
* fixup
* is the source of truth right?
* sorry camembert for the troubles
* current updates
* fixup
* update led
* update
* fix regression
* fix single word
* more model specific fixes
* fix t5 tests
* fixup
* more comments
* update
* fix nllb
* rstrip removed
* small fixes
* better handle additional_special_tokens and vocab sizes
* fixing
* styling
* fix 4 / 21
* fixup
* fix nlbb's tests
* some fixes
* fix t5
* fixes
* style
* fix canine tests
* damn this is nice
* nits
* m2m100 nit
* fixups
* fixes!
* fixup
* stash
* fix merge
* revert bad change
* fixup
* correct order for code Llama
* fix speecht5 post merge
* styling
* revert source of 11 fails
* small nits
* all changes in one go
* fnet hack
* fix 2 more tests
* update based on main branch of tokenizers
* fixup
* fix VITS issues
* more fixes
* fix mgp test
* fix camembert issues
* oups camembert still has 2 failing tests
* mluke fixes
* decode fixes
* small nits
* nits
* fix llama and vits
* fix camembert
* smal nits
* more fixes when initialising a fast from a slow and etc
* fix one of the last test
* fix CPM tokenizer test
* fixups
* fix pop2piano
* fixup
* ⚠️ Change tokenizers required version ⚠️
* ⚠️ Change tokenizers required version ⚠️
* "tokenizers>=0.14,<0.15", don't forget smaller than
* fix musicgen tests and pretraiendtokenizerfast
* fix owlvit and all
* update t5
* fix 800 red
* fix tests
* fix the fix of the fix of t5
* styling
* documentation nits
* cache _added_tokens_encoder
* fixups
* Nit
* fix red tests
* one last nit!
* make eveything a lot simpler
* Now it's over 😉
* few small nits
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* updates that work for now
* tests that should no be skipped / changed and fixed next
* fixup
* i am ashamed
* pushe the fix
* update
* fixups
* nits
* fix added_tokens_encoder
* fix canine test
* fix pegasus vocab
* fix transfoXL
* fixup
* whisper needs to be fixed for train new
* pegasus nits
* more pegasus fixes
* minor update
* better error message in failed test
* fix whisper failing test
* fix whisper failing test
* fix pegasus
* fixup
* fix **** pegasus
* reset things
* remove another file
* attempts to fix the strange custome encoder and offset
* nits here and there
* update
* fixup
* nit
* fix the whisper test
* nits nits
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* updates based on review
* some small update to potentially remove
* nits
* import rlu cache
* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
* move warning to `from_pretrained`
* update tests results now that the special tokens are always added
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
* Fix one BLIP arg not being optional, remove misspelled arg
* Remove the lxmert test overrides and just use the base test_saved_model_creation
* saved_model_creation fixes and re-enabling tests across the board
* Remove unnecessary skip
* Stop caching sinusoidal embeddings in speech_to_text
* Fix transfo_xl compilation
* Fix transfo_xl compilation
* Fix the conditionals in xglm
* Set the save spec only when building
* Clarify comment
* Move comment correctly
* Correct embeddings generation for speech2text
* Mark RAG generation tests as @slow
* Remove redundant else:
* Add comment to clarify the save_spec line in build()
* Fix size tests for XGLM at last!
* make fixup
* Remove one band_part operation
* Mark test_keras_fit as @slow
* Rework TF type hints to use | None instead of Optional[] for tf.Tensor
* Rework TF type hints to use | None instead of Optional[] for tf.Tensor
* Don't forget the imports
* Add the imports to tests too
* make fixup
* Refactor tests that depended on get_type_hints
* Better test refactor
* Fix an old hidden bug in the test_keras_fit input creation code
* Fix for the Deit tests
* Result of black 23.1
* Update target to Python 3.7
* Switch flake8 to ruff
* Configure isort
* Configure isort
* Apply isort with line limit
* Put the right black version
* adapt black in check copies
* Fix copies
* move generation_*.py src files into generation/*.py
* populate generation.__init__ with lazy loading
* move imports and references from generation.xxx.object to generation.object
* Support for Bart and LayoutLM, and partial support for XLNet
* Support for mbart
* A lot of new models supported
* Support for other models
* LayoutLM fix
* Use strings instead of classes