* moved `ctrl` to `Salesforce/ctrl`
redirects should theoretically work, but still updating those repo references for clarity
* Fixup
* Slow doc tests
* Add modeling file
---------
Co-authored-by: Lysandre <lysandre@huggingface.co>
* add Bros boilerplate
* copy and pasted modeling_bros.py from official Bros repo
* update copyright of bros files
* copy tokenization_bros.py from official repo and update import path
* copy tokenization_bros_fast.py from official repo and update import path
* copy configuration_bros.py from official repo and update import path
* remove trailing period in copyright line
* copy and paste bros/__init__.py from official repo
* save formatting
* remove unused unnecessary pe_type argument - using only crel type
* resolve import issue
* remove unused model classes
* remove unnecessary tests
* remove unused classes
* fix original code's bug - layer_module's argument order
* clean up modeling auto
* add bbox to prepare_config_and_inputs
* set temporary value to hidden_size (32 is too low because of the of the
Bros' positional embedding)
* remove decoder test, update create_and_check* input arguemnts
* add missing variable to model tests
* do make fixup
* update bros.mdx
* add boilerate plate for no_head inference test
* update BROS_PRETRAINED_MODEL_ARCHIVE_LIST (add naver-clova-ocr prefix)
* add prepare_bros_batch_inputs function
* update modeling_common to add bbox inputs in Bros Model Test
* remove unnecessary model inference
* add test case
* add model_doc
* add test case for token_classification
* apply fixup
* update modeling code
* update BrosForTokenClassification loss calculation logic
* revert logits preprocessing logic to make sure logits have original shape
* - update class name
* - add BrosSpadeOutput
- update BrosConfig arguments
* add boilerate plate for no_head inference test
* add prepare_bros_batch_inputs function
* add test case
* add test case for token_classification
* update modeling code
* update BrosForTokenClassification loss calculation logic
* revert logits preprocessing logic to make sure logits have original shape
* apply masking on the fly
* add BrosSpadeForTokenLinking
* update class name
put docstring to the beginning of the file
* separate the logits calculation logic and loss calculation logic
* update logic for loss calculation so that logits shape doesn't change
when return
* update typo
* update prepare_config_and_inputs
* update dummy node initialization
* update last_hidden_states getting logic to consider when return_dict is False
* update box first token mask param
* bugfix: remove random attention mask generation
* update keys to ignore on load missing
* run make style and quality
* apply make style and quality of other codes
* update box_first_token_mask to bool type
* update index.md
* apply make style and quality
* apply make fix-copies
* pass check_repo
* update bros model doc
* docstring bugfix fix
* add checkpoint for doc, tokenizer for doc
* Update README.md
* Update docs/source/en/model_doc/bros.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update bros.md
* Update src/transformers/__init__.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/bros.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* apply suggestions from code review
* apply suggestions from code review
* revert test_processor_markuplm.py
* Update test_processor_markuplm.py
* apply suggestions from code review
* apply suggestions from code review
* apply suggestions from code review
* update BrosSpadeELForTokenClassification head name to entity linker
* add doc string for config params
* update class, var names to more explicit and apply suggestions from code review
* remove unnecessary keys to ignore
* update relation extractor to be initialized with config
* add bros processor
* apply make style and quality
* update bros.md
* remove bros tokenizer, add bros processor that wraps bert tokenizer
* revert change
* apply make fix-copies
* update processor code, update itc -> initial token, stc -> subsequent token
* add type hint
* remove unnecessary condition branches in embedding forward
* fix auto tokenizer fail
* update docstring for each classes
* update bbox input dimension as standard 2 points and convert them to 4
points in forward pass
* update bros docs
* apply suggestions from code review : update Bros -> BROS in bros.md
* 1. box prefix var -> bbox
2. update variable names to be more explicit
* replace einsum with torch matmul
* apply style and quality
* remove unused argument
* remove unused arguments
* update docstrings
* apply suggestions from code review: add BrosBboxEmbeddings, replace
einsum with classical matrix operations
* revert einsum update
* update bros processor
* apply suggestions from code review
* add conversion script for bros
* Apply suggestions from code review
* fix readme
* apply fix-copies
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* intiial commit
* updates
* nits
* update conversion script
* update conversion script
* use path to load
* add tips etc
* some modeling logic
* modeling update
* more nits
* nits
* normal layer norm
* update config and doc
* nits
* update doc remove unused
* update
* fix inits and stuff
* fixup
* revert wrong changes
* updates
* more nits
* add default config values to the configuration file
* fixup happy
* update
* 2 tests left
* update readmes
* more nits
* slow test and more documentation
* update readme
* fix licences
* styling
* use fast if possible when saving tokenizer
* remove todo
* remove tokenization tests
* small last nits
* Apply suggestions from code review
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* nits to skip the timout doctest
* fix integration test
* fix test
* update eos token
* update to allow fast tokenization
* styling
* fix codeLlama as well for the update post processor
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add more copied from statements
* update
* doc passes doctest
* remove `# final layer norm?`
* change docstring prompot
* update
* Update README.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* don't doctest the conversion script as it requires more packages
* don't init a model in the config
* oups
* fix doctest
---------
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add FlaxClipTextModelWithProjection
This is necessary to support the Flax port of Stable Diffusion XL: fb6d705fb5/text_encoder_2/config.json (L3)
Co-authored-by: Martin Müller <martin.muller.me@gmail.com>
Co-authored-by: Juan Acevedo <juancevedo@gmail.com>
* Use FlaxCLIPTextModelOutput
* make fix-copies again
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Use `return_dict` for consistency with other uses.
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Fix docstring example.
* Add new model to FlaxCLIPTextModelTest
* Add to IGNORE_NON_AUTO_CONFIGURED list
* Fix naming convention.
---------
Co-authored-by: Martin Müller <martin.muller.me@gmail.com>
Co-authored-by: Juan Acevedo <juancevedo@gmail.com>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* init commit
* config updated also some modeling
* Processor and Model config combined
* extraction pipeline(upto before spectogram & mel_conditioner) added but not properly tested
* model loading successful!
* feature extractor done!
* FE can now be called from HF
* postprocessing added in fe file
* same as prev commit
* Pop2PianoConfig doc done
* cfg docs slightly changed
* fe docs done
* batched
* batched working!
* temp
* v1
* checking
* trying to go with generate
* with generate and model tests passed
* before rebasing
* .
* tests done docs done remaining others & nits
* nits
* LogMelSpectogram shifted to FeatureExtractor
* is_tf rmeoved from pop2piano/init
* import solved
* tokenization tests added
* minor fixed regarding modeling_pop2piano
* tokenizer changed to only return midi_object and other changes
* Updated paper abstract(Camera-ready version) (#2)
* more comments and nits
* ruff changes
* code quality fix
* sg comments
* t5 change added and rebased
* comments except batching
* batching done
* comments
* small doc fix
* example removed from modeling
* ckpt
* forward it compatible with fe and generation done
* comments
* comments
* code-quality fix(maybe)
* ckpts changed
* doc file changed from mdx to md
* test fixes
* tokenizer test fix
* changes
* nits done main changes remaining
* code modified
* Pop2PianoProcessor added with tests
* other comments
* added Pop2PianoProcessor to dummy_objects
* added require_onnx to modeling file
* changes
* update .md file
* remove extra line in index.md
* back to the main index
* added pop2piano to index
* Added tokenizer.__call__ with valid args and batch_decode and aligned the processor part too
* changes
* added return types to 2 tokenizer methods
* the PR build test might work now
* added backends
* PR build fix
* vocab added
* comments
* refactored vocab into 1 file
* added conversion script
* comments
* essentia version changed in .md
* comments
* more tokenizer tests added
* minor fix
* tests extended for outputs acc check
* small fix
---------
Co-authored-by: Jongho Choi <sweetcocoa@snu.ac.kr>
* add AutoModelForTextToSpeech class
* add TTS pipeline and tessting
* add docstrings to text_to_speech pipeline
* fix torch dependency
* corrector 'processor is None' case in Pipeline
* correct repo id
* modify text-to-speech -> text-to-audio
* remove processor
* rename text_to_speech pipelines files to text_audio
* add textToWaveform and textToSpectrogram instead of textToAudio classes
* update TTS pipeline to the bare minimum
* update tests TTS pipeline
* make style and erase useless import torch in TTS pipeline tests
* modify how to check if generate or forward in TTS pipeline
* remove unnecessary extra new lines
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* refactor input_texts -> text_inputs
* correct docstrings of TTS.__call__
* correct the shape of generated waveform
* take care of Bark tokenizer special case
* correct run_pipeline_test TTS
* make style
* update TTS docstrings
* address Sylvain nit refactors
* make style
* refactor into one liners
* correct squeeze
* correct way to test if forward or generate
* Update output audio waveform shape
* make style
* correct import
* modify how the TTS pipeline test if a model can generate
* align shape output of TTS pipeline with consistent shape
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Resolve typo in check_repo.py
* Specify encoding when opening modeling files
* Deprecate the OpenLlama architecture
* Add disclaimer pointing to Llama
I'm open to different wordings here
* Match the capitalisation of LLaMA
* first raw version of the bark integration
* working code on small models with single run
* add converting script from suno weights 2 hf
* many changes
* correct past_kv output
* working implementation for inference
* update the converting script according to the architecture changes
* add a working end-to-end inference code
* remove some comments and make small changes
* remove unecessary comment
* add docstrings and ensure no unecessary intermediary output during audio generation
* remove done TODOs
* make style + add config docstrings
* modification for batch inference support on the whole model
* add details to .generation_audio method
* add copyright
* convert EncodecModel from original library to transformers implementation
* add two class in order to facilitate model and sub-models loading from the hub
* add support of loading the whole model
* add BarkProcessor
* correct modeling according to processor output
* Add proper __init__ and auto support
* Add up-to-date copyright/license message
* add relative import instead of absolute
* cleaner head_dim computation
* small comment removal or changes
* more verbose LayerNorm init method
* specify eps for clearer comprehension
* more verbose variable naming in the MLP module
* remove unecessary BarkBlock parameter
* clearer code in the forward pass of the BarkBlock
* remove _initialize_modules method for cleaner code
* Remove unnecessary methods from sub-models
* move code to remove unnecessary function
* rename a variable for clarity and change an assert
* move code and change variable name for clarity
* remove unnecessary asserts
* correct small bug
* correct a comment
* change variable names for clarity
* remove asserts
* change import from absolute to relative
* correct small error due to comma missing + correct import
* Add attribute Bark config
* add first version of tests
* update attention_map
* add tie_weights and resize_token_embeddings for fineModel
* correct getting attention_mask in generate_text_semantic
* remove Bark inference trick
* leave more choices in barkProcessor
* remove _no_split_modules
* fixe error in forward of block and introduce clearer notations
* correct converting script with last changes
* make style + add draft bark.mdx
* correct BarkModelTest::test_generate_text_semantic
* add Bark in main README
* add dummy_pt_objects for Bark
* add missing models in the main init
* correct test_decoder_model_past_with_large_inputs
* disable torchscript test
* change docstring of BarkProcessor
* Add test_processor_bark
* make style
* correct copyrights
* add bark.mdx + make style, quality and consistency
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Remove unnecessary test method
* simply logic of a test
* Only check first ids for slow audio generation
* split full end-to-end generation tests
* remove unneccessary comment
* change submodel names for clearer naming
* remove ModuleDict from modeling_bark
* combine two if statements
* ensure that an edge misued won't happen
* modify variable name
* move code snippet to the right place (coarse instead of semantic)
* change BarkSemanticModule -> BarkSemanticModel
* align BarkProcessor with transformers paradigm
* correct BarkProcessor tests with last commit changes
* change _validate_voice_preset to an instance method instead of a class method
* tie_weights already called with post_init
* add codec_model config to configuration
* update bark modeling tests with recent BarkProcessor changes
* remove SubModelPretrainedModel + change speakers embeddings prompt type in BarkModel
* change absolute imports to relative
* remove TODO
* change docstrings
* add examples to docs and docstrings
* make style
* uses BatchFeature in BarkProcessor insteads of dict
* continue improving docstrings and docs + make style
* correct docstrings examples
* more comprehensible speaker_embeddings load/Save
* rename speaker_embeddings_dict -> speaker_embeddings
* correct bark.mdx + add bark to documentation_tests
* correct docstrings configuration_bark
* integrate last nit suggestions
* integrate BarkGeneration configs
* make style
* remove bark tests from documentation_tests.txt because timeout - tested manually
* add proper generation config initialization
* small bark.mdx documentation changes
* rename bark.mdx -> bark.md
* add torch.no_grad behind BarkModel.generate_audio()
* replace assert by ValueError in convert_suno_to_hf.py
* integrate a series of short comments from reviewer
* move SemanticLogitsProcessors and remove .detach() from Bark docs and docstrings
* actually remove SemanticLogitsProcessor from modeling_bark.oy
* BarkProcessor returns a single output instead of tuple + correct docstrings
* make style + correct bug
* add initializer_range to BarkConfig + correct slow modeling tests
* add .clone() to history_prompt.coarse_prompt to avoid modifying input array
* Making sure no extra "`" are present
* remove extra characters in modeling_bark.py
* Correct output if history_prompt is None
* remove TODOs
* remove ravel comment
* completing generation_configuration_bark.py docstrings
* change docstrings - number of audio codebooks instead of Encodec codebooks
* change 'bias' docstrings in configuration_bark.py
* format code
* rename BarkModel.generate_audio -> BarkModel.generate_speech
* modify AutoConfig instead of EncodecConfig in BarkConfig
* correct AutoConfig wrong init
* refactor BarkModel and sub-models generate_coarse, generate_fine, generate_text_semantic
* remove SemanticLogitsProcessor and replace it with SuppressTokensLogitsProcessor
* move nb_codebook related config arguments to BarkFineConfig
* rename bark.mdx -> bark.md
* correcting BarkModelConfig from_pretrained + remove keys_to_ignore
* correct bark.md with correct hub path
* correct code bug in bark.md
* correct list tokens_to_suppress
* modify Processor to load nested speaker embeddings in a safer way
* correct batch sampling in BarkFineModel.generate_fine
* Apply suggestions from code review
Small docstrings correction and code improvements
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* give more details about num_layers in docstrings
* correct indentation mistake
* correct submodelconfig order of docstring variables
* put audio models in alphabetical order in utils/check_repo.my
* remove useless line from test_modeling_bark.py
* makes BarkCoarseModelTest inherits from (ModelTesterMixin, GenerationTesterMixin, unittest.TestCase) instead of BarkSemanticModelTest
* make a Tester class for each sub-model instead of inheriting
* add test_resize_embeddings=True for Bark sub-models
* add Copied from transformers.models.gpt_neo.modeling_gpt_neo.GPTNeoSelfAttention._split_heads
* remove 'Copied fom Bark' comment
* remove unneccessary comment
* change np.min -> min in modeling_bark.py
* refactored all custom layers to have Bark prefix
* add attention_mask as an argument of generate_text_semantic
* refactor sub-models start docstrings to have more precise config class definition
* move _tied_weights_keys overriding
* add docstrings to generate_xxx in modeling_bark.py
* add loading whole BarkModel to convert_suno_to_hf
* refactor attribute and variable names
* make style convert_suno
* update bark checkpoints
* remove never entered if statement
* move bark_modeling docstrings after BarkPretrainedModel class definition
* refactor modeling_bark.py: kv -> key_values
* small nits - code refactoring and removing unecessary lines from _init_weights
* nits - replace inplace method by variable assigning
* remove *optional* when necessary
* remove some lines in generate_speech
* add default value for optional parameter
* Refactor preprocess_histories_before_coarse -> preprocess_histories
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* correct usage after refactoring
* refactor Bark's generate_xxx -> generate and modify docstrings and tests accordingly
* update docstrings python in configuration_bark.py
* add bark files in utils/documentation_test.txt
* correct docstrings python snippet
* add the ability to use parameters in the form of e.g coarse_temperature
* add semantic_max_new_tokens in python snippet in docstrings for quicker generation
* Reformate sub-models kwargs in BakModel.generate
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* correct kwargs in BarkModel.generate
* correct attention_mask kwarg in BarkModel.generate
* add tests for sub-models args in BarkModel.generate and correct BarkFineModel.test_generate_fp16
* enrich BarkModel.generate docstrings with a description of how to use the kwargs
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>