* first raw commit
* still POC
* tentative convert script
* almost working speech encoder conversion scripts
* intermediate code for encoder/decoders
* add modeling code
* first version of speech encoder
* make style
* add new adapter layer architecture
* add adapter block
* add first tentative config
* add working speech encoder conversion
* base model convert works now
* make style
* remove unnecessary classes
* remove unecessary functions
* add modeling code speech encoder
* rework logics
* forward pass of sub components work
* add modeling codes
* some config modifs and modeling code modifs
* save WIP
* new edits
* same output speech encoder
* correct attention mask
* correct attention mask
* fix generation
* new generation logics
* erase comments
* make style
* fix typo
* add some descriptions
* new state
* clean imports
* add tests
* make style
* make beam search and num_return_sequences>1 works
* correct edge case issue
* correct SeamlessM4TConformerSamePadLayer copied from
* replace ACT2FN relu by nn.relu
* remove unecessary return variable
* move back a class
* change name conformer_attention_mask ->conv_attention_mask
* better nit code
* add some Copied from statements
* small nits
* small nit in dict.get
* rename t2u model -> conditionalgeneration
* ongoing refactoring of structure
* update models architecture
* remove SeamlessM4TMultiModal classes
* add tests
* adapt tests
* some non-working code for vocoder
* add seamlessM4T vocoder
* remove buggy line
* fix some hifigan related bugs
* remove hifigan specifc config
* change
* add WIP tokenization
* add seamlessM4T working tokenzier
* update tokenization
* add tentative feature extractor
* Update converting script
* update working FE
* refactor input_values -> input_features
* update FE
* changes in generation, tokenizer and modeling
* make style and add t2u_decoder_input_ids
* add intermediate outputs for ToSpeech models
* add vocoder to speech models
* update valueerror
* update FE with languages
* add vocoder convert
* update config docstrings and names
* update generation code and configuration
* remove todos and update config.pad_token_id to generation_config.pad_token_id
* move block vocoder
* remove unecessary code and uniformize tospeech code
* add feature extractor import
* make style and fix some copies from
* correct consistency + make fix-copies
* add processor code
* remove comments
* add fast tokenizer support
* correct pad_token_id in M4TModel
* correct config
* update tests and codes + make style
* make some suggested correstion - correct comments and change naming
* rename some attributes
* rename some attributes
* remove unecessary sequential
* remove option to use dur predictor
* nit
* refactor hifigan
* replace normalize_mean and normalize_var with do_normalize + save lang ids to generation config
* add tests
* change tgt_lang logic
* update generation ToSpeech
* add support import SeamlessM4TProcessor
* fix generate
* make tests
* update integration tests, add option to only return text and update tokenizer fast
* fix wrong function call
* update import and convert script
* update integration tests + update repo id
* correct paths and add first test
* update how new attention masks are computed
* update tests
* take first care of batching in vocoder code
* add batching with the vocoder
* add waveform lengths to model outputs
* make style
* add generate kwargs + forward kwargs of M4TModel
* add docstrings forward methods
* reformate docstrings
* add docstrings t2u model
* add another round of modeling docstrings + reformate speaker_id -> spkr_id
* make style
* fix check_repo
* make style
* add seamlessm4t to toctree
* correct check_config_attributes
* write config docstrings + some modifs
* make style
* add docstrings tokenizer
* add docstrings to processor, fe and tokenizers
* make style
* write first version of model docs
* fix FE + correct FE test
* fix tokenizer + add correct integration tests
* fix most tokenization tests
* make style
* correct most processor test
* add generation tests and fix num_return_sequences > 1
* correct integration tests -still one left
* make style
* correct position embedding
* change numbeams to 1
* refactor some modeling code and correct one test
* make style
* correct typo
* refactor intermediate fnn
* refactor feedforward conformer
* make style
* remove comments
* make style
* fix tokenizer tests
* make style
* correct processor tests
* make style
* correct S2TT integration
* Apply suggestions from Sanchit code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* correct typo
* replace torch.nn->nn + make style
* change Output naming (waveforms -> waveform) and ordering
* nit renaming and formating
* remove return None when not necessary
* refactor SeamlessM4TConformerFeedForward
* nit typo
* remove almost copied from comments
* add a copied from comment and remove an unecessary dropout
* remove inputs_embeds from speechencoder
* remove backward compatibiliy function
* reformate class docstrings for a few components
* remove unecessary methods
* split over 2 lines smthg hard to read
* make style
* replace two steps offset by one step as suggested
* nice typo
* move warnings
* remove useless lines from processor
* make generation non-standard test more robusts
* remove torch.inference_mode from tests
* split integration tests
* enrich md
* rename control_symbol_vocoder_offset->vocoder_offset
* clean convert file
* remove tgt_lang and src_lang from FE
* change generate docstring of ToText models
* update generate docstring of tospeech models
* unify how to deal withtext_decoder_input_ids
* add default spkr_id
* unify tgt_lang for t2u_model
* simplify tgt_lang verification
* remove a todo
* change config docstring
* make style
* simplify t2u_tgt_lang_id
* make style
* enrich/correct comments
* enrich .md
* correct typo in docstrings
* add torchaudio dependency
* update tokenizer
* make style and fix copies
* modify SeamlessM4TConverter with new tokenizer behaviour
* make style
* correct small typo docs
* fix import
* update docs and add requirement to tests
* add convert_fairseq2_to_hf in utils/not_doctested.txt
* update FE
* fix imports and make style
* remove torchaudio in FE test
* add seamless_m4t.md to utils/not_doctested.txt
* nits and change the way docstring dataset is loaded
* move checkpoints from ylacombe/ to facebook/ orga
* refactor warning/error to be in the 119 line width limit
* round overly precised floats
* add stereo audio behaviour
* refactor .md and make style
* enrich docs with more precised architecture description
* readd undocumented models
* make fix-copies
* apply some suggestions
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* correct bug from previous commit
* refactor a parameter allowing to clean the code + some small nits
* clean tokenizer
* make style and fix
* make style
* clean tokenizers arguments
* add precisions for some tests
* move docs from not_tested to slow
* modify tokenizer according to last comments
* add copied from statements in tests
* correct convert script
* correct parameter docstring style
* correct tokenization
* correct multi gpus
* make style
* clean modeling code
* make style
* add copied from statements
* add copied statements
* add support with ASR pipeline
* remove file added inadvertently
* fix docstrings seamlessM4TModel
* add seamlessM4TConfig to OBJECTS_TO_IGNORE due of unconventional markdown
* add seamlessm4t to assisted generation ignored models
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* initial commit
* add processor, add fuyu naming
* add draft processor
* fix processor
* remove dropout to fix loading of weights
* add image processing fixes from Pedro
* fix
* fix processor
* add basic processing fuyu test
* add documentation and TODO
* address comments, add tests, add doc
* replace assert with torch asserts
* add Mixins and fix tests
* clean imports
* add model tester, clean imports
* fix embedding test
* add updated tests from pre-release model
* Processor: return input_ids used for inference
* separate processing and model tests
* relax test tolerance for embeddings
* add test for logit comparison
* make sure fuyu image processor is imported in the init
* fix formattingh
* more formatting issues
* and more
* fixups
* remove some stuff
* nits
* update init
* remove the fuyu file
* Update integration test with release model
* Update conversion script.
The projection is not used, as confirmed by the authors.
* improve geenration
* Remove duplicate function
* Trickle down patches to model call
* processing fuyu updates
* remove things
* fix prepare_inputs_for_generation to fix generate()
* remove model_input
* update
* add generation tests
* nits
* draft leverage automodel and autoconfig
* nits
* fix dtype patch
* address comments, update READMEs and doc, include tests
* add working processing test, remove refs to subsequences
* add tests, remove Sequence classification
* processing
* update
* update the conversion script
* more processing cleanup
* safe import
* take out ModelTesterMixin for early release
* more cl;eanup
* more cleanup
* more cleanup
* and more
* register a buffer
* nits
* add postprocessing of generate output
* nits
* updates
* add one working test
* fix test
* make fixup works
* fixup
* Arthur's updates
* nits
* update
* update
* fix processor
* update tests
* passe more fixups
* fix
* nits
* don't import torch
* skip fuyu config for now
* fixup done
* fixup
* update
* oups
* nits
* Use input embeddings
* no buffer
* update
* styling processing fuyu
* fix test
* update licence
* protect torch import
* fixup and update not doctested
* kwargs should be passed
* udpates
* update the impofixuprts in the test
* protect import
* protecting imports
* protect imports in type checking
* add testing decorators
* protect top level import structure
* fix typo
* fix check init
* move requires_backend to functions
* Imports
* Protect types
---------
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre@huggingface.co>
* llm prompting guide
* updated code examples
* an attempt to fix the code example tests
* set seed in examples
* added a doctest comment
* added einops to the doc_test_job
* string formatting
* string formatting, again
* added the toc to slow_documentation_tests.txt
* minor list fix
* string formatting + pipe renamed
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* replaced max_length with max_new_tokens and updated the outputs to match
* minor formatting fix
* removed einops from circleci config
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <hi@lysand.re>
* removed einops and trust_remote_code parameter
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
* fix typos in idefics.md
Two typos found in reviewing this documentation.
1) max_new_tokens=4, is not sufficient to generate "Vegetables" as indicated - you will get only "Veget". (incidentally - some mention of how to select this value might be useful as it seems to change in each example)
2) inputs = processor(prompts, return_tensors="pt").to(device) as inputs need to be on the same device (as they are in all other examples on the page)
* Update idefics.md
Change device to cuda explicitly to match other examples
* add Bros boilerplate
* copy and pasted modeling_bros.py from official Bros repo
* update copyright of bros files
* copy tokenization_bros.py from official repo and update import path
* copy tokenization_bros_fast.py from official repo and update import path
* copy configuration_bros.py from official repo and update import path
* remove trailing period in copyright line
* copy and paste bros/__init__.py from official repo
* save formatting
* remove unused unnecessary pe_type argument - using only crel type
* resolve import issue
* remove unused model classes
* remove unnecessary tests
* remove unused classes
* fix original code's bug - layer_module's argument order
* clean up modeling auto
* add bbox to prepare_config_and_inputs
* set temporary value to hidden_size (32 is too low because of the of the
Bros' positional embedding)
* remove decoder test, update create_and_check* input arguemnts
* add missing variable to model tests
* do make fixup
* update bros.mdx
* add boilerate plate for no_head inference test
* update BROS_PRETRAINED_MODEL_ARCHIVE_LIST (add naver-clova-ocr prefix)
* add prepare_bros_batch_inputs function
* update modeling_common to add bbox inputs in Bros Model Test
* remove unnecessary model inference
* add test case
* add model_doc
* add test case for token_classification
* apply fixup
* update modeling code
* update BrosForTokenClassification loss calculation logic
* revert logits preprocessing logic to make sure logits have original shape
* - update class name
* - add BrosSpadeOutput
- update BrosConfig arguments
* add boilerate plate for no_head inference test
* add prepare_bros_batch_inputs function
* add test case
* add test case for token_classification
* update modeling code
* update BrosForTokenClassification loss calculation logic
* revert logits preprocessing logic to make sure logits have original shape
* apply masking on the fly
* add BrosSpadeForTokenLinking
* update class name
put docstring to the beginning of the file
* separate the logits calculation logic and loss calculation logic
* update logic for loss calculation so that logits shape doesn't change
when return
* update typo
* update prepare_config_and_inputs
* update dummy node initialization
* update last_hidden_states getting logic to consider when return_dict is False
* update box first token mask param
* bugfix: remove random attention mask generation
* update keys to ignore on load missing
* run make style and quality
* apply make style and quality of other codes
* update box_first_token_mask to bool type
* update index.md
* apply make style and quality
* apply make fix-copies
* pass check_repo
* update bros model doc
* docstring bugfix fix
* add checkpoint for doc, tokenizer for doc
* Update README.md
* Update docs/source/en/model_doc/bros.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update bros.md
* Update src/transformers/__init__.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/bros.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* apply suggestions from code review
* apply suggestions from code review
* revert test_processor_markuplm.py
* Update test_processor_markuplm.py
* apply suggestions from code review
* apply suggestions from code review
* apply suggestions from code review
* update BrosSpadeELForTokenClassification head name to entity linker
* add doc string for config params
* update class, var names to more explicit and apply suggestions from code review
* remove unnecessary keys to ignore
* update relation extractor to be initialized with config
* add bros processor
* apply make style and quality
* update bros.md
* remove bros tokenizer, add bros processor that wraps bert tokenizer
* revert change
* apply make fix-copies
* update processor code, update itc -> initial token, stc -> subsequent token
* add type hint
* remove unnecessary condition branches in embedding forward
* fix auto tokenizer fail
* update docstring for each classes
* update bbox input dimension as standard 2 points and convert them to 4
points in forward pass
* update bros docs
* apply suggestions from code review : update Bros -> BROS in bros.md
* 1. box prefix var -> bbox
2. update variable names to be more explicit
* replace einsum with torch matmul
* apply style and quality
* remove unused argument
* remove unused arguments
* update docstrings
* apply suggestions from code review: add BrosBboxEmbeddings, replace
einsum with classical matrix operations
* revert einsum update
* update bros processor
* apply suggestions from code review
* add conversion script for bros
* Apply suggestions from code review
* fix readme
* apply fix-copies
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* intiial commit
* updates
* nits
* update conversion script
* update conversion script
* use path to load
* add tips etc
* some modeling logic
* modeling update
* more nits
* nits
* normal layer norm
* update config and doc
* nits
* update doc remove unused
* update
* fix inits and stuff
* fixup
* revert wrong changes
* updates
* more nits
* add default config values to the configuration file
* fixup happy
* update
* 2 tests left
* update readmes
* more nits
* slow test and more documentation
* update readme
* fix licences
* styling
* use fast if possible when saving tokenizer
* remove todo
* remove tokenization tests
* small last nits
* Apply suggestions from code review
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* nits to skip the timout doctest
* fix integration test
* fix test
* update eos token
* update to allow fast tokenization
* styling
* fix codeLlama as well for the update post processor
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add more copied from statements
* update
* doc passes doctest
* remove `# final layer norm?`
* change docstring prompot
* update
* Update README.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* don't doctest the conversion script as it requires more packages
* don't init a model in the config
* oups
* fix doctest
---------
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add all
* Revert "Delete .github directory"
This reverts commit 9b0ff7b052e2b20b629a26fb13606b78a42944d1.
* make conversion script backward compatible
* fixup
* more styling
* copy to llama changes
* fix repo consistency
* nits
* document correct classes
* updates
* more fixes
* nits
* update auto mappings
* add readmes
* smallupdates
* llama-code replace with llama_code
* make fixup
* updates to the testsing suite
* fix fast nits
* more small fixes
* fix decode
* fix template processing
* properly reset the normalizer
* nits processor
* tokenization tests pass
* styling
* last tests
* additional nits
* one test is left
* nits
Co-authored-by faabian <faabian@users.noreply.github.com>
* update failing test
* fixup
* remove decode infilling users should handle it on their onw after generation, padding can be a problem
* update
* make test slow and more meaningfull
* fixup
* doc update
* fixup
* Apply suggestions from code review
* add kwargs doc
* tokenizer requires `requires_backend`
* type requires_backends
* CodeLlama instead of LlamaCode
* more name cahnges
* nits
* make doctests happy
* small pipeline nits
* last nit
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* update
* add codellama to toctree
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Initial addition of t5forsequenceclassification
* Adding imports and adding tests
* Formatting
* Running make fix-copies
* Adding mt5forseq
* Formatting
* run make fix-copies
* Adding to docs
* Add model_parallel
* Fix bug
* Fix
* Remove TODO
* Fixing tests for T5ForSequenceClassification
* Undo changes to dependency_versions_table.py
* Change classification head to work with T5Config directly
* Change seq length to let tests pass
* PR comments for formatting
* Formatting
* Initial addition of UMT5ForSequenceClassification
* Adding to inits and formatting
* run make fix-copies
* Add doc for UMT5ForSeqClass
* Update UMT5 config
* Fix docs
* Skip torch fx test for SequenceClassification
* Formatting
* Add skip to UMT5 tests as well
* Fix umt5 tests
* Running make fix-copies
* PR comments
* Fix for change to sentence_representation
* Rename seq_len to hidden_size since that's what it is
* Use base_model to follow format of the rest of the library
* Update docs
* Extract the decoder_input_ids changes and make one liner
* Make one-liner
* pull and push updates
* add docs
* fix modeling
* Add and run test
* make copies
* add task
* fix tests and fix small issues
* Checks on a Pull Request
* fix docs
* add desc pvt.md
* Initial commit
* Update src/transformers/models/falcon/configuration_falcon.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/falcon/configuration_falcon.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Cleanup config docstring
* Update src/transformers/models/falcon/configuration_falcon.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Convert to relative imports
* Remove torch < 1.8 warning
* Restructure cos_sin header
* qkv -> query, key, value
* Refactor attention calculation
* Add a couple of config variables to account for the different checkpoints
* Successful merging of the code paths!
* Fix misplaced line in the non-parallel attention path
* Update config and tests
* Add a pad_token_id when testing
* Support output_attentions when alibi is None
* make fixup
* Skip KV cache shape test
* No more _keys_to_ignore_on_load_missing
* Simplify self attention a bit
* Simplify self attention a bit
* make fixup
* stash commit
* Some more attention mask updates
* Should pass all tests except assisted generation!
* Add big model generation test
* make fixup
* Add temporary workaround for test
* Test overrides for assisted generation
* Update src/transformers/models/falcon/modeling_falcon.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/falcon/modeling_falcon.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/falcon/modeling_falcon.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update tests/models/falcon/test_modeling_falcon.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Test overrides for assisted generation
* Add generation demo
* Update copyright
* Make the docstring model actually small
* Add module-level docstring
* Remove all assertions
* Add copied from bloom
* Reformat the QKV layer
* Add copied from bloom
* Update src/transformers/models/falcon/modeling_falcon.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Remove unused line and reformat
* No single letter variables
* Cleanup return names
* Add copied from line
* Remove the deprecated arguments blocks
* Change the embeddings test to an alibi on/off test
* Remove position_ids from FalconForQA
* Remove old check for token type IDs
* Fix the alibi path when multi_query is False
* Update src/transformers/models/falcon/modeling_falcon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/falcon/modeling_falcon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/falcon/test_modeling_falcon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update config naming
* Fix typo for new_decoder_architecture
* Add some comments
* Fix docstring
* Fix docstring
* Create range in the right dtype from the start
* Review comment cleanup
* n_head_kv -> num_kv_heads
* self.alibi -> self.use_alibi
* self.num_kv -> self.num_kv_heads
* Reorder config args
* Made alibi arguments Optional
* Add all model docstrings
* Add extra checkpoints
* Add author info for Falcon
* Stop removing token_type_ids because our checkpoints shouldn't return it anymore
* Add one hopeful comment for the future
* Fix typo
* Update tests, fix cache issue for generation
* Use -1e9 instead of -inf to avoid float overflow
* Recompute the rotary embeddings much less often
* Re-enable disabled tests
* One final fix to attention mask calculation, and update tests
* Cleanup targeting falcon-40b equivalency
* Post-rebase docs update
* Update docstrings, especially in the config
* More descriptive variable names, and comments where we can't rename them
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* import torch before it is used
* style
Signed-off-by: byhsu <byhsu@linkedin.com>
---------
Signed-off-by: byhsu <byhsu@linkedin.com>
Co-authored-by: byhsu <byhsu@linkedin.com>
* First draft of RWKV-4
* Add support for generate
* Style post-rebase
* Properly use state
* Write doc
* Fix doc
* More math
* Add model to README, dummies and clean config
* Fix init
* multiple fixes:
- fix common tests
- fix configuraion default values
- add CI test for checking state computation
- fix some CI tests
* correct tokenizer
* some tweaks
- fix config docstring
- fix failing tests
* fix CI tests
- add output_attention / output_hidden_states
- override test_initialization
- fix failing CIs
* fix conversion script
- fix sharded case
- add new arguments
* add slow tests + more fixes on conversion script
* add another test
* final fixes
* change single name variable
* add mock attention mask for pipeline to work
* correct eos token id
* fix nits
* add checkpoints
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add `tie_word_embeddings` in docstring
* change tensor name
* fix final nits
* Trigger CI
---------
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* first draft - gives index error in question_answering.py
* maturing
* no labels
* pipeline should know about QA
* fixing checks
* formatting
* fixed docstring
* initial commit
* formatting
* adding the class to many places
* towards less unhappy checks
* nearly there
* and gpt neox for qa
* use right model
* forgot this one
* base_model_prefix is "gpt_neox" for GPTNeoX* models
* unnecessary stuff
* Update src/transformers/models/gpt_neox/modeling_gpt_neox.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* format
* Update src/transformers/models/gpt_neox/modeling_gpt_neox.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* removed gpt2 stuff
---------
Co-authored-by: Prof. Peter Schneider-Kamp <jps@ordbogen.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* first draft - gives index error in question_answering.py
* maturing
* no labels
* pipeline should know about QA
* fixing checks
* formatting
* fixed docstring
* initial commit
* formatting
* adding the class to many places
* towards less unhappy checks
* nearly there
* Update src/transformers/models/gpt_neo/modeling_gpt_neo.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* avoid error
* moving to device of star/end_logits
---------
Co-authored-by: Prof. Peter Schneider-Kamp <jps@ordbogen.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* first draft - gives index error in question_answering.py
* maturing
* no labels
* pipeline should know about QA
* fixing checks
* formatting
* fixed docstring
* make sure legacy code executes
* comment
* like this
---------
Co-authored-by: Prof. Peter Schneider-Kamp <jps@ordbogen.com>
Adds FocalNet by Microsoft to transformers
---------
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
Co-authored-by: alaradirik <alaradirik@gmail.com>
* resolve conflicts
* rebase and make style
* test
* test
* test
* rebase and make style
* rebase and make style
* tests
* tests
* rewrite some functions
* rebase and make style
* fix load_tf_weights_in_cpmant
* reformat some unrelated files
* upgrade quality
* fix some bugs & docstring
* add models and tests
* solve conflicts
* resolve conflicts
* resolve conflicts
* resolve conflicts
* resolve conflicts
* tests
* resolve conflicts
* resolve conflicts
* fix load_tf_weights_in_cpmant
* reformat some unrelated files
* upgrade quality
* fix some bugs & docstring
* save resolution
* make style
* delete redefinition code
* reformat function
* reformat
* resolve conflicts
* resolve conflicts
* resolve conflicts
* resolve conflicts
* resolve conflicts
* tests
* resolve conflicts
* resolve conflicts
* fix load_tf_weights_in_cpmant
* reformat some unrelated files
* upgrade quality
* resolve conflicts
* resolve conflicts
* resolve conflicts
* resolve conflicts
* resolve conflicts
* fix load_tf_weights_in_cpmant
* reformat some unrelated files
* upgrade quality
* resolve conflicts
* make style
* fix bugs and refactor
* modify docstrings and make style
* unify import format in __init__.py
* fix import-altclp bug
* fix copies to update index.md
* fix unused config parameters
* fix unused config parameters
* fix unused config parameters
* update README_ja.md
* dummy commit for unit test
* fix attention mask
* add CPMAntTokenizer&-Fast to auto-mapping
* drop redundant changes in README_ko
* fix defaults in docstring
* fix use_cache and some docstring
* add missing args in tokenizer
* modify tester inheritance
* add is_jieba_available
* fix some bugs
* make style and fix-copies
* add doctests
* skip integration tests
* add is_jieba_available
* fix bugs in common tests
* adjust docstrings and make style
* add argument docstring
* adjust code to some specifications
* make style and fix-copies
* add fast tokenization test
* dummy commit for unit test
* dummy commit for unit test
* dummy commit for unit test
* normalize some comments and names
* Bert->CPMAnt
* camel names and drop redundant codes
* make style and fix-coies
* add CpmTokenizerFast _import_structure
* drop cpmanttokenizerfast in model_doc
* fix some problems
* fix CPMAnt tokenization for common test
* make style and fixup
* fix copies and fixup
* fix bugs in tokenization test
* dummy commit for connection failure in unittest
* fix copies
* drop trailing comma
* fix decorator in tests
* dummy commit for connection failure in unittest
---------
Co-authored-by: Gong Baitao <gongbaitao11@gmail.com>
* Initial commit
* update modeling code
* update doc
* add functions necessary
* fix impotrs
* revert changes
* fixup
* more styling to get going
* remove standalone encoder
* update code
* styling
* fix config and model
* update code and some refactoring
* make more tests pass
* Adding NLLB-200 - MoE - 54.5B for no language left behind
Fixes#21300
* fix mor common tests
* styke
* update testing file
* update
* update
* Router2 doc
* update check config with sparse layer
* add dummy router
* update current conversion script
* create on the fly conversion script
* Fixup
* style
* style 2
* fix empty return
* fix return
* Update default config sparse layers
* easier to create sparse layers
* update
* update conversion script
* update modeling
* add to toctree
* styling
* make ruff happy
* update docstring
* update conversion script
* update, will break tests but impelemting top2
* update
* ❗local groups are supported here
* ⚠️ Support for local groups is now removed ⚠️
This is because it has to work with model parallelism that we do not support
* finish simplificaiton
* Fix forward
* style
* fixup
* Update modelling and test, refactoring
* update tests
* remove final layer)norm as it is done in the FF
* routing works! Logits test added
* nit in test
* remove top1router
* style
* make sure sparse are tested. Had to change route_tokens a liottle bit
* add support for unslip models when converting
* fixup
* style
* update test s
* update test
* REFACTOR
* encoder outputs match!
* style
* update testing
* 🎉encoder and decoder logits match 🎉
* styleing
* update tests
* cleanup tests
* fix router test and CIs
* cleanup
* cleanup test styling
* fix tests
* Finally the generation tests match!
* cleanup
* update test
* style testing file
* remove script
* cleanup
* more cleanup
* nits
* update
* NLLB tokenizer is wrong and will be fixed soon
* use LongTensors
* update tests
* revert some small changes
* fix second expert sampling and batch prioritized routing
* update tests
* finish last tests
* make ruff happy
* update
* ruff again
* style
* Update docs/source/en/model_doc/nllb-moe.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Updates based on review
* style and fix import issue
* nit
* more nits
* cleanup
* styling
* update test_seconde_expert_policy
* fix name
* last nit on the markdown examples
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add mega file structure and plain pytorch version of mega source code
* added config class with old naming conventions
* filled in mega documentation
* added config class and embeddings with optional token types
* updated notes
* starting the conversion process, deleted intermediate and added use_cache back to config
* renamed config attributes in modeling_mega.py
* checkpointing before refactoring incremental decoding functions
* removed stateful incremental key/values for EMA and self-attention
* refactored MovingAverageGatedAttention to remove stateful k/v history and use unified attention mask
* MovingAverageGatedAttention works with incremental decoding + past values, added sequence length enforcement
* more comments in MovingAverageGatedAttention + checkpointing before GatedCrossAttention
* bug fix in attention mask handling in MovingAverageGatedAttention
* removed incremental state from GatedCrossAttention and removed IncrementalState class
* finished gated cross attention and got MegaLayer working
* fixed causal masking in mega decoder
* fixed how padding and causal masks are passed through MegaLayer with and without k/v caching
* finished MegaModel; tested with encoder, decoder-only, and cross-attention type inputs; started work on downstream classes; removed mentions of position_ids
* added optional dense hidden layer for masked and causal LM classes
* docstring updates in MultiHeadEMA and GatedCrossAttention, removed unnecessary inputs in cross-attention
* removed before_attn_fn in Mega class and updated docstrings and comments up to there
* bug fix in MovingAverageGatedAttention masking
* working conversion of MLM checkpoint in scratchpad script -- perfect matches
* moved arg for hidden dense layer in LM head to config; discovered issue where from_pretrained is renaming gamma and beta parameters
* renamed gamma and beta parameters to avoid HF renaming when loading from checkpoint
* finished checkpoint conversion script
* cleanup old class in mega config script
* removed 'copied from' statements and passing integration tests
* added num_attention_heads=1 to config for integration compatibility, decoder tests working, generation tests failing
* fixed tuple output of megamodel
* all common tests passing after fixing issues in decoder, gradient retention, and initialization
* added mega-specific tests, ready for more documentation and style checks
* updated docstrings; checkpoint before style fixes
* style and quality checks, fixed initialization problem in float_tensor, ready for PR
* added mega to toctree
* removed unnecessary arg in megaconfig
* removed unused arg and fixed code samples with leftover roberta models
* Apply suggestions from code review
Applied all suggestions except the one renaming a class, as I'll need to update that througout
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fixed issue where .view breaks batch dimension, conversion script fixed with absolute imports, updated readme with Mega->MEGA
* removed asserts in Mega code, renamed sequencenorm, gatedcrossattention, and NFFN, replaced get_activation_fn with ACTFN, and added sequencenorm to layer norms
* reformatted .forward() docstrings to match style and removed unused mask input in cross-attention
* removed all reset_parameters() methods and rolled into MegaPreTrainedModel._init_weights()
* renamed all single-letter variables and improved readability in tensor size comments, Mega->MEGA in 2 documentation files
* variable names in NFFN
* manual Mega->MEGA changes in docs
* Mega->MEGA in config auto
* style and quality fixes
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* renamed parameters and variables with confusing names, added copied from statements, moved fft conv to its own method, other cleanup from PR comments
* commit before dealing with merge conflicts
* made new attention activation functions available in ACT2FN and added generation test from OPT
* style and quality in activations and tests
* documentation fixes, renaming variables in dropout and rotary positions, used built-in causal masking, encoders->layers in MegaModel, moved comments into docstrings
* style and quality fixes after latest updates, before rotary position ids
* causal mask in MegaBlock docstring + added missing device passing
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update README.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* added Mega prefixes where missing, reverted MegaSequenceNorm to if-else, other module renaming requested in PR
* style and quality fixes + readme updates pointing to main
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [Whisper] Add model for audio classification
* make fix-copies
* add to docs
* add docstring
* empty returns
* add code example
* switch to fleurs
* stick everything on one line
* zero shot object detection part 1
* added batch prediction section
* added image guided object detection section
* make style
* added the task guide to the TOC
* minor polishing
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
* added embedded owlvit demo
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* minor fix
* make style
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* config and tokenization(fast too) changed and ErnieEncoder added
* Slow Tokenization Added
* Tokenizer(slow) is now working and Fast Tokenizer removed
* Added Config code
* Added Base Model and utils
* ErnieMModel is now working
* All added except tests
* All tests passed except ErnieUIEM
* All tests passed
* all fixes done
* all fixes done
* fixed MAP
* fixed check_code_quality
* fixed Build PR Documentation issue
* Added changes(comments) and also updated to the latest upstream/main
* Added fixup
* Added # Copied comments
* Added fixup
* Added more comments and some nits
* Added fixup
* Fixed README_hd.md
* Added more fixes
* ErnieMTokenizer (being sentencepiece) protected and other docs edited
* Added code_quality fix
* Fixed for
* Added more fix
* modified AZ
* ernie-m tokenization test added!
* attention mask part fixed(with 0->self.config.pad_token_id)
* applied make fixup
* add: task guide on image cpationing.
* Empty commit to trigger CI
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* address additional comments from the PR.
* fix: wording.
* Update docs/source/en/tasks/image_captioning.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add X-MOD to Readme
* Add documentation for X-MOD
* Implement X-MOD
* Fix formatting of X-MOD docs
* Change signature of X-MOD forward methods to use lang_ids
* Minor changes
* Rebase with main and run make fix-copies
* Make suggested changes to docstrings
* Improve code readability
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Fix code style
* Conversion script: Remove asserts and type annotations
* Remove _TOKENIZER_FOR_DOC
* XMOD -> Xmod
* Update copyright note
* Fix doctests
* Fix docstring
* Add integration test for FillMaskPipeline
* Revert "Add integration test for FillMaskPipeline"
This reverts commit 4381eb3b1d0f5d85785f89caba83928e6efa6d1f.
* Add end-to-end integration test for mask fill
* make style
* Rebase with main and make fix-copies
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Result of black 23.1
* Update target to Python 3.7
* Switch flake8 to ruff
* Configure isort
* Configure isort
* Apply isort with line limit
* Put the right black version
* adapt black in check copies
* Fix copies
* initial commit. added tip placeholders and a script
* removed unused imports, fixed paths
* fixed generated links
* make style
* split language modeling doc into two: causal language modeling and masked language modeling
* added check_task_guides.py to make fix-copies
* review feedback addressed
* Fixed the following:
pipe -> pipeline
out in pipe(data()) is a list of dict, not a dict
* Fixed the TypeError: __init__() missing 1 required positional argument: 'key'
* Added a tip: code sample requires additional libraries to run
* Fixed custom config's name
* added seqeval to the required libraries
* fixed a missing dependency,
fixed metric naming,
added checkpoint to fix the datacollator
* added checkpoint to fix the datacollator,
added missing dependency
* wip: adding tf example for semantic segmentation guide
* completed the working example in tf
* make style
* Update docs/source/en/tasks/semantic_segmentation.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tasks/semantic_segmentation.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* fixed a callback doc links
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Added TF example for image classification
* Code style polishing
* code style polishing
* minor polishing
* fixed a link in a tip, and a typo in the inference TF content
* Apply Amy's suggestions from review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/tasks/image_classification.mdx
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* review feedback addressed
* make style
* added PushToHubCallback with save_strategy="no"
* minor polishing
* added PushToHubCallback with save_strategy=no
* minor polishing
* Update docs/source/en/tasks/image_classification.mdx
* added data augmentation
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* make style
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* docs: Fix typo in ONNX parser help: 'tolerence' => 'tolerance'
* docs: Resolve many typos in the English docs
Typos found via 'codespell ./docs/source/en'
* Update TF fine-tuning docs
* Fix formatting
* Add some section headers so the right sidebar works better
* Squiggly it
* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/training.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Explain things in the text, not the comments
* Make the two dataset creation methods into a list
* Move the advice about collation out of a <Tip>
* Edits for clarity
* Edits for clarity
* Edits for clarity
* Replace `to_tf_dataset` with `prepare_tf_dataset` in the fine-tuning pages
* Restructure the page a little bit
* Restructure the page a little bit
* Restructure the page a little bit
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>