* try to stylify using ruff
* might need to remove these changes?
* use ruf format andruff check
* use isinstance instead of type comparision
* use # fmt: skip
* use # fmt: skip
* nits
* soem styling changes
* update ci job
* nits isinstance
* more files update
* nits
* more nits
* small nits
* check and format
* revert wrong changes
* actually use formatter instead of checker
* nits
* well docbuilder is overwriting this commit
* revert notebook changes
* try to nuke docbuilder
* style
* fix feature exrtaction test
* remve `indent-width = 4`
* fixup
* more nits
* update the ruff version that we use
* style
* nuke docbuilder styling
* leve the print for detected changes
* nits
* Remove file I/O
Co-authored-by: charliermarsh
<charlie.r.marsh@gmail.com>
* style
* nits
* revert notebook changes
* Add # fmt skip when possible
* Add # fmt skip when possible
* Fix
* More ` # fmt: skip` usage
* More ` # fmt: skip` usage
* More ` # fmt: skip` usage
* NIts
* more fixes
* fix tapas
* Another way to skip
* Recommended way
* Fix two more fiels
* Remove asynch
Remove asynch
---------
Co-authored-by: charliermarsh <charlie.r.marsh@gmail.com>
* only dir not even init
* init
* tokenizer removed and reference of codegen added
* modeling file updated a lot remaining app_rotary_emb
* conversion script done
* conversion script fixed, a lot of factoring done and most tests pass
* added token_clf and extractive_QA_head
* integration tests pass
* flash attn tests pass!
* config done
* more docs in modeling file
* some style fix
* style and others
* doc test error fix
* more doc fix
* some attention fixes
* most fixes
* style and other fixes
* docs fix and config
* doc fix
* some comments
* conversion script updated
* conversion script updated
* Revert "conversion script updated"
This reverts commit e92378c54084ec0747041b113083d1746ecb6c7f.
* final comments
* add Phi to language_modeling.md
* edit phi.md file
* rebase and fix
* removed phi-1.5 example
* changed model_type from 'phi'->'mixformer-sequential'
* small change
* small change
* revert \small change
* changed mixformer-sequential->phi
* small change
* added phi-1.5 example instead of phi-1
* doc test might pass now
* rebase and small change
* added the dropout layer
* more fixes
* modified .md file
* very very small doc change
I'm adding accelerate as one of the libraries to install because otherwise when running the Trainer, the model errorr out with the error.
ImportError: Using the `Trainer` with `PyTorch` requires `accelerate>=0.20.1`: Please run `pip install transformers[torch]` or `pip install accelerate -U`
Further context:
1. I've tried this across different environments so I believe that the environment is not the issue.
2. I had the latest transformers library version running.
3. Typically even after install accelerate and import it, it wouldn't resolve the issue until I restart the notebook and try again.
* first raw commit
* still POC
* tentative convert script
* almost working speech encoder conversion scripts
* intermediate code for encoder/decoders
* add modeling code
* first version of speech encoder
* make style
* add new adapter layer architecture
* add adapter block
* add first tentative config
* add working speech encoder conversion
* base model convert works now
* make style
* remove unnecessary classes
* remove unecessary functions
* add modeling code speech encoder
* rework logics
* forward pass of sub components work
* add modeling codes
* some config modifs and modeling code modifs
* save WIP
* new edits
* same output speech encoder
* correct attention mask
* correct attention mask
* fix generation
* new generation logics
* erase comments
* make style
* fix typo
* add some descriptions
* new state
* clean imports
* add tests
* make style
* make beam search and num_return_sequences>1 works
* correct edge case issue
* correct SeamlessM4TConformerSamePadLayer copied from
* replace ACT2FN relu by nn.relu
* remove unecessary return variable
* move back a class
* change name conformer_attention_mask ->conv_attention_mask
* better nit code
* add some Copied from statements
* small nits
* small nit in dict.get
* rename t2u model -> conditionalgeneration
* ongoing refactoring of structure
* update models architecture
* remove SeamlessM4TMultiModal classes
* add tests
* adapt tests
* some non-working code for vocoder
* add seamlessM4T vocoder
* remove buggy line
* fix some hifigan related bugs
* remove hifigan specifc config
* change
* add WIP tokenization
* add seamlessM4T working tokenzier
* update tokenization
* add tentative feature extractor
* Update converting script
* update working FE
* refactor input_values -> input_features
* update FE
* changes in generation, tokenizer and modeling
* make style and add t2u_decoder_input_ids
* add intermediate outputs for ToSpeech models
* add vocoder to speech models
* update valueerror
* update FE with languages
* add vocoder convert
* update config docstrings and names
* update generation code and configuration
* remove todos and update config.pad_token_id to generation_config.pad_token_id
* move block vocoder
* remove unecessary code and uniformize tospeech code
* add feature extractor import
* make style and fix some copies from
* correct consistency + make fix-copies
* add processor code
* remove comments
* add fast tokenizer support
* correct pad_token_id in M4TModel
* correct config
* update tests and codes + make style
* make some suggested correstion - correct comments and change naming
* rename some attributes
* rename some attributes
* remove unecessary sequential
* remove option to use dur predictor
* nit
* refactor hifigan
* replace normalize_mean and normalize_var with do_normalize + save lang ids to generation config
* add tests
* change tgt_lang logic
* update generation ToSpeech
* add support import SeamlessM4TProcessor
* fix generate
* make tests
* update integration tests, add option to only return text and update tokenizer fast
* fix wrong function call
* update import and convert script
* update integration tests + update repo id
* correct paths and add first test
* update how new attention masks are computed
* update tests
* take first care of batching in vocoder code
* add batching with the vocoder
* add waveform lengths to model outputs
* make style
* add generate kwargs + forward kwargs of M4TModel
* add docstrings forward methods
* reformate docstrings
* add docstrings t2u model
* add another round of modeling docstrings + reformate speaker_id -> spkr_id
* make style
* fix check_repo
* make style
* add seamlessm4t to toctree
* correct check_config_attributes
* write config docstrings + some modifs
* make style
* add docstrings tokenizer
* add docstrings to processor, fe and tokenizers
* make style
* write first version of model docs
* fix FE + correct FE test
* fix tokenizer + add correct integration tests
* fix most tokenization tests
* make style
* correct most processor test
* add generation tests and fix num_return_sequences > 1
* correct integration tests -still one left
* make style
* correct position embedding
* change numbeams to 1
* refactor some modeling code and correct one test
* make style
* correct typo
* refactor intermediate fnn
* refactor feedforward conformer
* make style
* remove comments
* make style
* fix tokenizer tests
* make style
* correct processor tests
* make style
* correct S2TT integration
* Apply suggestions from Sanchit code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* correct typo
* replace torch.nn->nn + make style
* change Output naming (waveforms -> waveform) and ordering
* nit renaming and formating
* remove return None when not necessary
* refactor SeamlessM4TConformerFeedForward
* nit typo
* remove almost copied from comments
* add a copied from comment and remove an unecessary dropout
* remove inputs_embeds from speechencoder
* remove backward compatibiliy function
* reformate class docstrings for a few components
* remove unecessary methods
* split over 2 lines smthg hard to read
* make style
* replace two steps offset by one step as suggested
* nice typo
* move warnings
* remove useless lines from processor
* make generation non-standard test more robusts
* remove torch.inference_mode from tests
* split integration tests
* enrich md
* rename control_symbol_vocoder_offset->vocoder_offset
* clean convert file
* remove tgt_lang and src_lang from FE
* change generate docstring of ToText models
* update generate docstring of tospeech models
* unify how to deal withtext_decoder_input_ids
* add default spkr_id
* unify tgt_lang for t2u_model
* simplify tgt_lang verification
* remove a todo
* change config docstring
* make style
* simplify t2u_tgt_lang_id
* make style
* enrich/correct comments
* enrich .md
* correct typo in docstrings
* add torchaudio dependency
* update tokenizer
* make style and fix copies
* modify SeamlessM4TConverter with new tokenizer behaviour
* make style
* correct small typo docs
* fix import
* update docs and add requirement to tests
* add convert_fairseq2_to_hf in utils/not_doctested.txt
* update FE
* fix imports and make style
* remove torchaudio in FE test
* add seamless_m4t.md to utils/not_doctested.txt
* nits and change the way docstring dataset is loaded
* move checkpoints from ylacombe/ to facebook/ orga
* refactor warning/error to be in the 119 line width limit
* round overly precised floats
* add stereo audio behaviour
* refactor .md and make style
* enrich docs with more precised architecture description
* readd undocumented models
* make fix-copies
* apply some suggestions
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* correct bug from previous commit
* refactor a parameter allowing to clean the code + some small nits
* clean tokenizer
* make style and fix
* make style
* clean tokenizers arguments
* add precisions for some tests
* move docs from not_tested to slow
* modify tokenizer according to last comments
* add copied from statements in tests
* correct convert script
* correct parameter docstring style
* correct tokenization
* correct multi gpus
* make style
* clean modeling code
* make style
* add copied from statements
* add copied statements
* add support with ASR pipeline
* remove file added inadvertently
* fix docstrings seamlessM4TModel
* add seamlessM4TConfig to OBJECTS_TO_IGNORE due of unconventional markdown
* add seamlessm4t to assisted generation ignored models
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* initial commit
* add processor, add fuyu naming
* add draft processor
* fix processor
* remove dropout to fix loading of weights
* add image processing fixes from Pedro
* fix
* fix processor
* add basic processing fuyu test
* add documentation and TODO
* address comments, add tests, add doc
* replace assert with torch asserts
* add Mixins and fix tests
* clean imports
* add model tester, clean imports
* fix embedding test
* add updated tests from pre-release model
* Processor: return input_ids used for inference
* separate processing and model tests
* relax test tolerance for embeddings
* add test for logit comparison
* make sure fuyu image processor is imported in the init
* fix formattingh
* more formatting issues
* and more
* fixups
* remove some stuff
* nits
* update init
* remove the fuyu file
* Update integration test with release model
* Update conversion script.
The projection is not used, as confirmed by the authors.
* improve geenration
* Remove duplicate function
* Trickle down patches to model call
* processing fuyu updates
* remove things
* fix prepare_inputs_for_generation to fix generate()
* remove model_input
* update
* add generation tests
* nits
* draft leverage automodel and autoconfig
* nits
* fix dtype patch
* address comments, update READMEs and doc, include tests
* add working processing test, remove refs to subsequences
* add tests, remove Sequence classification
* processing
* update
* update the conversion script
* more processing cleanup
* safe import
* take out ModelTesterMixin for early release
* more cl;eanup
* more cleanup
* more cleanup
* and more
* register a buffer
* nits
* add postprocessing of generate output
* nits
* updates
* add one working test
* fix test
* make fixup works
* fixup
* Arthur's updates
* nits
* update
* update
* fix processor
* update tests
* passe more fixups
* fix
* nits
* don't import torch
* skip fuyu config for now
* fixup done
* fixup
* update
* oups
* nits
* Use input embeddings
* no buffer
* update
* styling processing fuyu
* fix test
* update licence
* protect torch import
* fixup and update not doctested
* kwargs should be passed
* udpates
* update the impofixuprts in the test
* protect import
* protecting imports
* protect imports in type checking
* add testing decorators
* protect top level import structure
* fix typo
* fix check init
* move requires_backend to functions
* Imports
* Protect types
---------
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre@huggingface.co>
* llm prompting guide
* updated code examples
* an attempt to fix the code example tests
* set seed in examples
* added a doctest comment
* added einops to the doc_test_job
* string formatting
* string formatting, again
* added the toc to slow_documentation_tests.txt
* minor list fix
* string formatting + pipe renamed
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* replaced max_length with max_new_tokens and updated the outputs to match
* minor formatting fix
* removed einops from circleci config
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <hi@lysand.re>
* removed einops and trust_remote_code parameter
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
* fix typos in idefics.md
Two typos found in reviewing this documentation.
1) max_new_tokens=4, is not sufficient to generate "Vegetables" as indicated - you will get only "Veget". (incidentally - some mention of how to select this value might be useful as it seems to change in each example)
2) inputs = processor(prompts, return_tensors="pt").to(device) as inputs need to be on the same device (as they are in all other examples on the page)
* Update idefics.md
Change device to cuda explicitly to match other examples
* add Bros boilerplate
* copy and pasted modeling_bros.py from official Bros repo
* update copyright of bros files
* copy tokenization_bros.py from official repo and update import path
* copy tokenization_bros_fast.py from official repo and update import path
* copy configuration_bros.py from official repo and update import path
* remove trailing period in copyright line
* copy and paste bros/__init__.py from official repo
* save formatting
* remove unused unnecessary pe_type argument - using only crel type
* resolve import issue
* remove unused model classes
* remove unnecessary tests
* remove unused classes
* fix original code's bug - layer_module's argument order
* clean up modeling auto
* add bbox to prepare_config_and_inputs
* set temporary value to hidden_size (32 is too low because of the of the
Bros' positional embedding)
* remove decoder test, update create_and_check* input arguemnts
* add missing variable to model tests
* do make fixup
* update bros.mdx
* add boilerate plate for no_head inference test
* update BROS_PRETRAINED_MODEL_ARCHIVE_LIST (add naver-clova-ocr prefix)
* add prepare_bros_batch_inputs function
* update modeling_common to add bbox inputs in Bros Model Test
* remove unnecessary model inference
* add test case
* add model_doc
* add test case for token_classification
* apply fixup
* update modeling code
* update BrosForTokenClassification loss calculation logic
* revert logits preprocessing logic to make sure logits have original shape
* - update class name
* - add BrosSpadeOutput
- update BrosConfig arguments
* add boilerate plate for no_head inference test
* add prepare_bros_batch_inputs function
* add test case
* add test case for token_classification
* update modeling code
* update BrosForTokenClassification loss calculation logic
* revert logits preprocessing logic to make sure logits have original shape
* apply masking on the fly
* add BrosSpadeForTokenLinking
* update class name
put docstring to the beginning of the file
* separate the logits calculation logic and loss calculation logic
* update logic for loss calculation so that logits shape doesn't change
when return
* update typo
* update prepare_config_and_inputs
* update dummy node initialization
* update last_hidden_states getting logic to consider when return_dict is False
* update box first token mask param
* bugfix: remove random attention mask generation
* update keys to ignore on load missing
* run make style and quality
* apply make style and quality of other codes
* update box_first_token_mask to bool type
* update index.md
* apply make style and quality
* apply make fix-copies
* pass check_repo
* update bros model doc
* docstring bugfix fix
* add checkpoint for doc, tokenizer for doc
* Update README.md
* Update docs/source/en/model_doc/bros.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update bros.md
* Update src/transformers/__init__.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/bros.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* apply suggestions from code review
* apply suggestions from code review
* revert test_processor_markuplm.py
* Update test_processor_markuplm.py
* apply suggestions from code review
* apply suggestions from code review
* apply suggestions from code review
* update BrosSpadeELForTokenClassification head name to entity linker
* add doc string for config params
* update class, var names to more explicit and apply suggestions from code review
* remove unnecessary keys to ignore
* update relation extractor to be initialized with config
* add bros processor
* apply make style and quality
* update bros.md
* remove bros tokenizer, add bros processor that wraps bert tokenizer
* revert change
* apply make fix-copies
* update processor code, update itc -> initial token, stc -> subsequent token
* add type hint
* remove unnecessary condition branches in embedding forward
* fix auto tokenizer fail
* update docstring for each classes
* update bbox input dimension as standard 2 points and convert them to 4
points in forward pass
* update bros docs
* apply suggestions from code review : update Bros -> BROS in bros.md
* 1. box prefix var -> bbox
2. update variable names to be more explicit
* replace einsum with torch matmul
* apply style and quality
* remove unused argument
* remove unused arguments
* update docstrings
* apply suggestions from code review: add BrosBboxEmbeddings, replace
einsum with classical matrix operations
* revert einsum update
* update bros processor
* apply suggestions from code review
* add conversion script for bros
* Apply suggestions from code review
* fix readme
* apply fix-copies
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* intiial commit
* updates
* nits
* update conversion script
* update conversion script
* use path to load
* add tips etc
* some modeling logic
* modeling update
* more nits
* nits
* normal layer norm
* update config and doc
* nits
* update doc remove unused
* update
* fix inits and stuff
* fixup
* revert wrong changes
* updates
* more nits
* add default config values to the configuration file
* fixup happy
* update
* 2 tests left
* update readmes
* more nits
* slow test and more documentation
* update readme
* fix licences
* styling
* use fast if possible when saving tokenizer
* remove todo
* remove tokenization tests
* small last nits
* Apply suggestions from code review
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* nits to skip the timout doctest
* fix integration test
* fix test
* update eos token
* update to allow fast tokenization
* styling
* fix codeLlama as well for the update post processor
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add more copied from statements
* update
* doc passes doctest
* remove `# final layer norm?`
* change docstring prompot
* update
* Update README.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* don't doctest the conversion script as it requires more packages
* don't init a model in the config
* oups
* fix doctest
---------
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add all
* Revert "Delete .github directory"
This reverts commit 9b0ff7b052e2b20b629a26fb13606b78a42944d1.
* make conversion script backward compatible
* fixup
* more styling
* copy to llama changes
* fix repo consistency
* nits
* document correct classes
* updates
* more fixes
* nits
* update auto mappings
* add readmes
* smallupdates
* llama-code replace with llama_code
* make fixup
* updates to the testsing suite
* fix fast nits
* more small fixes
* fix decode
* fix template processing
* properly reset the normalizer
* nits processor
* tokenization tests pass
* styling
* last tests
* additional nits
* one test is left
* nits
Co-authored-by faabian <faabian@users.noreply.github.com>
* update failing test
* fixup
* remove decode infilling users should handle it on their onw after generation, padding can be a problem
* update
* make test slow and more meaningfull
* fixup
* doc update
* fixup
* Apply suggestions from code review
* add kwargs doc
* tokenizer requires `requires_backend`
* type requires_backends
* CodeLlama instead of LlamaCode
* more name cahnges
* nits
* make doctests happy
* small pipeline nits
* last nit
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* update
* add codellama to toctree
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Initial addition of t5forsequenceclassification
* Adding imports and adding tests
* Formatting
* Running make fix-copies
* Adding mt5forseq
* Formatting
* run make fix-copies
* Adding to docs
* Add model_parallel
* Fix bug
* Fix
* Remove TODO
* Fixing tests for T5ForSequenceClassification
* Undo changes to dependency_versions_table.py
* Change classification head to work with T5Config directly
* Change seq length to let tests pass
* PR comments for formatting
* Formatting
* Initial addition of UMT5ForSequenceClassification
* Adding to inits and formatting
* run make fix-copies
* Add doc for UMT5ForSeqClass
* Update UMT5 config
* Fix docs
* Skip torch fx test for SequenceClassification
* Formatting
* Add skip to UMT5 tests as well
* Fix umt5 tests
* Running make fix-copies
* PR comments
* Fix for change to sentence_representation
* Rename seq_len to hidden_size since that's what it is
* Use base_model to follow format of the rest of the library
* Update docs
* Extract the decoder_input_ids changes and make one liner
* Make one-liner
* pull and push updates
* add docs
* fix modeling
* Add and run test
* make copies
* add task
* fix tests and fix small issues
* Checks on a Pull Request
* fix docs
* add desc pvt.md
* Initial commit
* Update src/transformers/models/falcon/configuration_falcon.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/falcon/configuration_falcon.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Cleanup config docstring
* Update src/transformers/models/falcon/configuration_falcon.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Convert to relative imports
* Remove torch < 1.8 warning
* Restructure cos_sin header
* qkv -> query, key, value
* Refactor attention calculation
* Add a couple of config variables to account for the different checkpoints
* Successful merging of the code paths!
* Fix misplaced line in the non-parallel attention path
* Update config and tests
* Add a pad_token_id when testing
* Support output_attentions when alibi is None
* make fixup
* Skip KV cache shape test
* No more _keys_to_ignore_on_load_missing
* Simplify self attention a bit
* Simplify self attention a bit
* make fixup
* stash commit
* Some more attention mask updates
* Should pass all tests except assisted generation!
* Add big model generation test
* make fixup
* Add temporary workaround for test
* Test overrides for assisted generation
* Update src/transformers/models/falcon/modeling_falcon.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/falcon/modeling_falcon.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/falcon/modeling_falcon.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update tests/models/falcon/test_modeling_falcon.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Test overrides for assisted generation
* Add generation demo
* Update copyright
* Make the docstring model actually small
* Add module-level docstring
* Remove all assertions
* Add copied from bloom
* Reformat the QKV layer
* Add copied from bloom
* Update src/transformers/models/falcon/modeling_falcon.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Remove unused line and reformat
* No single letter variables
* Cleanup return names
* Add copied from line
* Remove the deprecated arguments blocks
* Change the embeddings test to an alibi on/off test
* Remove position_ids from FalconForQA
* Remove old check for token type IDs
* Fix the alibi path when multi_query is False
* Update src/transformers/models/falcon/modeling_falcon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/falcon/modeling_falcon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/falcon/test_modeling_falcon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update config naming
* Fix typo for new_decoder_architecture
* Add some comments
* Fix docstring
* Fix docstring
* Create range in the right dtype from the start
* Review comment cleanup
* n_head_kv -> num_kv_heads
* self.alibi -> self.use_alibi
* self.num_kv -> self.num_kv_heads
* Reorder config args
* Made alibi arguments Optional
* Add all model docstrings
* Add extra checkpoints
* Add author info for Falcon
* Stop removing token_type_ids because our checkpoints shouldn't return it anymore
* Add one hopeful comment for the future
* Fix typo
* Update tests, fix cache issue for generation
* Use -1e9 instead of -inf to avoid float overflow
* Recompute the rotary embeddings much less often
* Re-enable disabled tests
* One final fix to attention mask calculation, and update tests
* Cleanup targeting falcon-40b equivalency
* Post-rebase docs update
* Update docstrings, especially in the config
* More descriptive variable names, and comments where we can't rename them
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* import torch before it is used
* style
Signed-off-by: byhsu <byhsu@linkedin.com>
---------
Signed-off-by: byhsu <byhsu@linkedin.com>
Co-authored-by: byhsu <byhsu@linkedin.com>
* First draft of RWKV-4
* Add support for generate
* Style post-rebase
* Properly use state
* Write doc
* Fix doc
* More math
* Add model to README, dummies and clean config
* Fix init
* multiple fixes:
- fix common tests
- fix configuraion default values
- add CI test for checking state computation
- fix some CI tests
* correct tokenizer
* some tweaks
- fix config docstring
- fix failing tests
* fix CI tests
- add output_attention / output_hidden_states
- override test_initialization
- fix failing CIs
* fix conversion script
- fix sharded case
- add new arguments
* add slow tests + more fixes on conversion script
* add another test
* final fixes
* change single name variable
* add mock attention mask for pipeline to work
* correct eos token id
* fix nits
* add checkpoints
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add `tie_word_embeddings` in docstring
* change tensor name
* fix final nits
* Trigger CI
---------
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>