* Copy RoBERTa
* formatting
* implement RoBERTa with pre-layer normalization
* update test expectations
* add documentation
* add conversion script for DinkyTrain weights
* update checkpoint repo
Unfortunately, the original checkpoints assume a hacked RoBERTa model
* add RoBERTa-PreLayerNorm docs to toc
* run utils/check_copies.py
* lint files
* remove unused import
* fix check_repo wrongly reporting that a test is missing
* fix import error, caused by rebase
* run make fix-copies
* add RobertaPreLayerNormConfig to ROBERTA_EMBEDDING_ADJUSMENT_CONFIGS
* Fix documentation <Facebook> -> Facebook
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fixup: Fix documentation <Facebook> -> Facebook
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add missing Flax header
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* expected_slice -> EXPECTED_SLICE
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* update copies after rebase
* add missing copied from statements
* make fix-copies
* make prelayernorm explicit in code
* fix checkpoint path for the original implementation
* add flax integration tests
* improve docs
* update utils/documentation_tests.txt
* lint files
* Remove Copyright notice
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* make fix-copies
* Remove EXPECTED_SLICE calculation comments
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* generate from config mvp
* fix failing tests
* max_time test
* Load default gen config at model load time; Update docs
* further documentation; add tests
* adapt rag to the new structure
* handle models not instantiated with from_pretrained (like in tests)
* better default generation config
* add can_generate fn
* handle legacy use case of ad hoc model config changes
* initialize gen config from config in individual methods, if gen config is none
* fix _get_decoder_start_token_id when called outside GenerationMixin
* correct model config load order (set attr > model config > decoder config)
* update rag to match latest changes
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* load gen config from model config in model.from_pretrained
* fix can_generate fn
* handle generate calls without a previous from_pretrained (e.g. tests)
* add legacy behavior (and a warning)
* lower logger severity
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add templates for gpt-sw3
* Add templates for gpt-sw3
* Added sentencepiece tokenizer
* intermediate commit with many changes
* fixed conflicts
* Init commit for tokenization port
* Tokenization progress
* Remove fast tokenizer
* Clean up and rename spm.model -> spiece.model
* Remove TF -> PT conversion script template, Clean up Megatron -> PT script
* Optimize encode & decode performance
* added new attention
* added new attention
* attention for gpt-sw3 working
* attention good
* Cache is now working
* fixed attention mask so that it works with causal attention
* fixed baddbmm bug for cpu and caching
* updated config with correct parameters
* Refactor and leave optimizations as separate functions to avoid breaking expected functionality
* Fix special tokens mapping for both tokenizers
* cleaning up of code and comments
* HF compatible attention outputs
* Tokenizer now passing tests, add documentation
* Update documentation
* reverted back to base implementation after checking that it is identical to pretrained model
* updated gpt-sw3 config
* updated conversion script
* aligned parameters with gpt-sw3 config
* changed default scale_attn_by_inverse_layer_idx to true
* removed flag from conversion script
* added temporary model path
* reverted back to functioning convert script
* small changes to default config
* updated tests for gpt-sw3
* make style, make quality, minor cleanup
* Change local paths to testing online repository
* Change name: GptSw3 -> GPTSw3
* Remove GPTSw3TokenizerFast references
* Use official model repository and add more model sizes
* Added reference to 6.7b model
* Add GPTSw3DoubleHeadsModel to IGNORE_NON_AUTO_CONFIGURED, like GPT2DoubleHeadsModel
* Remove pointers to non-existing TFGPTSw3
* Add GPTSw3 to docs/_toctree.yml
* Remove TF artifacts from GPTSw3 in __init__ files
* Update READMEs with 'make fix-copies'
* Add 20b model to archive list
* Add documentation for GPT-Sw3
* Fix typo in documentation for GPT-Sw3
* Do 'make fix-copies' again after having updated docs
* Fix some typos in docs
* Update src/transformers/models/gpt_sw3/configuration_gpt_sw3.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/configuration_gpt_sw3.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/__init__.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/__init__.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/convert_megatron_to_pytorch.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/modeling_gpt_sw3.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update tests/models/gpt_sw3/test_tokenization_gpt_sw3.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/modeling_gpt_sw3.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/modeling_gpt_sw3.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Resolve comments from PR feedback
* Resolve more comments from PR feedback, also set use_cache=True in convert script
* Add '# Copied from' comments for GPTSw3 modeling
* Set 'is_parallelizable = False'
* Remove '# Copied from' where code was modified and add 'with x->y' when appropriate
* Remove parallelize in mdx
* make style, make quality
* Update GPTSw3Config default values and corresponding documentation
* Update src/transformers/models/gpt_sw3/tokenization_gpt_sw3.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/__init__.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Clean up and protect GPTSw3Tokenizer imports with is_sentencepiece_available
* Make style, make quality
* Add dummy object for GPTSw3Tokenizer via 'make fix-copies'
* make fix-copies
* Remove GPTSw3 modeling classes
* make style, make quality
* Add GPTSw3 auto-mappings for other GPT2 heads
* Update docs/source/en/model_doc/gpt-sw3.mdx
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/convert_megatron_to_pytorch.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/tokenization_gpt_sw3.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Remove old TODO-comment
* Add example usage to GPTSw3Tokenizer docstring
* make style, make quality
* Add implementation details and example usage to gpt-sw3.mdx
Co-authored-by: JoeyOhman <joeyoh@kth.se>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* read to load
* base functionality
* revert init
* fix dummy data
* moving right along
* moving right along
* finally
* cleanup
* pull out comment
* add test
* update docstring for main class
* flake comments and rewriting copies from `make repo-consistency`
* remove irrelevant differences/accidental spaces
* put copies back after space removals
* mid
* final test pass
* stray comment
* update test file
* update test file
* fixup
* black
* missed
* black missed one more
* style
* add doc update
* fix order of output class
* comment
* Revert "comment"
This reverts commit 03f86b6948.
* remove redundant function, and redundant reshape
* move change out of common
* style
* put common spaces back
* reorder kwargs in output
* doc style
* [WIP] Rework the pipeline tutorial
- Switch to `asr` instead of another NLP task.
- It also has simpler-to-understand results.
- Added a section with interaction with `datasets`.
- Added a section with writing a simple webserver.
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Addressing comments.
* Links.
* Fixing docs format.
* Adding pipeline_webserver to _toctree.
* Warning -> Tip warnings={true}.
* Fix link ?
* Links ?
* Fixing link, adding chunk batching.
* Oops.
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/pipeline_tutorial.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* biogpt initial commit
* updated init
* fix faster decoding with use_cache
* 1. fix input_ids and input_embeds with correct device
2. added _keys_to_ignore_on_load_missing
3. updated prepare_inputs_for_generation
* add activation_dropout and scale_embedding
* replace fsmt attention with bart attention
* added test
* run make fix-copies
* doc init and fix build
* updated README with proper information
* 1. added tips to docs
2. updated BioGptTokenizer func
* 1. added tokenizer test
2. refactor tokenizer
* make fixup
* add biogpt fairseq to hf converter
* updated layer names to be more similar to original checkpoints
* config update doc string and set defaults
* added "#copied" from bart model and updated doc strings
* enable model_input_names in tokenizer
* 1. positional embedding depending on attention_mask
2. added attention mask to prepare for generation
* added test to verify past and generation
* BioGptLMHeadModel -> BioGptForCausalLM
* fix typo
* tokenization and test
Copyright and updated assertion
* updated Copyright and
one func at time in line
* Copyright updates and
minor doc fix
* replace assertion with ValueError
* rm extra space
* added code syntax
* revert cmnt position change
* add tokenizer to auto
* updated doc string
* tokenizer doc string update
* biogpt hub model update to microsoft/biogpt
* make fixup
* rm cmnt to fix flake8 5.0.4 vs 6 error
* add minimal working gpt2 tokenizer
* graph mode and output equivalence tests working
* not today tensorflow. serialization test passing!
* fix style, documentation, docstrings and all that jazz
* passing consistency checks
* move keras nlp to tf dependencies
* fix tf modeling utils and gpt2 attention to enable compiling
* fix (I hope) keras nlp dependencies
* revert changes on generation
* remove debug prints
* remove redundant tf dummy objects
* add from config, get config and max length settings to address review
* let flake ignore the error on distillation you are welcome
* test from config
* add padding test
* address sgugger review
* Add Donut image processor
* Update src/transformers/image_transforms.py
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
* Fix docstrings
* Full var names in docstring
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
* First draft
* Fix backwards compatibility
* More fixes
* More fixes
* Make backbone more general
* Improve backbone
* Improve test
* Fix config checkpoint
* Address comments
* Use model_type
* Address more comments
* Fix special model names
* Remove MaskFormerSwinModel and MaskFormerSwinPreTrainedModel from main init
* Fix typo
* Update backbone
* Apply suggestion
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* First draft
* Make conversion script work
* Add id2label mapping, run code quality
* Fix copies
* Add first draft of feature extractor
* Update conversion script to use feature extractor
* Make more tests pass
* Add docs
* update input_features to input_values + pad by default to max length
* Fix doc tests
* Add feature extractor tests
* Add proper padding/truncation to feature extractor
* Add support for conversion of all audioset checkpoints
* Improve docs and extend conversion script
* Fix README
* Rename spectogram to spectrogram
* Fix copies
* Add integration test
* Remove dummy conv
* Update to ast
* Update organization
* Fix init
* Rename model to AST
* Add require_torchaudio annotator
* Move import of ASTFeatureExtractor under an is_speech_available check
* Fix rebase
* Add pipeline config
* Update name of classifier head
* Rename time_dimension and frequency_dimension for clarity
* Remove print statement
* Fix pipeline test
* Fix pipeline test
* Fix index table
* Fix init
* Fix conversion script
* Rename to ForAudioClassification
* Fix index table
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>