* copy pytorch-t5
* init
* boom boom
* forward pass same
* make generation work
* add more tests
* make test work
* finish normal tests
* make fix-copies
* finish quality
* correct slow example
* correct slow test
* version table
* upload models
* Update tests/test_modeling_flax_t5.py
* correct incorrectly deleted line
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
* [run_clm.py] restore caching
* style
* [t5 doc] make the example work out of the box
This PR expands the training example to include the correct model type for the example to work, e.g. with `T5Model` this example will break.
* Update docs/source/model_doc/t5.rst
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* expand the other example
Co-authored-by: Suraj Patil <surajp815@gmail.com>
- Convert use of deprecated AutoModelWithLMHead to AutoModelForSeq2SeqLM
- Add newly required `truncation=True` to `tokenizer.encode` with `max_length`
This silences all warnings.
* [WIP] Add TFWav2Vec2Model
Work in progress for adding a tensorflow version of Wav2Vec2
* feedback changes
* small fix
* Test Feedback Round 1
* Add SpecAugment and CTC Loss
* correct spec augment mask creation
* docstring and correct copyright
* correct bugs
* remove bogus file
* finish tests correction
* del unnecessary layers
* Update src/transformers/models/wav2vec2/modeling_tf_wav2vec2.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* make style
* correct final bug
* Feedback Changes
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Start working on FlaxBart
* Create modeling_flax_bart.py
* Write FlaxBartAttention
* Add FlaxBartEncoderLayer
* Add FlaxBartDecoderLayer and some typing
* Add helepr function for FlaxBart
* shift_tokens_right
* _make_causal_mask
* _expand_mask
* Add PositionalEmbedding and fix init_std naming
* Add FlaxBartPretrainedModel
* Add FlaxBartEncoder
* Add FlaxBartEncoder
* Add FlaxBartEncoder among modules to be imported
* YET WE CANNOT INITIALIZE THAT!! :(
* Make BartEncoder working
Change BartEncoder to instance of nn.Module so far
* Add FlaxBartDecoder
* Add FlaxBartModel
* TODO to make model run -> Prepapre model inputs
* Resolve padding
* Add FlaxBartModel
* Add FlaxBartModel into importable modules
* Remove FlaxBartEncoder and FlaxBartDecoder from importable modules
* make style; not properly working
* make style; make quality not pass due to some import I left
* Remove TODO for padding_idx in nn.Embed so far
* Add FlaxBartForConditionalGeneration
* Incorporate Flax model output classes, i.e. return_dict
* Add another models and incorporate use_cache arg
* Add FlaxBartForSequenceClassification and FlaxBartForQuestionAnswering
* Incorporate use_cache arg from PyTorch implementation
* Add all necessary Flax output utils
* Add FlaxBartForCausalLM; not working yet'
* Add minor improvements; still lacks some functionality
* Update docs, src and tests
* Add support of FlaxBart to docs/source
* Fix some bugs in FlaxBart souce code
* Add some neccessary tests for FlaxBart models - jit_compilation not passing
* Fix tests and add test_head_masking
* Fix tests for @jax.jit computation
* Add test_head_masking
* Migrate FlaxBart tests from jax.numpy to numpy
* Remove FlaxBartForCausalLM
* Clean repo
* fix bart model weight structure
* Fix FlaxBartForSequenceClassification
Slicing is not possible to use below jit, therefore, selecting sentence
representation from hidden_states must be changed.
* Allow FlaxBartForSequenceClassification for testing pt_flax equivalence
* Allow testing for FlaxBartForQA for pt_flax equivalence
* Add a comment to FlaxBartForSequenceClassification + change noise from 1e-3 to 1e-6
* remove past_key_values
* remove inputs_mebeds and make input_ids required
* add position ids
* re-write attention layer
* fix dataclass
* fix pos embeds and attention output
* fix pos embeds
* expose encode method
* expose decode method
* move docstring to top
* add cache for causal attn layer
* remove head masking for now
* s2s greedy search first pass
* boom boom
* fix typos
* fix greedy generate for bart
* use encoder, decoder layers instead of num_hidden_layers
* handle encoder_outputs
* cleanup
* simplify decoding
* more clean-up
* typos
* Change header + add {decoder_,}position_ids into 2 models
* add BartConfig
* fix existing tests
* add encode, decode methods
* Fix shift_tokens_right for JIT compilation + clarify one condition
* fix decode
* encoder => encode
* simplify generate
* add tests for encode and decode
* style
* add tests for cache
* fix equivalence tests
* sample generate now works with seq2seq
* generation tests
* initialize dense layers
* docstring and cleanup
* quality
* remove get/set input_embeddings
* address Patricks suggestions
* decode for every model, remove encoder_outputs from call
* update tests accordingly
* decode returns only decoder outputs and logits
* fix arguments
* doc encode, decode methods
* correct base_model_prefix
* fix test for seq classif model
* fix docs
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* adding vit for flax
* added test for Flax-vit and some bug-fixes
* overrided methods where variable changes were necessary for flax_vit test
* added FlaxViTForImageClassification for test
* Update src/transformers/models/vit/modeling_flax_vit.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* made changes suggested in PR
* Adding jax-vit models for autoimport
* swapping num_channels and height,width dimension
* fixing the docstring for torch-like inputs for VIT
* add model to main init
* add docs
* doc, fix-copies
* docstrings
* small test fixes
* fix docs
* fix docstr
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* style
Co-authored-by: jayendra <jayendra@infocusp.in>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Squash all commits of modeling_detr_v7 branch into one
* Improve docs
* Fix tests
* Style
* Improve docs some more and fix most tests
* Fix slow tests of ViT, DeiT and DETR
* Improve replacement of batch norm
* Restructure timm backbone forward
* Make DetrForSegmentation support any timm backbone
* Fix name of output
* Address most comments by @LysandreJik
* Give better names for variables
* Conditional imports + timm in setup.py
* Address additional comments by @sgugger
* Make style, add require_timm and require_vision to testsé
* Remove train_backbone attribute of DetrConfig, add methods to freeze/unfreeze backbone
* Add png files to fixtures
* Fix type hint
* Add timm to workflows
* Add `BatchNorm2d` to the weight initialization
* Fix retain_grad test
* Replace model checkpoints by Facebook namespace
* Fix name of checkpoint in test
* Add user-friendly message when scipy is not available
* Address most comments by @patrickvonplaten
* Remove return_intermediate_layers attribute of DetrConfig and simplify Joiner
* Better initialization
* Scipy is necessary to get sklearn metrics
* Rename TimmBackbone to DetrTimmConvEncoder and rename DetrJoiner to DetrConvModel
* Make style
* Improve docs and add 2 community notebooks
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* Improve docs of DeiT and ViT, add community notebook
* Add gitignore for test_samples
* Add notebook with Trainer
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Added Big Bird Fast Tokenizer initial file
* style fixes
* flake fixes
* Added big bird fast tokenizer to init files
* Added big bird fast to Auto tokenization
* fix styles
* minor quality fixes
* Added initial test code
* Fix SpmConverter when precompiled_charsmap doesn't exist
* fixed post processor
* minor style fix
* minor fix input names
* Actually fix identity normalization
* style
* Added token type ids to fast tokenizer
* style
* flake fix
* fix copies
Co-authored-by: Anthony MOI <m.anthony.moi@gmail.com>
* Add the ImageClassificationPipeline
* Code review
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
* Have `load_image` at the module level
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
* add electra model to flax
* Remove Electra Next Sentence Prediction model added by mistake
* fix parameter sharing and loosen equality threshold
* fix styling issues
* add mistaken removen imports
* fix electra table
* Add FlaxElectra to automodels and fixe docs
* fix issues pointed out the PR
* fix flax electra to comply with latest changes
* remove stale class
* add copied from
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add flax roberta
* make style
* correct initialiazation
* modify model to save weights
* fix copied from
* fix copied from
* correct some more code
* add more roberta models
* Apply suggestions from code review
* merge from master
* finish
* finish docs
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
* Rebase with master
* Minor bug fix in docs
* Copy files from adding_luke_v2 and improve docs
* change the default value of use_entity_aware_attention to True
* remove word_hidden_states
* fix head models
* fix tests
* fix the conversion script
* add integration tests for the pretrained large model
* improve docstring
* Improve docs, make style
* fix _init_weights for pytorch 1.8
* improve docs
* fix tokenizer to construct entity sequence with [MASK] entity when entities=None
* Make fix-copies
* Make style & quality
* Bug fixes
* Add LukeTokenizer to init
* Address most comments by @patil-suraj and @LysandreJik
* rename _compute_extended_attention_mask to get_extended_attention_mask
* add comments to LukeSelfAttention
* fix the documentation of the tokenizer
* address comments by @patil-suraj, @LysandreJik, and @sgugger
* improve docs
* Make style, quality and fix-copies
* Improve docs
* fix docs
* add "entity_span_classification" task
* update example code for LukeForEntitySpanClassification
* improve docs
* improve docs
* improve the code example in luke.rst
* rename the classification layer in LukeForEntityClassification from typing to classifier
* add bias to the classifier in LukeForEntitySpanClassification
* update docs to use fine-tuned hub models in code examples of the head models
* update the example sentences
* Make style & quality
* Add require_torch to tokenizer tests
* Add require_torch to tokenizer tests
* Address comments by @sgugger and add community notebooks
* Make fix-copies
Co-authored-by: Ikuya Yamada <ikuya@ikuya.net>
* prep for deepspeed==0.3.16
* new version
* too soon
* support and test fp32 mode
* troubleshooting doc start
* workaround no longer needed
* add fp32 doc
* style
* cleanup, add tf32 note
* clarify
* release was made
* Adding `AutomaticSpeechRecognitionPipeline`.
- Because we added everything to enable this pipeline, we probably
should add it to `transformers`.
- This PR tries to limit the scope and focuses only on the pipeline part
(what should go in, and out).
- The tests are very specific for S2T and Wav2vec2 to make sure both
architectures are supported by the pipeline. We don't use the mixin for
tests right now, because that requires more work in the `pipeline`
function (will be done in a follow up PR).
- Unsure about the "helper" function `ffmpeg_read`. It makes a lot of
sense from a user perspective, it does not add any additional
dependencies (as in hard dependency, because users can always use their
own load mechanism). Meanwhile, it feels slightly clunky to have so much
optional preprocessing.
- The pipeline is not done to support streaming audio right now.
Future work:
- Add `automatic-speech-recognition` as a `task`. And add the
FeatureExtractor.from_pretrained within `pipeline` function.
- Add small models within tests
- Add the Mixin to tests.
- Make the logic between ForCTC vs ForConditionalGeneration better.
* Update tests/test_pipelines_automatic_speech_recognition.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Adding docs + main import + type checking + LICENSE.
* Doc style !.
* Fixing TYPE_HINT.
* Specifying waveform shape in the docs.
* Adding asserts + specify in the documentation the shape of the input
np.ndarray.
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Adding require to tests + move the `feature_extractor` doc.
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Initial support for upload to hub
* push -> upload
* Fixes + examples
* Fix torchhub test
* Torchhub test I hate you
* push_model_to_hub -> push_to_hub
* Apply mixin to other pretrained models
* Remove ABC inheritance
* Add tests
* Typo
* Run tests
* Install git-lfs
* Change approach
* Add push_to_hub to all
* Staging test suite
* Typo
* Maybe like this?
* More deps
* Cache
* Adapt name
* Quality
* MOAR tests
* Put it in testing_utils
* Docs + torchhub last hope
* Styling
* Wrong method
* Typos
* Update src/transformers/file_utils.py
Co-authored-by: Julien Chaumond <julien@huggingface.co>
* Address review comments
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Base move
* Examples reorganization
* Update references
* Put back test data
* Move conftest
* More fixes
* Move test data to test fixtures
* Update path
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Address review comments and clean
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Start writing BERT-Japanese doc
* Fix typo, Update toctree
* Modify model file to use comment for document, Add examples
* Clean bert_japanese by make style
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Split a big code block into two
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add prefix >>> to all lines in code blocks
* Clean bert_japanese by make fixup
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* First draft of deit
* More improvements
* Remove DeiTTokenizerFast from init
* Conversion script works
* Add DeiT to ViT conversion script
* Add tests, add head model, add support for deit in vit conversion script
* Update model checkpoint names
* Update image_mean and image_std, set resample to bicubic
* Improve docs
* Docs improvements
* Add DeiTForImageClassificationWithTeacher to init
* Address comments by @sgugger
* Improve feature extractors
* Make fix-copies
* Minor fixes
* Address comments by @patil-suraj
* All models uploaded
* Fix tests
* Remove labels argument from DeiTForImageClassificationWithTeacher
* Fix-copies, style and quality
* Fix tests
* Fix typo
* Multiple docs improvements
* More docs fixes
* Added documentation for data collator.
* Update docs/source/data_collator.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Added documentation for data collator.
* Added documentation for the data collator.
* Merge branch 'doc_DataCollator' of C:\Users\mahii\PycharmProjects\transformers with conflicts.
* Update documentation for the data collator.
* Update documentation for the data collator.
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Amna <A.A.Ahmad@student.tudelft.nl>
* Add a special tokenizer for CPM model
* make style
* fix
* Add docs
* styles
* cpm doc
* fix ci
* fix the overview
* add test
* make style
* typo
* Custom tokenizer flag
* Add REAMDE.md
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* make fairscale and deepspeed setup extras
* fix default
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* no reason not to ask for the good version
* update the CIs
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* AutoFeatureExtractor
* Init and first tests
* Tests
* Damn you gitignore
* Quality
* Defensive test for when not all backends are here
* Use pattern for Speech2Text models
* Documentation about loading a fast tokenizer within Transformers
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Squash all commits into one
* Update ViTFeatureExtractor to use image_utils instead of torchvision
* Remove torchvision and add Pillow
* Small docs improvement
* Address most comments by @sgugger
* Fix tests
* Clean up conversion script
* Pooler first draft
* Fix quality
* Improve conversion script
* Make style and quality
* Make fix-copies
* Minor docs improvements
* Should use fix-copies instead of manual handling
* Revert "Should use fix-copies instead of manual handling"
This reverts commit fd4e591bce.
* Place ViT in alphabetical order
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Added embeddings layer
* Added layoutlm layers, main model, maskedlm and token classification classes
* Added model classes to tf auto models
* Added model to PT to TF conversion script
* Added model to doc README
* Added tests
* Removed unused imports
* Added layoutlm model, test, and doc for sequence classification, and fix imports in __init__.py
* Made tests pass!
* Fixed typos in imports and docs
* Fixed a typo in embeddings layer
* Removed imports
* Fixed formatting issues, imports, tests
* Added layoutlm layers, main model, maskedlm and token classification classes
* Added model classes to tf auto models
* Added model to PT to TF conversion script
* Removed unused imports
* Added layoutlm model, test, and doc for sequence classification, and fix imports in __init__.py
* Made tests pass!
* Fixed typos in imports and docs
* Removed imports
* Fixed small formatting issues
* Removed duplicates import from main __init__.py
* Chnaged deafult arg to true for adding pooling layer to tf layoutlm
* Fixed formatting issues
* Style
* Added copied from to classes copied from bert
* Fixed doc strings examples to work with layoutlm inputs
* Removed PyTorch reference in doc strings example
* Added integration tests
* Cleaned up initialization file
* Updated model checkpoint identifiers
* Fixed imports
Co-authored-by: Amir Tahmasbi <amir@ehsai.ca>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* [doc] [testing] extend -k section
This PR adds more examples on using `pytest -k` - I always forget that I want to use `-k A OR B` when I want several tests - I keep trying AND and it doesn't match any.
* style
* pass hf optimizer and scheduler to deepspeed if not specified in ds config
* pass hf optimizer and scheduler to deepspeed if not specified in ds config
* update
* make init_deepspeed support config dict
* fix docstring formatting
* clean up trainer's comments
* add new tests
* fix type
* composit argparse doesn't work
* style
* add a new test, rename others
* document new functionality
* complete tests, add docs
* style
* correct level
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add new methods to the doc
* must tell DS we are using a non-native optimizer
* add protection against cpu_offload + HF optimizer combo
* fix the cli overrides
* sync docs + tests
* restore AdamW
* better docs
* need new version
* no longer needed
* remove outdate information
* refactor duplicated code
Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [WIP] Adding new parameter to `generate`: `max_time`.
Generation by tokens number is sometimes a bit clunky because we don't
know how many tokens are good enough or even how many tokens are in
the payload (for pipelines users for instance). This leads to hard
to understand behavior.
This PR proposes a new argument `max_time` which is a float of seconds
for the allowed time for `generate` to run on.
Ideally combinations of `max_tokens=None`, `max_time=2` could be used to
generate as many tokens as possible within time budget.
NB: Another possible approach consists of passing a callback to `generate`
putting the caller in charge of the actual decision of when to stop
generating tokens. It opens the door to 'which args should we pass'
to this callback. It's hard to imagine other use-cases for this
early stopping behavior than time (that are not already covered by
parameters of generate)
* Revamp with StoppingCriteria
* Removing deprecated mentions.
* Forgot arguments to stopping criteria.
* Readding max_length it's not just used as a stopping criteria.
* Default value for `stopping_criteria`.
* Address @patrickvonplaten comments.
- More docstrings
- Actual doc
- Include in global namespace
- Remove TF work.
* Put back `max_length` (deprecation different PR).
* Doc quality.
* Fixing old behavior without `stopping_criteria` but with `max_length`.
Making sure we don't break that in the future.
* Adding more tests for possible inconsistencies between
`max_length` and `stopping_criteria`.
* Fixing the torch imports.
* Create modeling_tf_dpr.py
* Add TFDPR
* Add back TFPegasus, TFMarian, TFMBart, TFBlenderBot
last commit accidentally deleted these 4 lines, so I recover them back
* Add TFDPR
* Add TFDPR
* clean up some comments, add TF input-style doc string
* Add TFDPR
* Make return_dict=False as default
* Fix return_dict bug (in .from_pretrained)
* Add get_input_embeddings()
* Create test_modeling_tf_dpr.py
The current version is already passed all 27 tests!
Please see the test run at :
https://colab.research.google.com/drive/1czS_m9zy5k-iSJbzA_DP1k1xAAC_sdkf?usp=sharing
* fix quality
* delete init weights
* run fix copies
* fix repo consis
* del config_class, load_tf_weights
They shoud be 'pytorch only'
* add config_class back
after removing it, test failed ... so totally only removing "use_tf_weights = None" on Lysandre suggestion
* newline after .. note::
* import tf, np (Necessary for ModelIntegrationTest)
* slow_test from_pretrained with from_pt=True
At the moment we don't have TF weights (since we don't have official official TF model)
Previously, I did not run slow test, so I missed this bug
* Add simple TFDPRModelIntegrationTest
Note that this is just a test that TF and Pytorch gives approx. the same output.
However, I could not test with the official DPR repo's output yet
* upload correct tf model
* remove position_ids as missing keys
* create modeling_tf_rag
* add tests for tf
* add tf tests
* revert wrong pt commit
* further refactor
* further refactor
* refactor
* Update modeling_tf_rag.py
- input_processing
- fix prepare_input_for_generation (mostly fix generate bug)
- bring back from_pretrained hack in order to test generate
* delete colab pieces of code
* Show case of greedy "generate"
Temporarily change from beam_search test to greedy_search test to show case that TF and PT do get equivalent output.
* cosmetic update
* correct typos
* update
* push some progress
* make easy check
* fix rag save from pretrained
* Update src/transformers/modeling_tf_utils.py
* remove commented out lines
* delete unnecessary lines
* add simple test case for nq_checkpoint
Add nq_checkpoint test to show that current version without hack still fails
* temporarily put ugly hack back again
* Add TFRagSequenceForGeneration!!
* __init__.py , import TFRagSequenceForGeneration
* Add TFRagSequence tests!
* rag init.py - add TFRagSequenceForGeneration
* fix from_pretrained
* fix prepare_inputs_for_generation
* Beam search for RagToken!
* minor clean up
* add tf.cast in TFRagModel
* More tf.cast
* Add all remaining tests (still have issues)
* delete all T5 related
* make style
* fix load weight prefix
* fix bart
* fix return_dict for tf_rag
make all tests pass .. Hooray
* fix some tests
* fix code quality
* fix qualtiy check
* finish tests tf rag
* add tf rag to docs
* remove TFT5 from docstring
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* remove TFT5 from docstring
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Delete outdated comments
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* improve doc strings
* add generative model classes
* fix adjust token logic
* refactor generate for TFRag
* using shape_list, not _get_shape
Co-authored-by: Julien Plu <plu.julien@gmail.com>
* axis=[1]->axis=1
* delete NEED_HELP comment
* improve readability
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* improve readability
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* improve readability
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Indicating model is in a developing state in docstrings
As suggested by Julien
* small last changes
* apply sylvains suggestions
* finish tf rag
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: patrickvonplaten <patrick@huggingface.co>
Co-authored-by: Julien Plu <plu.julien@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>