* chore: initial commit
Copied the torch implementation of regnets and started porting the code to tf step by step. Also introduced an output layer which was needed for regnets.
* chore: porting the rest of the modules to tensorflow
Did not change the documentation yet; still need to try the playground on the model.
* Fix initializations (#1)
* fix: code structure in few cases.
* fix: code structure to align tf models.
* fix: layer naming, bn layer still remains.
* chore: change default epsilon and momentum in bn.
* chore: styling nits.
* fix: cross-loading bn params.
* fix: regnet tf model, integration passing.
* add: tests for TF regnet.
* fix: code quality related issues.
* chore: added rest of the files.
* minor additions..
* fix: repo consistency.
* fix: regnet tf tests.
* chore: reorganize dummy_tf_objects for regnet.
* chore: remove checkpoint var.
* chore: remove unnecessary files.
* chore: run make style.
* Update docs/source/en/model_doc/regnet.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* chore: PR feedback I.
* fix: pt test. thanks to @ydshieh.
* New adaptive pooler (#3)
* feat: new adaptive pooler
Co-authored-by: @Rocketknight1
* chore: remove image_size argument.
Co-authored-by: matt <rocketknight1@gmail.com>
* Empty-Commit
* chore: remove image_size comment.
* chore: remove playground_tf.py
* chore: minor changes related to spacing.
* chore: make style.
* Update src/transformers/models/regnet/modeling_tf_regnet.py
Co-authored-by: amyeroberts <aeroberts4444@gmail.com>
* Update src/transformers/models/regnet/modeling_tf_regnet.py
Co-authored-by: amyeroberts <aeroberts4444@gmail.com>
* chore: refactored __init__.
* chore: copied from -> taken from./g
* adaptive pool -> global avg pool, channel check.
* chore: move channel check to stem.
* pr comments - minor refactor and add regnets to doc tests.
* Update src/transformers/models/regnet/modeling_tf_regnet.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* minor fix in the xlayer.
* Empty-Commit
* chore: removed from_pt=True.
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: matt <rocketknight1@gmail.com>
Co-authored-by: amyeroberts <aeroberts4444@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Add a TF in-graph tokenizer for BERT
* Add from_pretrained
* Add proper truncation, option handling to match other tokenizers
* Add proper imports and guards
* Add test, fix all the bugs exposed by said test
* Fix truncation of paired texts in graph mode, more test updates
* Small fixes, add a (very careful) test for savedmodel
* Add tensorflow-text dependency, make fixup
* Update documentation
* Update documentation
* make fixup
* Slight changes to tests
* Add some docstring examples
* Update tests
* Update tests and add proper lowercasing/normalization
* make fixup
* Add docstring for padding!
* Mark slow tests
* make fixup
* Fall back to BertTokenizerFast if BertTokenizer is unavailable
* Fall back to BertTokenizerFast if BertTokenizer is unavailable
* make fixup
* Properly handle tensorflow-text dummies
* Add CodeGen model
* Add missing key and switch order of super()
* Fix torch.ones init with uint8 instead of bool
* Address comments: copy statements and doc
* update tests
* remove old model parallel
* fix batch gen tests
* fix batch gen test
* update test_gpt2_sample_max_time
* fix codegen test and revert gpt2 test change
* Fix incorrect tie_word_embedding value, typo, URL
* Fix model order in README and styling
* Reorder model list alphabetically
* Set tie_word_embedding to False by default
* Apply suggestions from code review
* Better attn mask name & remove attn masked_bias
* add tokenizer for codegen
* quality
* doc tokenizer
* fix-copies
* add CodeGenTokenizer in converter
* make truncation optional
* add test for truncation
* add copyright
* fix-copies
* fix fast tokenizer decode
* Update src/transformers/models/codegen/tokenization_codegen.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* increase vocab_size in tests
Co-authored-by: patil-suraj <surajp815@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* few fixes:
- hardcode tokenizer padding side
- remove unused args
* few fixes:
- added new attribute on TokenizerTesterMixin
- added new slow test
- remove unused arg on tokenizer class
* make style
* Update src/transformers/models/bloom/tokenization_bloom_fast.py
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
* make quality
* apply changes
- remove new attribute
- redefine test on the class
* add comments
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
* Add final_layer_norm to OPT model
* Add JAX and TF version
* Fix Keras name
* Woops
* Allow for non breaking change
* Apply suggestions from code review
* add tests
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* rename to check_pt_flax_outputs
* update check_pt_flax_outputs
* use 5e-5 for BigBird PT/Flax test
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Prepare CI for v0.8.0
* pin hfh (revert before merge)
* Revert "pin hfh (revert before merge)"
This reverts commit a0103140e1.
* Test rc3
* Test latest rc
* Unpin to the RC
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
* Fix docstrings and variable names
* Rename x to something better
* Improve messages
* Fix docstrings and add test for greyscale images
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* Use torch.finfo(self.dtype).min
* for GPTNeoX
* for Albert
* For Splinter
* Update src/transformers/models/data2vec/modeling_data2vec_audio.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix -inf used in Bart-like models
* Fix a few remaining -inf
* more fix
* clean up
* For CLIP
* For FSMT
* clean up
* fix test
* Add dtype argument and use it for LayoutLMv3
* update FlaxLongT5Attention
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add new bloom classes
* (feat) add bloom classification tests; make style
* style: change import in test
* add some typehints to bloom classes
* merge main into branch
* fix: input checking in bloom seq classification
* fix tests
* change model class tests
* fix few tests
- more tests should pass
- one test left
* make token classifier return hidden states
* style: make BLOOM typehints consistent
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
* Initial commit
* Make some fixes
* Make PT model full forward pass
* Drop TF & Flax implementation, fix copies etc
* Add Flax model and update some corresponding stuff
* Drop some TF things
* Update config and flax local attn
* Add encoder_attention_type to config
* .
* Update docs
* Do some cleansing
* Fix some issues -> make style; add some docs
* Fix position_bias + mask addition + Update tests
* Fix repo consistency
* Fix model consistency by removing flax operation over attn_mask
* [WIP] Add PT TGlobal LongT5
* .
* [WIP] Add flax tglobal model
* [WIP] Update flax model to use the right attention type in the encoder
* Fix flax tglobal model forward pass
* Make the use of global_relative_attention_bias
* Add test suites for TGlobal model
* Fix minor bugs, clean code
* Fix pt-flax equivalence though not convinced with correctness
* Fix LocalAttn implementation to match the original impl. + update READMEs
* Few updates
* Update: [Flax] improve large model init and loading #16148
* Add ckpt conversion script according to #16853 + handle torch device placement
* Minor updates to conversion script.
* Typo: AutoModelForSeq2SeqLM -> FlaxAutoModelForSeq2SeqLM
* gpu support + dtype fix
* Apply some suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Remove (de)parallelize stuff
* Edit shape comments
* Update README.md
* make fix-copies
* Remove caching logic for local & tglobal attention
* Apply another batch of suggestions from code review
* Add missing checkpoints
* Format converting scripts
* Drop (de)parallelize links from longT5 mdx
* Fix converting script + revert config file change
* Revert "Remove caching logic for local & tglobal attention"
This reverts commit 2a619828f6ddc3e65bd9bb1725a12b77fa883a46.
* Stash caching logic in Flax model
* Make side relative bias used always
* Drop caching logic in PT model
* Return side bias as it was
* Drop all remaining model parallel logic
* Remove clamp statements
* Move test files to the proper place
* Update docs with new version of hf-doc-builder
* Fix test imports
* Make some minor improvements
* Add missing checkpoints to docs
* Make TGlobal model compatible with torch.onnx.export
* Replace some np.ndarray with jnp.ndarray
* Fix TGlobal for ONNX conversion + update docs
* fix _make_global_fixed_block_ids and masked neg value
* update flax model
* style and quality
* fix imports
* remove load_tf_weights_in_longt5 from init and fix copies
* add slow test for TGlobal model
* typo fix
* Drop obsolete is_parallelizable and one warning
* Update __init__ files to fix repo-consistency
* fix pipeline test
* Fix some device placements
* [wip]: Update tests -- need to generate summaries to update expected_summary
* Fix quality
* Update LongT5 model card
* Update (slow) summarization tests
* make style
* rename checkpoints
* finish
* fix flax tests
Co-authored-by: phungvanduy <pvduy23@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: patil-suraj <surajp815@gmail.com>
* Raise RepoNotFoundError in case of 401
* Include changes from revert-17646-skip_repo_not_found
* Add a comment
* 💄 Code quality
* 💚 Update `get_from_cache` test
* 💚 Code quality & skip failing test
* adding template
* update model
* model update
* update conf for debug model
* update conversion
* update conversion script
* update conversion script
* fix missing keys check
* add tests to test the tokenizer in the local machine
* Change variable name
* add tests on xnli dataset
* add more description
* add descriptions + clearer code
* clearer code
* adding new tests + skipping few tests because of env problems
* change comment
* add dtype on the configuration
* add test embeddings
* add hardcoded test
* fix dtype issue
* adding torch.float16 to config
* adding more metrics (min, max, mean)
* add sum
* now the test passes with almost equal
* add files for conversion - test passes on cpu gpu
* add final changes
* cleaning code
* add new args in the docstring
* fix one liner function
* remove macros
* remove forward attention
* clean up init function
* add comments on the issue
* rm scale mask softmax
* do make style
* fix dtype in init
* fixing for loop on att probs
* fix style with black
* fix style + doc error
* fix and debug CI errors (docs + style)
* some updates
- change new operations
- finally add scaled softmax
- added new args in the config
* make use cache working
* add changes
- save sharded models
- final changes on the modeling script
* add changes
- comment on alibi
- add TODO on seq length
* test commit
- added a text to test the commit
Co-authored-by: thomasw21 <24695242+thomasw21@users.noreply.github.com>
* final changes
- attention mask change
- generation works on BS176b
Co-authored-by: thomasw21 <24695242+thomasw21@users.noreply.github.com>
* changes - model + conversion
* move to correct dir
* put ,
* few fixes
* fix tokenizer autodoc
* fix minor CI issues
* fix minor CI issues
* fix minor CI issues
* fix style issue
* fix minor import issues
* fix few issues
* remove def main on the test
* add require torch
* replace decorator with 'with'
* fix style
* change to bloom
* add quick fix tokenizer
* fix tokenizer file
* fix tokenizer
- merge tests
- small fixes
* fix import issue
* add bloom to readme
* fix consistency
* Update docs/source/en/model_doc/bloom.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply suggestions from code review
fix comment issues on file headers
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix doc issue
* small fix - modeling test
* some changes
- refactor some code
- taking into account reviews
- more tests should pass
- removed pruning tests
* remove useless division
* more tests should pass
* more tests should pass
* more tests should pass
* let's try this one
- add alibi offset
- remove all permutes to make the grad operations work
- fingers crossed
* refactor
- refactor code
- style changes
- add new threshold for test
* major changes
- change BLOOM to Bloom
- add quick doc on bloom.mdx
- move embeddings test on modeling test
* modify readme
* small fixes
* small fix
- better threshold for a test
* remove old test file from fetcher
* fix small typo
* major change
- change BloomLMHead to BloomForCausalLM
* remove onnx config
* major changes
- refactor the code
- remove asserts
- change tol for test
* make style
* small change
* adding a slow test + commenting old ones for now
* make style
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* make style
* fix duplicates
* cleaning comments on config
* clean a bit conversion file
* refactor a bit modeling file
* refactor tokenizer file
* fix tokenization test issue
* fix tokenization issue #2
* fix tokenization issue second try
* fix test issue
* make style + add suggestions
* change test fetcher
* try this one
- slow tests should pass
- fingers crossed
* possible final changes
* make style
* try fix padding side issue
* fix side
* fix padding issue
* fix ko-readme
* fix config auto
* cleaning modeling file
* keep bloom in caps in ko
* update config docs
* remove pretraining_pp
* remove model parallel
* update config
- add correct config files
* fix duplicates
* fix fetcher
* fix refactor issue
- remove divide function
* try to remove alibi
* small fixes
- fix alibi
- remove seq length
- refactor a bit the code
* put correct values
- fix bos and eos token ids
* fix attention mask loop
Co-authored-by: thomasw21 <24695242+thomasw21@users.noreply.github.com>
* small fixes:
- remove skip bias add
* small fixes
- fix typo in readme
- fix typos in config
* small changes
- remove a test
- add reconstruction test
- change config
* small changes
- change Scaled Softmax to BloomScaledSoftmax
* small fixes
- fix alibi dtype
* major changes
- removing explicit dtype when loading modules
- fixing test args (torch_dtype=auto)
- add docstring
* fix readmes
* major changes
- now bloom supports alibi shifting
- refactor a bit the code
- better test tolerance now
* refactor a bit
* refactor a bit
* put correct name on test
* change docstring
* small changes
- fix docstring modeling
- fix test tolerance
* fix small nit
- take dtype from tensors in the conversion script
* minor fix
- fix mdx issue
* minor fix
- change config docstring
* forward contrib credits from PR14084
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* apply modifications
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* resolve softmax upcast
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
* final changes modeling
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Merge commit 'd156898f3b9b2c990e5963f5030a7143d57921a2'
* merge commit
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* apply suggestions
Apply suggestions from Stas comments
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Fix gradient checkpointing
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* add slow but exact
* add accelerate compatibility
Co-authored-by: Nicolas Patry <Narsil@users.noreply.github.com>
* forward contrib credits
Co-authored-by: thomasw21 <thomasw21@users.noreply.github.com>
Co-authored-by: sgugger <sgugger@users.noreply.github.com>
Co-authored-by: patrickvonplaten <patrickvonplaten@users.noreply.github.com>
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
Co-authored-by: LysandreJik <LysandreJik@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix torch device on tests
* make style
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix nits
Co-authored-by: patrickvonplaten <patrickvonplaten@users.noreply.github.com>
* remove final nits
* fix doc
- add more details on the doc
- add links to checkpoints
* Update src/transformers/__init__.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* apply suggestions
Co-authored-by: sgugger <sgugger@users.noreply.github.com>
* put test torchscript to false
* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: justheuristic <justheuristic@gmail.com>
* fix alibi
- create alibi only once
* add small doc
* make quality
* replace torch.nn
* remove token type emb
* fix fused op + output bias
* add fused op
- now can control fused operation from config
* remove fused op
* make quality
* small changes
- remove unused args on config
- removed bias gelu file
- make the model torchscriptable
- add torchscript slow tests
* Update src/transformers/models/bloom/modeling_bloom.py
* fix slow
* make style
* add accelerate support
* add bloom to deepspeed tests
* minor changes
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* minor change
* slow tests pass
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update docs/source/en/model_doc/bloom.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* minor changes:
- change docstring
- add link to paper
Co-authored-by: Thomwolf <thomwolf@gmail.com>
Co-authored-by: Thomas Wolf <thomas@huggingface.co>
Co-authored-by: thomasw21 <24695242+thomasw21@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: sIncerass <sheng.s@berkeley.edu>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
Co-authored-by: Nicolas Patry <Narsil@users.noreply.github.com>
Co-authored-by: thomasw21 <thomasw21@users.noreply.github.com>
Co-authored-by: sgugger <sgugger@users.noreply.github.com>
Co-authored-by: patrickvonplaten <patrickvonplaten@users.noreply.github.com>
Co-authored-by: LysandreJik <LysandreJik@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: justheuristic <justheuristic@gmail.com>
Co-authored-by: Stas Bekman <stas@stason.org>
* feat: initial implementation of data2vec segmentation model in TF.
* chore: minor corrections to make the segmenter work.
* chore: removed unnecessary files.
* chore: add tests and other modifications.
* fix: loss computation for segmentation.
* chore: remove unused variable.
* chore: formatting.
* added a dummy adaptive pooling layer.
* removed unnecessary file.
* potentially add identifiers to layer names.
* fix: layer naming.
* chore: removed unnecessary print.
* Skipping unneeded test
* chore: add logging to debug tolerance.
* fix: segmentation tests for tfdata2vecvision
* chore: make style.
* fix: layer names, assertion to be resolved.
* Bumping test tolerance a bit
* chore: bump the tol in PT test.
Co-authored-by: matt <rocketknight1@gmail.com>
* added cbs to notebooks, made copy-paste error fix in generation_utils
* initial push for mctc model
* mctc feature extractor done
* added processor, tokenizer and their tests for MCTC. Have added an MCTC modeling test, adjusting model code accordingly.
* added processor, tokenizer and their tests for MCTC. Have added an MCTC modeling test, adjusting model code accordingly.
* passing attention, now struggling to figure out how attention masks make sense here
* works when excluding attention masks. ask later how one would integrate attention masks here
* bizarre configuration error (model prefix comes first in config dict json and messes up the order)
* all passing but bizarre config dict ordering issue when to_dict
* passing all major tests
* feature extraction, processor, tokenizer added & tests passing
* style & consistency & other logistical fixes
* copy paste fix
* model after feature extraction working
* committing final feature extraction results; need to fix normalization
* feature extraction passing tests; probably should add tests on the specific flashlight-copied functions?
* delete print ; format code a bit
* fixing tests
* passing major tests
* fixing styles
* completed tokenization test with real example; not sure if these values are entirely correct.
* last test fixes from local
* reverting accidentally included custom setup configs
* remove load tf weights; fix config error
* testing couldn't import featureextractor
* fix docs
* fix docs
* resolving comments
* style fixes
* style fixes
* Update to MCTCConv1dSubSampler
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* relposemb fixes
* conv1d name issue; expecting config fail with parentheses
* fix config issue
* fix config issue
* fix config issue
* change everything to MCTCT
* fixing naming change errors
* archive list
* copyrights and docs
* copyrights and docs
* copyrights and docs
* merge resolution
* move tests, fix to changed optionaldependency structure
* test directories changed
* fixing tests
* how to avoid tf tests?
* how to avoid tf tests?
* tests passing locally
* allow mctctprocessor imported any env
* allow mctctprocessor imported any env
* fixed second round of feedback, need to fix docs
* doc changes not being applied
* all fixed
* style fix
* feedback fixes
* fix copies and feature extraction style fix
* Update tests/models/visual_bert/test_modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* copy paste huggingface:main visual bert
* added eof newline to visual bert; all tests are passing otherwise
* fix slow tests by adding attention mask
* change model id to speechbrain
* make fix-copies
* fix readme unwanted deletes
* fixing readmes, make fix-copies
* consistent M-CTC-T naming
* Update src/transformers/models/mctct/__init__.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* all fixed but variable naming
* adjust double quotes
* fixed variable names
* copyright and mr quilter
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* correct slow tests
* make fix-copies
* Update src/transformers/models/mctct/configuration_mctct.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/mctct/configuration_mctct.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* m-ctc-t not mctct
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add method to call to_tf_dataset() with column inference
* Add test for dataset creation
* Add a default arg for data collator
* Fix test
* Fix call with non-dev version of datasets
* Test correct column removal too
* make fixup
* More tests to make sure we remove unwanted columns
* Fix test to avoid predicting on unbuilt models
* Fix test to avoid predicting on unbuilt models
* Fix test to remove unwanted head mask columns from inputs
* Stop pushing your debug breakpoints to the main repo of the $2bn company you work for
* Skip the test in convnext because no grouped conv support
* Drop bools from the dataset dict
* Make style
* Skip the training test for models whose input dicts don't give us labels
* Skip transformerXL in the test because it doesn't return a simple loss
* Skip TFTapas because of some odd NaN losses
* make style
* make fixup
* Add docstring
* fixup
* Update src/transformers/modeling_tf_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/modeling_tf_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/modeling_tf_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/modeling_tf_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/modeling_tf_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Remove breakpoint from tests
* Fix assert, add requires_backends
* Protect tokenizer import with if TYPE_CHECKING
* make fixup
* Add noqa, more fixup
* More rearranging for ~* aesthetics *~
* Adding defaults for shuffle and batch_size to match to_tf_dataset()
* Update src/transformers/modeling_tf_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add gated-silu to t5 architecture to support UL2
* Fix error message
* formatting
* formatting again
* refactor
* fix classnames in _init_weights
* remove is_gated
* add test
* fix test
* Try without the test?
* Add back the test.
* Improve error message.
Co-authored-by: Daniel Hesslow <daniel@lighton.ai>