* add minimal working gpt2 tokenizer
* graph mode and output equivalence tests working
* not today tensorflow. serialization test passing!
* fix style, documentation, docstrings and all that jazz
* passing consistency checks
* move keras nlp to tf dependencies
* fix tf modeling utils and gpt2 attention to enable compiling
* fix (I hope) keras nlp dependencies
* revert changes on generation
* remove debug prints
* remove redundant tf dummy objects
* add from config, get config and max length settings to address review
* let flake ignore the error on distillation you are welcome
* test from config
* add padding test
* address sgugger review
* Add DiNAT
* Adds DiNAT + tests
* Minor fixes
* Added HF model
* Add natten to dependencies.
* Cleanup
* Minor fixup
* Reformat
* Optional NATTEN import.
* Reformat & add doc to _toctree
* Reformat (finally)
* Dummy objects for DiNAT
* Add NAT + minor changes
Adds NAT as its own independent model + docs, tests
Adds NATTEN to ext deps to ensure ci picks it up.
* Remove natten from `all` and `dev-torch` deps, add manual pip install to ci tests
* Minor fixes.
* Fix READMEs.
* Requested changes to docs + minor fixes.
* Requested changes.
* Add NAT/DiNAT tests to layoutlm_job
* Correction to Dinat doc.
* Requested changes.
* Try PT1.13 by removing torch scatter
* Skip failing tests
* Style
* Remove testing extras for repo utils
* Try with all decorators
* Try to wipe the cache
* Fix all tests?
* Try this way
* Fix comma
* Update to main
* Try with less deps
* Quality
* Change the import of kenlm from github to pypi
* Change the import of kenlm from github to pypi in circleci config
* Fix code quality issues
* Fix isort issue, add kenlm in extras for audio
* Add kenlm to deps
* Add kenlm to deps
* Commit 'make fixup' changes
* Remove version from kenlm deps
* commit make fixup changes
* Remove manual installation of kenlm
* Remove manual installation of kenlm
* Remove manual installation of kenlm
* add sudachipy and jumanpp tokenizers for bert_japanese
* use ImportError instead of ModuleNotFoundError in SudachiTokenizer and JumanppTokenizer
* put test cases of test_tokenization_bert_japanese in one line
* add require_sudachi and require_jumanpp decorator for testing
* add sudachi and pyknp(jumanpp) to dependencies
* remove sudachi_dict_small and sudachi_dict_full from dependencies
* empty commit for ci
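
The switch from `ModuleNotFoundError` to `ImportError` in the optional tokenizer backends can be illustrated with a minimal sketch. `ModuleNotFoundError` is a subclass of `ImportError`, so handling the parent class covers both a missing package and a package that fails while importing. The helper name `load_optional_backend` and its arguments are hypothetical and are not taken from the actual tokenizer code.

```python
import importlib


def load_optional_backend(module_name: str, install_hint: str):
    """Import an optional dependency, or raise ImportError with an install hint.

    Hypothetical helper for illustration only.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:  # also catches ModuleNotFoundError
        raise ImportError(
            f"{module_name} is required for this tokenizer. {install_hint}"
        ) from err
```

Raising the broader `ImportError` lets callers use a single `except ImportError` clause for every optional backend.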
* Poc to use safetensors
* Typo
* Final version
* Add tests
* Save with the right name!
* Update tests/test_modeling_common.py
Co-authored-by: Julien Chaumond <julien@huggingface.co>
* Support for sharded checkpoints
* Test from Hub part 1
* Test from hub part 2
* Fix regular checkpoint sharding
* Bump for fixes
Co-authored-by: Julien Chaumond <julien@huggingface.co>
* Tests conditional run
* Syntax
* Deps
* Try early exit
* Another way
* Test with no tests to run
* Test all
* Typo
* Try this way
* With tests to run
* Mostly finished
* Typo
* With a modification in one file only
* No change, no tests
* Final cleanup
* Address review comments
* First draft
* More improvements
* Improve model, add custom CUDA code
* Import torch before
* Add script that imports custom layer
* Add everything in new ops directory
* Import custom layer in modeling file
* Fix ARCHIVE_MAP typo
* Creating the custom kernel on the fly.
* Import custom layer in modeling file
* More improvements
* Fix CUDA loading
* More improvements
* Improve conversion script
* Improve conversion script
* Make it work until encoder_outputs
* Make forward pass work
* More improvements
* Make logits match original implementation
* Make implementation also support single_scale model
* Add support for single_scale and dilation checkpoint
* Add support for with_box_refine model
* Support also two stage model
* Improve tests
* Fix more tests
* Make more tests pass
* Upload all models to the hub
* Clean up some code
* Improve decoder outputs
* Rename intermediate hidden states and reference points
* Improve model outputs
* Move tests to dedicated folder
* Improve model outputs
* Fix retain_grad test
* Improve docs
* Clean up and make test_initialization pass
* Improve variable names
* Add copied from statements
* Improve docs
* Fix style
* Improve docs
* Improve docs, move tests to model folder
* Fix rebase
* Remove DetrForSegmentation from auto mapping
* Apply suggestions from code review
* Improve variable names and docstrings
* Apply some more suggestions from code review
* Apply suggestion from code review
* better docs and variable names
* hint to num_queries and two_stage confusion
* remove asserts and code refactor
* add exception if two_stage is True and with_box_refine is False
* use f-strings
* Improve docs and variable names
* Fix code quality
* Fix rebase
* Add require_torch_gpu decorator
* Add pip install ninja to CI jobs
* Apply suggestion of @sgugger
* Remove DeformableDetrForObjectDetection from auto mapping
* Remove DeformableDetrModel from auto mapping
* Add model to toctree
* Add model back to mappings, skip model in pipeline tests
* Apply @sgugger's suggestion
* Fix imports in the init
* Fix copies
* Add CPU implementation
* Comment out GPU function
* Undo previous change
* Apply more suggestions
* Remove require_torch_gpu annotator
* Fix quality
* Add logger.info
* Fix logger
* Fix variable names
* Fix initialization
* Add missing initialization
* Update checkpoint name
* Add model to doc tests
* Add CPU/GPU equivalence test
* Add Deformable DETR to pipeline tests
* Skip model for object detection pipeline
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
* Add a TF in-graph tokenizer for BERT
* Add from_pretrained
* Add proper truncation, option handling to match other tokenizers
* Add proper imports and guards
* Add test, fix all the bugs exposed by said test
* Fix truncation of paired texts in graph mode, more test updates
* Small fixes, add a (very careful) test for savedmodel
* Add tensorflow-text dependency, make fixup
* Update documentation
* Update documentation
* make fixup
* Slight changes to tests
* Add some docstring examples
* Update tests
* Update tests and add proper lowercasing/normalization
* make fixup
* Add docstring for padding!
* Mark slow tests
* make fixup
* Fall back to BertTokenizerFast if BertTokenizer is unavailable
* Fall back to BertTokenizerFast if BertTokenizer is unavailable
* make fixup
* Properly handle tensorflow-text dummies
* Migrate HFDeepSpeedConfig from trfrs to accelerate
* add `accelerate` to testing dep
* addressing comments
* addressing comments
Using `_shared_state` and avoiding object creation. This is necessary because `notebook_launcher` in `launchers.py` checks `len(AcceleratorState._shared_state)>0` to throw an error.
* resolving comments
1. Use simple API from accelerate to manage the deepspeed config integration
2. Update the related documentation
* reverting changes and addressing comments
* docstring correction
* addressing nits
* addressing nits
* addressing nits 3
* bumping up the accelerate version to 0.10.0
* resolving import
* update setup.py to include deepspeed dependencies
* Update dependency_versions_table.py
* fixing imports
* reverting changes to CI dependencies for "run_tests_pipelines_tf*" tests
These changes didn't help with resolving the failures and I believe this needs to be addressed in another PR.
* removing `accelerate` as hard dependency
Resolves issues related to CI Tests
* adding `accelerate` as dependency for building docs
resolves failure in Build PR Documentation test
* adding `accelerate` as dependency in "dev" to resolve doc build issue
* resolving comments
1. adding `accelerate` to extras["all"]
2. Including a check for accelerate too before importing HFDeepSpeedConfig from there
Co-Authored-By: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* resolving comments
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Initial work
* More or less finished with first draft
* Update src/transformers/modeling_utils.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Update src/transformers/modeling_utils.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Fix randomly initialized weights
* Update src/transformers/modeling_utils.py
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
* Address review comments
* Rename DeepSpeed folder to temporarily fix the test issue?
* Revert to try if Accelerate fix works
* Use latest Accelerate release
* Quality and fixes
* Style
* Quality
* Add doc
* Test + fix
* More blocks
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
- Adds auto_batch_size finder
- Moves training loop to an inner training loop
* [trainer / deepspeed] fix hyperparameter_search
* require optuna
* style
* oops
* add dep in the right place
* create deepspeed-testing dep group
* Trigger CI
* Updates the default branch from master to main
* Links from `master` to `main`
* Typo
* Update examples/flax/README.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add PT + TF automatic builds
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Wrap up
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Major changes to the fast CLIP tokenizer, which did not match the slow CLIP tokenizer
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [deepspeed] saving checkpoint fallback when fp16 weights aren't saved
* Bump required deepspeed version to match usage when saving checkpoints
* update version
Co-authored-by: Mihai Balint <balint.mihai@gmail.com>
* Enabling `tokenizers` upgrade.
* Moved ugly comment.
* Tokenizers==0.11.1 needs an update to keep the borrow checker happy in highly contiguous calls.
* Support both 0.11.1 and 0.11.0
* up
* up
* up
* make it cleaner
* correct
* make style
* add more tests
* finish
* small fix
* make style
* up
* try to solve circle ci
* up
* fix more tests
* fix more tests
* apply sylvain's suggestions
* fix import
* correct docs
* add pyctcdecode only to speech tests
* fix more tests
* add tf, flax and pt tests
* add pt
* fix last tests
* fix more tests
* Apply suggestions from code review
* change lines
* Apply suggestions from code review
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
* correct tests
* correct tests
* add doc string
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
* [deepspeed] zero inference
* only z3 makes sense for inference
* fix and style
* docs
* rework
* fix test
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* responding to suggestions
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add sigopt hpo to transformers.
Signed-off-by: Ding, Ke <ke.ding@intel.com>
* extend sigopt changes to test code and others..
Signed-off-by: Ding, Ke <ke.ding@intel.com>
* Style.
* fix style for sigopt integration.
Signed-off-by: Ding, Ke <ke.ding@intel.com>
* Add necessary information to run unittests on SigOpt.
Co-authored-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Correct outdated function signatures on website.
* Upgrade sphinx to 3.5.4 (latest 3.x)
* Test
* Test
* Test
* Test
* Test
* Test
* Revert unnecessary changes.
* Change sphinx version to 3.5.4
* Test python 3.7.11
* Create py.typed
This creates a [py.typed as per PEP 561](https://www.python.org/dev/peps/pep-0561/#packaging-type-information) that should be distributed to mark that the package includes (inline) type annotations.
* Update setup.py
Include py.typed as package data
* Update setup.py
Call `setup(...)` with `zip_safe=False`.
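
The three packaging steps above (create `py.typed`, ship it as package data, disable zipped installs) can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual `setup.py`: the helper names are hypothetical, and the returned dictionary is meant to be unpacked into a `setuptools.setup(...)` call.

```python
from pathlib import Path


def typed_package_setup_kwargs(package_name: str) -> dict:
    """Keyword arguments for setup() that ship the PEP 561 marker file.

    Hypothetical helper: `package_name` is whatever package holds py.typed.
    """
    return {
        # Include the marker so installed copies advertise inline type hints.
        "package_data": {package_name: ["py.typed"]},
        # Install unzipped so type checkers can read the files on disk.
        "zip_safe": False,
    }


def ensure_py_typed(package_dir: str) -> Path:
    """Create an empty py.typed marker inside the package directory."""
    marker = Path(package_dir) / "py.typed"
    marker.touch(exist_ok=True)
    return marker
```

A `setup.py` could then call `setup(name=..., **typed_package_setup_kwargs("my_package"))`, where `my_package` stands in for the real package name.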
* Base test
* More test
* Fix mistake
* Add a docstring change
* Add doc ignore
* Add changes
* Add recursive dep search
* Add recursive dep search
* save
* Finalize test mapping
* Fix bug
* Print prettier
* Ignore comments and empty lines
* Make script runnable from anywhere
* Need dev install
* Like that
* Adapt
* Add as artifact
* Try on torch tests
* Fix yaml error
* Install GitPython
* Apply everywhere
* Be more defensive
* Revert to all tests if something is wrong
* Install GitPython
* Test if there are tests before launching.
* Fixes
* Fixes
* Fixes
* Fixes
* Bash syntax is horrible
* Be less stupid
* Try differently
* Typo
* Typo
* Typo
* Style
* Better name
* Escape quotes
* Ignore black unhelpful re-formatting
* Not a docstring
* Deal with inits in dependency map
* Run all tests once PR is merged.
* Add last job
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Stronger dependencies gather
* Ignore empty lines too!
* Clean up
* Fix quality
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Clean push to hub API
* Create working dir if it does not exist
* Different tweak
* New API + all models + test Flax
* Adds the Trainer clean up
* Update src/transformers/file_utils.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Address review comments
* (nit) output types
* No need to set clone_from when folder exists
* Update src/transformers/trainer.py
Co-authored-by: Julien Chaumond <julien@huggingface.co>
* Add generated_from_trainer tag
* Update to new version
* Fixes
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* copy pytorch-t5
* init
* boom boom
* forward pass same
* make generation work
* add more tests
* make test work
* finish normal tests
* make fix-copies
* finish quality
* correct slow example
* correct slow test
* version table
* upload models
* Update tests/test_modeling_flax_t5.py
* correct incorrectly deleted line
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
* Squash all commits of modeling_detr_v7 branch into one
* Improve docs
* Fix tests
* Style
* Improve docs some more and fix most tests
* Fix slow tests of ViT, DeiT and DETR
* Improve replacement of batch norm
* Restructure timm backbone forward
* Make DetrForSegmentation support any timm backbone
* Fix name of output
* Address most comments by @LysandreJik
* Give better names for variables
* Conditional imports + timm in setup.py
* Address additional comments by @sgugger
* Make style, add require_timm and require_vision to tests
* Remove train_backbone attribute of DetrConfig, add methods to freeze/unfreeze backbone
* Add png files to fixtures
* Fix type hint
* Add timm to workflows
* Add `BatchNorm2d` to the weight initialization
* Fix retain_grad test
* Replace model checkpoints by Facebook namespace
* Fix name of checkpoint in test
* Add user-friendly message when scipy is not available
* Address most comments by @patrickvonplaten
* Remove return_intermediate_layers attribute of DetrConfig and simplify Joiner
* Better initialization
* Scipy is necessary to get sklearn metrics
* Rename TimmBackbone to DetrTimmConvEncoder and rename DetrJoiner to DetrConvModel
* Make style
* Improve docs and add 2 community notebooks
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* Added logic to return attention from flax-bert model and added test cases to check that
* Added new line at the end of file to test_modeling_flax_common.py
* fixing code style
* Fixing Roberta and Electra models too, since they copy from bert
* Added temporary hack to not run test_attention_outputs for FlaxGPT2
* Returning attention weights from GPT2 and changed the tests accordingly.
* last fixes
* bump flax dependency
Co-authored-by: jayendra <jayendra@infocusp.in>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* prep for deepspeed==0.3.16
* new version
* too soon
* support and test fp32 mode
* troubleshooting doc start
* workaround no longer needed
* add fp32 doc
* style
* cleanup, add tf32 note
* clarify
* release was made
* Add a special tokenizer for CPM model
* make style
* fix
* Add docs
* styles
* cpm doc
* fix ci
* fix the overview
* add test
* make style
* typo
* Custom tokenizer flag
* Add README.md
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* make fairscale and deepspeed setup extras
* fix default
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* no reason not to ask for the good version
* update the CIs
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Squash all commits into one
* Update ViTFeatureExtractor to use image_utils instead of torchvision
* Remove torchvision and add Pillow
* Small docs improvement
* Address most comments by @sgugger
* Fix tests
* Clean up conversion script
* Pooler first draft
* Fix quality
* Improve conversion script
* Make style and quality
* Make fix-copies
* Minor docs improvements
* Should use fix-copies instead of manual handling
* Revert "Should use fix-copies instead of manual handling"
This reverts commit fd4e591bce.
* Place ViT in alphabetical order
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* init
* first working test
* added todo for setup.py
* working test for single node multi node ddp and smd
* added tensorflow single node test
* added directory for pytorch and tensorflow due to different requirements.txt
* added directory for pytorch and tensorflow
* added comment for run_glue until it is available
* added output_dir to it
* smaller dataset to make test running faster
* adjust HP and script
* adjusted parameter for tensorflow
* refactored test scripts
* adjusted make file
* updated dlc container
* commented in all tests
* added both ecr images
* added new master branches
* debug
* added new datasets version
* init
* strange rebase bug
* removed changes
* changed min version for tests to work
* updated DLC
* added model parallel test
* removed test files
* removed test files
* tested with new dlc
* added correct sagemaker sdk version
* adjust DLCs for official one
* reworked tests
* quality
* removed default profile added documentation to it
* added step in release for sagemaker tests
* reverted version for example script removed duplicated script and added install from master to requirements.txt
* removed mistaken .DS_Stores from mac
* fixed tests
* added Sylvain's feedback
* make style
* added lysandre's feedback
* Apply black before checking copies
* Fix for class methods
* Deal with lonely brackets
* Remove debug and add forward changes
* Separate copies and fix test
* Add black as a test dependency
* Examples version update
* Refactor a bit
* All version updates
* Fixes
* README cleanup
* Post-release/patch
* Fixes
* More fixes
* Tests
* More fixes
* Moar fixes
* Make commands and update setup
* Replace spaces with weird tabs
* Fix test
* Style
* Tests run on Docker
Co-authored-by: Morgan <funtowiczmo@gmail.com>
* Comments from code review
* Reply to itself
* Dependencies
Co-authored-by: Morgan <funtowiczmo@gmail.com>
* Authorize last version of tokenizer
* Update version table
* Fix conversion of spm tokenizers and fix some hub links
* Bump tokenizers version to 0.10.1rc1
* Add script to check tokenizers conversion with XNLI
* Add some more mask_token lstrip support
* Must modify mask_token in slow tokenizers too
* Keep using the old method for Pegasus
* add missing import
Co-authored-by: Anthony MOI <m.anthony.moi@gmail.com>
* note on how to get to deps from shell
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix text
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Don't import libs to check they are available
* Don't import integrations at init
* Add importlib_metadata to deps
* Remove old vars references
* Avoid syntax error
* Adapt testing utils
* Try to appease torchhub
* Add dependency
* Remove more private variables
* Fix typo
* Another typo
* Refine the tf availability test
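
The "Don't import libs to check they are available" item above describes a general technique that can be sketched with the standard library. The function name `is_package_available` is an assumption made for illustration. The sketch only shows the idea of locating a top-level package without executing it, and is not the project's actual availability check.

```python
import importlib.util


def is_package_available(name: str) -> bool:
    """Return True if a top-level package can be located, without importing it.

    Hypothetical helper for illustration only.
    """
    # find_spec consults the import system's finders but does not run the
    # module's code, so heavy libraries are not loaded just to be detected.
    return importlib.util.find_spec(name) is not None
```

Deferring the real import until a feature is used keeps the cost of importing the top-level package low when many optional integrations are installed.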