* rename to check_pt_flax_outputs
* update check_pt_flax_outputs
* use 5e-5 for BigBird PT/Flax test
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
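A minimal sketch of the kind of comparison such a helper performs, with a tolerance like the 5e-5 mentioned for the BigBird PT/Flax test; the function name and signature here are illustrative, not the actual test utility.

```python
# Illustrative sketch only: compare PT and Flax outputs element-wise against a
# tolerance such as 5e-5. The name and signature are assumptions, not the real helper.
import numpy as np

def check_pt_flax_outputs(pt_output, flax_output, tol=5e-5, name="output"):
    pt_array = np.asarray(pt_output)
    flax_array = np.asarray(flax_output)
    max_diff = np.max(np.abs(pt_array - flax_array))
    assert max_diff <= tol, f"{name}: max difference {max_diff:.2e} exceeds tolerance {tol:.0e}"
```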
* Prepare CI for v0.8.0
* pin hfh (revert before merge)
* Revert "pin hfh (revert before merge)"
This reverts commit a0103140e1.
* Test rc3
* Test latest rc
* Unpin to the RC
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
* Fix docstrings and variable names
* Rename x to something better
* Improve messages
* Fix docstrings and add test for greyscale images
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* deduplication draft
* update style
* update style test
* dummy test main
* rename modules
* rename functions
* return extremes in deduplicate_clusters
* update style
* cast str for gzip
* update doc string
* time processing
* use dataset map to compute minhash
* fill value for short token
* remove the map method
* update style
* use shared object for multiprocessing
* update style
* use f-string and minor fix
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Co-authored-by: Loubna Ben Allal <44069155+loubnabnl@users.noreply.github.com>
* update style
* use module parameters
* change ds_dedup to ds_filter
* save ds_dedup
* mv test to script tests
* make jaccard threshold a parameter of deduplicate_dataset
* update style
* add doc strings
* update style
* add doc string for DuplicationIndex
* save files into data dir
* update readme
* Update examples/research_projects/codeparrot/README.md
Co-authored-by: Loubna Ben Allal <44069155+loubnabnl@users.noreply.github.com>
* make near deduplication optional
* move near deduplication in README
* Update examples/research_projects/codeparrot/README.md
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* use f-string
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Co-authored-by: Loubna Ben Allal <44069155+loubnabnl@users.noreply.github.com>
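The deduplication commits above describe MinHash-based near-deduplication with a configurable Jaccard threshold. Below is a hedged sketch of the general technique, assuming the `datasketch` library; the function names and parameters are illustrative and not the actual codeparrot script.

```python
# Hypothetical sketch of MinHash near-deduplication with a Jaccard threshold,
# using datasketch; not the actual codeparrot implementation.
from datasketch import MinHash, MinHashLSH

NUM_PERM = 256

def get_min_hash(tokens):
    # Build a MinHash signature from a document's tokens.
    m = MinHash(num_perm=NUM_PERM)
    for token in tokens:
        m.update(token.encode("utf-8"))
    return m

def deduplicate(documents, jaccard_threshold=0.85):
    # Keep only documents that have no near-duplicate already indexed in the LSH.
    lsh = MinHashLSH(threshold=jaccard_threshold, num_perm=NUM_PERM)
    kept = []
    for idx, doc in enumerate(documents):
        min_hash = get_min_hash(doc.split())
        if not lsh.query(min_hash):  # no cluster above the threshold so far
            lsh.insert(str(idx), min_hash)
            kept.append(doc)
    return kept
```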
On line 180, `torch.tensor(-1.0, xxx)` gives the error "TypeError: 'float' object cannot be interpreted as an integer"
This is because the dtype here is `int64`. For `dtype=int64`, this simply needs to be `-1`.
This impacts the long-t5-tglobal-x model. It does not impact the long-t5-local-x version, which does not appear to call this line.
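A small reproduction sketch of the report above; the tensor names are illustrative, the actual line lives in the LongT5 modeling code.

```python
# Sketch of the dtype mismatch described above (names are illustrative).
import torch

block_ids = torch.zeros(4, dtype=torch.int64)

# The report above says torch.tensor(-1.0, ...) with an int64 dtype raised:
#   TypeError: 'float' object cannot be interpreted as an integer
# Using the integer literal matches the tensor's int64 dtype:
fill_value = torch.tensor(-1, dtype=block_ids.dtype, device=block_ids.device)
```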
* Use torch.finfo(self.dtype).min
* for GPTNeoX
* for Albert
* For Splinter
* Update src/transformers/models/data2vec/modeling_data2vec_audio.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix -inf used in Bart-like models
* Fix a few remaining -inf
* more fix
* clean up
* For CLIP
* For FSMT
* clean up
* fix test
* Add dtype argument and use it for LayoutLMv3
* update FlaxLongT5Attention
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
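The commits above replace hard-coded `-inf`-style mask values with the minimum representable value of the active dtype, which avoids NaNs in fp16/bf16. A minimal sketch of that pattern, with assumed tensor names:

```python
# Hedged sketch of the masking pattern: use the dtype's most negative finite value
# instead of float("-inf") when masking attention scores.
import torch

def masked_fill_min(scores: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # mask: bool tensor, True at positions that should be ignored.
    min_value = torch.finfo(scores.dtype).min
    return scores.masked_fill(mask, min_value)
```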
* Added translation of installation.mdx to Portuguese, as well as default templates of _toctree.yml and _config.py
* [ build_documentation.yml ] - Updated doc_builder to build documentation in Portuguese.
[ pipeline_tutorial.mdx ] - Created translation for the pipeline_tutorial.mdx.
* [ build_pr_documentation.yml ] - Added pt language to pr_documentation builder.
[ pipeline_tutorial.mdx ] - Grammar changes.
* [ accelerate.mdx ] - Translated the accelerate tutorial to Portuguese.
* [ multilingual.mdx ] - Added Portuguese translation for the multilingual tutorial.
[ training.mdx ] - Added Portuguese translation for the training tutorial.
* [ preprocessing.mdx ] - WIP
* Update _toctree.yml
* Adding Pré-processamento to _toctree.yml
* Update accelerate.mdx
* Nits and remove the preprocessing file until it is ready
* [ index.mdx ] - Translated the index presentation page to Portuguese.
* [ docs/source/pt ] - Updated _toctree.yml to match newest translations.
* Fix build_pr_documentation.yml
* Fix index nits
* nits in _toctree
Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
* Fix eval to compute rouge correctly for rouge_score
* styling
* moving sentence tokenization to utils from run_eval
* saving ckpt in mlflow
* use existing format of args
* fix documentation
Co-authored-by: Swetha Mandava <smandava@nvidia.com>
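For context on the ROUGE fix above: `rouge_score` computes `rougeLsum` over sentences separated by newlines, so predictions and references are typically sentence-tokenized first. A hedged sketch of that preprocessing step, assuming NLTK; the helper name is illustrative and not the utility added in this PR.

```python
# Hedged sketch: put each sentence on its own line before scoring rougeLsum.
import nltk

nltk.download("punkt", quiet=True)

def add_newlines_between_sentences(text: str) -> str:
    return "\n".join(nltk.sent_tokenize(text.strip()))
```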
* Migrate HFDeepSpeedConfig from trfrs to accelerate
* add `accelerate` to testing dep
* addressing comments
* addressing comments
Using `_shared_state` and avoiding object creation. This is necessary as `notebook_launcher` in `launchers.py` checks `len(AcceleratorState._shared_state) > 0` to throw an error.
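A short illustration of the check described above, assuming accelerate's `AcceleratorState`; the helper function itself is hypothetical.

```python
# Hypothetical helper illustrating the check above: AcceleratorState is a Borg-style
# singleton, so a non-empty `_shared_state` means it has already been initialized,
# and reading the class attribute avoids creating a new instance as a side effect.
from accelerate.state import AcceleratorState

def accelerator_state_is_initialized() -> bool:
    return len(AcceleratorState._shared_state) > 0
```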
* resolving comments
1. Use the simple API from accelerate to manage the DeepSpeed config integration
2. Update the related documentation
* reverting changes and addressing comments
* docstring correction
* addressing nits
* addressing nits
* addressing nits 3
* bumping up the accelerate version to 0.10.0
* resolving import
* update setup.py to include deepspeed dependencies
* Update dependency_versions_table.py
* fixing imports
* reverting changes to CI dependencies for "run_tests_pipelines_tf*" tests
These changes didn't help with resolving the failures and I believe this needs to be addressed in another PR.
* removing `accelerate` as hard dependency
Resolves issues related to CI Tests
* adding `accelerate` as dependency for building docs
resolves failure in Build PR Documentation test
* adding `accelerate` as dependency in "dev" to resolve doc build issue
* resolving comments
1. adding `accelerate` to extras["all"]
2. Also check for accelerate before importing HFDeepSpeedConfig from there
Co-Authored-By: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* resolving comments
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* rembert: fix python codeblock
* rembert: use correct google/rembert checkpoint name in documentation
* rembert: use correct google/rembert checkpoint name in TF documentation
* add new bloom classes
* (feat) add bloom classification tests; make style
* style: change import in test
* add some typehints to bloom classes
* merge main into branch
* fix: input checking in bloom seq classification
* fix tests
* change model class tests
* fix few tests
- more tests should pass
- one test left
* make token classifier return hidden states
* style: make BLOOM typehints consistent
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
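A hedged usage sketch for the new BLOOM classification head; the checkpoint name and `num_labels` value are illustrative choices, not values prescribed by this PR.

```python
# Hedged usage sketch for BloomForSequenceClassification; checkpoint and labels are illustrative.
import torch
from transformers import AutoTokenizer, BloomForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = BloomForSequenceClassification.from_pretrained("bigscience/bloom-560m", num_labels=2)

inputs = tokenizer("BLOOM now has sequence classification support.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch_size, num_labels)
```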
* Initial commit
* Make some fixes
* Make PT model full forward pass
* Drop TF & Flax implementation, fix copies etc
* Add Flax model and update some corresponding stuff
* Drop some TF things
* Update config and flax local attn
* Add encoder_attention_type to config
* .
* Update docs
* Do some cleansing
* Fix some issues -> make style; add some docs
* Fix position_bias + mask addition + Update tests
* Fix repo consistency
* Fix model consistency by removing flax operation over attn_mask
* [WIP] Add PT TGlobal LongT5
* .
* [WIP] Add flax tglobal model
* [WIP] Update flax model to use the right attention type in the encoder
* Fix flax tglobal model forward pass
* Make use of global_relative_attention_bias
* Add test suites for TGlobal model
* Fix minor bugs, clean code
* Fix PT-Flax equivalence, though not convinced of correctness
* Fix LocalAttn implementation to match the original impl. + update READMEs
* Few updates
* Update: [Flax] improve large model init and loading #16148
* Add ckpt conversion script according to #16853 + handle torch device placement
* Minor updates to conversion script.
* Typo: AutoModelForSeq2SeqLM -> FlaxAutoModelForSeq2SeqLM
* gpu support + dtype fix
* Apply some suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Remove (de)parallelize stuff
* Edit shape comments
* Update README.md
* make fix-copies
* Remove caching logic for local & tglobal attention
* Apply another batch of suggestions from code review
* Add missing checkpoints
* Format converting scripts
* Drop (de)parallelize links from longT5 mdx
* Fix converting script + revert config file change
* Revert "Remove caching logic for local & tglobal attention"
This reverts commit 2a619828f6ddc3e65bd9bb1725a12b77fa883a46.
* Stash caching logic in Flax model
* Make side relative bias always used
* Drop caching logic in PT model
* Return side bias as it was
* Drop all remaining model parallel logic
* Remove clamp statements
* Move test files to the proper place
* Update docs with new version of hf-doc-builder
* Fix test imports
* Make some minor improvements
* Add missing checkpoints to docs
* Make TGlobal model compatible with torch.onnx.export
* Replace some np.ndarray with jnp.ndarray
* Fix TGlobal for ONNX conversion + update docs
* fix _make_global_fixed_block_ids and masked neg value
* update flax model
* style and quality
* fix imports
* remove load_tf_weights_in_longt5 from init and fix copies
* add slow test for TGlobal model
* typo fix
* Drop obsolete is_parallelizable and one warning
* Update __init__ files to fix repo-consistency
* fix pipeline test
* Fix some device placements
* [wip]: Update tests -- need to generate summaries to update expected_summary
* Fix quality
* Update LongT5 model card
* Update (slow) summarization tests
* make style
* rename checkpoints
* finish
* fix flax tests
Co-authored-by: phungvanduy <pvduy23@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: patil-suraj <surajp815@gmail.com>
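A hedged usage sketch for the newly added model; the checkpoint name follows the `google/long-t5-tglobal-base` naming referenced in the docs, and this pretrained checkpoint would normally be fine-tuned before producing useful summaries.

```python
# Hedged usage sketch for LongT5; checkpoint and generation settings are illustrative.
from transformers import AutoTokenizer, LongT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")
model = LongT5ForConditionalGeneration.from_pretrained("google/long-t5-tglobal-base")

inputs = tokenizer("A very long document ... " * 100, return_tensors="pt")
summary_ids = model.generate(**inputs, max_length=64)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```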
* enable CPU distributed training using mpirun
  Example command:
    mpirun -n 2 python3 run_qa.py --no_cuda --xpu_backend ccl xxxx
  MASTER_ADDR and MASTER_PORT should be set as environment variables:
    export MASTER_ADDR=127.0.0.1
    export MASTER_PORT=29500
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* fix according to the review comment
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* use accelerate logic for CPU distributed training to set the "RANK", "LOCAL_RANK", and "WORLD_SIZE" environment variables
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
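A hedged sketch of the environment-variable bridging described above: map the variables an MPI launcher provides onto the ones `torch.distributed` expects. The exact variable names vary by MPI implementation (Intel MPI uses `PMI_*`, Open MPI uses `OMPI_COMM_WORLD_*`), and this helper is illustrative, not the accelerate code itself.

```python
# Hypothetical helper: bridge MPI launcher env vars to torch.distributed env vars.
import os

def setup_cpu_distributed_env() -> None:
    rank = os.environ.get("PMI_RANK") or os.environ.get("OMPI_COMM_WORLD_RANK")
    world_size = os.environ.get("PMI_SIZE") or os.environ.get("OMPI_COMM_WORLD_SIZE")
    if rank is not None and world_size is not None:
        os.environ.setdefault("RANK", rank)
        os.environ.setdefault("WORLD_SIZE", world_size)
        os.environ.setdefault("LOCAL_RANK", rank)  # single-node assumption
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
        os.environ.setdefault("MASTER_PORT", "29500")
```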
* allow scope from trainer arg
* add ray_scope to training args
* escape double quotes
* make style && quality
* attempt to solve doc style issues
* splitting up URLs for style
* make fixup
* Update src/transformers/training_args.py
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
* make style
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
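A hedged example of passing the new training argument; the value shown is illustrative, and the setting is forwarded to Ray Tune when picking the best trial during hyperparameter search.

```python
# Hedged example of the new `ray_scope` training argument; "last" is an illustrative value.
from transformers import TrainingArguments

args = TrainingArguments(output_dir="ray_hpo_out", ray_scope="last")
```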
I'm guessing that the intention was to have the `_no_split_modules` class attribute of `GPTNeoXPreTrainedModel` set to `["GPTNeoXLayer"]`, akin to how it's set to `["GPTJBlock"]` for `GPTJPreTrainedModel`.
If this is incorrect, please feel free to just close the PR.
Thanks!
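A minimal sketch of the change being proposed above, not the verbatim modeling code:

```python
# Sketch of the proposed change: declare the block class that must not be split
# across devices, mirroring GPTJPreTrainedModel's ["GPTJBlock"].
from transformers import GPTNeoXConfig, PreTrainedModel

class GPTNeoXPreTrainedModel(PreTrainedModel):
    config_class = GPTNeoXConfig
    base_model_prefix = "gpt_neox"
    _no_split_modules = ["GPTNeoXLayer"]
```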