* Added Deberta model type for 'add_prefix_space' functionality
* housekeeping
---------
Co-authored-by: Filippos Ventirozos <filippos.ventirozos@autotrader.co.uk>
* Trainer - deprecate tokenizer for processing_class
* Extend chage across Seq2Seq trainer and docs
* Add tests
* Update to FutureWarning and add deprecation version
* fix redundant checkpointing in example scripts
* Update examples/pytorch/image-classification/run_image_classification_no_trainer.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update examples/pytorch/translation/run_translation_no_trainer.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update examples/pytorch/token-classification/run_ner_no_trainer.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update examples/pytorch/text-classification/run_glue_no_trainer.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update examples/pytorch/summarization/run_summarization_no_trainer.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update examples/pytorch/semantic-segmentation/run_semantic_segmentation_no_trainer.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update examples/pytorch/language-modeling/run_mlm_no_trainer.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update examples/pytorch/language-modeling/run_fim_no_trainer.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update examples/pytorch/language-modeling/run_clm_no_trainer.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update examples/pytorch/image-pretraining/run_mim_no_trainer.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update examples/pytorch/instance-segmentation/run_instance_segmentation_no_trainer.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update examples/pytorch/multiple-choice/run_swag_no_trainer.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update examples/pytorch/question-answering/run_qa_no_trainer.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update examples/pytorch/object-detection/run_object_detection_no_trainer.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update examples/pytorch/question-answering/run_qa_beam_search_no_trainer.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Pass datasets trust_remote_code
* Pass trust_remote_code in more tests
* Add trust_remote_dataset_code arg to some tests
* Revert "Temporarily pin datasets upper version to fix CI"
This reverts commit b7672826ca.
* Pass trust_remote_code in librispeech_asr_dummy docstrings
* Revert "Pin datasets<2.20.0 for examples"
This reverts commit 833fc17a3e.
* Pass trust_remote_code to all examples
* Revert "Add trust_remote_dataset_code arg to some tests" to research_projects
* Pass trust_remote_code to tests
* Pass trust_remote_code to docstrings
* Fix flax examples tests requirements
* Pass trust_remote_dataset_code arg to tests
* Replace trust_remote_dataset_code with trust_remote_code in one example
* Fix duplicate trust_remote_code
* Replace args.trust_remote_dataset_code with args.trust_remote_code
* Replace trust_remote_dataset_code with trust_remote_code in parser
* Replace trust_remote_dataset_code with trust_remote_code in dataclasses
* Replace trust_remote_dataset_code with trust_remote_code arg
* Remove deprecated logic and warnings
* Add back some code that seems to be important...
* Let's just add all he nllb stuff back; removing it is a bit more involved
* Remove kwargs
* Remove more kwargs
* Update legacy Repository usage in `examples/pytorch/text-classification/run_glue_no_trainer.py`
Marked for deprecation here https://huggingface.co/docs/huggingface_hub/guides/upload#legacy-upload-files-with-git-lfs
* Fix import order
* Replace all example usage of deprecated Repository
* Fix remaining repo call and rename args variable
* Revert removing creation of gitignore files and don't change research examples
While using `run_clm.py`,[^1] I noticed that some files were being added
to my global cache, not the local cache. I set the `cache_dir` parameter
for the one call to `evaluate.load()`, which partially solved the
problem. I figured that while I was fixing the one script upstream, I
might as well fix the problem in all other example scripts that I could.
There are still some files being added to my global cache, but this
appears to be a bug in `evaluate` itself. This commit at least moves
some of the files into the local cache, which is better than before.
To create this PR, I made the following regex-based transformation:
`evaluate\.load\((.*?)\)` -> `evaluate\.load\($1,
cache_dir=model_args.cache_dir\)`. After using that, I manually fixed
all modified files with `ruff` serving as useful guidance. During the
process, I removed one existing usage of the `cache_dir` parameter in a
script that did not have a corresponding `--cache-dir` argument
declared.
[^1]: I specifically used `pytorch/language-modeling/run_clm.py` from
v4.34.1 of the library. For the original code, see the following URL:
acc394c4f5/examples/pytorch/language-modeling/run_clm.py.
* Fix TypeError: Object of type int64 is not JSON serializable
* Convert numpy.float64 and numpy.int64 to float and int for json serialization
* Black reformatted examples/pytorch/token-classification/run_ner_no_trainer.py
* * make style