* Update legacy Repository usage in `examples/pytorch/text-classification/run_glue_no_trainer.py`
Marked for deprecation here https://huggingface.co/docs/huggingface_hub/guides/upload#legacy-upload-files-with-git-lfs
* Fix import order
* Replace all example usage of deprecated Repository
* Fix remaining repo call and rename args variable
* Revert removing creation of gitignore files and don't change research examples
* add: initial script to train clm fim
* fix: if training model from scratch, new tokens will be added and embeddings resized
* fix: fixed attention_mask errors when generating FIM data
* fix: file formatted using black
* add: run_fim_no_trainer.py and fixed some comments in run_fim.py
* add: added fim examples to the README.md and ran code fixup
* fix: little bug in both fim training scripts
* fix: remove comment from notebook and added a note on fim related params
* fix: minor typo in README
* add: suggested minor changes to README and run_fim.py
* add: gradient_accumulation_steps and gradient_checkpointing args
* add: improved model embedding resizing
* add: pad_to_multiple_of and attn_implementation params
* add: requested minor changes
* add: deepspeed zero compatibility
* add: resize embeddings layer with zero3 support for fim model initialization
* change version
* nuke
* this doesn't make sense
* update some requirements.py
* revert + no main
* nits
* change cache number
* more pin
* revert
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Add missing entries to the language selector
* Add links to the Colab and AWS Studio notebooks for ONNX
* Use anchor links in CONTRIBUTING.md
* Fix broken hyperlinks due to spaces
* Fix links to OpenAI research articles
* Remove confusing footnote symbols from author names, as they are also considered invalid markup
* Fix typos and grammar mistakes in docs and examples
* Fix typos in docstrings and comments
* Fix spelling of `tokenizer` in model tests
* Remove erroneous spaces in decorators
* Remove extra spaces in Markdown link texts
* Pin torch to <2.2.0
* Pin torchvision and torchaudio as well
* Playing around with versions to see if this helps
* twiddle something to restart the CI
* twiddle it back
* Try changing the natten version
* make fixup
* Revert "Try changing the natten version"
This reverts commit de0d6592c3.
* make fixup
* fix fix fix
* fix fix fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Port core files + ESM (because ESM code is odd)
* Search-replace in modelling code
* Fix up transfo_xl as well
* Fix other core files + tests (still need to add correct import to tests)
* Fix cookiecutter
* make fixup, fix imports in some more core files
* Auto-add imports to tests
* Cleanup, add imports to sagemaker tests
* Use correct exception for importing tf_keras
* Fixes in modeling_tf_utils
* make fixup
* Correct version parsing code
* Ensure the pipeline tests correctly revert to float32 after each test
* Ensure the pipeline tests correctly revert to float32 after each test
* More tf.keras -> keras
* Add dtype cast
* Better imports of tf_keras
* Add a cast for tf.assign, just in case
* Fix callback imports
* first commit
* correct default value non causal
* update config and modeling code
* update converting checkpoint
* clean modeling and fix tests
* make style
* add new config parameters to docstring
* fix copied from statements
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* make position_embeddings_type docstrings clearer
* clean converting script
* remove function not used
* clean modeling file
* apply suggestion for test file + add convert script to not_doctested
* modify tests according to review - cleaner logic and more tests
* Apply nit suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add checker of valid position embeddings type
* instantiate new layer norm layer with the right eps
* fix freeze_feature_encoder since it can be None in some cases
* add test same output in convert script
* restore wav2vec2conformer and add new model
* create processor and FE + clean
* add new model code
* fix convert script and set default config parameters
* correct model id paths
* make style
* make fix-copies and cleaning files
* fix copied from statements
* complete .md and fix copies
* clean convert script argument defaults
* fix config parameters docstrings
* fix config docstring
* add copied from and enrich FE tests
* fix copied from and repo-consistency
* add autotokenizer
* make test input length shorter and change docstring code
* fix docstrings and copied from
* add add_adapter to ASR training example
* make testing of adapters more robust
* adapt to multi adapter layers
* refactor input_values->input_features and remove w2v2-bert feature extractor
* remove pretraining model
* remove deprecated features and useless lines
* add copied from and ignore statements to modeling tests
* remove pretraining model #2
* change import in convert script
* change default in convert script
* update readme and remove useless line
* Update tests/models/wav2vec2_bert/test_processor_wav2vec2_bert.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* refactor BERT to Bert for consistency
* remove useless ignore copy statement
* add persistent to buffer in rotary
* add eps in LayerNorm init and remove copied from
* add adapter activation parameters and add copied from statements
* Fix copied statements and add unittest.skip reasons
* add copied statement in test_processor
* refactor processor
* make style
* replace numpy random by torch rand
* remove expected output CTC
* improve converting script with processor class
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* remove gumbel class
* remove tests related to previously deleted class
* Update src/transformers/models/wav2vec2_bert/configuration_wav2vec2_bert.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* correct typos
* remove unused parameters
* update processor to take both text and audio
* update checkpoints
* update expected output and add ctc expected output
* add label_attention_mask
* replace pt with np in processor tests
* fix typo
* revert to behaviour with labels_attention_mask
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Remove `task` arg in `load_dataset` in image-classification example
* Manage case where "train" is not in dataset
* Add new args to manage image and label column names
* Similar to audio-classification example
* Fix README
* Update tests
While using `run_clm.py`,[^1] I noticed that some files were being added
to my global cache, not the local cache. I set the `cache_dir` parameter
for the one call to `evaluate.load()`, which partially solved the
problem. I figured that while I was fixing the one script upstream, I
might as well fix the problem in all other example scripts that I could.
There are still some files being added to my global cache, but this
appears to be a bug in `evaluate` itself. This commit at least moves
some of the files into the local cache, which is better than before.
To create this PR, I made the following regex-based transformation:
`evaluate\.load\((.*?)\)` -> `evaluate\.load\($1,
cache_dir=model_args.cache_dir\)`. After using that, I manually fixed
all modified files with `ruff` serving as useful guidance. During the
process, I removed one existing usage of the `cache_dir` parameter in a
script that did not have a corresponding `--cache-dir` argument
declared.
[^1]: I specifically used `pytorch/language-modeling/run_clm.py` from
v4.34.1 of the library. For the original code, see the following URL:
acc394c4f5/examples/pytorch/language-modeling/run_clm.py.
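
The regex-based transformation described above can be sketched in Python as follows. This is an illustration only: `add_cache_dir` is a hypothetical helper name, and Python's `re` module writes the backreference as `\1` where the message writes `$1`. The non-greedy group stops at the first `)`, so calls with nested parentheses would come out wrong and need manual fixing.

```python
import re

# Same pattern as in the message; the group captures the existing arguments.
EVALUATE_LOAD = re.compile(r"evaluate\.load\((.*?)\)")


def add_cache_dir(source: str) -> str:
    """Append cache_dir=model_args.cache_dir to each evaluate.load(...) call."""
    return EVALUATE_LOAD.sub(
        r"evaluate.load(\1, cache_dir=model_args.cache_dir)", source
    )


before = 'metric = evaluate.load("accuracy")'
after = add_cache_dir(before)
print(after)
```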
* docs: replace torch.distributed.run by torchrun
`transformers` now officially supports PyTorch >= 1.10.
The entrypoint `torchrun` is present from 1.10 onwards.
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
* Update src/transformers/trainer.py
with @ArthurZucker's suggestion
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Renamed variable extension to builder_name
* If builder name is jsonl, change it to json to align with `load_dataset`
* Apply suggestions from code review
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
---------
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
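
A minimal sketch of the `jsonl` to `json` mapping mentioned above, assuming the builder name is taken from the data file's extension. `builder_name_for` is a hypothetical helper, not a function from the example scripts.

```python
def builder_name_for(data_file: str) -> str:
    """Derive a dataset builder name from a data file's extension."""
    builder_name = data_file.rsplit(".", 1)[-1]
    # JSON Lines files are handled by the "json" builder, so map jsonl -> json.
    if builder_name == "jsonl":
        builder_name = "json"
    return builder_name


print(builder_name_for("train.jsonl"))
print(builder_name_for("validation.csv"))
```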
* try to stylify using ruff
* might need to remove these changes?
* use ruff format and ruff check
* use isinstance instead of type comparison
* use # fmt: skip
* use # fmt: skip
* nits
* some styling changes
* update ci job
* nits isinstance
* more files update
* nits
* more nits
* small nits
* check and format
* revert wrong changes
* actually use formatter instead of checker
* nits
* well docbuilder is overwriting this commit
* revert notebook changes
* try to nuke docbuilder
* style
* fix feature extraction test
* remove `indent-width = 4`
* fixup
* more nits
* update the ruff version that we use
* style
* nuke docbuilder styling
* leave the print for detected changes
* nits
* Remove file I/O
Co-authored-by: charliermarsh <charlie.r.marsh@gmail.com>
* style
* nits
* revert notebook changes
* Add # fmt skip when possible
* Add # fmt skip when possible
* Fix
* More ` # fmt: skip` usage
* More ` # fmt: skip` usage
* More ` # fmt: skip` usage
* Nits
* more fixes
* fix tapas
* Another way to skip
* Recommended way
* Fix two more files
* Remove asynch
---------
Co-authored-by: charliermarsh <charlie.r.marsh@gmail.com>
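
Two of the style changes listed above, sketched under assumptions: preferring `isinstance` over an exact type comparison, and marking a statement with `# fmt: skip` so the formatter leaves its layout alone. The function names and the constant are hypothetical and do not come from the repository.

```python
def describe(value) -> str:
    # Preferred: isinstance also accepts subclasses (bool is a subclass of int).
    if isinstance(value, int):
        return "int-like"
    return "other"


def describe_strict(value) -> str:
    # Exact type comparison rejects subclasses; linters can flag it (pycodestyle E721).
    if type(value) == int:
        return "int"
    return "other"


# The trailing "# fmt: skip" asks the formatter to keep this manual alignment.
EXPECTED_SLICE = [0.1,   0.25,  0.5]  # fmt: skip

print(describe(True), describe_strict(True), describe_strict(3))
```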
* Remove the torch main_process_first context manager from TF examples
* Correctly set num_beams=1 in our examples, and add a guard in GenerationConfig.validate()
* Update src/transformers/generation/configuration_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Normalize only if needed
* Update examples/pytorch/image-classification/run_image_classification.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* if else in one line
* within block
* one more place, sorry for mess
* import order
* Update examples/pytorch/image-classification/run_image_classification.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update examples/pytorch/image-classification/run_image_classification_no_trainer.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
`jnp.array` is a function, not a type:
https://jax.readthedocs.io/en/latest/_autosummary/jax.numpy.array.html
so it never makes sense to use `jnp.array` in a type annotation. Presumably the intent was to write `jnp.ndarray` aka `jax.Array`.
Co-authored-by: Peter Hawkins <phawkins@google.com>
* remove SharedDDP as it was deprecated
* apply review suggestion
* make style
* Oops, forgot to remove the compute_loss context manager in Seq2SeqTrainer.
* remove the unnecessary conditional statement
* keep the logic of IPEX
* clean code
* mix precision setup & make fixup
---------
Co-authored-by: statelesshz <jihuazhong1@huawei.com>
* refactor: change default block_size
* fix: return tf to origin
* fix: change files to origin
* rebase
* rebase
* rebase
* rebase
* rebase
* rebase
* rebase
* rebase
* refactor: add min block_size to files
* reformat: add min block_size for run_clm tf
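
These entries do not show the resulting logic. As a hedged sketch only, taking a minimum when choosing `block_size` could look like the following. `resolve_block_size`, `default_cap` and the value 1024 are assumptions made for illustration, not details taken from this change.

```python
from typing import Optional


def resolve_block_size(
    requested: Optional[int], model_max_length: int, default_cap: int = 1024
) -> int:
    """Pick a block size that never exceeds what the model can handle."""
    if requested is None:
        # No explicit value: fall back to a capped default (assumed cap).
        return min(default_cap, model_max_length)
    # Explicit value: clamp it to the model's maximum length.
    return min(requested, model_max_length)


print(resolve_block_size(None, 2048))
print(resolve_block_size(4096, 2048))
```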
* from seq2seq speech
* [Flax] Example script for speech seq2seq
* tests and fixes
* make style
* fix: label padding tokens
* fix: label padding tokens over list
* update ln names for Whisper
* try datasets iter loader
* create readme and append results
* style
* make style
* adjust lr
* use pt dataloader
* make fast
* pin gen max len
* finish
* add pt to requirements for test
* fix pt -> torch
* add accelerate
* Make training args fully immutable
* Working tests, PyTorch
* In test_trainer
* during testing
* Use proper dataclass way
* Fix test
* Another one
* Fix tf
* Lingering slow
* Exception
* Clean
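
The entries above do not spell out how immutability is enforced. One standard-library pattern, shown here as an assumption rather than as what this change does, is a frozen dataclass that is updated through `dataclasses.replace`. `ExampleArgs` is a hypothetical stand-in and not `TrainingArguments`.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class ExampleArgs:
    learning_rate: float = 5e-5
    num_train_epochs: int = 3


args = ExampleArgs()

try:
    args.learning_rate = 1e-4  # direct assignment is rejected on a frozen dataclass
    mutated = True
except dataclasses.FrozenInstanceError:
    mutated = False

# replace() builds a new instance with the changed field instead of mutating.
new_args = dataclasses.replace(args, learning_rate=1e-4)
print(mutated, args.learning_rate, new_args.learning_rate)
```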
* make run_generation more generic for other devices
* use Accelerate to support any device type it supports.
* make style
* fix error usage of accelerator.prepare_model
* use `PartialState` to make sure everything is running on the right device
---------
Co-authored-by: statelesshz <jihuazhong1@huawei.com>
* Add text classification example
* set the problem type and finetuning task
* ruff reformatted
* fix bug for unsetting label_to_id for regression
* update README.md
* fixed finetuning task
* update comment
* check if label exists in feature before removing
* add useful logging
* Fix TypeError: Object of type int64 is not JSON serializable
* Convert numpy.float64 and numpy.int64 to float and int for json serialization
* Black reformatted examples/pytorch/token-classification/run_ner_no_trainer.py
* make style
* An end to accursed version-specific imports
* No more K.is_keras_tensor() either
* Update dependency tables
* Use a cleaner call context function getter
* Add a cap to <2.14
* Add cap to examples requirements too
* Add mms ctc fine tuning
* make style
* More fixes that are needed
* make fix-copies
* make draft for README
* add new file
* move to new file
* make style
* make style
* add quick test
* make style
* make style
* convert numpy array to list before writing to json
per_category_iou and per_category_accuracy are ndarray in the eval_metrics
* code reformatted with make style
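
A minimal sketch of converting NumPy values before writing metrics to JSON, assuming NumPy is installed. `to_jsonable` is a hypothetical helper and the metric values are made up; only the `per_category_iou` and `per_category_accuracy` keys come from the message above.

```python
import json

import numpy as np


def to_jsonable(metrics: dict) -> dict:
    """Return a copy of metrics with NumPy values converted to built-in types."""
    converted = {}
    for name, value in metrics.items():
        if isinstance(value, np.ndarray):
            converted[name] = value.tolist()  # ndarray -> (nested) list
        elif isinstance(value, np.generic):
            converted[name] = value.item()  # NumPy scalar -> Python scalar
        else:
            converted[name] = value
    return converted


eval_metrics = {
    "num_samples": np.int64(2),  # illustrative scalar; json cannot encode np.int64
    "per_category_iou": np.array([0.25, 0.75]),
    "per_category_accuracy": np.array([0.5, 1.0]),
}
serialized = json.dumps(to_jsonable(eval_metrics))
print(serialized)
```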
* Proposed fix for TF example now running on safetensors.
* Adding more warnings and returning keys.
* Trigger CI
* Trigger CI
---------
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
* Add run_mim_no_trainer.py draft from #20412
Add parse_args method and copy over other dependencies
Add Method call for sending telemetry
Initialize Accelerator
Make one log on every process
Set seed and Handle repository creation
Initialize dataset and Set validation split
Create Config
Adapt Config
Update Config
Create Feature Extractor
Create model
Set column names
Create transforms
Create mask generator
Create method to preprocess images
Shuffle datasets if needed and set transforms
Create Dataloaders
Add optimizer
Add learning rate scheduler
Prepare everything with our accelerator
Tie weights for TPU training
Recalculate training steps and training epochs
Set accelerator checkpointing steps
Initialize trackers and store configuration
Set total batch size
Fix typo: mlm -> mim
Log info at the start of training
Load in the weights and states from previous save
update the progress_bar if load from checkpoint
Define train loop
Add evaluation loop to training
Add to parse_args method
Push repo to hub
Save accelerator state
End training and save model and feature extractor
Remove unused imports
Fix trailing whitespace
* Update code based on comments, Rename feature_extractor to image_processor
* Fix linting
* Add argument for learning rate
* Add argument for setting number of training epochs
* Remove incorrect logger argument
* Convert max_train_steps to int for tqdm
---------
Co-authored-by: Saad Mahmud <shuvro.mahmud79@gmail.com>
* add: tokenizer training script for TF TPU LM training.
* add: script for preparing the TFRecord shards.
* add: sequence of execution to readme.
* remove limit from the tfrecord shard name.
* Add initial train_model.py
* Add basic training arguments and model init
* Get up to the point of writing the data collator
* Pushing progress so far!
* Complete first draft of model training code
* feat: grouping of texts efficiently.
Co-authored-by: Matt <rocketknight1@gmail.com>
* Add proper masking collator and get training loop working
* fix: things.
* Read sample counts from filenames
* Read sample counts from filenames
* Draft README
* Improve TPU warning
* Use distribute instead of distribute.experimental
* Apply suggestions from code review
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Modularize loading and add MLM probability as arg
* minor refactoring to better use the cli args.
* readme fillup.
* include tpu and inference sections in the readme.
* table of contents.
* parallelize maps.
* polish readme.
* change script name to run_mlm.py
* address PR feedback (round I).
---------
Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Update run_speech_recognition_ctc.py
Make sure all processes wait until data is saved before loading the processor from the output_dir
* Make sure all processes wait until data is saved before loading the processor from the output_dir
* Update run_speech_recognition_ctc.py
* Update run_speech_recognition_seq2seq.py
* Add initial remote hardware auto-setup docs
* Fix a few typos and clarify some language
* Add missing dependency
* Update self-hosted launch script with Sylvain's comments.
* Formatting.
* Trigger CI
* Style
* add low_cpu_mem_usage option in run_clm.py example which will benefit LLM loading
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* update all the example and README under language-modeling
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
---------
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* Override the decoding parameters of Seq2SeqTrainer
* Fix quality
* Fix max_length parameter
* Fix quality
* Remove redundant parameter max_length
* Separate the preprocess of train and validation to use different max_target_length