* Make training args fully immutable
* Working tests, PyTorch
* In test_trainer
* during testing
* Use proper dataclass way
* Fix test
* Another one
* Fix tf
* Lingering slow
* Exception
* Clean
* make run_generation more generic for other devices
* use Accelerate to support any device type it supports.
* make style
* fix error usage of accelerator.prepare_model
* use `PartialState` to make sure everything is running on the right device
---------
Co-authored-by: statelesshz <jihuazhong1@huawei.com>
* Add text classification example
* set the problem type and finetuning task
* ruff reformated
* fix bug for unseting label_to_id for regression
* update README.md
* fixed finetuning task
* update comment
* check if label exists in feature before removing
* add useful logging
* Fix TypeError: Object of type int64 is not JSON serializable
* Convert numpy.float64 and numpy.int64 to float and int for json serialization
* Black reformatted examples/pytorch/token-classification/run_ner_no_trainer.py
* * make style
* Add mms ctc fine tuning
* make style
* More fixes that are needed
* make fix-copies
* make draft for README
* add new file
* move to new file
* make style
* make style
* add quick test
* make style
* make style
* convert numpy array to list before writing to json
per_category_iou and per_category_accuracy are ndarray in the eval_metrics
* code reformatted with make style
* Add run_mim_no_trainer.py draft from #20412
Add parse_args method and copy over other dependencies
Add Method call for sending telemetry
Initialize Accelerator
Make one log on every process
Set seed and Handle repository creation
Initialize dataset and Set validation split
Create Config
Adapt Config
Update Config
Create Feature Extractor
Create model
Set column names
Create transforms
Create mask generator
Create method to preprocess images
Shuffle datasets if needed and set transforms
Create Dataloaders
Add optimizer
Add learning rate scheduler
Prepare everything with our accelerator
Tie weights for TPU training
Recalculate training steps and training epochs
Set accelerator checkpointing steps
Initialize trackers and store configuration
Set total batch size
Fix typo: mlm -> mim
Log info at the start of training
Load in the weights and states from previous save
update the progress_bar if load from checkpoint
Define train loop
Add evaluation loop to training
Add to parse_args method
Push repo to hub
Save accelerator state
End training and save model and feature extractor
Remove unused imports
Fix trailing whitespace
* Update code based on comments, Rename feature_extractor to image_processor
* Fix linting
* Add argument for learning rate
* Add argument for setting number of training epochs
* Remove incorrect logger argument
* Convert max_train_steps to int for tqdm
---------
Co-authored-by: Saad Mahmud <shuvro.mahmud79@gmail.com>
* Update run_speech_recognition_ctc.py
Make sure all processes wait until data is saved before loading the processor from the output_dit
* Make sure all processes wait until data is saved before loading the processor from the output_dit
* Update run_speech_recognition_ctc.py
* Update run_speech_recognition_seq2seq.py
* add low_cpu_mem_usage option in run_clm.py example which will benefit LLM loading
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* update all the example and README under language-modeling
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
---------
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* Override the decoding parameters of Seq2SeqTrainer
* Fix quality
* Fix max_length parameter
* Fix quality
* Remove redundant parameter max_length
* Separate the preprocess of train and validation to use different max_target_length
* Result of black 23.1
* Update target to Python 3.7
* Switch flake8 to ruff
* Configure isort
* Configure isort
* Apply isort with line limit
* Put the right black version
* adapt black in check copies
* Fix copies
[NumPy] Remove references to deprecated NumPy type aliases.
This change replaces references to a number of deprecated NumPy type aliases (np.bool, np.int, np.float, np.complex, np.object, np.str) with their recommended replacement (bool, int, float, complex, object, str).
NumPy 1.24 drops the deprecated aliases, so we must remove uses before updating NumPy.
Co-authored-by: Peter Hawkins <phawkins@google.com>
Co-authored-by: Peter Hawkins <phawkins@google.com>
* [run_clm example] add torch_dtype option for model load.
for BLOOM 175B model. peak memory will reduce about 350G for inference. the weight of BLOOM in model hub is bfloat16
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* add other type in option
* fix style
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* merge conflicts
* bos and eos in datacollator
* (temp) hardcode removal of attention mask
* freeze encoder
* actually freeze encoder
* set max length / num beams according to gen kwargs
* (temp) fix tests
* don't pop attn mask
* override return attention mask config from Hub
* Hub configs updated 🤗
* final fixes
* update type annotations
* backward comp
* add: the contrastive search for generaton_utils
* add: testing scripts for contrastive search under examples/text-generation
* update the quality of codes
* revise the docstring; make the generation_contrastive_search.py scripts;
* revise the examples/pytorch/text-generation/run_generation_contrastive_search.py to the auto-APIs format
* revise the necessary documents
* fix: revise the docstring of generation_contrastive_search.py
* Fix the code indentation
* fix: revise the nits and examples in contrastive_search docstring.
* fix the copyright
* delete generation_contrastive_search.py
* revise the logic in contrastive_search
* update the intergration test and the docstring
* run the tests over
* add the slow decorate to the contrastive_search intergrate test
* add more test
* do the style, quality, consistency checks
* fixed typo for SQuAD
* Fixed the preprocess_validation_function function for the labels to reflect the remaining truncated instances
* Rolled back the trainer_seq2seq_qa.py for UnboundLocalError: local variable 'metrics' referenced before assignment
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>