* Add minor doc-string change to include hp_name
* fix: missing type-information for kwargs
* fix: missing white-space in hyperparameter_search doc-strings
* add examples subfolder
* mention examples in codeparrot readme
* use Trainer optimizer and scheduler type and add output_dir as argument
* add example of text-to-python and python-to-text models
* mention the downstream examples in the readme
* fix typo
* Update methods to optionally rescale
This is necessary to allow for casting our images / videos to numpy arrays within the feature extractors' call. We want to do this to make sure the behaviour is as expected when flags like are False. If some transformations aren't applied, then the output type can't be unexpected e.g. a list of PIL images instead of numpy arrays.
* Cast images to numpy arrays in call to enable consistent behaviour with different configs
* Remove accidental clip changes
* Update tests to reflect the scaling logic
We write a generic function to handle rescaling of our arrays. In order for the API to be intuitive, we take some factor c and rescale the image values by that. This means, the rescaling done in normalize and to_numpy_array are now done with array * (1/255) instead of array / 255. This leads to small differences in the resulting image. When testing, this was in the order of 1e-8, and so deemed OK
* examples: add Bloom support for token classification (FLAX, PyTorch and TensorFlow)
* examples: remove support for Bloom in token classication (FLAX and TensorFlow currently have no support for it)
* bnb minor modifications
- refactor documentation
- add troubleshooting README
- add PyPi library on DockerFile
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Apply suggestions from code review
* Apply suggestions from code review
* Apply suggestions from code review
* put in one block
- put bash instructions in one block
* update readme
- refactor a bit hardware requirements
* change text a bit
* Apply suggestions from code review
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* apply suggestions
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* add link to paper
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Update tests/mixed_int8/README.md
* Apply suggestions from code review
* refactor a bit
* add instructions Turing & Amperer
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* add A6000
* clarify a bit
* remove small part
* Update tests/mixed_int8/README.md
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* Update run_translation_no_trainer.py
found an error in selecting `no_decay` parameters and some small modifications when the user continues to train from a checkpoint
* fixs `no_decay` and `resume_step` issue
1. change `no_decay` list
2. if use continue to train their model from provided checkpoint, the `resume_step` will not be initialized properly if `args.gradient_accumulation_steps != 1`
* [fsmt] deal with -100 indices in decoder ids
Fixes: https://github.com/huggingface/transformers/issues/17945
decoder ids get the default index -100, which breaks the model - like t5 and many other models add a fix to replace -100 with the correct pad index.
For some reason this use case hasn't been used with this model until recently - so this issue was there since the beginning it seems.
Any suggestions to how to add a simple test here? or perhaps we have something similar already? user's script is quite massive.
* style
* Supporting seq2seq models for `bitsandbytes` integration
- `bitsandbytes` integration supports now seq2seq models
- check if a model has tied weights as an additional check
* small modification
- tie the weights before looking at tied weights!